Hello,
First of all, my system specifications:
- Linux 2.6 Kernel
- Debian Sid Packages of ejabberd
Recently my server had a filesystem issue.
Since then, ejabberd no longer starts.
The only processes I can see listening on a socket are:
0.0.0.0:4369 (TCP) - epmd
0.0.0.0:XXXXX (TCP) - beam
Here is the crash report (/var/log/ejabberd/sasl.log):
=CRASH REPORT==== 31-Oct-2009::12:06:12 ===
crasher:
pid: <0.35.0>
registered_name: []
exception exit: {bad_return,
{{ejabberd_app,start,[normal,[]]},
{'EXIT',
{aborted,
{node_not_running,ejabberd@server42}}}}}
in function application_master:init/4
initial call: application_master:init(<0.5.0>,<0.34.0>,
{appl_data,ejabberd,
[ejabberd,ejabberd_sup,
ejabberd_auth,ejabberd_router,
ejabberd_sm,ejabberd_s2s,
ejabberd_local,ejabberd_listeners,
ejabberd_iq_sup,
ejabberd_service_sup,
ejabberd_s2s_out_sup,
ejabberd_s2s_in_sup,
ejabberd_c2s_sup,
ejabberd_mod_roster,
ejabberd_mod_echo,
ejabberd_mod_pubsub,
ejabberd_mod_irc,ejabberd_mod_muc,
ejabberd_offline,random_generator],
undefined,
{ejabberd_app,[]},
[acl,adhoc,configure,
cyrsasl_anonymous,cyrsasl,
cyrsasl_digest,cyrsasl_plain,
ejabberd_admin,ejabberd_app,
ejabberd_auth_anonymous,
ejabberd_auth,
ejabberd_auth_external,
ejabberd_auth_internal,
ejabberd_auth_ldap,
ejabberd_auth_odbc,
ejabberd_auth_pam,ejabberd,
ejabberd_c2s,ejabberd_c2s_config,
ejabberd_config,ejabberd_ctl,
ejabberd_frontend_socket,
ejabberd_hooks,ejabberd_http,
ejabberd_http_bind,
ejabberd_http_poll,
ejabberd_listener,ejabberd_local,
ejabberd_logger_h,
ejabberd_loglevel,
ejabberd_node_groups,
ejabberd_rdbms,ejabberd_receiver,
ejabberd_router,ejabberd_s2s,
ejabberd_s2s_in,ejabberd_s2s_out,
ejabberd_service,ejabberd_sm,
ejabberd_socket,ejabberd_sup,
ejabberd_system_monitor,
ejabberd_tmp_sup,ejabberd_update,
ejabberd_web_admin,ejabberd_web,
ejabberd_zlib,ejd2odbc,eldap,
eldap_filter,eldap_pool,
eldap_utils,'ELDAPv3',extauth,
gen_iq_handler,gen_mod,
gen_pubsub_node,
gen_pubsub_nodetree,iconv,idna,
jd2ejd,jlib,mod_adhoc,
mod_announce,mod_caps,
mod_configure2,mod_configure,
mod_ctlextra,mod_disco,mod_echo,
mod_http_bind,mod_http_fileserver,
mod_irc,mod_irc_connection,
mod_last,mod_last_odbc,mod_muc,
mod_muc_log,mod_muc_room,
mod_offline,mod_offline_odbc,
mod_privacy,mod_privacy_odbc,
mod_private,mod_private_odbc,
mod_proxy65,mod_proxy65_lib,
mod_proxy65_service,
mod_proxy65_sm,mod_proxy65_stream,
mod_pubsub,mod_register,
mod_roster,mod_roster_odbc,
mod_service_log,mod_shared_roster,
mod_stats,mod_time,mod_vcard,
mod_vcard_ldap,mod_vcard_odbc,
mod_version,node_buddy,node_club,
node_default,node_dispatch,
node_pep,node_private,node_public,
nodetree_default,nodetree_virtual,
p1_fsm,p1_mnesia,
ram_file_io_server,randoms,sha,
shaper,stringprep,stringprep_sup,
tls,translate,xml,xml_stream,
'XmppAddr'],
[],infinity,infinity},
normal)
ancestors: [<0.34.0>]
messages: [{'EXIT',<0.36.0>,normal}]
links: [<0.34.0>,<0.5.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 987
stack_size: 23
reductions: 114
neighbours:
ejabberd.log has not been updated since the crash.
And this is the output of mnesia_lib:view("file"):
***** logfile *****
----- logfile: "/var/lib/ejabberd/PREVIOUS.LOG" -----
***** "/tmp/mnesia_vcore_elem.TMP" *****
=ERROR REPORT==== 31-Oct-2009::12:14:39 ===
Mnesia(nonode@nohost): ** ERROR ** Cannot open log "/tmp/mnesia_vcore_elem.TMP": {not_a_log_file,
"/tmp/mnesia_vcore_elem.TMP"}
----- logfile: "/var/lib/ejabberd/LATEST.LOG" -----
***** "/tmp/mnesia_vcore_elem.TMP" *****
{log_header,trans_log,"4.3","4.4.11",ejabberd@panoptikum,
{1256,569932,104883}}
----- logfile: "/var/lib/ejabberd/DECISION_TAB.LOG" -----
***** "/tmp/mnesia_vcore_elem.TMP" *****
{log_header,dcl_log,"1.0","4.4.11",ejabberd@panoptikum,
{1256,569572,105492}}
Could the solution be to touch /tmp/mnesia_vcore_elem.TMP?
Not fixed yet.
Hello,
Touching /tmp/mnesia_vcore_elem.TMP did not fix the problem.
Still no solution
Hmm, the problem still exists. Any more suggestions?
More verbose output from su - ejabberd -c /usr/sbin/ejabberd:
{error_logger,{{2009,11,11},{17,17,5}},"Protocol: ~p: register error: ~p~n",["inet_tcp",{{badmatch,{error,duplicate_name}},[{inet_tcp_dist,listen,1},{net_kernel,start_protos,4},{net_kernel,start_protos,3},{net_kernel,init_node,2},{net_kernel,init,1},{gen_server,init_it,6},{proc_lib,init_p,5}]}]}
{error_logger,{{2009,11,11},{17,17,5}},crash_report,[[{pid,<0.20.0>},{registered_name,net_kernel},{error_info,{exit,{error,badarg},[{gen_server,init_it,6},{proc_lib,init_p,5}]}},{initial_call,{gen,init_it,[gen_server,<0.17.0>,<0.17.0>,{local,net_kernel},net_kernel,{ejabberd,shortnames,15000},[]]}},{ancestors,[net_sup,kernel_sup,<0.8.0>]},{messages,[]},{links,[#Port<0.7>,<0.17.0>]},{dictionary,[{longnames,false}]},{trap_exit,true},{status,running},{heap_size,610},{stack_size,23},{reductions,453}],[]]}
{error_logger,{{2009,11,11},{17,17,5}},supervisor_report,[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{'EXIT',nodistribution}},{offender,[{pid,undefined},{name,net_kernel},{mfa,{net_kernel,start_link,[[ejabberd,shortnames]]}},{restart_type,permanent},{shutdown,2000},{child_type,worker}]}]}
{error_logger,{{2009,11,11},{17,17,5}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,shutdown},{offender,[{pid,undefined},{name,net_sup},{mfa,{erl_distribution,start_link,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]}
{error_logger,{{2009,11,11},{17,17,5}},crash_report,[[{pid,<0.7.0>},{registered_name,[]},{error_info,{exit,{shutdown,{kernel,start,[normal,[]]}},[{application_master,init,4},{proc_lib,init_p,5}]}},{initial_call,{application_master,init,[<0.5.0>,<0.6.0>,{appl_data,kernel,[application_controller,erl_reply,auth,boot_server,code_server,disk_log_server,disk_log_sup,erl_prim_loader,error_logger,file_server_2,fixtable_server,global_group,global_name_server,heart,init,kernel_config,kernel_sup,net_kernel,net_sup,rex,user,os_server,ddll_server,erl_epmd,inet_db,pg2],undefined,{kernel,[]},[application,application_controller,application_master,application_starter,auth,code,code_aux,packages,code_server,dist_util,erl_boot_server,erl_distribution,erl_prim_loader,erl_reply,erlang,error_handler,error_logger,file,file_server,file_io_server,prim_file,global,global_group,global_search,group,heart,hipe_unified_loader,inet6_tcp,inet6_tcp_dist,inet6_udp,inet_config,inet_hosts,inet_gethost_native,inet_tcp_dist,init,kernel,kernel_config,net,net_adm,net_kernel,os,ram_file,rpc,user,user_drv,user_sup,disk_log,disk_log_1,disk_log_server,disk_log_sup,dist_ac,erl_ddll,erl_epmd,erts_debug,gen_tcp,gen_udp,gen_sctp,prim_inet,inet,inet_db,inet_dns,inet_parse,inet_res,inet_tcp,inet_udp,inet_sctp,pg2,seq_trace,wrap_log_reader,zlib,otp_ring0],[],infinity,infinity},normal]}},{ancestors,[<0.6.0>]},{messages,[{'EXIT',<0.8.0>,normal}]},{links,[<0.6.0>,<0.5.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,610},{stack_size,23},{reductions,127}],[]]}
{error_logger,{{2009,11,11},{17,17,5}},std_info,[{application,kernel},{exited,{shutdown,{kernel,start,[normal,[]]}}},{type,permanent}]}
{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}}"}
Crash dump was written to: /var/log/ejabberd/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
Second run:
p:~# su - ejabberd -c /usr/sbin/ejabberd
Erlang (BEAM) emulator version 5.6.3 [source] [async-threads:0] [kernel-poll:false]
Eshell V5.6.3 (abort with ^G)
(ejabberd@p)1>
=ERROR REPORT==== 11-Nov-2009::17:18:40 ===
Mnesia(ejabberd@p): ** ERROR ** (core dumped to file: "/var/lib/ejabberd/MnesiaCore.ejabberd@p_1257_956320_152964")
** FATAL ** mnesia_recover crashed: {"Bad decision log item",
{log_header,dcl_log,"1.0","4.4.11",
ejabberd@p,
{1256,569572,105492}},
load_decision_tab} state: {state,
<0.67.0>,
undefined,
undefined,
undefined,0,
false,[]}
=ERROR REPORT==== 11-Nov-2009::17:18:50 ===
** Generic server mnesia_monitor terminating
** Last message in was {'EXIT',<0.67.0>,killed}
** When Server state == {state,<0.67.0>,[],[],false,[],undefined,[]}
** Reason for termination ==
** killed
=ERROR REPORT==== 11-Nov-2009::17:18:50 ===
Mnesia(ejabberd@p): ** ERROR ** mnesia_event got unexpected event: {'EXIT',
<0.69.0>,
killed}
=INFO REPORT==== 11-Nov-2009::17:18:50 ===
application: mnesia
exited: {killed,{mnesia_sup,start,[normal,[]]}}
type: temporary
=INFO REPORT==== 11-Nov-2009::17:18:50 ===
application: ejabberd
exited: {bad_return,
{{ejabberd_app,start,[normal,[]]},
{'EXIT',{aborted,{node_not_running,ejabberd@p}}}}}
type: temporary
For an explanation of the error message ["inet_tcp",{{badmatch,{error,duplicate_name}},
see: error, duplicate_name
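As far as I know, duplicate_name means a node with the same name is already registered with epmd, which would fit the leftover beam process still listening in the first post. A minimal sketch for checking that: `epmd -names` is a real command, but the helper function name below is made up for illustration, and the node name ejabberd is taken from the logs above.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of `epmd -names` on stdin and
# succeeds if the given short node name is listed as registered.
node_is_registered() {
    grep -q "^name $1 at port "
}

# Example use on the server (assumes epmd is on the PATH):
#   if epmd -names | node_is_registered ejabberd; then
#       echo "an ejabberd node is still registered; stop the old beam first"
#   fi
```

If the name is still listed, stopping the old beam process before starting ejabberd again should get rid of the duplicate_name error, although the Mnesia problem from the second run remains.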
The simplest solution is to remove the spool files; when ejabberd starts, it will create them again, empty. Of course, the problem in this case is that you lose all user accounts. You can then attempt to copy back the files of the tables you consider important (passwd.*, roster.*, ...). Maybe Mnesia accepts those old files and works correctly.
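A minimal sketch of that procedure, assuming the spool is /var/lib/ejabberd (the path in the logs above) and that ejabberd is stopped first. It moves the files aside instead of deleting them, so individual tables can be copied back afterwards. The function names and the /root/ejabberd-spool.broken directory are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; stop ejabberd before using them.

# Move every file out of the spool directory into a new holding directory.
set_aside_spool() {
    spool=$1
    saved=$2
    mkdir "$saved" || return 1          # fails if $saved already exists
    for f in "$spool"/*; do
        if [ -e "$f" ]; then
            mv "$f" "$saved/" || return 1
        fi
    done
}

# Copy the files of the named tables (e.g. passwd, roster) back into the spool.
restore_tables() {
    saved=$1
    spool=$2
    shift 2
    for table in "$@"; do
        for f in "$saved/$table".*; do
            if [ -e "$f" ]; then
                cp -p "$f" "$spool/" || return 1
            fi
        done
    done
}

# Example (paths are assumptions):
#   set_aside_spool /var/lib/ejabberd /root/ejabberd-spool.broken
#   ... start ejabberd once so it creates empty tables, then stop it again ...
#   restore_tables /root/ejabberd-spool.broken /var/lib/ejabberd passwd roster
```

Keeping the old files around costs nothing and lets you retry with a different set of tables if Mnesia rejects the first attempt.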
Another idea: maybe the problem is only with the vcard files? In that case, you can try to remove the vcard* files and restart ejabberd. Of course you lose the vCard information, but that's preferable to not having any info at all.
Once solved, remember to write a script that makes daily, or at least weekly, backups to another machine.
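A rough sketch of such a backup script, assuming the spool is /var/lib/ejabberd as in the logs above. The function name, archive naming and destination directory are made up for illustration. Copying the files of a running node may give an inconsistent snapshot, so it is probably safer to stop ejabberd first or to archive a dump written by `ejabberdctl backup` instead.

```shell
#!/bin/sh
# Hypothetical helper: archive a spool directory into a dated tarball
# and print the path of the archive it created.
backup_spool() {
    spool=$1
    dest=$2
    stamp=$(date +%Y%m%d)
    archive="$dest/ejabberd-spool-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # -C keeps only the last path component inside the archive.
    tar -czf "$archive" -C "$(dirname "$spool")" "$(basename "$spool")" || return 1
    echo "$archive"
}

# Example (destination and remote host are assumptions):
#   archive=$(backup_spool /var/lib/ejabberd /var/backups/ejabberd) &&
#       scp "$archive" backup@otherhost.example:/srv/backups/
```

Run from cron once a day, this leaves one archive per date locally and one copy on the other machine.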