Suddenly, "ejabberd is not running". WHY?!?

Last week at work, the ejabberd server suddenly died on me. I tried almost every solution I could find, to no avail. The sasl.log doesn't seem to help, and ejabberd.log doesn't help either, since nothing has been written to it since the server stopped working. What freaks me out is that the config file (/etc/ejabberd/ejabberd.cfg) was absolutely unchanged when this happened; I even tried changing the ports afterwards. Nothing I did solved it.

It's a messy situation. Here's the sasl.log anyway:

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,sasl_safe_sup}
             started: [{pid,<0.50.0>},
                       {name,alarm_handler},
                       {mfa,{alarm_handler,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,sasl_safe_sup}
             started: [{pid,<0.51.0>},
                       {name,overload},
                       {mfa,{overload,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,sasl_sup}
             started: [{pid,<0.49.0>},
                       {name,sasl_safe_sup},
                       {mfa,
                           {supervisor,start_link,
                               [{local,sasl_safe_sup},sasl,safe]}},
                       {restart_type,permanent},
                       {shutdown,infinity},
                       {child_type,supervisor}]

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,sasl_sup}
             started: [{pid,<0.52.0>},
                       {name,release_handler},
                       {mfa,{release_handler,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
         application: sasl
          started_at: ejabberd@zamora1

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,kernel_safe_sup}
             started: [{pid,<0.55.0>},
                       {name,dets_sup},
                       {mfa,{dets_sup,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,1000},
                       {child_type,supervisor}]

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,kernel_safe_sup}
             started: [{pid,<0.56.0>},
                       {name,dets},
                       {mfa,{dets_server,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,mnesia_sup}
             started: [{pid,<0.64.0>},
                       {name,mnesia_event},
                       {mfa,{mnesia_sup,start_event,[]}},
                       {restart_type,permanent},
                       {shutdown,30000},
                       {child_type,worker}]

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,mnesia_kernel_sup}
             started: [{pid,<0.66.0>},
                       {name,mnesia_monitor},
                       {mfa,{mnesia_monitor,start,[]}},
                       {restart_type,permanent},
                       {shutdown,3000},
                       {child_type,worker}]

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,mnesia_kernel_sup}
             started: [{pid,<0.67.0>},
                       {name,mnesia_subscr},
                       {mfa,{mnesia_subscr,start,[]}},
                       {restart_type,permanent},
                       {shutdown,3000},
                       {child_type,worker}]

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,mnesia_kernel_sup}
             started: [{pid,<0.68.0>},
                       {name,mnesia_locker},
                       {mfa,{mnesia_locker,start,[]}},
                       {restart_type,permanent},
                       {shutdown,3000},
                       {child_type,worker}]

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,mnesia_kernel_sup}
             started: [{pid,<0.69.0>},
                       {name,mnesia_recover},
                       {mfa,{mnesia_recover,start,[]}},
                       {restart_type,permanent},
                       {shutdown,180000},
                       {child_type,worker}]

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,kernel_safe_sup}
             started: [{pid,<0.74.0>},
                       {name,disk_log_sup},
                       {mfa,{disk_log_sup,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,1000},
                       {child_type,supervisor}]

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
          supervisor: {local,kernel_safe_sup}
             started: [{pid,<0.75.0>},
                       {name,disk_log_server},
                       {mfa,{disk_log_server,start_link,[]}},
                       {restart_type,permanent},
                       {shutdown,2000},
                       {child_type,worker}]

=CRASH REPORT==== 20-Apr-2010::07:18:31 ===
  crasher:
    initial call: gen_event:init_it/6
    pid: <0.64.0>
    registered_name: mnesia_event
    exception exit: killed
      in function  gen_event:terminate_server/4
    ancestors: [mnesia_sup,<0.62.0>]
    messages: []
    links: []
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 540
  neighbours:

=CRASH REPORT==== 20-Apr-2010::07:18:31 ===
  crasher:
    initial call: application_master:init/4
    pid: <0.61.0>
    registered_name: []
    exception exit: {killed,{mnesia_sup,start,[normal,[]]}}
      in function  application_master:init/4
    ancestors: [<0.60.0>]
    messages: [{'EXIT',<0.62.0>,normal}]
    links: [<0.60.0>,<0.5.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 233
    stack_size: 24
    reductions: 102
  neighbours:

=CRASH REPORT==== 20-Apr-2010::07:18:31 ===
  crasher:
    initial call: application_master:init/4
    pid: <0.37.0>
    registered_name: []
    exception exit: {bad_return,
                        {{ejabberd_app,start,[normal,[]]},
                         {'EXIT',
                             {aborted,{node_not_running,ejabberd@zamora1}}}}}
      in function  application_master:init/4
    ancestors: [<0.36.0>]
    messages: [{'EXIT',<0.38.0>,normal}]
    links: [<0.36.0>,<0.5.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 610
    stack_size: 24
    reductions: 96
  neighbours:

I'm pretty hopeless right now. I can't stand false starts. I started the server with "sudo /etc/init.d/ejabberd start", and when I check with "sudo ejabberdctl status", the message is "Status: started. ejabberd is NOT running". I killed the processes and started over again... NOTHING!

Please help me before I reinstall again on Ubuntu Server (Karmic Koala)... I think I've given up.

Greetings

Neurochild.

P.S.: FYI, Version is 2.0.5.

The crash happens while the Mnesia internal database system is starting. The log doesn't say exactly why it crashes. Some possible causes:

  • Not enough disk space to write temporary files?
  • No read/write permissions on the Mnesia disk files?
  • A corrupted Mnesia disk file?
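
The three possibilities above can be checked roughly with a short script. This is only a sketch under assumptions: the function name check_spool and the 50 MB threshold are made up for illustration, and an empty table file is treated as a hint of damage, not as a real integrity check. Run it as the account ejabberd runs under, otherwise the permission check says nothing useful.

```python
import os
import shutil
from pathlib import Path

# Mnesia table and log files normally use these extensions.
MNESIA_SUFFIXES = {".DCD", ".DCL", ".DAT"}


def check_spool(spool_dir, min_free_bytes=50 * 1024 * 1024):
    """Return a list of human-readable problems found in spool_dir.

    min_free_bytes is an arbitrary illustrative threshold, not an
    ejabberd or Mnesia requirement.
    """
    spool = Path(spool_dir)
    if not spool.is_dir():
        return [f"{spool} is not a directory"]

    problems = []
    entries = sorted(spool.iterdir())

    # 1. Free space on the filesystem that holds the spool directory.
    free = shutil.disk_usage(spool).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free on the filesystem of {spool}")

    # 2. Read/write access for the current user (directory and files).
    for path in [spool, *entries]:
        if not os.access(path, os.R_OK | os.W_OK):
            problems.append(f"no read/write access to {path}")

    # 3. A zero-length table file is suspicious, though a file can be
    #    damaged without being empty.
    for path in entries:
        if path.is_file() and path.suffix in MNESIA_SUFFIXES:
            if path.stat().st_size == 0:
                problems.append(f"{path} is empty")

    return problems
```

Calling check_spool("/var/lib/ejabberd") (if that is where your spool directory is) returns an empty list when none of these simple checks finds anything, which still doesn't prove the database is healthy.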

One interesting detail is that ejabberd's startup pauses for 13 seconds and then crashes:

=PROGRESS REPORT==== 20-Apr-2010::07:18:18 ===
...
=CRASH REPORT==== 20-Apr-2010::07:18:31 ===
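
A pause like this can be found without reading the whole log by eye. The sketch below parses report headers in the format shown in this sasl.log and returns the largest gap between consecutive reports; the function names are hypothetical, and it assumes every header looks like the ones pasted here.

```python
import re
from datetime import datetime

MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

# Matches e.g. "=CRASH REPORT==== 20-Apr-2010::07:18:31 ==="
HEADER = re.compile(
    r"^=(?P<kind>[A-Z ]+ REPORT)==== "
    r"(?P<day>\d{1,2})-(?P<mon>[A-Za-z]{3})-(?P<year>\d{4})::"
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}) ===\s*$"
)


def report_times(lines):
    """Return (kind, timestamp) for every report header in lines."""
    found = []
    for line in lines:
        match = HEADER.match(line)
        if match and match["mon"] in MONTHS:
            stamp = datetime(
                int(match["year"]), MONTHS[match["mon"]], int(match["day"]),
                int(match["h"]), int(match["m"]), int(match["s"]),
            )
            found.append((match["kind"], stamp))
    return found


def largest_gap(lines):
    """Return (seconds, kind_before, kind_after) for the longest pause,
    or None if there are fewer than two report headers."""
    times = report_times(lines)
    best = None
    for (kind_a, time_a), (kind_b, time_b) in zip(times, times[1:]):
        gap = (time_b - time_a).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, kind_a, kind_b)
    return best
```

Fed the two headers quoted above, largest_gap reports a 13-second pause between a PROGRESS REPORT and a CRASH REPORT.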

To find out whether the problem is in your current Mnesia database, you can try this:

  1. Find where the Mnesia spool directory is (it contains files like acl.DCD). Maybe it's /var/lib/ejabberd
  2. Back up those files to a safe place (copy, compress...)
  3. Delete those files
  4. Start ejabberd. It will create new files, all empty
  5. Does ejabberd start correctly now? Obviously, it won't have your accounts, rosters...
  6. If it does, the problem is in your Mnesia database.
  7. If ejabberd still doesn't start and shows an error message like the old one, the problem is in your ejabberd installation or the machine, or the Erlang installation (try reinstalling it)
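
Steps 2 and 3 are the risky ones, so here is a minimal sketch that refuses to delete anything until the backup archive has been checked. The function name set_aside_spool and the archive naming are hypothetical, and it assumes ejabberd is stopped and the spool directory contains only plain files at its top level.

```python
import shutil
import tarfile
import time
from pathlib import Path


def set_aside_spool(spool_dir, backup_root):
    """Archive every file in spool_dir, verify the archive, then delete
    the originals. Returns (archive_path, removed_file_names)."""
    spool = Path(spool_dir).resolve()
    backup_root = Path(backup_root).resolve()

    # Keeping the backup inside the spool would archive and then
    # clutter the very directory we want to empty.
    if backup_root == spool or spool in backup_root.parents:
        raise ValueError("backup_root must be outside the spool directory")

    backup_root.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in spool.iterdir() if p.is_file())

    # Step 2: compressed backup copy of the whole directory.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = str(backup_root / f"mnesia-spool-{stamp}")
    archive = Path(shutil.make_archive(base_name, "gztar", root_dir=str(spool)))

    # Make sure every file really is in the archive before deleting it.
    with tarfile.open(archive) as tar:
        archived = {Path(name).name for name in tar.getnames()}
    missing = [p.name for p in files if p.name not in archived]
    if missing:
        raise RuntimeError(f"backup is incomplete, missing: {missing}")

    # Step 3: delete the original files.
    removed = []
    for path in files:
        path.unlink()
        removed.append(path.name)
    return archive, removed
```

After this, start ejabberd as in step 4. If you want the old data back later, stop ejabberd and unpack the archive into the spool directory again.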

The problem is the database

After I installed it again, fresh, I just put the database files back in that directory... Unfortunately, the database was damaged for good. It didn't work.

I don't know what happened, but that's it. Time to start anew and back up once everything is set up.

On the bright side, the contacts are stored in Pidgin, ready to be re-created.

Thanks anyway.
