Hello all,
does anybody run Munin and the ejabberd plugins for it?
I set this up some time ago and it works. Well, at least some of the features of the plugin from the Debian packages work.
Munin produces nice graphs, see here:
After migrating our Jabber server to a new machine last night, I stumbled upon Munin's log files. This is what I see in munin-node.log every five minutes:
2010/05/05-19:00:11 [20807] Error output from ejabberd_registered:
2010/05/05-19:00:11 [20807]
2010/05/05-19:00:11 [20807] Crash dump was written to: /var/log/ejabberd/erl_crash.dump
2010/05/05-19:00:11 [20807] Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
2010/05/05-19:00:12 [21024] Error output from ejabberd_uptime:
2010/05/05-19:00:12 [21024]
2010/05/05-19:00:12 [21024] Crash dump was written to: /var/log/ejabberd/erl_crash.dump
2010/05/05-19:00:12 [21024] Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: illegal character: ^M
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: illegal character: '
2010/05/05-19:00:12 [21024] (standard_in) 1: illegal character: '
2010/05/05-19:05:12 [24886] Error output from ejabberd:
2010/05/05-19:05:12 [24886]
2010/05/05-19:05:12 [24886] Crash dump was written to: /var/log/ejabberd/erl_crash.dump
2010/05/05-19:05:12 [24886] Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
2010/05/05-19:05:13 [24910] Error output from ejabberd_registered:
2010/05/05-19:05:13 [24910]
2010/05/05-19:05:13 [24910] Crash dump was written to: /var/log/ejabberd/erl_crash.dump
2010/05/05-19:05:13 [24910] Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
2010/05/05-19:05:13 [24910] Crash dump was written to: /var/log/ejabberd/erl_crash.dump
2010/05/05-19:05:13 [24910] Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
2010/05/05-19:05:13 [24910] Crash dump was written to: /var/log/ejabberd/erl_crash.dump
2010/05/05-19:05:13 [24910] Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
The interesting thing is that I see different things every five minutes. It seems that something is going wrong, but it does not produce the same outcome every time it happens.
Plus: The plugins create graphs which show accurate data.
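To see which plugin produces which messages across runs, the log lines above can be grouped by the PID in square brackets. This is only a minimal Python sketch, assuming the line layout shown above (timestamp, PID in brackets, message); the function name `summarize` is made up for illustration:

```python
import re
from collections import defaultdict

# Matches munin-node.log lines such as:
#   2010/05/05-19:00:11 [20807] Error output from ejabberd_registered:
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}) \[(\d+)\] ?(.*)$")
HEADER_RE = re.compile(r"^Error output from (\S+):$")


def summarize(lines):
    """Group error messages by plugin and count identical messages."""
    pid_to_plugin = {}  # PID -> plugin name taken from the "Error output from" line
    summary = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # not a line in the expected layout
        _timestamp, pid, message = match.groups()
        header = HEADER_RE.match(message)
        if header:
            pid_to_plugin[pid] = header.group(1)
            continue
        if not message.strip():
            continue  # empty separator line
        plugin = pid_to_plugin.get(pid, "unknown")
        summary[plugin][message] += 1
    return {plugin: dict(messages) for plugin, messages in summary.items()}
```

Calling `summarize(open(path))` on munin-node.log (on Debian it usually lives under /var/log/munin/, but that is an assumption about your setup) gives one dictionary per plugin with a count for each distinct message.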
I cannot read the crash (which btw is ~250kb big), can somebody help?
And does this affect the ejabberd thread in any way?
I think that 700-800 megs of RAM for 100 online users is way too much, isn't it?
Kindest regards,
Martin
Roi wrote: I cannot read the
I cannot read the crash (which btw is ~250kb big), can somebody help?
The first 4 or 5 lines are plain text, and may be useful to determine what the problem is. Post them here.
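If you only want those first lines without opening the whole file, something like the following Python sketch would do. The path is the one from the log output above; the helper names `head` and `find_slogan` are made up here, and the `Slogan:` line is what Erlang crash dumps normally carry near the top as the short crash reason:

```python
from itertools import islice


def head(path, n=5):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


def find_slogan(lines):
    """Return the text after 'Slogan:' if such a line is present, else None."""
    for line in lines:
        if line.startswith("Slogan:"):
            return line[len("Slogan:"):].strip()
    return None


# Example use (path taken from the munin-node.log messages):
#   first = head("/var/log/ejabberd/erl_crash.dump")
#   print("\n".join(first))
#   print(find_slogan(first))
```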
I think that 700-800 megs of RAM for 100 online users is way too much, isn't it?
In a server that runs ejabberd 2.1.3 for typical Jabber chats, it takes 800 MB for 700 online users, and 200 s2s connections.
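As a rough cross-check of those two figures, here is the per-user arithmetic as a small Python sketch. It ignores the fixed base footprint of the Erlang VM, the s2s connections and any transports, so treat it only as an order-of-magnitude comparison; `mb_per_user` is a made-up helper name:

```python
def mb_per_user(total_mb, online_users):
    """Naive average: total memory divided by the number of online users."""
    return total_mb / online_users


# Figures quoted in this thread
print(round(mb_per_user(800, 700), 2))  # server with 700 online users
print(round(mb_per_user(700, 100), 2))  # 100 online users, lower bound
print(round(mb_per_user(800, 100), 2))  # 100 online users, upper bound
```

That works out to roughly 1.1 MB per user on one side and 7 to 8 MB per user on the other.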
still crashes after upgrade to 2.1.3
Sorry it took me some time to reply; I just did not have any time over the last few days.
I cannot read the crash (which btw is ~250kb big), can somebody help?
The first 4 or 5 lines are plain text, and may be useful to determine what the problem is. Post them here.
I just upgraded to ejabberd 2.1.3 (from 2.1.2; the Debian package is finally available), but the same crashes still happen.
Here you are, the first few lines of the erl_crash.dump file:
I think that 700-800 megs of RAM for 100 online users is way too much, isn't it?
In a server that runs ejabberd 2.1.3 for typical Jabber chats, it takes 800 MB for 700 online users, and 200 s2s connections.
Hm well, then there has to be some problem or strange thing going on, see here:
http://www3.hot-chilli.net/munin/hot-chilli.net/hyperion.hot-chilli.net/...
http://www3.hot-chilli.net/munin/hot-chilli.net/hyperion.hot-chilli.net/...
I restarted ejabberd at 2pm when upgrading it to 2.1.3, and it immediately grabbed 700 megs. Now it is climbing up to 1 gig again and staying there, or at least it looks that way. We have between 50 and 120 online users and between 30 and 170 s2s connections at the moment. The machine is a freshly installed Debian amd64 SMP server, running just official Debian (squeeze and unstable) plus some backport packages.
Regards,
Martin
How to get error messages about erlang starting problem
Oh, those first lines aren't indicative enough.
If you still have this problem, there's a method to get more information.
Try this:
{error_logger,{{2010,5,10},{23,8,50}},"Protocol: ~p: register error: ~p~n",["inet_tcp",{{badmatch,{error,duplicate_name}
So, does that show anything new?
Regarding the RAM usage: I forgot to mention that my server is 32-bit, not 64-bit, and that it doesn't run any ICQ/MSN/... transport, so there are fewer roster items on average. Also, I configured some tables (using ejabberd WebAdmin, for example) to be stored on disk only, not in RAM.
Yes, the problem is still
Yes, the problem is still there. I will do what you suggested, but as this is a live server, I have to do it sometime during the night. So it could take some days until I return with the results.
What you say about RAM allocation makes sense to me. We have a lot of transports running, and as many or even more users on the transports than logged into the Jabber server itself.
Does it make sense to store some of the tables on disk? It's not that the machine cannot handle more (it has 8 gigs of RAM), but I'm afraid the process takes more and more the longer it runs, although it looks like it is satisfied with about 1.1 gigs of RAM. To be sure, I really have to run it for some days and not restart it all the time because of configuration changes.