history_size muc

Hi all ;D

I run an ejabberd 2.1.6 server with 2 nodes and around 300 users.

If I increase history_size in the MUC options to 10000, I get errors and people can't log in anymore.
Unfortunately I don't have an error log right now.
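For reference, the relevant part of my ejabberd.cfg looks roughly like this (the host is a placeholder for my real one):

%% history_size controls how many messages each room keeps
%% and replays to newly joining occupants (the default is 20)
{mod_muc, [{host, "conference.@HOST@"},
           {history_size, 10000}]}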

There are 10 MUC rooms overall...
Plenty of RAM is still free when the error occurs.

Could it be that I have to increase the max FSM queue size, or the max stanza size?
Or is there another important option whose default or too-low setting can't handle a big MUC history?
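In case an example helps, this is roughly where those options would go in ejabberd 2.1's ejabberd.cfg; the numbers are only examples to show the syntax, not recommendations:

%% Global limit on the internal message queue of ejabberd's FSM
%% processes; a process whose queue grows past this is killed.
{max_fsm_queue, 1000}.

%% Per-listener limit on the size (in bytes) of a single stanza.
{listen,
 [
  {5222, ejabberd_c2s, [{max_stanza_size, 65536}]}
 ]}.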

If I set history_size back to 200... everything works fine again...

Would be so nice if you could help me or give me a hint :D

Have a nice weekend!!!

vince123 wrote:

If I increase history_size in the MUC options to 10000, I get errors and people can't log in anymore.
Unfortunately I don't have an error log right now.

I don't get that problem. Show the error when you get it again.

The problem does not appear immediately.
The first login problems began after 1-2 days.

I'll try to get a log of the error within the next few days.

Hi again ;D

I tried to pin down the problem with the big history_size.

Setup:
Debian Squeeze
ejabberd 2.1.6 from source
Erlang R14A (happens with R14B too)
LDAP auth
history_size: 10000

I set up two persistent chatrooms and piped many, many messages into them.
While pushing, everything is fine...
RAM: 200-400/3000 MB
CPU: 4% load average

After about 10 minutes I tried to log in using Pidgin as the XMPP client.
I got some messages out of the history and then got a disconnect in Pidgin.
The Jabber server is now under high load... 80-100% CPU usage,
and it consumes a lot of RAM... while I'm not connected anymore.

It seems the server is still preparing history messages to send to my user.

Here's the log from while my user is no longer connected and the load is very high.
This block repeats for each message:

=INFO REPORT==== 8-Aug-2011::14:37:17 ===
D(<0.319.0>:ejabberd_router:313) : route
        from {jid,"test2","conference.host","test2","test2",
                  "conference.host","test2"}
        to {jid,"my_user","host","33173486811312807007194284",
                "my_user","host","33173486811312807007194284"}
        packet {xmlelement,"message",
                   [{"xml:lang","en"},
                    {"to","test2@conference.host"},
                    {"type","groupchat"}],
                   [{xmlelement,"body",[],
                        [{xmlcdata,<<"--- google.de ping statistics ---">>}]},
                    {xmlelement,"delay",
                        [{"xmlns","urn:xmpp:delay"},
                         {"from","test2@conference.host"},
                         {"stamp","2011-08-08T12:36:09Z"}],
                        [{xmlcdata,[]}]},
                    {xmlelement,"x",
                        [{"xmlns","jabber:x:delay"},
                         {"stamp","20110808T12:36:09"}],
                        []}]}

=INFO REPORT==== 8-Aug-2011::14:37:17 ===
D(<0.319.0>:ejabberd_local:286) : local route
        from {jid,"test2","conference.host","test2","test2",
                  "conference.host","test2"}
        to {jid,"my_user","host","33173486811312807007194284",
                "my_user","host","33173486811312807007194284"}
        packet {xmlelement,"message",
                           [{"xml:lang","en"},{"to",[...]},{[...],...}],
                           [{xmlelement,[...],...},{xmlelement,...},{...}]}

=INFO REPORT==== 8-Aug-2011::14:37:17 ===
D(<0.319.0>:ejabberd_sm:410) : session manager
        from {jid,"test2","conference.host","test2","test2",
                  "conference.host","test2"}
        to {jid,"my_user","host","33173486811312807007194284",
                "my_user","host","33173486811312807007194284"}
        packet {xmlelement,"message",
                           [{"xml:lang","en"},{"to",[...]},{[...],...}],
                           [{xmlelement,[...],...},{xmlelement,...},{...}]}

=INFO REPORT==== 8-Aug-2011::14:37:17 ===
D(<0.319.0>:ejabberd_sm:509) : sending to process <0.481.0>

In the end I got the following error message:

=INFO REPORT==== 8-Aug-2011::14:41:04 ===
D(<0.582.0>:ejabberd_router:313) : route
        from {jid,"my_user","host","35699611311312807051281432",
                  "my_user","host","35699611311312807051281432"}
        to {jid,"test2","conference.host","test2","test2",
                "conference.host","test2"}
        packet {xmlelement,"message",
                   [{"type","error"},
                    {"from","test2@conference.host"},
                    {"xml:lang","en"}],
                   [{xmlelement,"body",[],
                        [{xmlcdata,
                             <<"rtt min/avg/max/mdev = 9.009/9.009/9.009/0.000 ms">>}]},
                    {xmlelement,"delay",
                        [{"xmlns","urn:xmpp:delay"},
                         {"from","test2@conference.host"},
                         {"stamp","2011-08-08T12:28:01Z"}],
                        [{xmlcdata,[]}]},
                    {xmlelement,"x",
                        [{"xmlns","jabber:x:delay"},
                         {"stamp","20110808T12:28:01"}],
                        []},
                    {xmlelement,"error",
                        [{"code","503"},{"type","cancel"}],
                        [{xmlelement,"service-unavailable",
                             [{"xmlns","urn:ietf:params:xml:ns:xmpp-stanzas"}],
                             []}]}]}

Crash dump was written to: //var/log/ejabberd/erl_crash_20110808-142653.dump
eheap_alloc: Cannot allocate 1459620480 bytes of memory (of type "old_heap").
Aborted

While all of this is going on, no other users can log in either.
It seems like ejabberd is totally overwhelmed by the big history size.
Is there any way to solve the problem? Since ejabberd does not consume all the memory at once, I don't think adding more RAM will solve the disconnect problem.

Maybe there are some Erlang tunings or stanza/FSM queue size settings which can help me out.

Would be really nice if you could help me.

Ok, so the problem appears when:
1. the room has many history messages
2. a user joins the room
3. the new occupant accepts receiving all the history
4. the MUC starts sending the messages, but in the meantime the client is disconnected, the CPU usage rises, and auth fails.

Apparently, the current implementation of MUC history retrieval isn't capable of handling a big history.

Some solutions:

A) Don't configure the room to store so many messages :P

B) Change the client so it tells the room not to send it the whole history, only a few messages (see the example presence after this list):
http://xmpp.org/extensions/xep-0045.html#enter-managehistory

C) Investigate exactly which part of mod_muc is problematic, and try to find an alternative implementation.
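As an illustration of B: per XEP-0045, the client can add a <history/> element to the presence it sends when joining the room. Something like this (the nickname and the limit of 20 are just example values):

<presence to='test2@conference.host/mynick'>
  <x xmlns='http://jabber.org/protocol/muc'>
    <history maxstanzas='20'/>
  </x>
</presence>

A client that joins like this receives at most the last 20 history messages, so the room never tries to replay thousands of stanzas at once.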

Thanks for your reply, badlop :D

Think I'll take solution A first :D
If I find time and a way to handle the history better, I'll let you know :D

Have a nice week!
