Looking for ejabberd docs?

To access the most up-to-date ejabberd documentation, please visit docs.ejabberd.im »

MUC room creation race?

Submitted by sethalves on Tue, 2011-03-15 18:32

ejabberd Administration

We have two clustered Linux machines running ejabberd-2.1.6. When two clients connect to different nodes at the same moment and create the same room, strange things can happen. Sometimes it works as expected. Sometimes both clients receive something like:

 <presence xml:lang="en" to="nagios0.tester@chat.damballah.lindenlab.com/nagios" from="nagios-test-group@conference.chat.damballah.lindenlab.com/nagios0.tester"><x xmlns="http://jabber.org/protocol/muc#user"><item role="moderator" jid="nagios0.tester@chat.damballah.lindenlab.com/nagios" affiliation="owner"/><status code="201"/></x></presence>

After that, one client can send to the MUC room, but the other receives this error when it sends:

<message xml:lang="en" type="error" to="nagios1.tester@chat.damballah.lindenlab.com/nagios" from="nagios-test-group@conference.chat.damballah.lindenlab.com"><body>NwLRbBMqbHcdARZoWkK</body><nick xmlns="http://jabber.org/protocol/nick">nagios1.tester</nick><error code="406" type="modify"><not-acceptable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/><text xmlns="urn:ietf:params:xml:ns:xmpp-stanzas">Only occupants are allowed to send messages to the conference</text></error></message>
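For anyone trying to spot the first symptom automatically: in XEP-0045, status code 201 in a MUC presence means a new room was created, so both clients receiving it suggests the room was created twice. Below is a minimal Python sketch that checks a presence stanza for that code. The function name `room_was_created` and the example.com addresses are made up for illustration; this is not taken from the test client.

```python
import xml.etree.ElementTree as ET

# Namespace of the <x/> element in MUC presence stanzas (XEP-0045).
MUC_USER_NS = "http://jabber.org/protocol/muc#user"


def room_was_created(presence_xml):
    """Return True if the presence stanza carries MUC status code 201."""
    root = ET.fromstring(presence_xml)
    codes = [s.get("code") for s in root.iter("{%s}status" % MUC_USER_NS)]
    return "201" in codes


# Simplified stanza with hypothetical addresses, shaped like the one above.
created = (
    '<presence to="user@example.com/res" from="room@conference.example.com/nick">'
    '<x xmlns="http://jabber.org/protocol/muc#user">'
    '<item role="moderator" affiliation="owner"/>'
    '<status code="201"/></x></presence>'
)
# Same stanza without the status element, as a client joining an existing room would see.
joined = (
    '<presence to="user@example.com/res" from="room@conference.example.com/nick">'
    '<x xmlns="http://jabber.org/protocol/muc#user">'
    '<item role="participant" affiliation="none"/></x></presence>'
)

print(room_was_created(created))
print(room_was_created(joined))
```

If both test clients report a 201 for the same room name in the same run, that run hit the problem.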

Both servers put something like this in ejabberd.log:

=INFO REPORT==== 2011-03-15 15:16:19 ===
I(<0.319.0>:ejabberd_listener:281) : (#Port<0.422>) Accepted connection {{216,82,32,4},58471} -> {{216,82,17,56},5222}

=INFO REPORT==== 2011-03-15 15:16:19 ===
I(<0.384.0>:ejabberd_c2s:767) : ({socket_state,tls,{tlssock,#Port<0.422>,#Port<0.424>},<0.383.0>}) Accepted authentication for nagios1.tester by ejabberd_auth_internal

=INFO REPORT==== 2011-03-15 15:16:20 ===
I(<0.384.0>:ejabberd_c2s:890) : ({socket_state,tls,{tlssock,#Port<0.422>,#Port<0.424>},<0.383.0>}) Opened session for nagios1.tester@chat.damballah.lindenlab.com/nagios

=INFO REPORT==== 2011-03-15 15:16:20 ===
I(<0.386.0>:mod_muc_room:126) : Created MUC room nagios-test-group@conference.chat.damballah.lindenlab.com by nagios1.tester@chat.damballah.lindenlab.com/nagios

The room is a temporary one.

Is this a problem with how we've configured mnesia? Is it a bug? Any advice is welcome. I can provide the configuration and the client code, if it might help.

It may be a race as you

Submitted by mfoss on Sat, 2011-03-19 21:52.

It may be a race, as you thought, in mod_muc.erl line 474. The code works like this:

1. IF the room is not stored in the DB
2... THEN
3....... create the room
4....... and store the room in the DB
5... ELSE
6....... send this stanza to the existing room

In the code, the DB read in step 1 and the DB write in step 4 are not wrapped in a single atomic DB transaction. Because mod_muc on a node runs as a single process, the sequence is still effectively atomic within one node. With several nodes there is probably no such guarantee, since each node has its own mod_muc process.

The mnesia DB table used in that code is 'muc_online_room'. Have you configured that mnesia table to be shared among the nodes?

Maybe the problem happens like this: while the first node is running step 2, the second node is evaluating step 1 and entering step 2 too.
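To make the interleaving concrete, here is a small Python simulation of the check-then-create steps above. It is only a sketch of the general pattern, not ejabberd code: the `OnlineRooms` class, the node names, and `create_if_absent` are hypothetical stand-ins for the shared room table and for some atomic "insert unless present" operation.

```python
import threading


class OnlineRooms:
    """Hypothetical stand-in for a room table shared by two nodes."""

    def __init__(self):
        self._rooms = {}
        self._lock = threading.Lock()

    def read(self, name):
        return self._rooms.get(name)

    def write(self, name, owner):
        self._rooms[name] = owner

    def create_if_absent(self, name, owner):
        # The check and the insert happen under one lock, so they are atomic.
        with self._lock:
            return self._rooms.setdefault(name, owner)


def racy_interleaving(table, room):
    # Step 1 runs on both nodes before step 4 runs on either node.
    seen_by_a = table.read(room)
    seen_by_b = table.read(room)
    creators = []
    if seen_by_a is None:
        table.write(room, "node_a")
        creators.append("node_a")
    if seen_by_b is None:
        table.write(room, "node_b")  # overwrites node_a's entry
        creators.append("node_b")
    return creators


def atomic_creation(table, room):
    # Each node asks the table to create the room; only the first one wins.
    creators = []
    for node in ("node_a", "node_b"):
        if table.create_if_absent(room, node) == node:
            creators.append(node)
    return creators


racy = racy_interleaving(OnlineRooms(), "nagios-test-group")
safe = atomic_creation(OnlineRooms(), "nagios-test-group")
print(racy)  # both nodes believe they created the room
print(safe)  # only one node created it
```

In the racy version both nodes believe they created the room, which would match both clients receiving status code 201. In the atomic version the second node finds the existing entry and would route the stanza to the existing room instead.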

Does what I said make sense to you?

I've bundled up the program

Submitted by sethalves on Tue, 2011-03-22 15:34.

I've bundled up the program that demonstrates this problem. On my Ubuntu box, this will get it built:

sudo apt-get install check libexpat1-dev
mkdir test
cd test
# see http://code.stanziq.com/strophe/
git clone git://code.stanziq.com/libstrophe
wget http://headache.hungry.com/~seth/xmpp-nagios-check.tar.bz2
tar xvf xmpp-nagios-check.tar.bz2
cd libstrophe
patch -p1 < ../xmpp-nagios-check/strophe.diff
./bootstrap.sh
./configure
make
cd ../xmpp-nagios-check
# edit xmpp-nagios-check.c and set USER* and PASS* at the top
make
./xmpp-nagios-check -H host0 -H host1 -j chat.something.com -v -r

It may take several runs to see the problem. If it works, it exits after about 10 seconds. If it fails (and demonstrates the race), it will keep running for a couple of minutes.

Experimental patch

Submitted by mfoss on Mon, 2011-05-02 19:10.

Try this patch for ejabberd 2.1.x:
http://tkabber.jabber.ru/files/badlop/4585-muc-creation-race.diff
Let me know whether it solves the problem.
