Issue in ejabberd cluster when other nodes are unavailable

Hi everyone,

First, I want to apologize for my English...

I am having trouble with ejabberd clustering.
Here is what I need:
- I want an ejabberd cluster spanning two sites (let's call them site1 and site2).
- On each site, all users need to be able to talk to everyone (including people on the other site).
- Ejabberd will be used with LDAP for authentication and shared roster.

So my choice was to build one cluster with three nodes:
- two nodes on site1 (ejabberd@jabber1.site1 and ejabberd@jabber2.site1)
- one node on site2 (ejabberd@jabber1.site2)

My ejabberd domain is myjabber.site.
My users on site1 connect to myjabber.site, which points to ejabberd@jabber1.site1 or ejabberd@jabber2.site1.
My users on site2 also connect to myjabber.site, but on their side it points to ejabberd@jabber1.site2.

I configured my firewall to allow communication between the nodes and this works great (all users can talk to each other, offline messages work too, etc.).

But when I tried to simulate a leased line failure between the two sites, I ran into big trouble...
My test was to block communication between the ejabberd nodes on site1 and the ejabberd node on site2 (and to restart all ejabberd servers).
In this situation ejabberd does not work at all...

ejabberd status tells me "ejabberd is not running in that node" on every node, and there are no logs at all (even in debug mode)...

I searched a lot on the web but didn't find any solution. Any help would be really appreciated...
I am using ejabberd 2.1.5 on every node (packaged by Debian).

My final goal is to let my users keep communicating with each other within their local site when the leased line is down.

OK, I think I found the real issue.

During my tests I worked with only two nodes: ejabberd@jabber1.site1 and ejabberd@jabber2.site1.

Here are my tests:
- ejabberd is running on both nodes, everything works well
- I stopped ejabberd on ejabberd@jabber1.site1, ejabberd@jabber2.site1 saw it go down as expected
- I restarted ejabberd@jabber2.site1, it saw ejabberd@jabber1.site1 down and it worked fine
- I stopped ejabberd@jabber2.site1, at this point there is no ejabberd running anymore
- I started ejabberd@jabber1.site1 and here is the problem: it is not responding at all, it seems to be waiting for the second node...

So I guess that when a node goes down, and it knows another node was up in the past, it tries to communicate with that node (and waits for it) when it restarts.
Is there any way to bypass this behaviour?
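
To make the guess above concrete, here is a small toy model in Python of the test sequence. It is only a sketch of what I think is happening, not ejabberd or Mnesia code: the `Node` and `Cluster` classes and the `seen_down` bookkeeping are made up for illustration. The assumption is that a node only starts on its own if it watched every missing peer go down, which seems to match how the Mnesia database used by ejabberd waits for nodes that may hold newer data.

```python
class Node:
    """Hypothetical model of one ejabberd node (not a real ejabberd API)."""

    def __init__(self, name):
        self.name = name
        self.running = True      # the model starts from a healthy cluster
        self.seen_down = set()   # peers this node watched go down while it was up


class Cluster:
    """Toy cluster: assumes a node waits for any peer it never saw go down."""

    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}

    def stop(self, name):
        self.nodes[name].running = False
        # Every node still running records that this peer went down.
        for peer in self.nodes.values():
            if peer.running:
                peer.seen_down.add(name)

    def start(self, name):
        node = self.nodes[name]
        peers = [p for p in self.nodes.values() if p is not node]
        # A peer blocks startup if it is not running now and this node
        # never saw it go down (the peer may hold newer data).
        blocking = sorted(
            p.name for p in peers
            if not p.running and p.name not in node.seen_down
        )
        if blocking:
            return "waiting for " + ", ".join(blocking)
        node.running = True
        # Reconnected peers are in sync again, so old "down" notes are dropped.
        for p in peers:
            if p.running:
                node.seen_down.discard(p.name)
                p.seen_down.discard(name)
        return "started"


n1, n2 = "ejabberd@jabber1.site1", "ejabberd@jabber2.site1"
cluster = Cluster([n1, n2])

cluster.stop(n1)                  # jabber2 sees jabber1 go down
cluster.stop(n2)                  # restart of jabber2: stop...
restart_n2 = cluster.start(n2)    # ...and start again, alone
cluster.stop(n2)                  # now nothing is running
start_n1 = cluster.start(n1)      # jabber1 never saw jabber2 go down

print(restart_n2)
print(start_n1)
```

In this model the restart of ejabberd@jabber2.site1 succeeds because it saw the other node stop, while ejabberd@jabber1.site1 stays blocked because it was already down when ejabberd@jabber2.site1 stopped. That is the same pattern I observed in my tests.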

I have exactly the same problem. Did you find any solution for this? Thanks.

http://www.ejabberd.im/node/5065