I'm setting up a rather large-scale XMPP installation at my office... we expect to handle a few hundred thousand connections at any given time. We do not expect very much traffic, but a very large number of simultaneous (but 99% idle) connections.
Our current plan is to deploy "XMPP Racks" in multiple datacenters world-wide... each rack would have its own domain name (i.e. us.xmpp.domain.com, eu.xmpp.domain.com, etc). Each rack would then talk to the others world-wide using S2S. At each rack we're planning a mirrored pair of Microsoft AD LDS nodes to handle user accounts (which all sync back to a master somewhere), and a clustered pair of ejabberd nodes. I'd expect each ejabberd node to handle up to 100k connections or so.
We've done our preliminary testing with ejabberd + Mnesia + internal authentication, but ran into some scaling issues once we had 400k registered XMPP accounts in the local database. I've got a few questions I could really use some help with...
1) In an ideal world, each "xmpp node" would be 'dumb'. At any point I'd like to be able to completely wipe the node and rebuild it from scratch. I'm having a hard time seeing how to do this when Mnesia is used in the back-end for clustering, since it seems to require some by-hand steps for the cluster setup.
2) When you use a clustered setup of XMPP nodes, how do you balance the connections? Do you use simple DNS round robin or do you need a real load balancer in front?
3) We ran into some Mnesia resource limits when we hit about 75k connections split across two servers... when we pull the authentication out of the database and put it into an LDAP cluster, should that improve?
1) In an ideal world, each "xmpp node" would be 'dumb'. At any point I'd like to be able to completely wipe the node and rebuild it from scratch. I'm having a hard time seeing how to do this when Mnesia is used in the back-end for clustering, since it seems to require some by-hand steps for the cluster setup.
One simple solution, though I am not sure it will work (so you had better try it first), is to set up the nodes of the cluster once. Then stop all ejabberd nodes and back up the Mnesia spool dirs while they are still in that clean, freshly clustered state.
Each time you want to wipe the databases, simply stop all ejabberd nodes, restore the initial spool dirs from that backup, and start the nodes again.
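To make that a bit more concrete, here is a minimal Python sketch of the snapshot-and-restore step. It is untested against a real ejabberd cluster, and everything in it is an assumption for illustration: the function names are made up, the demo uses throwaway temp directories rather than your real spool path, and it expects you to have stopped ejabberd yourself (for example with `ejabberdctl stop`) before calling either function.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_spool(spool_dir: Path, snapshot_dir: Path) -> None:
    """Copy the clean spool dir aside. Assumes ejabberd is already stopped."""
    if snapshot_dir.exists():
        shutil.rmtree(snapshot_dir)  # replace any older snapshot
    shutil.copytree(spool_dir, snapshot_dir)


def restore_spool(snapshot_dir: Path, spool_dir: Path) -> None:
    """Discard the current spool dir and put the clean snapshot back."""
    if spool_dir.exists():
        shutil.rmtree(spool_dir)  # wipe whatever the node accumulated
    shutil.copytree(snapshot_dir, spool_dir)


if __name__ == "__main__":
    # Demo on temp dirs only; the file name is a stand-in for real Mnesia files.
    root = Path(tempfile.mkdtemp())
    spool = root / "spool"
    spool.mkdir()
    (spool / "schema.DAT").write_text("clean cluster state")

    snapshot_spool(spool, root / "spool.clean")
    (spool / "schema.DAT").write_text("state after weeks of use")

    restore_spool(root / "spool.clean", spool)
    print((spool / "schema.DAT").read_text())
```

You would run the snapshot once per node right after the initial cluster setup, and the restore on every node before starting them again. Whether Mnesia is happy to come back up from a copied spool dir is exactly the part you need to try first.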
2) When you use a clustered setup of XMPP nodes, how do you balance the connections? Do you use simple DNS round robin or do you need a real load balancer in front?
You have to tell the clients which node to connect to. DNS round robin should be enough for that task, right?
Related tutorial: http://www.ejabberd.im/cisco-slb
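If you want to see what round robin looks like from the client side, here is a small Python sketch. It is only an illustration under my own assumptions: the function names are invented, the addresses in the demo are placeholders from the documentation range, and picking a random address is my choice here (many clients simply take the first address returned and rely on the DNS server rotating the order). Port 5222 is the usual XMPP client port.

```python
import random
import socket


def resolve_all(host: str, port: int = 5222) -> list[str]:
    """Return every distinct address the name resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def pick_node(addresses: list[str], rng=random) -> str:
    """Choose one cluster node at random from the resolved addresses."""
    if not addresses:
        raise ValueError("name did not resolve to any address")
    return rng.choice(addresses)


if __name__ == "__main__":
    # Placeholder addresses standing in for the two ejabberd nodes of one rack.
    # With real DNS you would call resolve_all("us.xmpp.domain.com") instead.
    rack = ["192.0.2.10", "192.0.2.11"]
    counts = {addr: 0 for addr in rack}
    for _ in range(10000):
        counts[pick_node(rack)] += 1
    print(counts)
```

With many clients each picking independently, the connections spread roughly evenly over the nodes. What this does not give you is health checking: a dead node keeps receiving its share until its record is removed, which is where a real load balancer earns its keep.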
3) We ran into some Mnesia resource limits when we hit about 75k connections split across two servers... when we pull the authentication out of the database and put it into an LDAP cluster, should that improve?
Sorry, I don't know.
If you don't get more detailed answers in this forum, you can also ask on the ejabberd mailing list, or contact ProcessOne for support (for example, through their online contact form).