Hi,
I have a cluster of ejabberd nodes that discover each other dynamically, but sometimes the join_cluster command fails and I end up with a broken ejabberd node. The only way I know to recover is to clean the spool directory and start a new ejabberd instance.
I am working on avoiding this issue, but I have several questions regarding the join_cluster command.
Imagine there are 3 nodes (A, B, and C). What happens if:
- B and C both run join_cluster A simultaneously?
- C starts join_cluster B, but before C finishes, B starts join_cluster A?
- C runs join_cluster B, and later, once C and B are in the same cluster, B wants to join A's cluster? Should B leave the {B,C} cluster before running join_cluster A?
Basically, I'm interested in which concurrency guarantees the join_cluster command offers. In the worst case, I would have to make sure that only one instance of join_cluster is running at any time across my whole network :-(
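One way to sidestep the second and third scenarios (chained joins) without a network-wide lock might be to have every node pick the same join target deterministically. The sketch below is only an illustration: the function name choose_join_target and the "smallest node name is the seed" rule are my own assumptions, not something ejabberd provides. It also assumes every node sees the same list of peers from discovery, and it does not address the first scenario (two nodes joining the same target at once).

```python
def choose_join_target(self_node, known_nodes):
    """Return the node this node should join, or None if it is the seed.

    Hypothetical helper: the seed is simply the smallest node name
    among this node and all discovered peers, so every node that sees
    the same peer list picks the same seed and the seed never joins
    anyone. That rules out chains such as "C joins B while B joins A".
    """
    seed = min(set(known_nodes) | {self_node})
    return None if seed == self_node else seed


if __name__ == "__main__":
    peers = ["ejabberd@a", "ejabberd@b", "ejabberd@c"]
    for node in peers:
        others = [p for p in peers if p != node]
        print(node, "->", choose_join_target(node, others))
```

With this rule, B and C would both run join_cluster against A, and A would run nothing.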
Best regards,
Looking at the source code, in the file ejabberd_cluster.erl, I don't see any code that prevents the join function from running concurrently.
Ok, thanks for the answer. I had a cluster of 4 nodes a few weeks ago, and whenever I booted all of them simultaneously, one of them failed the join_cluster. However, the problem might have been due to a network convergence issue: if join_cluster was run too early, the other node might not have been reachable yet. I have fixed that problem now, so I will let you know if join_cluster fails again.
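For anyone hitting the same "join_cluster ran too early" problem, here is a rough sketch of the kind of guard I mean: wait until the peer answers before running join_cluster once. The helper names (epmd_reachable, wait_until, join_when_reachable) are hypothetical. The reachability test is an assumption: it only checks that the peer's epmd port (4369 by default) accepts TCP connections, which does not prove that ejabberd itself is ready. I also assume that the exit status of ejabberdctl reflects whether the join succeeded.

```python
import socket
import subprocess
import time


def epmd_reachable(host, port=4369, timeout=2.0):
    """Return True if a TCP connection to the host's epmd port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(check, attempts=10, delay=3.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping `delay` seconds
    between tries. Return True as soon as check() returns True."""
    for i in range(attempts):
        if check():
            return True
        if i < attempts - 1:
            sleep(delay)
    return False


def join_when_reachable(node, attempts=10, delay=3.0):
    """Run `ejabberdctl join_cluster <node>` once, and only after the
    peer's host answers on the epmd port. `node` looks like
    'ejabberd@hostname'."""
    host = node.split("@", 1)[1]
    if not wait_until(lambda: epmd_reachable(host), attempts, delay):
        return False  # peer never became reachable; do not attempt the join
    result = subprocess.run(["ejabberdctl", "join_cluster", node])
    return result.returncode == 0
```

The join itself is deliberately not retried here, since a failed join is exactly what left my node in a broken state.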
Regards,