Our two-instance cluster seems to be getting increasingly brittle (see also: https://www.ejabberd.im/forum/25504/ejabberdctl-commands-go-missing).
The MUC service seems to have stopped working properly: rooms that I create often don't open. This happens more often for room names that have been used before, as if these non-persistent room names were now "burned". The room list feature (in Pidgin) often doesn't return a result (the "Stop" button stays enabled).
At the same time, I noticed that service discovery (also in Pidgin) shows only a random subset of the services we are running (MUC, PubSub, HTTP Upload) and often doesn't finish either.
I updated the instances from 16.04 to 16.06.1, but that made no difference.
Any ideas on how to investigate or clean up this mess?
This is kind of solved: I disassembled the cluster, deleted the Mnesia files, restored the database from a backup, and reassembled the cluster.
Now the service discovery command returns consistently and with correct results, as does the MUC room list command. I can re-use group chat names without problems.
The command list problem (see above) is still occurring, though.
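
For anyone who wants to repeat the rebuild, here is a rough sketch of the order of operations as a small Python helper that only prints the commands (it does not run anything). Treat all of it as assumptions: the node names, the spool directory `/var/lib/ejabberd`, and the backup path are placeholders, and it assumes your ejabberd version provides the `join_cluster` and `list_cluster` commands in `ejabberdctl`. It also moves the Mnesia directory aside rather than deleting it, which is a more cautious variation on what I actually did.

```python
import shlex
from typing import List, Tuple


def rebuild_plan(first_node: str, other_nodes: List[str],
                 spool_dir: str, backup_file: str) -> List[Tuple[str, str]]:
    """Return (node, shell command) pairs in the order they would be run.

    All arguments are placeholders; adjust them to your installation.
    """
    all_nodes = [first_node] + other_nodes
    spool = shlex.quote(spool_dir)
    plan: List[Tuple[str, str]] = []

    # 1. Take the cluster apart by stopping every node.
    for node in all_nodes:
        plan.append((node, "ejabberdctl stop"))

    # 2. Move the Mnesia files aside (instead of deleting them) and
    #    recreate an empty directory. Ownership may need fixing by hand.
    for node in all_nodes:
        plan.append((node, f"mv {spool} {spool}.broken"))
        plan.append((node, f"mkdir -p {spool}"))

    # 3. Start the first node alone and restore the backup there.
    plan.append((first_node, "ejabberdctl start"))
    plan.append((first_node, f"ejabberdctl restore {shlex.quote(backup_file)}"))

    # 4. Start the remaining nodes and join each one to the first node.
    for node in other_nodes:
        plan.append((node, "ejabberdctl start"))
        plan.append((node, f"ejabberdctl join_cluster {shlex.quote(first_node)}"))

    # 5. Check the resulting cluster membership.
    plan.append((first_node, "ejabberdctl list_cluster"))
    return plan


if __name__ == "__main__":
    # Hypothetical node names and paths, for illustration only.
    for node, cmd in rebuild_plan("ejabberd@xmpp1", ["ejabberd@xmpp2"],
                                  "/var/lib/ejabberd", "/root/ejabberd.backup"):
        print(f"[{node}] {cmd}")
```

The helper returns a plain list so the steps can be reviewed (or pasted into a checklist) before anything is touched on the servers.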