I have two machines that I would like to cluster together: machine1 and machine2 at my.domain.net. Machine1 is running with a Jabber domain of im.domain.net and was started with -sname ejabberd. If I attempt to start it with any -sname other than ejabberd, it fails. If I start it with -name ejabberd@machine1 it starts, but with any node name other than ejabberd it fails. The failure message is the same each time: the machine is configured to use fully qualified domain names. I mention this because it is causing me problems when I try to cluster the servers. What exactly do I have to use so that both start up correctly and form a cluster?
If I do cluster these servers, what advantages does it give me? Can the users log in to either server? If machine1 is down, will the users fail over to machine2?
Clustering two servers
After doing additional troubleshooting, I have discovered in the ejabberd.log of the first machine the following line:
** Connection attempt from disallowed node ejabberd@node2 **
So, my nodes can in fact see each other. I have read and followed the instructions in http://www.ejabberd.im/interconnect-erl-nodes, so what am I doing wrong that the nodes don't trust each other?
Re: Clustering two servers
I have no experience with clustering, but I guess it might help to verify that the Erlang cookies on the nodes match each other.
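One way to check this from the command line is to copy the other machine's cookie file over and compare the two byte for byte. This is a sketch under assumptions: the cookie is taken to live in the default ~/.erlang.cookie of the account that runs ejabberd (packaged installs may keep it elsewhere), and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if two Erlang cookie files are identical.
#   cookies_match LOCAL_COOKIE_FILE COPY_OF_REMOTE_COOKIE_FILE
cookies_match() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "cannot read $1 or $2" >&2
        return 2
    fi
    # cmp -s prints nothing and exits 0 only when the files are byte-for-byte equal.
    cmp -s "$1" "$2"
}

# Example (host name and paths are assumptions):
#   scp machine2:.erlang.cookie /tmp/machine2.cookie
#   cookies_match ~/.erlang.cookie /tmp/machine2.cookie && echo "cookies match"
```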
Re: Clustering two servers
If I run erlang:get_cookie(). on both nodes, the results are identical. If I attempt to open a remote shell from node1 to my -detached ejabberd node on node1, I get the same error.
Clustering two servers
It does occur to me that node1 is running as -name ejabberd@node1, while my test node and node2 are running with -sname. Would that cause this problem?
Re: Clustering two servers
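I guess so. A node started with -name (long names) cannot communicate with a node started with -sname (short names), so all nodes that need to talk to each other, including a remote shell, should use the same kind of name.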
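Because Erlang requires the host part of a long name to contain a dot and rejects a dot in the host part of a short name, the naming mode can be read off a node's name as reported by node() on the running node. A small hypothetical check along those lines, using node names from this thread as examples:

```shell
#!/bin/sh
# Hypothetical helper: classify an Erlang node name as "long" or "short".
# Long names (-name) have a dotted host part; short names (-sname) do not.
name_mode() {
    case "${1#*@}" in
        *.*) echo long ;;
        *) echo short ;;
    esac
}

# Succeed only if both node names use the same naming mode.
same_name_mode() {
    [ "$(name_mode "$1")" = "$(name_mode "$2")" ]
}

# Example:
#   same_name_mode ejabberd@node1.example.net ejabberd@node2 || echo "naming modes differ"
```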
Re: Clustering two servers
If I attempt to run the following command:
erl -name ejabberd@node2 -mnesia extra_db_nodes "['ejabberd@node1']" -s mnesia
I get the following error:
** System running to use fully qualified hostnames **
** Hostname node1 is illegal **
And there is no connection attempt shown in the log of node1.
If I run the following command from node1:
erl -name test@node1 -remsh ejabberd@node1
I get the same error. What would cause that error?
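The "Hostname node1 is illegal" message is what Erlang prints when a node started with -name is given a host part that is not fully qualified; the mirror-image rule is that -sname refuses a host part containing dots. The hypothetical helper below (not part of ejabberd or Erlang) applies that basic dot rule before building the naming flag for erl. node1.example.net stands in for whatever hostname -f reports on the real machine.

```shell
#!/bin/sh
# Hypothetical helper: print the erl naming flag for a node, approximating the
# host-part check Erlang applies at startup (a dot is required for -name and
# not allowed for -sname).
#   erl_name_flag short NODE HOST  ->  -sname NODE@HOST
#   erl_name_flag long  NODE HOST  ->  -name NODE@HOST
erl_name_flag() {
    _mode=$1 _node=$2 _host=$3
    case "$_mode" in
        short)
            case "$_host" in
                *.*)
                    echo "error: -sname needs a host without dots, got $_host" >&2
                    return 1
                    ;;
            esac
            printf '%s\n' "-sname $_node@$_host"
            ;;
        long)
            case "$_host" in
                *.*) printf '%s\n' "-name $_node@$_host" ;;
                *)
                    echo "error: -name needs a fully qualified host, got $_host" >&2
                    return 1
                    ;;
            esac
            ;;
        *)
            echo "usage: erl_name_flag short|long NODE HOST" >&2
            return 2
            ;;
    esac
}

# Example (short names on both machines, run on node2):
#   erl $(erl_name_flag short ejabberd node2) -mnesia extra_db_nodes "['ejabberd@node1']" -s mnesia
```

With long names, every node name in the cluster, including the one given in extra_db_nodes, would need the fully qualified form.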
erlang node name with fully qualified hostname
This is how it works for me, I don't know if it's the same for everybody:
Short hostname
Fully Qualified hostname
Start Erlang with a short hostname
Start Erlang with a short hostname (alternate)
Start Erlang with a Fully Qualified hostname
Start Erlang with a Fully Qualified hostname (alternate)
Hostnames on my server
Ok, I get the following results:
hostname -s
node1
hostname
node1@example.net
ejabberd will only start and accept connections if I start it with -sname ejabberd. All other names fail, saying the system is configured to use fully qualified hostnames.
The good news is that I am getting closer to a working cluster. If I start node2 with erl -sname ejabberd -mnesia extra_db_nodes "['ejabberd@node1']" -s mnesia, I get the following error:
** ERROR ** (ignoring core) ** FATAL ** Failed to merge schema: Incompatible schema storage types (local). on ejabberd@node2 storage ram_copies, on ejabberd@node1 storage disc_copies
But at least they are talking to each other now.
Re: Hostnames on my server
Maybe that is because ejabberd was first started on that node with the wrong name, so the database contains the wrong domain names. You could try removing the database on the server that shows the error (make a backup first, of course).
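A cautious way to do that, sketched under assumptions: ejabberd is stopped on that node first, and its database lives in a single Mnesia directory whose path you pass in (the location differs between installations, so the paths in the example are placeholders). The helper name is hypothetical. It copies the directory aside before emptying it, so the node can be restored if rejoining does not work.

```shell
#!/bin/sh
# Hypothetical helper: back up a Mnesia directory, then empty it.
# Stop ejabberd on this node before running it.
#   reset_mnesia_dir MNESIA_DIR BACKUP_PARENT_DIR
# Prints the path of the backup on success.
reset_mnesia_dir() {
    _dir=$1
    _backup="$2/mnesia-backup-$(date +%Y%m%d%H%M%S)"
    if [ ! -d "$_dir" ]; then
        echo "no such directory: $_dir" >&2
        return 1
    fi
    mkdir -p "$_backup" || return 1
    # Copy the directory contents, including dotfiles, preserving attributes.
    cp -Rp "$_dir/." "$_backup/" || return 1
    # Delete the contents but keep the directory itself (and its ownership).
    find "$_dir" -mindepth 1 -delete || return 1
    printf '%s\n' "$_backup"
}

# Example (placeholder paths):
#   reset_mnesia_dir /var/lib/ejabberd /root/ejabberd-backups
```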
Database removal
Is there more to it than deleting the contents of the spool directory where the database files are? If not, I have already done that and still get the same error.