Unable to Cluster Nodes Across WAN - ERL syntax error

I have fully installed ejabberd 2.0.0 on two Windows 2003 32-bit servers within the WAN. I have copied the cookie from the first node to the second node. Both nodes run independently, and the chat client is able to communicate with each server without an issue.

During the clustering setup I run the erl.exe program in the 'ejabberd-2.0.0\bin\' folder and enter the following command line:

erl -sname ejabberd -mnesia extra_db_nodes "['ejabberd@first']" -s mnesia.

I immediately receive the message '** 1: syntax error before: ejabberd **'. This command line is the same as described in many installation references and in the ejabberd setup guide itself. I have run it both as one continuous line and with the '\' line continuation after each individual statement, as the guide illustrates:

erl -sname ejabberd \
-mnesia extra_db_nodes "['ejabberd@first']" \
-s mnesia

I still receive the same error message, so neither method is successful. I have not seen any reference to this error within this forum.

What would be the reason for this error if I copied the command verbatim from this installation guide? Could I have missed something during the install?

I am at a loss and the log files do not list any errors on either server. Any guidance to the root cause would be greatly appreciated.

Thank you,
Shane

Further Cluster Debugging

I am still trying to track down the error. In a troubleshooting test, I wanted to check connectivity between servers. I logged into both servers and ran the erl ping command on both, which in turn responded with 'pong'. This tells me that both ejabberd nodes can see each other without issue. I still receive the "** 1: syntax error before: ejabberd **" message every time I attempt to cluster. It almost acts as if it does not know its own commands or the published syntax is for an older revision.

Another test I ran was to enter "-sname" by itself to see what happened. Here is the result:

=ERROR REPORT==== 12-May-2008::09:37:34 ===
Error in process <0.353.0> on node 'ejabberd@second' with exit value: {badarith
,[{erlang,'-',[sname]},{erl_eval,do_apply,5},{shell,exprs,6},{shell,eval_loop,3}
]}

** exited: {badarith,[{erlang,'-',[sname]},
                      {erl_eval,do_apply,5},
                      {shell,exprs,6},
                      {shell,eval_loop,3}]} **

I also get a similar message with "-mnesia" and "-s". Not sure if this helps track down a root cause, but thought it might help.

Not sure where to go from here.

Try checking database names

I'm still very new to ejabberd (running 2.0.0 on OpenSuSE), and I still have a few questions of my own, but this sounds similar to an issue I had when first experimenting with clusters.

I discovered that one of my servers had named the ejabberd (mnesia) database "ejabberd@localhost", while the one on the other server was named "ejabberd@hostname.com". The server with the "@localhost" database would give me errors similar to what you're getting. I was also unable to replicate any tables, although I got a successful "ping" and "pong" from either server.

I solved the problem by simply deleting the database ending in "@localhost" and then creating another one. (The server was not in production, so I was able to blow away the database without worries; you may want to keep a copy just in case.) I created the new database by running:

~/erl -name ejabberd -mnesia dir '"/opt/ejabberd-2.0.0/database/ejabberd\@yourhostname.com"'

(Note "-name" instead of "-sname" and the "\" escape character before "@".)

Once I created the new database, I followed the instructions in the setup doc to replicate the tables over from the other server.

My theory is that the database name discrepancy is caused by your choice during the ejabberd install. If you select "No" for "Will this node be part of a cluster?", the database is named "ejabberd@localhost". If you select "Yes", it is named the other. In my case, I had an existing, non-clustered server, and then attempted to add a server and create a cluster, so the existing server had the "@localhost" database.

Hope this helps.

TF

Partial Success Clustering Server Nodes

Thanks for the response, TF!

I did the steps that you suggested, but still no success. After many tests and retests I started to see some light at the end of the tunnel.

I realized that all this time I had been opening erl.exe directly and typing these command lines inside the Erlang shell. This time I opened a command window, changed to the /bin directory, and ran the command from the directory prompt, which in turn opened the shell. Now I got a totally different response and was able to copy the database over. I then ran mnesia:info() and voilà! I saw both nodes listed as running. I went back to the first server node, and within the admin web interface I clicked on nodes; both server nodes were listed as running.

Now here is where I start to lose momentum. I followed the remainder of the instructions, typing q(). into the Erlang shell window on the second node, and the shell exited. I then clicked the desktop start icon, as I did on the first server, to start the second ejabberd node, and the admin web page opened. I went back to the first server to see how things were running, and the second node was now listed as being stopped. What did I miss? I went back to the second node and checked the registry to make sure that the ARGS does not have a mnesia dir value, and it does not. I even tried entering an ARG value pointing to the newly created database folder.
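For anyone who hits the same "syntax error before: ejabberd" message, here is a rough sketch of the distinction described above. The -sname, -mnesia, and -s flags are arguments to the erl program, so they have to be given at the operating-system prompt. Typed inside an already-running Erlang shell, they are parsed as Erlang expressions, which is where the syntax error and the badarith on '-' applied to sname come from. The sketch uses POSIX shell syntax; the cluster_cmd helper and the two variables are hypothetical, written only to show the full command on one line, and the node names are the ones used earlier in this thread. On Windows the same arguments go after erl at the cmd.exe prompt in the bin folder. The trailing '\' in the guide is a Unix shell line continuation, so the one-line form is probably the safer one to use there.

```shell
#!/bin/sh
# Node names taken from this thread; adjust them to your own setup.
NODE_NAME="ejabberd"
FIRST_NODE="ejabberd@first"

# Hypothetical helper: prints the clustering command on a single line,
# as it would be typed at the OS prompt (not inside the Erlang shell).
cluster_cmd() {
    printf '%s\n' "erl -sname ${NODE_NAME} -mnesia extra_db_nodes \"['${FIRST_NODE}']\" -s mnesia"
}

# Show the command; copy it to the prompt in the ejabberd bin directory.
cluster_cmd
```

The helper only prints the command rather than launching erl, so it can be checked safely before anything touches the database.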

As I read the instructions, item number 6 says: "Now run the ejabberd on second with almost the same config as on the first". What does this mean? I assumed that the earlier steps would already have set the second node up to do this. Steps 1 through 5 went fine, but I am stuck at 6.

Can anyone guide me towards what I missed?

Thank you,
SV

Clustering Problem Just Resolved

As I was performing more tests and referring to older posts in the forum, I came across the following posts, which exhibited my second issue. Here is the first post: http://www.ejabberd.im/node/2915. This led me to the second post: http://www.ejabberd.im/node/2901. Both described the same issue I was having, which was getting the second ejabberd node to point to the newly created cluster database.

I simply removed the old database contents and copied the new database files into the old database folder. I then started the second node up, and the second server node showed as being active from within the first node. I could not find a variable to set to the new database. I tried editing the registry ARG value, but this seemed to have no effect.
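The file shuffle described above can be sketched roughly as follows. This is POSIX shell rather than Windows commands, and the swap_database function and both directory arguments are hypothetical placeholders, not ejabberd settings. Unlike a plain delete, it keeps a backup of the old folder. The node should be stopped before trying anything like this.

```shell
#!/bin/sh
# Hypothetical helper: replace the contents of the old Mnesia database folder
# with the files from the newly created cluster database.
#   $1 = old database folder (the one the node reads at startup)
#   $2 = new database folder (created during the clustering steps)
swap_database() {
    old_dir="$1"
    new_dir="$2"

    # Both folders must exist before anything is moved.
    [ -d "$old_dir" ] && [ -d "$new_dir" ] || return 1

    # Keep a copy of the old database; refuse to overwrite an earlier backup.
    backup_dir="${old_dir}.bak"
    [ -e "$backup_dir" ] && return 1
    mv "$old_dir" "$backup_dir" || return 1

    # Recreate the old folder and copy the new database files into it.
    mkdir -p "$old_dir" || return 1
    cp -R "$new_dir"/. "$old_dir"/
}
```

A call would look like `swap_database /path/to/old/database /path/to/new/database`, with both paths replaced by the real folders (written without a trailing slash) on the second node.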

I will need to send some messages across the network to see whether it is set up correctly, but I am much further along than I was at the beginning. If this is not the correct fix for the problem, please let me know.

Regards,
SV
