Hi,
I've begun using the pubsub module, and I intend to use it in an environment with many nodes (100K+).
I've noticed that ejabberd starts drastically slower the more nodes are stored in the mnesia database.
Also, the load time doesn't seem to be linear: 15K nodes take around 1 minute on my server, but with 100K nodes ejabberd is still loading after 40 minutes, stuck at:
=PROGRESS REPORT==== 18-Jun-2008::15:06:34 ===
application: mnesia
started_at: ejabberd@localhost
Here are my questions:
1. Could it be that there is a missing index somewhere in the mnesia database?
2. Wouldn't it be more efficient to allow on-demand access to pubsub nodes rather than a bulk load at startup? I can't imagine a use case where you'd need all nodes in memory at startup; nodes are only used when someone subscribes or an event is published. To speed up discovery of all nodes, a small amount of info listing the nodes could be loaded at startup, but it hardly seems necessary to have all node data loaded.
3. Is there a more lightweight way of using pubsub nodes? Currently I've got it set up so that no items are persisted. I only want to be able to control pubsub of realtime events.
I'm using ejabberd 2.0.1_2,
erl: [root@server ejabberd]# erl -version
Erlang (ASYNC_THREADS,HIPE) (BEAM) emulator version 5.5.5
System: Pentium 4 3.0 GHz, 1 GB RAM, RHEL 5:
[root@server ejabberd]# uname -a
Linux server 2.6.18-8.1.15.el5 #1 SMP Thu Oct 4 04:06:15 EDT 2007 i686 i686 i386 GNU/Linux
Thanks,
Kevin
Still loading mnesia... 1 hour 40 minutes...
It took 2 hours and 10 minutes to load mnesia...
Workaround
One of my questions was whether the pubsub data needed to be loaded completely into memory, and whether it would make sense to load it on demand.
The guys in the ejabberd chatroom pointed me towards a nice workaround. Using the Admin webpage, you can change the storage type of a table to Disk Only copy. Once this is set, startup takes seconds once again.
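For anyone who prefers a shell to the Admin webpage, the same storage type change can presumably be made with mnesia directly. This is only a sketch: it assumes an Erlang shell attached to the running ejabberd node, and it takes the table name pubsub_nodes from this thread, so check the name on your own installation first.

```erlang
%% Run in an Erlang shell attached to the running ejabberd node.
%% The table name pubsub_nodes is taken from this post; confirm it
%% against mnesia:system_info(tables) on your own installation.

%% Show how the table is currently stored on this node
%% (ram_copies, disc_copies or disc_only_copies).
mnesia:table_info(pubsub_nodes, storage_type).

%% Keep the table on disk only, so it is not loaded into RAM at startup.
%% Returns {atomic, ok} on success or {aborted, Reason} on failure.
mnesia:change_table_copy_type(pubsub_nodes, node(), disc_only_copies).
```

The storage type is part of the mnesia schema, which fits with teo's remark in the chat log that schema changes are not transient.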
There was some mention of using an empty cfg file because the config table gets overwritten at startup, but I think this was not necessary in this case.
Note: This is a workaround, but I still think there is a scalability issue here. I can't explain why the load time would be non-linear. Badlop suggested that it could be because of RAM swap, which I will investigate further.
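To put a rough number on "non-linear": taking the figures in this thread at face value (about 1 minute for 15K nodes, 2 hours 10 minutes for 100K nodes) and assuming, purely for illustration, that load time follows a power law T(n) ∝ n^k, the two data points give:

```latex
% Linear scaling would predict only a few minutes for 100K nodes:
T_{\text{linear}}(100\text{K}) \approx \frac{100}{15} \times 1\ \text{min} \approx 6.7\ \text{min}

% Exponent implied by the two measurements (130 min vs. 1 min):
k = \frac{\ln(T_2 / T_1)}{\ln(n_2 / n_1)}
  = \frac{\ln(130 / 1)}{\ln(100 / 15)}
  \approx \frac{4.87}{1.90}
  \approx 2.6
```

Two approximate measurements are not enough to establish a real growth rate, and a swapping problem would show up as a sudden cliff rather than a smooth power law, so this only illustrates how far the observed time is from linear.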
Here is a copy of the chatroom conversation:
[10:25:04] *** badlop has joined the room as a moderator and an owner
[10:30:51] <yellowdog> Kev: I've sent Mickael a mail regarding http://www.ejabberd.im/node/3109 Do you know if he's still very active in development?
[10:34:12] <badlop> yellowdog: also, i'll later mention that thread to the author of the newer pubsub module, who probably knows more the details of the code
[10:35:05] <teo> yellowdog: afaik, the person who is responsible for pubsub in ejabberd is christophe romain. i doubt that mickael can fix the scalability issue
[10:35:10] <Kev> yellowdog: I have no idea
[10:36:09] <badlop> right, as teo explained; in addition you can set the table to be stored only in disk (not in ram) using the web admin
[10:37:34] <Kev> I had the notion that web admin changes were only transient - am I wrong?
[10:37:38] <badlop> extauth:80 <-- this should mean line 80 of extauth.erl, and <0.292.0> is the Pid of the erlang process that was running that code
[10:37:48] <yellowdog> Thanks badlop. If you're going to mention it to him then I won't bother bugging him. I'll try your suggestion, I guess that will cause it to not get loaded at startup.
[10:39:17] <teo> Kev: not the changes to a database schema at least
[10:39:40] <Kev> ah, ta
[10:42:29] <badlop> and the config changes are stored in the temporary (only RAM) config table; those options are overwritten at startup with the ejabberd.cfg file
[10:43:45] <badlop> obvious trick: if you delete all the content of ejabberd.cfg and start ejabberd then the options stored are not overwritten
[10:44:53] <teo> badlop: it isn't true. config table is disk-only. but their entries are definitely overwritten on ejabberd start. but if you clear ejabberd.cfg ejabberd will use stored config from db
[10:46:01] <badlop> ah, right: config table is in mnesia (so it's permanent); i confused with the hooks table :S
[10:46:29] <Kev> oh good, I'm glad this isn't confusing ;)
[10:47:28] <yellowdog> Ok I've made the change... I'm about to restart, I don't want to get this wrong because it takes 2 hours to load the database as it is... You're all suggesting that I use an empty ejabberd.cfg file to startup?
[10:48:34] <teo> yellowdog: then you'll forget your server setup and after some time will be disappointed (for one or another reason)
[10:49:06] <yellowdog> teo: no worries, I'll just move it somewhere and touch ejabberd.cfg
[10:49:30] <yellowdog> This isn't a good long-term solution for me...
[10:50:40] <badlop> yellowdog: the trick i mentioned about starting with empty ejabberd.cfg is not documented for that reason, and also because it isn't possible to dump the config from ejabberd DB to a newer ejabberd.cfg
[10:56:40] <yellowdog> Ok, I tried a restart with an empty ejabberd.cfg and lo and behold, it was fast.
[10:57:14] <yellowdog> To be sure, I put back the original cfg, restarted... it's also fast.
[10:57:37] <yellowdog> I brought up the admin webapp, checked the database and the tables are still set to Disk Only.
Minor improvement
Since I made the pubsub tables Disk Only, publishing an item has become extremely slow.
Create and subscribe do not seem to be affected.
There are three pubsub tables which I had changed:
Table          Records    Memory
pubsub_items     18412   8770315
pubsub_nodes     76365  81849201
pubsub_state     83263  27473061
I have now changed it so that pubsub_nodes is the only table stored as Disk Only. This seems to bring publish performance back to normal, while the load time is still fast.
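A quick way to confirm which tables ended up with which storage type is to ask mnesia from an attached Erlang shell. Again a sketch only: the table names are the ones listed in this post, and disc_copies as the type to switch back to is an assumption, since the thread does not say what the tables were set to originally.

```erlang
%% Run in an Erlang shell attached to the running ejabberd node.
%% Table names are the ones listed in this post; adjust if yours differ.
Tables = [pubsub_items, pubsub_nodes, pubsub_state].

%% List {Table, StorageType, NumberOfRecords} for each table on this node.
[{T, mnesia:table_info(T, storage_type), mnesia:table_info(T, size)}
 || T <- Tables].

%% Move a table back off Disk Only. disc_copies (RAM plus disk) is an
%% ASSUMPTION here; use whatever type the table had before the change.
mnesia:change_table_copy_type(pubsub_state, node(), disc_copies).
```

With the setup described above, only pubsub_nodes should report disc_only_copies.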