Hi.
I am running ejabberd-2.1.3 from ports on FreeBSD 8.1pre (amd64, 8 GB RAM), backed by the native Erlang MySQL driver from ejabberd CVS (mysql-5.1).
My problem is that I can't get more than 32768 users online, according to ejabberdctl connected_users_number (past 25k, some connections are established and then dropped).
There is still plenty of RAM and CPU available, but I just don't know where to go from here. The kernel is reasonably tuned, as far as I know.
I want to get at least 50k users online, maybe 100k. Thanks in advance for any advice!
Erlang R13B04 (erts-5.7.5) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [hipe] [kernel-poll:true]
Eshell V5.7.5 (abort with ^G)
(ejabberd@localhost)1> os:getenv("ERL_MAX_PORTS").
"500000"
(ejabberd@localhost)2> os:cmd("ulimit -n").
"200000\n"
(ejabberd@localhost)3>
>cat ejabberdctl.cfg | grep -v # | grep -v ^$
POLL=true
SMP=enable
ERL_MAX_PORTS=1000000
ERL_PROCESSES=3500000
ERL_MAX_ETS_TABLES=5000
>cat /boot/loader.conf
net.inet.tcp.syncache.hashsize=1024
net.inet.tcp.syncache.bucketlimit=100
net.inet.tcp.tcbhashsize=4096
kern.ipc.nsfbufs=10240
>cat /etc/sysctl.conf
kern.ipc.nmbclusters=65000
kern.ipc.somaxconn=4096
kern.ipc.maxsockets=204800
kern.maxfiles=204800
kern.maxfilesperproc=200000
net.inet.tcp.recvspace=16384
net.inet.tcp.sendspace=16384
net.inet.tcp.maxtcptw=55000
net.inet.ip.portrange.first=1024
net.inet.ip.portrange.last=65535
net.inet.ip.portrange.randomized=0
net.inet.tcp.nolocaltimewait=1
net.inet.tcp.fast_finwait2_recycle=1
32768...
Hmm, 32768 is 2^15, so it sounds like you're running into an issue with a 16-bit (possibly signed) integer somewhere.
I've never used FreeBSD before, but I do see a few numbers of interest from a purely mathematical standpoint:
net.inet.tcp.recvspace=16384
net.inet.tcp.sendspace=16384
net.inet.ip.portrange.last=65535
16384 is half of 32768, while 65535 is double it minus 1, so either of those may be causing a limit somehow.
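To make the arithmetic concrete, here is a small Python sketch of the 16-bit ranges. The helper name fits_in is made up for illustration, and nothing here confirms that a 16-bit counter is actually involved; it only shows where the boundaries sit relative to the numbers above.

```python
def fits_in(value, bits, signed):
    """Return True if value is representable in an integer of the given width."""
    if signed:
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        low, high = 0, (1 << bits) - 1
    return low <= value <= high


# Compare the values from the thread against signed and unsigned 16-bit ranges.
for n in (16384, 32767, 32768, 65535, 65536):
    print(n, "signed16:", fits_in(n, 16, True), "unsigned16:", fits_in(n, 16, False))
```

A signed 16-bit integer tops out at 32767 and an unsigned one at 65535, so 32768 is the first value that no longer fits in the signed range.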
My other suggestion would be to check the number of open file descriptors allowed per process, which it looks like you've already done. On Linux and Solaris, this is done with:
ulimit -n
My experience has been that you typically want this number to be at least double the number of users you are trying to log in, because I think ejabberd consumes two file descriptors per connected client (though I'm not sure why). I always make it a bit more than double just to be safe.
For example, if your ulimit -n value is 65536, then ejabberd will be unable to log in new clients after you've consumed all of these file descriptors. If it does consume two per login as I suspect, a limit of 65536 would cause it to start failing after 32768. If it only consumes one, then a limit of 32768 would cause similar issues. FreeBSD may have decided to set this to 2^16 by default.
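The estimate above can be written out as a short Python sketch. The function name max_clients and the reserved parameter are hypothetical, and the two-descriptors-per-client figure is the suspicion stated above, not a confirmed property of ejabberd.

```python
def max_clients(fd_limit, fds_per_client, reserved=0):
    """Estimate how many clients fit under a file descriptor limit.

    reserved is a hypothetical allowance for descriptors the server
    needs for itself (listeners, log files, database connections).
    """
    if fds_per_client <= 0:
        raise ValueError("fds_per_client must be positive")
    return max(0, (fd_limit - reserved) // fds_per_client)


print(max_clients(65536, 2))   # assumed two descriptors per client
print(max_clients(32768, 1))   # assumed one descriptor per client
print(max_clients(200000, 2))  # the ulimit -n value reported in this thread
```

Under the two-descriptor assumption, a limit of 200000 would leave room for 100000 clients, so a ceiling at 32768 would have to come from some other limit.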
However, since you've already got this set to 200000, that may be moot, and it may have more to do with the FreeBSD values I mentioned above. Or perhaps, for some reason, the ulimit Erlang is reporting isn't the same as the one actually configured in the system?
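One way to cross-check what a process really inherits is to ask the kernel from inside a process rather than from an interactive shell. The Python sketch below reads the soft and hard descriptor limits with the standard resource module. It reports the limits of the Python process itself, so it only approximates what the Erlang VM sees if it is started by the same user and the same startup path as ejabberd, which is an assumption. The helper name nofile_limits is made up for illustration.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = nofile_limits()
print("soft limit:", describe(soft))
print("hard limit:", describe(hard))
```

The soft limit is the one that actually stops new descriptors from being opened, so that is the value to compare against the 200000 reported by os:cmd("ulimit -n").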