MySQL Connection Pool Exhaustion

Hey All,
To better interface with an existing user directory, I'm looking to MySQL/ODBC for the rosters and so on. Recompiling to include ODBC went fine, but I'm seeing what appears to be ejabberd exhausting an internal connection pool to MySQL. When I start the daemon without -detached, the initial startup is fine and connections are made to the DB, but halfway through the host_configs (19 out of 57) the following error starts coming up:

=ERROR REPORT==== 5-Dec-2007::15:17:58 ===
E(<0.869.0>:ejabberd_odbc:278): MySQL connection failed: connect_failed

Checking the processlist in MySQL, the total number of connections for ejabberd is 195, which coincides with what I've heard: ejabberd's ODBC spawns 10 connections for every host. However, I can't tell why it's stopping at 195 (I've stress-tested the server and it handled 500+ concurrent connections from a single login without any problems). It's worth noting that if I modify the configuration to use only 1 host/host_config, it works just fine.
The ejabberd version is 1.1.4 (non-SVN) with a patch applied to ejabberd_auth_external.erl to allow the use of an external auth script while storing certain tables in MySQL instead of Mnesia. I've tried both the static MySQL driver binaries provided in the HOWTO and a set built from SVN.
On a tangent (though still ODBC-related): are there any plans for, or has anyone seen, an ODBC variant of mod_shared_roster? This is arguably the most important module used when setting up a new domain and would need to live in MySQL.

Regards,

Updated - file system limits

It turns out that the ulimit on the server may have been causing the problems; luckily, I happened to catch it as it happened. Increasing the ulimit allowed all the MySQL processes to go through (540 total), but now there's a new problem:

(ejabberd@van-ejabberd1.globalrelay.net)1> [05:01 PM] van-ejabberd1:/opt/etc/ejabberd$ tail /tmp/start 
                               start,
                               [5223,
                                ejabberd_c2s,
                                [{access,c2s},
                                 tls,
                                 {certfile,"/opt/etc/ejabberd/server.pem"}]]}},
                       {restart_type,transient},
                       {shutdown,brutal_kill},
                       {child_type,worker}]
(ejabberd@van-ejabberd1.globalrelay.net)1> [05:01 PM] van-ejabberd1:/opt/etc/ejabberd$ tail -100 /tmp/start 
    registered_name: []
    error_info: {{badmatch,
                      {error,
                          {{badmatch,
                               {error,
                                   "SSL_CTX_use_certificate_file failed: error:02001018:system library:fopen:Too many open files"}},
                           [{ejabberd_c2s,init,1},
                            {gen_fsm,init_it,6},
                            {proc_lib,init_p,5}]}}},
                  [{ejabberd_listener,accept,3},{proc_lib,init_p,5}]}
    initial_call: {ejabberd_listener,
                     init,
                     [5223,
                      ejabberd_c2s,
                      [{access,c2s},
                       tls,
                       {certfile,"/opt/etc/ejabberd/server.pem"}]]}
    ancestors: [ejabberd_listeners,ejabberd_sup,<0.38.0>]
    messages: []
    links: [#Port<0.993>,<0.212.0>,#Port<0.287>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 377
    stack_size: 21
    reductions: 603
  neighbours:
(ejabberd@van-ejabberd1.globalrelay.net)1> 
=CRASH REPORT==== 10-Dec-2007::17:01:38 ===
  crasher:
    pid: <0.4552.0>
    registered_name: []
    error_info: {{badmatch,{error,"SSL_CTX_use_certificate_file failed: error:02001018:system library:fopen:Too many open files"}},
                  [{ejabberd_c2s,init,1},
                   {gen_fsm,init_it,6},
                   {proc_lib,init_p,5}]}
    initial_call: {gen,init_it,
                      [gen_fsm,
                       <0.214.0>,
                       self,
                       ejabberd_c2s,
                       [{gen_tcp,#Port<0.993>},
                        [{access,c2s},
                         tls,
                         {certfile,"/opt/etc/ejabberd/server.pem"}]],
                       []]}
    ancestors: [<0.214.0>,ejabberd_listeners,ejabberd_sup,<0.38.0>]
    messages: []
    links: [#Port<0.997>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 21
    reductions: 2082
  neighbours:
(ejabberd@van-ejabberd1.globalrelay.net)1> 
=SUPERVISOR REPORT==== 10-Dec-2007::17:01:38 ===
     Supervisor: {local,ejabberd_listeners}
     Context:    child_terminated
     Reason:     {{badmatch,
                      {error,
                          {{badmatch,
                               {error,
                                   "SSL_CTX_use_certificate_file failed: error:02001018:system library:fopen:Too many open files"}},
                           [{ejabberd_c2s,init,1},
                            {gen_fsm,init_it,6},
                            {proc_lib,init_p,5}]}}},
                  [{ejabberd_listener,accept,3},{proc_lib,init_p,5}]}
     Offender:   [{pid,<0.214.0>},
                  {name,5223},
                  {mfa,
                      {ejabberd_listener,
                          start,
                          [5223,
                           ejabberd_c2s,
                           [{access,c2s},
                            tls,
                            {certfile,"/opt/etc/ejabberd/server.pem"}]]}},
                  {restart_type,transient},
                  {shutdown,brutal_kill},
                  {child_type,worker}]

(ejabberd@van-ejabberd1.globalrelay.net)1> 
=PROGRESS REPORT==== 10-Dec-2007::17:01:38 ===
          supervisor: {local,ejabberd_listeners}
             started: [{pid,<0.4553.0>},
                       {name,5223},
                       {mfa,
                           {ejabberd_listener,
                               start,
                               [5223,
                                ejabberd_c2s,
                                [{access,c2s},
                                 tls,
                                 {certfile,"/opt/etc/ejabberd/server.pem"}]]}},
                       {restart_type,transient},
                       {shutdown,brutal_kill},
                       {child_type,worker}]

I've googled this error and its variants and found some similar situations, but nothing resolved (at least in situations where the ulimit has already been increased significantly).

$ ulimit -a
core file size        (blocks, -c) unlimited
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
open files                    (-n) 65535
pipe size          (512 bytes, -p) 10
stack size            (kbytes, -s) 10240
cpu time             (seconds, -t) unlimited
max user processes            (-u) 29995
virtual memory        (kbytes, -v) unlimited
lsof for ejabberd's user shows 1790+ open files, though this is still well below what the system is set to allow.
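
One thing that may be worth ruling out: `ulimit -a` reports the limits of the shell it is run in, and limits are inherited per process, so a daemon launched from a different environment can end up with a different open-files limit. As a hedged sketch (Python standard library only, helper names are illustrative, nothing here is ejabberd-specific), running the following from the same environment that launches ejabberd would show the limit a child process would inherit, and how much headroom the lsof count above leaves:

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-files limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def headroom(open_fds, soft_limit):
    """Descriptors left before the soft limit is hit, or None if unlimited."""
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit - open_fds


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", soft, "hard limit:", hard)
    # Figures taken from the post: about 1790 open files, ulimit -n of 65535.
    print("headroom at 65535:", headroom(1790, 65535))
    # The same count against whatever limit this process actually inherited.
    print("headroom here:", headroom(1790, soft))
```

If the soft limit printed here is much lower than 65535, the "Too many open files" error would be consistent with the daemon running under a smaller limit than the interactive shell shows. That is only a guess to check, not a diagnosis.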

Investigate the problem by trying different ODBC pool sizes

I can't help you with this problem, but maybe this gives you some ideas for investigating it.

In ejabberd 1.1.4 and older, the number of SQL connections per host (which is 10 by default) is in the file ejabberd/src/odbc/ejabberd_odbc_sup.erl:

init([Host]) ->
    % TODO
    N = 10,
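
To make the arithmetic concrete, here is a rough sketch in Python (not ejabberd code; the function names are made up for illustration). It assumes every virtual host opens its full pool at startup, which is what the hardcoded N = 10 suggests. The inputs are the 57 hosts and the 195 connections mentioned in the original post:

```python
def total_connections(hosts, pool_size=10):
    """Connections opened if every virtual host gets its own full pool."""
    return hosts * pool_size


def hosts_served(connection_budget, pool_size=10):
    """How many hosts can open a complete pool within a connection budget."""
    return connection_budget // pool_size


if __name__ == "__main__":
    # 57 hosts with the default pool of 10 connections each.
    print(total_connections(57))       # 570
    # 195 connections is enough for 19 complete pools of 10.
    print(hosts_served(195))           # 19
    # A smaller, hypothetical pool of 3 per host.
    print(total_connections(57, 3))    # 171
```

The 19 hosts that fit into 195 connections line up with the "19 out of 57" point where the errors started, so changing N and watching where the failure moves could help show whether the cutoff tracks the connection count.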

In ejabberd 2.0.0-rc1, this can be configured in ejabberd.cfg, so no need to recompile:

Quote:

By default ejabberd opens 10 connections to the database for each virtual host. Use this option to modify the value:

{odbc_pool_size, 10}.

You can configure an interval to make a dummy SQL request to keep alive the connections to the database. The default value is ’undefined’, so no keepalive requests are made. Specify in seconds: for example 28800 means 8 hours.

{odbc_keepalive_interval, undefined}.
