Hi,
I had another thread going, but this problem is a little different than my original post. I was told to use the Virtual Hosts page in the web admin interface, but I can't get it to work.
When I click "Virtual Hosts" it just hangs for a second and the screen never changes. I never get a list of my hosts.
After clicking on the "Virtual Hosts" link, the following 2 items show up in my log files:
**sasl.log**
=CRASH REPORT==== 3-Nov-2005::08:28:17 ===
  crasher:
    pid: <0.321.0>
    registered_name: []
    error_info: {timeout,{gen_fsm,sync_send_event,
                  ['eldap_ejabberd_chatty.mcc.edu',
                   {search,
                    {eldap_search, wholeSubtree, "dc=mcc,dc=edu",
                     {present,"uid"}, ["uid"], false, 0}}]}}
    initial_call: {ejabberd_http,receive_headers,
                   [{state, gen_tcp, #Port<0.358>, undefined, undefined,
                     undefined, undefined, undefined, undefined,
                     "en", true, true, false, []}]}
    ancestors: [ejabberd_http_sup,ejabberd_sup,<0.36.0>]
    messages: []
    links: [<0.216.0>,#Port<0.358>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 21
    reductions: 1903
  neighbours:

=SUPERVISOR REPORT==== 3-Nov-2005::08:28:17 ===
  Supervisor: {local,ejabberd_http_sup}
  Context: child_terminated
  Reason: {timeout,{gen_fsm,sync_send_event,
            ['eldap_ejabberd_chatty.mcc.edu',
             {search,
              {eldap_search, wholeSubtree, "dc=mcc,dc=edu",
               {present,"uid"}, ["uid"], false, 0}}]}}
  Offender: [{pid,<0.321.0>},
             {name,undefined},
             {mfa,{ejabberd_http,start_link,
                   [{gen_tcp,#Port<0.358>}, [http_poll,web_admin]]}},
             {restart_type,temporary},
             {shutdown,brutal_kill},
             {child_type,worker}]
**ejabberd.log**
=INFO REPORT==== 2005-11-03 08:29:48 ===
I(<0.223.0>:ejabberd_listener:90): (#Port<0.390>) Accepted connection {{10,100,230,117},48784} -> {{207,74,136,244},5280}

=INFO REPORT==== 2005-11-03 08:29:48 ===
I(<0.216.0>:ejabberd_http:73): started: {gen_tcp,#Port<0.390>}

=INFO REPORT==== 2005-11-03 08:29:48 ===
I(<0.322.0>:ejabberd_http:165): (#Port<0.390>) http query: 'GET' /admin/vhosts/
Thank you so much for your time.
--Marc
Below is my ejabberd config file:
---ejabberd.cfg---
% $Id: ejabberd.cfg.example 332 2005-04-27 01:08:18Z alexey $

%override_acls.

% Users that have admin access. Add line like one of the following after you
% will be successfully registered on server to get admin access:
{acl, admin, {user, "msmith"}}.
%{acl, admin, {user, "ermine"}}.

% Blocked users:
%{acl, blocked, {user, "test"}}.

% Local users:
{acl, local, {user_regexp, ""}}.

% Another examples of ACLs:
%{acl, jabberorg, {server, "jabber.org"}}.
%{acl, aleksey, {user, "aleksey", "jabber.ru"}}.
%{acl, test, {user_regexp, "^test"}}.
%{acl, test, {user_glob, "test*"}}.

% Only admins can use configuration interface:
{access, configure, [{allow, admin}]}.

% Every username can be registered via in-band registration:
{access, register, [{allow, all}]}.

% After successful registration user will get message with following subject
% and body:
{welcome_message,
 {"Welcome!",
  "Welcome to Jabber Service. "
  "For information about Jabber visit http://jabber.org"}}.
% Replace them with 'none' if you don't want to send such message:
%{welcome_message, none}.

% List of people who will get notifications about registered users
%{registration_watchers, ["admin1@localhost",
% "admin2@localhost"]}.

% Only admins can send announcement messages:
{access, announce, [{allow, admin}]}.

% Only non-blocked users can use c2s connections:
{access, c2s, [{deny, blocked},
               {allow, all}]}.

% Set shaper with name "normal" to limit traffic speed to 1000B/s
{shaper, normal, {maxrate, 1000}}.

% Set shaper with name "fast" to limit traffic speed to 50000B/s
{shaper, fast, {maxrate, 50000}}.

% For all users except admins used "normal" shaper
{access, c2s_shaper, [{none, admin},
                      {normal, all}]}.

% For all S2S connections used "fast" shaper
{access, s2s_shaper, [{fast, all}]}.

% Admins of this server are also admins of MUC service:
{access, muc_admin, [{allow, admin}]}.

% All users are allowed to use MUC service:
{access, muc, [{allow, all}]}.

% This rule allows access only for local users:
{access, local, [{allow, local}]}.

% Authentification method. If you want to use internal user base, then use
% this line:
%{auth_method, internal}.

% For LDAP authentification use these lines instead of above one:
{auth_method, ldap}.
{ldap_servers, ["eswells.mcc.edu"]}. % List of LDAP servers
{ldap_uidattr, "uid"}. % LDAP attribute that holds user ID
{ldap_base, "dc=mcc,dc=edu"}. % Search base of LDAP directory
{ldap_rootdn, ""}. % LDAP manager
{ldap_password, ""}. % Password to LDAP manager

% For authentification via external script use the following:
%{auth_method, external}.
%{extauth_program, "/path/to/authentification/script"}.

% For authentification via ODBC use the following:
%{auth_method, odbc}.
%{odbc_server, "DSN=ejabberd;UID=ejabberd;PWD=ejabberd"}.

% Host name:
{hosts, ["chatty.mcc.edu"]}.

% Default language for server messages
{language, "en"}.

% Listened ports:
{listen,
 [{5222, ejabberd_c2s, [{access, c2s}, {shaper, c2s_shaper},
                        starttls, {certfile, "/etc/ssl/certs/ejabberd.pem"}]},
  {5223, ejabberd_c2s, [{access, c2s},
                        tls, {certfile, "/etc/ssl/certs/ejabberd.pem"}]},
  % Use these two lines instead if TLS support is not compiled
  %{5222, ejabberd_c2s, [{access, c2s}, {shaper, c2s_shaper}]},
  %{5223, ejabberd_c2s, [{access, c2s}, ssl, {certfile, "./ssl.pem"}]},
  {5269, ejabberd_s2s_in, [{shaper, s2s_shaper}]},
  {5280, ejabberd_http, [http_poll, web_admin]},
  {8888, ejabberd_service, [{access, all},
                            {hosts, ["icq.localhost", "sms.localhost"],
                             [{password, "secret"}]}]}
 ]}.

% If SRV lookup fails, then port 5269 is used to communicate with remote server
{outgoing_s2s_port, 5269}.

% Used modules:
{modules,
 [
  {mod_shared_roster, []},
  {mod_register, [{access, register}]},
  {mod_roster, []},
  {mod_privacy, []},
  {mod_configure, []},
  {mod_configure2, []},
  {mod_disco, []},
  {mod_stats, []},
  {mod_vcard, []},
  {mod_offline, []},
  {mod_announce, [{access, announce}]},
  {mod_echo, [{host, "echo.localhost"}]},
  {mod_private, []},
  {mod_irc, []},
  % Default options for mod_muc:
  % host: "conference." ++ ?MYNAME
  % access: all
  % access_create: all
  % access_admin: none (only room creator has owner privileges)
  {mod_muc, [{access, muc},
             {access_create, muc},
             {access_admin, muc_admin}]},
  {mod_pubsub, []},
  {mod_time, []},
  {mod_last, []},
  {mod_version, []}
 ]}.

% Local Variables:
% mode: erlang
% End:
---ejabberd.cfg---
Try this workaround
The crash report mentions LDAP. It could be an LDAP-related bug, but in that case I would expect somebody else to have found it already, and this is the first time I've read about it.
You use the FreeBSD port, but I doubt that port introduced any new code that could produce the error. One thing you could try is to compile ejabberd from source.
Workaround: if you are lucky, the bug only affects '/admin/vhosts/'. If so, maybe you can access the virtual host directly at '/admin/server/chatty.mcc.edu/'.
That helps
I can do the workaround by going to http://host:5280/admin/server/host/ and it pulls up the shared roster and all is good. If I click on Users, it crashes again. I'm assuming it's trying to pull up a list of users from LDAP? Maybe it can't handle all the users in my LDAP tree (~13,000)?
I compiled ejabberd (0.9.8) from source, but I couldn't get it to run. It always produces a really big erl_crash.dump.
I've read something about this when doing a Google search -- it seems other(s) have had this problem in the past:
Thanks,
Marc
Maybe it can't handle all
Looking at the error message, it seems possible:
With so many users, the default timeout could expire before the answer arrives.
If you are still interested in that list of users, you can try increasing these constants defined in ejabberd/src/eldap/eldap.erl (they may be related to the timeout you are seeing):
Of course, that's not possible right now as you can't compile from source.
Doesn't seem to help
I can't get ejabberd to compile without the FreeBSD ports system, so I just used that:
# cd /usr/ports/net/ejabberd
# make clean
# make configure
# vi work/ejabberd-0.9.8/src/eldap/eldap.erl
I changed the 3 'TIMEOUT' to look like this:
-define(RETRY_TIMEOUT, 5000000).
-define(BIND_TIMEOUT, 10000000).
-define(CMD_TIMEOUT, 5000000).
Next:
# make build
# make install
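The manual edit in vi can also be scripted, which makes it easier to repeat the experiment with different values between rebuilds. This is only a sketch: `set_erl_define` is a hypothetical helper, not part of ejabberd or the FreeBSD port, and it assumes the macros appear in eldap.erl in the `-define(NAME, NUMBER).` form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ejabberd or the ports tree):
# rewrite one "-define(NAME, NUMBER)." line in an Erlang source file.
set_erl_define() {
    file=$1
    name=$2
    value=$3
    tmp="$file.tmp.$$"
    # Only the macro with this exact name is changed; other lines pass through.
    # A temporary file is used because in-place sed flags differ between
    # FreeBSD and GNU sed.
    sed "s/^-define($name,[[:space:]]*[0-9][0-9]*)\./-define($name, $value)./" \
        "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example use inside the port's work directory (path taken from the post above):
#   set_erl_define work/ejabberd-0.9.8/src/eldap/eldap.erl CMD_TIMEOUT 30000
#   grep -n 'TIMEOUT' work/ejabberd-0.9.8/src/eldap/eldap.erl
```

Running the `grep` afterwards confirms which values actually ended up in the file before `make build` is run.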
I fired up ejabberd and tail'd the sasl.log file. I get the same problem, and it doesn't seem to take any longer to appear after clicking something that tries to pull a list of users. I would expect to notice a difference after multiplying the values by 1000.
I'll try another experiment -- I'll clone my LDAP server and take out all users except for 10 and see how that works.
I'll post again soon.
Thanks for the help.
--Marc
Works fine with ~100 users
I cloned my LDAP server and added roughly 100 users, and it works fine. So, 13,000 users is just too much for it? Is there anywhere else a 'timeout' variable is set besides what I already tried?
Has anyone else been using ejabberd & LDAP with a large number of users (13,000 or more)?
Thanks,
Marc
Check your LDAP server logs and settings
Check the logfiles and settings in your LDAP server. LDAP servers can limit the number of items returned in a search (or the amount of time a search is allowed to take).
Perhaps the limit you are reaching is configured in your LDAP server instead of your ejabberd.
Greg
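One way to check for a server-side limit without involving ejabberd is to run the same kind of search by hand and count what comes back. The sketch below assumes the OpenLDAP `ldapsearch` client is installed and that anonymous binds are allowed (the config above has an empty ldap_rootdn); the host and base are copied from that config. `count_ldif_entries` is a hypothetical helper name.

```shell
#!/bin/sh
# Count the entries in LDIF text read from standard input.
# Every entry in LDIF starts with a "dn:" line (or "dn::" if base64-encoded).
count_ldif_entries() {
    grep -c '^dn:'
}

# Example (assumes the OpenLDAP client tools; not run here):
#   time ldapsearch -x -H ldap://eswells.mcc.edu -b 'dc=mcc,dc=edu' \
#        -LLL '(uid=*)' uid | count_ldif_entries
# If the count is well below the real number of users, or ldapsearch reports a
# size or time limit error, the LDAP server is probably truncating the search.
```

Wrapping the command in `time` also shows how long the full search takes on its own, which is useful to compare against any client-side timeout.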
Advanced LDAP config in ejabberd.cfg?
Hi,
I don't see any errors in my slapd log file when trying to access the virtual hosts page.
Nov 11 11:17:45 acad1 slapd[11452]: conn=4 op=34 SRCH base="ou=Users,dc=mcc,dc=edu" scope=2 deref=0 filter="(uid=msmith)"
Nov 11 11:17:45 acad1 slapd[11452]: conn=4 op=34 SEARCH RESULT tag=101 err=0 nentries=1 text=
Nov 11 11:17:45 acad1 slapd[11456]: conn=3 op=27 BIND anonymous mech=implicit ssf=0
Nov 11 11:17:45 acad1 slapd[11456]: conn=3 op=27 BIND dn="uid=msmith,ou=Users,dc=mcc,dc=edu" method=128
Nov 11 11:17:45 acad1 slapd[11456]: conn=3 op=27 BIND dn="uid=msmith,ou=Users,dc=mcc,dc=edu" mech=SIMPLE ssf=0
Nov 11 11:17:45 acad1 slapd[11456]: conn=3 op=27 RESULT tag=97 err=0 text=
Nov 11 11:17:45 acad1 slapd[11452]: conn=4 op=35 SRCH base="ou=Users,dc=mcc,dc=edu" scope=2 deref=0 filter="(uid=*)"
Nov 11 11:17:45 acad1 slapd[11452]: conn=4 op=35 SRCH attr=uid
Nov 11 11:17:51 acad1 slapd[11452]: conn=4 op=35 SEARCH RESULT tag=101 err=0 nentries=14055 text=
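Reading the timestamps in the log above: the `(uid=*)` search (conn=4 op=35) starts at 11:17:45 and its result, with 14055 entries, is logged at 11:17:51, so the server needed roughly 6 seconds to answer. A small sketch for pulling that out of a slapd log automatically follows. It assumes the syslog line layout shown above and ignores searches that span midnight; `slapd_search_seconds` is a made-up name.

```shell
#!/bin/sh
# For each search in a slapd log on stdin, print the elapsed whole seconds
# between the "SRCH base=..." line and the matching "SEARCH RESULT" line.
# Lines are paired by their conn=N op=M fields.
slapd_search_seconds() {
    awk '
    # Convert HH:MM:SS to seconds since midnight.
    function secs(t,   p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
    $8 == "SRCH" && $9 ~ /^base=/ { start[$6 " " $7] = secs($3) }
    $8 == "SEARCH" && $9 == "RESULT" {
        key = $6 " " $7
        if (key in start) {
            n = ""
            for (i = 10; i <= NF; i++)
                if ($i ~ /^nentries=/) n = substr($i, 10)
            print key, "seconds=" (secs($3) - start[key]), "nentries=" n
        }
    }'
}

# Example: slapd_search_seconds < /var/log/slapd.log   (log path is an assumption)
```

Syslog timestamps have one-second resolution, so the result is only accurate to about a second.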
Other things that use LDAP work fine too. For example, doing a 'getent passwd' with nss_ldap works fine, but I'm sure different applications can behave differently. Are there any other settings for ejabberd with LDAP? Like advanced search filters?
I already have one setting in my slapd.conf for searches:
sizelimit 65435
I'll take a look for some other slapd timeout type settings. Should I turn up slapd logging level? Maybe I'm not seeing everything with a '256' loglevel?
Thanks for everyone's help.
--Marc
Does TIMEOUT variable have a ceiling?
Hi,
I really don't know that much about programming, but I have done some experiments with this problem, and I can't make the timeout any longer than 6 seconds (real time).
-define(RETRY_TIMEOUT, 5000).
-define(BIND_TIMEOUT, 10000).
-define(CMD_TIMEOUT, 5000).
Those are the default values, but if I set each to, say, 100,000,000, the timeout is still only 6 seconds. I check this by running 'time ejabberdctl ejabberd@host registered-users', and it's always 6 seconds no matter how high I set the values. It's also 6 seconds with the default values.
Now, if I lower these values, say half the default values, it times out in half the time, 3 seconds. I've been changing all 3, but I'm assuming the only one I really need to be changing for my problem is CMD_TIMEOUT? Or maybe I need to change the others too?
So setting the values lower changed the results with my problem. I just can't get the timeout to increase past 6 seconds. So, I guess I'm just wondering if the variable is the wrong type or something similar. It seems as though I've hit some kind of ceiling.
Thanks for any help!
--Marc
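A possible reading of the ceiling described above, offered as an assumption rather than a confirmed diagnosis: the crash report in the first post names gen_fsm:sync_send_event with a two-element argument list, and in Erlang/OTP the two-argument form of that call waits 5000 ms for a reply by default. If the caller gives up after 5000 ms on its own, raising CMD_TIMEOUT inside eldap.erl cannot extend the wait, while lowering it still shortens it. The sketch below only illustrates that "whichever timer fires first" arithmetic; `effective_timeout_ms` is a hypothetical name and none of this is ejabberd code.

```shell
#!/bin/sh
# Illustration only: when two timers guard the same request, the wait ends
# at whichever one fires first, i.e. at the smaller of the two values.
effective_timeout_ms() {
    cmd_timeout=$1      # CMD_TIMEOUT as set in eldap.erl
    caller_timeout=$2   # assumed caller-side wait (5000 ms is the gen_fsm default)
    if [ "$cmd_timeout" -lt "$caller_timeout" ]; then
        echo "$cmd_timeout"
    else
        echo "$caller_timeout"
    fi
}

# Try the values mentioned in the thread against an assumed 5000 ms caller wait.
for t in 2500 5000 5050 100000; do
    echo "CMD_TIMEOUT=$t -> $(effective_timeout_ms "$t" 5000) ms"
done
```

Under this assumption every CMD_TIMEOUT above 5000 gives the same 5000 ms wait, which would match the observation that halving the values halves the time but raising them changes nothing.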
some help for debugging
Offtopic: There's a new feature on this forum: now you can see the thread plain instead of nested.
In topic: look at ejabberd/src/eldap/eldap.erl line 554:
That function seems to be the one that sends the request and starts the timer. Check the documentation for start_timer. It only says: 'Time is a non-negative integer... Time ms ... The timeout value must fit in 32 bits.'
To help your debugging, you can apply this patch to eldap.erl (note that I've not even tried to compile this):
It forces a timeout of 43 seconds for commands, and prints on the log file some messages.
Anyway, I'm thinking now that maybe ejabberd should not list all the users on the LDAP server, but only the ones that have logged in to ejabberd at some point. The same behaviour should apply to Shared Roster Groups: when you specify @all@, instead of adding all LDAP users to a shared roster group, add only the ones that have logged in to the Jabber server.
Won't Patch Correctly
Hi,
Thanks for the reply, but I can't get eldap.erl to patch correctly:
esdev2# patch < /root/ejabberd/eldap.erl.patch
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- eldap.erl (revisión: 433)
|+++ eldap.erl (copia de trabajo)
--------------------------
Patching file eldap.erl using Plan A...
Hunk #1 failed at 559.
Hunk #2 succeeded at 732 (offset -1 lines).
Hunk #3 succeeded at 765 (offset -1 lines).
1 out of 3 hunks failed--saving rejects to eldap.erl.rej
done
esdev2# cat eldap.erl.rej
***************
*** 559,565 ****
log2("~p~n",[{Name, Request}], S),
{ok, Bytes} = asn1rt:encode('ELDAPv3', 'LDAPMessage', Message),
ok = gen_tcp:send(S#eldap.fd, Bytes),
- Timer = erlang:start_timer(?CMD_TIMEOUT, self(), {cmd_timeout, Id}),
New_dict = dict:store(Id, [{Timer, From, Name}], S#eldap.dict),
{ok, S#eldap{id = Id,
dict = New_dict}}.
--- 559,567 ----
log2("~p~n",[{Name, Request}], S),
{ok, Bytes} = asn1rt:encode('ELDAPv3', 'LDAPMessage', Message),
ok = gen_tcp:send(S#eldap.fd, Bytes),
+ Timer = erlang:start_timer(43000, self(), {cmd_timeout, Id}),
+ io:format(" -- eldap - start_timer - ~p --~ncommand: ~p~nfrom: ~p~ns:~p~ntimer: ~p~n~n",
+ [erlang:now(), Command, From, S, Timer]),
New_dict = dict:store(Id, [{Timer, From, Name}], S#eldap.dict),
{ok, S#eldap{id = Id,
dict = New_dict}}.
esdev2# cat eldap.erl.patch
--- eldap.erl (revisión: 433)
+++ eldap.erl (copia de trabajo)
@@ -559,7 +559,9 @@
log2("~p~n",[{Name, Request}], S),
{ok, Bytes} = asn1rt:encode('ELDAPv3', 'LDAPMessage', Message),
ok = gen_tcp:send(S#eldap.fd, Bytes),
- Timer = erlang:start_timer(?CMD_TIMEOUT, self(), {cmd_timeout, Id}),
+ Timer = erlang:start_timer(43000, self(), {cmd_timeout, Id}),
+ io:format(" -- eldap - start_timer - ~p --~ncommand: ~p~nfrom: ~p~ns:~p~ntimer: ~p~n~n",
+ [erlang:now(), Command, From, S, Timer]),
New_dict = dict:store(Id, [{Timer, From, Name}], S#eldap.dict),
{ok, S#eldap{id = Id,
dict = New_dict}}.
@@ -732,6 +733,7 @@
cancel_timer(Timer) ->
erlang:cancel_timer(Timer),
+ io:format(" -- eldap - cancel_timer - ~p --~ntimer: ~p~n~n", [erlang:now(), Timer]),
receive
{timeout, Timer, _} ->
ok
@@ -764,6 +766,8 @@
%% Sort out timed out commands
%%-----------------------------------------------------------------------
cmd_timeout(Timer, Id, S) ->
+ io:format(" -- eldap - cmd_timeout - ~p --~ntimer: ~p~nid: ~p~ns:~p~n~n",
+ [erlang:now(), Timer, Id, S]),
Dict = S#eldap.dict,
case dict:find(Id, Dict) of
{ok, [{Timer, From, Name}|Res]} ->
I thought maybe this was because I wasn't using the CVS version, so I got eldap.erl from CVS, but it failed with the same message.
Thanks,
Marc
erlang CMD_TIMEOUT limit?
This couldn't be caused by another variable set somewhere in the Erlang install, could it?
I can adjust CMD_TIMEOUT to something lower (4000, 3000, etc.) and the time decreases to match. I can't get the time to last any longer than 5 seconds and some change. I've tried setting CMD_TIMEOUT to everything from 5050 up to 100000. It never goes past ~6 seconds!
This test server is a VMware virtual machine. It couldn't be anything weird with time, could it? (VMs are always a bit funky with timekeeping.)
Does anyone else use ejabberd & LDAP with lots of users (13,000+)?
Thanks,
Marc
LDAP w/ lots of users, solution
Hi,
This problem has been resolved:
I'm not sure if this will be merged into the CVS branch, but it's just a one-line fix, plus changing CMD_TIMEOUT to something larger than 5 seconds.
Thanks for everyone's help!
--Marc