ejabberd - Comments for "strange loop - ejabberd_router" https://www.ejabberd.im/node/4670 en re https://www.ejabberd.im/node/4670#comment-57333 <div class="quote-msg"> <div class="quote-author"><em>dzejefkej</em> wrote:</div> <p> - get_status does not work (c2s hangs in call - thats mysterious also btw.)</p></div> <p>Try:<br /> ejabberd_sm:get_session_pid("badlop", "localhost", "home").</p> Sat, 14 May 2011 10:17:15 +0000 mfoss comment 57333 at https://www.ejabberd.im strange loop https://www.ejabberd.im/node/4670#comment-57325 <p>I think that {ejabberd_c2s,bounce_messages,0} is a root of my problems. It is executed endlessly.<br /> It is because session is not removed from session table.. but why?;)</p> Thu, 12 May 2011 21:53:38 +0000 dzejefkej comment 57325 at https://www.ejabberd.im strange loop https://www.ejabberd.im/node/4670#comment-57310 <div class="quote-msg"> <div class="quote-author"><em>badlop</em> wrote:</div> <div class="quote-msg"> <div class="quote-author"><em>dzejefkej</em> wrote:</div> <p>how to discover Pid of a reciever of the c2s?) </p></div> <p>Maybe I didn't understand your question. ejabberd creates two processes per XMPP session (one for sm and one for c2s). In this example, the processes are 0.483.0 and 0.484.0:</p> </div> <p>Your right, but that info comes from INFO debbug level. I cannot turn it on on production.<br /> Is there any other way to discover Pid of the reciever belonging to c2s?<br /> - get_status does not work (c2s hangs in call - thats mysterious also btw.)<br /> - process_info on pid which is lower by 1 integer 483 and 484 returns undefined, however under heavy traffic there is no guarantee that those pids diff by 1.</p> <div class="quote-msg"> <div class="quote-author"><em>badlop</em> wrote:</div> <div class="quote-msg"> <div class="quote-author"><em>dzejefkej</em> wrote:</div> <p> The data flow is:<br /> ejabberd_router:route<br /> ejabberd_router:do_route<br /> ejabberd_sm:route</p> <p> and again _router..</p> <p> How to determine if those are new messages or some loop in the code? </p></div> <p>Check if the function is called with the same arguments several times:</p> <pre> --- a/src/ejabberd_router.erl +++ b/src/ejabberd_router.erl @@ -65,6 +65,7 @@ start_link() -&gt; route(From, To, Packet) -&gt; + ?INFO_MSG("ejabberd_router:route:~nFrom: ~p~nTo: ~p~nPacket: ~p", [ From, To, Packet]), case catch do_route(From, To, Packet) of {'EXIT', Reason} -&gt; ?ERROR_MSG("~p~nwhen processing: ~p", </pre></div> <p>I attached DBG on production to malfunctioning process. It is all the time the same presence unavailable stanza<br /> with From = To. I traced calls and messages on _router, _local, _sm, _c2s, _roster. Here is the result, sometimes instead of presence I see unsubscribe. Couple sec of dbg generates ~50mb of trace :/</p> <div class="codeblock"><code>(&lt;6567.1465.114&gt;) &lt;&lt; {route,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {jid,&quot;test43&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;example.test.com&quot;,&quot;mobile&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;test43&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;example.test.com&quot;,&quot;mobile&quot;},<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {jid,&quot;test43&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;example.test.com&quot;,&quot;mobile&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;test43&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;example.test.com&quot;,&quot;mobile&quot;},<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {xmlelement,&quot;presence&quot;,[{&quot;type&quot;,&quot;unavailable&quot;}],[]}}<br />(&lt;6567.1465.114&gt;) call ejabberd_router:route({jid,&quot;test43&quot;,&quot;example.test.com&quot;,&quot;mobile&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp; &quot;test43&quot;,&quot;example.test.com&quot;,&quot;mobile&quot;},{jid,&quot;test43&quot;,&quot;example.test.com&quot;,&quot;mobile&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp; &quot;test43&quot;,&quot;example.test.com&quot;,&quot;mobile&quot;},{xmlelement,&quot;presence&quot;,[{&quot;type&quot;,&quot;unavailable&quot;}],[]})<br />(&lt;6567.1465.114&gt;) call ejabberd_local:route({jid,&quot;test43&quot;,&quot;example.test.com&quot;,&quot;mobile&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp; &quot;test43&quot;,&quot;example.test.com&quot;,&quot;mobile&quot;},{jid,&quot;test43&quot;,&quot;example.test.com&quot;,&quot;mobile&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp; &quot;test43&quot;,&quot;example.test.com&quot;,&quot;mobile&quot;},{xmlelement,&quot;presence&quot;,[{&quot;type&quot;,&quot;unavailable&quot;}],[]})<br />(&lt;6567.1465.114&gt;) call ejabberd_sm:route({jid,&quot;test43&quot;,&quot;example.test.com&quot;,&quot;mobile&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp; &quot;test43&quot;,&quot;example.test.com&quot;,&quot;mobile&quot;},{jid,&quot;test43&quot;,&quot;example.test.com&quot;,&quot;mobile&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp; &quot;test43&quot;,&quot;example.test.com&quot;,&quot;mobile&quot;},{xmlelement,&quot;presence&quot;,[{&quot;type&quot;,&quot;unavailable&quot;}],[]})<br />(&lt;6567.1465.114&gt;) &lt;6567.1465.114&gt; ! {route,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {jid,&quot;test43&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;example.test.com&quot;,&quot;mobile&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;test43&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;example.test.com&quot;,&quot;mobile&quot;},<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {jid,&quot;test43&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;example.test.com&quot;,&quot;mobile&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;test43&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;example.test.com&quot;,&quot;mobile&quot;},<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {xmlelement,&quot;presence&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [{&quot;type&quot;,&quot;unavailable&quot;}],<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; []}}<br />(&lt;6567.1465.114&gt;) &lt;&lt; {route,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {jid,&quot;test43&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;example.test.com&quot;,&quot;mobile&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;test43&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;example.test.com&quot;,&quot;mobile&quot;},<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {jid,&quot;test43&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;example.test.com&quot;,&quot;mobile&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;test43&quot;,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &quot;example.test.com&quot;,&quot;mobile&quot;},<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {xmlelement,&quot;presence&quot;,[{&quot;type&quot;,&quot;unavailable&quot;}],[]}}</code></div> <p>So the SM asks for PID of To - it gets it, because it is in session table. It routs the presence to the PID and gets it back from router. Is there any mechanism to prevent such a behavior when From = To ? Also.. I do not see c2s here in dbg, so even if it hangs in 'call' why it does not time out?</p> <div class="quote-msg"> <div class="quote-author"><em>badlop</em> wrote:</div> <div class="quote-msg"> <div class="quote-author"><em>dzejefkej</em> wrote:</div> <p>=ERROR REPORT==== 2011-05-05 23:37:17 ===<br /> E(&lt;0.7292.2960&gt;:ejabberd_c2s:1154) : Unexpected info: {'DOWN',<br /> #Ref&lt;0.0.1439.188998&gt;,<br /> process,<br /> &lt;10545.13676.2940&gt;,<br /> killed} </p></div> <p>A process number 0.XXX.0 means that the process lives in the local node where you see that number. A number like 123.XXX.0 means that the process lives in a remote node of the cluster. </p> <p>Maybe ejabberd got confused with the routing scheme, and is sending a XMPP packet from one node to the other, and back to the first, and to the second... until some process dies or timeouts?</p></div> <p>Yeah, I got the pid structure, for now it looks like completely unrelated to 'loop'. Still I do not like correlation in time of those 2 issues. How to investigate routing scheme? How would you look for a reason to kill this local process?<br /> I thought that socket may be closed, this kills receiver, which kills c2s and eventually affects the other side of conversation.</p> <p>appreciate any hints on that..</p> Wed, 11 May 2011 19:34:56 +0000 dzejefkej comment 57310 at https://www.ejabberd.im dzejefkej wrote: how to https://www.ejabberd.im/node/4670#comment-57302 <div class="quote-msg"> <div class="quote-author"><em>dzejefkej</em> wrote:</div> <p>how to discover Pid of a reciever of the c2s?) </p></div> <p>Maybe I didn't understand your question. ejabberd creates two processes per XMPP session (one for sm and one for c2s). In this example, the processes are 0.483.0 and 0.484.0:</p> <pre> =INFO REPORT==== 11-May-2011::00:44:10 === I(&lt;0.476.0&gt;:ejabberd_listener:281) : (#Port&lt;0.5268&gt;) Accepted connection {{127,0,0,1},59334} -&gt; {{127,0,0,1},5222} =INFO REPORT==== 11-May-2011::00:44:10 === I(&lt;0.484.0&gt;:ejabberd_c2s:517) : ({socket_state,gen_tcp,#Port&lt;0.5268&gt;,&lt;0.483.0&gt;}) Accepted legacy authentication for badlop@localhost/work by ejabberd_auth_internal </pre><div class="quote-msg"> <div class="quote-author"><em>dzejefkej</em> wrote:</div> <p> The data flow is:<br /> ejabberd_router:route<br /> ejabberd_router:do_route<br /> ejabberd_sm:route</p> <p> and again _router..</p> <p> How to determine if those are new messages or some loop in the code? </p></div> <p>Check if the function is called with the same arguments several times:</p> <pre> --- a/src/ejabberd_router.erl +++ b/src/ejabberd_router.erl @@ -65,6 +65,7 @@ start_link() -&gt; route(From, To, Packet) -&gt; + ?INFO_MSG("ejabberd_router:route:~nFrom: ~p~nTo: ~p~nPacket: ~p", [ From, To, Packet]), case catch do_route(From, To, Packet) of {'EXIT', Reason} -&gt; ?ERROR_MSG("~p~nwhen processing: ~p", </pre><div class="quote-msg"> <div class="quote-author"><em>dzejefkej</em> wrote:</div> <p>=ERROR REPORT==== 2011-05-05 23:37:17 ===<br /> E(&lt;0.7292.2960&gt;:ejabberd_c2s:1154) : Unexpected info: {'DOWN',<br /> #Ref&lt;0.0.1439.188998&gt;,<br /> process,<br /> &lt;10545.13676.2940&gt;,<br /> killed} </p></div> <p>A process number 0.XXX.0 means that the process lives in the local node where you see that number. A number like 123.XXX.0 means that the process lives in a remote node of the cluster. </p> <p>Maybe ejabberd got confused with the routing scheme, and is sending a XMPP packet from one node to the other, and back to the first, and to the second... until some process dies or timeouts?</p> Tue, 10 May 2011 22:53:08 +0000 mfoss comment 57302 at https://www.ejabberd.im