We are seeing some strangeness with BOSH connections and parallel requests, and I think it may have to do with the BOSH 'requests' and 'hold' parameters.
The setup is a fairly stock ejabberd server with a home-grown gloox-based client connecting to it. The client sometimes gets stuck in a state where it is unable to send messages, so they are queued until a connection is available. Whenever there are 2 parallel HTTP requests open, it seems that the client has to wait for a timeout before it can send another message.
During BOSH session creation the client specifies a 'hold' value of 2 (meaning the CM should hold at most 2 parallel HTTP requests open). From line 152 of ejabberd_http_bind.erl (and my poor understanding of Erlang), it looks to me like the Hold variable would be set to 1 (MAX_REQUESTS - 1), since MAX_REQUESTS is hard-coded to 2:
    Hold = case
               string:to_integer(
                 xml:get_attr_s("hold", Attrs))
           of
               {error, _} ->
                   (?MAX_REQUESTS - 1);
               {CHold, _} ->
                   if
                       (CHold > (?MAX_REQUESTS - 1)) ->
                           (?MAX_REQUESTS - 1);
                       true ->
                           CHold
                   end
           end,
Then in the server's response to the BOSH session setup, it looks like 'requests' gets set to 2 (Hold = 1 from above, plus 1). Here's the code from line 874 of ejabberd_http_bind.erl. We've also confirmed that the 'requests' value returned in the BOSH session setup response is actually 2.
    {200, ?HEADER,
     xml:element_to_string(
       {xmlelement, "body",
        [{"xmlns",
          ?NS_HTTP_BIND},
         {"sid", Sid},
         {"wait", integer_to_list(Wait)},
         {"requests", integer_to_list(Hold+1)},
         <other stuff chopped out>
        ] ++ BOSH_attribs, OutEls})}
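To make the arithmetic in the two excerpts concrete, here is a minimal standalone sketch of how I read them. The module and function names are my own, not ejabberd's; MAX_REQUESTS = 2 is taken from the description above; and passing the attribute value in directly as a string (with "" standing in for a missing attribute) is an assumption made to keep the sketch self-contained.

```erlang
-module(bosh_hold_sketch).
-export([negotiate/1]).

%% Assumed from the post: max requests is hard-coded to 2.
-define(MAX_REQUESTS, 2).

%% HoldAttr is the raw value of the client's 'hold' attribute as a string.
%% Returns {Hold, Requests}: the clamped hold value and the 'requests'
%% value (Hold + 1) that would be advertised in the session response.
negotiate(HoldAttr) ->
    Hold = case string:to_integer(HoldAttr) of
               {error, _} ->
                   ?MAX_REQUESTS - 1;
               {CHold, _} when CHold > ?MAX_REQUESTS - 1 ->
                   ?MAX_REQUESTS - 1;
               {CHold, _} ->
                   CHold
           end,
    {Hold, Hold + 1}.
```

With this reading, negotiate("2") gives {1, 2}, which matches the 'requests' value of 2 that we see in the session setup response even though the client asked for hold=2.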
I would think the server should return the 'hold' attribute here, since the server has a different value than the client sent. But as long as the server respects the hold=1 value, I guess it doesn't matter.
Here's where we're seeing the issue. Once the client sends 2 parallel requests, they both block (assuming the server has nothing to send to the client). It's almost as if the server isn't actually respecting the Hold variable and is holding more than 1 request open. From my reading of the logic, I would have expected the server to respond instantly to the second HTTP request, so that only 1 is held open at a time. We're going to try modifying the client to send hold=1 in the original request, but I'm doubtful that will change the behavior.
Any ideas whether this is a bug, or am I just misunderstanding what should be happening?
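The behavior I expected can be sketched roughly as follows. This is not ejabberd's code: the module, function, and return tuples are hypothetical, and releasing the oldest held request (rather than answering the new one) is my assumption about how a connection manager would stay within the hold limit. Either way, only Hold requests should remain open afterwards.

```erlang
-module(bosh_hold_expectation).
-export([on_request/3]).

%% Held is the list of request ids currently held open, oldest first.
%% Hold is the negotiated hold value (1 in our case).

%% Room left under the limit: keep the new request open.
on_request(ReqId, Held, Hold) when length(Held) < Hold ->
    {hold, Held ++ [ReqId]};
%% Nothing held and no room (Hold = 0): answer the new request right away.
on_request(ReqId, [], _Hold) ->
    {release, ReqId, []};
%% Limit reached: answer the oldest held request, keep the new one open.
on_request(ReqId, [Oldest | Rest], _Hold) ->
    {release, Oldest, Rest ++ [ReqId]}.
```

With Hold = 1, a second request arriving while one is held should cause one of the two to be answered immediately, which is not what we observe.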
Issue with Parallel BOSH requests
We've done some more debugging, and it appears that the client, which is based on the gloox XMPP SDK, is sending its response on one of the HTTP channels, but for some reason the ejabberd server does not receive it until a 30 second timeout has expired. When we get into this situation, we see 2 parallel requests open and can watch the HTTP request leave the client, but nothing shows up in the ejabberd log for around 30 seconds. It's as if the request is somehow cached in ejabberd's HTTP processing (the socket?) and the state machine doesn't get around to looking at it until the number of open requests drops below 2. Is there a way to confirm that's the case, and to figure out what the 2 parallel requests are when we get into this situation?
After yet more debugging, it appears that the system is only able to handle a single HTTP request per socket. Basically, the pipelining option doesn't actually pipeline; it just processes the requests serially. It's not clear whether this is a feature/issue of the gloox SDK or of ejabberd.
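One possible explanation, offered as a guess rather than a diagnosis: with HTTP/1.1 pipelining, responses on a single connection have to go out in the same order as the requests arrived, so anything queued behind a held long-poll request on the same socket waits for it. The sketch below models only that ordering constraint. The module and function names are made up for illustration, and the 30 second figure is the timeout we observed.

```erlang
-module(pipeline_sketch).
-export([delivery_times/1]).

%% ReadyTimes: the time in seconds at which each response would be ready
%% on its own, listed in request order on one pipelined connection.
%% Because responses must be sent in request order, each one is delivered
%% no earlier than every response ahead of it (a running maximum).
delivery_times(ReadyTimes) ->
    {Delivered, _LastMax} =
        lists:mapfoldl(
          fun(Ready, MaxSoFar) ->
                  Delivery = max(Ready, MaxSoFar),
                  {Delivery, Delivery}
          end,
          0,
          ReadyTimes),
    Delivered.
```

For example, delivery_times([30, 0]) gives [30, 30]: a request that could be answered immediately still waits the full 30 seconds behind the held one. If that is what is happening here, sending the second request on a separate connection should avoid the stall.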