ejabberd - Comments for "Character sets with mod_irc" https://www.ejabberd.im/node/4270 en Patch applied. https://www.ejabberd.im/node/4270#comment-56791 <div class="quote-msg"> <div class="quote-author"><em>olliex</em> wrote:</div> <p>The best patch is the one in the file (W). I applied only this one to 2.1.5, and with it the unknown characters (above the first 7 bits) were dropped when they were not utf-8. The utf-8 content showed correctly. I never tested character sets other than utf-8, however.</p> <p>Having this patch and the default changed to utf-8 would be good.</p></div> <p>OK, I've applied patch W to ejabberd 2.1.x.</p> <p>Changing the default value of an option now, in ejabberd 2.1.6, could confuse server admins. Interested admins can use the option to set utf-8 as the default, now that the option really works.</p> Tue, 02 Nov 2010 21:47:39 +0000 mfoss comment 56791 at https://www.ejabberd.im The best patch is the patch https://www.ejabberd.im/node/4270#comment-56780 <p>The best patch is the one in the file (W). I applied only this one to 2.1.5, and with it the unknown characters (above the first 7 bits) were dropped when they were not utf-8. The utf-8 content showed correctly. I never tested character sets other than utf-8, however.</p> <p>Having this patch and the default changed to utf-8 would be good.</p> Fri, 29 Oct 2010 09:43:49 +0000 olliex comment 56780 at https://www.ejabberd.im Which patches? https://www.ejabberd.im/node/4270#comment-56771 <p>Sorry, more urgent and relevant tasks (for me, I mean) have come up and I can't invest more time investigating this problem right now. Of course, if you get your hands on Erlang and later find a better solution, please open a ticket and propose your patch.</p> <div class="quote-msg"> <div class="quote-author"><em>olliex</em> wrote:</div> <p>Do you think the character set conversion (or guessing) should/can be done in D?</p></div> <p>I don't know whether conversion should or can be done there.
And I suspect guessing isn't possible.</p> <div class="quote-msg"> <div class="quote-author"><em>olliex</em> wrote:</div> <p>If not, do you know whether your mod_irc utf-8 patch will get into an official ejabberd release?</p></div> <p>Yes, let's be pragmatic and commit to ejabberd mainline whichever you consider best for the general case (and least disruptive to the current mod_irc behaviour). This thread contains four patches. Exactly which of them do you consider suitable for inclusion in ejabberd?</p> <p>Q) <a href="http://www.ejabberd.im/node/4270#comment-56609" title="http://www.ejabberd.im/node/4270#comment-56609">http://www.ejabberd.im/node/4270#comment-56609</a><br /> W) <noindex><a href="http://tkabber.jabber.ru/files/badlop/4270-215-ircencoding.patch" title="http://tkabber.jabber.ru/files/badlop/4270-215-ircencoding.patch" rel="nofollow" >http://tkabber.jabber.ru/files/badlop/4270-215-ircencoding.patch</a></noindex><br /> E) first of <a href="http://www.ejabberd.im/node/4270#comment-56725" title="http://www.ejabberd.im/node/4270#comment-56725">http://www.ejabberd.im/node/4270#comment-56725</a><br /> R) second of <a href="http://www.ejabberd.im/node/4270#comment-56725" title="http://www.ejabberd.im/node/4270#comment-56725">http://www.ejabberd.im/node/4270#comment-56725</a></p> Wed, 27 Oct 2010 22:25:00 +0000 mfoss comment 56771 at https://www.ejabberd.im I add some more information https://www.ejabberd.im/node/4270#comment-56752 <p>I'm adding some more information to make this case clear:</p> <p>Without XMPP, with the user's terminal set to utf-8 (IRC client and server only):<br /> Client sends: iso8859, utf-8 or another character set. The client should convert from utf-8 to iso8859 if the user has explicitly configured it to do so. The user should be able to force a setting such as "this channel/user has character set X".<br /> Client receives: iso8859, utf-8 or another character set. The client converts anything other than utf-8 to utf-8 and leaves utf-8 content untouched.
The user can specify that "this channel/user has character set X". If the user did not specify it, some guessing is used.</p> <p>With XMPP, with the user's terminal set to utf-8:<br /> Client sends: utf-8 only. This is OK, as it forces others to switch to utf-8 if they see wrong characters.<br /> Client receives: utf-8 only. This is not OK. XMPP should allow "write utf-8, read any" behavior. The XMPP user should not be forced to switch to an IRC client. The IRC client user should be forced to utf-8. Because of this limitation, all user- and channel-specific character set conversion must be done by the XMPP IRC transport.</p> Tue, 19 Oct 2010 08:28:36 +0000 olliex comment 56752 at https://www.ejabberd.im I reorder the letters a https://www.ejabberd.im/node/4270#comment-56749 <p>I've reordered the letters a little to make it clearer:</p> <p>A) IRC client with iso8859 encoding<br /> B) XMPP client with utf-8 encoding<br /> C) XMPP server with utf-8 encoding<br /> D) IRC-XMPP transport<br /> E) IRC server with iso8859 encoding</p> <p>So D speaks iso8859 OR utf-8 with E (IRC server-&gt;client), and utf-8 with C.<br /> This leaves two solutions:<br /> 1. If D checks the character set (to convert?), it has to know that it is iso8859.<br /> 2. If XMPP had the support, characters might be passed as-is from D to A.</p> <p>Now if the user uses a utf-8 IRC client (A), as is current practice, the text has to be converted from character set X to utf-8, or left unconverted when it's already utf-8. The client must at least check whether the input is utf-8.</p> <p>When it's not utf-8, the client uses some guessing. It might skip guessing if the user has defined that a given channel (or "room") uses iso8859, or that a given user uses a given character set.</p> <p>The utf-8-only limitation (?) in XMPP means that only option 1, with guessing, can be used. The guessing would be a good feature if it works.
I think the guessing would also need some extra parameters, such as language, to work better. That, again, needs some channel-specific information (a user might speak different languages on different channels).</p> <p>I can live with utf-8-only support in D. Anyway, utf-8 is recommended for IRC. I still think we can't force users to switch to utf-8 only, OR force XMPP clients with IRC extensions to support anything other than utf-8, because there is no need to do it.</p> <p>Do you think the character set conversion (or guessing) should/can be done in D? If not, do you know whether your mod_irc utf-8 patch will get into an official ejabberd release?</p> Mon, 18 Oct 2010 13:20:09 +0000 olliex comment 56749 at https://www.ejabberd.im I think the problem are the bugs, not the protocol https://www.ejabberd.im/node/4270#comment-56740 <div class="quote-msg"> <div class="quote-author"><em>olliex</em> wrote:</div> <p>1. RFC 3920: XMPP "implementations MUST NOT attempt to use any other encoding" than UTF-8<br /> 2. Olliex: So this statement seems to mean that xmpp can't be used for IRC at all. </p></div> <p>I don't see 2. as a consequence of 1.</p> <p>Let's imagine the worst scenario regarding encodings:<br /> A) IRC client with iso8859 encoding<br /> B) IRC server with iso8859 encoding<br /> C) IRC-XMPP transport<br /> D) XMPP server with utf-8 encoding<br /> E) XMPP client with utf-8 encoding</p> <p>That scenario satisfies 1., and will work perfectly as long as C speaks iso8859 with B, speaks utf-8 with D, and converts the content between encodings correctly.</p> <p>In your tests, C is mod_irc and D is ejabberd.
You get encoding problems, and I think that means there are bugs in one or several programs (maybe in mod_irc), but I can't yet conclude that it couldn't work once the bugs are fixed.</p> Fri, 15 Oct 2010 19:08:59 +0000 mfoss comment 56740 at https://www.ejabberd.im Both of these iconv.erl https://www.ejabberd.im/node/4270#comment-56732 <p>Both of these iconv.erl patches caused a question-mark glyph to be displayed. After that, the IRC client (irssi) showed a growing lag number, and I could not write anything once the first characters were shown. The characters were auto-replies sent when I joined the channel.</p> <p>I was using irssi with the irssi-xmpp plugin. A bug in this plugin might have caused the hang. But I checked its page, and it claims to support utf-8 only. I also checked the mcabber page. It also claims to support utf-8 only, as the spec mandates.</p> <p>Then I checked the XMPP protocol (RFC 3920). It states that "Implementations MUST NOT attempt to use any other encoding" than UTF-8. So this statement seems to mean that xmpp can't be used for IRC at all. This is quite sad, really. The xmpp IRC solutions can't be used for flexible, backwards-compatible "real world" use cases. XMPP people can create their own IRC channels, but they can't contact other IRC users and talk with them. IRC users will continue to use IRC servers. What's the point?</p> <p>In any case, I think the first patch gives ejabberd the correct behavior: using utf-8. I think it should even be made static and unchangeable, as the spec says. I think the "default encoding" option is useless and can be removed in favor of utf-8-only support.</p> Tue, 12 Oct 2010 08:18:27 +0000 olliex comment 56732 at https://www.ejabberd.im Patch to disable conversion https://www.ejabberd.im/node/4270#comment-56725 <div class="quote-msg"> <div class="quote-author"><em>olliex</em> wrote:</div> <p>This seems like some code somewhere (in ejabberd?)
is checking "utf-8 validity" and stripping out those "incorrect" characters.</p> <p>Normally an IRC client (at least irssi) does the character conversion, because it is possible to convert from ISO-8859-1 to utf-8. So I think some kind of "pass through", "no conversion" or "no check" option is needed in ejabberd so that it can hand the character set check responsibility over to the IRC client. </p></div> <p>This patch avoids a needless conversion (for example from utf-8 to utf-8):</p> <pre>
--- a/src/mod_irc/iconv.erl
+++ b/src/mod_irc/iconv.erl
@@ -84,6 +84,8 @@ terminate(_Reason, Port) -&gt;
+convert(From, To, String) when From == To -&gt;
+    String;
 convert(From, To, String) -&gt;
     [{port, Port} | _] = ets:lookup(iconv_table, port),
     Bin = term_to_binary({From, To, String}),
</pre><p> This patch disables conversion entirely: the new first clause matches every call, so the original string is always returned unchanged:</p> <pre>
--- a/src/mod_irc/iconv.erl
+++ b/src/mod_irc/iconv.erl
@@ -85,6 +85,8 @@ terminate(_Reason, Port) -&gt;
 convert(From, To, String) -&gt;
+    String;
+convert(From, To, String) -&gt;
     [{port, Port} | _] = ets:lookup(iconv_table, port),
     Bin = term_to_binary({From, To, String}),
     BRes = port_control(Port, 1, Bin),
</pre> Mon, 11 Oct 2010 11:14:19 +0000 mfoss comment 56725 at https://www.ejabberd.im I applied the patch and it https://www.ejabberd.im/node/4270#comment-56714 <p>I applied the patch and it went through without error messages. Now this looks much more promising. Text from people writing utf-8 shows correct characters. However, text from people writing ISO-8859-1 now shows no characters at all when the character is above the first 7 bits.</p> <p>This seems like some code somewhere (in ejabberd?) is checking "utf-8 validity" and stripping out those "incorrect" characters.</p> <p>Normally an IRC client (at least irssi) does the character conversion, because it is possible to convert from ISO-8859-1 to utf-8.
So I think some kind of "pass through", "no conversion" or "no check" option is needed in ejabberd so that it can hand the character set check responsibility over to the IRC client.</p> <p>Would it be technically possible to bring this support to ejabberd?</p> Thu, 07 Oct 2010 07:49:20 +0000 olliex comment 56714 at https://www.ejabberd.im Second attempt https://www.ejabberd.im/node/4270#comment-56697 <div class="quote-msg"> <div class="quote-author"><em>olliex</em> wrote:</div> <p>Now I downloaded the 2.1.5 sources, applied the changes from the patch to mod_irc.erl, compiled and installed, and added "utf-8" to ejabberd.cfg. The same problem still exists: text from people writing utf-8 shows wrong characters, but text from people writing 8859-1 shows correct characters.</p></div> <p>It seems mod_irc.erl was still not reading the option. Note that I only test that the code compiles; I don't test the functionality myself.</p> <p>I've rewritten the patch; get the new version here:<br /> <noindex><a href="http://tkabber.jabber.ru/files/badlop/4270-215-ircencoding.patch" title="http://tkabber.jabber.ru/files/badlop/4270-215-ircencoding.patch" rel="nofollow" >http://tkabber.jabber.ru/files/badlop/4270-215-ircencoding.patch</a></noindex><br /> You need to revert the previous patch, or start from the original file.</p> <p>Let's hope this time the patch applies cleanly to your 2.1.5.</p> <p>Every time the option is requested (either read from the config table, or using the default value), a line is written to ejabberd.log: "The default_encoding configured for host ... is ...". This lets you check whether the option is read. If all works well, you can remove that line from your mod_irc.erl.</p> Sun, 03 Oct 2010 22:06:57 +0000 mfoss comment 56697 at https://www.ejabberd.im Changed https://www.ejabberd.im/node/4270#comment-56692 <p>Now I downloaded the 2.1.5 sources, applied the changes from the patch to mod_irc.erl, compiled and installed, and added "utf-8" to ejabberd.cfg.
The same problem still exists: text from people writing utf-8 shows wrong characters, but text from people writing 8859-1 shows correct characters.</p> Sat, 02 Oct 2010 12:19:28 +0000 olliex comment 56692 at https://www.ejabberd.im Change manual https://www.ejabberd.im/node/4270#comment-56637 <div class="quote-msg"> <div class="quote-author"><em>olliex</em> wrote:</div> <p>Thanks for the changes. Which release is it for? I tried to apply it to 2.1.5 and got some errors (only 1 of 3 hunks succeeded, with fuzz).</p></div> <p>It's for ejabberd 2.1.5. Maybe the forum converted spaces to tabs, or vice versa. It's just 9 lines; you can change them manually.</p> Tue, 21 Sep 2010 17:01:11 +0000 mfoss comment 56637 at https://www.ejabberd.im Thanks for the changes. For https://www.ejabberd.im/node/4270#comment-56629 <p>Thanks for the changes. Which release is it for? I tried to apply it to 2.1.5 and got some errors (only 1 of 3 hunks succeeded, with fuzz).</p> Sun, 19 Sep 2010 14:41:22 +0000 olliex comment 56629 at https://www.ejabberd.im olliex wrote: The system I'm https://www.ejabberd.im/node/4270#comment-56609 <div class="quote-msg"> <div class="quote-author"><em>olliex</em> wrote:</div> <p>The system I'm running is Ubuntu Lucid Lynx. I have this line in ejabberd.cfg:<br /> {mod_irc, [{access, all}, {default_encoding, "utf-8"}]},</p> <p>Changing default_encoding seems to have no effect. What should I change next?</p></div> <p>That option is documented in the Guide, but I didn't see it implemented in the code.
It can be implemented with this patch; can you try it and report whether it works now?</p> <pre>
--- a/src/mod_irc/mod_irc.erl
+++ b/src/mod_irc/mod_irc.erl
@@ -330,7 +330,7 @@ do_route1(Host, ServerHost, From, To, Packet) -&gt;
         [] -&gt;
             ?DEBUG("open new connection~n", []),
             {Username, Encoding, Port, Password} = get_connection_params(
-                Host, From, Server),
+                Host, ServerHost, From, Server),
             ConnectionUsername =
                 case Packet of
                     %% If the user tries to join a
@@ -662,7 +662,14 @@ set_form(_Host, _, _, _Lang, _XData) -&gt;
     {error, ?ERR_SERVICE_UNAVAILABLE}.
+%% Host = "irc.example.com"
+%% ServerHost = "example.com"
 get_connection_params(Host, From, IRCServer) -&gt;
+    [_ | HostTail] = string:tokens(Host, "."),
+    ServerHost = string:join(HostTail, "."),
+    get_connection_params(Host, ServerHost, From, IRCServer).
+
+get_connection_params(Host, ServerHost, From, IRCServer) -&gt;
     #jid{user = User, server = _Server,
          luser = LUser, lserver = LServer} = From,
     US = {LUser, LServer},
@@ -682,7 +689,10 @@ get_connection_params(Host, From, IRCServer) -&gt;
             {value, {_, Encoding}} -&gt;
                 {Username, Encoding, ?DEFAULT_IRC_PORT, ""};
             _ -&gt;
-                {Username, ?DEFAULT_IRC_ENCODING, ?DEFAULT_IRC_PORT, ""}
+                Encoding = gen_mod:get_module_opt(
+                             ServerHost, ?MODULE, default_encoding,
+                             ?DEFAULT_IRC_ENCODING),
+                {Username, Encoding, ?DEFAULT_IRC_PORT, ""}
         end,
     {NewUsername, NewEncoding,
</pre><p> Each user can also define which encoding they want when connecting to specific IRC servers. This is possible by "registering" with the IRC transport.</p> Mon, 13 Sep 2010 08:38:42 +0000 mfoss comment 56609 at https://www.ejabberd.im
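<p>The thread discusses a client-side fallback: incoming IRC text that is already valid utf-8 is kept as-is, and anything else is assumed to be ISO-8859-1 and converted rather than having its bytes above the first 7 bits dropped. This can be sketched in Erlang with the standard unicode module. Several parts of this are assumptions. The module name irc_text and the per-channel override map are hypothetical. The sketch only handles ISO-8859-1 as the non-utf-8 charset. It is not the code in mod_irc or iconv.erl.</p>

```erlang
%% Hypothetical helper module, not part of ejabberd's mod_irc or iconv.erl.
-module(irc_text).
-export([to_utf8/1, channel_to_utf8/3]).

%% Guessing: bytes that already form valid UTF-8 pass through unchanged;
%% anything else is assumed to be ISO-8859-1 and converted to UTF-8, so
%% characters above the first 7 bits are kept instead of dropped.
-spec to_utf8(binary()) -> binary().
to_utf8(Bin) when is_binary(Bin) ->
    case unicode:characters_to_binary(Bin, utf8, utf8) of
        Utf8 when is_binary(Utf8) ->
            Utf8;
        _InvalidOrIncomplete ->
            %% {error, ...} or {incomplete, ...}: reinterpret each byte as
            %% one ISO-8859-1 character, which always succeeds.
            unicode:characters_to_binary(Bin, latin1, utf8)
    end.

%% Per-channel override ("this channel has character set X"): channels
%% listed in Overrides (an assumed representation) are always decoded as
%% ISO-8859-1; all other channels go through the guess above.
-spec channel_to_utf8(binary(), binary(), #{binary() => latin1}) -> binary().
channel_to_utf8(Channel, Bin, Overrides) ->
    case maps:find(Channel, Overrides) of
        {ok, latin1} -> unicode:characters_to_binary(Bin, latin1, utf8);
        error -> to_utf8(Bin)
    end.
```

<p>The guess is inherently ambiguous. Some ISO-8859-1 byte sequences are also valid utf-8. For example, "Ã¤" (bytes C3 A4) would be passed through as "ä". An explicit per-channel setting, as olliex suggests, is therefore still useful alongside the guess.</p>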