mod_muc: recognise more URI schemes in logged HTML (htmlize())?

Hi,

At my site we need to recognise https:// URLs. I've attached a patch to ejabberd 1.1.2 to do this.

But more generally, should more schemes be recognised, or should the regexp be generalised to match any lexically valid scheme?

--Toby

diff -Naur ejabberd-1.1.2-dist/src/mod_muc/mod_muc_log.erl ejabberd-1.1.2/src/mod_muc/mod_muc_log.erl
--- ejabberd-1.1.2-dist/src/mod_muc/mod_muc_log.erl     2006-09-27 16:49:57.000000000 -0400
+++ ejabberd-1.1.2/src/mod_muc/mod_muc_log.erl  2007-06-04 12:25:05.292940000 -0400
@@ -657,7 +657,7 @@
     S2 = element(2, regexp:gsub(S1, "\\&", "\\&amp;")),
     S3 = element(2, regexp:gsub(S2, "<", "\\&lt;")),
     S4 = element(2, regexp:gsub(S3, ">", "\\&gt;")),
-    S5 = element(2, regexp:gsub(S4, "(http|ftp)://.[^ ]*", "<a href=\"&\">&</a>")),
+    S5 = element(2, regexp:gsub(S4, "(https?|ftp)://.[^ ]*", "<a href=\"&\">&</a>")),
     S5.
 
 get_room_info(RoomJID, Opts) ->

There are a lot of schemes

There are a lot of schemes registered at IANA, and there may be more in the future. So I think it's better to simply check for a lexically valid scheme. Do you have enough knowledge and interest to try writing such a patch?

I found another related problem; maybe you'd like to try fixing it too. When a message has a URI inside brackets, like:

Check the JSF site (http://www.jabber.org)

the generated HTML does an ugly thing with the ending bracket:

Check the JSF site (<a href="http://www.jabber.org)">http://www.jabber.org)</a>

I guess it should correctly handle characters like "", (), [], {}...

If you'd rather not, tell me and I'll try in the next few days.

something like this?

I think this works okay. I included the single quote as well as the characters you suggested.

--- ejabberd-1.1.3-dist/src/mod_muc/mod_muc_log.erl     2006-09-22 03:58:58.000000000 -0400
+++ ejabberd-1.1.3/src/mod_muc/mod_muc_log.erl  2007-06-30 16:21:17.066776505 -0400
@@ -657,7 +657,7 @@
     S2 = element(2, regexp:gsub(S1, "\\&", "\\&amp;")),
     S3 = element(2, regexp:gsub(S2, "<", "\\&lt;")),
     S4 = element(2, regexp:gsub(S3, ">", "\\&gt;")),
-    S5 = element(2, regexp:gsub(S4, "(http|ftp)://.[^ ]*", "<a href=\"&\">&</a>")),
+    S5 = element(2, regexp:gsub(S4, "[-+.a-zA-Z0-9]+://[^] )\'\"}]+", "<a href=\"&\">&</a>")),
     S5.
 
 get_room_info(RoomJID, Opts) ->
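
For anyone trying this on a newer Erlang/OTP, where the `regexp` module is no longer available, here is a minimal sketch of the same pattern using `re:replace/4`. The module and function names (`linkify_demo`, `linkify/1`) are made up for illustration and are not part of ejabberd; only the pattern and the replacement string come from the patch above.

```erlang
-module(linkify_demo).
-export([linkify/1]).

%% Hypothetical helper, not ejabberd code: wraps every "scheme://..." run
%% in an HTML anchor, using the pattern from the patch.
linkify(S) ->
    %% Scheme characters, then "://", then everything up to the first
    %% closing bracket, space, quote or brace.
    Pattern = "[-+.a-zA-Z0-9]+://[^] )'\"}]+",
    %% In the replacement, "&" stands for the whole matched text.
    re:replace(S, Pattern, "<a href=\"&\">&</a>", [global, {return, list}]).
```

Calling `linkify_demo:linkify("Check the JSF site (http://www.jabber.org)")` should return `Check the JSF site (<a href="http://www.jabber.org">http://www.jabber.org</a>)`, with the closing bracket left outside the link. Note that RFC 3986 requires a scheme to start with a letter, so this character class is slightly more permissive than a strictly valid scheme.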

afterthought

'>' should probably end a URL also.
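
One wrinkle: by the time the URL substitution runs, htmlize() has already turned '>' into `&gt;`, so simply adding '>' to the excluded characters would have no effect. A possible approach, sketched here with the newer `re` module (it relies on a PCRE lookahead, which as far as I know the old `regexp` module does not support), is to stop the match just before an escaped `&gt;`. The module and function names are hypothetical and not part of ejabberd.

```erlang
-module(linkify_gt_demo).
-export([linkify/1]).

%% Hypothetical helper, not ejabberd code.
%% Expects text that has already been HTML-escaped, as in htmlize(),
%% so a literal '>' arrives here as "&gt;".
linkify(Escaped) ->
    %% (?!&gt;) refuses to consume the start of an escaped '>', so the
    %% URL ends just before it; "&amp;" inside a URL is still accepted.
    Pattern = "[-+.a-zA-Z0-9]+://(?:(?!&gt;)[^] )'\"}])+",
    re:replace(Escaped, Pattern, "<a href=\"&\">&</a>",
               [global, {return, list}]).
```

With this, `linkify_gt_demo:linkify("see &lt;http://www.jabber.org&gt; now")` should leave the trailing `&gt;` outside the anchor, while a query string such as `http://a.b/?x=1&amp;y=2` is still linked in full.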

I updated your patch to

I updated your patch to apply against ejabberd SVN and submitted it here: Recognise more URI schemes in logged HTML.

I guess it will be applied soon.

The patch was committed to

The patch was committed to ejabberd SVN, so the next version of ejabberd will include this enhancement.
