mod_muc: recognise more URI schemes in logged HTML (htmlize())?

Submitted by qu1j0t3 on Mon, 2007-06-04 17:05

Hi,

At my site we need to recognise https:// URLs. I've attached a patch to ejabberd 1.1.2 to do this.

But more generally, should more schemes be recognised, or should the regexp be generalised to match any lexically valid scheme?

--Toby

diff -Naur ejabberd-1.1.2-dist/src/mod_muc/mod_muc_log.erl ejabberd-1.1.2/src/mod_muc/mod_muc_log.erl
--- ejabberd-1.1.2-dist/src/mod_muc/mod_muc_log.erl     2006-09-27 16:49:57.000000000 -0400
+++ ejabberd-1.1.2/src/mod_muc/mod_muc_log.erl  2007-06-04 12:25:05.292940000 -0400
@@ -657,7 +657,7 @@
     S2 = element(2, regexp:gsub(S1, "\\&", "\\&amp;")),
     S3 = element(2, regexp:gsub(S2, "<", "\\&lt;")),
     S4 = element(2, regexp:gsub(S3, ">", "\\&gt;")),
-    S5 = element(2, regexp:gsub(S4, "(http|ftp)://.[^ ]*", "<a href=\"&\">&</a>")),
+    S5 = element(2, regexp:gsub(S4, "(https?|ftp)://.[^ ]*", "<a href=\"&\">&</a>")),
     S5.
 
 get_room_info(RoomJID, Opts) ->
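
A quick way to see what the one-character change buys is to run both patterns over a sample line. This is only a sketch in Python, not the Erlang code from mod_muc_log.erl: the helper name linkify, the LINK template and the example.org URL are made up for illustration, and Python spells the whole-match reference as \g<0> where the regexp:gsub replacement above uses &.

```python
import re

# Python transcriptions of the two patterns in the diff above.
OLD = r"(http|ftp)://.[^ ]*"
PATCHED = r"(https?|ftp)://.[^ ]*"

# Hypothetical replacement template; \g<0> is the whole match.
LINK = r'<a href="\g<0>">\g<0></a>'


def linkify(pattern: str, text: str) -> str:
    """Wrap every match of pattern in an HTML anchor."""
    return re.sub(pattern, LINK, text)


msg = "see https://example.org/x now"
print(linkify(OLD, msg))      # https:// is not matched, so the text is unchanged
print(linkify(PATCHED, msg))  # the https URL is wrapped in a link
```

Plain http:// and ftp:// URLs match both patterns, so only https:// handling should change.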

There are a lot of schemes

Submitted by mfoss on Tue, 2007-06-05 14:29.

There are a lot of schemes registered at IANA, and there may be more in the future, so I think it's better to simply check for a lexically valid scheme. Do you have enough knowledge and interest to try writing such a patch?
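
For reference, "lexically valid" can be read as the scheme grammar in RFC 3986, section 3.1: a letter followed by any number of letters, digits, "+", "-" or ".". A minimal sketch of that check, in Python rather than Erlang and with a made-up helper name (is_valid_scheme):

```python
import re

# RFC 3986, section 3.1:
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name: str) -> bool:
    """Return True if name is a lexically valid URI scheme."""
    return SCHEME.fullmatch(name) is not None


for candidate in ["https", "xmpp", "svn+ssh", "1http", ""]:
    print(repr(candidate), is_valid_scheme(candidate))
```

A check like this only says the scheme is well formed, not that it is registered at IANA.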

I found another related problem that you may want to try fixing too: when a message has a URI inside brackets, like:

Check the JSF site (http://www.jabber.org)

the generated HTML does an ugly thing with the ending bracket:

Check the JSF site (<a href="http://www.jabber.org)">http://www.jabber.org)</a>

I guess it should correctly handle characters like "", (), [], {}...

If you'd rather not, tell me and I'll try in the next few days.

something like this?

Submitted by qu1j0t3 on Sat, 2007-06-30 20:26.

I think this works okay. I included the single quote as well as your suggestions.

--- ejabberd-1.1.3-dist/src/mod_muc/mod_muc_log.erl     2006-09-22 03:58:58.000000000 -0400
+++ ejabberd-1.1.3/src/mod_muc/mod_muc_log.erl  2007-06-30 16:21:17.066776505 -0400
@@ -657,7 +657,7 @@
     S2 = element(2, regexp:gsub(S1, "\\&", "\\&amp;")),
     S3 = element(2, regexp:gsub(S2, "<", "\\&lt;")),
     S4 = element(2, regexp:gsub(S3, ">", "\\&gt;")),
-    S5 = element(2, regexp:gsub(S4, "(http|ftp)://.[^ ]*", "<a href=\"&\">&</a>")),
+    S5 = element(2, regexp:gsub(S4, "[-+.a-zA-Z0-9]+://[^] )\'\"}]+", "<a href=\"&\">&</a>")),
     S5.
 
 get_room_info(RoomJID, Opts) ->
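
To check the patched pattern against the bracket example from earlier in the thread, here is a small Python sketch (not the Erlang code itself). The helper linkify, the LINK template and the xmpp:// address are made up for illustration. The "]" is backslash-escaped in the Python version for clarity, which excludes the same characters, and \g<0> stands in for the & of the regexp:gsub replacement.

```python
import re

# Python transcriptions of the old and patched patterns from the diff above.
OLD = r"(http|ftp)://.[^ ]*"
NEW = r"""[-+.a-zA-Z0-9]+://[^\] )'"}]+"""

# Hypothetical replacement template; \g<0> is the whole match.
LINK = r'<a href="\g<0>">\g<0></a>'


def linkify(pattern: str, text: str) -> str:
    """Wrap every match of pattern in an HTML anchor."""
    return re.sub(pattern, LINK, text)


msg = "Check the JSF site (http://www.jabber.org)"
print(linkify(OLD, msg))  # the closing bracket ends up inside the link
print(linkify(NEW, msg))  # the closing bracket stays outside the link
print(linkify(NEW, "join xmpp://room@conference.example.org now"))
```

The scheme part of the patched pattern is slightly looser than the RFC 3986 grammar, since it also accepts a scheme that starts with a digit, "+", "-" or ".".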

afterthought

Submitted by qu1j0t3 on Sun, 2007-07-01 00:08.

'>' should probably end a URL also.

I updated your patch to

Submitted by mfoss on Tue, 2007-07-03 17:44.

I updated your patch to apply against ejabberd SVN and submitted it here: Recognise more URI schemes in logged HTML.

I guess it will be applied soon.

The patch was committed to

Submitted by mfoss on Tue, 2007-07-17 08:28.

The patch was committed to ejabberd SVN, so the next version of ejabberd will have this enhancement.