If you put in your body this kind of code : <div>http://www.example.com</div>, the URL is not converted into a hyperlink. Actually, there is more than this : in the following example, lines 3 and 4 fail.

1. URL alone (followed by a stop) : http://www.example.com.
2. URL in a p tag : <p>http://www.example.com</p>
3. URL in a p tag containing HTML attributes : <p style="border:1px solid black">http://www.example.com</p>
4. URL in a div tag : <div>http://www.example.com</div>
5. URL without nothing following : http://www.example.com

(By reading my post preview, I see a bug on line 1, but this is caused by <code> tag. Another issue to raise ? Or maybe changing the filters order ?)

Also, I saw that the regular expression used to match incomplete URLs (like www.example.com) was a little bit different than the one used for full URLs : it doesn't work when following a <br> tag :
6. www adress following a &lt;br&gt; tag : <br />www.example.com
won't convert the URL.

Finally, seeing the regular expressions used, a URL, with no following text at all, should not match the patters. But it does, because, for a reason I ignore, a space is added to the end of the text the filter is applied on.

I want to stress that there must be no confusion : what is on grey background in my post is exactly what is to be pasted in a node's textarea, to encounter the bugs I am explaining (try yourself ;) ). The fact that only one URL is converted here is just side-effect.

So I tried to simplify and fix the regular expressions used to match full URLs, e-mail addresses and "www domains/addresses". Here is a list of things I did :

  • replacing the regular-expression surrounding double quotes for simple quotes (double quotes are useless and may be source of confusion : \n would be converted to a line-break with double quotes)
  • using a positive look-behind to match what must precede the URL, this avoids parasite contain inside $match variable
  • changing the index used to get $match content accordingly
  • use \s , which is an "any space" entity, instead of " \n\r\t" inside of the character class (square brackets)
  • remove the backslash escaping an open bracket, which is useless inside a character class
  • remove the part matching exactly <p>, <li>, or any kind of <br>, and, instead, matching a closing angled bracket (or greater-than sign, > which can be a sign of a previous tag)
  • repeating the three points above at the end of the regular expression
  • merging the part that checks if the URL is followed by a punctuation sign, and the part that "looks ahead" for a space character, or closing bracket, or opening angled bracket (potential sign of a following tag)
  • remove the now useless concatenations to $match[1] and $match[3], now that there are only "look arround" expressions around the part matching the URL
  • I did not modify the part in the regular expressions that is actually matching a URL/e-mail

I would like to complete SimpleTest code for this, but I didn't find any place where to download Drupal 7 core SimpleTest files (anyone can help ?).

Please find my patch attached.

Regards,

David

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Damien Tournoud’s picture

Status: Needs review » Needs work

Well. Those are crazy regular expressions :)

We have a decent test suite for the URL filter. You should start by extending the tests for the cases you consider the URL filter doesn't do its job correctly. Then you can try to fix the errors.

Damien Tournoud’s picture

As I expected, your patch breaks several existing tests for the URL filter:

127 passes, 6 fails, and 0 exceptions

Developing a complex regular expression without a decent set of input data is not really a good idea :)

David Stosik’s picture

You are right. :)
Could you please tell me where can I get the tests files ?
Thanks,
David

Damien Tournoud’s picture

There are directly in the Drupal 7 code base (modules/filter/filter.test), especially FilterUnitTestCase::testUrlFilter().

David Stosik’s picture

Thanks. I should have seen that SimpleTest is in Drupal 7 core, instead of getting SimpleTest 2.x. :)

So, I've got some work. By the way, my patch fails at tests cases which are not very good, IMHO (just try to add a space between <code> or <script> and the URL, and then the test fails :). I mean, I could make these tests fail whith the current implementation, if I changed little bits in them (which would keep them valid).

Will try to post a better patch as soon as possible.
Damien, can you confirm me that what I first raise (URL in a div is not converted) can be considered as a bug ?

Regards,
David

Damien Tournoud’s picture

(3) is most certainly a bug. Not completely sure about (4). We certainly have to exclude some tags from processing (at least <pre>, <code>, <object>, <a>, ...)... but we might be able to switch from a whitelist to a blacklist?

David Stosik’s picture

Title: Form filter does not convert a link inside a div and other fixes » URL filter does not convert a link inside a div and other fixes
FileSize
3.9 KB

I don't have a solution at the moment, but I do know some use cases where URL filter fails, not handled by test cases.
I added them to the filter.test file, and provide a patch.
Here is a list :

  • Converting URLs -- following &nbsp; entity.
  • Converting URLs -- inside a <div> tag.
  • Converting domain names -- domain in a list, <li> tag with attributes.
  • Converting URLs -- really skip code contents (2).
  • Converting URLs -- do not process multiple line scripts.

These tests fail at the moment, and I think that should be considered as a bug. So, I will try to work out the regular expression (or maybe a little more complex system). Any help is welcome.

Regards,

David

sun’s picture

Status: Needs work » Postponed

Postponing on #161217: URL filter breaks generated href tags

This issue/patch is most probably obsolete afterwards. If some detail remains, we will care for that in this issue after #161217 landed.

sun’s picture

Status: Postponed » Needs review
FileSize
985 bytes

This is what remains.

sun’s picture

Title: URL filter does not convert a link inside a div and other fixes » URL filter does not convert a link inside a div
Status: Needs review » Reviewed & tested by the community

So it seems like #161217: URL filter breaks generated href tags resolved this issue. Let's just add the assertions then.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed the extra tests. Thanks.

David Stosik’s picture

Version: 7.x-dev » 6.x-dev

Thanks !

May a backport be possible ?

sun’s picture

Version: 6.x-dev » 7.x-dev

Unfortunately not. The actual URL Filter fix (the issue I linked to above) was relatively large, and since D6 does not have any unit tests, I'm rather opposed to trying to backport those changes. In D6, the core filters could be forked and fixed in a contributed module, so it's also not a critical requirement to fix it in core.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.