If you put in your body this kind of code : <div>http://www.example.com</div>
, the URL is not converted into a hyperlink. Actually, there is more than this : in the following example, lines 3 and 4 fail.
1. URL alone (followed by a stop) : http://www.example.com.
2. URL in a p tag : <p>http://www.example.com</p>
3. URL in a p tag containing HTML attributes : <p style="border:1px solid black">http://www.example.com</p>
4. URL in a div tag : <div>http://www.example.com</div>
5. URL without nothing following : http://www.example.com
(By reading my post preview, I see a bug on line 1, but this is caused by <code> tag. Another issue to raise ? Or maybe changing the filters order ?)
Also, I saw that the regular expression used to match incomplete URLs (like www.example.com) was a little bit different than the one used for full URLs : it doesn't work when following a <br> tag :
6. www adress following a <br> tag : <br />www.example.com
won't convert the URL.
Finally, seeing the regular expressions used, a URL, with no following text at all, should not match the patters. But it does, because, for a reason I ignore, a space is added to the end of the text the filter is applied on.
I want to stress that there must be no confusion : what is on grey background in my post is exactly what is to be pasted in a node's textarea, to encounter the bugs I am explaining (try yourself ;) ). The fact that only one URL is converted here is just side-effect.
So I tried to simplify and fix the regular expressions used to match full URLs, e-mail addresses and "www domains/addresses". Here is a list of things I did :
- replacing the regular-expression surrounding double quotes for simple quotes (double quotes are useless and may be source of confusion : \n would be converted to a line-break with double quotes)
- using a positive look-behind to match what must precede the URL, this avoids parasite contain inside $match variable
- changing the index used to get $match content accordingly
- use \s , which is an "any space" entity, instead of " \n\r\t" inside of the character class (square brackets)
- remove the backslash escaping an open bracket, which is useless inside a character class
- remove the part matching exactly <p>, <li>, or any kind of <br>, and, instead, matching a closing angled bracket (or greater-than sign, > which can be a sign of a previous tag)
- repeating the three points above at the end of the regular expression
- merging the part that checks if the URL is followed by a punctuation sign, and the part that "looks ahead" for a space character, or closing bracket, or opening angled bracket (potential sign of a following tag)
- remove the now useless concatenations to $match[1] and $match[3], now that there are only "look arround" expressions around the part matching the URL
- I did not modify the part in the regular expressions that is actually matching a URL/e-mail
I would like to complete SimpleTest code for this, but I didn't find any place where to download Drupal 7 core SimpleTest files (anyone can help ?).
Please find my patch attached.
Regards,
David
Comment | File | Size | Author |
---|---|---|---|
#9 | drupal.filter-url-div.9.patch | 985 bytes | sun |
#7 | url_filter_tests.patch | 3.9 KB | David Stosik |
url_filter_fixes.patch | 2.96 KB | David Stosik | |
Comments
Comment #1
Damien Tournoud CreditAttribution: Damien Tournoud commentedWell. Those are crazy regular expressions :)
We have a decent test suite for the URL filter. You should start by extending the tests for the cases you consider the URL filter doesn't do its job correctly. Then you can try to fix the errors.
Comment #2
Damien Tournoud CreditAttribution: Damien Tournoud commentedAs I expected, your patch breaks several existing tests for the URL filter:
Developing a complex regular expression without a decent set of input data is not really a good idea :)
Comment #3
David Stosik CreditAttribution: David Stosik commentedYou are right. :)
Could you please tell me where can I get the tests files ?
Thanks,
David
Comment #4
Damien Tournoud CreditAttribution: Damien Tournoud commentedThere are directly in the Drupal 7 code base (modules/filter/filter.test), especially
FilterUnitTestCase::testUrlFilter()
.Comment #5
David Stosik CreditAttribution: David Stosik commentedThanks. I should have seen that SimpleTest is in Drupal 7 core, instead of getting SimpleTest 2.x. :)
So, I've got some work. By the way, my patch fails at tests cases which are not very good, IMHO (just try to add a space between <code> or <script> and the URL, and then the test fails :). I mean, I could make these tests fail whith the current implementation, if I changed little bits in them (which would keep them valid).
Will try to post a better patch as soon as possible.
Damien, can you confirm me that what I first raise (URL in a div is not converted) can be considered as a bug ?
Regards,
David
Comment #6
Damien Tournoud CreditAttribution: Damien Tournoud commented(3) is most certainly a bug. Not completely sure about (4). We certainly have to exclude some tags from processing (at least
<pre>, <code>, <object>, <a>, ...
)... but we might be able to switch from a whitelist to a blacklist?Comment #7
David Stosik CreditAttribution: David Stosik commentedI don't have a solution at the moment, but I do know some use cases where URL filter fails, not handled by test cases.
I added them to the filter.test file, and provide a patch.
Here is a list :
These tests fail at the moment, and I think that should be considered as a bug. So, I will try to work out the regular expression (or maybe a little more complex system). Any help is welcome.
Regards,
David
Comment #8
sunPostponing on #161217: URL filter breaks generated href tags
This issue/patch is most probably obsolete afterwards. If some detail remains, we will care for that in this issue after #161217 landed.
Comment #9
sunThis is what remains.
Comment #10
sunSo it seems like #161217: URL filter breaks generated href tags resolved this issue. Let's just add the assertions then.
Comment #11
Dries CreditAttribution: Dries commentedCommitted the extra tests. Thanks.
Comment #12
David Stosik CreditAttribution: David Stosik commentedThanks !
May a backport be possible ?
Comment #13
sunUnfortunately not. The actual URL Filter fix (the issue I linked to above) was relatively large, and since D6 does not have any unit tests, I'm rather opposed to trying to backport those changes. In D6, the core filters could be forked and fixed in a contributed module, so it's also not a critical requirement to fix it in core.