Calling the l() function with a fully qualified url with an ampersand encodes the ampersand incorrectly.
Example: l("Fake site", "http://www.google.com/test1&test2")
Result: <a href="http://www.google.com/test1&amp;test2">Fake site</a>
Should be: <a href="http://www.google.com/test1%26test2">Fake site</a>

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Damien Tournoud’s picture

Version: 6.4 » 7.x-dev
Assigned: Unassigned » Damien Tournoud

Hum. That's a tricky one. I suggest we write tests first for this, and only then fix the stuff (test driven development, yeah :p). Assigning myself.

j.somers’s picture

The problem lies in the decode_entities() function called by the filter_xss_bad_protocol() function. It will use the HTML translation table to do Unicode modifications.

I think quickest solution would be to replace the URL with the correct characters in the drupal_urlencode() function after all the internal checks are completed but then we are talking about much more than only the & sign. I am by far an expert but I don't think there is anything wrong with the current implementation per se, just that further checks are not done.

c960657’s picture

I think the current behaviour is correct.

Try this:
l("Sad pandas", "http://www.google.com/search?hl=en&q=sad%20pandas")

Current behaviour: <a href="http://www.google.com/search?hl=en&amp;q=sad%20pandas">Sad pandas</a>
Suggested behaviour: <a href="http://www.google.com/search?hl=en%26q=sad%20pandas">Sad pandas</a>

If you encode & as %26, the link will not direct you to a query for "sad pandas". Or did I misunderstand you?

Damien Tournoud’s picture

& should be encoded as %26 if part of the path. It should only stay &amp if part of the query. Type "sad & panda" in google, you will see.

c960657’s picture

Encoding issues are really tricky to discuss, because you never know which "level" people are referring to. Perhaps it would ease the understanding if someone would give an example of a real URL (not the dummy Google URL that returns 404 Not found) that is not being handled correctly by l().

alexanderpas’s picture

Perhaps it would ease the understanding if someone would give an example of a real URL [...] that is not being handled correctly by l().

behind this link, there are some examples: http://www.google.nl/search?q=inurl%3A%26

c960657’s picture

l('foo', 'http://www.refdag.nl/rubrieken/onderwijs+&+opvoeding.html');

Result:
<a href="http://www.refdag.nl/rubrieken/onderwijs+&amp;+opvoeding.html">foo</a>

If I click on that link, I get the same page as if I enter the URL directly in the browser, i.e. there doesn't seem to be a problem?

You could percent-encode certain characters so that the result becomes
<a href="http://www.refdag.nl/rubrieken/onderwijs+%26+opvoeding.html">foo</a>
or even
<a href="http://www.refdag.nl/ru%62rieken/%6Fnderwijs+&amp;+%6Fpvo%65ding.html">foo</a>
(the latter example has random characters percent-encode)
but that is unnecessary. It is legal (I think), but I imagine that some web servers or scripts may not support it.

AFAICT a URL containing an unescaped & in the path component (i.e. before ?) is valid according to RFC 3986 (the path component is described by the path-abempty production that is made up of slash-separated segment productions. These are made up pchar productions that are made up of sub-delim productions that contain &), so there is no need to do percent-escape it.

therzog’s picture

I just encountered the same bug in drupal 5, although the culprit is the same: the call to check_plain from filter_xss_bad_protocol from url from l, which results in amperands being encoded, even though they are in the query string. Here are real-world examples:

l("link title", "http://www.unep.org/Documents.Multilingual/Default.asp?DocumentID=43&Art...");

But this has the same problem, for both drupal and absolute urls:

l("link title", "node/1", array('query' => 'DocumentID=43&ArticleID=5252'));

c960657’s picture

therzog, what is the expected and actual HTML generated by the first example in #8?

therzog’s picture

Expected: the url verbatim:

http://www.unep.org/Default.asp?DocumentID=43&ArticleID=5252

Got:

http://www.unep.org/Default.asp?DocumentID=43*&*amp;ArticleID=5252

except w/o the asterisks... You may only see it via "View Source," as my browser's status bar decodes the special character.

I recall having had problems with some browsers dealing with encoded special characters in the query string (i.e., clicking the hyperlink wouldn't go to the correct url), but I'm stumped right now to recreate it. So maybe this is not a problem.

c960657’s picture

This is expected behaviour. The following is not wellformed HTML:
<a href="http://www.unep.org/Default.asp?DocumentID=43&ArticleID=5252">foo</a>

alexanderpas’s picture

Priority: Normal » Minor

HTMLpedia:

The ampersand ("&") is a special character in HTML. It marks the beginning of a entity, like "&nbsp" for a non-breaking space.

In XHTML, a entity must also end with a semicolon (";"). For example " "

Because this is so, any time a literal ampersand appears in a document, it needs to be written as a character entity, "&". Ampersands commonly appear in the query string of a URL, and need to be expressed as an entity there.

but how to encode it when it's part of the actual url, and not part of the query string of the url? that's the actual question here!

also putting this to minor, as the things are working, but we're talking here about trivial things

c960657’s picture

but how to encode it when it's part of the actual url, and not part of the query string of the url? that's the actual question here!

It should be encoded as &amp; like it is now. Or would you suggest otherwise?

I don't think that this is a bug. If somebody think otherwise, please provide a URL that works when entered directly into the browser but doesn't work when clicking on a link to that URL generated by l() (it is probably a code idea to put all URLs and code examples into <code> blocks to prevent accidental decoding by the comment system).

maartenvg’s picture

Status: Active » Closed (works as designed)

I believe it's up to the maker of the link to encode the URL properly. Real world examples:

Senario 1
Searching for fish&chips in google, with a Dutch interface, the URL should be
http://www.google.nl/search?q=fish%26chips, while http://www.google.nl/search?q=fish&amp;chips will incorrectly search for fish.

Senario 2
Searching for fish in google, with a Dutch interface, the URL should be
http://www.google.nl/search?q=fish&amp;hl=nl, while http://www.google.nl/search?q=fish%26hl=nl will incorrectly search for fish&hl=nl.

Combining both senarios in an l(), you'll have to do something like
l('Search Dutch Google for fish&chips', 'http://www.google.nl?q=' . htmlencode('fish&chips') . '&hl=nl'); which will correctly return <a href="http://www.google.nl?q=fish%26chips&amp;hl=nl">Search English Google for fish&chips</a>

There is no way url()/l() could make that distinction on its own. So no, this is not a bug, but by design.

alexanderpas’s picture

Project: Drupal core » Documentation
Version: 7.x-dev »
Component: base system » Correction/Clarification
Priority: Minor » Normal
Status: Closed (works as designed) » Active

or:
l('Search Dutch Google for fish&chips', 'http://www.google.nl/', array('query' => array('q' => 'fish&chips', 'hl' => 'nl')));
if i'm correctly.

MGParisi’s picture

Project: Documentation » Drupal core
Version: » 7.x-dev
Component: Correction/Clarification » documentation
Assigned: Damien Tournoud » Unassigned
Category: bug » feature

Moved to API Documentation

CitizenKane’s picture

Status: Active » Needs review
Issue tags: +tcdocsprint09
FileSize
988 bytes

Attached a patch which notes this in the inline documentation for l.

alexanderpas’s picture

how about this:
NOTE: If the URL contains an ampersand (e.g. http://example.com/test1&test2) you must encode it yourself because l can not distinguish whether the ampersand is part of the URL, part of another html-entity or if it separates GET parameters.

c960657’s picture

Personally I don't think there is need for a comment. A URL is a well-defined concept. I know there has been some confusion expressed in this issue, but I doubt a short comment will reduce this confusion much. On the contrary it might increase the confusion if a comment mentions that the URL should be encoded in a specific way.

+ *       NOTE: If the URL contains an ampersand (e.g. http://example.com/test1&test2) 
+ *       you must encode it yourself because l can not distinguish between the ampersand
+ *       being part of the URL or if it separates GET parameters.

I don't understand this. How am I supposed to encode the mentioned URL?
http://example.com/test1&test2 and http://example.com/test1%26test2 are two different URLs. I think it is wrong to imply that the latter is an encoded version of the former.

NOTE: If the URL contains an ampersand (e.g. http://example.com/test1&test2) you must encode it yourself because l can not distinguish whether the ampersand is part of the URL, part of another html-entity or if it separates GET parameters.

Even if an ampersand is used to separate GET parameters, I'd say it is part of the URL.
It is definitely not part of an HTML entity. l() does not accept HTML encoded URLs.

Status: Needs review » Needs work

The last submitted patch failed testing.

alexanderpas’s picture

@19 try clicking those example.com links :D

in both cases Apache will return:

The requested URL /test1&test2 was not found on this server.

so... they're the same URL!

alienzed’s picture

Is there any way to solve this issue? Here's my scenario:

I have a custom pager on one of my Views 2 views in drupal 6. Basically, I want the user to be able to surf to whatever page they want and then be able to switch languages and not lose what page they are on.
Here's my code:
//get language switcher link - taken from http://drupal.org/node/242646
// this is copy&paste from locale_block in locale.module
$languages = language_list('enabled');
$links = array();
foreach ($languages[1] as $language) {
if ($language->language != $current) {
if (isset($_GET['page'])) { //I am testing to see if the user is already viewing a subsequent page
$href = $_GET['q'] ."&page=". $_GET['page'];
$links[$language->language] = array(
'href' => $href,
'title' => $language->native,
'language' => $language,
'attributes' => array('class' => 'language-link'),
);
}
else {
$links[$language->language] = array(
'href' => $_GET['q'],
'title' => $language->native,
'language' => $language,
'attributes' => array('class' => 'language-link'),
);
}
}
}

But once the page is loaded, the link to seeing the site in another language becomes scrambled. The ampersand turns into %26, which obviously does not load the page correctly. Any advice?

*** I now realize that since the language will be changing, the paging won't be effective anymore and so the point is mute, but l() should be able to handle ampersands... I mean, why is the input converted at all? a URL is a URL.

c960657’s picture

@21
The fact that Apache considers the two URLs identical is not a proof that they are in fact identical. RFC 2616, section 3.2.3 says:

Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396) are equivalent to their ""%" HEX HEX" encoding.

In RFC 2396, section 2.2, "&" is listed as part of the reserved production. RFC 3986, section 2.3 seems to agree.

Anyway, whether they are identical or not, they are both valid URLs and thus don't need to be encoded.

eclauset’s picture

I'm attempting to autopopulate the title field of a child node. In the Relativity Module, I've modified this line to autopopulate the title. The link generated appears visually correct in the browser, but when clicked the ampersand and brackets are %encoded which results in a page not found error.

$items[] = l(t('Create new !type', array('!type' => $type_name)), "node/add/$type/parent/$parent_nid&edit[title]=$parent->title", array('class' => 'relativity_create_'. $type));

Is there a way I can intercept this link and urldecode() it before it's rendered to the browser?

EDIT: I figured it out:

$items[] = l(t('Create new !type', array('!type' => $type_name)), "node/add/$type/parent/$parent_nid", array('class' => 'relativity_create_'. $type, 'query' => 'edit[title]='.urlencode($parent->title)));

jhodgdon’s picture

Title: l() with full url and ampersand encodes incorrectly » l() documentation needs to say something about url encoding
Category: feature » bug
Status: Needs work » Needs review
FileSize
4.49 KB

I took another look at this rather ancient issue report...

The url() function doesn't do any special encoding, except that if you pass in a query array, the keys and values are both passed through the PHP rawurlencode() function.

The l() function, after calling url() to generate the full URL, is making an A href="" HTML element. So it does need to ensure that entities like & are encoded properly. It does this by calling check_plain() on the URL returned by the url() function.

Given that, I think what needs to be done here is that l() should mention that if you are using URLs with queries in them, you can either use the $query input to let Drupal encode things properly, or pass in a properly urlencoded URL. We can also mention that after constructing the URL from the path/query, it will pass the result through check_plain() to encode entities so that it will be well-formed HTML.

The function doc is also a bit of a mess -- grammar/punctuation issues and not following the doc standards. I also noticed a couple of errors, such as that it still says you can pass in a string for $options['query'], which is no longer true in Drupal 7.

Here's a patch.

c960657’s picture

The patch provides a very good and elaborate documentation of the l() function. However, since most of this applies to url() too, I think it would be better to move the documentation for url() and in the documentation for l() only mention things specific to that, and then just mention that you can pass any option allowed by url() to l() also ('query' and 'fragment' could be mentioned as examples), with a reference to the documentation for that (@see url()).

jhodgdon’s picture

That's a possibility, and probably a good idea. The check_plain() stuff only applies to l() and not url().

jhodgdon’s picture

FileSize
6.04 KB

OK, here's a patch that reduces duplication between l() and url(), and contains the above clarifications.

jhodgdon’s picture

FileSize
6.47 KB

Actually, I noticed a bit of additional cleanup needed on url(). Try this one.

jhodgdon’s picture

I just marked this other issue as a duplicate of this one, because the above patch fixes its issue:
#745184: Documentation problem with l(): $options['query'] must be an array

jhodgdon’s picture

I just remembered #720226: documentation for l() missing language key

This issue should wait until that one is committed (currently in RTBC), and then be rerolled in such a way as not to overwrite that fix.

aspilicious’s picture

You can start rerolling ;)

jhodgdon’s picture

FileSize
7.04 KB

Here's a redo with #720226: documentation for l() missing language key (which was committed) taken into account.

c960657’s picture

Status: Needs review » Needs work
+ *   - attributes: An associative array of HTML attributes to apply to the
+ *     anchor tag.

'attributes' needs to be quoted. Likewise for 'language'.

+ *   - language: An optional language object. If the path being linked to is

'language' is also an argument for url(). I don't think there is a need to document it in l() also, even though it is used for determining whether the link points to the current URL. I think the addition of the "active" CSS class should be documented in the main description for l(), i.e. not in the parameter list.

jhodgdon’s picture

Language is used both in url() and separately in l(), and not to mention with a diffrerent default, so I think it does need to be documented here. Could also be in the main description of l() though, that's a good idea.

Quotes - OK.

jhodgdon’s picture

FileSize
7.08 KB

OK. I've taken out the part of 'language' that is done by url() and just left the part that is in l() only. Also mentioning the 'active' class in the main description. So, all points in #24 EDIT: #34! have I think been addressed... I think...

jhodgdon’s picture

Status: Needs work » Needs review
jhodgdon’s picture

Just a note that if this patch is accepted, it should be ported to Drupal 6, incorporating the changes from that earlier issue as well (see above).

c960657’s picture

Status: Needs review » Reviewed & tested by the community

Looks good now.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks.

jhodgdon’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Patch (to be ported)

Needs D6 port, incorporating this patch and some earlier ones (see above).

rdrh555’s picture

Assigned: Unassigned » rdrh555
Status: Patch (to be ported) » Needs review
FileSize
8.07 KB

May need work.....

jhodgdon’s picture

Status: Needs review » Needs work

Mostly right! A couple of small things to fix:

a) There's a small error of fact for D6 l():

+ *   "http://example.com/foo". After the url() function is called to construct
+ *   the URL from $path and $options, the resulting URL is passed through
+ *   check_plain() before it is inserted into the HTML anchor tag, to ensure

It's check_url() in d6, not check_plain().

b) I think we want to do like we did for the D7 l() function http://api.drupal.org/api/function/l/7 and only list the parts of $options that are specific to l() rather than url(). Which would be attributes, language, html. Then put in the "others from url() like in D7.

c) In D6 we can still have 'query' option be a string. That was removed in D7.

rdrh555’s picture

Status: Needs work » Needs review
FileSize
7.87 KB

a)Sorry, knew that, forgot to change
b)l() now contains $options --attributes, language, html and 'others' only
c)went back to the original wording for D6 url() 'query' indicating that it can be a string
This should do it?

jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

Looks good now!

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Fixed

Committed, thanks.

mdinc_org’s picture

Status: Fixed » Needs review

#36: 303987_fix34.patch queued for re-testing.

jhodgdon’s picture

Status: Needs review » Fixed

Why did you decide to retest the patch in #36 -- it was already added to Drupal?

Status: Fixed » Closed (fixed)
Issue tags: -tcdocsprint09

Automatically closed -- issue fixed for 2 weeks with no activity.