Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Calling the l() function with a fully qualified url with an ampersand encodes the ampersand incorrectly.
Example: l("Fake site", "http://www.google.com/test1&test2")
Result: <a href="http://www.google.com/test1&test2">Fake site</a>
Should be: <a href="http://www.google.com/test1%26test2">Fake site</a>
Comment | File | Size | Author |
---|---|---|---|
#44 | url-l-2-D6.patch | 7.87 KB | rdrh555 |
#42 | url-l-D6.patch | 8.07 KB | rdrh555 |
#36 | 303987_fix34.patch | 7.08 KB | jhodgdon |
#33 | 303987redo.patch | 7.04 KB | jhodgdon |
#29 | 303987fixed.patch | 6.47 KB | jhodgdon |
Comments
Comment #1
Damien Tournoud CreditAttribution: Damien Tournoud commentedHum. That's a tricky one. I suggest we write tests first for this, and only then fix the stuff (test driven development, yeah :p). Assigning myself.
Comment #2
j.somers CreditAttribution: j.somers commentedThe problem lies in the decode_entities() function called by the filter_xss_bad_protocol() function. It will use the HTML translation table to do Unicode modifications.
I think quickest solution would be to replace the URL with the correct characters in the drupal_urlencode() function after all the internal checks are completed but then we are talking about much more than only the & sign. I am by far an expert but I don't think there is anything wrong with the current implementation per se, just that further checks are not done.
Comment #3
c960657 CreditAttribution: c960657 commentedI think the current behaviour is correct.
Try this:
l("Sad pandas", "http://www.google.com/search?hl=en&q=sad%20pandas")
Current behaviour: <a href="http://www.google.com/search?hl=en&q=sad%20pandas">Sad pandas</a>
Suggested behaviour: <a href="http://www.google.com/search?hl=en%26q=sad%20pandas">Sad pandas</a>
If you encode & as %26, the link will not direct you to a query for "sad pandas". Or did I misunderstand you?
Comment #4
Damien Tournoud CreditAttribution: Damien Tournoud commented&
should be encoded as%26
if part of the path. It should only stay & if part of the query. Type "sad & panda" in google, you will see.Comment #5
c960657 CreditAttribution: c960657 commentedEncoding issues are really tricky to discuss, because you never know which "level" people are referring to. Perhaps it would ease the understanding if someone would give an example of a real URL (not the dummy Google URL that returns 404 Not found) that is not being handled correctly by l().
Comment #6
alexanderpas CreditAttribution: alexanderpas commentedbehind this link, there are some examples: http://www.google.nl/search?q=inurl%3A%26
Comment #7
c960657 CreditAttribution: c960657 commentedl('foo', 'http://www.refdag.nl/rubrieken/onderwijs+&+opvoeding.html');
Result:
<a href="http://www.refdag.nl/rubrieken/onderwijs+&+opvoeding.html">foo</a>
If I click on that link, I get the same page as if I enter the URL directly in the browser, i.e. there doesn't seem to be a problem?
You could percent-encode certain characters so that the result becomes
<a href="http://www.refdag.nl/rubrieken/onderwijs+%26+opvoeding.html">foo</a>
or even
<a href="http://www.refdag.nl/ru%62rieken/%6Fnderwijs+&+%6Fpvo%65ding.html">foo</a>
(the latter example has random characters percent-encode)
but that is unnecessary. It is legal (I think), but I imagine that some web servers or scripts may not support it.
AFAICT a URL containing an unescaped
&
in the path component (i.e. before?
) is valid according to RFC 3986 (the path component is described by the path-abempty production that is made up of slash-separated segment productions. These are made up pchar productions that are made up of sub-delim productions that contain&
), so there is no need to do percent-escape it.Comment #8
therzog CreditAttribution: therzog commentedI just encountered the same bug in drupal 5, although the culprit is the same: the call to check_plain from filter_xss_bad_protocol from url from l, which results in amperands being encoded, even though they are in the query string. Here are real-world examples:
l("link title", "http://www.unep.org/Documents.Multilingual/Default.asp?DocumentID=43&Art...");
But this has the same problem, for both drupal and absolute urls:
l("link title", "node/1", array('query' => 'DocumentID=43&ArticleID=5252'));
Comment #9
c960657 CreditAttribution: c960657 commentedtherzog, what is the expected and actual HTML generated by the first example in #8?
Comment #10
therzog CreditAttribution: therzog commentedExpected: the url verbatim:
http://www.unep.org/Default.asp?DocumentID=43&ArticleID=5252
Got:
http://www.unep.org/Default.asp?DocumentID=43*&*amp;ArticleID=5252
except w/o the asterisks... You may only see it via "View Source," as my browser's status bar decodes the special character.
I recall having had problems with some browsers dealing with encoded special characters in the query string (i.e., clicking the hyperlink wouldn't go to the correct url), but I'm stumped right now to recreate it. So maybe this is not a problem.
Comment #11
c960657 CreditAttribution: c960657 commentedThis is expected behaviour. The following is not wellformed HTML:
<a href="http://www.unep.org/Default.asp?DocumentID=43&ArticleID=5252">foo</a>
Comment #12
alexanderpas CreditAttribution: alexanderpas commentedHTMLpedia:
but how to encode it when it's part of the actual url, and not part of the query string of the url? that's the actual question here!
also putting this to minor, as the things are working, but we're talking here about trivial things
Comment #13
c960657 CreditAttribution: c960657 commentedIt should be encoded as
&
like it is now. Or would you suggest otherwise?I don't think that this is a bug. If somebody think otherwise, please provide a URL that works when entered directly into the browser but doesn't work when clicking on a link to that URL generated by l() (it is probably a code idea to put all URLs and code examples into <code> blocks to prevent accidental decoding by the comment system).
Comment #14
maartenvg CreditAttribution: maartenvg commentedI believe it's up to the maker of the link to encode the URL properly. Real world examples:
Senario 1
Searching for
fish&chips
in google, with a Dutch interface, the URL should behttp://www.google.nl/search?q=fish%26chips
, whilehttp://www.google.nl/search?q=fish&chips
will incorrectly search forfish
.Senario 2
Searching for
fish
in google, with a Dutch interface, the URL should behttp://www.google.nl/search?q=fish&hl=nl
, whilehttp://www.google.nl/search?q=fish%26hl=nl
will incorrectly search forfish&hl=nl
.Combining both senarios in an l(), you'll have to do something like
l('Search Dutch Google for fish&chips', 'http://www.google.nl?q=' . htmlencode('fish&chips') . '&hl=nl');
which will correctly return<a href="http://www.google.nl?q=fish%26chips&hl=nl">Search English Google for fish&chips</a>
There is no way url()/l() could make that distinction on its own. So no, this is not a bug, but by design.
Comment #15
alexanderpas CreditAttribution: alexanderpas commentedor:
l('Search Dutch Google for fish&chips', 'http://www.google.nl/', array('query' => array('q' => 'fish&chips', 'hl' => 'nl')));
if i'm correctly.
Comment #16
MGParisi CreditAttribution: MGParisi commentedMoved to API Documentation
Comment #17
CitizenKane CreditAttribution: CitizenKane commentedAttached a patch which notes this in the inline documentation for l.
Comment #18
alexanderpas CreditAttribution: alexanderpas commentedhow about this:
NOTE: If the URL contains an ampersand (e.g. http://example.com/test1&test2) you must encode it yourself because l can not distinguish whether the ampersand is part of the URL, part of another html-entity or if it separates GET parameters.
Comment #19
c960657 CreditAttribution: c960657 commentedPersonally I don't think there is need for a comment. A URL is a well-defined concept. I know there has been some confusion expressed in this issue, but I doubt a short comment will reduce this confusion much. On the contrary it might increase the confusion if a comment mentions that the URL should be encoded in a specific way.
I don't understand this. How am I supposed to encode the mentioned URL?
http://example.com/test1&test2 and http://example.com/test1%26test2 are two different URLs. I think it is wrong to imply that the latter is an encoded version of the former.
Even if an ampersand is used to separate GET parameters, I'd say it is part of the URL.
It is definitely not part of an HTML entity. l() does not accept HTML encoded URLs.
Comment #21
alexanderpas CreditAttribution: alexanderpas commented@19 try clicking those example.com links :D
in both cases Apache will return:
so... they're the same URL!
Comment #22
alienzed CreditAttribution: alienzed commentedIs there any way to solve this issue? Here's my scenario:
I have a custom pager on one of my Views 2 views in drupal 6. Basically, I want the user to be able to surf to whatever page they want and then be able to switch languages and not lose what page they are on.
Here's my code:
//get language switcher link - taken from http://drupal.org/node/242646
// this is copy&paste from locale_block in locale.module
$languages = language_list('enabled');
$links = array();
foreach ($languages[1] as $language) {
if ($language->language != $current) {
if (isset($_GET['page'])) { //I am testing to see if the user is already viewing a subsequent page
$href = $_GET['q'] ."&page=". $_GET['page'];
$links[$language->language] = array(
'href' => $href,
'title' => $language->native,
'language' => $language,
'attributes' => array('class' => 'language-link'),
);
}
else {
$links[$language->language] = array(
'href' => $_GET['q'],
'title' => $language->native,
'language' => $language,
'attributes' => array('class' => 'language-link'),
);
}
}
}
But once the page is loaded, the link to seeing the site in another language becomes scrambled. The ampersand turns into %26, which obviously does not load the page correctly. Any advice?
*** I now realize that since the language will be changing, the paging won't be effective anymore and so the point is mute, but l() should be able to handle ampersands... I mean, why is the input converted at all? a URL is a URL.
Comment #23
c960657 CreditAttribution: c960657 commented@21
The fact that Apache considers the two URLs identical is not a proof that they are in fact identical. RFC 2616, section 3.2.3 says:
In RFC 2396, section 2.2, "&" is listed as part of the
reserved
production. RFC 3986, section 2.3 seems to agree.Anyway, whether they are identical or not, they are both valid URLs and thus don't need to be encoded.
Comment #24
eclauset CreditAttribution: eclauset commentedI'm attempting to autopopulate the title field of a child node. In the Relativity Module, I've modified this line to autopopulate the title. The link generated appears visually correct in the browser, but when clicked the ampersand and brackets are %encoded which results in a page not found error.
$items[] = l(t('Create new !type', array('!type' => $type_name)), "node/add/$type/parent/$parent_nid&edit[title]=$parent->title", array('class' => 'relativity_create_'. $type));
Is there a way I can intercept this link and urldecode() it before it's rendered to the browser?
EDIT: I figured it out:
$items[] = l(t('Create new !type', array('!type' => $type_name)), "node/add/$type/parent/$parent_nid", array('class' => 'relativity_create_'. $type, 'query' => 'edit[title]='.urlencode($parent->title)));
Comment #25
jhodgdonI took another look at this rather ancient issue report...
The url() function doesn't do any special encoding, except that if you pass in a query array, the keys and values are both passed through the PHP rawurlencode() function.
The l() function, after calling url() to generate the full URL, is making an A href="" HTML element. So it does need to ensure that entities like & are encoded properly. It does this by calling check_plain() on the URL returned by the url() function.
Given that, I think what needs to be done here is that l() should mention that if you are using URLs with queries in them, you can either use the $query input to let Drupal encode things properly, or pass in a properly urlencoded URL. We can also mention that after constructing the URL from the path/query, it will pass the result through check_plain() to encode entities so that it will be well-formed HTML.
The function doc is also a bit of a mess -- grammar/punctuation issues and not following the doc standards. I also noticed a couple of errors, such as that it still says you can pass in a string for $options['query'], which is no longer true in Drupal 7.
Here's a patch.
Comment #26
c960657 CreditAttribution: c960657 commentedThe patch provides a very good and elaborate documentation of the l() function. However, since most of this applies to url() too, I think it would be better to move the documentation for url() and in the documentation for l() only mention things specific to that, and then just mention that you can pass any option allowed by url() to l() also ('query' and 'fragment' could be mentioned as examples), with a reference to the documentation for that (
@see url()
).Comment #27
jhodgdonThat's a possibility, and probably a good idea. The check_plain() stuff only applies to l() and not url().
Comment #28
jhodgdonOK, here's a patch that reduces duplication between l() and url(), and contains the above clarifications.
Comment #29
jhodgdonActually, I noticed a bit of additional cleanup needed on url(). Try this one.
Comment #30
jhodgdonI just marked this other issue as a duplicate of this one, because the above patch fixes its issue:
#745184: Documentation problem with l(): $options['query'] must be an array
Comment #31
jhodgdonI just remembered #720226: documentation for l() missing language key
This issue should wait until that one is committed (currently in RTBC), and then be rerolled in such a way as not to overwrite that fix.
Comment #32
aspilicious CreditAttribution: aspilicious commentedYou can start rerolling ;)
Comment #33
jhodgdonHere's a redo with #720226: documentation for l() missing language key (which was committed) taken into account.
Comment #34
c960657 CreditAttribution: c960657 commented'attributes' needs to be quoted. Likewise for 'language'.
'language' is also an argument for url(). I don't think there is a need to document it in l() also, even though it is used for determining whether the link points to the current URL. I think the addition of the "active" CSS class should be documented in the main description for l(), i.e. not in the parameter list.
Comment #35
jhodgdonLanguage is used both in url() and separately in l(), and not to mention with a diffrerent default, so I think it does need to be documented here. Could also be in the main description of l() though, that's a good idea.
Quotes - OK.
Comment #36
jhodgdonOK. I've taken out the part of 'language' that is done by url() and just left the part that is in l() only. Also mentioning the 'active' class in the main description. So, all points in
#24EDIT: #34! have I think been addressed... I think...Comment #37
jhodgdonComment #38
jhodgdonJust a note that if this patch is accepted, it should be ported to Drupal 6, incorporating the changes from that earlier issue as well (see above).
Comment #39
c960657 CreditAttribution: c960657 commentedLooks good now.
Comment #40
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks.
Comment #41
jhodgdonNeeds D6 port, incorporating this patch and some earlier ones (see above).
Comment #42
rdrh555 CreditAttribution: rdrh555 commentedMay need work.....
Comment #43
jhodgdonMostly right! A couple of small things to fix:
a) There's a small error of fact for D6 l():
It's check_url() in d6, not check_plain().
b) I think we want to do like we did for the D7 l() function http://api.drupal.org/api/function/l/7 and only list the parts of $options that are specific to l() rather than url(). Which would be attributes, language, html. Then put in the "others from url() like in D7.
c) In D6 we can still have 'query' option be a string. That was removed in D7.
Comment #44
rdrh555 CreditAttribution: rdrh555 commenteda)Sorry, knew that, forgot to change
b)l() now contains $options --attributes, language, html and 'others' only
c)went back to the original wording for D6 url() 'query' indicating that it can be a string
This should do it?
Comment #45
jhodgdonLooks good now!
Comment #46
Gábor HojtsyCommitted, thanks.
Comment #47
mdinc_org CreditAttribution: mdinc_org commented#36: 303987_fix34.patch queued for re-testing.
Comment #48
jhodgdonWhy did you decide to retest the patch in #36 -- it was already added to Drupal?