The query string should be encoded using rawurlencode() and not drupal_urlencode(). The latter causes double-encoding of the query string contents, which is not needed: the double-encoding is only needed for clean URLs due to issues with apache's mod_rewrite, and these issues don't affect the query string part of the page request.
Related issues:
http://drupal.org/node/178615 - includes the double-encoding problem, proposes a fix by making drupal_urlencode() more complex.
http://drupal.org/node/74070 - an old "fix" that I think was wrong, or is now wrong, and needs undoing.
http://drupal.org/node/301192 - workaround for the same bug, as found in the Printer module.
Note: the only place that drupal_urlencode() is called (correctly) from core is within the url() function. It might make sense to remove the drupal_urlencode() function and move its contents into url(). This would remove the temptation to call it from inappropriate places. Contributed modules should use url() to construct encoded URLs.
Comment | File | Size | Author |
---|---|---|---|
#45 | 310139-rawurlencode-45-D5.patch | 3.21 KB | pwolanin |
#38 | drupal_urlencode-310139-38-D6.patch | 5.5 KB | pwolanin |
#34 | drupal_urlencode-310139-34.patch | 10.28 KB | pwolanin |
#26 | drupal_urlencode-11.patch | 15.27 KB | c960657 |
#24 | drupal_urlencode-10.patch | 14.4 KB | c960657 |
Comments
Comment #1
fonant CreditAttribution: fonant commentedHere's the very simple patch needed to fix the double-encoding problem.
Comment #2
fonant CreditAttribution: fonant commentedHere's an improved patch, including a comment within the drupal_urlencode() description to warn that the function shouldn't be used for query strings.
(For some reason diff thinks the change is in drupal_json(), but it isn't).
Comment #3
Damien Tournoud CreditAttribution: Damien Tournoud commentedOk, please remove drupal_urlencode().
Also this will need to go to the current development branch (Drupal 7.x), and we will need to write a complete test case for this.
Comment #4
fonant CreditAttribution: fonant commentedHere's a patch for HEAD (1.797), not much different apart from the additional space in string concatenation on the second changed line. Hopefully I'm doing this correctly - I'm new to getting involved with Drupal core (I've been coding sites with Drupal 5 and 6 for a year or so now).
This just corrects query_string_encode(), leaving drupal_urlencode() as a callable function. I suspect there's a whole separate discussion needed as to whether drupal_urlencode() is redundant (and dangerous?) - unless by "please remove" you mean get rid of the function too?
Now I just need to investigate writing test cases! I found the problem with data in a query string and the standard Drupal sortable tables (which use drupal_query_string_encode() to generate the sort links in the headers).
Comment #5
fonant CreditAttribution: fonant commentedHere's another patch, with fixes for some other places where drupal_urlencode() is called on query strings.
I've also improved the function documentation for url() to make it clear when the query optional argument should be urlencoded or not. [I've a feeling the argument should never be encoded, but that would seem to be a fairly big task to change everywhere that calls url() with a pre-encoded and non-array query option.]
Not sure about the two remaining places where this sort of thing happens:
1) misc/drupal.js has a copy of drupal_urlencode(). We might also need to investigate usage of encodeURIComponent() in javascript code, to check it isn't being used on query strings.
2) modules/search/search.test uses drupal_urlencode(), but I don't know if drupalGet() also does encoding...
Comment #6
c960657 CreditAttribution: c960657 commentedNow that Drupal 7 requires PHP5, I think that drupal_query_string_encode() can be simplified a lot by using http_build_query().
Comment #7
c960657 CreditAttribution: c960657 commentedThe patch looks good.
You are right about both points in #5.
1) The problem with drupal.js can be seen on the page admin/settings/date-time: Select “Custom format” and add "//" to the text field. Note how the example date below the field looks wrong.
2) It's true that drupalGet() encodes the path (via url()), so the drupal_encode() call can simply be removed. Your can verify the problem by modifying drupal_web_test_case::drupalCreateNode() to include some special characters in the node title.
I fixed these two things.
Comment #9
c960657 CreditAttribution: c960657 commentedReroll.
Comment #11
c960657 CreditAttribution: c960657 commentedReroll.
Comment #12
Gábor HojtsyJust marked the issue submitted at #298498: Drupal assumes mod_rewrite bugs which are not there a duplicate of this one. That involves the JS Drupal.encodeURIComponent() usage of which breaks URLs in l10n_client. Unfortunately when implementing the encoding code for l10n_client, I've followed the usage of Drupal.encodeURIComponent() from system.js, which turns out is bugos. This bugfix should also be backported to Drupal 6!
I'd also ask for a little doc update to Drupal.encodeURIComponent() to mention that it should not be used on URL components other then the path.
Comment #13
Pedro Lozano CreditAttribution: Pedro Lozano commentedSubscribing, I found this today in a site, especial characters in query values encoded twice.
Comment #14
c960657 CreditAttribution: c960657 commentedI renamed Drupal.encodeURIComponent() to Drupal.urlencode() to emphasize that it is closer related to drupal_urlencode() than to JavaScript's built-in encodeURIComponent(), and added a comment about its usage.
See also #284899: Drupal url problem with clean urls that suggests another workaround for the mod_rewrite limitation that eliminates the need to use different functions for encoding the path and the query string.
Comment #16
c960657 CreditAttribution: c960657 commentedReroll.
Comment #17
fago>The latter causes double-encoding of the query string content.
Have a look at the related issue #385918: destination parameter gets two times urlencoded.
As I've noted there I think it gets double-encoded because it calls drupal_url_encode/url_encode twice on it.
Comment #18
c960657 CreditAttribution: c960657 commentedComment #19
cburschkaThe patch looks great. Now we just need a test case! :)
Comment #20
c960657 CreditAttribution: c960657 commentedI added a few tests. I added them to the class that already contains the tests for l(), as the functions seem related. Future tests for url() may be added to this class too.
Comment #22
c960657 CreditAttribution: c960657 commentedReroll.
Comment #24
c960657 CreditAttribution: c960657 commentedReroll.
Comment #25
Damien Tournoud CreditAttribution: Damien Tournoud commentedLet's rename drupal_urlencode() to drupal_encode_path(), and stop the confusion. The patch itself looks good.
Comment #26
c960657 CreditAttribution: c960657 commentedRenamed drupal_urlencode() to drupal_encode_path(), and the JS function Drupal.urlencode() to Drupal.encodePath().
This should be added to the upgrade docs before this bug is marked as fixed.
Comment #27
c960657 CreditAttribution: c960657 commentedComment #28
Damien Tournoud CreditAttribution: Damien Tournoud commentedNice! Let's get this in.
Comment #30
lilou CreditAttribution: lilou commentedHEAD is broken : http://testing.drupal.org/node/35
Comment #31
pwolanin CreditAttribution: pwolanin commentedI'm ready to roll a D6 backport as soon as this goes in.
Comment #33
pwolanin CreditAttribution: pwolanin commentedtestbot is right - patch does not apply. Re-rolling
Comment #34
pwolanin CreditAttribution: pwolanin commentedre-rolled - must have just been whitespace changes.
Comment #35
Damien Tournoud CreditAttribution: Damien Tournoud commentedMakes complete sense, the test bot it happy, let's get this in.
Comment #36
Dries CreditAttribution: Dries commentedThis makes sense indeed. Committed.
Comment #37
pwolanin CreditAttribution: pwolanin commentedgetting ready to backport minus the function name changes.
Comment #38
pwolanin CreditAttribution: pwolanin commentedComment #40
c960657 CreditAttribution: c960657 commentedLooks good.
Comment #41
Gábor HojtsyCommitted to Drupal 6, thanks!
Comment #42
pwolanin CreditAttribution: pwolanin commentedbackport for D5?
http://api.drupal.org/api/function/drupal_query_string_encode/5
Comment #43
Island Usurper CreditAttribution: Island Usurper commentedThis should have been documented before it was backported. http://drupal.org/node/224333 in particular needs to be updated because I had to track this issue down from the commit logs.
Comment #44
Island Usurper CreditAttribution: Island Usurper commentedDocumentation added at http://drupal.org/update/modules/6/7#remove-drupal-urlencode.
Comment #45
pwolanin CreditAttribution: pwolanin commentedpossible Drupal 5 patch.
Comment #46
c960657 CreditAttribution: c960657 commentedLooks good.
Comment #47
pwolanin CreditAttribution: pwolanin commented@c960657 - did you test it?
Comment #48
c960657 CreditAttribution: c960657 commentedYes, I did some rudimentary testing.
Comment #49
marcingy CreditAttribution: marcingy commentedMarking as won't fix as d5 is end of life.