When you do a search containing '&', it is encoded as '%26' in the URL. mod_rewrite decodes this and puts anything after '&' in the query string. This patch encodes the '%' to '%25', giving us '%2526', a double-encoded ampersand.
Makes searches and other paths containing '&' work.
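As a rough illustration of the double encoding described above (a sketch, not the attached patch): `example_double_encode_ampersand()` is a hypothetical helper, and `urldecode()` only stands in for the single decoding pass that mod_rewrite applies to the path.

```php
<?php
// Hypothetical helper, not the committed patch: double-encode '&' for clean URLs.
function example_double_encode_ampersand($text) {
  // urlencode() turns '&' into '%26'; escaping that '%' again yields '%2526'.
  return str_replace('%26', '%2526', urlencode($text));
}

// Build a search path for the keywords "R&D".
$path = 'search/node/' . example_double_encode_ampersand('R&D');

// Stand-in for the one decoding pass mod_rewrite applies to the path.
$after_rewrite = urldecode($path);

// What PHP recovers when it decodes the rewritten query-string value.
$seen_by_php = urldecode($after_rewrite);
```

After the simulated rewrite the ampersand is still escaped as '%26', so it is not treated as a query-string separator.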
Comment | File | Size | Author |
---|---|---|---|
#25 | url_special_char.patch | 2.45 KB | NaX |
#14 | common.inc.2.patch | 421 bytes | Creazion |
#12 | common.inc_27.patch | 439 bytes | Creazion |
#6 | common.inc.diff_0.txt | 1.28 KB | drumm |
#3 | mod_rewrite.sucks.patch | 1.16 KB | Steven |
Comments
Comment #1
Steven: The same problem exists for #. I was going to look into it, but I'd much prefer a fix in the rewrite rules to producing such ugly URLs. Unfortunately it might be hardcoded behaviour.
Shouldn't the comment say: "... to counter-act the extra decoding in mod_rewrite." ?
Comment #2
drumm: I haven't looked at #. This is meant to solve &.
I spent some time reading over mod_rewrite rules and am fairly certain that this is not fixable within the confines of .htaccess. Here is the relevant bug over on Apache's side: http://issues.apache.org/bugzilla/show_bug.cgi?id=32328#c8 (doesn't seem too likely to be fixed.)
Comment #3
Steven: You're right, it cannot be fixed in .htaccess. However, the patch on that Apache issue you linked to would not help. There is already a suitable RewriteMap to use to undo most of the damage (escape), but in order to use it, you need a global RewriteMap directive in httpd.conf (not even per VirtualHost or Directory). That rules it out for us:
I updated that Apache Bugzilla report all the same. It's another shining example of why blindly applying text transformation functions is the best way to screw yourself over.
So, I did a test and aside from & and #, no other characters need to be escaped (*). Patch attached. I'm not sure why you had that odd (bool) cast and TRUE check. Isn't that what if ($var) implicitly does? I also merged the second str_replace into the first.
(*) For some values of 'no'.
Comment #4
drumm: Looks okay to me. I was copying the clean URL test from earlier in the code, which I thought was a bit weird, but decided to be consistent with it. This is good too.
Comment #5
Dries: Could we clarify 'problems' in the PHPdoc? It would be nice to be a little bit more specific (not verbose).
Comment #6
drumm: How are these comments?
Comment #7
drumm: Tested on a 4.7 install and works well.
Comment #8
Dries: drumm, that's more clear, thanks.
I think "mod_rewrite's unescapes" should be "mod_rewrite unescapes" though (no "'s").
Feel free to commit.
Comment #9
Dries: (I wonder how this affects IIS or Lighttpd, but I guess we'll figure that out ...)
Comment #10
drumm: Committed to HEAD.
Comment #11
(not verified)
Comment #12
Creazion: Hi,
why so much code? The attached patch does the same with fewer lines.
Comment #13
Dries: Creazion, your patch looks incomplete. You missed the '#'.
Comment #14
Creazion: Hi Dries,
sorry, I made a wrong patch with diff on Windows. The attached patch includes the '#'.
Comment #15
drumm:
- This misses ampersand encoding, which has the same problem as #.
- The extra encoding should only happen when clean urls are on since this is a workaround for a bug in mod_rewrite.
- This needs to be a patch against HEAD.
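The requirements in this list can be sketched as follows. This is a minimal, self-contained illustration, not the code committed to HEAD: `example_urlencode()` and its `$clean_url` parameter are hypothetical stand-ins for the real function and for Drupal's clean URL setting.

```php
<?php
// Illustrative only: $clean_url stands in for Drupal's clean URL setting.
function example_urlencode($text, $clean_url) {
  $encoded = urlencode($text);
  if ($clean_url) {
    // Keep slashes readable and double-encode '&' and '#' so they
    // survive the extra decoding pass done by mod_rewrite.
    return str_replace(
      array('%2F', '%26', '%23'),
      array('/', '%2526', '%2523'),
      $encoded
    );
  }
  // Without clean URLs there is no extra decoding, so encode only once.
  return str_replace('%2F', '/', $encoded);
}

$with_clean_urls = example_urlencode('a/b&c#d', TRUE);
$without_clean_urls = example_urlencode('a/b&c#d', FALSE);
```

Both characters are handled in one place, and the workaround is skipped entirely when clean URLs are off.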
Comment #16
Steven: Creazion, your patch does not do the same as what was committed. The goal is to double encode ampersands and hashes. Yours does not encode them at all.
Comment #17
killes@www.drop.org: drumm's patch was also committed to 4.7.
Comment #18
(not verified)
Comment #19
onionweb: Shouldn't this have looked more like:

```php
function drupal_urlencode($text) {
  if (variable_get('clean_url', '0')) {
    return str_replace(array('%2F', '%26', '%23'),
                       array('/', '%2526', '%2523'),
                       urlencode($text));
  }
  else {
    $text = str_replace('%2F', '/', urlencode($text));
    return str_replace('%23', '#', urlencode($text));
  }
}
```

Otherwise all the #'s are '%23', such as in the "login or register to post comments" links when clean URLs are enabled.
Comment #20
onionweb: Hmm... well, that bit I posted above doesn't actually fix the problem, but on drupal.org, when you log in from a link below a post, you get page not found because of the %23.
Comment #21
onionweb: Changed this from task to bug since this committed patch created a bug in the login.
Comment #22
AjK: Raising awareness (critical) for this as it's easily reproduced on d.o. and clearly broken.
Comment #23
NaX: You guys are forgetting about "Named Anchors". A double-encoded # means you can't have a Named Anchor as a menu item.
I think # should be double-encoded when the requested page is the search page, but left as # everywhere else.
The problem with this solution is that any Named Anchor menu link breaks on the search page and, when clicked, goes to page not found. Maybe a better solution would be to replace # with a double-encoded # only when it comes from the search form id=search_form and form id=search, or in the search_menu function ('path' => 'search/'. $name . $keys).
Comment #24
Steven: In 5.0, this code has been changed to:
Does this need to be backported? Is there a problem still? Are we sure this is not just a case of some places lacking a call to drupal_urlencode() ?
Comment #25
NaX: This patch makes it possible both to search for special characters (#, &) and to allow named anchors in menu items (node/*#name). It is not very elegantly implemented, and I suggest that somebody who understands the workings of the core modules better than I do implement it in a better way. But you can get the general idea of what I was trying to achieve.
This is the first patch I have created that involves multiple files, hope I did it correctly.
I hope you find it useful.
Comment #26
Steven: This patch is a completely wrong approach to solving this issue. Search.module should receive no special treatment whatsoever.
Again: is there still a bug that needs to be addressed in the latest 4.7 release?
Comment #27
pwolanin: Yes, this is an issue and can be seen now on drupal.org if you log out. The "login to comment" links look like:
http://drupal.org/user/login?destination=comment/reply/88728%2523comment...
Where the '#' has been encoded
Comment #28
NaX: Instead of trying to change the drupal_urlencode function, maybe we should be focusing on the search module, as it seems that is where the problem lies when it comes to special characters.
Here is another go.
But all these solutions mean there is code in the drupal_urlencode function just for the search module. Maybe we should focus on the data the search module receives rather than on all the current solutions.
Comment #29
Steven: Did you bother to actually take a look at the URL in question?
http://drupal.org/user/login?destination=comment/reply/88728%2523comment...
This means: We are on the "user/login" page. After we finish logging in, we want to proceed to "comment/reply/88728#comment...". In other words, the #comment fragment identifier is part of the destination value, not part of the normal URL. If it was not escaped, it would be ignored by PHP.
By design.
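A small sketch of why the fragment has to be escaped inside the destination value. The URLs are hypothetical placeholders, `parse_url()` and `parse_str()` only model how a URL is split into query and fragment, and plain `urlencode()` is used for a single level of escaping rather than the double-encoded %2523 shown in the link above.

```php
<?php
// Hypothetical URLs; example.com and the node ID are placeholders.
$raw = 'http://example.com/user/login?destination=comment/reply/1#comment-form';
$raw_parts = parse_url($raw);
parse_str($raw_parts['query'], $raw_query);
// Unescaped, '#comment-form' belongs to the login URL itself,
// so it never becomes part of the destination value.
$raw_destination = $raw_query['destination'];
$raw_fragment = $raw_parts['fragment'];

// Escaping the destination keeps the fragment inside the query value.
$escaped = 'http://example.com/user/login?destination='
  . urlencode('comment/reply/1#comment-form');
$escaped_parts = parse_url($escaped);
parse_str($escaped_parts['query'], $escaped_query);
$escaped_destination = $escaped_query['destination'];
```

In the unescaped case the destination ends at "comment/reply/1"; in the escaped case it carries the "#comment-form" fragment through to the redirect target.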
Comment #30
valiant-1: See:
http://issues.apache.org/bugzilla/show_bug.cgi?id=32328#c12
That works for us in Gallery 2. We've been using THE_REQUEST for a long time now, with success.
e.g.
Comment #31
asimmonds: Reverting title