When you do a search containing '&', it is encoded as '%26' in the URL. mod_rewrite decodes this and puts anything after '&' in the query string. This patch encodes the '%' to '%25', giving us '%2526', a double-encoded ampersand.

Makes searches and other paths containing '&' work.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Steven’s picture

The same problem exists for #. I was going to look into it, but I'd much rather prefer a fix in the rewrite rules rather than producing such ugly URLs. Unfortunately it might be hardcoded behaviour.

Shouldn't the comment say: "... to counter-act the extra decoding in mod_rewrite." ?

drumm’s picture

I haven't looked at #. This is meant to solve &.

I spent some time reading over mod_rewrite rules and am fairly certain that this is not fixable within the confines of .htaccess. Here is the relevant bug over on Apache's side: http://issues.apache.org/bugzilla/show_bug.cgi?id=32328#c8 (doesn't seem too likely to be fixed.)

Steven’s picture

FileSize
1.16 KB

You're right, it cannot be fixed in .htaccess. However, the patch on that Apache issue you linked to would not help. There is already a suitable RewriteMap to use to undo most of the damage (escape), but in order to use it, you need a global RewriteMap directive in httpd.conf (not even per VirtualHost or Directory). That rules it out for us:

# httpd.conf
RewriteMap escape int:escape

# .htaccess
RewriteRule ^(.*)$ index.php?q=${escape:$1} [L,QSA]

I updated that Apache Bugzilla report all the same. It's another shining example of why blindly applying text transformation functions is the best way to screw yourself over.

So, I did a test and aside from & and #, no other characters need to be escaped (*). Patch attached. I'm not sure why you had that odd (bool) cast and TRUE check. Isn't that what if ($var) implicitly does? I also merged the second str_replace into the first.

(*) For some values of 'no'.

drumm’s picture

Status: Needs review » Reviewed & tested by the community

Looks okay to me. I was copying the clean URL test from earlier in the code, which I thought was a bit weird, but decided to be consistent with it. This is good too.

Dries’s picture

+ * - To avoid problems with mod_rewrite's built-in unescaping, we double-escape
+ *   ampersands and hashes, when clean URLs are used.

Could we clarify 'problems' in the PHPdoc. It would be nice to be a little bit more specific (not verbose).

drumm’s picture

FileSize
1.28 KB

How are these comments?

drumm’s picture

Tested on a 4.7 install and works well.

Dries’s picture

drumm: that's more clear, thanks.

I think "mod_rewrite's unescapes" should be "mod_rewrite unescapes" though (no "'s").

Feel free to commit.

Dries’s picture

(I wonder how this affect IIS or Lighttpd, but I guess we'll figure that out ...)

drumm’s picture

Status: Reviewed & tested by the community » Fixed

Committed to HEAD.

Anonymous’s picture

Status: Fixed » Closed (fixed)
Creazion’s picture

Version: » 4.7.2
Status: Closed (fixed) » Needs review
FileSize
439 bytes

Hi,

why so much code, the patch at the attachment does the same with less lines?

Dries’s picture

Category: bug » task
Status: Needs review » Fixed

Creazion: your patch looks incomplete. You missed the '#'.

Creazion’s picture

Status: Fixed » Needs review
FileSize
421 bytes

Hi Dries,

sorry i made a wrong patch with diff on windows. The attached patch includes the '#'.

drumm’s picture

Status: Needs review » Needs work

- This misses ampersand encoding, which has the problem as #.
- The extra encoding should only happen when clean urls are on since this is a workaround for a bug in mod_rewrite.
- This needs to be a patch against HEAD.

Steven’s picture

Status: Needs work » Closed (won't fix)

Creazion: your patch does not do the same as what was committed. The goal is to double encode ampersands and hashes. Yours does not encode them at all.

killes@www.drop.org’s picture

Status: Closed (won't fix) » Fixed

drumm's patch was also committed to 4.7

Anonymous’s picture

Status: Fixed » Closed (fixed)
onionweb’s picture

Version: 4.7.2 » 4.7.3
Status: Closed (fixed) » Active

Shouldn't this have looked more like:

function drupal_urlencode($text) {
if (variable_get('clean_url', '0')) {
return str_replace(array('%2F', '%26', '%23'),
array('/', '%2526', '%2523'),
urlencode($text));
}
else {
$text = str_replace('%2F', '/', urlencode($text));
return str_replace('%23', '#', urlencode($text));

}
}

Otherwise all the #'s are '%23, such as in the "login or register to post comments" links when clean urls is enabled.

onionweb’s picture

hmm... well that bit I posted above doesn't actually fix the problem, but on drupal.org, when you login from a link below a post, you get page not found because of the %23.

onionweb’s picture

Category: task » bug

changed this from task to bug since this committed patch created a bug in the login.

AjK’s picture

Priority: Normal » Critical

Raising awareness (critical) for this is it's easily reproduced on d.o. and clearly broken.

NaX’s picture

You guys are forgetting about "Named Anchors" double-encoded # mean you cant have a Named Anchor as a menu item.

I think when the requested page is the search page then double encode # but every where else leave #

function drupal_urlencode($text) {
  $hash = (arg(0) == 'search' ? '%2523' : '#');
  if (variable_get('clean_url', '0')) {
    return str_replace(array('%2F', '%26', '%23'),
                       array('/', '%2526', $hash),
                       urlencode($text));
  }
  else {
    return str_replace('%2F', '/', urlencode($text));
  }
}

The problem with this solution is that any Named Anchor menu links breaks on the search page and when clicked on goes to page not found. Maybe a better solution would be to look at replacing # with double-encoded # somehow only when it comes from the search form id=search_form and form id=search or in the search_menu function ('path' => 'search/'. $name . $keys).

Steven’s picture

In 5.0, this code has been changed to:

function drupal_urlencode($text) {
  if (variable_get('clean_url', '0')) {
    return str_replace(array('%2F', '%26', '%23'),
                       array('/', '%2526', '%2523'),
                       urlencode($text));
  }
  else {
    return str_replace('%2F', '/', urlencode($text));
  }
}

Does this need to be backported? Is there a problem still? Are we sure this is not just a case of some places lacking a call to drupal_urlencode() ?

NaX’s picture

Status: Active » Needs work
FileSize
2.45 KB

This patch makes it possible to both search for special characters (#,&) and allow named anchors in menu items (node/*#name). It is not very elegantly implemented and I suggest somebody that understands the workings of the core modules better than me look to implement it in a better way. But you can get the general idea of what I was trying to achieve.

This is the first patch I have created that involves multiple files, hope I did it correctly.
I hope you find it useful.

Steven’s picture

Status: Needs work » Active

This patch is a completely wrong approach to solving this issue. Search.module should receive no special treatment whatsoever.

Again: is there still a bug that needs to be addressed in the latest 4.7 release?

pwolanin’s picture

Yes, this is an issue and can be seen now on drupal.org if you log out. The "login to comment" links look like:

http://drupal.org/user/login?destination=comment/reply/88728%2523comment...

Where the '#' has been encoded

NaX’s picture

Instead of trying to change the drupal_urlencode function maybe we should be focusing on the search module as it seams that is where the problem lies when it comes to special characters.

Here is another go.

function drupal_urlencode($text) {
  $search = array('%2F', '%23');
  $replace = array('/', '#');
  if (variable_get('clean_url', '0')) {
     $search[] = '%26';
     $replace[] ='%2526';
  }
  return str_replace($search, $replace, urlencode($text));
}

But all these solutions means their is code in the drupal_urlencode function just for the search module maybe we should focus on the data the search module receives rather than all the current solutions.

function drupal_urlencode($text) {
  return str_replace(array('%2F', '%23'), array('/', '#'), urlencode($text));
}
Steven’s picture

Status: Active » Closed (works as designed)

Did you bother to actually take a look at the URL in question?

http://drupal.org/user/login?destination=comment/reply/88728%2523comment...

This means: We are on the "user/login" page. After we finish logging in, we want to proceed to "comment/reply/88728#comment...". In other words, the #comment fragment identifier is part of the destination value, not part of the normal URL. If it was not escaped, it would be ignored by PHP.

By design.

valiant-1’s picture

Title: Handle ampersands in search queries and other URLs when clean URLs are on » Use THE_REQUEST

see:
http://issues.apache.org/bugzilla/show_bug.cgi?id=32328#c12

If you need the unescaped uri with all its consequences, use the ENV
THE_REQUEST, which contains the full untouched request string like
GET /foo%20bar?foo=bar HTTP/1.1

That works for us in Gallery 2. We're using THE_REQUEST for a long time now, with success.

e.g.

    RewriteCond %{THE_REQUEST} /gallery2/tag/([^?/]+)
    RewriteRule .   /gallery2/main.php?g2_view=tags.VirtualAlbum&g2_tagName=%1   [QSA,L]
asimmonds’s picture

Title: Use THE_REQUEST » Handle ampersands in search queries and other URLs when clean URLs are on

Reverting title