So, we are having an interesting discussion going on downstream over in #1759040: Varnish.module doesn't properly escape backslashes in url. Would like to get the input from upstream. Should expire urlencode its urls, or should we leave it to downstream modules that implement hook_expire_cache?

Varnish needs the urls encoded for it work with the Varnish CLI. So, we could fix it over there. However, it could also be fixed in expire so it would only need to be done once and all modules that implement hook_expire_cache would get that goodness.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

heddn’s picture

Title: Should expire urlencode urls » Should expire.module urlencode urls
Component: Code » Miscellaneous
Category: feature » support
Status: Active » Needs work
Spleshka’s picture

Version: 7.x-1.x-dev » 7.x-2.x-dev
Status: Needs work » Postponed

So let's leave this issue postponed until #1759040: Varnish.module doesn't properly escape backslashes in url is solved. More over, pay attention at 7.x-2.x branch.

JeremyFrench’s picture

Issue summary: View changes

I think we have a deadlock on these two issues waiting for updates on each other. I'll apply the varnish one.

Spleshka’s picture

Cool, thanks. Any progress on this issue?

bmunslow’s picture

A word of warning, if Cache Expiration should indeed encode URLs, attention should be given to GET parameters that may exist in URLs to be expired.

For instance, if I want to expire the following URL:

foo/bar/2513?myarg=2588

It gets transformed to:

foo/bar/2513%3Fmyarg%3D2588

And obviously no URL is expired (this is specially problematic if used in combination with the Boost module, the file never gets deleted).

Each URL should be processed so that the proper query arguments are passed on to url();

This is currently a bug in Cache Expiration 7.x-2.0-rc3, since URLs with GET params are not expired if setting EXPIRE_INCLUDE_BASE_URL is set to TRUE.

achton’s picture

In our case, we use Purge module to purge content in Varnish via PURGE requests. I was implementing expiration of image styles over in #2490954: Expire image styles, but see now that I am not properly handling the itok query parameter.
When passing image style URLs to ExpireAPI::executeExpiration($expire_urls, 'file', $file);, it is run through the url() function which in turn passes the whole thing through rawurlencode() and replaces the '?' character (and more).

Suggested approach:
It seems to me that executeExpiration needs to be extended to support URLs with query params. As @bmunslow commented, it is vital that the URLs to expire are exact, and I think it is better left to the receiving modules (ie, Boost, Varnish, Purge, etc) to urlencode if necessary.
One (quite vast) change that could be made is to change the $url parameter to support an associative array of URLs which may contain query params for each URL.
Another, simpler approach would be to detect this before passing the path to url() -- here:

      // If base site url should be included, then simply add it to the internal paths.
      if ($include_base_url) {
        foreach ($urls as $internal_path) {
          // TODO: Detect query params here and pass them to url() below in the $options array.
          $urls[$internal_path] = url($internal_path, array('absolute' => TRUE, 'alias' => TRUE, 'language' => $language));
        }
      }
cmlara’s picture

@achton:
"As @bmunslow commented, it is vital that the URLs to expire are exact, and I think it is better left to the receiving modules (ie, Boost, Varnish, Purge, etc) to urlencode if necessary."

Technically per RFC's the encoded version is actually the "real" URI and the unencoded is just a human-readable representation of the URI.

RFC3986:
" In local or regional contexts and with improving technology, users
might benefit from being able to use a wider range of characters;
such use is not defined by this specification. Percent-encoded
octets (Section 2.1) may be used within a URI to represent characters
outside the range of the US-ASCII coded character set if this
representation is allowed by the scheme or by the protocol element in
which the URI is referenced. Such a definition should specify the
character encoding used to map those characters to octets prior to
being percent-encoded for the URI.
"

This is why I argued in the past over on the Varnish module that url encoding needs to be done up at the expire level, because when it gets down to the Varnish module how is it to know that the ? is part of the query (and should not be encoded) or is part of a human representation of a ? and should be encoded. I'll admit the ? is somewhat a weak argument I've never seen a URL where "?" is in a URL w/o being a query parameter and strictly speaking it shouldn't be but your example shows how hard it could be for a module down stream that is only going to get a "URI" to figure out what to encode and not to encode. What really brought up this whole thread 2 years ago was that Expire passed a "\" as part of the URL down to Varnish and it was forced to be dealt with down there (I'll admit I fell out of the discussion when life got busy)

  • Spleshka committed 1e4ccc9 on 7.x-2.x
    Issue #1780144: Should expire.module urlencode urls
    
Spleshka’s picture

Status: Postponed » Fixed

I've made a few tests myself and turns out that if expiring url has a query string with _GET arguments then cache expiration doesn't work even for internal cache expiration, because a string which is being passed to cache_clear_all() contains encoded string with actually wrong chars. So I've fixed this issue, hope it will solve the problem.

Leon Kessler’s picture

Status: Fixed » Needs review
FileSize
490 bytes

Have discovered an issue this patch has brought in.

If you are trying to expire any path that has an alias, it will produce a fatal error.
Warning: Illegal string offset 'path' in ExpireAPI::executeExpiration() (line 96...

ExpireAPI::executeExpiration() is now expecting each path to be an array('path' => $path, 'query' => $query). But we are still adding in just the $path (this happens on line 441).

Have attached a patch that fixes the format into an array. Query is always blank because aliases cannot have a query (as they are already part of a query).

  • Spleshka committed 7316423 on 7.x-2.x authored by leon.nk
    Issue #1780144 by leon.nk: Should expire.module urlencode urls
    
Spleshka’s picture

Status: Needs review » Fixed

Nice catch, @leon.nk! Thanks, was happy to commit your patch.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

joelpittet’s picture

This mixed output is causing some problems for authcache and likely other consumers of the expired cache urls.

Created a follow-up:
#2549647: Expire url should have a consistent output instead of string or array depending on $include_base_url

Stemming from:
#2507797: authcache_builtin_expire_v2_expire_cache is a killer with locale on

donquixote’s picture

This mixed output is causing some problems for authcache and likely other consumers of the expired cache urls.

I would say it is an API break.
The hook_expire_cache() is documented to receive an array of string paths or urls.
Changing this to an array of arrays is a hard break.
E.g. purge_expire_cache() does expect an array of strings. (It has its own problems because it gets confused with relative paths, but that's another story - #1886546: Purge URLs sometimes concatenated incorrectly)

As for the encoding, I'd say the module should continue doing whatever it did in the last release, for best backwards compatibility.

Created a follow-up:
#2549647

Going to follow there, just wanted to mention the above.