Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
So, we are having an interesting discussion going on downstream over in #1759040: Varnish.module doesn't properly escape backslashes in url. Would like to get the input from upstream. Should expire urlencode its urls, or should we leave it to downstream modules that implement hook_expire_cache?
Varnish needs the urls encoded for it work with the Varnish CLI. So, we could fix it over there. However, it could also be fixed in expire so it would only need to be done once and all modules that implement hook_expire_cache would get that goodness.
Comment | File | Size | Author |
---|---|---|---|
#10 | should_expire_module-1780144-10.patch | 490 bytes | Leon Kessler |
Comments
Comment #1
heddnComment #2
SpleshkaSo let's leave this issue postponed until #1759040: Varnish.module doesn't properly escape backslashes in url is solved. More over, pay attention at 7.x-2.x branch.
Comment #3
JeremyFrench CreditAttribution: JeremyFrench commentedI think we have a deadlock on these two issues waiting for updates on each other. I'll apply the varnish one.
Comment #4
SpleshkaCool, thanks. Any progress on this issue?
Comment #5
bmunslow CreditAttribution: bmunslow commentedA word of warning, if Cache Expiration should indeed encode URLs, attention should be given to GET parameters that may exist in URLs to be expired.
For instance, if I want to expire the following URL:
foo/bar/2513?myarg=2588
It gets transformed to:
foo/bar/2513%3Fmyarg%3D2588
And obviously no URL is expired (this is specially problematic if used in combination with the Boost module, the file never gets deleted).
Each URL should be processed so that the proper
query
arguments are passed on tourl()
;This is currently a bug in Cache Expiration 7.x-2.0-rc3, since URLs with GET params are not expired if setting
EXPIRE_INCLUDE_BASE_URL
is set toTRUE
.Comment #6
achtonIn our case, we use Purge module to purge content in Varnish via
PURGE
requests. I was implementing expiration of image styles over in #2490954: Expire image styles, but see now that I am not properly handling theitok
query parameter.When passing image style URLs to
ExpireAPI::executeExpiration($expire_urls, 'file', $file);
, it is run through theurl()
function which in turn passes the whole thing throughrawurlencode()
and replaces the '?' character (and more).Suggested approach:
It seems to me that
executeExpiration
needs to be extended to support URLs with query params. As @bmunslow commented, it is vital that the URLs to expire are exact, and I think it is better left to the receiving modules (ie, Boost, Varnish, Purge, etc) to urlencode if necessary.One (quite vast) change that could be made is to change the $url parameter to support an associative array of URLs which may contain query params for each URL.
Another, simpler approach would be to detect this before passing the path to
url()
-- here:Comment #7
cmlara@achton:
"As @bmunslow commented, it is vital that the URLs to expire are exact, and I think it is better left to the receiving modules (ie, Boost, Varnish, Purge, etc) to urlencode if necessary."
Technically per RFC's the encoded version is actually the "real" URI and the unencoded is just a human-readable representation of the URI.
RFC3986:
" In local or regional contexts and with improving technology, users
might benefit from being able to use a wider range of characters;
such use is not defined by this specification. Percent-encoded
octets (Section 2.1) may be used within a URI to represent characters
outside the range of the US-ASCII coded character set if this
representation is allowed by the scheme or by the protocol element in
which the URI is referenced. Such a definition should specify the
character encoding used to map those characters to octets prior to
being percent-encoded for the URI.
"
This is why I argued in the past over on the Varnish module that url encoding needs to be done up at the expire level, because when it gets down to the Varnish module how is it to know that the ? is part of the query (and should not be encoded) or is part of a human representation of a ? and should be encoded. I'll admit the ? is somewhat a weak argument I've never seen a URL where "?" is in a URL w/o being a query parameter and strictly speaking it shouldn't be but your example shows how hard it could be for a module down stream that is only going to get a "URI" to figure out what to encode and not to encode. What really brought up this whole thread 2 years ago was that Expire passed a "\" as part of the URL down to Varnish and it was forced to be dealt with down there (I'll admit I fell out of the discussion when life got busy)
Comment #9
SpleshkaI've made a few tests myself and turns out that if expiring url has a query string with _GET arguments then cache expiration doesn't work even for internal cache expiration, because a string which is being passed to cache_clear_all() contains encoded string with actually wrong chars. So I've fixed this issue, hope it will solve the problem.
Comment #10
Leon Kessler CreditAttribution: Leon Kessler commentedHave discovered an issue this patch has brought in.
If you are trying to expire any path that has an alias, it will produce a fatal error.
Warning: Illegal string offset 'path' in ExpireAPI::executeExpiration() (line 96...
ExpireAPI::executeExpiration() is now expecting each path to be an
array('path' => $path, 'query' => $query)
. But we are still adding in just the$path
(this happens on line 441).Have attached a patch that fixes the format into an array. Query is always blank because aliases cannot have a query (as they are already part of a query).
Comment #12
SpleshkaNice catch, @leon.nk! Thanks, was happy to commit your patch.
Comment #14
joelpittetThis mixed output is causing some problems for authcache and likely other consumers of the expired cache urls.
Created a follow-up:
#2549647: Expire url should have a consistent output instead of string or array depending on $include_base_url
Stemming from:
#2507797: authcache_builtin_expire_v2_expire_cache is a killer with locale on
Comment #15
donquixote CreditAttribution: donquixote commentedI would say it is an API break.
The hook_expire_cache() is documented to receive an array of string paths or urls.
Changing this to an array of arrays is a hard break.
E.g. purge_expire_cache() does expect an array of strings. (It has its own problems because it gets confused with relative paths, but that's another story - #1886546: Purge URLs sometimes concatenated incorrectly)
As for the encoding, I'd say the module should continue doing whatever it did in the last release, for best backwards compatibility.
Going to follow there, just wanted to mention the above.