Closed (fixed)
Project:
Varnish
Version:
7.x-1.x-dev
Component:
Code
Priority:
Major
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
27 Oct 2011 at 12:19 UTC
Updated:
12 Oct 2017 at 08:15 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
karljohann commentedThis can apparantly be set in Cache Expiration settings by unchecking "Include base URL in expires"
Comment #2
karljohann commentedComment #3
neilnz commented(reopening this issue from the Expire queue as support for the above-mentioned option would be nice).
The Expire module now supports adding the domain to the paths that it passes to hook_expire_cache(), especially to support multi-domain sites using the Domain Access module.
In order to use Expire with the Varnish module when using this option, the Varnish module needs support to receive absolute URLs in varnish_expire_cache().
The attached patch adds support by reading the expire module's variable and sorting the passed absolute URLs into domain buckets before sending a purge request for each of the mentioned domains in the set. This supports expiring content across multiple domains in one go.
When the option is turned off, the existing functionality is used (unchanged).
Given that this variable defaults to being turned on in current releases of Expire, it would be nice to get this into the Varnish module, as it's a bit tricky to diagnose why the Expire module is failing to purge pages on a new install.
Comment #4
neilnz commented... and here is the backport of the patch to D6 (which we use). Exactly the same code, just the D7 patch doesn't apply because of some code formatting differences between the D6 and D7 module.
Comment #5
neilnz commentedWeird, looks like I managed to revert my title change for this issue. Setting it back again.
Comment #6
fabsor commentedWe need to check if the expire module is available here before we start using constants from the expire module. The varnish module doesn't depend on expire, so we need to have a check for that.
Comment #7
neilnz commentedSure, but that code is inside hook_expire_cache(), a hook provided by the Expire module. Is it not a reasonable assumption that this code is only ever called by Expire?
Happy to wrap it if you like. I guess the benefit would be other modules could manually call this hook while EXPIRE_INCLUDE_BASE_URL is not defined (so the default could be to not include the base URL).
Comment #8
fabsor commentedA lot of people actually call that function directly, especially before when varnish_purge function was not available, so I don't want to break that. It would be really nice if you could wrap it!
Comment #9
neilnz commentedSorry about the delay. Here is a revised patch that checks if Expire is enabled, and uses constant() to read the constant to avoid issues with that if the module isn't there (and therefore the default constant is not set).
Comment #10
heddn#9 works for me.
Comment #11
jay.dansand commentedBumping this issue - it should really make it into the full project.
Varnish wasn't getting selectively purged for me at all (thanks to the $purge variable being malformed as the original post says), so I ended up hacking varnish_expire_cache() in a similar way before I found this (better written) patch. Most site developers will probably not be willing/able to track down the problem and patch Varnish.module themselves, and Expire support doesn't work out of the box, so this patch really needs to make it into a release.
Comment #12
k.minkov commentedCheck out the patch in this issue it is addressing the same problem and a few others concerning the work of Varnish with Expire and Domain Access modules:
#1996410: Varnish together with Expire and Domain Access modules
Comment #13
mgiffordWhat is holding up the issue in #10? It's been RTBC for over a year now.
EDIT: Patch still applies nicely on SimplyTest.me
Comment #14
JeremyFrench commentedTried to commit but patch didn't apply cleanly. So rerolled. If someone can check I will push it.
Comment #15
mgiffordIt was from the latest release I was upgrading.
I so I got these errors, but it went away when I change the settings.php variables to https://drupal.org/files/issues/varnish-fix-README-instructions-1605308-...
$ drush cc all
WD php: Exception: Invalid or missing cache bin specified: external_varnish_page in DrupalDatabaseCache->clear() (line 516 of [error]
/DRUPAL7/includes/cache.inc).
Exception: Invalid or missing cache bin specified: external_varnish_page in DrupalDatabaseCache->clear() (line 516 of /DRUPAL7/includes/cache.inc).
And then also when applying the update.php.
Might be useful to have a warning when folks upgrade.
I can't see where it's broken, but it's not quite working for me. Might be a larger config issue.
I can run
drush cc allbut still have Varnish come back with X-Varnish-Cache: HIT so something must be wrong.I can't confirm that this patch works yet.
Comment #16
bibo commentedThis issue is basically about the same problem: #2017097: Ban/purge requests to varnish are malformed.
Actually, there are about 5 different issues about this in the Varnish-queue (referenced in the above issue), and a few more in expire-module's issue queue. Not sure which ones should be set as duplicate - but still, I set a few older inactive issues to "duplicate".
Oh, and does this cache-bin change really make sense?
-$conf['cache_class_cache_page'] = 'VarnishCache';
+$conf['cache_class_external_varnish_page'] = 'VarnishCache';
It doesn't seem to be used actually anywhere yet (based on Google). How would for example cache_actions work with this bin? Custom rules with cache clears to cache_page currently work ok. And expire relies on cache_page too.
Comment #17
JeremyFrench commentedRE #16
>Oh, and does this cache-bin change really make sense?
>-$conf['cache_class_cache_page'] = 'VarnishCache';
>+$conf['cache_class_external_varnish_page'] = 'VarnishCache';
I need to document this I have about 90 Mins to dedicate to vanish today so I will try to get things cleaned up a bit.
If you are not using the expire module you probably want:
This will follow the default Drupal behavior and purge the entire page cache if one page change.
However if you are using expire you probably don't want that to happen every time a page is changed (that is kind of the point of the module) so having vanish as the default page cache is not a good idea.
You could just remove varnish from the cache class altogether. However there are times when you explicitly want to clear the whole cache and you would loose this ability.
Putting
in your config means that if you do drush cc all, or click clear all caches in the admin. Varnish will also be purged. But on a normal node save the whole cache will not be cleared.
Comment #18
thewilkybarkid commentedI haven't triggered it yet, but doesn't #14 revert #1481136: Avoid "Connection reset by peer" on large purge list by batching paths?
Comment #19
mgiffordthis seems like it needs work.
Comment #20
mfbThis patch was enough to get it working with the current releases of Expire and Varnish modules.
Comment #21
mfbupping this to major as some sites need to to enable
expire_include_base_urlso they can alter or add additional base URLsComment #22
damienmckennaIn my testing of Expire 7.x-2.0-rc4+5-dev (2015-06-16) and Varnish 7.x-1.1+1-dev (2016-07-11) the $wildcard values do not have a hostname, it's just a system path, so when it gets to the if(module_exists('expire')) part it hits this error: "Undefined index: host in varnish_expire_cache"
Comment #23
damienmckennaThis adds an extra change: the values in $paths now use the absolute URLs originally passed in, rather than the system paths that are used in the $wildcards structure.
Comment #24
damienmckennaNever mind, ignore my patch, you shouldn't enable expire_include_base_url when using Varnish anyway.
Comment #25
mfb@DamienMcKenna can you explain that statement?
We have expire_include_base_url enabled (with this patch) because we absolutely need it, in order to modify the absolute URLs that are purged by Fastly and Varnish modules (yes we use multiple external caches with expire module).
It looks like we're pretty close to getting it working here...
Comment #26
damienmckennaChanging my mind.. it should use the version of the path from the $paths argument instead of the $wildcards argument when adding new wildcard paths.
Comment #27
damienmckennaLooking at how the Akamai module does it, it only deals with the keys of the $paths list, it ignores the full URL and manually adds those directly. Shouldn't the Varnish module do likewise?
Comment #28
damienmckenna@mfb: fyi I'm trying to use Varnish and Akamai, so I understand and appreciate the goal of supporting multiple systems.
Out of interest, what sort of changes are you making to the URLs via hook_expire_urls_alter? I'm going to be using it to do some adjustments on Panels paths, but I was thinking of managing the hostnames in Varnish and Akamai because that would mean the list of URLs would be shorter thus easier to manage, a separation of concerns if you will.
Comment #29
damienmckennaBTW while working with the Akamai module I'm starting to think that Expire should only know about the internal paths & aliases, it should be up to the module processing hook_expire_cache() to deal with the hostnames.
Comment #30
manarth commentedAre you able to elaborate on your use-case a little Damien?
I can hazard a guess - for example, a third-party CDN taking care of www.example.com, but mapping the backend path to something like backend.example.com which would be handled by Varnish - but a concrete example with the issue you're facing might give us better context, and more insight into where any alter hooks would be best suited.
I mostly run Varnish, but I've had a couple sites running both Varnish and Stingray (now Riverbed) Traffic Manager as internal caches under our control, as well as an external CDN. Working with multiple internal and external caches always felt a bit of a balancing act, and whatever I end up doing, I'm constantly questioning whether it's the "correct" approach. I think we need to get all the maintainers together in a room to hash out appropriate policies…when's the next Drupalcon again? :-D
Comment #31
mfbThe reason I configure Expire module to include the base URL is that I want to be able to implement hook_expire_urls_alter() on my site and modify the list of absolute URLs that are being purged, as documented at http://cgit.drupalcode.org/expire/tree/expire.api.php#n82
I would say modules that implement hook_expire_cache() should fully support the API. They may get an array of absolute URLs or internal paths, depending on the configuration of Expire module, as described at http://cgit.drupalcode.org/expire/tree/expire.api.php#n8
I can't speak for why Akamai module doesn't fully support the API; it might just be that the developers chose not to support some complex use cases like multiple hostnames / base URLs for the same site.
Comment #32
mfbYes, we have a Varnish instance handling an internal hostname, and Fastly handling the main hostname.
AFAIK Fastly module doesn't have any built-in hook to alter the URLs that are passed in by Expire module, so we rely on hook_expire_urls_alter() to alter the list of URLs (adding additional hostnames, etc.)
Curently Fastly module works fine with hook_expire_urls_alter(). But if Varnish module goes in the direction of not accepting full URLs, then we'd have to add a hook to Fastly module to provide for URL altering.
I don't feel too strongly about it, but I think it makes sense for Expire module to work with absolute URLs. After all, Drupal core certainly knows about absolute URLs (the page cache is keyed based on absolute URL) so why shouldn't Expire module. Theoretically this would mean less code in all the implementing modules, like Fastly, to allow for URL altering.
Comment #33
manarth commentedThe Varnish module currently uses an API command that requires the hostname -
ban req.http.host ~ $host && req.url ~ "$pattern", but Varnish supports an alternative command that ignores the hostname:ban.url.It's worth noting that the syntax is different for Varnish 1 and Varnish 2, but both are deprecated so maybe they can be ignored.
In the same way that the Varnish module allows a choice between ban and ban-lurker, it might be worth providing a configuration choice between ban and ban.url.
For the comments about
hook_expire_cache(): I absolutely agree that we should fully implement the API specified by the Expire module; partial implementation would become more confusing. There may be some value in Expire changing the API to separate hostnames from URLs, but if that changed, that's probably a big enough change to expect a new major version, and that's something we can test for.Comment #34
mfbHere's a slight optimization: Add the wildcard regex to the path it's associated with, rather than a new path.
Comment #35
damienmckennaCould someone please clarify what the "[/?]" part of the change means? Why is it insufficient to not just add ".*" to the end of the path?
FYI the "ban.url" command was removed in 4.0 (https://lassekarstensen.wordpress.com/2014/06/03/what-happened-to-ban-ur...), and they just released 5.0, so we shouldn't be focused on what was available in v1 or v2.
Comment #36
damienmckennaThe exact scenario I'm working on has a few pieces to it. As guessed, we have an editorial instance of the site, say "admin.example.com", and a public instance of the site, "www.example.com"; only the public site is cached with Varnish and Akamai, the admin side is not, so when we edit things on admin.example.com we have to be able to make it clear the caches at www.example.com for the corresponding path.
Furthermore, we have Panels pages that are a) have different aliases for each locale, b) have arguments passed to them. Take a Panels page with the path "my-page/%". It might have an alias "fr/moi-page/%" for French, "de/das-page/%" for German, etc. Then, there are the arguments passed in, so the actual visible URL could be "my-page/12345" for English, "fr/moi-page/12345" for French, etc. When someone edits the Panels page we need to clear the caches of both the translated paths but also all of the paths with arguments. Varnish appears to be able to handle our needs, with a few patches, and I'm just getting into Akamai today.
Comment #37
damienmckennaJust to finish explaining my plans, my intention was to use Expire Panels to register when a Panels page was changed, use some custom logic to expand that one path to cover all of the different locale-based aliases available, and then the Varnish module automatically handles the wildcards. Varnish can automatically also process the "www.example.com" paths by setting the "varnish_front_domains" variable. Expire's architecture doesn't seem to let you use multiple hostnames or a different hostname without using hook_expire_urls_alter() to duplicate values, so that means it'll be easiest for us to disable expire_include_base_url.
Incidentally, the Expire UI states "Enabling [expire_include_base_url] when Varnish or Acquia Purge modules are used as a cache backend is not recommended." and I can understand why, the hostnames can get a bit confused.
Comment #38
mfbI believe [/?] is there so if you expire node/12 with a wildcard it will expire node/12/foo and node/12?foo but not node/123
Comment #39
damienmckennaBut doesn't that mean that it will only match "node/12/foo" or "node/12?foo" but not "node/12", because it will now require either a slash or a question mark? I think this means that patch #23 is the correct approach, so that both the $path and also $path & subdirectories and $path & query strings are supported.
Comment #40
mfbAh yes, not sure what I was thinking. Here's the question mark to make all those extra characters optional.
Comment #41
mfbby the way, it's strange that parse_url() is called on a regular expression, but it appears that it should work fine in this case.
Also, I'm not sure if Varnish module is escaping the URLs before using them in regular expressions? If not that could cause some weirdness, but would be for another issue.
Comment #43
misc commentedThanks! Patch committed to latest dev.