We should pass the url query string to the purge method. What Varnish decides to do with the query string -- we can change that in the VCL file, but the query string should be passed nonetheless. I'm passing the pager query string for comments, so comments that show up on multiple pages for the same node can get flushed.

Currently, in purge_urls() in purge.inc, you have:

      // The default PURGE method. Native to Squid and configurable in Varnish and Nginx
      if ($method == 'purge') {
        // Make it a PURGE request (not GET or POST)
        $purge_requests[$current_purge_request]['request_method'] = 'PURGE';
        // Construct a new url
        $proxy_url_base = $proxy_url_parts['scheme'] . "://" . $proxy_url_parts['host'];
        if (array_key_exists('port', $proxy_url_parts)) {
          $proxy_url_base = $proxy_url_base . ":" . $proxy_url_parts['port'];
        }
        $purge_requests[$current_purge_request]['purge_url'] = $proxy_url_base . $purge_url_parts['path'];
        // Set the host header to the sites hostname
        $purge_requests[$current_purge_request]['headers'] = array("Host: " . $purge_url_host);
      }

I'd like to suggest:

      // The default PURGE method. Native to Squid and configurable in Varnish and Nginx
      if ($method == 'purge') {
        // Make it a PURGE request (not GET or POST)
        $purge_requests[$current_purge_request]['request_method'] = 'PURGE';
        // Construct a new url
        $proxy_url_base = $proxy_url_parts['scheme'] . "://" . $proxy_url_parts['host'];
        if (array_key_exists('port', $proxy_url_parts)) {
          $proxy_url_base = $proxy_url_base . ":" . $proxy_url_parts['port'];
        }
        $purge_path = array_key_exists('query', $purge_url_parts) ? $purge_url_parts['path'] . '?' . $purge_url_parts['query'] : $purge_url_parts['path'];
        $purge_requests[$current_purge_request]['purge_url'] = $proxy_url_base . $purge_path;
        // Set the host header to the sites hostname
        $purge_requests[$current_purge_request]['headers'] = array("Host: " . $purge_url_host);
      }

The modified lines are

        $purge_path = array_key_exists('query', $purge_url_parts) ? $purge_url_parts['path'] . '?' . $purge_url_parts['query'] : $purge_url_parts['path'];
        $purge_requests[$current_purge_request]['purge_url'] = $proxy_url_base . $purge_path;

Let me know what you think.

Comments

mauritsl’s picture

When we are purging lists (i.e. views) we most of the times want to purge all pages. Expanding a list of URL's to all pages can cost a lot of resources.

I suggest adding a wildcard in the PURGE configuration of varnish to expire that page with all possible GET params ( req.url ~ /view_page(\?.*)?^ )

djbobbydrake’s picture

Thanks for the reply. We're going to go this route. I have an additional question: so when this purge request is made, it's added to Varnish's purge list. Does Varnish actually keep track of a purge request for "/node/123/comments?page=1" vs "/node/123/comments?page=5" if the match in the purge list is a regex? How long does that regex stay in that purge list?

SqyD’s picture

if I understand it correctly Varnish (which confusingly uses the term "ban" instead of purge since Varnish 3.0) will keep bans in the same chronological lists as it's cached objects. Any requested objects matching the ban first before any cached object in that list will not be served from cache and will be fetched. So an object stored in cache after a matching ban has been issued will be served from cache. With this method a single url or a regex will be handled exactly the same without much performance costs. It's quite simple and brilliant at the same time.

By the end of this week my schedule should allow for some quality time with these modules. I hope to have a go at your patches, feature requests and these ideas.

djbobbydrake’s picture

Thx, that makes sense. We're keeping the patch I suggested above - it allows us to do stuff like tack on "?varnish=x" to our list of urls to purge. Then, in the vcl file, we can match for "?varnish=x", and build some flushing logic around that.

SqyD’s picture

Status: Active » Needs review

Ok, this makes sense to me as well. I've just committed a commented version of this for D6 and D7.

@djbobbydrake Please review/test them so I can include this in the next stable release. Thanks!

djbobbydrake’s picture

I can confirm this works for D6, since this is the patch that we have been running in our environment for a few months. Thanks for making the change!

SqyD’s picture

Status: Needs review » Closed (fixed)

I also added this feature to the other GET and AH methodes for both d6 and D7. Thanks!

omega8cc’s picture

Version: 6.x-1.3 » 6.x-1.x-dev
Priority: Normal » Critical
Status: Closed (fixed) » Active

Now it is broken for GET method in Nginx completely.

You have copied the same code and probably forgot that there is a "key" query string ?purge_method=get already.

We are using separate cache keys for mobile devices, so it looks like:

$conf['purge_proxy_urls'] = "http://127.0.0.1:8888/purge-normal?purge_method=get http://127.0.0.1:8888/purge-mobile-tablet?purge_method=get"

With this patch applied for GET method the log for purge server looks like below:

127.0.0.1 - - [20/Mar/2012:18:42:46 -0400] "GET / HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:42:46 -0400] "GET / HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:42:46 -0400] "GET /node HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:42:46 -0400] "GET /node HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:42:46 -0400] "GET /rss.xml HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:42:46 -0400] "GET /rss.xml HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:42:46 -0400] "GET /rss.xml HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:42:46 -0400] "GET /rss.xml HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:42:46 -0400] "GET /node HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:42:46 -0400] "GET /node HTTP/1.1" 404 162 "-" "-"

After reverting this patch it works again:

127.0.0.1 - - [20/Mar/2012:18:55:31 -0400] "GET /purge-normal/node/1 HTTP/1.1" 200 304 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:55:31 -0400] "GET /purge-mobile-other/node HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:55:31 -0400] "GET /purge-mobile-smart/node HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:55:31 -0400] "GET /purge-mobile-tablet/node HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:55:31 -0400] "GET /purge-normal/node HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:55:31 -0400] "GET /purge-mobile-other/rss.xml HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:55:31 -0400] "GET /purge-mobile-smart/rss.xml HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:55:31 -0400] "GET /purge-mobile-tablet/rss.xml HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:55:31 -0400] "GET /purge-normal/rss.xml HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:55:31 -0400] "GET /purge-mobile-other/ HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:55:31 -0400] "GET /purge-mobile-smart/ HTTP/1.1" 404 162 "-" "-"
127.0.0.1 - - [20/Mar/2012:18:55:31 -0400] "GET /purge-mobile-tablet/ HTTP/1.1" 404 162 "-" "-"
djbobbydrake’s picture

Would the solution be to ignore that one query string if using Nginx?

omega8cc’s picture

Probably. The purge_method=get should be ignored/excluded in GET method logic, as it has special purpose.

SqyD’s picture

Title: Sending purge request via "purge" method should retain url query string » Sending purge request should retain url path and query string

Okay, finally figured this one out. Must admit that djbobbydrake's "advanced" php syntax threw me off since I've never encountered that shorthand.
I've refactored the code so it will retain both path and query string and make sure no double slashes are added.
Code will end up in 6.x-1.x and 7.x-1.x branches.
Thank you all for your patience...

SqyD’s picture

Status: Active » Fixed

fixed in 1.6 releasese for D6 and D7.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.