My situation results from some external/anonymous users hitting the proxy server with one URL/port while internal/registered users (editors of content) go straight to Apache through a different URL/port. In my situation the specific URL difference is just the port, but I fear the symptom might also be relevant for external/anonymous users hitting the external site through a URL such as external.mydomain.com while editors hit the site from an internal URL such as internal.mydomain.com.
The key here is that editors make changes on one URL/port, and those changes need to purge the proxy server's cache for external/anonymous users on a different URL/port. The proxy server obviously only knows about the URL/host/port as accessed by external users, so the internal URL is almost irrelevant.
If it matters, the different URLs are mapped by the proxy server (Squid) and Apache as named virtual hosts.
The code in function purge_urls() in purge.inc builds up individual $purge_request arrays. Here is an example showing the elements:
purge_requests: Array
(
    [0] => Array
        (
            [purge_url] => http://local.mydomain.com:82/node/114
            [proxy_url] => http://local.mydomain.com:82
            [request_method] => PURGE
            [headers] => Array
                (
                    [0] => Host: local.mydomain.com
                )
        )
)
Note that the [headers] specification of the host does not include port 82 even though the purge_url and proxy_url do properly specify port 82. The code includes explicit checks for port numbers for the purge_url and for the proxy_url, but not for the host header.
// Determine the host
$purge_url_host = $purge_url_parts['host'];
// Add portnames to the host if any are set
if (array_key_exists('port', $purge_url_parts)) {
  $purge_url_host = $purge_url_host . ":" . $purge_url_parts['port'];
}
...
// Construct a new url
$proxy_url_base = $proxy_url_parts['scheme'] . "://" . $proxy_url_parts['host'];
if (array_key_exists('port', $proxy_url_parts)) {
  $proxy_url_base = $proxy_url_base . ":" . $proxy_url_parts['port'];
}
...
$purge_requests[$current_purge_request]['headers'] = array("Host: " . $purge_url_host);
At a minimum this causes all attempts to purge pages from my proxy server to fail. I use Squid, which returns 404 errors indicating that the purge_url wasn't found, presumably because it also considers the host header. This generally feels inconsistent to me. It feels like all communication with the proxy server should use the domain as identified by the proxy_url.
I'm not 100% certain whether this is a general bug or just an edge case that I am hitting. Most people probably use the exact same URL/domain/port all the time, so it's never an issue. That would certainly affect whether this should result in a change to the code as written or in a new feature that lets users specify whether to use the proxy_url or the purge_url as the host header.
It feels like all domain references should use the proxy_url as specified in the administrative interface, including any port specification, but that may be jumping to a conclusion. At a minimum, my proxy cache is never purged because of this. Can anyone think of a use case where the host header domain/port should NOT match the domain/port in the purge URL? If so, then perhaps the solution is either a preference or simply a second URL field in the administrative settings: one for the proxy server itself and one for the header host, if that value should not be taken from the user's URL.
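As a rough sketch of the first option (taking the host header from the proxy_url, port included), something like the following could work. This is not the module's actual code: the function name example_host_header_from_url() is hypothetical, and whether the proxy_url is the right source for the header is exactly the open question above.

```php
<?php
/**
 * Hypothetical helper (not part of purge.inc): build a Host header line
 * from a URL, keeping the port when the URL specifies one.
 */
function example_host_header_from_url($url) {
  $parts = parse_url($url);
  // parse_url() returns FALSE on seriously malformed URLs; 'host' may be absent.
  if ($parts === FALSE || !isset($parts['host'])) {
    return NULL;
  }
  $host = $parts['host'];
  // Append the port only if the URL carries an explicit one.
  if (isset($parts['port'])) {
    $host .= ':' . $parts['port'];
  }
  return 'Host: ' . $host;
}

// Assumption: the proxy URL as it would be entered in the admin settings.
$proxy_url = 'http://local.mydomain.com:82';
$headers = array(example_host_header_from_url($proxy_url));
print_r($headers);
```

With a second settings field for the header host, the same helper could be fed that value instead of the proxy_url.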
Comments
Comment #1
SqyD commented
Hi there and thank you for your bug report. I will look into this, if only because you're probably the first confirmed user of Squid that uses this module. For the upcoming 2.x versions I already have some code in that will take this into account but it could be a while before that's stable. Hacking this into the current stable 1.x is not high on my priority list but I am always happy to see patches.
(Not wanting to sound too much like another Varnish fanboy but have you looked at Varnish yet? It may be your best solution to this and many other issues since the Varnish vcl has options to simply strip the port before caching.)
Comment #2
carteriii commented
If I'm truly the first confirmed user with Squid . . . Yikes! I knew Varnish was more popular, but I had to go with Squid for other reasons unrelated to Drupal. Basically this server already had Squid running for a different application so it was unrealistic (and perhaps impossible) to run Squid *and* Varnish on the same server. So at least for now, Squid it is, and so far I'm reasonably happy.
I do not need any changes put in 1.x. Please just don't drop Squid support in 2.x, and please consider my use case when working on 2.x. At a minimum, I'll gladly test it out for you.
Comment #3
japerry