As mentioned in #1466480: Caches URLs as HTTPS leads to issues when accessing by HTTP, mixing http and https (SSL) requests in one cache can have downsides, especially if a self-signed certificate is in use.

Thinking about this, wouldn't it be a solution to create separate static caches for http and https? I noticed that this works fine for different themes depending on the subdomain (in my case, a mobile site accessible through m.example.com). So I imagine it would also be possible to have a separate cache for https visits.

Comments

Anonymous

But

RewriteCond %{HTTPS} on

means that Boost is disabled for the https parts of a site, so that a page with user-entered details is never cached. For example, someone enters their address details but leaves out a field, and Boost accidentally caches the "you should have entered X" page.

Philip.

yan

My problem is that I would like to cache pages for anonymous users who visit the site through https. Right now I can't, because doing so makes the site unusable for anonymous users visiting through http. That means https uses no cache at all (or just the Views cache), which slows things down a lot.

yan

Title: Create seperate caches for pages accessed through http and https » Create separate caches for pages accessed through http and https
Anonymous

There are many ways to do this if you really want to, but as mentioned before, HTTPS is disabled for a reason. You could duplicate the block of Boost rewrite rules, assign something other than "normal" as the variable, and then hack boost.module to use another folder when HTTPS is set.
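A rough sketch of the read side of that idea, offered as an illustration only. It assumes the Boost lookup rules build their paths from %{ENV:boostpath} (the variable that appears in the rewrite paths discussed later in this thread), with "normal" as the default value. The "ssl" directory name is hypothetical, and boost.module would still have to be changed to write HTTPS pages there.

```apache
# Sketch only (read side). Assumption: Boost's lookup rules build their
# paths from %{ENV:boostpath}, with "normal" as the default value.
RewriteRule .* - [E=boostpath:normal]

# Hypothetical: switch HTTPS requests to a separate "ssl" directory.
# boost.module would have to be hacked to write HTTPS pages into it, and
# the "RewriteCond %{HTTPS} on [OR]" skip condition would have to go.
RewriteCond %{HTTPS} on
RewriteRule .* - [E=boostpath:ssl]
```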

Or you could redirect all non-SSL requests to a different domain/IP address and then remove the "HTTPS on" rule. That would require substantial relinking of your site, but it would avoid having to modify Boost to use a directory other than "normal".

You could also redirect all anonymous users to the non-SSL site with a rewrite rule (the safest and easiest option).
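A minimal sketch of such a redirect, placed before the Boost rules. It assumes that logged-in users can be recognized by a cookie matching DRUPAL_UID. That cookie name is an assumption, so check which cookie your Boost version sets. The sketch also keeps user/ pages on https so that login forms are still submitted encrypted. Both conditions are illustrative, not taken from Boost itself.

```apache
# Send anonymous HTTPS visitors back to HTTP so the existing cache applies.
# Assumption: logged-in users carry a cookie matching DRUPAL_UID.
RewriteCond %{HTTPS} on
RewriteCond %{REQUEST_METHOD} ^(GET|HEAD)$
RewriteCond %{HTTP_COOKIE} !DRUPAL_UID
# Illustrative exclusion: keep login/registration pages on HTTPS.
RewriteCond %{REQUEST_URI} !^/user
# The original query string is passed through unchanged by default.
RewriteRule ^ http://%{HTTP_HOST}%{REQUEST_URI} [R=302,L]
```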

I'd be very careful with the rewrite rules to make sure that no user information is accidentally cached.

yan

Thanks for your answer.

> There are many ways to do this if you really want to, but as mentioned before, HTTPS is disabled for a reason. You could duplicate the block of Boost rewrite rules, assign something other than "normal" as the variable, and then hack boost.module to use another folder when HTTPS is set.

That might be an option, but hacking the module isn't really what I want to do. It would be nice to have that functionality available as an option.

> Or you could redirect all non-SSL requests to a different domain/IP address and then remove the "HTTPS on" rule. That would require substantial relinking of your site, but it would avoid having to modify Boost to use a directory other than "normal".

I don't really like that idea. I prefer to have just one domain for both http and https.

> You could also redirect all anonymous users to the non-SSL site with a rewrite rule (the safest and easiest option).

I think that's actually the least safe option, because then no anonymous traffic would be encrypted. I would rather encourage users to use https to protect their privacy.

> I'd be very careful with the rewrite rules to make sure that no user information is accidentally cached.

Definitely. But I think there is a misunderstanding: whether or not I use https says nothing about which part of the website I'm using or what data I am sending. It's the module's task to determine whether or not content should be cached, but that doesn't have anything to do with http or https.

Anonymous

The problem with caching https is that, although Boost would not cache a POST request, it would cache the page returned by a password form. For example, Joe enters his password wrong and his username appears on the cached page returned to the browser, so everyone else then sees his username, even though no password is shown. So a separate https cache would require an exclusion list that ends up as per-site configuration, which doesn't really fit into the project, although the general rule

RewriteCond %{REQUEST_URI} (^/(admin|cache|misc|modules|sites|system|openid|themes|node/add|comment/reply))|(/(edit|user|user/(login|password|register))$) [OR]

may very well cover it. For "the Boost project" to create two different caches would require duplicating virtually everything in admin, from the cache lifetimes through to the excluded-pages settings, and would mean shipping a module that is insecure by default, with the inherent risk of presenting cached information to anonymous users.

> It's the module's task to determine whether or not content should be cached, but that doesn't have anything to do with http or https.

No, it's the site owner's task to configure Boost and determine what should be cached. Removing that one line from the .htaccess file gives the option to cache https.
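Roughly what that looks like, assuming a skip block built around the exclusion rule above. This is a simplified sketch, not the file Boost ships. The cookie name, the "normal" directory and the _%{QUERY_STRING}.html file suffix are assumptions, and S=1 skips only the single lookup rule that follows.

```apache
# Skip the Boost lookup when any condition matches (conditions are OR-ed).
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD)$ [OR]
RewriteCond %{REQUEST_URI} (^/(admin|cache|misc|modules|sites|system|openid|themes|node/add|comment/reply))|(/(edit|user|user/(login|password|register))$) [OR]
# Commented out so HTTPS requests can also be served from the cache:
# RewriteCond %{HTTPS} on [OR]
RewriteCond %{HTTP_COOKIE} DRUPAL_UID
RewriteRule .* - [S=1]

# Lookup: serve the static file if it exists and is non-empty.
RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}.html -s
RewriteRule .* cache/normal/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}.html [L,T=text/html]
```

With only this change, http and https requests read the same cached files. That shared cache is the situation this issue started from.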

> I would rather encourage users to use https to protect their privacy.

So redirect everyone from http to https and remove the line in .htaccess. Note, though, that https doesn't protect privacy, and in the case of Boost it makes things worse by instructing the browser to cache pages for long periods of time. The only thing https does is encrypt content against a man in the middle; it doesn't stop an ISP from recording that at hour X user Y connected to website Z. For that you would need Tor.
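The redirect half of that suggestion is a standard mod_rewrite rule. Here is a minimal sketch, to be placed before the Boost rules. The 301 status is a choice, not a requirement.

```apache
# Redirect every plain-HTTP request to the same URL on HTTPS.
# The original query string is passed through unchanged by default.
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

Once every page is requested over https only, dropping the "RewriteCond %{HTTPS} on [OR]" line no longer mixes http and https pages in one cache.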

yan

Thanks for your answer, Philip_Clarke. But I still disagree on some points, or maybe I haven't understood you correctly.

> The problem with caching https is that, although Boost would not cache a POST request, it would cache the page returned by a password form. For example, Joe enters his password wrong and his username appears on the cached page returned to the browser, so everyone else then sees his username, even though no password is shown.

I see the problem, but not the connection to https. For example, the login form shouldn't be cached, for the reason you gave. But that is true regardless of whether it is visited through http or https, i.e. the criterion for deciding whether a page is cached is its content, not the way it is delivered to the visitor. That's what I meant when I said that "it's the module's task to determine whether or not content should be cached". Or am I wrong?

> For "the Boost project" to create two different caches would require duplicating virtually everything in admin, from the cache lifetimes through to the excluded-pages settings, and would mean shipping a module that is insecure by default, with the inherent risk of presenting cached information to anonymous users.

I'm not sure what you mean by "duplicating virtually everything in admin", but wouldn't a separate cache for https technically be the same as the cache for a subdomain?

> No, it's the site owner's task to configure Boost and determine what should be cached. Removing that one line from the .htaccess file gives the option to cache https.

But the line in .htaccess does not affect how pages are cached, right? To get back to the main problem that made me open this feature request: if you use a self-signed certificate for SSL and parts of the cached page are loaded over https (for example images), they won't be visible to users who haven't added a security exception for the certificate. That is why I proposed a separate cache for https, so that every page can have one cached version for http and one for https.

> Note, though, that https doesn't protect privacy, and in the case of Boost it makes things worse by instructing the browser to cache pages for long periods of time. The only thing https does is encrypt content against a man in the middle; it doesn't stop an ISP from recording that at hour X user Y connected to website Z. For that you would need Tor.

Yes, you're right. But at least the content exchanged between client and server is encrypted. How does https instruct the browser to cache pages for long periods of time?

Anonymous

> the criterion for deciding whether a page is cached is its content

which is what the project has achieved with "normal" Drupal. Then someone adds a theme with a different naming convention, or a login box as a block (note that this forum has no login box, just a link), and I end up fielding support requests for setups that can't work. The same goes for shops and carts: it's bad news if the theme shows a cached cart page because of direct links in the theme (we disable Boost for POST requests to minimize this).

There is simply no easy way apart from redirecting everyone to https, so that everyone has to accept the certificate; setting up multiple domains would mean copying content across. So it's a matter of redirecting everything to https, removing the Boost .htaccess line, and going through the pages to check that the theme has no inappropriately cached areas (and then using the built-in exclusion settings). From what I've seen of the site, you would be looking at editing Drupal core so that the image links in each article check for http or https access, before even getting to Boost, as the mix-and-match nature of Drupal's link construction is not compatible with self-signed certificates, or indeed with mixing https on an http page.

> How does https instruct the browser to cache pages for long periods of time?

https does not, but Boost does try to use mod_headers to lengthen the expiry time of pages in the browser cache. So turning Boost on for https pages would increase the likelihood of "private" information being cached. There is an additional problem, though it has been patched. Boost's cache structure is very predictable, so someone can type in

cache/normal/domain/page_.html

and pull up a page directly, since the rewrite rules could not work otherwise. The issue is that, although the rewrite rules stop the cached page from being served to the browser, a page may still be generated by Boost (I must check that patch). So user/customer details could theoretically be written to the cache folder on an https website and be accessed by typing in the URI directly, which creates a small risk of identity theft. I would not recommend Boost on any https site. BMG is coming back soon, and I'm sure he'll agree.
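One possible mitigation for the direct-access problem is sketched below. It is untested. The idea is to refuse client requests for anything under cache/ unless the request is the internal redirect produced by the Boost rules. The sketch assumes the rules live in the .htaccess at the document root, so the pattern has no leading slash. It also relies on REDIRECT_STATUS being empty for a request that came straight from the browser. It does not stop Boost from writing such pages to disk in the first place. If your setup links directly to other files under cache/, it would block those too.

```apache
# Place above the Boost rules. A direct request for /cache/... has no
# REDIRECT_STATUS; an internal rewrite into cache/ does, so it still passes.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^cache/ - [F,L]
```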

yan

Thanks again for the quick reply, Philip_Clarke. I really do not want to appear ignorant, but I still don't see how https causes the problems you describe. I understand the difficulty of detecting whether a page should be cached, and that it becomes harder when the site has custom elements. As far as I can tell, the module does a great job at this.

But take the login block as an example. Somehow Boost needs to "know" that it shouldn't cache a page if it contains a login block (or is the login page). Although a page visited through https is more likely to contain forms that send data to the server, that is not a reliable criterion for deciding whether a page should be cached. For example, http://example.com/user should not be cached (although it is http), while https://example.com should be (although it is https). See what I mean?

> From what I've seen of the site, you would be looking at editing Drupal core so that the image links in each article check for http or https access, before even getting to Boost, as the mix-and-match nature of Drupal's link construction is not compatible with self-signed certificates, or indeed with mixing https on an http page.

I think it's possible to have Drupal create https links on https pages and http links on http pages, although in some cases it takes a little more thought from developers. But I think it is OK to leave that to them: the module's task is to cache the content that is output, and the rest is up to the site developer.

> setting up multiple domains would mean copying content across

Right now I am using a subdomain for a mobile version of the site (m.example.com), and it is also cached, in a separate directory. By "copying content across", do you mean that there would be multiple folders containing caches, as is done for subdomains?

yan

Any more ideas on this topic? Unfortunately, my knowledge is too limited to contribute code at this level.

nmalinoski

Here's a quick and dirty patch to allow anonymous caching on HTTP and HTTPS simultaneously. Instead of writing cache files to cache/normal/<domain>/*, it writes them to cache/normal/<domain>/<scheme>/*. This gets around Boost's apparent limitation of one cache per site, and is still compatible with cache flushes.

In addition to applying this patch, you will need to adjust the rewrite rules in your .htaccess/vhost file(s): replace instances of cache/%{ENV:boostpath}/%{HTTP_HOST}%{REQUEST_URI} with cache/%{ENV:boostpath}/%{HTTP_HOST}/%{REQUEST_SCHEME}%{REQUEST_URI}.
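For illustration, here is how a lookup pair could look before and after that substitution. The _%{QUERY_STRING}.html suffix is inferred from the page_.html file name mentioned earlier in the thread and may not match your Boost version exactly. Also note that REQUEST_SCHEME is, to my knowledge, only available as a mod_rewrite variable from Apache 2.4 on.

```apache
# Before: one cache tree per host, shared by http and https.
#RewriteCond %{DOCUMENT_ROOT}/cache/%{ENV:boostpath}/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}.html -s
#RewriteRule .* cache/%{ENV:boostpath}/%{HTTP_HOST}%{REQUEST_URI}_%{QUERY_STRING}.html [L,T=text/html]

# After: an extra <scheme> level ("http" or "https") under each host,
# matching the cache/normal/<domain>/<scheme>/* layout the patch writes.
RewriteCond %{DOCUMENT_ROOT}/cache/%{ENV:boostpath}/%{HTTP_HOST}/%{REQUEST_SCHEME}%{REQUEST_URI}_%{QUERY_STRING}.html -s
RewriteRule .* cache/%{ENV:boostpath}/%{HTTP_HOST}/%{REQUEST_SCHEME}%{REQUEST_URI}_%{QUERY_STRING}.html [L,T=text/html]
```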

This setup appears to work for my applications. Unfortunately, I don't know if any other part of Boost relies on the former directory structure. I would really appreciate some eyes on this to make sure it's an appropriate implementation.

Edit: You will also need to uncheck "Bypass the boost cache for ssl requests." on the ".htaccess" tab in the Boost settings or otherwise remove the RewriteCond %{HTTPS} on [OR] directive in your .htaccess/vhost.

SocialNicheGuru

Status: Active » Needs review