This is a continuation of a discussion here:

http://drupal.org/node/17190

I've opened a new thread due to some confusion over the title of the thread above, which has taken away from the importance of the subject.

Firstly, I think it is important to set the scope of this problem. This is not about problems with setting up a Drupal site behind a proxy firewall and having trouble with RSS and the like. There have been many discussions about that, and I believe the patches supplied address that issue adequately ( http://drupal.org/node/9706 ).

The general problem is this:

1) For an end user accessing a Drupal-based site through a proxy firewall, the pages they are shown may not be the most current versions.

2) Pages served by the proxy may even be protected versions of a page that were recently accessed by another user with escalated privileges.

3) When dealing with node-level access control, nodes that have recently been changed from 'public' to 'private' (for lack of better terms) may still be available to a user who shouldn't have access to them.

The problems this causes are varied. In my case, dealing mostly with node-level access control, the cached pages were being rendered with their public status intact, so the form elements for making them private weren't rendered properly. This inadvertently had the effect of making the real version of the page public once again, because the user failed to realize that the checkbox was not checked in the cached version.

I can see as well that since most permissions in drupal are based on page access and not on the actions themselves that are performed through those pages, this could lead to folks having access to change other things based on the pages they are presented with. For example, edit pages for nodes are restricted based on who gets to see the edit tab. Once the edit tab is rendered, I believe that there is no additional check, by token nor cookie nor session ID, to make sure that the person from whose session the form is being posted from has access to complete the edit action.

For example: Person A is an admin. They have just finished editing a node. Person B requests the same node page and is presented with a page that includes the edit tab. They click edit and are presented with the cached edit page. Because the edit form was rendered for the admin's session, the changes Person B makes will be committed to the DB.

This is all very dire sounding - and just to confuse this already confusing matter the solution is not immediately apparent.

It has already been suggested (thanks robertDouglass ) to put HTTP-Equiv meta tags in a given theme's head section like so:

<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1">

That should work for IE and Mozilla-flavour browsers. Although this is a good idea, and I would recommend making it a standard part of a theme (or at least an option), proxy servers often ignore these directives. Where this really needs to be implemented is in the actual HTTP headers, sent in advance of the page itself.
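For reference, sending those directives as real HTTP headers from PHP could look something like this. This is a minimal sketch; the exact directive values are a matter of policy, and where exactly you would hook this into Drupal is left open.

```php
<?php
// Build the cache-suppression headers described above. These are the
// header-level equivalents of the meta tags; values here are illustrative.
function no_cache_headers() {
  return array(
    'Expires: Sun, 19 Nov 1978 05:00:00 GMT',              // a date in the past
    'Cache-Control: no-store, no-cache, must-revalidate',  // HTTP/1.1 (RFC 2616 14.9)
    'Pragma: no-cache',                                    // HTTP/1.0 fallback
  );
}

// Must be called before any page output is sent.
foreach (no_cache_headers() as $h) {
  header($h);
}
```

Unlike the meta tags, these headers are visible to every intermediary on the path, which is exactly what a proxy needs to see.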

The following articles describe this in more detail:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

See section 14.9 Cache Control

http://www.htmlhelp.com/faq/html/publish.html#no-cache

This article has PHP code implementation:

http://james.cridland.net/code/caching.html

(click the link at the top to make the page readable **shakes head**)

I think the sending of these headers should ideally be optional and configurable, depending on the purpose of the particular Drupal implementation; a site with mostly static content, for example, may not want caching disabled for all pages.
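To sketch what such a configurable option might look like: the function and setting name below are invented for illustration, not an existing Drupal variable.

```php
<?php
// Hypothetical sketch: pick a Cache-Control header per site policy.
// $policy would come from a site setting (in Drupal, something read
// via variable_get()); 'disabled' models a mostly-static site that
// wants proxies to cache freely.
function cache_control_header($uid, $policy) {
  if ($policy == 'disabled') {
    return NULL;  // send no Cache-Control header at all
  }
  // Logged-in users must never be served from a shared proxy cache.
  return $uid ? 'Cache-Control: private' : 'Cache-Control: public';
}
```

A site admin could then change the policy without patching bootstrap.inc by hand.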

When searching for discussion on this topic, I found some scattered references to what seems to be a partial implementation of some kind of HTTP cache-control mechanism in Drupal. Unfortunately, in analyzing the code, I couldn't figure out where or how this was implemented.

In this node

http://drupal.org/node/9001

killes mentions at the bottom that caching is enabled for anonymous users only. I wonder, based on the problems I and others are seeing, whether he means browser cache control?

I hope this describes the problem in enough detail to differentiate it from other similarly-termed problems.

~Tat~

Comments

asimmonds’s picture

I'm experimenting with the following patch to bootstrap. It seems to fix the weird caching problem I was having, but it probably should be tested further.
From what I understand of the HTTP spec, the Cache-Control header requires HTTP/1.1 at a minimum, so it may not work with older proxies. Our corporate proxy is a recent Squid and HTTP/1.1 compliant.

Index: includes/bootstrap.inc
===================================================================
RCS file: /cvs/drupal/drupal/includes/bootstrap.inc,v
retrieving revision 1.38
diff -u -r1.38 bootstrap.inc
--- includes/bootstrap.inc	9 Jan 2005 09:22:38 -0000	1.38
+++ includes/bootstrap.inc	6 Mar 2005 05:45:23 -0000
@@ -416,10 +416,19 @@
  * Set HTTP headers in preparation for a page response.
  */
 function drupal_page_header() {
+  global $user;
+  
   if (variable_get('dev_timer', 0)) {
     timer_start();
   }
 
+  if ($user->uid) {
+    header("Cache-Control: private");
+  }
+  else {
+    header("Cache-Control: public");
+  }
+  
   if (variable_get('cache', 0)) {
     if ($cache = page_get_cache()) {
       bootstrap_invoke_all('init');
tatonca’s picture

... and things seem better! I will report more as I get a few more days experience with this, and try it through a few more proxy server types too...

Thanks for the patch! If this works out, and no one else would like to discuss other ways of doing this, we should probably file it as a proper feature request...

I also wonder if a better way to do this would be to include in conf.php a variable for additional header commands that the user can specify; the patch in bootstrap.inc would then send those headers if the variable exists. Just a different way of doing it...

I prefer what you already have for my own use though...

~Tat~

dseron’s picture

Indeed, HTTP headers should always be used over Pragma meta tags, which do not seem to be very useful. But are you sure that "Cache-Control: private" is part of the HTTP protocol?

Tonight I added the "Cache-Control: must-revalidate" header to the drupal_page_header function. That seems to really solve the problems I was experiencing behind the corporate firewall. It tells browsers as well as proxies to always perform a validation before releasing a cached copy.

I also added support for HTTP/1.0 persistent connections by adding the Content-length header.

      // Set default values:
      $date = gmdate("D, d M Y H:i:s", $cache->created) ." GMT";
      $etag = '"'. md5($date) .'"';
      $filesize = strlen($cache->data);

...

      // Send appropriate response:
      header("Last-Modified: $date");
      header("ETag: $etag");
      header("Cache-Control: must-revalidate");
      header("Content-Length: $filesize");
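For what it's worth, must-revalidate only pays off if the server can answer the revalidation request cheaply. A sketch of the server side of that round trip (illustrative only, not the actual Drupal code):

```php
<?php
// When a proxy revalidates, it resends the validator it saved earlier
// (If-None-Match carries the ETag). If it still matches, answering
// 304 with no body tells the proxy its stored copy is still good.
function response_status($current_etag, $if_none_match) {
  if ($if_none_match !== NULL && $if_none_match === $current_etag) {
    return 304;  // Not Modified: proxy may reuse its cached copy
  }
  return 200;    // changed (or no validator sent): send the full page
}
```

That is why sending Last-Modified and ETag alongside must-revalidate matters: they give the proxy something to revalidate with.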
tatonca’s picture

I like the inclusion of the content length directive.

Couple of things.

In answer to your question of whether the private directive is part of the protocol: it is part of HTTP/1.1, as defined here.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1

I assume from your question that it didn't work out for you. It is possible that the cache you are behind doesn't honor it, or that the cache is HTTP/1.0 rather than HTTP/1.1 compliant. It is working for me behind a CISCO Content server without any problems.

I personally like the public/private implementation because it fits the paradigm the old HTTP-EQUIV meta tags were trying to implement: "let anonymous users get cached content, but site members need current pages."

However, the must-revalidate directive would seem to have a stronger implementation, since it is also used for some protocol-level features.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4

I think in your case it probably makes sense to use it instead of private/public.

dseron’s picture

Thanks for the URLs, they were very informative.

My patch has been up there now for exactly one month and has been working like a charm for me. I also haven't received a single complaint from any other user, so I think it is safe to say that the patch is working. Note that this site is used very heavily, with more than 13 thousand registered users and hundreds of anonymous visitors each day. My suggestion therefore would be to incorporate the patch into the next Drupal release.

Regarding your question: I haven't tried the primary patch suggested here, since I only noticed this thread after I applied mine. Looking at the code, however, I wonder whether it would be very effective. Determining whether a user is anonymous or privileged (and thus whether to cache or not) is already handled by the page_get_cache function.

On the SSL subject: my site is using it, and I patched conf.php. This will prevent 'Undefined index' errors from PHP during non-SSL use:

# Base URL
$https = isset($_SERVER["HTTPS"]) ? $_SERVER["HTTPS"] : '';
if ($https == "on") {
    $base_url = "https://www.dse.nl";
} else {
    $base_url = "http://www.dse.nl";
}
drubeedoo’s picture

dseron,

Thanks for the code. I too had to turn off caching until I found this thread.

It was the "Content-Length: $filesize" that seemed to do the trick in my case, though I also added "Cache-Control: must-revalidate" as well. I certainly hope this makes it into the next point release, I hate to hack in core.

Chaos_Zeus’s picture

This is described as a patch, but it looks like a CVS diff file. Is there a specific link to a file with the contents above already patched into bootstrap.inc, or should this code be copied and pasted somewhere? I'm afraid my CVS experience amounts to failing to understand how to install it.

I am behind a proxy, and my experience is that I can sometimes log in, about once an hour, but if I log out and log back in, I get looping login messages.

To me this spells a cache problem with the proxy, so I was interested in trying this fix.

Thanks for your help!

chx’s picture

It's one thing that you can see the edit form; it's just an HTML page, nothing much more. You could install your own Drupal, rip the HTML page from there, and change the form action.

However, when you POST this data, you will be trying to access node/123/edit. If you look into the Drupal code, you can see that there is a node_access('update', $node) check made for node/123/edit, so you need to be logged in as someone who has privileges to edit this node. If you managed to get the session cookie for someone else (as the PHP manual says, "even session cookies can be sniffed on a network or logged by a proxyserver"), then, I am afraid, there's hardly anything Drupal can do.

It is possible to check the originating address and user agent, but if you are behind the same proxy, it is highly likely that not just your originating address but also your UA is the very same (by corporate standard).

I think that HTTPS (by changing the $base_url setting) is the only answer here.

--
Drupal development: making the world better, one patch at a time. | A bedroom without a teddy is like a face without a smile.

tatonca’s picture

...thanks for pointing out the node_access check.

"session cookies... ...logged by a proxy server"

That's the part that frightens me...

As an update to this issue, I am once again having problems. It seems like the primary patch didn't work after all; I have yet to apply the second one, which seems like a better fix.

"check originating address and user agent"

It's funny - I was thinking about something like that, a little anyway..

What if pages with forms added a random extension to the end of the path, something that was stored in the DB with a time limit?

e.g. /node/add/expirecode09543934584

Then even if someone could get the session ID, or, God forbid, had to use PHP session info in the URL, there would be a hard limit on how long that form was "active" for...

This would still allow persistent logins for people who have Drupal implemented that way, but it would mitigate the risk somewhat by shortening the exploit window from the PHP session expiry time down to the form page expiry time...

Just a thought...
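A rough sketch of what that expiring form token could look like. All names here are invented for illustration; nothing like this exists in Drupal core as of this thread.

```php
<?php
// Hypothetical expiring form token: a random value stored server-side
// with its creation time, rejected once a hard lifetime has passed.
function make_form_token() {
  return md5(uniqid(mt_rand(), TRUE));  // random enough to be hard to guess
}

// $created and $now are Unix timestamps; $lifetime is the window in seconds.
function form_token_valid($created, $now, $lifetime = 3600) {
  return ($now - $created) <= $lifetime;
}
```

The token would be appended to the form path (as in the /node/add example above) and looked up on submit; an expired token would mean the form gets redisplayed rather than processed.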

I will be applying the other Cache-Control suggestion here in the meantime to see if I can get my primary problem taken care of.

Thanks

~Tat~

chx’s picture

My friend, only HTTPS will solve your problem. It's complete end-to-end security.

"session cookies... ...logged by a proxy server" That's the part that frightens me...

This is out of our league; Drupal cannot fix this problem.


tatonca’s picture

... but when you can't fix it, the next best thing is to mitigate. That means to "reduce the likelihood", in this case, the likelihood that someone can "exploit" (bad, bad word!) this potential issue.

So instead of forms sitting static in a proxy cache, just waiting to be used, they expire. That reduces the window of opportunity for a hacker to find and make use of something.

VPN, for example, is often implemented using two-factor authentication or biometric passwords in an effort to mitigate the risk created by users who write down their passwords for others to find. We know that they do this, and we can't really stop them. But what we can do is give them something in addition to their password that changes in a way that is beyond their control (a token or key fob), or something that is absolutely unique to them that they don't have to remember (like their fingerprint or eyeball). It doesn't eliminate the risk, because someone could steal their token or sever their finger (or gouge their eye out like on Alias last night!!), but it does manage or mitigate the risk, by reducing the likelihood of a breach to a level where we are comfortable accepting the risk.

Now, to put this in perspective: these are just blogs we are talking about. But there may be someone (I was just reading about someone using Drupal for payroll... yikes) who needs this level of mitigation in their environment because the risk is too great, like someone using Drupal on an intranet where private employee info needs to be protected, most likely behind a proxy firewall. Then this may be an option, if they feel the risk outweighs the cost of coding and implementing it.

Just to add: if the risk has extrinsic costs, like potential litigation or fines (in Canada we take our privacy very seriously these days), then the $500 or so to do something like this might make sense...

...of course, back to your original point: they could just set up an HTTPS server if they were really worried ;) But it was a good thought exercise...

arhak’s picture

subscribing (what happened with the proxy issue? It's been around since 2004!)