Attached is a patch from David Strauss to add reverse proxy support to drupal.org. I have not reviewed this patch yet, but it should be reviewed and tested for possible merging into our webroot. Reverse proxy support would be extremely useful for performance and mitigation of large spikes in traffic.

CommentFileSizeAuthor
my_unidiff.patch35.61 KBnnewton
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Amazon’s picture

http://en.wikipedia.org/wiki/Reverse_proxy

What does this do for us? We have two load balancers and distribution among the webservers already. I am sure it's useful, but could you be more specific.

ieran

nnewton’s picture

Right, sorry, used too much lingo there. This would add support for Reverse *Caching* Proxies such as Squid and Varnish. These not only distribute requests to the backend, but cache the returned results for a specified time period. This patch would build in support for sane caching headers to support such reverse proxies. It is the caching we are interested in and not the actual front-end to back-end distribution. We currently run with Squid as a reverse caching proxy, for static files only. This would expand that to anonymous page views. (or that is my understanding based on other such patches, I have not reviewed this one yet, only the previous incarnation)

dww’s picture

Status: Active » Needs work

Dare I say it, but can we get this into D7 HEAD first, along with all the reviews and possible improvements that would entail, and then backport that onto d.o? I'd hate to see such a huge patch on d.o that ended up not getting into D7, and then having to keep a similarly large patch for the next 2+ years while d.o runs D7... Plus, the act of getting this into HEAD is going to almost certainly involve making this a better patch. At the very least, it's going to get a lot more reviews and attention as a core issue than here as an infra issue. Maybe. ;)

The flip side is that it'll (hopefully) be easier to justify committing this to HEAD if it's already running on d.o. However, I think if the original post of the core issue makes it clear this is a patch we're intending to run on d.o, that'll be just about as good in terms of motivating it.

Quick skim of the patch turned up a few questions/issues:

A) I see you changed cache.inc. Seems like we're going to need a corresponding change for memcache's cache.inc for this to work on d.o, right?

B) It seems weird that drupal_get_session() on an undefined variable returns FALSE not NULL. Makes it harder to distinguish undefined from defined as FALSE.

C) Since the patch contains no context for what functions each hunk belongs to, it's hard to tell what the hunk in openid.module is for. A comment there would help.

D) Direct references to "Pressflow" in the comments should probably be removed. ;)

E) A fair number of the function comments don't start with 1 line summaries.

F) "If the client send a session cookie" s/send/sent/

G) It's not immediately obvious why inside drupal_set_message() we have to test for and conditionally use drupal_set_session(). When would drupal_set_message() ever be called when drupal_set_session() doesn't exist? If there are cases where drupal_set_session() might not exist, that could use a comment in the code explaining it. If drupal_set_session() always exists, we should just use it.

H) The interaction of drupal_session_start() and drupal_session_is_started() seems slightly clumsy. Not sure how to make it any better, but something about it doesn't sit well.

I) "Sun, 19 Nov 1978 05:00:00 GMT" has special significance for the Drupal project. ;) Is there some technical reason you changed it to "Sun, 11 Mar 1984 12:00:00 GMT", or is that someone else's birthday?

J) All contribs running on d.o need to use drupal_set_session() instead of directly touching $_SESSION, right? That's sort of a pain. I guess if we already assume a logged in user, we can assume drupal_session_start() was already called and can touch $_SESSION directly. So, we just have to audit every occurrence of $_SESSION on d.o that could be hit via an anonymous user, right? Still ugh...

K) It's not clear why the API and default behavior of page_get_cache() changes with this patch. The comment explains the new API, and mentions "Default of TRUE added to maximize Drupal 5 compatibility", but this is D6 we're patching, not D5. We should match D6 more closely, not D5.

There's probably a lot more -- I'm not doing a super-careful review here, but just some initial thoughts on reading the patch. Haven't tried applying it, and certainly haven't tried testing any of this.

Damien Tournoud’s picture

Naranyan should have mention that this is mostly a back-port of the work that was done for Drupal 7, especially #147310: Implement better cache headers for reverse proxies and #201122: Drupal should support disabling anonymous sessions.

Some more improvements should be discussed and go in first in D7 (for example, as far as I understand this right, the LOGGED_IN cookie is really not necessary, so are CACHE: hit and CACHE: miss ones). For D6, the main issue that remains is that this patch requires modifications in contrib modules that are using $_SESSION to store data for anonymous users, because this data would be lost in drupal_set_session() is not used. I worked on an automatic, lazy-starting of the session, but this will need to be discussed in D7 first, too.

killes@www.drop.org’s picture

While I agree that first getting this into D7 would make sure it can be run without on d.o without an upgrading nightmare, this is in no way certain, ie the code might be broken in suble ways. Historically, d.o has been used to test patches for inclusion into the next Drupal version and I have no objections for reviving this. I'd totally would like to cache node objects fwiw.

However, Derek has addressed multiple issues and we should investigate them before deploying this.

nnewton’s picture

This was originally posted for review after a short discussion in #drupal-infrastructure and my assumption was that this was somewhat similar to the reverse proxy patches I've seen before. However, this patch is much larger and more intrusive than the previous patches I've seen.

David would you mind commenting on what this patch provides us with that the older patch doesn't? I can see some areas where this would be more technically correct, but would like your view on that and dww's issues.

David Strauss’s picture

Most of this patch involves the back-port of lazy session creation for anonymous users. The reverse proxy caching support leverages this functionality by sending "Vary: Cookie" headers. So, instead of the common hack of adding a special "logged in" cookie and configuring non-standard cache behavior, we can rely on the lack of session cookies (and other cookies) to indicate that pages are cacheable.

This is a nearly unchanged backport of the D7 functionality. I had to change a handful of things, but those were the result of other D7 -> D6 differences.

dww’s picture

Thanks for the clarifications. I haven't been following D7 development particularly closely, and had no idea about these other issues. That said, comparing this patch to http://drupal.org/files/issues/sessions_1.patch which is what Dries committed to HEAD:

(A) Is still a concern.

(B) is indeed "broken" in D7 HEAD. I'll open a new issue about that, I guess.

(C) Still a concern: the hunk for openid.module in this patch doesn't look much like the one in HEAD.

(D) Still a concern.

(E) Partly "broken" in HEAD, partly made worse in this patch.

(F) Still a concern: HEAD doesn't have that broken comment.

(G) Still a concern: HEAD doesn't conditionally invoke drupal_set_session() inside drupal_set_message().

(H) Just as clumsy in HEAD, not really worth opening a new issue for that.

(I) Still a "concern": HEAD uses the older date. ;)

(J) Still a (big) concern, but just as bad in HEAD. Damien protests about this at #201122-74: Drupal should support disabling anonymous sessions and suggests there might be a better approach that doesn't hurt DX as badly. Said he was going to open a new issue, but a quick search doesn't reveal an issue nid. @Damien, any news?

(K) Still a concern: The D7 patch doesn't do this weird thing with the API for "D5 compatibility".

...

David Strauss’s picture

@dww

(A) Yes. Headers are managed (and must be managed) in an array in this patch, and it was easier to provide a backwards-compatible D6 API and fundamentally change internal header handling to use arrays.

(B) Agreed. I was just staying true to D7's implementation.

(C) I'll look into this.

(D) and (I) This patch was pulled directly from the Pressflow repository, specifically the reverse-proxy-support branch. That's why the comments make references to Pressflow, and that's why the "expires" header is my birthday. I have to re-brand Pressflow for trademark reasons, though I try to keep those changes in a separate rebranding branch when possible. These changes aren't intended for Drupal or Drupal.org use, and I can ensure they're safely relegated to the rebranding branch before merging this patch into the Drupal.org core branch I also maintain.

(E) and (F) I don't think cosmetic and API indexing issues should be a concern for this proposal, which is primarily patching Drupal.org. These ought to be raised directly against D7.

(G) The installer in D6 sets messages without fully bootstrapping. If there's a more elegant solution, I'm interested.

(H) DamZ has taken the lead in improving the DX for this. Most of the current DX-friendly proposals seem a little hack-ish to me.

(J) If we come up with a more transparent solution, I'd be all for it.

(K) This is merely a leftover from the initial D5 backport I did of this. Agreed that it should be removed.

dww’s picture

Re: (A) I'm not doubting the change. Managing headers internally as an array makes a lot of sense. I'm just wondering where's the corresponding patch for d.o's copy of memcache, which we're also going to need before we can deploy this.

Re: (G) ahh, i see. A comment to that effect would be helpful. However, why doesn't D7 core have to do that?

David Strauss’s picture

@dww Regarding (G), D7 core might really need to do that. I haven't checked. The message only gets set in D6 if you don't have settings.php in place with proper permissions.

David Strauss’s picture

I've updated the Bazaar branch to make the patch compatible with memcache's standard cache.inc.

nnewton’s picture

After the massive search-bot hit on saturday that caused d.o to be unstable/slow, we tested and deployed David's latest patch on d6.drupal.org. It may not be perfect, but the ability to serve out of the squid cache for these bots would save us in situations like we just had.

Please test d6.drupal.org and thanks for the code review dww!

Note: for those wanting to look at the code deployed, its in d.o svn.

David Strauss’s picture

The Bazaar branch has been updated to include Damien's refinements. I'm updating trunk and branches/drupal6 in Subversion and deploying to d6.drupal.org.

David Strauss’s picture

Status: Needs work » Needs review
David Strauss’s picture

Status: Needs review » Active

While I was investigating the reverse proxy changes necessary for production, I noticed that some security updates were only partially deployed to drupal.org. To avoid extracting and manually applying the security changes, I simply updated d.o, including the latest version of the reverse proxy changes.

Most real browsers won't encounter the proxy cache after the first page load because Google Analytics sets cookies we don't have Squid ignoring yet. The cache should help with spiders, though.

markus_petrux’s picture

Would it be possible to manage this patch as if it was a community managed project?

This would make possible for others to benefit from it, as well as to keep contributing back based on experience, fact that could be easily shared.

David Strauss’s picture

@markus_petrux

Patches are publicly hosted on the Four Kitchens Bazaar server. This "patch" is actually managed as a fork of Drupal 6 core; our CVS tools on Drupal.org are completely inadequate for this purpose (lots of branching and merging). Using Bazaar, you can branch, make changes, commit, and submit your changes back to folks on the infrastructure team. You don't need our permission or any special access to contribute.

David Strauss’s picture

@markus_petrux

If you'd like to work this way, download and install Bazaar. Then run bzr branch bzr://vcs.pressflow.org/pressflow/6-patches/reverse-proxy-support. From that copy, you can commit, merge from upstream, and bzr send -o mychanges.patch to create a portable copy of your work for us to merge back in.

Damien Tournoud’s picture

Status: Active » Fixed

Yay! Marking as fixed.

markus_petrux’s picture

@David: Thanks for the hints. I'll take a look and if we can enhance anything, we'll post back, maybe at g.d.o.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

tomlobato’s picture

I got some errors when patching drupal 6.19 with this file. is there a 6.19 compatible version?

m.sant’s picture

Subscribing

jthomasbailey’s picture

Subscribing

geekgirlweb’s picture

+1 subscribing

jguffey’s picture

Sub, and bump. Can we get this updated for 6.19?

adanelova’s picture

for 6.19 or 6.20?

msti’s picture

where can u find a 6.19 patch?

_-.’s picture

Subscribing

velocis’s picture

Title: Reverse Proxy Patch » Reverse Proxy Patch - 6.20
Issue tags: +Drupal, +varnish, +varnish cache, +6.20

Just tried to apply this patch on a 6.20 version of Drupal with the following results (see below)

The failure to patch default.settings.php is expected due to the fact I am patching my core files which are shared by few drupal installs and thus the sites/default directory is kept seperately in each site.

Any tips or explanations regarding the following would be helpful
patching file includes/bootstrap.inc
Hunk #1 FAILED at 1.
Hunk #3 FAILED at 723.
2 out of 7 hunks FAILED -- saving rejects to file includes/bootstrap.inc.rej

patching file includes/common.inc
Hunk #4 FAILED at 2630.
1 out of 4 hunks FAILED -- saving rejects to file includes/common.inc.rej

Patching output below::

patching file .htaccess
Hunk #1 FAILED at 66.
1 out of 1 hunk FAILED -- saving rejects to file .htaccess.rej
patching file includes/batch.inc
Hunk #1 succeeded at 339 (offset 1 line).
patching file includes/bootstrap.inc
Hunk #1 FAILED at 1.
Hunk #2 succeeded at 641 (offset 96 lines).
Hunk #3 FAILED at 723.
Hunk #4 succeeded at 1088 (offset 31 lines).
Hunk #5 succeeded at 1282 (offset 96 lines).
Hunk #6 succeeded at 1264 (offset 35 lines).
Hunk #7 succeeded at 1365 (offset 96 lines).
2 out of 7 hunks FAILED -- saving rejects to file includes/bootstrap.inc.rej
patching file includes/cache.inc
patching file includes/common.inc
Hunk #1 succeeded at 132 (offset 7 lines).
Hunk #2 succeeded at 324 (offset 10 lines).
Hunk #3 succeeded at 1604 (offset 41 lines).
Hunk #4 FAILED at 2630.
1 out of 4 hunks FAILED -- saving rejects to file includes/common.inc.rej
patching file includes/form.inc
Hunk #1 succeeded at 2395 (offset 19 lines).
patching file includes/session.inc
Hunk #1 succeeded at 58 (offset 2 lines).
Hunk #3 succeeded at 194 (offset 2 lines).
patching file modules/comment/comment.module
Hunk #1 succeeded at 1672 (offset 4 lines).
patching file modules/cookie_cache_bypass/cookie_cache_bypass.info
patching file modules/cookie_cache_bypass/cookie_cache_bypass.module
patching file modules/dblog/dblog.admin.inc
patching file modules/node/node.admin.inc
Hunk #1 succeeded at 318 (offset -5 lines).
patching file modules/openid/openid.module
Hunk #1 succeeded at 168 (offset 6 lines).
patching file modules/user/user.admin.inc
patching file modules/user/user.module
Hunk #1 succeeded at 1405 (offset 33 lines).
patching file modules/user/user.pages.inc
Hunk #1 succeeded at 148 with fuzz 1 (offset 2 lines).
can't find file to patch at input line 943
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|
|=== modified file 'sites/default/default.settings.php'
|--- sites/default/default.settings.php 2008-12-06 09:02:33 +0000
|+++ sites/default/default.settings.php 2009-01-29 17:10:33 +0000
--------------------------
File to patch:
Skip this patch? [y] y
Skipping patch.
1 out of 1 hunk ignored
patching file update.php
Hunk #1 succeeded at 642 (offset 25 lines).

I will now try and go through the rejected patches and see if I can massage the changes in

velocis’s picture

I manually patched the rejected code on bootstrap.inc, common.inc and .htaccess file.

The site appears to be running well now (now when you log out admin, it does not cache the devel toolbar when you re-view the page.)

I am however still getting the following errors. Any help in this final bit would be much appreciated and then I will upload the patch file for the 6.20

[Tue Feb 15 12:45:42 2011] [error] [client 127.0.0.1] PHP Notice:  unserialize() [<a href='function.unserialize'>function.unserialize</a>]: Error at offset 0 of 38 bytes in /home/web/home/home.drupal.core/varnish.drupal.6.20.patched/includes/cache.inc on line 29

[Tue Feb 15 12:45:42 2011] [error] [client 127.0.0.1] PHP Warning:  Invalid argument supplied for foreach() in /home/web/home/home.drupal.core/varnish.drupal.6.20.patched/includes/bootstrap.inc on line 792

[Tue Feb 15 12:45:42 2011] [error] [client 127.0.0.1] PHP Notice:  Use of undefined constant CACHE_NONE - assumed 'CACHE_NONE' in /home/web/home/home.drupal.core/varnish.drupal.6.20.patched/includes/bootstrap.inc on line 809

[Tue Feb 15 12:45:42 2011] [error] [client 127.0.0.1] PHP Warning:  Invalid argument supplied for foreach() in /home/web/home/home.drupal.core/varnish.drupal.6.20.patched/includes/bootstrap.inc on line 829
velocis’s picture

Clearing the cache fixed any of the unserialise and invalid argument issues.

The only error (NOTIVE) I have left is the following, what should CACHE_NONE equal ???

[Tue Feb 15 13:00:14 2011] [error] [client 127.0.0.1] PHP Notice:  Use of undefined constant CACHE_NONE - assumed 'CACHE_NONE' in /home/web/home/home.drupal.core/varnish.drupal.6.20.patched/includes/bootstrap.inc on line 809
killes@www.drop.org’s picture

Issue tags: -Drupal, -varnish, -varnish cache, -6.20

Please stop spamming the infra queue.

The patch has been applied tothe pressflow Drupal distribution, I suggest you download that.

Component: Webserver » Servers