I'm putting Drupal onto a high traffic site and I'm concerned with the scalability of the anonymous sessions. I asked about disabling the sessions in #drupal on freenode and there was interest in the solution, but no one had done it before, so I'm bringing the question here.
A lot of the modules (both built in and 3rd party like CCK) use $_SESSION, but it looks like that is only for administration forms or functions that a user will have to be logged in to use anyway. includes/session.inc specifically mentions the throttle module and the "Who's Online" block - both of which I won't be using, so I don't see anything that jumps out at me as requiring an anonymous session and yet I don't see a way to disable it.
This is inspired from my forum topic on the same subject: http://drupal.org/node/183006
Comment | File | Size | Author |
---|---|---|---|
#59 | sessions.patch | 38.95 KB | moshe weitzman |
#55 | anon-sessions-14.patch | 52.12 KB | c960657 |
#53 | anon-sessions-13.patch | 33.34 KB | c960657 |
#50 | anon-sessions-12.patch | 32.63 KB | c960657 |
#44 | anon-sessions-11.patch | 50.08 KB | c960657 |
Comments
Comment #1
c960657 CreditAttribution: c960657 commentedA setting should probably have three different values:
The latter option will allow $_SESSION to work for anonymous users, but the session will only be written if a module actually uses this for anonymous users.
Comment #2
moshe weitzman CreditAttribution: moshe weitzman commentedI agree that supporting those 3 modes is a great idea.
I wrote #3 a while back but it was committed and then rolled back because we broke the 'who's online' and throttle modules. Throttle is out of core so we don't need to deal with that. We can just note incompatibity in the UI with Who's Online. See the first patch at #40545: Improve speed by avoiding unnecessary updates in sess_write()
Comment #3
c960657 CreditAttribution: c960657 commentedThis implements a variation of #3. It tries not to start a session and set a session cookie unless necessary. If a session cookie already exists, the session is started automatically like today. But if an anonymous user hits a cached page or a page that is saved to the cache, no session is started and no session cookie set.
If output buffering is not active (i.e. when caching is CACHE_DISABLED, or REQUEST_METHOD is POST etc.) the session is also started like today, because we cannot be sure whether something is written to $_SESSION after the HTTP headers have been sent.
The idea is not only to avoid writing empty anonymous sessions to the database but also to allow better caching in reverse (and forward) proxies using the Vary header as discussed in #147310: Implement better cache headers for reverse proxies (from comment #34 and forward). Users with no cookies will be able to get pages from the HTTP proxy without hitting the web server. This offers much better performance than any caching done inside PHP/Drupal.
The downside—apart from the added complexity to the bootstrap code—is that modules should call drupal_session_start() (new function added by the patch) prior to modifying $_SESSION.
What do you think of this approach?
Comment #4
moshe weitzman CreditAttribution: moshe weitzman commentedThanks c9. Could we do this without the need to call drupal_session_start()?
The issue that I meant to link to is #44947: Optimize session handling. See my patch in #16, which was later rolled back in that issue because of Throttle and Who's Online. Throttle is gone, and Who's online is barely significant, so I think we should revisit this. Specifically look at the line in sess_write() that reads
+if ($user->uid >= 1 || $value) {
.It is quite possible though that your patch is tuned to avoiding a session altogether whereas mine simply aimed to avoid a DB write into session table.
Also, the change below is a bit alarming IMO. Many modules rightfully use sessions for anon users. I'm unsure if we can afford to exclude these from page cache. It might be the proper thing to do, but still ....
Comment #5
c960657 CreditAttribution: c960657 commentedYes, my patch just uses the current optimization in _sess_write() that prevents sessions from being written unless there is a session cookie. I must admit that I have focused on getting rid of the session cookie and haven't looked much into _sess_write(), so the two patches may actually be combined. I can split the "get rid of session cookie" part into a separate issue, if you think.
Could you give some examples on how sessions are used in modules? I would assume that if a module puts something in $_SESSION for an anonymous user, then the page output of the following pages is not necessarily identical to that of a completely anonymous user with an empty $_SESSION and thus shouldn't be cached.
Comment #6
attiks CreditAttribution: attiks commentedsubscribing
Comment #7
Damien Tournoud CreditAttribution: Damien Tournoud commentedI'm sorry, but we already do what's described in point 3 of comment #1...
More over, on high traffic site, you should better use a remplacement session handling, like the one offered by Memcached Integration.
Comment #8
moshe weitzman CreditAttribution: moshe weitzman commented@Damien - Your excerpt requires that users have no cookie in order to avoid a write query. In practice, only web crawlers and (first time visitors && first page view) meet that requirement. This issue is about removing the empty cookie requirement.
Examples of anonymous session usage:
- Comment module allows anon users to store their name, email, and homepage in the session. thus the comment form is filled for them when they return. This info obviously only matters on the comment form.
- Ubercart almost certainly tracks the contents of your shopping cart for anonymous users. In most implementations, these contents are not reflected in the page rendering. I'll acknowledge that sometimes they are. But the problem with your patch is that we have a situation whereby Ubercart sites can rarely use the page cache.
Comment #9
c960657 CreditAttribution: c960657 commentedThe replacement of
count(drupal_set_message()) == 0
withempty($_SESSION)
is not essential to the patch and can be omitted without problems. It was just an attempt to generalize the check, i.e. check not only $_SESSION['messages'] but all of $_SESSION.How does Drupal know that a page displaying the contents of an anonymous user's shopping cart should not be saved to the cache and displayed to other anonymous users?
BTW, the comment module appears to be using cookies and client-side script instead of $_SESSION, i.e. the personal information is not contained in the HTML page. This allows the page containing the comment form to be cached.
Comment #10
moshe weitzman CreditAttribution: moshe weitzman commentedYou make a good point. Lets get some more reviews. This is looking reasonable to me ...
Comment #11
c960657 CreditAttribution: c960657 commentedUpdated patch - now aggressively destroy session and delete the session cookie for anonymous users with empty $_SESSION.
Initial experiments with this patch and some Vary headers show promising results :-)
Comment #12
moshe weitzman CreditAttribution: moshe weitzman commentedI like this patch a lot. Unfortunately, I am seeing a lot of test failures. Here are a few .. The committters will not review this until the tests pass.
Blog functionality 166 passes, 48 fails, 2 exceptions
Blog API functionality 52 passes, 1 fail, 0 exceptions
Book functionality 105 passes, 13 fails, 0 exceptions
Comment interface 182 passes, 9 fails, 0 exceptions
Comment on a node page 64 passes, 2 fails, 0 exceptions
Comment on a node page 128 passes, 83 fails, 6 exceptions
Comment approval 140 passes, 42 fails, 3 exceptions
Site-wide contact form 218 passes, 11 fails, 0 exceptions
Personal contact form 80 passes, 29 fails, 0 exceptions
Comment #13
c960657 CreditAttribution: c960657 commentedI guess I was a bit trigger happy there :-( This version passes all tests.
Comment #14
keith.smith CreditAttribution: keith.smith commentedMinor, but the next time there is a reroll, this comment:
has an extra "is" in it, I think.
Comment #15
moshe weitzman CreditAttribution: moshe weitzman commentedDoesn't apply anymore :(
Comment #16
c960657 CreditAttribution: c960657 commentedUpdated patch - now with some tests.
Still work in progress. We need to figure out some things:
- what about the "who's online" block?
- should the new behaviour be used by default when caching is enabled, or should it have a setting
Also _sess_write() may be optimized further.
Comment #17
c960657 CreditAttribution: c960657 commentedWRT the "who's online" block I propose the following:
Is that overdoing it?
Here are some less pervasive proposals:
Any comments on this?
Comment #18
moshe weitzman CreditAttribution: moshe weitzman commentedI like "Simply make the "Who's Online" block not display the anonymous user count when caching is enabled - and add a note about this on the "Who's Online" admin page.".
Comment #19
kbahey CreditAttribution: kbahey commentedSubscribe
Comment #20
catchI agree with Moshe in #18, although maybe put the message on the performance page (and only if the who's online block is enabled)?
Comment #21
mfbBy the way, I believe you could remove all other references to CURLOPT_HEADER in drupal_web_test_case.php if you're removing it from drupalHead().
Comment #22
sdboyer CreditAttribution: sdboyer commentedI like the direction here, and I'm glad I happened across this, because...I want to make sure there's no duplication - or worse, incompatible - work! #335411: Switch to Symfony2-based session handling. The need to protect the UID kicked off OO-ification of session handling.
There are two immediate advantages I can think of with grafting your work here into the OO-based approach to sessions:
- We could change the way $_SESSION gets accessed such that other modules/subsystems just make the calls to various session functions, and we mask all the figuring out of whether & how sessions should be enabled and handled within the class itself.
- The next logical step (I think) with 335411 is to make sessions pluggable, which would mean it'd be easy to clean the added weight out of bootstrap.inc and let it be handled cleanly by the subsystem.
I'm not meaning to end-run the work here - mostly just hoping to avoid either of us having to do a sticky reroll after a commit. You've probably already got a bit to do, now the #280934: Use httponly cookie support when available has gone in...
Comment #23
moshe weitzman CreditAttribution: moshe weitzman commented@c960657 - any chance you can updater this patch considering my comments in #18? I think we can develop independently from #335411: Switch to Symfony2-based session handling for now.
Comment #24
c960657 CreditAttribution: c960657 commentedNow the "Who's online" block does not show the number of guests when caching is enabled. I added a note about this on the block's settings page and on admin/settings/performance when the block is enabled.
The patch includes a rerolled (and slightly modified) version of the patch in #330582: Retrieve HTTP response headers that adds functionality to SimpleTest required by the session tests.
Requesting review in order to let the patch be tested by the testbot. However, #330582 needs to land prior to this issue.
Comment #25
Dries CreditAttribution: Dries commentedSome performance numbers (without proxy cache)? Are we talking about a 1-3% improvement, or 5% and more?
Either way, this patch will need to be rerolled because #330582: Retrieve HTTP response headers went in today.
Thanks.
Comment #26
c960657 CreditAttribution: c960657 commentedReroll.
I don't think the performance improvement without a proxy cache will be noticable unless you have a busy site with many anonymous users causing congestion in the users table.
To measure the performance impact you need a tool that can emulate a number of concurrent users sessions with individual cookie jars. I don't know of such a tool (Siege was touched briefly in #147310: Implement better cache headers for reverse proxies comments #39 and #40).
Comment #27
mfbpassing in an anonymous session cookie to apache bench -- as in
ab -c 10 -n 1000 -C SESS8f59e747e05a0b0ee1e79ba058dceea6=fc1175003fb154d8871a5bed93e557e5 http://localhost/d7/
-- gives me 65 requests per second. Without the session cookie I can do 81 requests per second. This is with drupal page cache set to "normal", no APC. So, it's a large performance improvement by this measure. It won't help with requests that already don't have a session cookie (first-time hits) or that do need a session cookie for anonymous users (e.g. shopping cart).Put varnish in front and it would be an even larger performance improvement.
minor suggestion, it might be cleaner to use session_get_cookie_params() to get the session cookie params in _sess_destroy_sid(), aside from lifetime?
Comment #28
mfbI tried to do some benchmarking of the patch itself in a situation where its optimization doesn't come into play, by adding some random data to an anonymous session and passing this session cookie into ab. Without intending to I hit the potential drawback discussed above: the page cache will no longer be used if the anonymous user has anything stored in the session.
I actually think this is ok; on a lot of sites with some kind of fancy interactivity I've had to disable the drupal page cache altogether to avoid issues. Example scenario with this patch: an anonymous user could be allowed to modify the filter at http://drupal.org/project/Installation+profiles and thereby start a session, and the page automatically wouldn't get cached. On the other hand, the user's page loads would then slow down as long as they kept the filter setting and session cookie, as the page cache wouldn't be used for anything else either.
Ideally developers could indicate if some particular session data should disable the page cache, or even what in particular could no longer use the cache.
Performance wise I think this patch would be a net improvement for many sites, assuming, for example, that most anonymous users browse an e-commerce site without adding items to their cart, i.e. relatively few anonymous users would be maintaining session data and get hit with the no-page-cache penalty. But as mentioned, this particular part of the patch is not actually required..
Comment #29
mfbI had a minor suggestion but didn't intend to change status.
Comment #30
c960657 CreditAttribution: c960657 commentedIn order alleviate this, we could decrease the session lifetime for anonymous users. An anonymous user probably wouldn't expect that he could put something in a shopping cart on one day and still find it there the next week. Or make a function that sets session variables with a lifetime, e.g.
drupal_session_variable_set($name, $value, $ttl)
. Or require modules to explicitly specify their cache dependencies in each pageview (or in hook_menu()) as discussed in #339958: Cached pages returned in wrong language when browser language used.However, whether to cache pages where $_SESSSION is non-empty is not crucial to the patch. The logic was just changed due to comments #9 and #10. Perhaps we should address this in a separate issue.
Comment #31
Dries CreditAttribution: Dries commentedmfb, thanks for benchmarking. Those numbers turn me on. ;-)
Could we add another sentence to this, please? Let's explain why we don't cache those requests so that people can better understand the code. If not in the phpDoc of the function declaration, maybe in the inline code comments from the function.
$bootstrap
seems like a weird name for the paragemer to drupal_get_cache(). Would $retrieve be better?Personally, I think the explicit calls to drupal_start_session() reduce the developer experience. It is yet another thing to be aware off. Can we make these more automatic? For example, I think it would be _easier_ to have a rule that says you can't write to $_SESSION directly, but that you have to use drupal_set_session() which will then start the session.
Thoughts?
Comment #32
moshe weitzman CreditAttribution: moshe weitzman commentedI agree that a drupal_set_session() is an improvement.
Comment #33
sdboyer CreditAttribution: sdboyer commentedI'd second Moshe's agreement that drupal_set_session() as a gateway to writing to $_SESSION is the preferable route. Actually, it's the only feasible thing I've come up with that really allows for lazy-instanciation of sessions since Crell suggested it be explored - #335411-17: Switch to Symfony2-based session handling. It ended up being a little knotty, so I'm looking forward to borrowing from this patch :)
Comment #34
Dries CreditAttribution: Dries commentedAs we're all on the same page, let's mark this CNW and explore a drupal_set_session() some more.
Comment #35
c960657 CreditAttribution: c960657 commentedThis patch adds
drupal_set_session($name, $value)
. This involves changes to a lot of modules, including upgrade functions that probably aren't covered by tests, so this needs manual testing. I have replaced all occurences (I hope) of$_SESSION['foo'] = $bah
, even where I know that there is a session, so that core always does the same thing the same way. All other access, including$_SESSION['foo']['bar'] = $bah
andunset($_SESSION['foo'])
is left unchanged (but with the necessary isset() checks added).I am not familiar with the coding pattern that is reused in dblog.admin.inc, node.admin.inc and user.admin.inc, so I don't know whether it is necessary to call
isset($_SESSION['node_overview_filter'])
everywhere.Do you think I should revert the change discussed in comments #9 and #10 and leave the discussion about when to cache a page to another issue?
When caching is not enabled, the session is always started during the bootstrap process. This is to ensure that the session is started before any output is sent. (It would be nice, if PHP offered a callback that was invoked when the headers were sent). I wonder whether we can postpone it even more. Most pages don't output anything until the print statement in index.php, but some do. Perhaps modules can return information about cacheability and session handling in the menu hook. Or perhaps this is a bigger project that should be dealt with in another issue.
Comment #36
mfbI don't have a problem with it, but if the change in logic for page caching is debatable at all then we should just consider it out-of-scope for this issue.
One thing I noticed about drupal_set_session() is that in many cases, there's nothing wrong with setting $_SESSION directly because it's a POST request, so it can be expected that the session already started.
Comment #37
alexanderpas CreditAttribution: alexanderpas commentedas we now have a
drupal_set_session()
we should also have adrupal_get_session()
Comment #38
c960657 CreditAttribution: c960657 commentedI have reverted it for now.
I actually used drupal_set_session() everywhere intentionally. I think it reduces general confusion, if the same pattern is used everywhere. Alternatively it may be necessary to prepend every occurrence of
$_SESSION['x'] = 1
with a comment explaining why we don't need to use drupal_set_session() in that specific case - and that doesn't make the code easier to read IMO.I added this for completeness, but I didn't use it anywhere. If you require all access to $_SESSION to go through drupal_get_session()/drupal_set_session(), you loose the ability to do stuff like
$_SESSION['x'][] = 1
, or$_SESSION['x']['y']['z'] = 1
.Comment #39
c960657 CreditAttribution: c960657 commentedReroll.
Comment #41
c960657 CreditAttribution: c960657 commentedReroll.
Comment #42
David Strausssubscribe
Comment #43
Dries CreditAttribution: Dries commentedLooks good. There is still a start_session() in update.php that _seems_ unnecessary.
I personally like "if (drupal_session_is_started())" better than "if (drupal_session_start(FALSE))".
This is one of my favorite patches at the moment. :)
Comment #44
c960657 CreditAttribution: c960657 commentedI think you are right about the drupal_start_session() in update.php. I removed it, and added a new function, drupal_session_is_started(), as suggested.
Comment #46
andypost_get & _set covers all needs
Example
Maybe it's good idea to put this in one with static caching
Comment #47
chx CreditAttribution: chx commentedI severely dislike session get/set functions. Can we merge / borrow ideas from #335411: Switch to Symfony2-based session handling instead?
Comment #48
Dries CreditAttribution: Dries commentedComment #49
moshe weitzman CreditAttribution: moshe weitzman commentedNice tag. I will try to review this. ... We allow spaces in tags, FYI.
Comment #50
c960657 CreditAttribution: c960657 commentedReroll.
This patch reverts the introduction of drupal_get_session()/drupal_set_session() and restores the explicit calls to drupal_session_start(). This leaves the decision of how to start the session implicitly to #335411: Switch to Symfony2-based session handling and allows this issue to proceed on its own. The few drupal_session_start() calls are easy to find and remove.
Comment #51
Dries CreditAttribution: Dries commentedI favor #44.
Let's wait for Moshe's review.
Comment #53
c960657 CreditAttribution: c960657 commentedReroll due to #8: Let users cancel their accounts. When a user account is deleted, the user session is terminated in the middle of a batch process. This breaks the batch progress and triggers a warning. The fix for this has the nice side-effect that you get a pretty error message when accessing a /batch?id=1 with a reference to an expired batch.
Comment #54
moshe weitzman CreditAttribution: moshe weitzman commentedI tested #53 thoroughly and it works as advertised. Code looks good as well. Some minor comment changes are listed below. I made sure the Session tests pass but we need a full green test run before committing.
So, the remaining question is about drupal_set_session(). Both Dries and I like it, but chx does not. chx and I are a wash, but I Dries is a pretty important vote. I recommend re-introducing that. Sorry - I know it is a big hassle.
If we decide to commit this without drupal_set_session(), I recommend adding the clause below to drupal_page_footer(). It is a warning to developers who used $_SESSION without calling drupal_start_session().
+ * $fetch is TRUE, only return either TRUE or FALSE.
. Should be $retrieve not $fetch+ * users when $_SESSION is non-empty. $_SESSSION may contain status messages
. Typo SESSSION+ // hook_init() is called latter in the bootstrap process, but not in cached
. Typo latter+ } else {
. Code style. Use 2 lines.Comment #55
c960657 CreditAttribution: c960657 commentedUpdated patch, now with drupal_set_session().
I added the suggested watchdog warning and an accompanying test.
Comment #56
moshe weitzman CreditAttribution: moshe weitzman commentedCould you add back drupal_get_session() as well? Thats useful if you want to fiddle with the array like your example
you loose the ability to do stuff like $_SESSION['x'][] = 1
. But then we have the problem of how to write the array back with drupal_set_session(). I think drupal_set_session() should take an optional 3rd param called $override which overwrites the existing $_SESSION. Like it? If not, just omit that suggestion since the calling code can certainly overwrite $_SESSION itself.Comment #57
c960657 CreditAttribution: c960657 commentedI'm not sure I understand your suggestion about $override. Could you give an example?
Comment #58
moshe weitzman CreditAttribution: moshe weitzman commentedIgnore that part of my request. It realize that it is not needed.
Comment #59
moshe weitzman CreditAttribution: moshe weitzman commentedAdded back in drupal_get_session() from c906557's earlier patch. RTBC now.
Comment #60
Dries CreditAttribution: Dries commentedThis looks good, although it seems like some of the code comments could be cleaned up a little bit. I'm running all tests now.
I'm also trying to remember why we don't always enable output buffering -- even in the case when caching is disabled. If output buffering where always enabled, we'd be able to streamline some code here.
Comment #61
Dries CreditAttribution: Dries commentedI've committed the current patch to CVS HEAD. Thanks. I'm going to mark this as 'code needs work' so we can talk about #60 a bit more -- or until we have created a new issue for #60.
Also, please update the upgrade instructions before settings this to 'fixed'.
Thanks all!
Comment #62
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedFrom the commit message:
User: dries Branch: HEAD Date: Mon, 19 Jan 2009 10:46:52 +0000
Added files:
/modules/simpletest/tests session.test.orig
This doesn't look right.
Comment #64
moshe weitzman CreditAttribution: moshe weitzman commentedI added upgrade docs - http://drupal.org/node/224333
Comment #65
c960657 CreditAttribution: c960657 commentedIt may be a performance optimization. Using output compression increases memory usage, because all output is stored in memory before being sent to the client. A Google search indicates that it isn't expensive CPU-wise, though. An often cited issue with output buffering is that big, slow pages aren't sent to the client until the whole page has been generated. However, this issue isn't relevant for Drupal, because here the whole output is usually built as a string that is printed in index.php. So it may not be a bad idea to always enable output buffering.
Comment #66
mfbOn my servers I have output buffering enabled via
output_buffering = 4096
in php.ini and never experienced any problems. I also did some benchmarking of this setting and with Drupal I didn't find much change in requests-per-second or memory usage, whereas some other PHP apps did have a small performance improvement.Comment #67
Dave ReidComment #68
aryanto CreditAttribution: aryanto commentedIs there any plan to backport this patch to 6.11? Or is there anybody using similar patch like this one in 6.11?
Comment #69
David StraussI'm interested in back-porting this to D6, but I wouldn't expect it to go into D6 proper.
Comment #70
aryanto CreditAttribution: aryanto commentedWow! It is great to know that somebody interested to backport this patch. But if that would not go to the main core, then the maintenance will be hard as in the worst case we have to manually patch Drupal 6 everytime we do the upgrade.
I know that this is not a place for discussion, but is there anyway we can override the cookie handling via a module?
I will open a topic in the forum to continue the discussion.
Comment #71
David Strauss@aryanto No, I'd roll it into Pressflow. It would be maintained by the Four Kitchens team.
Comment #72
Dries CreditAttribution: Dries commentedUpdating version number.
Comment #73
David StraussFor anyone interested, I am maintaining a backport of this to Drupal 5 and Drupal 6.
Comment #74
Damien Tournoud CreditAttribution: Damien Tournoud commentedFor the record, I haven't followed that issue too closely, but I think that the explicit drupal_set_session() is horrible DX-wise. I don't get why we can't simply automatically start the session just before outputting the content of the page (either by using output buffering or not), as soon as $_SESSION is not empty?
I'll open a separate issue to study this.
Comment #75
Carl Johan CreditAttribution: Carl Johan commentedI'm interested! I'd like to see if this is something we could use.
Comment #76
kbahey CreditAttribution: kbahey commentedFor Drupal 6, you can use this module http://drupal.org/project/no_anon
Comment #77
gpk CreditAttribution: gpk commentedFollow-up at #590092: Who's online block should never report anonymous users.
(And for the record the issue mentioned at #74 is #477944: Fix and streamline page cache and session handling.)
Comment #78
gpk CreditAttribution: gpk commentedAnother follow-up at #592482: Regression: data stored in $_SESSION in hook_boot() or hook_exit() during a cached page response is lost.
Comment #79
BenK CreditAttribution: BenK commentedJust need to track this issue....
Comment #80
awm CreditAttribution: awm commentedIs this still in progress?
Comment #81
gpk CreditAttribution: gpk commented@80: the original issue was dealt with in Jan 2009 (patch committed). It actually resulted in all anonymous sessions being disabled in Drupal 7.x, unless a session is specifically needed e.g. by a form POST.
Comment #82
samy.ranavela CreditAttribution: samy.ranavela commented53: anon-sessions-13.patch queued for re-testing.