With the memcached patches applied to store Drupal's caches in memcache, sessions moved into memcache, and aggressive caching enabled, it becomes nearly possible to serve anonymous cached pages directly from RAM without hitting the database. Unfortunately, the bootstrap process prevents this, because DRUPAL_BOOTSTRAP_SESSION and DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE come after DRUPAL_BOOTSTRAP_DATABASE and DRUPAL_BOOTSTRAP_ACCESS.
To reach this end, the attached patch, which was sponsored by PageSix.com, simply re-orders the bootstrap process. We shuffle things from the current order:
DRUPAL_BOOTSTRAP_CONFIGURATION
DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE
DRUPAL_BOOTSTRAP_DATABASE
DRUPAL_BOOTSTRAP_ACCESS
DRUPAL_BOOTSTRAP_SESSION
DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE
To:
DRUPAL_BOOTSTRAP_CONFIGURATION
DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE
DRUPAL_BOOTSTRAP_SESSION
DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE
DRUPAL_BOOTSTRAP_DATABASE
DRUPAL_BOOTSTRAP_ACCESS
(This patch requires that aggressive caching is enabled, as _init hooks frequently involve a database query.)
Finally, there is one piece of ugliness to deal with: when the variables array has been flushed from the cache, we manually force a database bootstrap out of order. The end result is the attached, rather small patch and a website that can serve anonymous cached pages directly from memcached, which is particularly significant for high-traffic infrastructure with many web servers.
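A rough model of the reordered flow, written as standalone PHP and not taken from the attached patch. The phase names echo the DRUPAL_BOOTSTRAP_* constants, but `sketch_bootstrap()` and the cache keys are hypothetical, and the sketch assumes the variables array is first needed in the late page cache phase:

```php
<?php
// Illustrative model only. Phase names echo the DRUPAL_BOOTSTRAP_* constants;
// sketch_bootstrap() and the cache keys are hypothetical, not from the patch.

function sketch_bootstrap(array &$cache) {
  // The reordered phase list described above.
  $phases = array('configuration', 'early_page_cache', 'session',
                  'late_page_cache', 'database', 'access');
  $trace = array();
  $db_ready = FALSE;

  foreach ($phases as $phase) {
    if ($phase == 'database') {
      if ($db_ready) {
        continue; // Already bootstrapped out of order; do not repeat it.
      }
      $db_ready = TRUE;
    }
    $trace[] = $phase;

    if ($phase == 'late_page_cache') {
      if (!isset($cache['variables'])) {
        // The variables array was flushed: force the database bootstrap
        // early so the variables can be reloaded and cached again.
        $db_ready = TRUE;
        $trace[] = 'database (forced)';
        $cache['variables'] = array('cache' => 2); // 2 stands in for aggressive caching.
      }
      if (isset($cache['page'])) {
        $trace[] = 'served from cache';
        return $trace;
      }
    }
  }

  $trace[] = 'page built';
  return $trace;
}

// Warm cache: the database phase is never reached.
$warm = array('variables' => array('cache' => 2), 'page' => '<html></html>');
print implode(' > ', sketch_bootstrap($warm)) . "\n";

// Variables flushed: one forced database bootstrap, then the cached page.
$flushed = array('page' => '<html></html>');
print implode(' > ', sketch_bootstrap($flushed)) . "\n";
```

In this model the forced bootstrap is the only path on which a cached page costs a database connection, which is the "ugliness" mentioned above.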
Some simple benchmarks I ran showed a very significant performance boost.
--
Though this is accomplished with a small patch, it's not something that could be merged into core as is. Toward that aim, perhaps the database bootstrap could become a conditional event that is automatically triggered on the first db_query call?
Perhaps there are better ways to accomplish this end goal? I'm very interested in seeing some discussion both on any potential pitfalls with this existing patch, as well as ways to generically improve the bootstrap process to allow pages to be served without bootstrapping the database.
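The idea of triggering the database bootstrap on the first query could look roughly like the following. This is a minimal sketch under stated assumptions: `SketchLazyDb` and `sketch_serve_page()` are invented names, not Drupal's `db_query()` or anything in the patch, and the "connection" is a placeholder string.

```php
<?php
// Hypothetical stand-in for a lazily bootstrapped database layer. This is not
// Drupal's db_query(); the class and function names are invented for the sketch.

class SketchLazyDb {
  public static $connects = 0;
  private static $connection = NULL;

  // Connects on first use only, then reuses the handle.
  public static function query($sql) {
    if (self::$connection === NULL) {
      // A real site would load its database include and open the connection here.
      self::$connection = 'connection handle';
      self::$connects++;
    }
    return 'result of: ' . $sql;
  }
}

// Returns a cached page untouched, or builds one with a query.
function sketch_serve_page($cached_page) {
  if ($cached_page !== NULL) {
    return $cached_page; // Cache hit: the database is never bootstrapped.
  }
  return SketchLazyDb::query('SELECT nid FROM node');
}

sketch_serve_page('<html>cached</html>');
print SketchLazyDb::$connects . "\n"; // prints 0: no connection opened yet

sketch_serve_page(NULL);
sketch_serve_page(NULL);
print SketchLazyDb::$connects . "\n"; // prints 1: connected once, on the first query
```

With this shape the bootstrap order no longer matters for the database: a request that never issues a query never pays for a connection.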
| Comment | File | Size | Author |
|---|---|---|---|
| #1 | bootstrap.patch | 2.78 KB | Jeremy |
Comments
Comment #1
Jeremy commented: It will likely be easier to review the patch if it's attached...
Comment #2
chx commented: Why are you not using DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE? Maybe with a cookie which indicates whether the user is expected to be anonymous?
Comment #3
firebus commented: this sounds promising, but i think you want it in the memcache project, and not the advcache project...
many people use them both together, but advcache helps a lot even without memcache.
i'm concerned about the requirement to use aggressive caching - could we add a statement to check if aggressive caching is enabled and then reorder the phases conditionally?
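One way such a conditional check might be sketched, as standalone PHP. `sketch_bootstrap_order()` and `SKETCH_CACHE_AGGRESSIVE` are hypothetical names, and the sketch assumes the cache mode can be read without the database (for example from a `$conf` override in settings.php), since otherwise the check itself would need the database first:

```php
<?php
// Assumption: 2 matches the value of Drupal's CACHE_AGGRESSIVE constant.
define('SKETCH_CACHE_AGGRESSIVE', 2);

// Returns the bootstrap phase order to use for a given cache mode.
// Hypothetical helper, not part of the patch or of Drupal core.
function sketch_bootstrap_order($cache_mode) {
  $early = array('DRUPAL_BOOTSTRAP_CONFIGURATION', 'DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE');
  $database = array('DRUPAL_BOOTSTRAP_DATABASE', 'DRUPAL_BOOTSTRAP_ACCESS');
  $session = array('DRUPAL_BOOTSTRAP_SESSION', 'DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE');

  if ($cache_mode == SKETCH_CACHE_AGGRESSIVE) {
    // Aggressive caching: sessions and the page cache come before the database.
    return array_merge($early, $session, $database);
  }
  // Any other mode keeps the stock order.
  return array_merge($early, $database, $session);
}

print implode("\n", sketch_bootstrap_order(SKETCH_CACHE_AGGRESSIVE)) . "\n";
```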
Comment #4
Jeremy commented: chx: flexibility. I'd like to see the Drupal 7 bootstrap process be more flexible and aware of the various alternative methods for storing data -- the current hard-coded bootstrap order assumes (enforces) that sessions and variables are in the database. This is no longer the only way things are done... With the above patch, you're able to get page_cache_fastpath performance while utilizing standard Drupal logic to determine the logged-in status of users, etc.
firebus: This flow change should be beneficial to other projects beyond memcache -- anyone that has implemented session and variable caching outside of the database (hence why it is in the advanced cache project, where I assume more people who are interested in this area of the code are watching). And yes, the patch needs to be more intelligent so that it does not impose requirements, i.e. aggressive caching, but currently it's more a proof of concept around which I was hoping to see a little discussion.
Comment #5
Jeremy commented: Updating title to reflect that this concept should work for other caching methods beyond memcached.
Comment #6
firebus commented: advcache, despite the name, is not "a module for anything advanced about caching", but rather a module for caching things that core drupal doesn't cache, using standard core caching code.
thus we have extra caching for nodes, comments, searches, forums objects, block objects etc.
serving cached pages without hitting the db isn't interesting to the advcache module. boost, memcache, and perhaps fastpath_fscache would benefit from this change.
it might be an interesting change to make to core.
Comment #7
Jeremy commented: Yes, I would very much like to rework this into something that could be merged into core in Drupal 7. My current focus is on improving the performance of existing caches on high-traffic websites, not just adding new caches.
One example that gains from this patch is another that I've posted at http://drupal.org/node/230290, which removes page cache flushing and allows cached pages to be served to all requests except the first one for an expired cache entry, which rebuilds it. If none of this is relevant to the advcache project, where would you suggest these efforts are best focused?
Comment #8
firebus commented: for this particular issue, i think either the memcache project, or drupal core would be a better place to discuss it.
memcache project probably has a higher likelihood of accepting the patch quickly, and also making it available to more people running 5 and 6, providing a proof of concept for 7, etc.
perhaps robertDouglass, who co-maintains both memcache and advcache, has an opinion...
let me comment on the second issue over there...
Comment #9
robertDouglass commented: @Jeremy: glad to see your face in this neck of the woods =)
I too would like to see the D5/6 solutions (advcache/memcache) focus on a DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE strategy for sidestepping db interaction in sessions. However, I can't claim to have researched the topic properly. Your work on this is greatly appreciated and the suggested approach here will be studied appropriately.
Comment #10
m3avrck commented: subscribing...
Comment #11
slantview commented: I am very interested in this as well. Part of the problem with session handling in core is that sess_read() in session.inc hits the DB to join the session data with the user table.
If we don't find the user in the database we use drupal_anonymous_user() which creates an anonymous user without loading it from the database.
Here is the code:
The problem however is that we still hit the database for a join on the session table and we still need the session table for $_SESSION data.
I will respond with a few more suggestions after I take a deeper look at this patch and come up with some ideas, but I feel very strongly that we should be able to serve cached pages via memcache for anonymous users without having to touch the database at all, so I am giving this a strong +1, but I need a little more time to take a look at some options.
-s
Comment #12
cabbiepete commented: subscribing
Comment #13
jojje commented: I am also very interested in this issue. We are currently planning to build a high-traffic site with Drupal and I have been looking into performance, trying to minimize the database traffic. The best option I have discovered so far is using memcache for cache and sessions, but it still leaves some database queries for access control, as you mentioned. In my opinion this is a very important feature for high-traffic sites. If it is not included in core or in one of the cache modules, we will have to build our own solution or maybe contribute some work to the existing modules.
Another idea that would work for us is to make access control configurable (on/off). The access table, in our case, will always be empty. Any banning of hosts or IP addresses is done in the firewall.
Comment #14
mcurry commented: Subscribing
Comment #15
Omeyocan commented: Hello,
I had just tried to contact Robert Douglass a few minutes ago when I bumped into this thread.
If you add this to memcache.module:
and if you add this function to dmemcache.inc:
You'll have the early page cache serving from memcached. At least, I did some testing and it appears to work correctly.
If you use memcache.db.inc (with db fallback), you'll have to add a check that the db connection is there, otherwise you'll get a fatal error.
Doing it like this seems to work:
So far, I think it works, but any remarks or improvements are welcome of course. The above was heavily inspired by the fastpath_fscache module.
kind regards
Jonas
www.php-professionals.com
www.omeyocan.net
Comment #16
Omeyocan commented: Hello,
Right before flushing the db cache table a check for an active db connection should be done too (see line 7 in the next code snippet).
regards,
Jonas
www.php-professionals.com
www.omeyocan.net
Comment #17
jbeall commented: Subscribing... this is something I've been trying to do...