With the memcached patches applied to store Drupal's caches in memcache, sessions moved into memcache, and aggressive caching enabled, it becomes nearly possible to serve cached pages to anonymous users directly from RAM without hitting the database. Unfortunately, the bootstrap process prevents this, because DRUPAL_BOOTSTRAP_SESSION and DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE come after DRUPAL_BOOTSTRAP_DATABASE and DRUPAL_BOOTSTRAP_ACCESS.

To get around this, the attached patch, which was sponsored by PageSix.com, simply reorders the bootstrap process. It shuffles the phases from the current order:
DRUPAL_BOOTSTRAP_CONFIGURATION
DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE
DRUPAL_BOOTSTRAP_DATABASE
DRUPAL_BOOTSTRAP_ACCESS
DRUPAL_BOOTSTRAP_SESSION
DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE

To:
DRUPAL_BOOTSTRAP_CONFIGURATION
DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE
DRUPAL_BOOTSTRAP_SESSION
DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE
DRUPAL_BOOTSTRAP_DATABASE
DRUPAL_BOOTSTRAP_ACCESS

(This patch requires that aggressive caching be enabled, as _init hooks frequently involve a database query.)

Finally, there is one piece of ugliness to deal with: when the variables array is flushed from the cache, we manually force a database bootstrap out of order. The end result is the attached, rather small patch and a website that can serve anonymous cached pages directly from memcached, which is particularly significant for a high-traffic infrastructure with many web servers.

Some simple benchmarks I ran showed a very significant performance boost.

--

Though this is accomplished with a small patch, it's not something that could be merged into core as is. Toward that aim, perhaps the database bootstrap could become a conditional event that is automatically triggered on the first db_query call?
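
A minimal sketch of the "bootstrap the database on the first query" idea, written as standalone PHP rather than Drupal code. The names ExampleLazyDatabase and example_serve_page are hypothetical and the connection is only simulated; the point is that a cache hit never triggers the database setup, while the first query on a cache miss does.

```php
<?php
// Hypothetical sketch: none of these names are Drupal core API.

/**
 * Stand-in for a database layer that connects only when first queried.
 */
class ExampleLazyDatabase {
  /** @var bool TRUE once the simulated connection has been set up. */
  public $connected = FALSE;
  /** @var int How many times the simulated bootstrap ran. */
  public $bootstraps = 0;
  /** @var array Queries received so far. */
  public $log = array();

  /** Simulates the work of the database bootstrap phase. */
  protected function bootstrap() {
    $this->bootstraps++;
    $this->connected = TRUE;
  }

  /** Runs the bootstrap on demand, then records the query. */
  public function query($sql) {
    if (!$this->connected) {
      $this->bootstrap();
    }
    $this->log[] = $sql;
    return count($this->log);
  }
}

/**
 * Serves a page from an in-memory cache if possible, else "builds" it.
 */
function example_serve_page(ExampleLazyDatabase $db, array $page_cache, $cid) {
  if (isset($page_cache[$cid])) {
    // Cache hit: the database is never touched.
    return $page_cache[$cid];
  }
  // Cache miss: the first query triggers the bootstrap.
  $db->query('SELECT 1');
  return 'built: ' . $cid;
}
```

With this shape the bootstrap order no longer has to decide up front whether the database is needed; the cost is paid only by requests that actually issue a query.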

Perhaps there are better ways to accomplish this end goal? I'm very interested in seeing some discussion both on any potential pitfalls with this existing patch, as well as ways to generically improve the bootstrap process to allow pages to be served without bootstrapping the database.

Comment  File             Size     Author
#1       bootstrap.patch  2.78 KB  Jeremy

Comments

Jeremy’s picture

Attached file: bootstrap.patch (2.78 KB)

It will likely be easier to review the patch if it's attached...

chx’s picture

Why are you not using DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE? Maybe with a cookie which indicates whether the user is expected to be anonymous?

firebus’s picture

Project: Advanced cache » Memcache API and Integration

this sounds promising, but i think you want it in the memcache project, and not the advcache project...

many people use them both together, but advcache helps a lot even without memcache.

i'm concerned about the requirement to use aggressive caching - could we add a statement to check if aggressive caching is enabled and then reorder the phases conditionally?
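
A rough sketch of what such a conditional reorder could look like, in standalone PHP. The phase names mirror the DRUPAL_BOOTSTRAP_* constants listed in the issue, but example_bootstrap_phases() and the EXAMPLE_CACHE_* values are hypothetical. It also assumes the cache mode can be read without the database (for instance from a settings file override or a cached variables array), since otherwise the check itself would need a query.

```php
<?php
// Hypothetical sketch: the function and constants are illustrative only.

define('EXAMPLE_CACHE_DISABLED', 0);
define('EXAMPLE_CACHE_NORMAL', 1);
define('EXAMPLE_CACHE_AGGRESSIVE', 2);

/**
 * Returns the bootstrap phase order for a given cache mode.
 *
 * Only aggressive caching gets the reordered phases, because in the other
 * modes hooks that may query the database run while a cached page is served.
 */
function example_bootstrap_phases($cache_mode) {
  if ($cache_mode == EXAMPLE_CACHE_AGGRESSIVE) {
    // Session and late page cache run before the database is set up.
    return array(
      'CONFIGURATION',
      'EARLY_PAGE_CACHE',
      'SESSION',
      'LATE_PAGE_CACHE',
      'DATABASE',
      'ACCESS',
    );
  }
  // Stock order: database and access checks come first.
  return array(
    'CONFIGURATION',
    'EARLY_PAGE_CACHE',
    'DATABASE',
    'ACCESS',
    'SESSION',
    'LATE_PAGE_CACHE',
  );
}
```

Sites that do not enable aggressive caching would then keep the stock order and see no change in behaviour.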

Jeremy’s picture

Project: Memcache API and Integration » Advanced cache

chx: flexibility. I'd like to see the Drupal 7 bootstrap process be more flexible and aware of the various alternative methods for storing data -- the current hard coded bootstrap order assumes (enforces) that sessions and variables are in the database. This is no longer the only way things are done... With the above patch, you're able to get page_cache_fastpath performance while utilizing standard Drupal logic to determine the logged in status of users, etc.

firebus: This flow change should be beneficial to other projects beyond memcache -- anyone that has implemented session and variable caching outside of the database (hence why it is in the advanced cache project, where I assume more people who are interested in this area of the code are watching). And yes, the patch needs to be more intelligent so that it does not impose requirements such as aggressive caching, but currently it's more a proof of concept around which I was hoping to see a little discussion.

Jeremy’s picture

Title: Serve pages from memcached w/o hitting database » Serve pages from cache w/o hitting database

Updating title to reflect that this concept should work for other caching methods beyond memcached.

firebus’s picture

advcache, despite the name, is not "a module for anything advanced about caching", but rather a module for caching things that core drupal doesn't cache, using standard core caching code.

thus we have extra caching for nodes, comments, searches, forums objects, block objects etc.

serving cached pages without hitting the db isn't interesting to the advcache module. boost, memcache, and perhaps fastpath_fscache would benefit from this change.

it might be an interesting change to make to core.

Jeremy’s picture

Yes, I would very much like to rework this into something that could be merged into core in Drupal 7. My current focus is on improving the performance of existing caches on high-traffic websites, not just adding new caches.

One example that benefits from this patch is another patch I've posted at http://drupal.org/node/230290, which removes page cache flushing and allows cached pages to be served to all but the first request for an expired cache entry; that first request rebuilds it. If this is all not relevant to the advcache project, where would you suggest these efforts are best focused?

firebus’s picture

for this particular issue, i think either the memcache project, or drupal core would be a better place to discuss it.

memcache project probably has a higher likelihood of accepting the patch quickly, and also making it available to more people running 5 and 6, providing a proof of concept for 7, etc.

perhaps robertDouglass, who co-maintains both memcache and advcache, has an opinion...

let me comment on the second issue over there...

robertDouglass’s picture

@Jeremy: glad to see your face in this neck of the woods =)

I too would like to see the D5/6 solutions (advcache/memcache) focus on a DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE strategy for sidestepping db interaction in sessions. However, I can't claim to have researched the topic properly. Your work on this is greatly appreciated and the suggested approach here will be studied appropriately.

m3avrck’s picture

subscribing...

slantview’s picture

I am very interested in this as well. Part of the problem with session handling in core is that sess_read() in session.inc hits the DB to join the session data with the users table.

If we don't find the user in the database we use drupal_anonymous_user() which creates an anonymous user without loading it from the database.

Here is the code:

<?php
  // Otherwise, if the session is still active, we have a record of the client's session in the database.
  $user = db_fetch_object(db_query("SELECT u.*, s.* FROM {users} u INNER JOIN {sessions} s ON u.uid = s.uid WHERE s.sid = '%s'", $key));

  // We found the client's session record and they are an authenticated user
  if ($user && $user->uid > 0) {
    // This is done to unserialize the data member of $user
    $user = drupal_unpack($user);

    // Add roles element to $user
    $user->roles = array();
    $user->roles[DRUPAL_AUTHENTICATED_RID] = 'authenticated user';
    $result = db_query("SELECT r.rid, r.name FROM {role} r INNER JOIN {users_roles} ur ON ur.rid = r.rid WHERE ur.uid = %d", $user->uid);
    while ($role = db_fetch_object($result)) {
      $user->roles[$role->rid] = $role->name;
    }
  }
  // We didn't find the client's record (session has expired), or they are an anonymous user.
  else {
    $session = isset($user->session) ? $user->session : '';
    $user = drupal_anonymous_user($session);
  }

  return $user->session;
?>

The problem, however, is that we still hit the database for a join on the sessions table, and we still need the sessions table for $_SESSION data.

I will respond with a few more suggestions after I take a deeper look at this patch and come up with some ideas, but I feel very strongly that we should be able to serve cached pages via memcache for anonymous users without having to touch the database at all, so I am giving this a strong +1, but I need a little more time to take a look at some options.

-s

cabbiepete’s picture

subscribing

jojje’s picture

I am also very interested in this issue. We are currently planning to build a high traffic site with Drupal and I have been looking into performance, trying to minimize the database traffic. The best option I have discovered so far is using memcache for cache and sessions, but it still leaves some database queries for access control, as you mentioned. In my opinion this is a very important feature for high traffic sites. If not included in core or some of the cache modules we will have to build our own solution or maybe contribute some work to the existing modules.

Another idea that would work for us is to make access control configurable (on/off). The access table, in our case, will always be empty. Any banning of hosts or IP addresses is done in the firewall.
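
A small sketch of such an on/off switch, in standalone PHP. The setting name 'access_check_enabled' and the function example_access_phase() are hypothetical, and the database-backed lookup of banned hosts is replaced by a callback so the example can run on its own.

```php
<?php
// Hypothetical sketch: 'access_check_enabled' is an invented setting name.

/**
 * Decides whether a host may proceed past the access phase.
 *
 * @param array $settings
 *   Site settings; the check is skipped unless 'access_check_enabled' is set.
 * @param string $host
 *   The client host or IP address.
 * @param callable $is_denied
 *   Stand-in for the database-backed lookup; returns TRUE for banned hosts.
 *
 * @return bool
 *   TRUE if the request may continue.
 */
function example_access_phase(array $settings, $host, $is_denied) {
  if (empty($settings['access_check_enabled'])) {
    // Access control is switched off: no lookup, so no database query.
    return TRUE;
  }
  return !call_user_func($is_denied, $host);
}
```

With the switch off, the lookup callback is never invoked, which is the property a site that bans hosts at the firewall would rely on.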

mcurry’s picture

Subscribing

Omeyocan’s picture

Hello,

I just tried to contact Robert Douglass a few minutes ago when I bumped into this thread.

If you add this to memcache.module:

if (variable_get('page_cache_fastpath', 0)) {
  /**
   * Implementation of hook_user().
   */
  function memcache_user($op, &$edit, &$account, $category = NULL) {
    switch ($op) {
      case 'login':
        // Cookie used to find out whether the user is logged in.
        // $account is the user who has just logged in.
        setcookie('drupal_uid', $account->uid, time() + (60 * 60 * 24 * 30), '/');
        break;

      case 'logout':
        // Clear the cookie by expiring it.
        setcookie('drupal_uid', '', time() - 60, '/');
        break;
    }
  }
}

and if you add this function to dmemcache.inc:

/**
 * Main callback from the DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE phase.
 *
 * This enables us to get the cached page from memcache, avoiding db calls.
 */
function page_cache_fastpath() {
  global $base_root;

  // If $_POST is not empty, the user has submitted a form (e.g. a comment was
  // posted), so we don't serve the page from the cache and instead let Drupal
  // process the form submission. If the 'drupal_uid' cookie is set, a logged-in
  // user is viewing the page, so again we don't serve it from the cache.
  if (!empty($_POST) || !empty($_COOKIE['drupal_uid'])) {
    return FALSE;
  }

  // Anonymous user and no form submission: look for a cached page.
  $cache = cache_get($base_root . request_uri(), 'cache_page');
  if (!empty($cache)) {
    // Display the cached page and tell the bootstrap to stop.
    drupal_page_header();
    if (function_exists('gzencode')) {
      header('Content-Encoding: gzip');
    }
    print $cache->data;
    // To check that the fast path is being used, comment out the previous
    // line and uncomment this one:
    // print gzencode("<!-- from early page cache -->");
    return TRUE;
  }
  return FALSE;
}

With these changes, the early page cache is served from memcached... at least, I did some testing and it appears to work correctly.

If you use memcache.db.inc (with db fallback), you'll have to add a check that the db connection exists; otherwise you'll get a fatal error.
Doing it like this seems to work:

function cache_get($key, $table = 'cache') {
  global $user, $active_db;
  ....
  // Look for a database cache hit.
  // Check first if the db connection exists, since in the early page cache bootstrap phase (DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE) we have no db connection.
  if ($active_db && $cache = db_fetch_object(db_query("SELECT data, created, headers, expire, serialized FROM {". $table ."} WHERE cid = '%s'", $key))) {
....

So far, I think it works, but any remarks or improvements are welcome of course. The above was heavily inspired by the fastpath_fscache module.

kind regards
Jonas

www.php-professionals.com
www.omeyocan.net

Omeyocan’s picture

Hello,

Right before flushing the db cache table a check for an active db connection should be done too (see line 7 in the next code snippet).

function cache_get($key, $table = 'cache') {
  global $user, $active_db;

  // Garbage collection necessary when enforcing a minimum cache lifetime
  $cache_flush = variable_get('cache_flush', 0);
  // Check first if the db connection exists, since in the early page cache bootstrap phase (DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE) we have no db connection.
  if ($active_db && $cache_flush && ($cache_flush + variable_get('cache_lifetime', 0) <= time())) {
    // Time to flush old cache data
    db_query("DELETE FROM {". $table ."} WHERE expire != %d AND expire <= %d", CACHE_PERMANENT, $cache_flush);
    variable_set('cache_flush', 0);
  }

  // If we have a memcache hit for this, return it.
  if ($cache = dmemcache_get($key, $table)) {
    return $cache;
  }

  // Look for a database cache hit.
  // Check first if the db connection exists, since in the early page cache bootstrap phase (DRUPAL_BOOTSTRAP_EARLY_PAGE_CACHE) we have no db connection.
  if ($active_db && $cache = db_fetch_object(db_query("SELECT data, created, headers, expire, serialized FROM {". $table ."} WHERE cid = '%s'", $key))) {
 ...

regards,
Jonas

www.php-professionals.com
www.omeyocan.net

jbeall’s picture

Subscribing... this is something I've been trying to do...