One of Drupal's biggest challenges is maintaining good performance for authenticated users (witness the struggles on Drupal.org). We already have a pretty solid (though simple) cache system for anonymous users, but a large proportion of the page load is still generated afresh for authenticated users. The reason for this is that caching for authenticated users is a much harder balancing act to get right - it is all too easy to build a fantastically sophisticated caching system, only to find that the overhead of the caching is greater than the original page load.

What is this patch?

This patch is an attempt to build a caching system for authenticated users that is as simple as possible, yet uses some (I hope!) elegant tricks to manage the complexity and keep things useful for 'client' modules (those modules that will actually use the caching). One of the hardest things with this is deciding between the 10 billion different possible approaches to per-user caching, so my aim here was to stick with the simplest option whenever possible - the caching logic (minus comments) is only about 70 lines of code. This way, we can code up some clients to use the caching and benchmark away to establish a baseline. Once we have this baseline we can experiment with the multitude of potential tweaks without needing to change the client API, and find the right balance of performance and simplicity for core.

How does this patch work?

The basic idea is to store caches in such a way that when a user requests a cache we know which of the various instances of that cache we should return. To do this, we take an md5 hash of each new cache that is set per user, and append it to the cache key. We then keep track of these hashes for the current user (bearing in mind the current language and path). When the user returns to request the cache, we can then figure out which hash we need to append to the key so that they get the same data that was cached.

The placeholder path (e.g. node/*/edit) is used, rather than the full path (e.g. node/23/edit), to limit the number of combinations stored per user - but please note that this does not mean the hash list changes on each request. Hashes are added cumulatively, so the list will contain the hashes for all the pages on this path that the user has visited. A hash is removed automatically if the cache item it refers to can no longer be loaded (i.e. it has been cleared, or has expired) - this prevents the build-up of 'cruft' in the list of hashes.

Note how this setup naturally groups caches that are identical between users - this greatly reduces the number of additional caches we need to store. For example, if user A caches 'hello' and user B also needs to cache 'hello', then the two will share the same cache row, whose key is the key specified with the hash (5d41402abc4b2a76b9719d911017c592) appended.
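
Here is a rough sketch of the mechanism. The names are made up for illustration and this is not the exact code in the patch - in particular, the real hash list is itself cached per user, language and placeholder path rather than held in a static:

<?php
// Illustrative sketch only - names and details differ from the patch.
function example_peruser_cache_set($cid, $data) {
  // Fingerprint the data's string representation, so identical data from
  // different users shares a single cache row (md5('hello') gives the
  // 5d41402a... hash mentioned above).
  $hash = md5(is_string($data) ? $data : serialize($data));
  // The stored row is simply the requested key with the hash appended.
  cache_set($cid . ':' . $hash, $data, 'cache_peruser');
  // Remember which hash this user needs for this key, so a later request
  // on the same placeholder path can rebuild the full key.
  $hashes = &example_peruser_hash_list();
  $hashes[$cid] = $hash;
}

function example_peruser_cache_get($cid) {
  $hashes = example_peruser_hash_list();
  if (isset($hashes[$cid])) {
    if ($cache = cache_get($cid . ':' . $hashes[$cid], 'cache_peruser')) {
      return $cache->data;
    }
  }
  return FALSE;
}

function &example_peruser_hash_list() {
  // In the patch this list is itself cached per user/language/placeholder
  // path; a static keeps this sketch self-contained.
  static $hashes = array();
  return $hashes;
}
?>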

One very common scenario for modules wanting to use the cache is that they are called multiple times over a page view - each time generating a new object or array. If they wanted to cache all of these (or some, if they are not all cacheable) it would mean implementing a static variable, and adding a hook_exit() function that pulls out this variable and adds it to the cache. This adds quite a lot of code to the function, and also risks changing the function's interface (to ensure it can be called with no parameters). To get around this, I have created a cache_collect() function, which simply adds each data item passed in to an array (based on the cache key). At the end of the page request each key's array is added to the cache automatically.
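
A hypothetical client might look something like this - the exact cache_collect() signature is a guess on my part, and mymodule_build_thing() is just a stand-in for some expensive per-item work:

<?php
// Hypothetical client code - the exact cache_collect() signature is a guess.
function mymodule_get_thing($id) {
  // Some expensive per-item work (stand-in function, not a real API).
  $thing = mymodule_build_thing($id);
  // Hand the item to the collector; the accumulated array for the
  // 'mymodule:things' key is written to the cache once at the end of the
  // request, so no static variable or hook_exit() is needed here.
  cache_collect('mymodule:things', array($id => $thing));
  return $thing;
}
?>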

At this point I suggest reading through the patch and then re-reading the above paragraphs to make sure you got all the details :)

What can be cached?

Well, this is similar to the current (flat) caching system, but with some enhancements and some caveats:

  • Can now be cached:
    • Things that vary per-user
    • Things that vary per-language
    • Things that vary per placeholder path (this is not of much practical use, it is really used just to help manage the size of the hash lists)
  • Should not be cached:
    • Things that vary on every path (the user will see the first-cached item on the other paths, rather than the correct item).
    • Things that have dynamic code, or are time-dependent (this is the same as with the existing cache)
  • Cached with care:
    • Things that change very often (the site admin needs control over the expiry time, with a default option to not cache).

Note that modules are still responsible for clearing (invalidating) their own caches, or setting appropriate expiry times. Modules should use the $wildcard parameter of cache_clear_all() to ensure that all per-user cache items are cleared when necessary.
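
For example (assuming the 'cache_peruser' bin used elsewhere in this issue, and a hypothetical 'mymodule:list:' key prefix):

<?php
// Clear every cache item whose key starts with 'mymodule:list:' - because
// the per-user variants only differ by the appended hash, the wildcard
// catches all of them.
cache_clear_all('mymodule:list:', 'cache_peruser', TRUE);
?>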

Benchmarking

Benchmarking is likely to be the hardest aspect of reviewing this patch. The usual ab/ab2 is really quite useless in this case, except for a quick check that the patch does not make things slower for anonymous users. Cookies or hacks can be used to simulate authenticated users, but without realistic load patterns we are unlikely to get meaningful results. Hence, it would be ideal if tools like siege with sproxy could be used to simulate a real site, with real data and real traffic - this will give us a much better idea of how this will work outside of the sandbox.

Please note that this is still 'concept' code in many respects - there is little point in seriously benchmarking it until we have some 'client' code actually using it. Here are three major opportunities:

  • Locale module: I already wrote a rough implementation here, which is included in the patch. I have tested it and it does work, although I am pretty sure there is still some code left behind from the old caching strategy that could be removed.
  • The dynamic object translation patch: This is one of the main use cases I had in mind here. Without some kind of user/page caching there is really no way this patch could get in core, and so modifying that patch so we can test it with this caching is important if we want to see dynamic object translation in core!
  • Block module: This would be some more work to implement, mainly because there are some blocks we will really never want cached. Adding the code to add blocks to the per-user cache is easy - the laborious part is extending block module so that hook_block() can return a $cache value (TRUE, FALSE or a timestamp) that determines how each block should be dealt with (rough sketch below).
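
A rough sketch of that last idea - this is the proposal, NOT the existing hook_block() caching API, and the mymodule_* helpers are stand-ins:

<?php
// Proposed usage only. Each block declares whether (and for how long) it
// may go in the per-user cache.
function mymodule_block($op = 'list', $delta = 0) {
  if ($op == 'view') {
    switch ($delta) {
      case 0:
        return array(
          'subject' => t('Recent items'),
          'content' => mymodule_recent_items(),
          // TRUE: cache this block per user until it is explicitly cleared.
          'cache' => TRUE,
        );
      case 1:
        return array(
          'subject' => t('Tip of the hour'),
          'content' => mymodule_random_tip(),
          // A timestamp: cache per user, but expire at this time.
          'cache' => time() + 3600,
        );
    }
  }
}
?>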

Eventually I am pretty confident that this caching could be used for many other things, but for now I think we should focus on these as 'the big three'.

Future possibilities

I think we should try to steer clear of these (in both discussion and code) until we have some basic benchmarks on the table. However, I want to document them so that people can see where this could head in the future.

  • We could experiment with the ways the hashes are stored and managed. Some ideas:
    • Storing the set of hashes for a user per actual path (rather than the placeholder path I use currently). Watch out for mega-long tables.
    • Storing the set of hashes in a single row per-user (with the path information part of a structured array). The risk here is that the array could get very large and hard to manage.
    • Storing the hashes in a structured table (separate columns for user, language, path) would probably be faster than simply building up the key. No major issues here, just extra complexity.
  • We could try preemptively loading the caches for a page (all in a single SQL statement). Easy to do, but could potentially end up loading things that aren't needed on the current page.
  • We could try and make things adaptive, so (for example) it can increase the 'detail' of path/user information stored to manage exceptions better. Major complexity risks here, but some opportunities if done right.

Todos

  1. Check this is really a sane direction
  2. Write up the 3 client implementations mentioned above
  3. Benchmarking - ideally with (simulated) real load patterns and data sets
  4. Try some tweaks based on the ideas above, and the results of the benchmarking so far
  5. More benchmarking, tweaks etc, etc, etc -> RTBC....

Comments

Crell’s picture

subscribing, no time to review now.

moshe weitzman’s picture

subscribe ... why is it better to store the strings per user as opposed to the current "short string" strategy?

i don't immediately know if this is sane or not. it adds a fair amount of complexity, which is unfortunate. the payback better be significant.

Owen Barton’s picture

@Moshe: Please note that the main aim of this patch is not to improve locale module performance - I merely chose that as a simple place to write an example implementation. The primary use case (although I think there should be many others) is the dynamic object translation API patch, because it cannot easily yield to the 'short string' strategy that we use for locale (the strings are generally much longer, and much more numerous).

To answer your question however, this is how it differs:

  • Locale using the 'short string' approach: ALL short strings (about 1900 in Drupal 5) are loaded for each page access (they are loaded en masse from a single cache entry, but they are loaded nonetheless), whether they are needed or not. Every long string (greater than 75 characters) incurs an additional SQL query each time it is used.
  • Locale using per user cache approach: On the first hit to each placeholdered path (e.g. node/*/edit) all the strings used on that path are loaded (one-by-one right now, but this could be changed easily) and cache_collect()-ed. On the second and subsequent visits to pages on the same path that same set of strings will be loaded automatically. Any missing strings (e.g. because the new page has some different strings to the previous, or there are new translations) are loaded and added to the cache automatically. Identical caches are shared between users (by using the hash), so we don't duplicate rows unnecessarily.

So why is the second approach better? Well, look at the memory used in the first approach:
- Before the string cache is loaded: 1,307,048 (1,313,088 max)
- After the string cache is loaded: 1,721,136 (1,783,232 max)
In other words - 400KB of RAM is used on each request to load all 1900 of these strings, even though on a single page only a fraction of these are needed (e.g. /admin - one of the heavier examples - uses about 50 strings). And that is *before* any contrib modules have been added - throw in your regular cck/views/etc/etc and I am pretty sure that number will soon double or triple. Note that the main rationale for building the new menu system was that it was using 800K of RAM.

There is a cost to the second approach too, of course - there is an additional single cache_get at the beginning of the page load to retrieve the hashes for the current user. There is also some logic to manage the locale cache (although this is not very far removed from the first approach), and the hash listings. However in the memory aspect at least it is massively better, only loading the strings used on the placeholdered path, which should be a fraction of the total list. There is also the complexity cost - but we are only talking about ~70 lines of code, which I don't think is a show stopper, if this works as it is supposed to.

It is quite possible that the second approach turns out to be less performant (despite the much better memory efficiency) than the first approach. If this is the case then no problem - we can ditch this part of the patch. However, if you look at the dynamic strings translation patch - where both the average string length and the number of strings could be an order of magnitude greater than in locale - I think it is clear that the first approach is just a non-starter there!

Owen Barton’s picture

Work on a block caching implementation of this API is over at http://drupal.org/node/80951

kbahey’s picture

Owen

Exciting stuff indeed.

Robert has the advcache module which does cache stuff for authenticated users. Perhaps combining both is a good thing?

Moshe is right about complexity, and hence if this is pluggable, that would be a good thing (remember that all of cache.inc is pluggable now).

Anyway, subscribing ...

Owen Barton’s picture

@Khalid - Yes, this is pluggable in the same simple way as with cache.inc - by specifying a different path to cache.peruser.inc using variable_set(). Note that this is a separate variable, so a module could override either or both.
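
Something along these lines - the variable name here is a guess for illustration, so check the patch for the actual one:

<?php
// 'cache_peruser_inc' is a guessed variable name - see the patch for the
// real one. It works like the existing 'cache_inc' override: point it at a
// replacement include file providing the per-user cache functions.
variable_set('cache_peruser_inc', './sites/all/modules/mymodule/mymodule.cache.inc');
?>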

dmitrig01’s picture

Status: Needs review » Needs work

If you say there is no way to review, then it can't be reviewed :)

Owen Barton’s picture

Status: Needs work » Needs review

Not ready for benchmarking !== not ready for review

Please read the issue :) ... I want to get a couple of reviews of the API and overall caching strategy before putting effort into writing 2 more client implementations (so we can try some detailed benchmarks). If people have good ideas on how the API or caching strategy could be improved then it would be better to know them before we have a lot of code depending on it.

Owen Barton’s picture

FileSize
13.81 KB

Updated patch

catch’s picture

Version: 6.x-dev » 7.x-dev
Status: Needs review » Needs work

Bumping this to Drupal 7; it no longer applies. This would be amazing, obviously. It'll also need to be looked at again now that block caching is in.

Owen Barton’s picture

A few revisions on this, in case anyone has time to update:
* Doing an array_merge in cache_collect is dumb - we should just set the elements instead (overwriting if we need to)
* I am not sure md5s of the string representation are the best approach for comparison - perhaps there is a neater way of fingerprinting the array.

FiReaNGeL’s picture

Something like this would be a great first enabler to work on http://drupal.org/node/300935

Owen Barton’s picture

Status: Needs work » Needs review

Updated patch (untested). I didn't update the locale implementation, or the coding standards, but it should be enough for people to see how it fits together in context. I switched from array_merge() to +, which is a lot faster - I haven't fully considered what the consequences of this are, however (and whether we should move more of the implementation out to the modules using the API and use a straight key setter).

Owen Barton’s picture

FileSize
29.05 KB
BioALIEN’s picture

Something is wrong with the above patch. I get ++ towards the end. Re-roll?

lilou’s picture

FileSize
10.73 KB

Yes, there is a wrong diff in cache.peruser.inc.

+ remove settings.php diff.

kbahey’s picture

Cool ...

I am a bit concerned about the extra parameter creep in the existing functions.

Can we work around that, e.g. rely on the variable_get(), or is it time to convert cache_set()/cache_get() arguments into one argument that is an associative array?

lilou’s picture

Status: Needs review » Needs work
FileSize
10.38 KB
Owen Barton’s picture

Status: Needs work » Needs review
FileSize
2.04 KB
10.95 KB

Here is an updated patch, with some code style fixes, and also an updated locale implementation patch (keeping them separate, for clarity). I have confirmed (although not exhaustively) that the patches are behaving correctly.

I would encourage anyone wanting to review this patch to read the issue description first - it is still accurate and explains a lot of the overall direction that would be hard to understand from reading the patch alone.

meba’s picture

subscribing

markus_petrux’s picture

Exciting challenge, indeed... subscribing

I'm also caching stuff for authenticated users on a D5 based site, but using a different approach that I described here:

http://groups.drupal.org/node/12951#comment-42082

The idea is that the whole page can be cached (not all pages, of course), and before sending it to other users the HTML is preprocessed to adjust the parts that depend on the user, such as nick, links, etc. The cache key is derived from path and roles, so the changes between users with the same set of roles are minimal. Also, the HTML is preprocessed before being cached to ease the work required to serve pages. When content is changed, cached pages are cleared. This is possible because cache items can be cleared using wildcards for the cache ID.

The problem with my approach is that HTML processing may use complex regular expressions that fully depend on the kind of content that is cached. So it is not a universal solution.

FiReaNGeL’s picture

If Drupal had knowledge of which bits (links, nicks, some blocks) are user specific, the approach described by markus would be easier. Maybe they could get declared by the developer, maybe Drupal could auto-detect them. Then the page could be cached and only those bits would be calculated before page render.

yonailo’s picture

I don't quite understand why you have to use the placeholder path to compute the hash value. As I see it, if you are appending the hash value to the key, the key already contains the actual path, and I think the number of rows would remain the same if you don't use the placeholder... can you explain why the placeholder is needed? Give me an example?

thanks a lot!

Owen Barton’s picture

@unknownguy: Good question - there are 2 parts to storing the caches:
1: Storing the caches themselves - these are keyed by hash, so yes, in this case there would be no increase in the number of rows if we store the full path (rather than the placeholder path).
2: Storing the lookups so we know which specific caches to load for a particular user (we just store a list of requested lookup keys that reference the hashes for the actual instance of that cache that the user needs). We store this per user, per language and per placeholder path. If we store it per full path then we get very low cache efficiency, because the user will normally be visiting many different pages rather than the same one again and again. Alternatively, if we drop the path completely then the lookup list gets too long (although we should benchmark this).
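
To put it concretely (hypothetical key layout, just to illustrate the two levels):

<?php
// Hypothetical key layout, only to illustrate the two storage levels above.
global $user, $language;
// 1. The cache rows themselves are shared between users and keyed by the
//    data hash, e.g. 'locale:en:5d41402abc4b2a76b9719d911017c592'.
// 2. The lookup list is stored per user, per language and per placeholder
//    path:
$lookup_cid = $user->uid . ':' . $language->language . ':node/*/edit';
// Its cached value maps each requested key to the hash that this user
// needs, e.g. array('locale:en' => '5d41402abc4b2a76b9719d911017c592').
?>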

yonailo’s picture

Thank you Grugnog2, I was not understanding the second point correctly. After reading your post everything is crystal clear to me.

I have been looking at the source code of the latest patch. I guess that I will have to create the "cache_peruser" table somehow, otherwise cache_get() and cache_set() will fail, am I right? I don't see the CREATE TABLE statements in the patch.

Good job!

Owen Barton’s picture

FileSize
12.89 KB

Patch including database creation/update (not sure how I missed these before!)
May not be 100% up to HEAD...I'll try and do this today

Owen Barton’s picture

Updated patch (no functional changes) and two additional sample implementations.

The block implementation is pretty useful because it means that per-user caching can be used quite a bit more (although not simultaneously with other caching flags) without running so much risk of exploding the cache_block table with zillions of entries.

The path implementation just uses the cache_collect() function (which I think might be best to split into a different patch). This obviously needs some benchmarking, especially around memory usage, since we will probably want to differentiate the caches based on path or other factors to reduce memory consumption.

Anonymous’s picture

Status: Needs review » Needs work

The last submitted patch failed testing.

jaydub’s picture

subscribing

FiReaNGeL’s picture

Issue tags: +Performance
FiReaNGeL’s picture

Status: Needs work » Needs review

Resetting status to retest, it's been a while

Status: Needs review » Needs work

The last submitted patch failed testing.

FiReaNGeL’s picture

Status: Needs work » Needs review
FileSize
14.28 KB

Reroll on latest HEAD - I just set up Eclipse to roll patches, so I hope the patch is ok and the test bot doesn't choke on it!

I fixed a very minor mistake in system.install - $schema['cache_peruser']['description'] instead of $schema['cache_registry']['description'] - copy paste error I guess.

I also merged the 3 small patches files in one.

Status: Needs review » Needs work

The last submitted patch failed testing.

FiReaNGeL’s picture

Status: Needs work » Needs review
FileSize
18.58 KB

Had to update the system_update # - also, the previous patch by Grugnog2 missed a file (cache.peruser.inc) - added to the latest patch.

Quick review: the patch seems to break the block system, even with block cache disabled.

Status: Needs review » Needs work

The last submitted patch failed testing.

FiReaNGeL’s picture

It's the first time in this issue that the patch has actually been tested - quite a lot of brokenness here:

403 functionality 2 fail
Block functionality 21 fail
Block functionality 1 fail
Blog functionality 4 fail
Forum functionality 4 fail
Menu item creation/deletion 35 fail 10 exceptions
Path alias functionality 10 fail 28 exceptions
Path aliases with translated nodes 7 fail 12 exceptions
Session tests 1 fail
User registration 7 fail 1 exception

Is there a way to know which tests failed? There's no such thing on the test results page (http://testing.drupal.org/pifr/file/1/peruser_9.patch since the link above is malformed).

Berdir’s picture

Status: Needs work » Needs review
FileSize
18.7 KB

Re-rolling..

The main issue was that the name of the block view has changed; I fixed this, and some of the manually selected tests now work correctly.

I haven't looked at the patch in detail, but I think this is really interesting, especially as we will maybe convert the main "page content" into a block.

- I'm not sure about the name of the cache table.. cache_peruser sounds .. not right.

Status: Needs review » Needs work

The last submitted patch failed testing.

moshe weitzman’s picture

The content of the page is now a block. Perhaps someone can revisit this.

moshe weitzman’s picture

Status: Needs work » Needs review
FileSize
15.77 KB

Reroll, without the path.inc implementation and the cache_collect() stuff. I used the most recent patch in #39 as my starting point. I still need to review it carefully. On first look, I don't see where anything in core uses the peruser system. The locale implementation was not in Owen's last patch. Let's set this to CNR so the bot can pound on it.

Dries’s picture

Issue tags: +Favorite-of-Dries

Tracking this one.

FiReaNGeL’s picture

Someone at Everyblock.com implemented a two-phased rendering mechanism in Django and reports that it was crucial for their site performance, since their traffic is mostly authenticated users. See it at http://www.holovaty.com/writing/django-two-phased-rendering/ . Basically you end up defining what is dynamic in a page (either time or user dependent) and cache the rest - you define what NOT to cache. The result is that on most pages > 90% of the page comes from the cache for all users, and you recalculate the rest on view (or get it from a per-user cache such as the one in this issue).

Take Drupal front page for example; only a few links are user-dependent (the number of new comments links, and 2 links on blocks in the right sidebar).

All users benefit from the cache after a single page view from one user; in the system Grugnog proposes, as I understand it, every user will have their first view of each different page generated and cached, which is slow. I think such a system would be superior to what is presented here (performance-wise, but not simplicity-wise).

That being said, this issue has a patch, and I sadly don't have the technical ability to code the caching system I described. If anyone wants to have a go at it, I think it would have a very big impact on Drupal's ability to sustain complex sites involving mostly authenticated users.

moshe weitzman’s picture

@FireaNG3L - I think there is a real need for two-phase render like you describe. I think the block system will be the main cache for authenticated users and the preprocess layer will do the preg_replace. The details of course remain to be discussed and coded. This patch has merit independent of that effort.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Version: 7.x-dev » 8.x-dev
carlos8f’s picture

subscribing

kongoji’s picture

Status: Needs work » Needs review

#34: peruser_8.patch queued for re-testing.

moshe weitzman’s picture

Status: Needs review » Closed (works as designed)

I think the render cache in Drupal 7 allows for a lot of whats desired here. Please reopen with new summary if you disagree.

Owen Barton’s picture

The render cache does indeed allow a lot of this. The other benefit of this approach was caching DRUPAL_CACHE_PER_USER and collapsing caches that are the same for different users (e.g. per-role or per-OG) into a single entry, without having to determine the key mapping manually. It would be very interesting to explore this kind of approach some more in combination with #636454: Cache tag support.