I feel the page caching system could do with some improvements as it doesn't work the way many would expect or like.

Right now it works in the following way when page caching is turned on:

Cache setting
Page doesn't exist in cache, so page is generated normally.

Reaches page_set_cache() function in common.inc

cache_set() is called with the expire argument set to the constant CACHE_TEMPORARY (-1 = remove at next cache wipe)

Page Loading
Page is accessed again. Since it exists in the cache, cache_get() is called; there is no check on the expire value since it's set to -1.

Cache Clearing
Every time cron runs or a node is saved, cache_clear_all() is called and, as long as enough time has passed since the last flush (based on the 'cache_lifetime' variable), ALL pages are cleared.
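
To make the three steps above easier to follow, here is a toy model of the flow as I've described it. It is a Python sketch, not Drupal code: the class and method names (PageCacheModel, set_page, get_page, clear_all) are made up for illustration, and the flush rule is simplified to "wipe everything if cache_lifetime seconds have passed since the last flush".

```python
CACHE_TEMPORARY = -1  # same value as the constant mentioned above


class PageCacheModel:
    """Toy model of the current page cache flow (hypothetical, not Drupal code)."""

    def __init__(self, cache_lifetime, now=0):
        self.cache_lifetime = cache_lifetime  # seconds; acts as a MINIMUM lifetime
        self.last_flush = now
        self.pages = {}

    def set_page(self, url, html, now):
        # Every page is stored with expire = CACHE_TEMPORARY.
        self.pages[url] = {"data": html, "created": now, "expire": CACHE_TEMPORARY}

    def get_page(self, url):
        # No expiry check: an entry is served for as long as it exists.
        entry = self.pages.get(url)
        return entry["data"] if entry else None

    def clear_all(self, now):
        # Triggered by cron runs and node saves. Wipes ALL pages, but only
        # once cache_lifetime has passed since the previous flush.
        if now - self.last_flush >= self.cache_lifetime:
            self.pages.clear()
            self.last_flush = now
            return True
        return False
```

Nothing in get_page() looks at the age of an entry, so a page only disappears when something calls clear_all() late enough.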

Issues with this approach
Pages can remain cached for a very long time. For instance, if you set cache_lifetime to 5 minutes but your cron only runs every 3 hours, pages will remain cached for up to 3 hours.
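
The arithmetic behind that example can be sketched as follows. This is a rough Python estimate, assuming cron is the only thing that triggers a flush (no node saves in between) and that a flush happens on the first cron run at which at least cache_lifetime seconds have passed since the previous flush. The function name worst_case_page_age is made up for illustration.

```python
import math


def worst_case_page_age(cache_lifetime, cron_interval):
    """Longest time (seconds) a page can stay cached if only cron flushes.

    A page cached right after a flush survives until the first cron run
    that happens at least cache_lifetime seconds after that flush.
    """
    runs_needed = max(1, math.ceil(cache_lifetime / cron_interval))
    return runs_needed * cron_interval
```

With the figures from the example, worst_case_page_age(300, 10800) gives 10800 seconds, i.e. the full 3 hours, even though cache_lifetime is only 5 minutes.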

After a wipe the cache is suddenly empty, so every page needs to be generated again, possibly causing high load on the db server if the site is busy.

Possible Solution
A 'page_cache_lifetime' variable would be created to hold the maximum cache lifetime rather than the minimum (or the existing 'cache_lifetime' could be reused). This would involve changing page_set_cache() to set an explicit expire time (REQUEST_TIME + cache_lifetime).

page_get_cache() would only return a valid page if the age of the page is less than the cache_lifetime.
(cache_get() may need to be changed to make sure it checks whether the expire time is still valid, as I'm not sure whether the garbage collection in that function works for this case.)
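
A minimal sketch of the set/get behaviour I have in mind, again as a Python model rather than a Drupal patch. The class ExpiringPageCache and its methods are hypothetical names; the only ideas taken from the proposal are "store now + lifetime as the expire time" and "refuse to return an entry once that time has passed".

```python
class ExpiringPageCache:
    """Toy model of the proposed behaviour (hypothetical, not Drupal code)."""

    def __init__(self, page_cache_lifetime):
        self.page_cache_lifetime = page_cache_lifetime  # seconds; a MAXIMUM lifetime
        self.pages = {}

    def set_page(self, url, html, now):
        # Explicit expire time instead of CACHE_TEMPORARY.
        self.pages[url] = {"data": html, "expire": now + self.page_cache_lifetime}

    def get_page(self, url, now):
        entry = self.pages.get(url)
        if entry is None:
            return None
        if now >= entry["expire"]:
            # Too old: drop it so the caller regenerates just this page.
            del self.pages[url]
            return None
        return entry["data"]
```

Because the check happens on read, a stale page is never served even if cron has not run, and each page is regenerated on its own schedule rather than all at once.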

Finally, node_save() should be modified so that its call to cache_clear_all() only removes cache entries related to the specific node. (Some work on this has been done in #256416: Editing a node does not wipe its cache entry.)
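
Roughly, the node-specific clearing could look like the sketch below. It is a Python illustration only: the function clear_node_pages, the "node/<nid>" key format and the aliases argument are assumptions made for the example, and it does not try to handle listing pages that also show the node.

```python
def clear_node_pages(pages, nid, aliases=()):
    """Remove only the cached pages that belong to one node.

    pages   -- dict mapping a page path to its cached HTML
    nid     -- id of the node that was just saved
    aliases -- any extra paths known to show this node (hypothetical)
    Returns the list of paths that were removed.
    """
    paths = {f"node/{nid}", *aliases}
    removed = [path for path in pages if path in paths]
    for path in removed:
        del pages[path]
    return removed
```

Every other cached page is left alone, so saving one node no longer empties the whole page cache.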

Benefits
Going for this approach would allow much more specific cache lifetimes, where the administrator can be sure that the site won't serve a page older than the cache lifetime.

It also means that there is a much more balanced cache flushing system in place, where it's unlikely that the entire page cache will ever be empty at once, so there will be fewer load peaks.

We are successfully using changes like these on a Drupal 5 site (spiritlibrary.com) and it really eased the db load.

If there is interest I will adapt the changes we made to make a patch for Drupal 7.

Comments

proxiss commented:

You have my vote for your proposal. Additionally, there could be some kind of hook in D7 for caching.

Cheers, Rainer

djohns commented:

I agree with you!

smoothify commented:

Version: 7.x-dev » 8.x-dev

This obviously won't get into D7 since it has just been released, so setting to D8.

ohnobinki commented:

+

smoothify commented:

I use Boost for my D6 sites now rather than changing core as I did with D5.

Boost creates a static HTML version of pages and still allows the page counter to work (it uses JavaScript). Boost also offers comprehensive settings that let you specify how long each cached page should be valid for, and it doesn't clear them all at once.

I'm not suggesting static HTML caching should go in core, but some of Boost's caching and expiration logic could find a home there, so that other caching methods (Database, Varnish, etc.) could utilize it.

proxiss commented:

I am using mod_cache now for caching pages "outside" Drupal, and Piwik to create access stats...

ianthomas_uk commented:

Status: Active » Closed (duplicate)

There's been a huge amount of work to improve caching in Drupal 8, see #1393398: [meta] Drupal 8 Cache API improvements and https://drupal.org/node/1884796

If you think you can make further improvements, it's probably best to open new issues (or look for existing ones) with specific changes, based on the current Drupal 8 API.