The problem

After #81461: Clean up the cache API (stop overloading function arguments, remove procedural wrappers) we are still left with a very monolithic API for clearing caches on content changes.

cache_clear_all() is hard coded to the page and block caches.

Node, user, taxonomy and many other modules call cache_clear_all() after any form submission - even though the entity in question may not even be shown on any of the pages or blocks that were cached. This can result in a very low cache hit rate especially on sites with lots of content getting posted. Many sites use varnish for page caching now, however that still leaves a choice of either very short cache TTL, or using varnish commands to flush the cache in the same way as cache_clear_all() does. The same problem exists even if you're doing ESI as well.

Proposal

Introduce support for 'cache tags', adding support is likely to come in three parts:

  1. Low level implementation: cache()->set($cid, $data, $tags); and cache()->deleteTags($tags). This is the current focus of patches to this issue.
  2. Add #tags support to drupal_render() caching - so that cached HTML can inherit the tags from child elements. Make it easy for blocks and the page cache to populate this.
  3. Replace calls to cache_clear_all() with cache()->deleteTags($tags). Eventually remove cache_clear_all() in favour of this mechanism. (contrib/custom modules can still clear bins.)

What's the difference between cache tags and unique IDs
And ID is a unique identifier of a cache record, it might be completely fixed, or dynamically generated.

When dynamic cache IDs refer to rendered content or other dynamic structures, they are usually based on context. This can be described by the DRUPAL_CACHE_* constants - per-user, per-role, per-page, per-language etc. The context the item is generated in determines the output, but different versions are served according to different contexts - a context change (granting a role to current user or switching language) will rarely effect existing cached content - it will just cause different version to be served to the user. Each cache item has a unique ID, even if it might be a dynamically generated variant.

Cache tags are rather a way to describe the items that make up a cache item. Entities are the obvious example - a node might be cached in various view modes, as part of a cached listing of node titles in a block etc. The tag allows us to associate those blobs of HTML with the node itself, so that if the node is edited, we can invalidate those blobs (but not the thousands of other cached HTML fragments that have nothing to do with it).

Each cache item can have multiple tags - for example a cached page might contain 5 nodes, 10 taxonomy terms and 3 users.

Other systems/standards
This issue began with a late night conversation between catch and sdboyer, but has since evolved. It took a while for it to materialize into 'cache tags', but when it did, I immediately found other systems with the same concept.

However, review of those system indicated the support was not very mature, and I've ruled out actually incorporating either the exact APIs or implementations here, a quick summary follows though:

Zend Cache supports cache tags as part of it's base cache class. However many implementations such as their memcache backend don't support it, instead throwing exceptions. Additionally it contains some IMO not very useful API features like "delete all content not tagged like this".

Kohana also has tags as part of their API. But they also have extremely limited support in their implementations, and also have API functions which are not very useful, like allowing for finding content with a particular tag. Again - some implementations do not and cannot support their API, making it unsuitable for Drupal.

There is a fork of memcached (the library, not the PHP extension or Drupal module) that attempts to implement tag support - http://code.google.com/p/memcached-tag/ I have not seen much mention of this though and it looks mostly inactive at this point.

More promising is Varnish's support for regexp invalidation of cache items. See http://www.gossamer-threads.com/lists/varnish/misc/19732#19732 and http://pooteeweet.org/blog/0/1979#m1979 for discussion of this (thanks Moshe). Assuming the API given above, a varnish implementation would need to set headers containing the cache tags, then also clear varnish via regexp on cache()->clearTags(). So if we can get this right in PHP/MySQL/Memcache, it should be 99% there for Varnish as well.

Remaining tasks

We need to agree on the changes to the cache interface, and come up with a scalable implementation in MySQL.

catch's experience of working on wildcard clearing in the Drupal Memcache project suggests it should be possible to support this in memcache as well.

We need to discuss whether to try to replace prefix cache clears with this system as well, there are likely arguments for and against that.

We should be able to quickly use this to replace CACHE_TEMPORARY and cache_clear_all() once the API is available, see #730060: Replace CACHE_TEMPORARY, cache_clear_all() and minimum cache lifetime with cache tags support.

Supporting this in drupal_render() is yet to be tackled.

Do we want to automate the drupal_render() support in entity rendering too so that blocks and pages get it for free?

As well as invalidating blocks and pages when items contained in them change, we also need to consider the case of a block listing items that needs to change when a new one is added. This would mainly be a convention for tags, such as array('node_posted_to_og' => array($og_ids)) for the block. Organic groups would then need to call cache()->deleteTags() when items are added too.

Our cache API is very much tied to bins. We may need to add a wrapper that clears any bin using tags, or nominate a couple of bins (cache_rendered?cache_content?) to use the automatic drupal_render() etc. handling - and any other bins you can use the API yourself but need to do the extra legwork.

Original report:

Our current methods of cache invalidation are a huge sledgehammer - we essentially let any old function clear any arbitrary cache as much as it likes, and if you need something a bit different, like system_cron() not invalidating your entire page and block cache every time it gets run, you're more or less out of luck unless you work around it in a custom cache.inc.

So, I think we need a new API, have been thinking about this for a while, but didn't really crystallize until the past week or so.

It consists primarily of two parts:

<?php
function hook_cache_info() {
 
$caches = array();
 
$caches['page'] = array(
   
'invalidate' => array(
     
'entity' => array(
       
'node',
       
'comment',
      ),
     
'callback' => 'system_page_cache_clear'.
    ),
  );
  return
$caches;
}


function
node_save($node) {

 
// blah blah
 
cache_invalidate('entity', 'node', 'insert', $nid);

 
// blah blah
 
cache_invalidate('entity', 'node', 'update', $nid);
}

function
system_page_cache_clear($type, $type_type, $op, $extra) {
 
// Clear the whole page cache whenever this is called. But there's a
  // hook_cache_info_alter() which would allow sites to only clear on delete
  // or only on insert, or every cron run, depending on requirements.
?>

So - modules control their own caches via a registry hook and cache clearing callback, which can be altered if you need to. Modules inform the cache clearing API about what they're doing - similar to how field_attach_*() works ("I'm doing this now, here's what I'm doing it with.", then the module controlling the cache decides what to do based on that information.

Also pasting an irc conversation with sdboyer which cleared some of this up:

<catch> sdboyer_: so I want a hook_caches() - which  allows modules to expose which caches they maintain, and on what conditions they need to be cleared.
<sdboyer_> catch: i think that'd get too complicated too quickly if the clearing logic is centralized
<catch> sdboyer: so system_caches() says I've got 'page' - I need this cleared when any CRUD operation happens on content.
<catch> sdboyer: node and comment, when doing CRUD operations, do cache_invalidate('entity', $entity, $op, $id);
<catch> sdboyer: system_caches() includes a callback key for actually clearing the cache.
<catch> sdboyer: API isn't quite there yet, but basically we let modules control their own caches and ask for information about when they want to clear.
<catch> sdboyer: other modules don't clear caches directly, they do a field_attach_*() style thing of saying "this is what I'm doing here"
<catch> sdboyer: and then hook_caches() is alterable - so you can make the page cache only invalidate if content was deleted, or whatever.
<sdboyer> catch: so i like cache_invalidate('entity' ...), but that concern about centralized clearing logic still stands. i think i'd rather a system where entities, or other stuff that's often cached, notifies when its been changed and modules can react via a hook, rather than a selectively-triggered callback
<catch> sdboyer: the problem with hooks is you can't disable them.
<catch> sdboyer: so system_cron() does cache_clear_all() every cron run.
<catch> You can't stop that from having an effect unless you write a custom caching backend.
<sdboyer> catch: why would you want to disable cache invalidation?
<catch> sdboyer: because some sites need to clear the page cache every five minutes with no exceptions, some only when content is added, some only if it's taken down.
<sdboyer> catch: right yeah i'm talking about somethin diffrent.
<catch> sdboyer: so cache_invalidate(..... you can't disable.
<sdboyer> catch: cache_invalidate() is an entity's way of saying, "i've been changed - things that cache me, react as you will."
<catch> sdboyer: yes.
<sdboyer> catch: i think we're focused at two different problems, then
<sdboyer> catch: i guess the key point is: [11:05:19] <catch> sdboyer: API isn't quite there yet, but basically we let modules control their own caches and ask for information about when they want to clear. <-- what's the point of asking for information about when they want to clear if they're controlling their own caches?
<catch> sdboyer: well it's that if controlling their own caches means hook_cache_clear($type, $type, $id, $foo, $bar) - and you can never stop a hook being executed.
<catch> sdboyer: and when to clear some caches is a site specific decision - like the page cache.
<sdboyer> catch: ahh i think i see where we're talking about different things there - when i say "controlling their own caches" i don't mean "they decide and it's not configurable"
<sdboyer> catch: i'm just talkin about looser coupling. which might not actually even be necessary, because i don't know the mechanics of how a cache_clear_all() call actually works right now
<catch> sdboyer: in default cache.inc it's almost a delete query builder.
<sdboyer> catch: eew
<catch> sdboyer: so hook -> cache_clear_all() is what we can currently do with hook_node_insert() or whatever, but that's why I want this extra layer of cache clearing callbacks you can swap out somehow.
<catch> sdboyer: because by the time you get to cache_clear_all() your custom cache.inc doesn't know what the hell's going on anyway.
<sdboyer> catch: i think i see where you're going, then - the cache_invalidate() is the big new addition, by adding a layer of indirection that allows caches to decide whether or not to flush, rather than the cache_clear_all() steamroller approach
<catch> sdboyer: but yeah whether cache_invalidate() just invokes hooks, or some kind of setting, or triggers callbacks you can alter out is a bit messy at the moment.
<sdboyer> catch: yeah, that i'd fully support. i think the crucial addition for custom caching backends, or contrib-based caching, would be a very simple thing in cache_invalidate() that triggers registerable notification callbacks with not-too-much granularity
<sdboyer> catch: i think the best approach is to set some basic conditions that determine whether or not a callback is triggered ("is it an entity?" "is it a node?"), but punt the more complicated logic to the callback
<catch> sdboyer: so $page['clear callback']['page_clear_cache'] no?
<sdboyer> catch: that was what i was originally saying about being concerned that a centralized clearing system would quickly buckle under its own weight of custom clearing rules
<catch> sdboyer: OK that could work - trigger the callback at a certain point and just pass arguments in.
<sdboyer> catch: exactly
<sdboyer> catch: yup, that'd basically be it. could almost just be a hook_menu()-style registry thing
<catch> sdboyer: OK now I get it. So cache_invalidate() stays as it is.
<sdboyer> cache_invalidate?
<sdboyer> cmon druplicon!
<sdboyer> ahh wait, nm, it doesn't exist yet :)
<catch> sdboyer: the mythical cache_invalidate which doesn't exist yet stays how it was 20 minutes ago.
<sdboyer> :P
<sdboyer> catch: sounds like it, yeah. then the page cache can register something that says its callback should be fired if entity,node are both true about the invalidated item
<sdboyer> catch: and THEN inside its callback can do something like reading TTL options configured via the UI and decide how to proceed
<catch> so hook_caches() { return array('page' => array('type' => array('entity' => array('node', comment'), 'module'), 'clear' => 'page_clear_cache'); or something bizarre like that.
<catch> With less shit syntax.
<sdboyer> catch: yup, that basic approach seems sound to me
<catch> sdboyer: nice, I'll paste this into an issue if you don't mind.
<sdboyer> catch: absolutely :)
Files: 
CommentFileSizeAuthor
#198 drupal-636454-197.patch2.04 KBtim.plunkett
PASSED: [[SimpleTest]]: [MySQL] 35,070 pass(es).
[ View ]
#195 drupal-636454-195.patch2.05 KBtim.plunkett
PASSED: [[SimpleTest]]: [MySQL] 35,065 pass(es).
[ View ]
#191 cache_tags_8.patch16.91 KBBerdir
PASSED: [[SimpleTest]]: [MySQL] 36,161 pass(es).
[ View ]
#189 cache_tags_7.patch16.91 KBBerdir
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in core/lib/Drupal/Core/Cache/DatabaseBackend.php.
[ View ]
#187 cache_tags_6.patch16.9 KBBerdir
FAILED: [[SimpleTest]]: [MySQL] 36,160 pass(es), 0 fail(s), and 34 exception(s).
[ View ]
#185 cache_tags.patch16.9 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in core/lib/Drupal/Core/Cache/DatabaseBackend.php.
[ View ]
#183 cache_tags.patch16.9 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in core/lib/Drupal/Core/Cache/DatabaseBackend.php.
[ View ]
#178 cache_tags.patch16.96 KBcatch
FAILED: [[SimpleTest]]: [MySQL] 35,926 pass(es), 0 fail(s), and 98 exception(s).
[ View ]
#175 cache_tags.patch16.94 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.
[ View ]
#169 cache_tags.patch16.82 KBcatch
FAILED: [[SimpleTest]]: [MySQL] 35,925 pass(es), 0 fail(s), and 98 exception(s).
[ View ]
#165 cache_tags.patch13.47 KBcatch
PASSED: [[SimpleTest]]: [MySQL] 35,920 pass(es).
[ View ]
#161 cache_tags-636454-161.patch13.52 KBpillarsdotnet
PASSED: [[SimpleTest]]: [MySQL] 35,694 pass(es).
[ View ]
#158 cache_tags.patch13.94 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Failed to run tests: failed to enable simpletest module.
[ View ]
#154 cache-tags-636454-153.patch16.09 KBmsonnabaum
PASSED: [[SimpleTest]]: [MySQL] 34,594 pass(es).
[ View ]
#150 cache-tags-636454-150.patch15.51 KBmsonnabaum
PASSED: [[SimpleTest]]: [MySQL] 34,402 pass(es).
[ View ]
#148 cache-tags-636454-148.patch15.49 KBmsonnabaum
PASSED: [[SimpleTest]]: [MySQL] 34,421 pass(es).
[ View ]
#145 cache-tags-636454-145.patch14.98 KBmsonnabaum
PASSED: [[SimpleTest]]: [MySQL] 34,413 pass(es).
[ View ]
#142 cache-tags-636454-142.patch15.25 KBmsonnabaum
FAILED: [[SimpleTest]]: [MySQL] 34,413 pass(es), 0 fail(s), and 5 exception(es).
[ View ]
#138 cache-tags-636454-138.patch14.54 KBmsonnabaum
FAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.
[ View ]
#136 cache-tags-636454-136.patch14.57 KBbeejeebus
PASSED: [[SimpleTest]]: [MySQL] 34,399 pass(es).
[ View ]
#135 cache-tags-636454-135.patch16.18 KBbeejeebus
PASSED: [[SimpleTest]]: [MySQL] 34,383 pass(es).
[ View ]
#128 cache-tags-636454-128.patch14.26 KBmsonnabaum
PASSED: [[SimpleTest]]: [MySQL] 34,398 pass(es).
[ View ]
#125 cache-tags-636454-124.patch15.49 KBmoshe weitzman
FAILED: [[SimpleTest]]: [MySQL] 34,206 pass(es), 20 fail(s), and 8 exception(es).
[ View ]
#124 cache-tags-636454-124.patch15.49 KBbeejeebus
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch cache-tags-636454-124.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]
#102 cachetags.patch15.87 KBcarlos8f
PASSED: [[SimpleTest]]: [MySQL] 33,651 pass(es).
[ View ]
#100 0001-Counter-based-cache-tags-in-D8.patch14.72 KBcarlos8f
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 0001-Counter-based-cache-tags-in-D8.patch. Unable to apply patch. See the log in the details link for more information.
[ View ]
#97 cachetags.patch11.94 KBcarlos8f
PASSED: [[SimpleTest]]: [MySQL] 33,449 pass(es).
[ View ]
#94 0001-Counter-based-cache-tags-with-plugin-interface.patch12.05 KBcarlos8f
PASSED: [[SimpleTest]]: [MySQL] 33,346 pass(es).
[ View ]
#93 0001-Cache-tags-catch-version-fixed-tag-deletion-and-adde.patch20.41 KBcarlos8f
FAILED: [[SimpleTest]]: [MySQL] 33,304 pass(es), 0 fail(s), and 4 exception(es).
[ View ]
#93 0001-Counter-based-cache-tags.patch20.4 KBcarlos8f
PASSED: [[SimpleTest]]: [MySQL] 33,324 pass(es).
[ View ]
#90 0001-Implement-cache-tags-using-counter-based-invalidatio.patch19.65 KBcarlos8f
PASSED: [[SimpleTest]]: [MySQL] 33,325 pass(es).
[ View ]
#90 0001-Implement-cache-tags-using-timestamp-based-invalidat.patch19.28 KBcarlos8f
PASSED: [[SimpleTest]]: [MySQL] 33,330 pass(es).
[ View ]
#89 0001-Implement-cache-tags-using-counter-based-invalidatio.patch19.65 KBcarlos8f
PASSED: [[SimpleTest]]: [MySQL] 33,327 pass(es).
[ View ]
#89 0001-Implement-cache-tags-using-timestamp-based-invalidat.patch19.28 KBcarlos8f
PASSED: [[SimpleTest]]: [MySQL] 33,323 pass(es).
[ View ]
#88 0001-636454-73-by-catch-cache-tags.patch18.89 KBcarlos8f
FAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.
[ View ]
#73 cache_tags_4.patch18.87 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch cache_tags_4_0.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]
#71 cache_tags.patch19.46 KBcatch
FAILED: [[SimpleTest]]: [MySQL] 30,291 pass(es), 639 fail(s), and 73,242 exception(es).
[ View ]
#67 cache_tags.patch17.73 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.
[ View ]
#64 cache_tags.patch11.62 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in includes/cache.inc.
[ View ]
#63 cache.tags_.patch10.95 KBbeejeebus
PASSED: [[SimpleTest]]: [MySQL] 0 pass(es).
[ View ]
#60 cache_tags.patch10.87 KBcatch
PASSED: [[SimpleTest]]: [MySQL] 32,924 pass(es).
[ View ]
#57 cache_tags.patch8.43 KBcatch
PASSED: [[SimpleTest]]: [MySQL] 32,930 pass(es).
[ View ]
#55 cache_tags.patch3.26 KBcatch
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in includes/cache.inc.
[ View ]

Comments

sdboyer’s picture

subscribing, and noting that the conversation above is flipped - lines starting with 'catch:' are from me, lines starting with 'sdboyer:' are from catch.

edit: this is now fixed

catch’s picture

Fixed the lack of code tags so it should be back the right way 'round now.

catch’s picture

I'm moving this back to Drupal 7 for discussion because:

No existing APIs would be changed.

There's a reasonably limited number of places in core which would need to be converted.

Nothing in here would stop people doing cache_clear_all() directly, so it won't really affect modules already ported except they wouldn't be using D7 best practices.

if someone moves this back to D8 I won't mind, just running out of performance ideas at the moment.

andypost’s picture

Subscribe, this approach looks very promising! If D7 going more entity-related so this could go. Right now this could be done in contrib like clearing cached views on some events.

catch’s picture

Just a note this wouldn't only be for entities - we'd need things like cache_invalidate('css');, cache_invalidate('menu') and cache_invalidate('modules'); - although some of those are only used internally within certain subsystems - so whether we let them keep using cache_clear_all() or implement and then use this API I don't know yet.

merlinofchaos’s picture

This is a very interesting approach, and I think I like it. One interesting addendum to this approach is that hook_cache_info could be used to create schema for cache tables, so that modules wouldn't have to handle that. Right now that's a trainwreck waiting to happen if the cache table schemas ever change again.

moshe weitzman’s picture

Subscribe. Lets release D8 right after this gets in :)

moshe weitzman’s picture

I think we want the ability to fire multiple callbacks for an event like: cache_invalidate('entity', 'node', 'insert', $nid). Perhaps a model like FAPI submit handlers is what we want here.

andypost’s picture

I think better use arrays

<?php
 
cache_invalidate
( array( $bin1 => array($tags1), $bin2 => array($tags2) ) );
?>
catch’s picture

@andypost: the problem is that node module has no idea about what's caching node content or where. Nodes are cached in blocks, pages, views cache, panels cache. And caches aren't always in bins, for example css and js aggregation. However node module can make the assumption that it's getting cached somewhere, so it can inform the system of what's going on, instead of the current situation of blindly calling cache_clear_all() for hardcoded tables and having other modules operate on hook_node_*() (or often having to add submit handlers to forms for things with less of an APi than nodes, for example if you wanted to cache based on user roles or permissions).

@moshe:

That seems sensible, we could change the example in the first post from:

function hook_cache_info() {
  $caches = array();
  $caches['page'] = array(
    'invalidate' => array(
      'entity' => array(
        'node',
        'comment',
      ),
      'callback' => 'system_page_cache_clear'.
    ),
  );
  return $caches;
}

to

function hook_cache_info() {
  $caches = array();
  $caches['page'] = array(
    'bin' => 'cache_page',
    'invalidate' => array(
      'entity' => array(
        'node' => array('system_page_cache_clear'),
        'comment' => array('system_page_cache_clear'),
      ),
    ),
  );
  return $caches;
}

Then

<?php
hook_cache_info_alter
($caches) {
 
$caches['page']['invalidate']['node'][] = 'my_extra_callback';
}
?>

But probably we won't figure out the actual structure until we try to apply it.

catch’s picture

Title:More granular cache clearing» Cache invalidation API

Better title.

EvanDonovan’s picture

Subscribing. Looks very cool.

andypost’s picture

There's a great discussion - Drupal Cache Roadmap for D8 http://groups.drupal.org/node/75823

catch’s picture

That discussion looks useful, however (fortunately) it's pretty separate from this one, which is only talking about invalidation.

slantview’s picture

I just found this, and the two conversations are separate, but any decisions made here absolutely affect the D8 cache API roadmap. Let's pull these two conversations together and i'll update the wiki page to reflect the hooks and cache API features we're talking about adding.

This naming sounds a bit cryptic currently, my only suggestion would be to spend some time to make sure we have clear naming conventions.

+1

Dries’s picture

Issue tags:+Favorite-of-Dries
sun’s picture

sun’s picture

Note that we badly need a quick fix for D7, without an API change. I've opened a separate issue and submitted a stupid-simple proposal in a patch: #918538: Decouple cache tags from cache bins

Dave Reid’s picture

Subscribing. Would also find this very useful in several contribs. Maybe we can experiment in contrib for D7?

sdboyer’s picture

+1 to experimenting in contrib. I was actually noodling around with some ideas the other day, thinking about a contrib module for them...

catch’s picture

I have a start on page cache expiration changes in http://drupal.org/project/performance_hacks. It's ugly as hell at the moment (have to replace node_form_submit() entirely), but is the sort of thing that'd benefit from this change, if someone starts a contrib project for the API please ping me.

catch’s picture

Priority:Critical» Major

Downgrading all D8 criticals to major per http://drupal.org/node/45111

carlos8f’s picture

subscribing

beejeebus’s picture

subscribe.

mikeytown2’s picture

subscribe as I'm always hacking around drupal's cache clear all in my various modules. I've been quite successful at it I might add.

pillarsdotnet’s picture

Edith Illyés’s picture

moshe weitzman’s picture

D8 is open for business. Anyone willing to flesh this out a bit more? If so, assign this issue to yourself.

catch’s picture

I've been thinking about this, but not had a chance to write it up, this is still extremely rough but has moved on a bit since the last rough idea (and changed a couple of times in the writing of this post).

I think our archetype for complex cache invalidation should be entities - since they show up in so many different kinds of cache (entitycache, blocks, page cache, views etc.) - and most things that we cache that don't include entities are usually low level and maintain their own caches relatively well (variables, locale, system info etc.). That doesn't mean we should build something that's specific to entities + fields, but we should aim to have a system that covers most use cases for those, in the assumption it will work more or less well for other things.

For example, say I want to cache the teaser view of a node - this is something I'd like us to at least support doing in core, rendering entities is one of the most expensive things that happens during page building - currently performance hacks has some support but it is a big hack. We cache rendered entities in the full page cache, blocks, views etc. but apart from views content cache all of these are very 'dumb' caches that have no knowledge of what they're caching, and have to be cleared very aggressively or allowed to get stale.

At the moment, the best you can do rendering cached nodes via the cache API is:

<?php
cache_set
("node:render:$nid:$view_mode:$roles:$language:$etc", $rendered_node);
?>

Now if the node is updated, I can clear the cache entry for that node with a wildcard clear:

<?php
cache_clear_all
("node:render:$nid:", 'cache_foo', TRUE);
?>

But say I change the formatter for the image in the teaser view mode, I can't do:

<?php
cache_clear_all
(":$view_mode:", 'cache_foo', TRUE);
?>

since there is no caching engine that would be able to handle a wildcard prefix (MySQL could run the query, but it's very slow). This leaves us with the option of just clearing everything on that kind of change, or allowing things to be stale for some amount of time.
All of the cache key past the first part is just namespacing, it avoids collisions but it's useless for invalidation.

Similarly, say I delete a taxonomy term. This may be included in the rendered output of one node, no nodes or one million nodes. In this case I have three choices:

1. Clear 'everything' - cache_clear_all()
2. Don't clear anything.
3. Do an entityFieldQuery, find everything that references that taxonomy term, clear any caches I know about, but I might miss a block that renders the node that I don't know about.

I don't really like any of these much, the original idea in the opening post here was to make #3 easier, but it is going to end up messy I think, even if a bit better than what we do now (blow everything away or leave everything stale).

So one thing we could do, is send some metadata in when we do a cache_set(), then have modules decide whether that data is relevant to caching or not, for example:

$realms = array(
'entity' => array($node),
'view_mode' => 'teaser',
// Any key might be valid here potentially.
);

cache_set($cid, $rendered_node, $bin, $realms);

Then we invoke hook_entity_realm_alter() - this would let you add or remove realms that are passed in.

Then, for each realm, we invoke hook_cache_realm($realm, $value); and hook_cache_realm_REALM($value);

This hook would return an array of triggers (not trigger module triggers but I can't think of a better name) - a trigger is a string that is linked to the cache key - when we see the trigger, we should clear that cache item.

Entity module might do this:

<?php
/**
 * Implements hook_cache_realm_REALM().
 */
function entity_cache_realm_entity($entities) {
 
$triggers = array();
   foreach (
$entities as $entity) {
    
$triggers[] = "entity:$entity->type:$entity->id");
    }
   }
   return
$triggers;
}
?>

And it might also want to store the view mode as a trigger, since we'll want to clear the view mode across all entities when its settings are changed:

<?php
function entity_cache_realm_view_mode($view_mode) {
   return array(
"view_mode:$view_mode");
}

Text module knows that entities may have text fields, that are linked to text formats. And that when we change a text format, we need to invalidate everything that references the text format (currently it wipes the entire field cache whenever any text format is updated).

<?
php
text_cache_realm_entity
($entities) {
  foreach (
$entities as $entity) {
   
// Find all the text formats for all text fields in this entity.
    // [...]
   
foreach ($formats as $format) {
      
$triggers[] = "text_format:$format_name";
    }
  }
}
?>

Reference could do the same - it would add a trigger for each entity referenced by the entity being saved (or alternatively it could implement hook_cache_realm_alter() and add each of the referenced entities as a realm instead). So a node that references five taxonomy terms, would get a 'trigger' for each term - entity:taxonomy_term:$tid.

Then in the caching backend we need to store the triggers in relation to the cache id. I haven't given this bit a lot of thought, and the implementation may well be sucky, but for arguments sake:

In MySQL we'd have a {cache_trigger} table with:
cid | trigger

DrupalDatabaseCache::set() would now need to insert into the cache table, then do a multi-insert into the cache_property table.

Wildcard clears are now removed completely, and instead you get:

cache_clear_realm($realms);

So, say I clear a taxonomy term, I can then do:

cache_clear_realm(array('entity' => $taxonomy_term));

cache_clear_realm() also invokes hook_cache_realm() and hook_cache_realm_REALM() - and with the same information, builds up the same list of triggers.

Then the caching backend takes the triggers, and does something like:
DELETE FROM {cache_foo} WHERE cid IN (SELECT cid FROM {cache_trigger} WHERE trigger IN (:triggers));

(there'd need to be some cron garbage collection which would do DELETE FROM cache_trigger WHERE cid NOT IN (SELECT cid FROM {cache_foo}) if we did this).

This would then remove any cache entry which has registered entity:taxonomy_term:$tid as its context - could include all cached view modes for the term, as well as all cached view modes of all entities referencing that term.

For MongoDB, the triggers could be stored in the same document as the cache item, you'd then have an index on the trigger key of the document, and do a simple delete on all records that have that value.

For memcache, the current version of the memcache module has wildcard support that keeps a count of each wildcard and compares this when items are fetched. Instead of this, the triggers could be stored with the cache item, and then you could keep a count per trigger.

Where this starts to get a bit trickier:

You have a block that caches the latest five items in a taxonomy term, this should update when:

* one of those five nodes is updated.
* a new item is added to the taxonomy term.
* the taxonomy term itself is updated.

It's easy to get the five nodes, and add each node as a realm.
It's easy to add the taxonomy term itself as a realm.

What is a bit tricky, but there might be a simple enough answer, is to trigger a cache clear of the block, when a new node is added to the realm. We might need a trigger for specific to lists of stuff for this.

Advantages of a system like this:

1. We'd be able to do a lot better than cache_clear_all() in the vast majority of cases, much better cache hit rates.
2. Realm and triggers could be customized by contrib and custom modules via alter hooks (both adding and removing).
3. We'd only add this overhead to items that are currently cleared by wildcards or blown away. If there's no realm, there's no overhead.

Disadvantages:
1. For MySQL, it's an extra multi-insert on set, and a big DELETE FROM on clear.
2. For Memcache there would be a runtime cost fetching the counts for the triggers - but we already have this for wildcards.
3. There is the potential for modules to be either over or under vigilant with realms and triggers - same as things go mad with cache_clear_all() or update the node table without using node_save() now.
4. We've exchanged $cid + $bin, for $cid, $bin, $realm and $triggers.

However, $realm could easily be swapped out for the $context object from butler - so you'd have cache_set($cid, $bin, $context);

Potentially, we could automatically generated cache keys based on context - so cache_set($context, $bin); - this somewhat relates to edge side include keys too (where you'd need to be able to build a context object from a string).

Should that all come together, the only addition here would end up being the hooks and the concept of cache clear triggers.

moshe weitzman’s picture

This is a hard problem for sure. I do think a new table like {cache_trigger} is the only way to track exactly cache items and events which invalidate them.

I recall that mysql is (was?) really inefficient with subqueries like DELETE FROM {cache_foo} WHERE cid IN (SELECT cid FROM {cache_trigger} WHERE trigger IN (:triggers));. Needs investigating.

Also, I think that nearly every cache item is going to record triggers which will extremely rarely ever activate. Stuff like language and view mode. We still have to multi-insert these over and over again. I guess I should not sweat it but it is a bit of a bummer for slower cache backends like DB.

I can't think of a better name than trigger right now. Another way to think of these are 'the list of potential changes which would cause a given cache item to become invalid and thus get deleted'. i kinda like 'causes of death' but thats a bit too cheerful :)

catch’s picture

Moshe pointed me to http://framework.zend.com/manual/en/zend.cache.html, which is what Symfony2 uses and I also found http://framework.zend.com/issues/browse/ZF-4253

The latter discussion is interesting since they are having the same problem with 'tags' that Drupal's memcache integration had with wildcards (I say 'had' because while the current code in the memcache project is complex, it's mostly there now).

Looking at that Zend documentation (I did not review the code there yet).

- what they call tags is exactly the same as what I'm calling triggers here.
- in Drupal we would definitely need a mechanism above triggers/tags to allow contrib modules to add their own (field modules dealing with some kind of reference is the obvious example). However we might want to support adding tags directly as well as via context.

While tags is as bad a word as triggers for confusion, I could definitely live with using that.

edit: also found http://code.google.com/p/memcached-tag/

catch’s picture

Looking at Zend a bit more, I wanted to see how they implement tags with sqlite (the closest they appear to have to our MySQL backend) - http://open.thusa.net/tabs/browser/TABS/trunk/web/library/Zend/Cache/Bac...

edit: that was from four years ago, the latest version is at http://framework.zend.com/code/filedetails.php?repname=Zend+Framework&pa... - however there don't appear to have been many changes in four years.

There's lots of problems there in the implementation (DISTINCT query for no obvious reason, loading all records into memory regardless of result set size, no multi-insert for tags, use of @ all over the place) but the basic concept is very similar.

sun’s picture

The proposal is in the right direction, but limiting this whole thing to be a workaround to improve our cache invalidation doesn't make sense to me. If you remove the word "cache" from the last follow-ups, then we are at the actual and essential point: We need a system that is capable of tracking what is being involved where.

Caching and cache invalidation is just one use-case for this information. The very same information is required (and badly missing) for countless of other situations and operations. Be it module updates that need to figure out what is affected by a certain change and needs to be updated, or be it administrative lookups/audits that need to investigate what is possibly going to be affected by a configuration change. All of this (and more) requires the same kind of internal and reliable data source.

Therefore, it would be a huge gain for Drupal at glance if we'd slightly shift the focus and try to solve the what is being involved where problem, which apparently is the heart and primary challenge of the cache invalidation issue. If that info would exist, proper cache invalidation would be a lot easier.

For starters, we already have a simple/suboptimal involvement tracker with the file usage API.

catch’s picture

While this issue does touch on some others that we have elsewhere, there are some parts of it which are unique to caching - for example the fact that cache entries may not exist at all (hence the attempt here to avoid the "query every possible item that could be affected and clear caches for it" pattern), and that we need the caching backend to be able to control the tagging situation for best performance.

If we're able to generalize some aspects of this to other systems, that's great, but I consider this a "stop the bleeding" issue. Someone mentioned last year that our page cache disappears whenever someone breathes, this is the motivation behind this issue.

andypost’s picture

Agreed with @sun that we need to focus on involved bricks of system. If the butler approach with context will be realized by this way we get context + actual render (or usage of cached item)

Cache tags has a long story but the main trouble with memcache that tags periodically are stale from cache because of nature of memcache (LRU)

Another way I'd like to point into issue #1120928: Move "history" table into separate History module

Core already have node-history table so we can interpolate this approach to cache:
- extend history support for core entities
- run cache clean depending on historical usage of entity within context of block, view and whatever...

catch’s picture

Summary of Zend Cache research so far:

- it uses the concept of 'tags' the same way that triggers are used here.
- there is no layer around tags to allow them to be modified or added to (probably because it's a framework rather than a framlication)
- the sqlite implementation is the only one (apart from Zend Platform, but that is EOL) that supports tags.
- there is a patch for memcache, but it is currently in the same state as the original wildcard support for http://drupal.org/project/memcache (store all tags in one huge array and hope there aren't too many)
- the sqlite implementation is loading every possible cid into memory via a DISTINCT query then deleting one by one by cid, this is not an option for us.

I also looked at Kohana's cache framework. They have have much the same concept of tags as Symfony.

- The SQLite implementation uses DELETE FROM cache WHERE tags LIKE '%foo%'; - this is just as unacceptable as loading every possible tag into memory since prefixed wildcards cannot use indexes.
- The Memcache tag support is there, but relies on http://code.google.com/p/memcached-tag/ - a fork of memcache.

Neither of these have support for Redis or MongoDB as cache backends, which are potentially (although I've not used either of them in practice as caching backends, and not used Redis at all yet) the best of both worlds to support this sort of thing.

Based on this, I'm reassured that two of the more highly regarded PHP frameworks have more or less the same concept baked into their caching APIs as I'm proposing here. But I'm a bit disappointed the implementations aren't very robust at all so there's unlikely to be much we can steal. CakePHP's caching API doesn't support tags at all.

If people know of better examples, that'd be great, but having looked at these I'm tempted to get some draft code in place for this sooner rather than later. While I was originally thinking the context object would be good to pass in (and it still might), it may be more useful to pass in a render array - this would allow tags to be added a similar way to cid_parts/#attached are now (particularly being able to suck up #attached to the top level would be great here).

catch’s picture

@andypost:
"Cache tags has a long story but the main trouble with memcache that tags periodically are stale from cache because of nature of memcache (LRU)". We already came up against this in memcache wildsupport, see #1085626: Wildcard entries may go missing, leading to caches not being invalidated properly which fixes this.

The history feature is about when things were viewed by users, it is not whether something appeared somewhere or not, and even less about whether it was cached in that context.

I agree Butler can be useful here, but we can't postpone every core patch on Butler, and it's only potentially useful for a small part of this - there is a lot of work to do here that has nothing to do with butler.

catch’s picture

@moshe - with the subselect I think you're thinking of #961604: dbtng drivers are not able to have their own condition classes. I'm assuming we can rely on that being fixed in Drupal 8 (and 7).

carlos8f’s picture

I still have to do some catch-up reading on this issue, but I wanted to mention this in light of @sun's "what is being involved where" comment.

The FlexiCache project provides something like this. You can associate users, nodes, comments, etc. with cache entries via a metadata database. Then clear the cache via those entity IDs, rather than cache IDs or "general wipes". This system sits on top of the traditional cache API as a wrapper, so it's an API addition rather than a total shift.

The main thing holding this back is that the current implementation depends on MongoDB as a metadata backend. That makes it easy to create indexed arrays and query with arrays. I have FlexiCache running on a fairly large site right now, and it has improved our cache hit rate (particularly for block cache) by a *huge* amount.

carlos8f’s picture

I read #30 now and it basically describes FlexiCache. The key idea is that wildcards are very clumsy to query on, and need to be phased out in favor of a more structured (and query-able) array of metadata.

In catch's example, cache_clear_realm(array('entity' => $taxonomy_term)); would be equivalent to the Flexicache syntax: flexicache_clear_all(array('terms' => $tid)). Instead of "realms" I called them "properties" (although I tend to think of them as tags).

FlexiCache has been running in production for 4 months on a social network, and although the garbage collection aspect could be improved a bit (there are 400k+ metadata objects), the concept is working in practice, our cache lifetimes are much better, and I have confidence that this would be a really good core improvement. The only challenge is making this work in SQL without subselect chaos raining down on everyone :)

Lars Toomre’s picture

@catch: I think this is a great topic and am looking forward to sample code for review.

catch’s picture

@carlos8f, thanks for the link to flexicache, I hadn't seen this project before and it's good there's already something similar around. So far I've convinced myself that tag handling should be handled by the caching backend (i.e. no separate backends for tag database and cache, although it would of course be possible to build a cache backend that does that internally). If you get a chance to read up on this, or see me in irc, would be great to discuss that more.

One thing my examples above don't cover at all is tracking tag flushes against particular bins - if we clear by tag, we don't want to have to specify the bin, but we also don't want to be clearing by tag for every bin, even if that kind of tag has never been used on a bin (i.e. taxonomy term tags shouldn't be running against cache_bootstrap). I'm not sure what the answer to that is yet, although again memcache module has some logic dealing with this for wildcard flushes that we might be able to adapt even to the SQL backend - the array('terms' => $tids) syntax could help with this (namespace the tag flushes.

One the subselect #961604: dbtng drivers are not able to have their own condition classes is a lot less far on than I thought (I changed the title to reflect this), however I'd still hope we could solve that issue parallel to this and take advantage of the optimization there.

There's also an argument that if you have this many cache items, you should not be using the SQL caching backend (similar to search module), but it depends how bad the queries get with x number of items/tags.

Crell’s picture

catch pointed me here for reference. The tags/triggers idea sounds logical to me, and the fact that the other major frameworks haven't been able to do any better means we're probably on the right track. :-) By and large I agree, at least in concept, with what catch is describing. I've not looked at any implementation yet.

As far as butler/context/WSCCI, I agree that should not be a blocker here. Rather, the cache system itself should be able to handle "tags", and then we should, later, try to figure out a good way to automate what the right tags are to use based on context data. For blocks, one thing I've discussed with David Strauss and Sam Boyer is having blocks declare the context keys that they will need. Deriving that information dynamically will be impossible. So if a given block declares that it depends on a node and the language, then you can reliably cache it based on nid and language, and use nid and language as Varnish cache keys, etc. Then we could cache blocks using what catch describes automagically, using that same meta information.

Perhaps we could expand that concept? That is, are there other "cachable thingies" that we could conceivably declare up front vary based on context keys X, Y, and Z? If so, then we could automate their caching based on those keys. Views queries, for instance, I once discussed with Earl as being something we could conceivably cache if we can reliably cache "plugins", or some subset of plugins. A views query object varies based on the arguments fed to it. If there are no arguments, great. If there are, then it varies based just on that. (OK, that's an over-simplification, but you hopefully get the idea.)

If we could define "thing thingie is cachable", then we could declare up front "and these are the keys that matter", and then automatically cache based on that data with no further user involvement. Then we just need to make sure that lots of thingies meet whatever criteria that is.

podarok’s picture

subscribe

catch’s picture

Component:base system» cache system
catch’s picture

There is no direct dependency, but I would like to get this clean-up of cache_clear_all() in (or at least ready) before adding new features to the cache API - #81461: Clean up the cache API (stop overloading function arguments, remove procedural wrappers).

jherencia’s picture

Subscribing...

catch’s picture

Title:Cache invalidation API» Cache tag support
Issue tags:+API change

Updating metadata for the current plan.

catch’s picture

Assigned:Unassigned» catch
Category:task» feature
Issue tags:+Needs issue summary update

This is a feature so it should be a feature request.

Assigning to me to write an issue summary, I'll also get some proper pseudo code in here.

moshe weitzman’s picture

Some interesting Varnish-related ideas at http://pooteeweet.org/blog/0/1979#m1979

EvanDonovan’s picture

@moshe: Very interesting. So in a Drupal context, that would entail things like adding an html header that showed which nids were in a given View, so that the Views-generated page could be invalidated if any of those nodes had been updated?

I don't know if there's anything core could do to support something like that though - seems like it would be best in a contrib module.

catch’s picture

I've been thinking the tags would look something like this as passed to cache_set().

<?php
array(
 
'node' => array(1, 10),
 
'user' => array(16),
);
?>

So that'd be easy enough to convert to a string like:

node:1,node:10,user:16

The API for setting a cache item with tags just looks something like: cache($bin)->set($cid, $data, $tags);

If it's in the db or another backend you have something mapping cids to tags. If you're using the varnish backend then cache()->set() would normally be no-op, but could perhaps set the header with that string instead.

For automatically pulling up tags from views etc. for the page, we could use a similar mechanism to #attached #cache which we already have. Renderable arrays can have cache tags attached to them, and these get pulled up to the highest level (either full page level cache or a block, or an ESI block).

#81461: Clean up the cache API (stop overloading function arguments, remove procedural wrappers) is RTBC, I'm planning to shift attention to an initial patch here once that's in.

mikeytown2’s picture

In boost I use the boost_cache_relationships table to keep track of what nodes show up on the pages. Works well, but is a little DB intensive as a page with 10 nodes on it, creates 10 entries in the database; boost_schema(). This is the reason you see people running with Boosted Varnish. Its the same setup we are using at my work; we run don't even use the .html files boost creates unless we have a "bad event" (happens about 2x a year).

catch’s picture

Status:Active» Needs work
StatusFileSize
new3.26 KB
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in includes/cache.inc.
[ View ]

Here's a very rough start.

It adds a $tags argument to cache()->set()

And it adds cache()->deleteTags($tags)

I made a rough stab at what the database implementation might look like but there's no schema to test it with and I haven't verified the relative efficiency or not of those queries yet.

One thing that needs sorting out here: we currently hard code the concept of clearing on content events to the cache_page and cache_block bins. When using tags this actually has some advantages - in that we don't want to be running delete queries against every cache bin, that might not use tags at all. This may encourage doing something like #1194136: Re-organise core cache bins - we probably want to continue to reserve one or two bins for rendered HTML strings and allow modules to assume their stuff will end up there unless they're rolling their own cache clearing implementations too.

I want to get the SQL implementation firmed up so it's possible to do some basic benchmarks with it.

The other half of this is the collection of cache tags from things we'll be caching (i.e. in drupal_render() at least to start with). Not yet sure if we want to do that in the initial patch or not since that's a very separate layer.

Also I'm still thinking of ways to invalidate lists of content - i.e. not when an item is update or deleted from it, but when an item is added to it. this is likely to be "node added to term ID foo" or "node added to organic group foo", maybe we can do something like node_insert_og_group : 1 and node_insert_taxonomy_term: 3 for things like that.

@moshe thanks for the link to the varnish discussion, some of the conceptual issues (how to collect references to things that should cause cache items to clear, adding a hook system so they can clear caches, how to deal with additions to lists) are exactly the same problems that are being dealt with here. I'm pretty sure the actual varnish integration with this will just be a small add-on though once we've got the rest of the infrastucture in place.

catch’s picture

Issue summary:View changes

Updated issue summary.

catch’s picture

Added schema for cache_tags and cache_page_tags along with tests which now pass.

dbtng only allows this table design to work with a subquery as opposed to a join (have a feeling sqlite doesn't support joins in DELETE statements).

Next I generated cache items with this (drop it in drupal root as tags.php and run drush scr tags.php).

<?php

$n = 0;
$values = array();
$values = array_fill(0, 10000, 1);
$namespaces = array('foo', 'bar', 'baz', 'banana');
while ($n < 200000) {
cache()->set($n, $n, CACHE_PERMANENT, array($namespaces[array_rand($namespaces)] => array_rand($values, 3)));
$n++;
}

This puts 200k items into {cache} and 600k into {cache_tags}, with 4 unique namespaces and 10k unique values.

Then I checked how many matches there'd be on a particular tag:

SELECT COUNT(*) FROm cache_tags WHERE namespace = 'foo' AND value = 20;

this gave me 20 items to delete, subsquent was between about 15-40.

Then I ran:

catch@catch-laptop:~/www/8$ time drush php-eval "cache()->deleteTags(array('foo' => array(110)));"

real 0m0.861s
user 0m0.240s
sys 0m0.040s

So that's 0.8s with the subquery.

JOIN is instant though:

MariaDB [d8]> DELETE FROM cache where cid IN (SELECT cid FROM cache_tags WHERE value = '129' AND namespace = 'foo');Query OK, 8 rows affected (0.79 sec)

MariaDB [d8]> DELETE cache FROM cache INNER JOIN cache_tags ON cache_tags.cid = cache.cid WHERE cache_tags.value IN ('139') AND cache_tags.namespace = 'foo';
Query OK, 9 rows affected (0.00 sec)

catch’s picture

StatusFileSize
new8.43 KB
PASSED: [[SimpleTest]]: [MySQL] 32,930 pass(es).
[ View ]

Forgot patch.

catch’s picture

chx suggested this syntax in irc, it's better than the straight subquery but the join is considerably better still.

MariaDB [d8]> DELETE FROM cache where cid IN (SELECT * FROM (SELECT cid FROM cache_tags WHERE value = '169' AND namespace = 'foo') AS cache_tags);
Query OK, 14 rows affected (0.58 sec)
catch’s picture

Talked to nnewton about this in irc. He pointed me to:

http://bugs.mysql.com/bug.php?id=9021

That means the subquery will be fast in MySQL 5.something (apparently there will be no MySQL 6.0), but may nor may not get backported to a stable 5.x release.

This leaves switching on the driver and using db_query() with the JOIN if it's MySQL, but the db_delete() for everything else. This would present the opportunity to write a big comment about how much MySQL sucks linking to that bug report, but I'm not sure anyone will let me do that. However I'm probably going to write a patch for it to show the issue.

Otherwise there are not really any options (pulling into PHP is bad, LIKE '%foo%' queries on a string will likely be worse than the subquery).

catch’s picture

Status:Needs work» Needs review
StatusFileSize
new10.87 KB
PASSED: [[SimpleTest]]: [MySQL] 32,924 pass(es).
[ View ]

Alright. Added support to dbtng to determine if a driver supports JOIN with INSERT/DELETE/UPDATE after talking to chx in irc.

This leaves some quite ugly code in cache.inc to handle the MySQL bug still - but it is limited only to the specific implementation of database caching that core ships with.

This makes the DELETE a fully indexed query in MySQL, with 600k rows in cache_tags it is not even measurable by the MySQL cli so should scale fine.

Uploading the latest patch, more to do here but let's see what the bot thinks and it'd be cool to get feedback on the general API at this level too.

One thing I'm not sure about is the order of expire vs. tags in cache()->set(), I put $tags at the end for now since there is no bc break that way and I dunno which is going to be used more yet.

chx’s picture

I support this. If your driver cant deal with an unstructured fugly DELETE statement then... well do nothing as the code defaults to no DML JOIN anyways :) protected $joinDMLSupport = FALSE;

beejeebus’s picture

i'm happy with where the ugly is in this patch.

no code consuming the API should ever have to care that we got ugly to make their site fast, because it's an implementation inside a method on an interface.

we can safely gut this later without any BC changes at all.

beejeebus’s picture

StatusFileSize
new10.95 KB
PASSED: [[SimpleTest]]: [MySQL] 0 pass(es).
[ View ]

added an updated patch, only change is to truncate the cache tags table associated with a bin on flush.

catch’s picture

StatusFileSize
new11.62 KB
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in includes/cache.inc.
[ View ]

Also adding deletion of tags when ->delete() or ->deleteMultiple() is called. This is likely to break dozens of tests since we don't have the $bin/$bin . _tags schema everywhere yet.

I really hate the current cache bin creation method, have been planning to move that schema definition into the DrupalDatabaseCache class for a while, this might be the time to do it...

Status:Needs review» Needs work

The last submitted patch, cache_tags.patch, failed testing.

Berdir’s picture

+++ b/includes/cache.incundefined
@@ -304,6 +310,15 @@ interface DrupalCacheInterface {
+   *   An array of tags grouped by namespace. The array follows the same format
+   *   as that used to in DrupalCacheInterface::set().
+   */
+  function deleteTags($tags);

Don't we have to override this method too in DrupalFakeCache? Will probably not be called, but might still be a good idea...

+++ b/includes/cache.incundefined
@@ -485,8 +515,42 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+
   function flush() {
     db_truncate($this->bin)->execute();
+    db_truncate($this->bin . '_tags')->execute();

What happens if we call flush() (or deleteTags()) on a cache bin that does not have a tags table defined? Should we check with a tableExists()?

Or alternatively, could we even only have a single cache_tags table which would have an additional 'bin' column? No idea how much slower that would be...

EDIT: Crosspost with catch :)

catch’s picture

Status:Needs work» Needs review
StatusFileSize
new17.73 KB
FAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.
[ View ]

DrupalFakeCache isn't really fake - it deletes whatever it can, so we might not need to override but I'll check.

I'd be really concerned about the performance implications of a since cache_tags bin - we originally split the cache tables (back in 2006 or earlier) to avoid contention on a single table, and tags are many to one vs. cache items. So if this part of the API got used a lot it could be a massive, massive database table with a lot of i/o.

This patch at least simplifies the schema code a bit, moves it out of system_schema() and into DrupalDatabaseCache::schema() - which makes creating a bin a bit of an easier task.

I'd rather move towards #1167144: Make cache backends responsible for their own storage so that if you don't use the database cache you don't have the redundant database tables at all - we'll need the schema in the DrupalDatabaseCache class for that, but creating the bins themselves would no longer be done by modules at all, so this is a bit of half way towards that. I might try to move this hunk into that patch and get that one updated though.

Status:Needs review» Needs work
Issue tags:-Performance, -Favorite-of-Dries, -Needs issue summary update, -API change

The last submitted patch, cache_tags.patch, failed testing.

catch’s picture

Status:Needs work» Needs review

#67: cache_tags.patch queued for re-testing.

Status:Needs review» Needs work
Issue tags:+Performance, +Favorite-of-Dries, +Needs issue summary update, +API change

The last submitted patch, cache_tags.patch, failed testing.

catch’s picture

Status:Needs work» Needs review
StatusFileSize
new19.46 KB
FAILED: [[SimpleTest]]: [MySQL] 30,291 pass(es), 639 fail(s), and 73,242 exception(es).
[ View ]

Fixed installation and a few other bits. The schema code is making me said so going to switch gears to #1167144: Make cache backends responsible for their own storage.

Status:Needs review» Needs work

The last submitted patch, cache_tags.patch, failed testing.

catch’s picture

Status:Needs work» Needs review
StatusFileSize
new18.87 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch cache_tags_4_0.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]

Without the debug.

Status:Needs review» Needs work
Issue tags:-Performance, -Favorite-of-Dries, -Needs issue summary update, -API change

The last submitted patch, cache_tags_4.patch, failed testing.

pillarsdotnet’s picture

Status:Needs work» Needs review

#73: cache_tags_4.patch queued for re-testing.

Status:Needs review» Needs work

The last submitted patch, cache_tags_4.patch, failed testing.

Berdir’s picture

Status:Needs work» Needs review
Issue tags:+Performance, +Favorite-of-Dries, +Needs issue summary update, +API change

#73: cache_tags_4.patch queued for re-testing.

catch’s picture

I've hijacked #730060: Replace CACHE_TEMPORARY, cache_clear_all() and minimum cache lifetime with cache tags support for the integration with content caching, since that's likely to not be a small job, and it's (by design) going to happen completely outside cache.inc except for the eventual removal of cache_clear_all().

Since tests pass I think this is at the point where it needs actual reviews - we'll be adding a method that's not called anywhere except tests yet, however that already exists for isEmpty().

miro_dietiker’s picture

Looks very promising to me. I think that's the right direction.

Although you clearly showed the disadvantages, we hopefully will have much less clear-everything cases in future. This will allow much better long-running performance. It's just very important to eliminate all those clear-all cases in regular code to make the cache perform better for future.

EDIT: If a caching system will support tagging natively... Will we be able to cover that with best performance? :-)

catch’s picture

I'm not aware of any dedicated caching systems that support tagging natively except for this experimental (and dormant) fork of Memcached which doesn't look like it's getting much use.

However, MongoDB would definitely support this (you could store tags as part of the cache item document, add an index on the array, and delete using that index).

I haven't played much with Redis yet but that also ought to be able to handle this well from what I know of it.

Moshe linked to lsmith's post about doing a similar thing with varnish (store tags with the content, then invalidate via a regexp).

Since we'll let the caching plugin handle both the way any tags are stored, and also how they're invalidated, it'll be up to the plugin author to figure out the best way to handle that with the storage engine.

Owen Barton’s picture

I haven't figured out the exact mechanism, but this could probably help improve the cache efficiency of #152901: Caching (per user) for authenticated users quite a bit.

carlos8f’s picture

I'm starting to work on FlexiCache 2.x (http://drupal.org/project/flexicache) which will store the cache data in Mongo, have memcache-like version flushing for tags and wildcards, and no longer require a core patch (I hope). I need to look at the SQL patches here and take ideas from that, and possibly have an approach that will work for both SQL and noSQL.

xjm’s picture

Untagging.

moshe weitzman’s picture

Looking good.

  1. Is varchar 64 enough room for the tag value?
  2. I'm not clear on why block.install and all the others hard code DatabaseCache like so: '$cache = new DrupalDatabaseCache('block');'. Couldn't they choose to use non-DB cache backend? Block is not a particularly special module anymore.
catch’s picture

@moshe - that's in #1167144: Make cache backends responsible for their own storage. I re-used some of the stuff from there here, but did not want to combine both patches.

Cache tag value length - how long is a uuid?

andypost’s picture

A UUID is a 16-byte (128-bit) number. But most database storage engines can manipulate with UUID by COLUMN definition.

I think each cache definition(_info) could define some limits (native_UUID = TRUE, key_limit = 64, chunk size = 1M) also I lke to remember about cacherouter module - each router have it's own key generation override

carlos8f’s picture

#73: cache_tags_4.patch queued for re-testing.

carlos8f’s picture

StatusFileSize
new18.89 KB
FAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.
[ View ]
carlos8f’s picture

StatusFileSize
new19.28 KB
PASSED: [[SimpleTest]]: [MySQL] 33,323 pass(es).
[ View ]
new19.65 KB
PASSED: [[SimpleTest]]: [MySQL] 33,327 pass(es).
[ View ]

For the sake of experimentation, I'm introducing a couple of alternate implementations.

The concept is to do simple invalidation of tags instead of synchronously deleting all the data on a cache clear. This shifts some weight to cache get/set, since the get needs to check against the current "version" of a tag, and the set may need to get/attach the current "version" to the data record. However, I think we still win in the end.

Having run a Mongo-based cache tags system in production for most of 2011, cache clears have been the biggest bottleneck, as I was using a "select all from these tags" -> "delete all that stuff" combination which frequently ends up in unhappy cursors and timed out requests once there is a good deal of tagging going on. Shifting to a lightweight invalidation scheme I think is the way to avoid that. It will also make cache tags very do-able in simple key-value stores, Mongo, Redis, whatever :)

RE: the two approaches, I prefer the counter-based approach because it avoids weirdness when time sync is slightly off between servers. The timestamp approach has faster cache_set() though, since it doesn't need to query for the latest tag checksum.

Of course, benchmarks should be done. I will try to get some up for baseline vs. catch's, baseline vs. each approach here, and breakdowns of performance loss or gain for each cache operation. The benchmarks need to operate on a large dataset to be realistic; I may be able to export my production cache tags from Mongo to play with.

carlos8f’s picture

StatusFileSize
new19.28 KB
PASSED: [[SimpleTest]]: [MySQL] 33,330 pass(es).
[ View ]
new19.65 KB
PASSED: [[SimpleTest]]: [MySQL] 33,325 pass(es).
[ View ]

Oops, fixed cache-install invalidate() method name.

miro_dietiker’s picture

carlos8f if the decision is not clear, why not having both strategies available but configurable?
I'm not sure if we'll find the perfect decision here that matches best for all cases.. Are you sure we will?

carlos8f’s picture

The DB schema in my patch is very different from catch's, that would make switching between the two difficult. Plus it would be a lot of baggage to carry around multiple implementations. I am just trying to play devil's advocate and see if there's interest in other approaches.

carlos8f’s picture

StatusFileSize
new20.4 KB
PASSED: [[SimpleTest]]: [MySQL] 33,324 pass(es).
[ View ]
new20.41 KB
FAILED: [[SimpleTest]]: [MySQL] 33,304 pass(es), 0 fail(s), and 4 exception(es).
[ View ]

I ran some benchmarks:

<?php
#!/usr/bin/php
<?php

timer_start
('set');
for (
$i = 0; $i < 10000; $i++) {
 
cache_set("test-$i", NULL, 'cache', CACHE_PERMANENT, array('user' => array(1, 2, 3, 4), 'node' => array(1, 2, 3, 4)));
}
timer_stop('set');

timer_start('get');
for (
$i = 0; $i < 10000; $i++) {
 
cache_get("test-$i");
}
timer_stop('get');

timer_start('clear 1');
cache_clear_tags(array('node' => array(3)), 'cache');
timer_stop('clear 1');

timer_start('clear multi');
cache_clear_tags(array('user' => array(1, 2, 3, 4), 'node' => array(1, 2, 4)), 'cache');
//cache_clear_all('test', 'cache', TRUE);
timer_stop('clear multi');

unset(
$GLOBALS['timers']['page']);
var_dump($GLOBALS['timers']);
?>

I ran this 3 times for each patch, and took the middle result for each stat.

(numbers expressed in milliseconds)

Patch set() get() clear_tags() single clear_tags() multi
8.x 0.97 0.19 n/a 179.56 (wildcard)
catch (JOIN) 33.68 0.18 1154.12 5164.90
catch (subquery) 17.86 0.19 1062.97 4261.28
carlos8f (checksum) 1.06 0.18 1.40 7.56
carlos8f (timestamp) 1.04 0.20 1.44 6.8

Although the approaches are only semi-comparable, we can see some of the theoreticals expressed in numbers here. Subquery/JOIN-based tag clearing is clearly much more expensive than simple wildcard clearing. Curiously, the subquery was always faster than the JOIN on my system. There is also significant overhead in multi-inserting tags for each cid, whereas my method keeps that low by storing them in a serialized field and having one row per tag.

The overhead in querying/computing the checksum is actually lower than I thought, so I'm going to discontinue the timestamp-based approach in favor of the checksum. The performance of checksum-based tags is only *slightly* worse than no tags at all, so it looks very promising if we can work out garbage collection.

It's also worth noting that ideally tags can be cleared across multiple bins. My patch stores tags separately from cid's or bins, making that trivial. I need to tweak the patch to implement that though, since the cache class is so coupled with the bin currently.

I have made some fixes and tweaks to catch's patch and mine to do the tests, so I am posting that also. The main bug in catch's patch was that it only cleaned up tag records in non-JOIN mode (probably skewing benchmarks in the JOIN favor), and the null cache needed fixing.

carlos8f’s picture

StatusFileSize
new12.05 KB
PASSED: [[SimpleTest]]: [MySQL] 33,346 pass(es).
[ View ]

Followed up on the cross-bin clearing, this version abstracts cache tag stuff into its own class.* That allows cache tag storage to be extendable and still able to expire across multiple bins.

Cleared out some of the cruft from previous patches, and expanded on the tests to include multi-bin expiration. No longer includes #1167144: Make cache backends responsible for their own storage.

* needs docs :)

carlos8f’s picture

I've started a contrib project to follow this issue: Cache Tags

I also have a working Mongo cache backend with tag support committed there, and it's fast! Benchmark using #93 method:

set() - 0.068ms
get() - 0.16ms
expire_tagged() single - 0.06ms
expire_tagged() multi - 0.26ms

Whew, blazing!

carlos8f’s picture

StatusFileSize
new11.94 KB
PASSED: [[SimpleTest]]: [MySQL] 33,449 pass(es).
[ View ]

Updated patch, renamed cache_expire_tagged() to cache_invalidate().

carlos8f’s picture

Deployed the 6.x Mongo version of Cache Tags yesterday to a production site, and so far there have been 500k entries across 18 bins in the cache, 38k tags and 546k tag invalidations. Server load is very low, transfer rate is about 36Mb/s, and the site's overall speed is much improved. It's working so smoothly, I am a little shocked :)

Edit: after the weekend we've hit 1.5m cache entries, 124k tags and 2.8m invalidations. Working beautifully.

Berdir’s picture

Status:Needs review» Needs work
+++ b/includes/cache.incundefined
@@ -616,3 +639,84 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+
+interface DrupalCacheTagsInterface {
+  function __construct();
+
+  function invaldate($tags);
+
+  function checksum($tags);
+
+  function isValid($checksum, $tags);

Captain Obvious here, documentation missing :)

Also, having an empty __construct() in the interface seems pointless, if nothing is passed to it then there is no requirement to implement it?

Also, typo: "invaldate"

+++ b/includes/cache.incundefined
@@ -616,3 +639,84 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+  function flatten($tags) {
+    if (isset($tags[0])) return $tags;
+    $flat_tags = array();
+    foreach ($tags as $namespace => $values) {
+      foreach ($values as $value) {
+        $flat_tags[] = "$namespace:$value";
+      }
+    }
+    return $flat_tags;

Missing documentation too here, and should maybe be protected?

+++ b/includes/cache.incundefined
@@ -616,3 +639,84 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+class DrupalDatabaseCacheTags extends DrupalCacheTags implements DrupalCacheTagsInterface {
+  protected $tag_cache = array();

You could have the abstract class implement the interface, then you don't need to repeat the implements part.

+++ b/modules/system/system.installundefined
@@ -673,6 +673,18 @@ function system_schema() {
+      'tags' => array(
+        'description' => 'Space-separated list of cache tags for this entry.',
+        'type' => 'text',
+        'size' => 'big',
+        'not null' => FALSE,
+       ),
+      'checksum' => array(
+        'description' => 'The tag invalidation sum when this entry was saved.',
+        'type' => 'int',
+        'not null' => TRUE,
+        'default' => 0,

Not sure if completely understood the schema changes but this is certainly missing update functions and seems to only change the {cache} bin.

I guess we'd want to wait with this stuff until we decided who manages database cache bin creation and when it's done. Because this brings up an interesting point, namely schema updates...

-22 days to next Drupal core point release.

carlos8f’s picture

StatusFileSize
new14.72 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch 0001-Counter-based-cache-tags-in-D8.patch. Unable to apply patch. See the log in the details link for more information.
[ View ]

Thanks for the review. All good suggestions.

- I added docs!
- took out empty __construct()
- fixed silly method name typo
- flatten() is now a static function (making it protected prevents DrupalDatabaseCacheTags from calling it)

Haven't dealt with the schema update issue, and we might want to leave that to #1167144: Make cache backends responsible for their own storage. I'm not sure if making a special hook for cache bins is really good, since that is making dependencies on the module API for the cache API. However it seems like it doesn't block this issue since we can just provide a hook_update_N() which alters in new fields for cache_* tables in the mean time.

carlos8f’s picture

Status:Needs work» Needs review

And...

carlos8f’s picture

StatusFileSize
new15.87 KB
PASSED: [[SimpleTest]]: [MySQL] 33,651 pass(es).
[ View ]

Added upgrade path, and this time the bot should pick it up.

catch’s picture

So the checksum here is going to do an additional query for each cache item retrieved. Memcache does a similar thing for wildcard clears (with a multi get instead of an IN()) and we saw a big performance degradation with this until extra layers of optimization were added on. It'd be interesting to see how much effect this has on page caching for example compared to core and the original patch I did - where the cache queries will be a larger proportion of the request.

However even if the raw read performance is worse, we might still want to go ahead with this - since it ought to scale well, we may be able to do some similar optimizations to memcache (i.e. don't try to fetch tags for bins that never use them), and even if raw read speed is slower better cache hit rates should make up for it in terms of actual server load.

I think we could drop the cache_invalidate() function and just use cache_tags()->invalidate($tags);

I still need to do a thorough review of this but looks like great progress.

carlos8f’s picture

It won't be exactly an additional query per cache get, but rather, an additional query per tag on cache entries fetched per page load. Individual tag invalidation counts are statically cached, and summed with counts fetched from the db. This is very cheap stuff though, as the query component of a cache get takes less than 1/5 of a millisecond on a humble Mac Mini.

So far in production the system has been scaling very well, and at over 2.8 million tag invalidations, and 40 200+ queries per second, the cache server load is typically at 0.00. I think it works! :)

aspilicious’s picture

Status:Needs review» Needs work
+++ b/includes/cache.incundefined
@@ -616,3 +639,175 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+   * ¶

trailing whitespace

aspilicious’s picture

Status:Needs work» Needs review

Whoops, not needed

catch’s picture

@carlos that's a good point, since the tags are stored with the cache items we're already skipping the lookup if an item isn't tagged.

beejeebus’s picture

Status:Needs review» Needs work

i like this.

took me a while to get my head around the tags implementation, but i can't find any obvious problems with this approach.

i found the 'checksum' stuff hard to follow at first, because that made me think of this:

http://en.wikipedia.org/wiki/Checksum

so, how about something like this in the interface:

<?php
 
/**
   * Calculate the sum of invalidations of the given tags.
   *
   * @param $tags
   *   Associative array of tags to calculate the invalidation count for.
   *
   * @return
   *   Integer representing the sum of invalidations of the given tags.
   */
 
function invalidationCount($tags);
?>
catch’s picture

+++ b/includes/cache.incundefined
@@ -137,9 +137,16 @@ function cache_get_multiple(array &$cids, $bin = 'cache') {
+function cache_set($cid, $data, $bin = 'cache', $expire = CACHE_PERMANENT, $tags = array()) {
+  return cache($bin)->set($cid, $data, $expire, $tags);

It'd be great to get #1272706: Remove backwards compatibility layer for cache API in so we don't have this duplicate any longer.

+++ b/includes/cache.incundefined
@@ -616,3 +639,175 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+ * Cache tags and their corresponding invalidation counters are stored
+ * independently from the cache itself. Cache tag storage is stored on a
+ * site-wide level (as opposed to per-bin) so only one cache tag storage engine

I'm not sure we can enforce that this is a site-wide setting. One use case I have in mind for this is storing cache tags in headers for varnish, then using varnish regexp clearing to invalidate items based on the tags. That would require being able to use specific tag implementations with specific storage. This would mean multiple cache tags backends which all get called each time tags are cleared. It'd be possible to write a cache tag backend that itself calls back to multiple other backends I guess.

In irc msonnabaum also mentioned redis which may well be able to handle directly invalidating wildcards, mongodb can likely do this as well.

Since tags can exist across multiple backends, maybe we just need to allow for multiple cache tag backends - but not per-bin - when invalidating a tag you'd invalidate it for each backend.

Also discussing this in irc, beejeebus mentioned that if this is the case we should move checksum() out of the interface - even if it's used in multiple backends we can allow for alternative implementations that do something else.

Other minor stuff:

+++ b/includes/cache.incundefined
@@ -616,3 +639,175 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+   * ¶

Trailing whitespace.

+++ b/includes/cache.incundefined
@@ -616,3 +639,175 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+  static function flatten($tags) {
+    if (isset($tags[0])) return $tags;

Why the early return here? Also we don't allow if statements without braces.

-24 days to next Drupal core point release.

dixon_’s picture

@catch I've been thinking the exact same thing regarding Varnish. But I'm not sure per-bin storage will help since you don't necessarily do cache_set($cid, $data, ..., $tags) for the page bin (i.e. Varnish).

So, my thinking was that we add a function called cache_tag_page() that add tags to a X-Cache-Tags HTTP header. It would work like this:

<?php

/**
 * Helper function for setting page cache tags.
 */
function cache_tag_page($tags) {
  foreach (
$tags as $tag) {
   
drupal_add_http_header('X-Cache-Tags', $tag);
  }
}

/**
 * Implements hook_entity_view().
 */
function mymodule_entity_view($entity, $type) {
  list(
$id) = entity_extract_ids($type, $entity);
 
cache_tag_page(array($type => array($id)));
}

/**
 * Implements hook_entity_update().
 */
function mymodule_entity_update($entity, $type) {
  list(
$id) = entity_extract_ids($type, $entity);
 
cache_invalidate($type => array($id));
}
?>

Then we add a hook to cache_invalidate($tags) and then things could look like this:

<?php

function cache_invalidate($tags) {
 
cache_tags()->invalidate($tags);
 
// Let other cache backends (i.e. HTTP proxies) to clear their caches based on tags.
 
module_invoke_all('cache_invalidate', $tags);
}

/**
 * Implements hook_cache_invalidate().
 */
function varnish_cache_invalidate($tags) {
  foreach (
$tags as $tag) {
   
_varnish_terminal_run('purge obj.http.X-Cache-Tags ~ ' . $tag);
  }
}
?>
catch’s picture

@dixon_ so getting things into varnish I was thinking the varnish cache backend could do that i.e. drupal_set_header() in cache('page')->set() and the terminal run in cache_tags()->clear(). If we allow more than one cache tags backend that's equivalent to having a hook (except we're able to skip a hook system dependency). I'm not sure if the drupal_set_header() trick would actually work when setting the page cache now, but seems like it ought to be possible to make that work.

catch’s picture

Issue summary:View changes

Updated issue summary.

carlos8f’s picture

Right, we can't really invoke hooks in the cache API. Our goal is to uncouple dependencies like that in D8, not create more. module_invoke_all() is only really usable after a full bootstrap (unless we do some huge lazy-loading change in D8), and the cache API needs to be available much before that. So we would probably just loop over a list of backend plugins and call their respective invalidate() methods, or else create a lower-level hook system for includes to use amongst themselves.

I rather like the idea of cache_tag_page(), but would be concerned about the tagging getting out of control. Also I would generalize it more to be cache_tag_request(), since in D8 we're trying to move away from page-centric thinking and more towards a REST server.

xjm’s picture

Looks like this got re-tagged through a crosspost; fixing.

carlos8f’s picture

Now that I think of it, tagging the request might be better as a part of the new context initiative. That way the metadata isn't only for the cache layer, but usable as general information about the request.

At this point we're at 'needs work', but not really sure what needs the work.

- 'checksum' is meant to be a signature to track whether something has changed, it happens to be the total invalidation count in this case, but might be something else in a different implementation. I think it's conceptually a checksum then.
- Running parallel backends for varnish headers seems edge-casey, and could be accomplished by one backend proxying to another (or parent::invalidate, etc).
- Haven't heard a good argument against site-wide tag storage and invalidation.
- Tagging the request may have to wait so we can coordinate that with D8 context and re-use metadata.
- Minor code formatting will get fixed with the next re-roll.

mikeytown2’s picture

Don't forget about ETags.

catch’s picture

"- Haven't heard a good argument against site-wide tag storage and invalidation."

I'm fine with that but this doesn't mean we shouldn't allow storage backends to do immediate tag invalidation (i.e. delete rather than checksum), in which case we should not put that implementation detail into the interface. And since a storage backend could do this, and we allow storage backends to be mixed and matched, then it also doesn't make sense to me to impose this on people building sites. Either turning tags backends into an array and executing all of them, or a proxy backend will do this though, so these are both fairly small but I'd not like to see us lock these two things out of the design when they're easy to account for.

While checksum is technically correct it brings to mind a hash to many people (including everyone who discussed it in irc), so it might make sense to use something more specific, but I'd be fine with using checksum() and a better comment (since we'd move it out of the interface to the SQL implementation anyway).

beejeebus’s picture

re #114, i still don't think checksum() is the right name, but really, i don't care that much - we all know my blue shed looks better than your red one ;-)

i'm much more concerned about keeping the concept in the interface. so, i'll RTBC a patch that still has checksum() in the default sql implementation, but won't support a patch that keeps that in the interface.

dixon_’s picture

The invalidation will only work for cache backends that can run cache_tags()->isValid($checksum, $tags) after fetching a cache entry. Varnish will never be able to invalidate cache entries with cache_invalidate as it is. But I agree that a hook is probably not the best solution either.

The best solution would probably be the one catch mentioned -- iterating over cache backends and run ->invalidate($tags) on them. But then we need to register cache backends in a different way i.e. not with magic variables but an array, as catch also mentioned.

carlos8f’s picture

i'm much more concerned about keeping the concept in the interface. so, i'll RTBC a patch that still has checksum() in the default sql implementation, but won't support a patch that keeps that in the interface.

I can't see how we can remove checksum() from the interface, without requiring the cache implementation to have special knowledge about the cache tags implementation. isValid() requires a $checksum, so if we remove checksum() we can't have isValid() either. Then we're left with an interface with one method, pointless. Can you illustrate what your concern is?

beejeebus’s picture

yes, we'd remove them both. i don't see why an interface with one method is useless.

most big sites won't use the db implementation. they'll probably use a backend that will invalidate the tags directly, and have useless stub methods for the checksum stuff so that php doesn't choke.

not sure what else to say - the checksum concept seems like an implementation detail, for the default backend that most big sites just won't use. i've said my piece, if everyone else wants that in the interface, so be it, i won't block the issue.

carlos8f’s picture

Sounds like there is a misunderstanding -- the checksum method is much, much faster than clearing tags directly, from benchmarks I've done and real deployment experience. And there is no preference for SQL. In fact, the approach targets NoSQL by keeping things key-value, but it also works great in SQL. I'm already using Cache Tags Mongo version in a large deployment with amazing results, compared to the older cache tags implementation we were running with direct clearing. @dixon_ contributed a wonderful Memcache backend using the checksum interface, and I'm sure it wouldn't be hard to port to APC or Redis either.

A backend that doesn't choose to go with checksums could physically clear data in invalidate(), return 0 in checksum(), and always return TRUE in isValid(). Likely this will be an edge case though, and varnish has special needs so we shouldn't take that as the baseline.

beejeebus’s picture

just to make sure its clear - please continue with the interface with checksum().

i don't agree with that detail (and i'm probably wrong), but i really, really don't want to block this. this is a great feature for our cache system, and i'm fully behind it.

dixon_’s picture

Semi-related note...

Here's a friendly fork of carlos' CacheTags module, which adds support for tagging the page cache. A working Varnish implementation is included. Read more here #1341306: New core patch and support for tagging of the page cache and here #1341322: Support for tagging and invalidating the Varnish cache

Basically what you can do is to register the cache_page bin as an "external bin" in settings.php. Cache invalidation of external bins can't be made in passive fashion (i.e. with a checksum). External bins needs to actively invalidate the cache.

This is not fully relevant to this thread, since we need a better way to solve it in Drupal 8, so please, read more in the issue links above.

beejeebus’s picture

Status:Needs work» Needs review
StatusFileSize
new15.49 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch cache-tags-636454-124.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]

simple reroll for core directory changes, lets see what the bot thinks.

moshe weitzman’s picture

StatusFileSize
new15.49 KB
FAILED: [[SimpleTest]]: [MySQL] 34,206 pass(es), 20 fail(s), and 8 exception(es).
[ View ]

Just a re-upload to see if bot will test

Status:Needs review» Needs work

The last submitted patch, cache-tags-636454-124.patch, failed testing.

beejeebus’s picture

waiting for catch to commit the 'get rid of BC layer' cache patch before i work on fixing the tests.

msonnabaum’s picture

StatusFileSize
new14.26 KB
PASSED: [[SimpleTest]]: [MySQL] 34,398 pass(es).
[ View ]

Here's a new version that removes the procedural wrappers.

msonnabaum’s picture

Status:Needs work» Needs review
beejeebus’s picture

discussed this a bunch today with msonnabaum. we're both now leaning toward making the different cache backends responsible for handling tag invalidation. pseudocode stolen from msonnabaum's gist:

<?php
function cache_invalidate($tags) {
 
$classes = cache_get_classes();
  foreach (
$classes as $bin => $class) {
    if (
in_array("DrupalCacheTagsInterface", class_implements($class))) {
     
_cache_get_object($bin)->invalidate($tags);
    }
  }
}
?>

so, we iterate over the backends for a site, and if the backend implements DrupalCacheTagsInterface, we call invalidate().

the downside to this is that tags are stored per-backend.

the upside is that it is way simpler, and makes it very easy to handle sites with more than one backend.

msonnabaum’s picture

The main motivation behind this is handling reverse proxy backends like varnish or akamai. It's in no way an edge case, so we should handle it without having to work around the current implementation, which seems to be written with mongo in mind.

I'm also not totally convinced that we need checksum/isValid in the interface. I trust it's faster for mysql and mongo, and we should use it there, but I don't know if it's worth the complexity for backends like redis that can probably actively invalidate just as fast. They also won't be used at all for reverse proxy backends.

catch’s picture

Don't have much time to type, but caught up on the updates and spoke to beejeebus briefly in irc. Allowing cache storage to implement tags, then calling it once per implementing class for invalidation (not per bin, per class - regardless of how many bins it's assigned to), sounds great.

The only thing that's a bit tricky is we need a wrapper (currently cache_invalidate()) that collects all classes in use, checks if they implement that interface and calls the same method on them with the same argument. That's a little messy since it introduces a procedural wrapper to the cache system other than cache(), but on the other hand it will nuke the existing one that's still in there as cache_clear_all() hard coded to the page and block caches, so it'd be an improvement compared to now if not ideal keeping things self-contained - and the actual invalidation logic is still entirely in the backends, this is just a helper.

msonnabaum’s picture

Just to compare with the numbers above, here's redis with the checksum implementation and with immediate invalidation:

with checksum:

set         :           0.59ms
get         :           0.30ms
expire 1    :           0.27ms
expire 7    :           0.60ms

immediate invalidation (using SUNION and DEL):

set         :           0.38ms
get         :           0.11ms
expire 1    :           36.99ms
expire 7    :           67.54ms

So while deleting the entries at the time of invalidation is more expensive, I don't know that it's enough to warrant the overhead on get/set for this particular implementation.

catch’s picture

Yeah, at a minimum I think we want to leave it open for implementations to do direct invalidation, even if everything (apart from varnish) ends up doing the checksum method, there's no reason to bake the implementation details into the interface and prevent people from trying different things out.

beejeebus’s picture

StatusFileSize
new16.18 KB
PASSED: [[SimpleTest]]: [MySQL] 34,383 pass(es).
[ View ]

ok, first rough cut, cache tags tests pass.

todo:

- make a real cache_get_backends(). not 100% sure how to do this, as it will require discovery.
- make the tests cover more than one active backend
- docblocks need work

setting to needs review for bot, but definitely still needs more work.

beejeebus’s picture

StatusFileSize
new14.57 KB
PASSED: [[SimpleTest]]: [MySQL] 34,399 pass(es).
[ View ]

this time without the commenting out of other test methods.

catch’s picture

Assigned:catch» Unassigned

This is fortunately no longer my baby.

msonnabaum’s picture

StatusFileSize
new14.54 KB
FAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.
[ View ]

Here's a new version that changes invalidate() to invalidateTags() and replaces the cache_default_class and cache_class_* variables with a new cache_classes variable so that we can discover the classes easier. Here's how it would be used:

<?php
$conf
['cache_classes'] = array(
 
'cache' => 'DrupalDatabaseCache',
 
'page' => 'Redis_Cache_PhpRedis'
);
?>

Status:Needs review» Needs work
Issue tags:-Performance, -Favorite-of-Dries, -API change

The last submitted patch, cache-tags-636454-138.patch, failed testing.

msonnabaum’s picture

Status:Needs work» Needs review

#138: cache-tags-636454-138.patch queued for re-testing.

Status:Needs review» Needs work
Issue tags:+Performance, +Favorite-of-Dries, +API change

The last submitted patch, cache-tags-636454-138.patch, failed testing.

msonnabaum’s picture

StatusFileSize
new15.25 KB
FAILED: [[SimpleTest]]: [MySQL] 34,413 pass(es), 0 fail(s), and 5 exception(es).
[ View ]

Should fix the testing error.

msonnabaum’s picture

Status:Needs work» Needs review

Status:Needs review» Needs work

The last submitted patch, cache-tags-636454-142.patch, failed testing.

msonnabaum’s picture

Status:Needs work» Needs review
StatusFileSize
new14.98 KB
PASSED: [[SimpleTest]]: [MySQL] 34,413 pass(es).
[ View ]

One more time.

pounard’s picture

function cache_invalidate(Array $tags) { Even if PHP is not really case sensitive, it should be a lowercase "array".

beejeebus’s picture

nice work!

a couple of nits:

<?php
  
// We do not use drupal_static() here because we do not want to change the
   // storage of a cache bin mid-request.
  
static $cache_objects;
$cache_backends = variable_get('cache_classes', array('cache' => 'DrupalDatabaseCache'));
   if (!isset(
$cache_objects[$bin])) {
-   
$class = variable_get('cache_class_' . $bin);
-    if (!isset(
$class)) {
-     
$class = variable_get('cache_default_class', 'DrupalDatabaseCache');
-    }
+   
$class = isset($cache_backends[$bin]) ? $cache_backends[$bin] : $cache_backends['cache'];
    
$cache_objects[$bin] = new $class($bin);
?>

can we make that:

<?php
  
// We do not use drupal_static() here because we do not want to change the
   // storage of a cache bin mid-request.
  
static $cache_objects;
   if (!isset(
$cache_objects[$bin])) {
    
$cache_backends = cache_get_backends();
    
$cache_backend = isset($cache_backends[$bin]) ? $cache_backends[$bin] : $cache_backends[variable_get('cache_backend_default', 'cache')];
    
$cache_objects[$bin] = new $cache_backend($bin);
?>

also, would probably be good to make cache_get_backends() use 'cache_backends' as the variable name? or go the other way, and make it all cache_get_classes()? not a big deal either way.

msonnabaum’s picture

StatusFileSize
new15.49 KB
PASSED: [[SimpleTest]]: [MySQL] 34,421 pass(es).
[ View ]

New patch with some of the above comments incorporated.

beejeebus’s picture

coolio, this looks good to me. will wait for the bot before RTBC.

msonnabaum’s picture

StatusFileSize
new15.51 KB
PASSED: [[SimpleTest]]: [MySQL] 34,402 pass(es).
[ View ]

Fixing code standard violation.

aspilicious’s picture

Status:Needs review» Needs work
+++ b/core/includes/cache.incundefined
@@ -295,6 +301,70 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+   * Flatten a tags array into a numeric array suitable for string storage.

FlattenS

+++ b/core/includes/cache.incundefined
@@ -510,3 +590,53 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+/**
+ * Interface for cache backends that support tags.
+ *

/**
* Defines a common interface cache backends that support tags.
*/

See http://drupal.org/node/1354#classes

+++ b/core/includes/cache.incundefined
@@ -510,3 +590,53 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+   * Invalidate each tag in the $tags array.

InvalidateS

+++ b/core/includes/cache.incundefined
@@ -510,3 +590,53 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+ * Invalidate the items associated with given list of tags.

InvalidateS

+++ b/core/includes/cache.incundefined
@@ -510,3 +590,53 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+ * Get the list of cache backends for this site.

GetS

+++ b/core/modules/system/system.installundefined
@@ -1661,6 +1692,38 @@ function system_update_8002() {
+ * Add the {cache_tags} table.

AddS

+++ b/core/modules/system/system.installundefined
@@ -1661,6 +1692,38 @@ function system_update_8002() {
+ * Modify existing cache tables, adding support for cache tags.

ModifieS

Srry for posting the same "doxygen error" that many times but using modify in stead of modifies is just a habbit we have to forget (except when creating hook documentation).

-4 days to next Drupal core point release.

msonnabaum’s picture

Issue tags:+Novice

Tagging novice to reroll with aspilicious' suggested cleanups.

aspilicious’s picture

Status:Needs work» Needs review

:) I'm going to mark this neesds review again for the code. Btw if someone is going to reroll this don't pay attention to the missing docblocks if they are to hard.

msonnabaum’s picture

StatusFileSize
new16.09 KB
PASSED: [[SimpleTest]]: [MySQL] 34,594 pass(es).
[ View ]

New patch with doc cleanups.

pounard’s picture

Do the DrupalCacheTagsInterface means a backend that supports tags? If yes, it's possible to make it more explicit 1) by naming, and 2) by interface inheritance:

Could be:

<?php
interface DrupalCacheTagsInterface extends DrupalCacheInterface {}
?>

Then the naming sucks (I'm thinking of the cache as PHP 5.3/PSR-0 naming), something like:

namespace Drupal\Cache;

interface TaggableCacheBackendInterface extends CacheBackendInterface {}

But this is a minor note, the patch seems good.

bibo’s picture

This issue is currently for D8. Is it likely there will be a backport to D7?

I can see this having huge performance potential, especially when used with Varnish :) I also happen to have rather immediate and specific needs for http tagging & and cache clearing with Varnish, which means that I'll just create my own module, for now. If cachetags didn't need a core patch I would use it partially.

Anyway, my module will have a proper UI, context & rules support, and some trick for esi + user specific caching, which should make it very flexible and reuseable. Even so, I'm definitely also awaiting this core feature eagerly!

catch’s picture

Status:Needs review» Needs work
+++ b/core/includes/cache.incundefined
@@ -280,9 +284,11 @@ class DrupalNullCache implements DrupalCacheInterface {
+class DrupalDatabaseCache implements DrupalCacheInterface, DrupalCacheTagsInterface {
   protected $bin;

+  protected $tag_cache = array();

It looks to me like if the same tag is used across multiple bins, then while it will be stored in a single {cache_tags} table, we'll be doing duplicate queries for it since the class is instantiated once for each bin.

I'd been thinking we'd allow a separate tags backend to the actual storage, but that's not happening here now. If that's the case, I'm not sure there's any value to a separate interface here - we have no way to enforce which modules put cache items in which bins and whether they try to use tags or not, nor which backends site admins assign to each bins - so why not put this directly into DrupalCacheInterface since it's going to be a requirement anyway?

+++ b/core/modules/system/system.installundefined
@@ -1661,6 +1692,38 @@ function system_update_8002() {
/**
+ * Adds the {cache_tags} table.
+ */
+function system_update_8003() {
+  $table = drupal_get_schema_unprocessed('system', 'cache_tags');
...
+function system_update_8004() {
+  foreach (module_list(TRUE) as $module) {
+    foreach (drupal_get_schema_unprocessed($module) as $table => $schema) {

Can't do this, the original schema definition has to be copied here or put into a helper, otherwise there's no way to know which version of the schema sites install with if we change it.

Also we can't rely on module_list() to find cache tables, since some modules may be disabled (say block module or something). So this would need to use db_find_tables() I think.

-1 days to next Drupal core point release.

catch’s picture

Status:Needs work» Needs review
StatusFileSize
new13.94 KB
FAILED: [[SimpleTest]]: [MySQL] Failed to run tests: failed to enable simpletest module.
[ View ]

OK I've re-rolled this.

- updated for PSR-0/namespaces.
- tags support is added to the CacheBackendInterface now, no separate interface.
- modified the upgrade path to not depend on hook_schema() or module_list() (did not run upgrade tests though, out of time).

That should be all the meaningful changes in the patch I think.

Status:Needs review» Needs work

The last submitted patch, cache_tags.patch, failed testing.

aspilicious’s picture

+++ b/.htaccessundefined
@@ -95,7 +95,7 @@ DirectoryIndex index.php index.html index.htm
-  # RewriteBase /
+  RewriteBase /

Needed?

5 days to next Drupal core point release.

pillarsdotnet’s picture

StatusFileSize
new13.52 KB
PASSED: [[SimpleTest]]: [MySQL] 35,694 pass(es).
[ View ]

Obviously not.

pillarsdotnet’s picture

Status:Needs work» Needs review
tstoeckler’s picture

Status:Needs review» Needs work
+++ b/core/includes/cache.inc
@@ -32,16 +32,42 @@ function cache($bin = 'cache') {
+    $cache_backends = variable_get('cache_classes', array('cache' => 'Drupal\Core\Cache\DatabaseBackend'));

Is there a reason this doesn't use cache_get_backends()? It has a different default, but it seems that cache_get_backends() still has the pre-PSR-0 one.

+++ b/core/includes/cache.inc
@@ -32,16 +32,42 @@ function cache($bin = 'cache') {
+    if (in_array('DrupalCacheTagsInterface', class_implements($class))) {

This doesn't exist anymore, the if can simply be removed.

+++ b/core/includes/install.core.inc
@@ -297,7 +297,7 @@ function install_begin_request(&$install_state) {
+  $conf['cache_classes'] = array('cache' => 'Drupal\Core\Cache\InstallBackend');

Same question here: Couldn't this use cache_get_backends()? It's the installer so there is probably little benefit to having this configurable, but still.

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -254,6 +269,89 @@ class DatabaseBackend implements CacheBackendInterface {
+  public function invalidateTags($tags) {

This could use a function header. Should be the simple
"Implements CacheInterface::invalidateTags()." though.

Lars Toomre’s picture

+++ b/core/includes/cache.incundefined
@@ -32,16 +32,42 @@ function cache($bin = 'cache') {
+ * @param $tags

Can we add type hinting here? I believe it should be an array.

+++ b/core/includes/cache.incundefined
@@ -32,16 +32,42 @@ function cache($bin = 'cache') {
+/**
+ * Returns a list of cache backends for this site.
+ */

There is a missing @return directive here (ideally with type hinting).

+   * @param integer @checksum
+   *   The initial checksum to compare against.
+   * @param integer @tags
+   *   An array of tags to calculate a checksum for.
+   * @return boolean
+   *   TRUE if the checksums match, FALSE otherwise.

I think the wrong type hint is given here for tag param. Also I think it should be $checksum and $tags. Finally, there needs to be a blank line before the @return directive.

+   * @param $tags
+   *   Associative array of tags.
+   * @return integer
+   *   Sum of all invalidations.

Another blank needed before @return directive and type hint needed for $tags variable. Several other $tags vaiables also could use type hinting too.

catch’s picture

Status:Needs work» Needs review
Issue tags:-Novice
StatusFileSize
new13.47 KB
PASSED: [[SimpleTest]]: [MySQL] 35,920 pass(es).
[ View ]

This should address Lars and tstoeckler's reviews.

With install.inc, we're specifically setting the cache backend to the install backend, that's a hack required for very obscure install system bugs, so I've left that bit unchanged.

tstoeckler’s picture

Yes, that definitely makes sense. (I had somehow overlooked that we are a setting a different class in install.inc)
I couldn't find anything more to complain about from visual inspection, but have yet to play with it, so not setting RTBC.

dixon_’s picture

Status:Needs review» Needs work
Issue tags:+Needs tests

The last iteration looks very good to me. But before we RTBC this, we probably should have some tests for this, no?

I'll try to give the last patch a ride very soon, and write some tests for it.

tstoeckler’s picture

Status:Needs work» Needs review

Also noticed that there are no tests yet.
I'll have a go at that, maybe that's a good way to try this out.

catch’s picture

Issue tags:-Needs tests
StatusFileSize
new16.82 KB
FAILED: [[SimpleTest]]: [MySQL] 35,925 pass(es), 0 fail(s), and 98 exception(s).
[ View ]

Hmm we have tests for this but they went awol in one of the re-rolls, added those back!

tstoeckler’s picture

Status:Needs review» Needs work
Issue tags:+Needs tests

Oops, that was a crosspost. (Great minds think alike :))
Alright then @_dixon: first come, first serve. Go for it :)

catch’s picture

Status:Needs work» Needs review
Issue tags:-Needs tests

I think that was another crosspost ;)

tstoeckler’s picture

Status:Needs review» Needs work
Issue tags:+Needs tests

Wow, another one...

tstoeckler’s picture

Status:Needs work» Needs review
Issue tags:-Needs tests

incredible...

Lars Toomre’s picture

Status:Needs review» Needs work
+++ b/core/includes/cache.incundefined
@@ -32,16 +32,40 @@ function cache($bin = 'cache') {
+/**
+ * Returns a list of cache backends for this site.
+ */

Doing a final review of this patch, I noticed that this docblock is missing a @return directive.

+   * @param $tags
+   *   Associative array of tags, in the same format that is passed to
+   *   CacheBackendInterface::set().

This is also missing type hinting.

catch’s picture

Status:Needs work» Needs review
StatusFileSize
new16.94 KB
FAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.
[ View ]

Fixed the docs up.

Status:Needs review» Needs work

The last submitted patch, cache_tags.patch, failed testing.

catch’s picture

Status:Needs work» Needs review
StatusFileSize
new16.96 KB
FAILED: [[SimpleTest]]: [MySQL] 35,926 pass(es), 0 fail(s), and 98 exception(s).
[ View ]

And the installer wasn't updated for the type hinting...

Lars Toomre’s picture

Status:Needs review» Needs work

Thanks @catch. The documentation fixes look good.

The one thing I am unsure about is the current practice for the translation of assertion messages. I think the best practice now is to remove all t() calls and use format_string() where necessary. However, I may well be wrong.

pounard’s picture

Nothing to do with the patch itself, but aren't you worried that the cache tags feature might no be implemented easily with some backends, and that we may end up with bailing out some of them because of it? Aside of that, it's an excellent feature, but I'm still afraid of loosing some great backends (memcache, redis, every one of them that are key/value stores in fact).

catch’s picture

With the checksum method of checking the tags, it should be completely fine to implement this with both memcache and redis.

The worst thing for key/value stores is wildcard cache clears, which are implemented by the memcache backend in 6.x and 7.x but the code to do so is extremely complex and quite fragile. Once this is in, I'd want to see if we can replace wildcard clears with tags (or potentially we could move to something like cache namespaces which the stash project uses). But yeah this has been considered in depth in this issue and it should hopefully make things better rather than worse.

pounard’s picture

I hope so, but I still see some weaknesses, even if the checksum method there are chances of cache stall and memory bloat if the checksum validity check is made on access, some cache entries may not be accessed for a while (note that this is also true for actual cache usage anyway so it doesn't really matter).

Thanks for the update, it seem perfectly doable indeed.

catch’s picture

Status:Needs work» Needs review
StatusFileSize
new16.9 KB
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in core/lib/Drupal/Core/Cache/DatabaseBackend.php.
[ View ]

Fixed the noticed and removed t() from asserts.

Status:Needs review» Needs work

The last submitted patch, cache_tags.patch, failed testing.

catch’s picture

Status:Needs work» Needs review
StatusFileSize
new16.9 KB
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in core/lib/Drupal/Core/Cache/DatabaseBackend.php.
[ View ]

Status:Needs review» Needs work

The last submitted patch, cache_tags.patch, failed testing.

Berdir’s picture

Status:Needs work» Needs review
StatusFileSize
new16.9 KB
FAILED: [[SimpleTest]]: [MySQL] 36,160 pass(es), 0 fail(s), and 34 exception(s).
[ View ]

Changed self::tagCache to self::$tagCache, the first one is unfortunately not PHP :)

Let's see if this is better now.

Status:Needs review» Needs work

The last submitted patch, cache_tags_6.patch, failed testing.

Berdir’s picture

Status:Needs work» Needs review
StatusFileSize
new16.91 KB
FAILED: [[SimpleTest]]: [MySQL] Invalid PHP syntax in core/lib/Drupal/Core/Cache/DatabaseBackend.php.
[ View ]

Fixed a remaining $self->tag_cache and initialized self::$tagCache to an empty array to prevent array_merge() errors.

Status:Needs review» Needs work

The last submitted patch, cache_tags_7.patch, failed testing.

Berdir’s picture

Status:Needs work» Needs review
StatusFileSize
new16.91 KB
PASSED: [[SimpleTest]]: [MySQL] 36,161 pass(es).
[ View ]

Grr... and now without syntax errors :)

msonnabaum’s picture

Status:Needs review» Reviewed & tested by the community

This is looking good to me. There are a few improvements that could be made I think, but could wait for separate issues. I don't think we're going to get much more insight on this until we start using it in core.

beejeebus’s picture

just a me too comment - i think this is ready to fly.

catch’s picture

Title:Cache tag support» Change notification for: Cache tag support
Category:feature» task
Priority:Major» Critical
Status:Reviewed & tested by the community» Active

I was heavily involved in this one, but it's had the "Favorite-of-Dries" tag for a while, so I'm comfortable committing it given it's had plenty of review from plenty of people at this point.

Committed/pushed to 8.x. We'll need a change notification for the API addition, and hopefully see people over in #730060: Replace CACHE_TEMPORARY, cache_clear_all() and minimum cache lifetime with cache tags support and other follow-ups.

tim.plunkett’s picture

Status:Active» Needs review
StatusFileSize
new2.05 KB
PASSED: [[SimpleTest]]: [MySQL] 35,065 pass(es).
[ View ]

While attempting to write a change notification for this, I was reading the docs for CacheBackendInterface and I noticed it still had references to cache_default_class and cache_class_*.

Maybe this should be a follow-up issue, but I'm uncertain of any change notification until the in-code docs match.

For the record, I'm not sure my change is actually correct, especially since it makes 1 line into 3, and its all non-CMI variable_get/set stuff.

tim.plunkett’s picture

Title:Change notification for: Cache tag support» Cache tag support

Per xjm's suggestion, retitling for now but leaving priority.

msonnabaum’s picture

That looks correct, but it's kinda silly that we give the example using variable_get/variable_set. No one does that. You use use $conf in settings.php.

As far as I know, CMI has not tried to deal with the settings.php settings, which this would be, so it seems like a good idea to just update the docs using $conf.

tim.plunkett’s picture

StatusFileSize
new2.04 KB
PASSED: [[SimpleTest]]: [MySQL] 35,070 pass(es).
[ View ]

The only usage of this in core is $conf['cache_classes'] = array('cache' => 'Drupal\Core\Cache\InstallBackend'); in includes/install.core.inc, so that fits.

However, doesn't this also need a hook_update_N for any existing cache_class_* variables and cache_default_class?

msonnabaum’s picture

If everyone sets their cache classes in $conf, an update hook wouldn't really do anything, so I'm leaning towards no.

chx’s picture

Of course we set them there, it'd impossible not to. The very reason you set an alternative cache backend because you want to avoid a database, right :) ?

tim.plunkett’s picture

Wrote up http://drupal.org/node/1534630 and http://drupal.org/node/1534648, someone who actually knows what is going on, please review them :)

Berdir’s picture

Status:Needs review» Reviewed & tested by the community

Wasn't aware of the changes related to these variables, confirmed that the new documentation is correct by looking at the code.

Note: RTBC is for the documentation cleanup patch in #198 (wow), so this needs to set back to needs review once that is commited for the change notices unless someone confirms that they are ok in the meantime.

Berdir’s picture

About the change notices:

> "Which would now be"

Maybe something along the lines of "Instead of a separate variable, the cache backend of the default bin "cache" is now used as the fallback, which can be set like this". (Edit: If we'd rename the default cache bin e.g to "default" instead of "cache", this would make even more sense ;))

The wording can probably be improved/simplified, but I think it deserves a sentence that actually explains why it's like that. Looks confusing otherwise IMHO.

Similar for the actual cache tag change notice, I'm not sure if that's common, but cache tags aren't just a replacement for cache_clear_all(), they're a completely new API. So a short introduction along the lines of "Cache Tags can be used to flag certain cache entries which allows to invalidate them based on the used tags later on. For example, invalidate all cache entries that have been tagged with a certain node id if that node is updated."

Also, again not sure about the procedure, but cache_clear_all() already already been dropped for all uses except the no-arguments case that clears page and block cache. So maybe you should use the new $cache->delete* methods? Also, maybe a better example would actually be how you can use it to clear all caches tagged with a node id instead of using a prefix/wildcard.

Example code for that:

// Previously, use some kind of naming scheme to identify caches for a node.
$cache = cache($bin);
$nid = 1;
$cache->set('cache_nid_' . $nid . '_id_one', $some_value);
$cache->set('cache_nid_' . $nid . '_id_two', $some_value);

$cache->deletePrefix('cache_id_' . $nid);

// New. Does not rely on magical cid's, you can use whatever makes sense and keep them short.
$cache = cache($bin);
$nid = 1;
$cache->set('cache_id_one', $some_value, CACHE_PERMANENT, array('node' => array($nid)));
$cache->set('cache_id_two', $some_value, CACHE_PERMANENT, array('node' => array($nid)));

cache_invalidate(array('node' => array($nid)));

I feel something like this explains the advantage of the tags system better.

tim.plunkett’s picture

Berdir, I was just posting my limited understanding of it after an hour of staring at the patch, please feel free to edit the change notice!

catch’s picture

Title:Cache tag support» Change notice for: Cache tag support
Status:Reviewed & tested by the community» Needs review

Committed/pushed the followup, back to CNR for the change notice.

tim.plunkett’s picture

Status:Needs review» Needs work

Per #203, the change notices need work. They're linked in the bottom of the issue summary.

Berdir’s picture

Status:Needs work» Needs review

Updated the change notices. Should be closer to a real world use case now. Back to needs review.

catch’s picture

Priority:Critical» Major
Status:Needs review» Fixed

Looks good to me, marking this one fixed.

plach’s picture

Title:Change notice for: Cache tag support» Cache tag support

Restoring title

Status:Fixed» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

claudiu.cristea’s picture

Status:Closed (fixed)» Needs work

The only thing not consistent is the "cache_tags" table name which has the same pattern as a regular cache bin table. This is confusing but not critical.

aspilicious’s picture

Priority:Major» Normal
claudiu.cristea’s picture

Renaming to cachetags will fix but, I know, it's ugly :)

pounard’s picture

It's not really ugly, both "cache" and "tags" word and readable and short enough for us to name the table "cachetags" without making it less readable. Or we could just change the cache system to be configured to use only explicitely registered bins by modules, and never allow to register the "tags" bin, this should be the most secure and elegant way to do this, plus it would allow us to get rid of the infamous hook_flush_caches() hook implementation for most modules in favor of something like hook_bin_info().

claudiu.cristea’s picture

+1 for registering bins OR, more simple, redesign the cache table pattern. Something like: cache_bin_*.

For consistency I would rename also the cache table.

Examples:

  • cache_path => cache_bin_path
  • cache_filter => cache_bin_filter
  • cache => cache_bin_drupal <<< switch the main cache table to a consistent pattern
pounard’s picture

Yeah, good idea. For the record, there is actually an issue opened for this, it didn't moved since a long time ago, let me try to find it.

#1167144: Make cache backends responsible for their own storage Found it!

claudiu.cristea’s picture

So, what is your option? Registering bins or change cache tables pattern?

Berdir’s picture

See #216, there is an issue for registering cache bins. I don't think we need to discuss this here. I recommend to close this issue again.

aspilicious’s picture

Status:Needs work» Closed (fixed)

Closing again... Lets meet in the other issue :)

fubhy’s picture

Didn't want to reopen this... When reading through the DatabaseBackend implementation for invalidateTags() I noticed that there may be a major performance issue. I may be wrong though: #1800768: DatabaseBackend tag invalidation performance

sun’s picture

sun’s picture

Issue summary:View changes

Adding followup issue