I just noticed that the tag invalidation logic for the DatabaseBackend is slightly flawed (if I am not reading it incorrectly). Currently, we invoke ->invalidateTags() from common.inc:

/**
 * Invalidates the items associated with given list of tags.
 *
 * Many sites have more than one active cache backend, and each backend my use
 * a different strategy for storing tags against cache items, and invalidating
 * cache items associated with a given tag.
 *
 * When invalidating a given list of tags, we iterate over each cache backend,
 * and call invalidate on each.
 *
 * @param array $tags
 *   The list of tags to invalidate cache items for.
 */
function cache_invalidate(array $tags) {
  foreach (CacheFactory::getBackends() as $bin => $class) {
    cache($bin)->invalidateTags($tags);
  }
}

That means that for every cache bin we execute that method once. So if I have 10 bins that use the DatabaseBackend (and we already got 10+ in core) we run this 10 times:

  /**
   * Implements Drupal\Core\Cache\CacheBackendInterface::invalidateTags().
   */
  public function invalidateTags(array $tags) {
    foreach ($this->flattenTags($tags) as $tag) {
      unset(self::$tagCache[$tag]);
      Database::getConnection()->merge('cache_tags')
        ->key(array('tag' => $tag))
        ->fields(array('invalidations' => 1))
        ->expression('invalidations', 'invalidations + 1')
        ->execute();
    }
  }

So, the problem with this is that ->flattenTags() may return an array with quite a few elements depending on the amount of tags and the number of key/value pairs in those tags.

Let's assume the most simple case: We want to clear 10 tags (no key/value pairs... just 10 simple tags). That means that ->flattenTags would return an array of 10 items. As previously mentioned we got about 10 bins, so that means that we run ->merge on the database 100 times (and since it's merge that means we basically run 200 queries).

I feel that I am missing something here or mis-read the code... Or simply didn't understand it fully. Otherwise it's a major issue :P. So... I apologize if this issue is nonsense!

Comments

lars toomre’s picture

If a patch gets rolled for this issue, can we include a correction for 'backend my' to 'backend may' in the docblock for cache_invalidate()?

fubhy’s picture

Note: Currently, we don't register cache bins. But there is an issue for that already: #1167144: Make cache backends responsible for their own storage. So with the current implementation in core we don't (yet) see the aforementioned issue because we are not looping over all existing bin's yet (we don't even know that they exist).

So IMO ->invalidateTags HAS to run per bin (same with ->garbageCollection())! Currently we don't have a list of all available bins but at the moment that we register bins ans loop over all of them (which we should) this issue IS going to arise.

re #2: Haha, I just saw that too :)

pounard’s picture

Following this issue, will give a stronger look after I have done with some other :)

catch’s picture

Priority: Normal » Major

I think this is completely valid. There's a static property for tags that have been checked by the database backend, specifically so that those queries aren't duplicated across different cache bins, but there's no equivalent check for invalidation. Bumping to major since this could get pretty bad.

catch’s picture

Priority: Major » Normal

Well, that's not entirely true. If you explicitly register the database backend (or any other one) with individual bins, then you'll run into it, but with default configuration there's only one registration so it shouldn't matter.

fubhy’s picture

Yep, that's what I meant in #2. However, that was exactly what I was trying to do when I ran into this. And we probably want to have that... So once we register cache bins we need to fix this issue.

catch’s picture

We register cache bins already via hook_flush_caches(), it's just that this hook is abused for flushing caches and random other stuff too. Even with hook_cache_bin_info() I still think we'll need to rely on the $conf configuration for flushing tags - since it's only this that tells you which backends are in use.

fubhy’s picture

Let's keep that discussion in #1167144: Make cache backends responsible for their own storage and then come back here once we are finished there :)

olli’s picture

So once we register cache bins we need to fix this issue.

I think we are doing that now.

danblack’s picture

Issue tags: +database cache
berdir’s picture

Status: Active » Closed (duplicate)

We changed it a while ago to have a static cache so that it only runs once (I don't think we can do anything about the merge query), and #918538: Decouple cache tags from cache bins is going to improve it further.