Problem/Motivation

See issue title. borisson_ should know more context I'd assume.

Estimated Value and Story Points

This issue was identified as a Beta Blocker for Drupal 8. We sat down and figured out the value proposition and amount of work (story points) for this issue.

Value and Story points are in the scale of fibonacci. Our minimum is 1, our maximum is 21. The higher, the more value or work a certain issue has.

Value : 8
Story Points: 13

Proposed resolution

Remaining tasks

  1. Test coverage for \Drupal\search_api\Plugin\views\query\SearchApiQuery, see #3
  2. Update all other views handlers to remove the use of TodoCacheableDependencyTrait and provide the correct cacheability metadata instead, plus accompanying test coverage, see #5
  3. Fix \Drupal\search_api\Plugin\views\cache\SearchApiCache::generateResultsKey(), see #6

User interface changes

None.

API changes

TBD, see #3.

Data model changes

None.

Comments

Nick_vh created an issue. See original summary.

Wim Leers’s picture

Assigned: Unassigned » Wim Leers
Issue tags: +D8 cacheability
Wim Leers’s picture

Status: Active » Needs review
Issue tags: +Needs tests
FileSize
3.13 KB

This updates the Views query plugin to provide the necessary cacheability metadata.

Wim Leers’s picture

Oops, some cruft in #3. Removed.

Wim Leers’s picture

It's not at all clear to me how the Views argument plugins that Search API provides work. Where does it get its data? So, I'm leaving that to the experts to figure out.

For now, what matters most, is correctness, not performance. So, at this time, it makes sense to have every Views handler that hasn't yet been investigated in detail (which is all of them except the query handler, see #3) to have max-age = 0, tags = [], contexts = [].
That's why I added TodoCacheableDependencyTrait, which sets exactly that cacheability metadata. I updated every Views handler that Search API provides to use that trait — with the exception of the query handler.
The trait's documents have written documentation of what I told @borisson_ in person. To be able to be absolutely certain that everything functions correctly, every single plugin needs its own test coverage that proves the cacheability metadata is correct/complete:

/**
 * Trait to implement CacheableDependencyInterface for objects whose
 * cacheability still needs to be:
 * - investigated
 * - optimized
 * - tested
 *
 * For test coverage, test the class/object X (e.g. a Views handler) in
 * isolation and apply a test procedure like this:
 * 1. use X
 * 2. verify this was a cache miss (if max-age = 0, it ends here)
 * 3. use X again
 * 4. verify this was a cache hit, with the expected cacheability metadata
 *    (cache tags, contexts, max-age)
 * 5. change the input data for X: config objects/entities/… (cache tags), or
 *    the logged in user or permissions (cache contexts) — i.e. this is "input"
 *    in the sense "output = function(input)", i.e. anything that causes the
 *    output to change
 * 6. use X again
 * 7. verify this was a cache miss (if not, then the cacheability metadata is
 *    incomplete inaccurate)
 * 8. repeat steps 5, 6 and 7 for each input
 *
 * @see \Drupal\Core\Cache\CacheableDependencyInterface
 */
Wim Leers’s picture

I also noticed that \Drupal\search_api\Plugin\views\cache\SearchApiCache::generateResultsKey() is code very similar to what \Drupal\views\Plugin\views\cache\CachePluginBase::generateResultsKey() used to use:

      $key_data = array(
        'query' => $query,
        'roles' => $user->getRoles(),
        'super-user' => $user->id() == 1, // special caching for super user.
        'langcode' => \Drupal::languageManager()->getCurrentLanguage()->getId(),
        'base_url' => $GLOBALS['base_url'],
      );
      …

      $this->resultsKey = $this->view->storage->id() . ':' . $this->displayHandler->display['id'] . ':results:' . hash('sha256', serialize($key_data));

This essentially is hardcoding a number of cache contexts: url.query_args, user.roles, user.is_super_user, languages:language_interface and url.site.

This is hugely problematic:

  1. It causes too many variations in many cases. Consequence: poor cache hit ratios.
  2. It causes too few variations in some cases. Consequence: information disclosure security bugs.

It's very important that this consumes the cache contexts of the handlers for the view that is being executed. Not some hardcoded set of cache contexts.

So, I added a @todo for that also.

Wim Leers’s picture

Issue summary: View changes

Updated the IS, now lists remaining tasks for this patch.

borisson_’s picture

We should fix .1 and .3 from the IS in this issue and create a follow-up for .2.

drunken monkey’s picture

Thanks a lot for the review and feedback!

I have to admit I don't really understand the reasoning for max-age=0 on search results. As far as I understand it, we don't have to "guarantee that the search results are in sync with the data managed by Drupal" – just that executing the search anew would yield the same results as the ones that are cached. To that end, I would just have added an index-specific cache tag and invalidated that tag every time content is indexed or deleted on the index. (We could probably even go more finely-grained, but I think it's complicated enough as it is. And in any case, I completely agree that we first have to make it correct, and only then can/should make it faster.) I don't understand why it would matter that it's an "external datasource".

Of course, it's something different if you use, e.g., a Solr server and index/manage data on it with some other application, not the same Drupal site. Then there would of course be no way to manage a cache for that server from inside Drupal – unless with specific support from the backend, if it can determine when the last data change occurred. Is that maybe what you mean?
Such a thing can't really happen for a database server, I'd say, or at least I don't think it's a use case we want to actively support.

And, one further complication with Solr is the delayed "commit" when indexing or deleting data – even if it's only half a second (and it seems we currently even default to 10 seconds!) for newer Solr versions, it might still lead to a stale search result being cached. So without some pretty complex mechanism, it would seem on the contrary that we can't actually ever cache results from a Solr server, while caching results from a database server wouldn't be a problem.

But, of course, I could easily be mistaken there.

Also, from a purely code-review perspective: wouldn't it be better to create a CacheableMetadata object once and then use it for each of the three getCache*() methods in the query plugin? That would probably also make the feature unnecessary – just implement the interface in your backend. And we should then also override the three getCache*() methods in our server entity class to include the information from the backend plugin.

In any case, I think we should move the CacheableDependencyInterface implementation directly to our query class – it's (largely) not specific to Views searches, the same principles apply to any search. The Views query plugin could then just use the information from the search query object, and possibly add its own.

drunken monkey’s picture

Also: What do you mean with "test the class/object X in isolation"? A view with only that one handler, or really just the object itself?
In the latter case, I wouldn't really know how to test whether it's a cache hit or miss, or how to test in general, really.
In the former case, I also wouldn't really know how to determine whether a page request is a cache hit or a miss. Is that in the HTTP metadata?

Is there maybe an example to look at in Views itself, or somewhere else in Core?

drunken monkey’s picture

Also, I agree with borisson_: 1. and 3. have to be fixed here, 2. can use a follow-up issue (or even one for each plugin). Depending on how likely it will be that search results end up being cacheable at all, I guess this could be pretty low priority, though. If the view is uncacheable anyways, it doesn't seem important to have proper cache metadata for its individual components.

borisson_’s picture

@drunken monkey, testing for hits/misses is fairly easy; it's embedded in the page headers.

$this->assertHeader('x-drupal-cache', 'HIT');

Wim Leers’s picture

Thanks a lot for the review and feedback!

Thank Acquia :)

To that end, I would just have added an index-specific cache tag and invalidated that tag every time content is indexed or deleted on the index.

The problem is that we don't know when the content is indexed. It happens in an external process, quite possibly even a separate server. Solr doesn't and cannot inform Drupal when its index has changed. Furthermore, there's also the problem of master-slave replication: given 4 servers (1 master, 3 slaves), the replication happens every 2 minutes by default, which means that during 2 minutes, you have on average a 25% chance to get up-to-date search results, and 75% chance to get outdated search results.

Furthermore, a single cache tag for the entire index would also mean an awful cache hit ratio: it'd mean invalidating every render cached thing that contains data originating from Search API.

Finally, I was told that quite often, search results consume data from outside of Drupal — for which we cannot possibly know when it changes.

Of course, what is possible, is to set Drupal's "page cache max-age" setting at /admin/config/development/performance to 1, 2, 5, 60, however many minutes. Which means Drupal will tell the browser (and proxies, including the built-in Page Cache) to cache the HTML pages containing search results (or anything else).


Of course, it's something different if you use, e.g., a Solr server and index/manage data on it with some other application, not the same Drupal site. Then there would of course be no way to manage a cache for that server from inside Drupal – unless with specific support from the backend, if it can determine when the last data change occurred. Is that maybe what you mean?
Such a thing can't really happen for a database server, I'd say, or at least I don't think it's a use case we want to actively support.

@Nick_vh tells me using Solr is the most common case. I personally have zero experience with Search API. It sounds like you're saying that using the Database back-end is actually more common?

With the Database back-end, a more optimal caching solution is definitely possible, if and only if the indexing happens entirely in Drupal.

it would seem on the contrary that we can't actually ever cache results from a Solr server, while caching results from a database server wouldn't be a problem.

Correct.

In any case, I think we should move the CacheableDependencyInterface implementation directly to our query class

Sounds sensible :) Like I said, this is just a rough proposed potential implementation. Merely a suggestion. I leave it to you, the Search API maintainers, to make a decision, you know the architecture and constraints infinitely better than I do.


All that being said, and assuming we could indeed cache the search results when using the Database back-end, a very important question remains: what is the gain? How often do people perform the exact same queries? Some queries will likely be popular, and the vast majority will only be executed a few times per 24 hours. So, the cache hit ratio will likely be abominable. Which means we'll be caching a lot of things that will rarely, if ever, be retrieved. Which means we're wasting valuable storage space. On top of that, because for every content change requires things to be re-indexed, further reducing the longevity of the data in the cache.

Wim Leers’s picture

Also: What do you mean with "test the class/object X in isolation"? A view with only that one handler, or really just the object itself?

A view with only that one handler.

Examples: \Drupal\system\Tests\Entity\EntityCacheTagsTestBase and all of its subclasses, e.g. \Drupal\taxonomy\Tests\TermCacheTagsTest, \Drupal\system\Tests\Cache\PageCacheTagsTestBase and all of its subclasses, e.g. \Drupal\menu_ui\Tests\MenuCacheTagsTest.

i.e.: test one Views handler's cacheability at a time, not multiple simultaneously.

drunken monkey’s picture

I can thank both you and Acquia!

And OK, we really seem to have been talking about different scenarios. The Database backend is about 2.5 times as popular as the Solr backend, and even the Solr users will in over 90% of the cases (I'd estimate) use just a single server and index items only through Drupal. Even though you wouldn't believe it listening to Nick. ;P

Anyways, your other points are of course still valid: ultimately, caching searches will in almost all cases do more damage than good, so we really shouldn't fret over this too long. Just make sure we don't cache things accidentally (like we are currently doing), maybe add some API for adding caching information to the query class (so we can easily improve caching later, if we're sure we want to, without changing the API) and that's it.

Insofar, we probably don't even need those follow-ups for 2. – just ignore this for now, and create issues if we are sure we want to actually use caching. (Also, there's already the Views cache plugin, which I guess will still work?)

Oh, and in this case, do we even need the TodoCacheableDependencyTrait? Where is the cache information of Views handlers actually collected? If we always just return 0 for the max-age in the query plugin, are these still needed?

Oh, also, if we always set max-age to 0, do we also need to collect the cache contexts/tags? Or can this then go, too, for now? (Then we could just use the trait for the query plugin, too.)

Wim Leers’s picture

Sorry, I misinterpreted what Nick said. My bad!

Just make sure we don't cache things accidentally (like we are currently doing), […]

Exactly.

[…] maybe add some API for adding caching information to the query class (so we can easily improve caching later, if we're sure we want to, without changing the API) and that's it.

Indeed.

Oh, and in this case, do we even need the TodoCacheableDependencyTrait? Where is the cache information of Views handlers actually collected? If we always just return 0 for the max-age in the query plugin, are these still needed?

It's better/safer to be explicit. If every single Views handler that Search API provides explicitly says "nope, I'm not cacheable", then there will be no doubt.
But in theory, you're right: it's not necessary to set on all of these things. But, in order to set it on the minimum number of Views handlers necessary, you'd need to very thoroughly understand how Views works. And since Views can also do results caching independently of render caching, that can be quite complex. This approach I proposed is practical: it leaves no room for doubt, and no chance for anything to be cached. Which is also helpful for anybody diving into the code: their expectations will be immediately correct upon reading the code.
I think renaming TodoCacheableDependencyTrait to UncacheableDependencyTrait could make sense in that case. In fact, I opened an issue to add such a trait to Drupal 8 core: #2653346: Add UncacheableDependencyTrait: the opposite of UnchangingCacheableDependencyTrait.

Oh, also, if we always set max-age to 0, do we also need to collect the cache contexts/tags? Or can this then go, too, for now? (Then we could just use the trait for the query plugin, too.)

You're right, if we set max-age to zero, then none of the other things really matter… except in the case of the Views query handler, where we'd be returning zero ONLY if the search back-end doesn't support caching. Note that all other Views handlers use the trait I added, and those indeed return max-age zero and *no* cache tags, *no* cache contexts. So, that's already doing what you're saying.

borisson_’s picture

I think renaming TodoCacheableDependencyTrait to UncacheableDependencyTrait could make sense in that case. In fact, I opened an issue to add such a trait to Drupal 8 core: #2653346: Add UncacheableDependencyTrait: the opposite of UnchangingCacheableDependencyTrait.

I agree, this name makes more sense, I've renamed it in our codebase as well, this means that we can commit as-is and just remove our own implementation and use the core one when the core issue lands without changes in the trait name as well.

borisson_’s picture

This adds some basic tests. The test doesn't specifically test for the query's cache yet though, we should have a test for that. What is the best way to do this?

Status: Needs review » Needs work

The last submitted patch, 18: make_sure_that_cache-2624472-18.patch, failed testing.

The last submitted patch, 18: make_sure_that_cache-2624472-18.patch, failed testing.

Wim Leers’s picture

  1. +++ b/src/Plugin/views/cache/SearchApiCache.php
    @@ -92,28 +91,20 @@ class SearchApiCache extends Time {
    +        ->convertTokensToKeys($this->displayHandler
    +          ->getCacheMetadata()
    +          ->getCacheContexts())
    +        ->getKeys();
    

    Strange indentation.

  2. +++ b/src/Tests/CacheabilityTest.php
    @@ -0,0 +1,95 @@
    + * Tests the cacheability settings of Search API.
    

    s/settings/metadata/

  3. +++ b/src/Tests/CacheabilityTest.php
    @@ -0,0 +1,95 @@
    +  public static $modules = array(
    

    s/array()/[]/

  4. +++ b/src/Tests/CacheabilityTest.php
    @@ -0,0 +1,95 @@
    +    // Setup example structure and content and populate the test index with that
    

    s/Setup/Set up/

  5. +++ b/src/Tests/CacheabilityTest.php
    @@ -0,0 +1,95 @@
    +    \Drupal::getContainer()
    

    Isn't $this->container() available?

  6. +++ b/src/Tests/CacheabilityTest.php
    @@ -0,0 +1,95 @@
    +    $expected_tags = $index_cachetag . ' ' . $server_cachetag . ' ' . $view_cachetag . ' rendered';
    ...
    +    $this->assertHeader('x-drupal-cache-tags', $expected_tags);
    

    Rather than doing this, use:

    $this->assertCacheTag($index_cachetag);
    $this->assertCacheTag(…);
    
  7. +++ b/src/Tests/CacheabilityTest.php
    @@ -0,0 +1,95 @@
    +    $this->assertHeader('cache-control', 'must-revalidate, no-cache, post-check=0, pre-check=0, private');
    

    We removed post-check and pre-check in Drupal 8.1.x. This test will thus fail against 8.1. That's also why you see the test fail.

    I'd verify no-cache is present. But, better yet, don't we have some helper for this already?

borisson_’s picture

Thanks for your review Wim. I fixed most of the issues you had with the patch.

  1. Fixed.
  2. Fixed.
  3. We decided to use the long array notation for Search API.
  4. Fixed.
  5. It isn't, no.
  6. I didn't know about that one, that's awesome. Fixed.
  7. If there is a helper, I don't know about it. I did change it to $this->assertTrue(strpos($this->drupalGetHeader('cache-control'), 'no-cache'));
borisson_’s picture

Status: Needs work » Needs review

Back to NR for the bot. Run them tests, testbot.

Wim Leers’s picture

Assigned: Wim Leers » Unassigned
Issue tags: -Needs tests
+++ b/src/UncacheableDependencyTrait.php
@@ -0,0 +1,58 @@
+/**
+ * Trait to implement CacheableDependencyInterface for objects whose
+ * cacheability still needs to be:
+ * - investigated
+ * - optimized
+ * - tested
+ *
+ * For test coverage, test the class/object X (e.g. a Views handler) in
+ * isolation and apply a test procedure like this:
+ * 1. use X
+ * 2. verify this was a cache miss (if max-age = 0, it ends here)
+ * 3. use X again
+ * 4. verify this was a cache hit, with the expected cacheability metadata
+ *    (cache tags, contexts, max-age)
+ * 5. change the input data for X: config objects/entities/… (cache tags), or
+ *    the logged in user or permissions (cache contexts) — i.e. this is "input"
+ *    in the sense "output = function(input)", i.e. anything that causes the
+ *    output to change
+ * 6. use X again
+ * 7. verify this was a cache miss (if not, then the cacheability metadata is
+ *    incomplete inaccurate)
+ * 8. repeat steps 5, 6 and 7 for each input
+ *
+ * @see \Drupal\Core\Cache\CacheableDependencyInterface
+ */
+trait UncacheableDependencyTrait {

So, if you want to keep this comment, then it makes sense to keep this trait.

If you don't want to keep this comment per se (which I think you don't, given the discussion on this issue, and given the fact that this issue can easily be traced back in the future), then I think it makes sense to add a @todo to remove this as soon as Drupal 8 core ships with this same trait.

drunken monkey’s picture

Thanks for pointing that out! The attached patch corrects that comment, and the others in this class.

Also, I think we should just let the query plugin implement the UncacheableDependencyTrait, too – if we use a feature anyways, we can always introduce it later, even after a stable release. And I think we all agreed here that, in most cases, it will make no sense to cache search results anyways. And I feel that implementing a way to cache this after all wouldn't be worth putting off the beta (or stable) release.

If I misunderstood something and this code needed in any case, I apologize.
Or maybe we should only override getCacheMaxAge(), so that the cache tags and contexts are inherited from the plugin base class?

Also, would you say this can now be committed like this, or should it have more tests (or is something else still needed)?

drunken monkey’s picture

Hm, probably we should add tests for the Views cache plugin, right? Since this was broken before.
But maybe in a follow-up.

borisson_’s picture

Status: Needs review » Reviewed & tested by the community

We can commit this as is. I agree that we can always introduce the cacheable backend feature later.

Wim Leers’s picture

Status: Reviewed & tested by the community » Needs work

+1

Just one thing:

+++ b/src/UncacheableDependencyTrait.php
@@ -2,54 +2,61 @@
+   * The cache contexts associated with this object.
+   *
+   * These identify a specific variation/representation of the object.
+   *
+   * Cache contexts are tokens: placeholders that are converted to cache keys by
+   * the @cache_contexts_manager service. The replacement value depends on the
+   * request context (the current URL, language, and so on). They're converted
+   * before storing an object in cache.
+   *
+   * @return string[]
+   *   An array of cache context tokens, used to generate a cache ID.
+   *
+   * @see \Drupal\Core\Cache\Context\CacheContextsManager::convertTokensToKeys()
+   * @see \Drupal\Core\Cache\CacheableDependencyInterface::getCacheContexts()
...
+   * The cache tags associated with this object.
+   *
+   * When this object is modified, these cache tags will be invalidated.
+   *
+   * @return string[]
+   *  A set of cache tags.
+   *
+   * @see \Drupal\Core\Cache\CacheableDependencyInterface::getCacheTags()
...
+   * The maximum age for which this object may be cached.
+   *
+   * @return int
+   *   The maximum time in seconds that this object may be cached.
+   *
+   * @see \Drupal\Core\Cache\CacheableDependencyInterface::getCacheMaxAge()

The docs are just the ones on the interface. IMO it's confusing/bad to copy their docblocks. Because if those docblocks are amended, these will still remain the same.

I do realize that {@inheritdoc} doesn't work correctly here. But then use @see \Drupal\Core\Cache\CacheableDependencyInterface::METHOD() instead.

drunken monkey’s picture

Status: Needs work » Fixed

OK, good to hear!

@ Wim: I don't think there's any standard for that, but in this module I've now always copied the interface doc blocks and additionally added @see tags. Of course, none of this is really a perfect (or even good) solution, but I think this is most in line with our current standards and, as said, at least it's consistent within this module.
If there is a new standard at some point, or someone convinces me a different solution is significantly better, I can change all traits at once.

So, anyways: committed.
Thanks again for all your help here, Wim, and thanks for your work, Joris and Nick!

Wim Leers’s picture

Consistent within module == good :)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

larowlan’s picture

Also, I agree with borisson_: 1. and 3. have to be fixed here, 2. can use a follow-up issue (or even one for each plugin).

Did that follow-up ever get created? I'm noticing search pages built with views don't refresh on content edit - i.e. stale results.

pfrenssen’s picture

@larowlan, please check #2753667: Improve Views cache plugin(s) and their cache metadata for the followup of this issue.

It is also possible you are affected by #2839058: ViewPageController doesn't provide complete cacheability metadata.