Blocks #606840: Enable internal page cache by default.

This fixes the 3 failures o/t parent issue in:

  • Drupal\search\Tests\SearchCommentTest

We must ensure that when search pages are cached in the internal page cache (or an external reverse proxy), they have a cache tag that is invalidated whenever the search index is updated. Otherwise, anonymous users will continue to see outdated search results.

Beta phase evaluation

Reference: https://www.drupal.org/core/beta-changes
Issue category Bug because missing cache tags are breaking page caching.
Prioritized changes The main goal of this issue is performance: it blocks #606840: Enable internal page cache by default.
Disruption Very little disruption. A method is added to the Search plugin interface, but it is given a default implementation in the base class.
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Wim Leers’s picture

Assigned: Wim Leers » Unassigned
Issue summary: View changes
Status: Active » Needs review
FileSize
4.57 KB

Search index are per-plugin. So we set and invalidate cache tags on a per-plugin-search-index basis.

Wim Leers’s picture

Issue summary: View changes
Wim Leers’s picture

Assigned: Unassigned » Wim Leers
Status: Needs review » Needs work
Issue tags: +Needs tests
Wim Leers’s picture

Fabianx’s picture

If it was not for the 'Needs tests' part, I would RTBC this now :).

Berdir’s picture

Interesting. Approach looks good to me. Looks like we can't use the entity cache tag, because that is not always available?

https://www.drupal.org/project/search_api will need a similar issue I think, and it only has views integration to display search results. So I guess the query backend will need to add a similar cache tag, probably for the search index and server.

Speaking of that, search.module has views integration too. We should probably make sure that the corresponding cache tags also exist when a view with those filters is used?

Wim Leers’s picture

+++ b/core/modules/search/src/Tests/SearchCommentTest.php
@@ -159,8 +159,8 @@ function testSearchResultsComment() {
     // Invoke search index update.
-    $this->drupalLogout();
     $this->cronRun();
+    $this->drupalLogout();
 

Per #2461087-3: Add 'no_cache' route option to mark a route's responses as uncacheable (was: Cron run response should not be cacheable) & #2461087-4: Add 'no_cache' route option to mark a route's responses as uncacheable (was: Cron run response should not be cacheable), this hunk is unnecessary once #2461087: Add 'no_cache' route option to mark a route's responses as uncacheable (was: Cron run response should not be cacheable) lands. So let's postpone this on that issue.

Wim Leers’s picture

Title: Search reindexing should invalidate cache tags » [PP-1] Search reindexing should invalidate cache tags

Postponing per #7.

Wim Leers’s picture

Issue tags: -Needs tests
FileSize
8.31 KB
3.84 KB

Tests.

Wim Leers’s picture

Title: [PP-1] Search reindexing should invalidate cache tags » Search reindexing should invalidate cache tags
Assigned: Wim Leers » Unassigned
Status: Postponed » Needs review
FileSize
7.71 KB
655 bytes

#2461087: Add 'no_cache' route option to mark a route's responses as uncacheable (was: Cron run response should not be cacheable) landed, this is now unblocked.

Straight reroll, with the hunk mentioned in #7 removed.

Fabianx’s picture

  1. +++ b/core/modules/search/src/Plugin/SearchIndexingInterface.php
    @@ -46,6 +46,9 @@
    +   *
    +   * @return bool
    +   *   TRUE if any updates happened, FALSE otherwise.
    

    This is an API change.

  2. +++ b/core/modules/search/src/Tests/SearchPageCacheTagsTest.php
    @@ -25,11 +26,33 @@ class SearchPageCacheTagsTest extends SearchTestBase {
    +    // Create a node and update the search index.
    +    $this->node = $this->drupalCreateNode(['title' => 'bike shed shop']);
    

    ROTFL :-D

  3. +++ b/core/modules/search/src/Tests/SearchPageCacheTagsTest.php
    @@ -40,26 +63,42 @@ function testSearchText() {
    +    $this->node->title = 'bike shop';
    +    $this->node->save();
    

    Don't we need to run cron first to make this happen?

Leaving at Code needs review.

We need a beta eval and approval of the API / Interface change and a change record for the API change.

Wim Leers’s picture

Issue tags: +API change
  1. True. Added tag.
  2. :)
  3. Don't we need to run cron first to make this happen?

    No, the cache tags ensure that the displayed data is updated immediately.

Berdir’s picture

jhodgdon’s picture

Status: Needs review » Needs work

This patch may work, sort of, for NodeSearch, but it is not really quite right.

+++ b/core/modules/search/src/Controller/SearchController.php
@@ -123,7 +124,7 @@ public function view(Request $request, SearchPageInterface $entity) {
-        'tags' => $entity->getCacheTags(),
+        'tags' => Cache::mergeTags($entity->getCacheTags(), ['search_index:' . $plugin->getPluginId()]),
       ),

This is not necessarily correct. Although NodeSearch uses the plugin ID as $type, that is not necessarily what all other plugins will do. In particular, I am aware of one contrib module that, when ported to 8, will be using a different $type for each *instance* of the plugin, not one shared by this plugin type. So this line needs to change to ask the plugin for what $type it is using.

The other thing is that NodeSearch and potentially other plugins are also removing data from the search index in other places. You've got an invalidate in search_index_clear, but ..

Actually, wouldn't it be simpler to just put the invalidate in all of the functions in search.module that are adding or removing from the index? This would be a *lot* easier. These functions are:
search_index()
search_index_clear()

These are the only two ways that plugins using the index are supposed to be adding/removing data from the search index, so if you invalidate there, you'll be gold. Then you don't have to worry about changing the API for the search plugins.

Fabianx’s picture

I agree with #14, thanks for chiming in.

jhodgdon’s picture

As one of the maintainers of the D8 search module, it's my job to chime in. ;)

jhodgdon’s picture

As one other note, not all Search plugins even use the index. Take a look at UserSearch as one example that doesn't.

So in SearchController, you need to ask the active search plugin instance which $type, if any, it's using in the search index, and invalidate that. I suggest adding a method to SearchInterface to get this information, implementing it with a '' or NULL return in the base class SearchPluginBase, and then overriding this in NodeSearch to return 'node_search'.

If you'd like me to take this over and make a patch, let me know...

jhodgdon’s picture

Hm. So, in the case of a Node search, the following things would change which nodes are displayed as search results on a given results page:
- Updates to any node -- this issue should catch these, at least after cron runs, because they'll result in node reindexing. Language, published/unpublished status, taxonomy terms, and author directly go into the query in some cases.
- Any update to a node that affects node access, or turning on/off a node access module.
- Updates to factors that can affect search rankings. In Core, these are (in addition to editing the node, which will eventually filter into an update to the search index): number of comments, sticky flag, promoted flag, and number of times the node has been viewed if the Statistics module is turned on.

And then the page output should also depend on the display of the specific nodes that are in the search results.

User search is much simpler. It just matches on user name (and email address if an admin is searching), and the query also filters out blocked users (unless an admin is searching). So basically if any user account is edited, all user search pages should probably be invalidated.

Is that being taken care of elsewhere?

Wim Leers’s picture

Status: Needs work » Needs review
Issue tags: -API change
FileSize
7.44 KB
2.56 KB
2.71 KB

#14: thanks for the review! Super helpful pointers :)

Did this. But while doing this, I noticed that search_index actually calls search_index_clear()… so really, we only need tag invalidation in search_index_clear!

This also means no API change is needed anymore :)

See interdiff-14.txt.


#17: Done.

See interdiff-17.txt.


#18:

Updates to any node -- this issue should catch these, at least after cron runs, […]

Indeed, the patch already handles that.

Any update to a node that affects node access, or turning on/off a node access module.

Installing/uninstalling modules clears the render cache.

Updates to nodes affecting node access: see rendered entity cache tags below.

Updates to factors that can affect search rankings

#2241249: First step in making search results pages cacheable: add the associated SearchPage's cache tag took care of that already, and that is exactly what the existing SearchPageCacheTagsTest test is verifying to be working.
(Convince yourself by adding debug($tags) in Cache::invalidateTags(), modify the factors, save.)

And then the page output should also depend on the display of the specific nodes that are in the search results.

See rendered entity cache tags below.

User search is much simpler. It just matches on user name (and email address if an admin is searching), and the query also filters out blocked users (unless an admin is searching). So basically if any user account is edited, all user search pages should probably be invalidated.

Is that being taken care of elsewhere?

See rendered entity cache tags below.


Regarding rendered node cache tags: for search modules + cache tags, we've only had:

  1. #2241249: First step in making search results pages cacheable: add the associated SearchPage's cache tag, which took care of the search_page config entities (which contain the ranking factors)
  2. this issue, which is taking care of search index-dependent cache tags (to invalidate content when the search index is updated)

You've pointed out:

  1. modifications of node properties (which may affect node access, and thus the listed nodes),
  2. the node "search result" entity display
  3. modification of user properties (which may affect the listed users)

This is neither about the ranking factors, nor about the index, but about the representation of what is listed. That's clearly a different scope than this issue — this issue is specifically about re-indexing. So, yes, let's take care of that elsewhere: the solution for that is independent of what we do here. No need to delay this issue (and the issues this issue blocks) for that.

I opened #2464409: Search results should bubble rendered entity cache tags and set list cache tags.

To take care of it:

  • Points 1 and 3 can be addressed by setting the node_list and user_list cache tags, respectively.
  • Point 2 can be addressed by ceasing the early rendering that NodeSearch::prepareResults() does; rendering should actually happen only when the entire page is being rendered. If that turns out to be tricky in Search module's architecture, then we just need to be able to set the cache tags we need to be associated on individual search results, so that when the search-result.html.twig template is rendered, they are bubbled.

Status: Needs review » Needs work

The last submitted patch, 19: search_index_cache_tags-2460911-19.patch, failed testing.

Wim Leers’s picture

Status: Needs work » Needs review

Testbot failure, re-tested.

Fabianx’s picture

+++ b/core/modules/search/src/Controller/SearchController.php
@@ -123,7 +124,7 @@ public function view(Request $request, SearchPageInterface $entity) {
-        'tags' => $entity->getCacheTags(),
+        'tags' => Cache::mergeTags($entity->getCacheTags(), ['search_index:' . $plugin->getPluginId()]),

$plugin->getType();

and should check for NULL :)

+++ b/core/modules/search/search.module
@@ -152,6 +153,11 @@ function search_index_clear($type = NULL, $sid = NULL, $langcode = NULL) {
+  if ($type !== NULL) {
+    // Invalidate all render cache items that contain data from this index.
+    Cache::invalidateTags(['search_index:' . $type]);
+  }

Is this the same $type as $plugin->getType()?

This confuses me ...

Wim Leers’s picture

FileSize
8.61 KB
3.04 KB

#23:

  1. D'oh, forgot to update that one. Thanks! Fixed. This also means I had to update the test, which is great, because now it actually conforms to @jhodgdon's remarks in #17 :)
  2. Yes, it is. (NodeSearch calls search_index() with $type = $this->getPluginId(), which is passed on to search_index_clear(). It'd be better if NodeSearch would use $this->getType() instead of $this->getPluginId() in the 3 places where we're actually passing a type, but I don't want to do that here to keep changes minimal, and because then the comments in those places don't make sense anymore. See NodeSearch::(indexNode|indexClear|markForReindex).)
jhodgdon’s picture

This all looks correct to me now. Great work!

A few minor things I think should be addressed:

  1. +++ b/core/modules/search/search.module
    @@ -152,6 +153,11 @@ function search_index_clear($type = NULL, $sid = NULL, $langcode = NULL) {
    +  if ($type !== NULL) {
    

    This can just be if($type) I think?

    Also if $type is actually NULL, then I think we would need to invalidate *all* cached search pages with any $type?

  2. +++ b/core/modules/search/src/Controller/SearchController.php
    @@ -127,6 +128,13 @@ public function view(Request $request, SearchPageInterface $entity) {
    +    if ($plugin->getType() !== NULL) {
    

    Again, why !== NULL, why not just if($plugin->getType()) ?

  3. +++ b/core/modules/search/src/Plugin/SearchInterface.php
    @@ -65,6 +65,17 @@ public function getAttributes();
       /**
    +   * Returns the used search index type.
    +   *
    +   * @return string|null
    +   *   The plugin ID or other machine-readable type for the search index this
    +   *   search uses. NULL if no search index is used.
    

    Perhaps this is better as:

    Returns the search index type this plugin uses.

    and for the @return docs:

    The type used by this search plugin in the search index, or NULL if this plugin does not use the search index.

  4. +++ b/core/modules/search/src/Plugin/SearchInterface.php
    @@ -65,6 +65,17 @@ public function getAttributes();
    +   * @see search_index_clear()
    +   */
    

    Probably better (or in addition) to have @see to search_index() rather than search_index_clear() ?

  5. +++ b/core/modules/simpletest/src/WebTestBase.php
    @@ -2762,4 +2762,15 @@ protected function assertCacheTag($expected_cache_tag) {
    +   * Asserts whether an expected cache tag was absent in the last response.
    +   *
    +   * @param string $expected_cache_tag
    +   *   The expected cache tag.
    +   */
    +  protected function assertNoCacheTag($expected_cache_tag) {
    +    $cache_tags = explode(' ', $this->drupalGetHeader('X-Drupal-Cache-Tags'));
    

    Saying something in a "No" assertion is "expected" is a bit odd. ;)

    How about making the first line:

    Asserts that a cache tag is absent from the current page.

    And then change $expected_cache_tag to just $cache_tag

    and in the docs for the param: The cache tag to check.

  6. Maybe add a code comment in search_index() right where it calls search_index_clear() to note that this operation clears the cache tags?
Wim Leers’s picture

FileSize
9.29 KB
5.88 KB

Thanks for the review! :)

  1. I prefer strictness/exactness. But, you're the maintainer, so it's your call. Done.
    Also added a generic search_index cache tag, and expanded the test coverage.
  2. See 1.
  3. Done.
  4. Done.
  5. I kept the docs of the first line, so that it's still analogous with WebTestBase::assertCacheTag(). All your other requested changes have been applied.
  6. Done.
jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

Looks great to me now! Assuming test bot agrees, I think this is ready to go. Thanks!

jhodgdon’s picture

Needs beta evaluation added to summary though...

Wim Leers’s picture

Issue summary: View changes

Beta evaluation added! :)

jhodgdon’s picture

Issue summary: View changes
Wim Leers’s picture

#30: oh, right, thanks for that correction!

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Looks good, thanks!

Committed and pushed to 8.0.x.

  • webchick committed 8fcce5b on
    Issue #2460911 by Wim Leers, jhodgdon: Search reindexing should...
jhodgdon’s picture

I had a moment of doubt about this, but just verified it is working correctly. Phew!

What I wanted to verify was that if I added my own Node search page at admin/configure/search/pages, that the cache tags are correct. And they are: it still has search_index:node_search in there, plus a config item for the particular machine name of the search page config. User searches are also working well. So, all good!

But I'm wondering if we should put this in the tests too? The reason I am wondering about this is that the default config entity for NodeSearch has machine name 'node_search', which happens to match the plugin type ID for the NodeSearch plugin. So, my worry that made me take a look at this was that in place of using the plugin type ID string 'node_search' in the search_index cache tag, we could have been mistakenly using the individual config entity's machine name (which is the same)... Should we do something about this in the tests or not worry?

Wim Leers’s picture

Hurray for the automatically generated config entity cache tags! Thanks to that, it's actually difficult to cause the problem you are describing :)

But, yes, let's add test coverage for that. I didn't even know you can configure additional NodeSearch pages!

If you can file a follow-up, I'll roll a patch.

jhodgdon’s picture

That sounds like a good division of labor to me! I'll also (of course) volunteer to review the patch. :) Filed:
#2465111: Add tests for caching on user-added Search page

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.