Search reindexing should invalidate cache tags [#2460911]

Reference: https://www.drupal.org/core/beta-changes
Issue category	Bug because missing cache tags are breaking page caching.
Prioritized changes	The main goal of this issue is performance: it blocks #606840: Enable internal page cache by default.
Disruption	Very little disruption. A method is added to the Search plugin interface, but it is given a default implementation in the base class.

Comment	File	Size	Author
#26	interdiff.txt	5.88 KB	wim leers
#26	search_index_cache_tags-2460911-26.patch	9.29 KB	wim leers
#24	interdiff.txt	3.04 KB	wim leers
#24	search_index_cache_tags-2460911-24.patch	8.61 KB	wim leers
#19	interdiff-17.txt	2.71 KB	wim leers
#19	interdiff-14.txt	2.56 KB	wim leers
#19	search_index_cache_tags-2460911-19.patch	7.44 KB	wim leers
#10	interdiff.txt	655 bytes	wim leers
#10	search_index_cache_tags-2460911-10.patch	7.71 KB	wim leers
#9	interdiff.txt	3.84 KB	wim leers
#9	search_index_cache_tags-2460911-9.patch	8.31 KB	wim leers
#1	search_index_cache_tags-2460911-1.patch	4.57 KB	wim leers

Comment #1

Ghent 🇧🇪🇪🇺

commented 27 March 2015 at 15:34

Assigned:	wim leers	» Unassigned
Issue summary:	View changes
Status:	Active	» Needs review

Status	File	Size
new	search_index_cache_tags-2460911-1.patch	4.57 KB

Search indexes are per-plugin, so we set and invalidate cache tags on a per-plugin-search-index basis.
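The per-plugin idea can be sketched in plain PHP. This is only a minimal stand-in, not Drupal's cache API: the `TagAwareCache` class, its methods, the cache IDs and the `other_search` plugin ID are all hypothetical. Only the `search_index:<type>` tag format comes from the patch under discussion.

```php
<?php
// Hypothetical stand-in for a tag-aware cache; NOT Drupal's real cache API.
final class TagAwareCache {
  /** @var array<string, array{data: string, tags: string[]}> */
  private array $items = [];

  public function set(string $cid, string $data, array $tags): void {
    $this->items[$cid] = ['data' => $data, 'tags' => $tags];
  }

  public function get(string $cid): ?string {
    return $this->items[$cid]['data'] ?? NULL;
  }

  // Drops every cached item that carries at least one of the given tags.
  public function invalidateTags(array $tags): void {
    foreach ($this->items as $cid => $item) {
      if (array_intersect($item['tags'], $tags)) {
        unset($this->items[$cid]);
      }
    }
  }
}

$cache = new TagAwareCache();
// Each cached results page is tagged with the index its plugin reads from.
$cache->set('search/node?keys=bike', '<results>', ['search_index:node_search']);
$cache->set('search/other?keys=bike', '<results>', ['search_index:other_search']);

// Reindexing node content invalidates only pages built from that index.
$cache->invalidateTags(['search_index:node_search']);
```

After the last call, only the page tagged for the (hypothetical) `other_search` index is still cached.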

Comment #2

wim leers

Ghent 🇧🇪🇪🇺

commented 27 March 2015 at 15:35

Issue summary:

View changes

Comment #5

fabianx commented 27 March 2015 at 18:15

If it was not for the 'Needs tests' part, I would RTBC this now :).

Comment #6

berdir

German

Switzerland

commented 27 March 2015 at 22:55

Interesting. Approach looks good to me. Looks like we can't use the entity cache tag, because that is not always available?

https://www.drupal.org/project/search_api will need a similar issue I think, and it only has views integration to display search results. So I guess the query backend will need to add a similar cache tag, probably for the search index and server.

Speaking of that, search.module has views integration too. We should probably make sure that the corresponding cache tags also exist when a view with those filters is used?

Comment #7

wim leers

Ghent 🇧🇪🇪🇺

commented 28 March 2015 at 13:47

Status:	Needs work	» Postponed
Related issues:		+#2461087: Add 'no_cache' route option to mark a route's responses as uncacheable (was: Cron run response should not be cacheable)

+++ b/core/modules/search/src/Tests/SearchCommentTest.php
@@ -159,8 +159,8 @@ function testSearchResultsComment() {
     // Invoke search index update.
-    $this->drupalLogout();
     $this->cronRun();
+    $this->drupalLogout();

Per #2461087-3 and #2461087-4, this hunk is unnecessary once #2461087: Add 'no_cache' route option to mark a route's responses as uncacheable (was: Cron run response should not be cacheable) lands. So let's postpone this on that issue.

Comment #8

wim leers

Ghent 🇧🇪🇪🇺

commented 31 March 2015 at 09:31

Title:

Search reindexing should invalidate cache tags

» [PP-1] Search reindexing should invalidate cache tags

Postponing per #7.

Comment #9

wim leers

Ghent 🇧🇪🇪🇺

commented 31 March 2015 at 10:16

Issue tags:

-Needs tests

Status	File	Size
new	search_index_cache_tags-2460911-9.patch	8.31 KB
new	interdiff.txt	3.84 KB

1 file was hidden/shown/deleted

Status	File	Size
hidden	search_index_cache_tags-2460911-1.patch	4.57 KB

Tests.

Comment #10

wim leers

Ghent 🇧🇪🇪🇺

commented 1 April 2015 at 11:58

Title:	[PP-1] Search reindexing should invalidate cache tags	» Search reindexing should invalidate cache tags
Assigned:	wim leers	» Unassigned
Status:	Postponed	» Needs review

Status	File	Size
new	search_index_cache_tags-2460911-10.patch	7.71 KB
new	interdiff.txt	655 bytes

1 file was hidden/shown/deleted

Status	File	Size
hidden	search_index_cache_tags-2460911-9.patch	8.31 KB

#2461087: Add 'no_cache' route option to mark a route's responses as uncacheable (was: Cron run response should not be cacheable) has landed, so this is now unblocked.

Straight reroll, with the hunk mentioned in #7 removed.

Comment #11

fabianx commented 1 April 2015 at 12:34

+++ b/core/modules/search/src/Plugin/SearchIndexingInterface.php
@@ -46,6 +46,9 @@
+   *
+   * @return bool
+   *   TRUE if any updates happened, FALSE otherwise.

This is an API change.

+++ b/core/modules/search/src/Tests/SearchPageCacheTagsTest.php
@@ -25,11 +26,33 @@ class SearchPageCacheTagsTest extends SearchTestBase {
+    // Create a node and update the search index.
+    $this->node = $this->drupalCreateNode(['title' => 'bike shed shop']);

ROTFL :-D

+++ b/core/modules/search/src/Tests/SearchPageCacheTagsTest.php
@@ -40,26 +63,42 @@ function testSearchText() {
+    $this->node->title = 'bike shop';
+    $this->node->save();

Don't we need to run cron first to make this happen?

Leaving at Code needs review.

We need a beta eval and approval of the API / Interface change and a change record for the API change.

Comment #12

wim leers

Ghent 🇧🇪🇪🇺

commented 1 April 2015 at 12:39

Issue tags:

+API change

1. True. Added tag.
2. :)
3. "Don't we need to run cron first to make this happen?"

   No, the cache tags ensure that the displayed data is updated immediately.

Comment #13

berdir

German

Switzerland

commented 1 April 2015 at 12:41

Opened #2463715: Properly integrate with Drupal Core's caching system for search API.

Comment #14

jhodgdon

she/her

English

commented 1 April 2015 at 15:20

Status:

Needs review

» Needs work

This patch may work, sort of, for NodeSearch, but it is not really quite right.

+++ b/core/modules/search/src/Controller/SearchController.php
@@ -123,7 +124,7 @@ public function view(Request $request, SearchPageInterface $entity) {
-        'tags' => $entity->getCacheTags(),
+        'tags' => Cache::mergeTags($entity->getCacheTags(), ['search_index:' . $plugin->getPluginId()]),
       ),

This is not necessarily correct. Although NodeSearch uses the plugin ID as $type, that is not necessarily what all other plugins will do. In particular, I am aware of one contrib module that, when ported to 8, will be using a different $type for each *instance* of the plugin, not one shared by this plugin type. So this line needs to change to ask the plugin for what $type it is using.

The other thing is that NodeSearch and potentially other plugins are also removing data from the search index in other places. You've got an invalidate in search_index_clear, but ..

Actually, wouldn't it be simpler to just put the invalidate in all of the functions in search.module that are adding or removing from the index? This would be a *lot* easier. These functions are:
search_index()
search_index_clear()

These are the only two ways that plugins using the index are supposed to be adding/removing data from the search index, so if you invalidate there, you'll be gold. Then you don't have to worry about changing the API for the search plugins.

Comment #15

fabianx commented 1 April 2015 at 19:18

I agree with #14, thanks for chiming in.

Comment #16

jhodgdon

she/her

English

commented 1 April 2015 at 19:39

As one of the maintainers of the D8 search module, it's my job to chime in. ;)

Comment #17

jhodgdon

she/her

English

commented 1 April 2015 at 19:43

As one other note, not all Search plugins even use the index. Take a look at UserSearch as one example that doesn't.

So in SearchController, you need to ask the active search plugin instance which $type, if any, it's using in the search index, and invalidate that. I suggest adding a method to SearchInterface to get this information, implementing it with a '' or NULL return in the base class SearchPluginBase, and then overriding this in NodeSearch to return 'node_search'.
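The suggestion above might look roughly like the following plain-PHP sketch. All class, interface and function names here are illustrative stand-ins rather than the real Search module classes; `getType()` is the method name used later in this thread, and `'node_search'` is the value proposed above.

```php
<?php
// Illustrative stand-ins; the real interface and classes live in the Search module.
interface SketchSearchInterface {
  // Returns the search index $type this plugin uses, or NULL if it uses none.
  public function getType(): ?string;
}

abstract class SketchSearchPluginBase implements SketchSearchInterface {
  // Default: the plugin does not use the search index.
  public function getType(): ?string {
    return NULL;
  }
}

// A plugin that queries its own tables directly keeps the NULL default.
final class SketchUserSearch extends SketchSearchPluginBase {}

// A plugin that stores data in the search index reports its $type.
final class SketchNodeSearch extends SketchSearchPluginBase {
  public function getType(): ?string {
    return 'node_search';
  }
}

// Builds the cache tags for a results page (hypothetical helper). The index
// tag is added only when the active plugin actually uses the index.
function sketch_search_page_tags(SketchSearchInterface $plugin, array $entity_tags): array {
  $type = $plugin->getType();
  if ($type !== NULL) {
    $entity_tags[] = 'search_index:' . $type;
  }
  return array_values(array_unique($entity_tags));
}
```

Giving the base class a NULL default means existing plugins that never touch the index need no changes.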

If you'd like me to take this over and make a patch, let me know...

Comment #18

jhodgdon

she/her

English

commented 1 April 2015 at 20:01

Hm. So, in the case of a Node search, the following things would change which nodes are displayed as search results on a given results page:
- Updates to any node -- this issue should catch these, at least after cron runs, because they'll result in node reindexing. Language, published/unpublished status, taxonomy terms, and author directly go into the query in some cases.
- Any update to a node that affects node access, or turning on/off a node access module.
- Updates to factors that can affect search rankings. In Core, these are (in addition to editing the node, which will eventually filter into an update to the search index): number of comments, sticky flag, promoted flag, and number of times the node has been viewed if the Statistics module is turned on.

And then the page output should also depend on the display of the specific nodes that are in the search results.

User search is much simpler. It just matches on user name (and email address if an admin is searching), and the query also filters out blocked users (unless an admin is searching). So basically if any user account is edited, all user search pages should probably be invalidated.

Is that being taken care of elsewhere?

Comment #19

wim leers

Ghent 🇧🇪🇪🇺

commented 2 April 2015 at 11:52

Status:	Needs work	» Needs review
Issue tags:	-API change

Status	File	Size
new	search_index_cache_tags-2460911-19.patch	7.44 KB
new	interdiff-14.txt	2.56 KB
new	interdiff-17.txt	2.71 KB

1 file was hidden/shown/deleted

Status	File	Size
hidden	search_index_cache_tags-2460911-10.patch	7.71 KB

#14: thanks for the review! Super helpful pointers :)

Did this. But while doing this, I noticed that search_index() actually calls search_index_clear()… so really, we only need tag invalidation in search_index_clear()!

This also means no API change is needed anymore :)

See interdiff-14.txt.
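Why a single invalidation point is enough can be sketched as follows. The `sketch_*` functions are simplified, hypothetical stand-ins for search_index() and search_index_clear() (the real functions take more parameters and touch the database), and the tag recorder exists only so the effect can be inspected.

```php
<?php
// Records invalidated tags so the sketch can be inspected (hypothetical helper).
$GLOBALS['sketch_invalidated'] = [];

function sketch_invalidate_tags(array $tags): void {
  foreach ($tags as $tag) {
    $GLOBALS['sketch_invalidated'][] = $tag;
  }
}

// Stand-in for search_index_clear(): the only place that invalidates tags.
function sketch_search_index_clear(?string $type = NULL): void {
  // ... delete the rows for $type from the index (omitted) ...
  if ($type !== NULL) {
    sketch_invalidate_tags(['search_index:' . $type]);
  }
}

// Stand-in for search_index(): clears the old entry first, then re-adds it,
// so indexing inherits the invalidation without any code of its own.
function sketch_search_index(string $type, int $sid, string $text): void {
  sketch_search_index_clear($type);
  // ... tokenize $text and insert rows for $sid (omitted) ...
}

sketch_search_index('node_search', 1, 'bike shed shop');
```

Because every write goes through the clear step, both adding to and removing from the index end up invalidating the same tag.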

#17: Done.

See interdiff-17.txt.

#18:

"Updates to any node -- this issue should catch these, at least after cron runs, […]"

Indeed, the patch already handles that.

"Any update to a node that affects node access, or turning on/off a node access module."

Installing/uninstalling modules clears the render cache.

Updates to nodes affecting node access: see rendered entity cache tags below.

"Updates to factors that can affect search rankings"

#2241249: First step in making search results pages cacheable: add the associated SearchPage's cache tag took care of that already, and that is exactly what the existing SearchPageCacheTagsTest test is verifying to be working.
(Convince yourself by adding debug($tags) in Cache::invalidateTags(), modify the factors, save.)

"And then the page output should also depend on the display of the specific nodes that are in the search results."

See rendered entity cache tags below.

"User search is much simpler. It just matches on user name (and email address if an admin is searching), and the query also filters out blocked users (unless an admin is searching). So basically if any user account is edited, all user search pages should probably be invalidated.

Is that being taken care of elsewhere?"

See rendered entity cache tags below.

Regarding rendered node cache tags: for search modules + cache tags, we've only had:

- #2241249: First step in making search results pages cacheable: add the associated SearchPage's cache tag, which took care of the search_page config entities (which contain the ranking factors)
- this issue, which is taking care of search index-dependent cache tags (to invalidate content when the search index is updated)

You've pointed out:

1. modifications of node properties (which may affect node access, and thus the listed nodes),
2. the node "search result" entity display,
3. modification of user properties (which may affect the listed users).

This is neither about the ranking factors, nor about the index, but about the representation of what is listed. That's clearly a different scope than this issue — this issue is specifically about re-indexing. So, yes, let's take care of that elsewhere: the solution for that is independent of what we do here. No need to delay this issue (and the issues this issue blocks) for that.

I opened #2464409: Search results should bubble rendered entity cache tags and set list cache tags.

To take care of it:

- Points 1 and 3 can be addressed by setting the node_list and user_list cache tags, respectively.
- Point 2 can be addressed by ceasing the early rendering that NodeSearch::prepareResults() does; rendering should actually happen only when the entire page is being rendered. If that turns out to be tricky in Search module's architecture, then we just need to be able to set the cache tags we need to be associated on individual search results, so that when the search-result.html.twig template is rendered, they are bubbled.
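As a rough illustration of the follow-up proposal, the tags on a node search results page would be the union of the index tag, the list tag, and whatever the individual results bubble up. The helper below is a plain-PHP stand-in (not Drupal's Cache::mergeTags()), and the node IDs are made up for the example.

```php
<?php
// Plain-PHP stand-in for merging cache tags; NOT Drupal's Cache::mergeTags().
function sketch_merge_tags(array ...$tag_sets): array {
  $merged = array_unique(array_merge(...$tag_sets));
  sort($merged);
  return $merged;
}

// Tags a node search results page might carry under the proposal above.
$page_tags = sketch_merge_tags(
  // From this issue: the index the results were read from.
  ['search_index:node_search'],
  // Points 1 and 3: the list tag for the listed entity type.
  ['node_list'],
  // Point 2: tags bubbled from the rendered results (illustrative IDs).
  ['node:1', 'node:7', 'node_list']
);
```

Duplicates collapse during the merge, so it does no harm if a result bubbles a tag the page already carries.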

Comment #20

2 April 2015 at 11:54

Status:

Needs review

» Needs work

The last submitted patch, 19: search_index_cache_tags-2460911-19.patch, failed testing.

Comment #21

2 April 2015 at 12:06

Wim Leers queued 19: search_index_cache_tags-2460911-19.patch for re-testing.

Comment #22

wim leers

Ghent 🇧🇪🇪🇺

commented 2 April 2015 at 12:06

Status:

Needs work

» Needs review

Testbot failure, re-tested.

Comment #23

fabianx commented 2 April 2015 at 12:39

+++ b/core/modules/search/src/Controller/SearchController.php
@@ -123,7 +124,7 @@ public function view(Request $request, SearchPageInterface $entity) {
-        'tags' => $entity->getCacheTags(),
+        'tags' => Cache::mergeTags($entity->getCacheTags(), ['search_index:' . $plugin->getPluginId()]),

$plugin->getType();

and should check for NULL :)

+++ b/core/modules/search/search.module
@@ -152,6 +153,11 @@ function search_index_clear($type = NULL, $sid = NULL, $langcode = NULL) {
+  if ($type !== NULL) {
+    // Invalidate all render cache items that contain data from this index.
+    Cache::invalidateTags(['search_index:' . $type]);
+  }

Is this the same $type as $plugin->getType()?

This confuses me ...

Comment #24

wim leers

Ghent 🇧🇪🇪🇺

commented 2 April 2015 at 13:05

Status	File	Size
new	search_index_cache_tags-2460911-24.patch	8.61 KB
new	interdiff.txt	3.04 KB

1 file was hidden/shown/deleted

Status	File	Size
hidden	search_index_cache_tags-2460911-19.patch	7.44 KB

#23:

1. D'oh, forgot to update that one. Thanks! Fixed. This also means I had to update the test, which is great, because now it actually conforms to @jhodgdon's remarks in #17 :)
2. Yes, it is. (NodeSearch calls search_index() with $type = $this->getPluginId(), which is passed on to search_index_clear(). It'd be better if NodeSearch would use $this->getType() instead of $this->getPluginId() in the 3 places where we're actually passing a type, but I don't want to do that here to keep changes minimal, and because then the comments in those places don't make sense anymore. See NodeSearch::(indexNode|indexClear|markForReindex).)

Comment #25

jhodgdon

she/her

English

commented 2 April 2015 at 13:34

This all looks correct to me now. Great work!

A few minor things I think should be addressed:

+++ b/core/modules/search/search.module
@@ -152,6 +153,11 @@ function search_index_clear($type = NULL, $sid = NULL, $langcode = NULL) {
+  if ($type !== NULL) {

This can just be if($type) I think?

Also if $type is actually NULL, then I think we would need to invalidate *all* cached search pages with any $type?

+++ b/core/modules/search/src/Controller/SearchController.php
@@ -127,6 +128,13 @@ public function view(Request $request, SearchPageInterface $entity) {
+    if ($plugin->getType() !== NULL) {

Again, why !== NULL, why not just if($plugin->getType()) ?

+++ b/core/modules/search/src/Plugin/SearchInterface.php
@@ -65,6 +65,17 @@ public function getAttributes();
   /**
+   * Returns the used search index type.
+   *
+   * @return string|null
+   *   The plugin ID or other machine-readable type for the search index this
+   *   search uses. NULL if no search index is used.

Perhaps this is better as:

Returns the search index type this plugin uses.

and for the @return docs:

The type used by this search plugin in the search index, or NULL if this plugin does not use the search index.

+++ b/core/modules/search/src/Plugin/SearchInterface.php
@@ -65,6 +65,17 @@ public function getAttributes();
+   * @see search_index_clear()
+   */

Probably better (or in addition) to have @see to search_index() rather than search_index_clear() ?

+++ b/core/modules/simpletest/src/WebTestBase.php
@@ -2762,4 +2762,15 @@ protected function assertCacheTag($expected_cache_tag) {
+   * Asserts whether an expected cache tag was absent in the last response.
+   *
+   * @param string $expected_cache_tag
+   *   The expected cache tag.
+   */
+  protected function assertNoCacheTag($expected_cache_tag) {
+    $cache_tags = explode(' ', $this->drupalGetHeader('X-Drupal-Cache-Tags'));

Saying something in a "No" assertion is "expected" is a bit odd. ;)

How about making the first line:

Asserts that a cache tag is absent from the current page.

And then change $expected_cache_tag to just $cache_tag

and in the docs for the param: The cache tag to check.

Maybe add a code comment in search_index() right where it calls search_index_clear() to note that this operation clears the cache tags?
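The quoted assertNoCacheTag() hunk is cut off after its first statement. The check it is heading toward can be sketched in plain PHP as below; this is an assumption about the remainder, not the committed code, and `sketch_cache_tag_is_absent()` is a hypothetical helper that takes the X-Drupal-Cache-Tags header value as a string instead of reading it from a test response.

```php
<?php
// Returns TRUE when $cache_tag is absent from a space-separated tags header.
// Hypothetical helper; the real assertion lives in WebTestBase.
function sketch_cache_tag_is_absent(string $header_value, string $cache_tag): bool {
  $cache_tags = explode(' ', $header_value);
  // Strict comparison so that only an exact tag match counts as present.
  return !in_array($cache_tag, $cache_tags, TRUE);
}

$header = 'config:search.page.user_search rendered';
$absent = sketch_cache_tag_is_absent($header, 'search_index:node_search');
```

The header value above is made up for the example; the point is only that a prefix such as `search_index` does not match the longer tag `search_index:node_search`, because whole tags are compared.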

Comment #26

wim leers

Ghent 🇧🇪🇪🇺

commented 2 April 2015 at 15:13

Status	File	Size
new	search_index_cache_tags-2460911-26.patch	9.29 KB
new	interdiff.txt	5.88 KB

1 file was hidden/shown/deleted

Status	File	Size
hidden	search_index_cache_tags-2460911-24.patch	8.61 KB

Thanks for the review! :)

1. I prefer strictness/exactness. But, you're the maintainer, so it's your call. Done.
2. Also added a generic search_index cache tag, and expanded the test coverage.
3. See 1.
4. Done.
5. Done.
6. I kept the docs of the first line, so that it's still analogous with WebTestBase::assertCacheTag(). All your other requested changes have been applied.
7. Done.
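How a generic tag can cover the "$type is NULL" case might be sketched like this. Both helpers are hypothetical, and the assumption that index-backed pages carry the generic `search_index` tag alongside the per-type tag is inferred from this comment rather than copied from the patch.

```php
<?php
// Tags to invalidate when (part of) the index is cleared (hypothetical helper).
function sketch_tags_to_invalidate(?string $type): array {
  // No type means the whole index was cleared, so hit the generic tag that
  // every index-backed results page is assumed to carry.
  if (!$type) {
    return ['search_index'];
  }
  return ['search_index:' . $type];
}

// Tags an index-backed results page is assumed to carry (hypothetical helper).
// Plugins that do not use the index get neither tag.
function sketch_tags_for_page(?string $type): array {
  return $type ? ['search_index', 'search_index:' . $type] : [];
}

// Clearing everything must reach a node search page...
$hit_by_full_clear = (bool) array_intersect(
  sketch_tags_for_page('node_search'),
  sketch_tags_to_invalidate(NULL)
);
// ...while clearing an unrelated (made-up) type must not.
$hit_by_other_clear = (bool) array_intersect(
  sketch_tags_for_page('node_search'),
  sketch_tags_to_invalidate('other_search')
);
```

The truthiness checks mirror the `if ($type)` style requested in the review rather than a strict `!== NULL` comparison.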

Comment #27

jhodgdon

she/her

English

commented 2 April 2015 at 15:25

Status:

Needs review

» Reviewed & tested by the community

Looks great to me now! Assuming test bot agrees, I think this is ready to go. Thanks!

Comment #28

jhodgdon

she/her

English

commented 2 April 2015 at 15:28

Needs beta evaluation added to summary though...

Comment #29

wim leers

Ghent 🇧🇪🇪🇺

commented 2 April 2015 at 15:59

Issue summary:

View changes

Beta evaluation added! :)

Comment #30

jhodgdon

she/her

English

commented 2 April 2015 at 16:07

Issue summary:

View changes

Comment #31

wim leers

Ghent 🇧🇪🇪🇺

commented 2 April 2015 at 16:42

#30: oh, right, thanks for that correction!

Comment #32

webchick

she/they

English

Vancouver 🇨🇦

commented 2 April 2015 at 17:27

Status:

Reviewed & tested by the community

» Fixed

Looks good, thanks!

Committed and pushed to 8.0.x.

Comment #33

2 April 2015 at 17:28

webchick committed 8fcce5b on

Issue #2460911 by Wim Leers, jhodgdon: Search reindexing should...

Comment #34

jhodgdon

she/her

English

commented 2 April 2015 at 21:16

I had a moment of doubt about this, but just verified it is working correctly. Phew!

What I wanted to verify was that if I added my own Node search page at admin/config/search/pages, the cache tags would be correct. And they are: it still has search_index:node_search in there, plus a config item for the particular machine name of the search page config. User searches are also working well. So, all good!

But I'm wondering if we should put this in the tests too? The reason I am wondering about this is that the default config entity for NodeSearch has machine name 'node_search', which happens to match the plugin type ID for the NodeSearch plugin. So, my worry that made me take a look at this was that in place of using the plugin type ID string 'node_search' in the search_index cache tag, we could have been mistakenly using the individual config entity's machine name (which is the same)... Should we do something about this in the tests or not worry?

Comment #35

wim leers

Ghent 🇧🇪🇪🇺

commented 3 April 2015 at 04:28

Hurray for the automatically generated config entity cache tags! Thanks to that, it's actually difficult to cause the problem you are describing :)

But, yes, let's add test coverage for that. I didn't even know you can configure additional NodeSearch pages!

If you can file a follow-up, I'll roll a patch.

Comment #36

jhodgdon

she/her

English

commented 3 April 2015 at 14:01

That sounds like a good division of labor to me! I'll also (of course) volunteer to review the patch. :) Filed:
#2465111: Add tests for caching on user-added Search page

Comment #37

17 April 2015 at 14:04

Status:

Fixed

» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Parent issue:		» #606840: Enable internal page cache by default
Related issues:	-#606840: Enable internal page cache by default

Assigned:	Unassigned	» wim leers
Status:	Needs review	» Needs work
Issue tags:		+Needs tests
