Note for reviewers: You may want to read the draft change notices for a better understanding of what is going on here... I think they clarify how to use the new API fairly well.

Problem/Motivation

The search_reindex() API has multiple problems, most notably that it clears the index, rather than reindexing (the name is very confusing and/or dead wrong). Also it has quite different behavior depending on what combination of arguments are passed in. If you pass in $sid, it clears out the index; if you don't, it instead asks all plugins to mark items for reindexing -- it's just inconsistent. Plus, it doesn't really check its arguments very well.

Proposed resolution

Instead of this one function that is confusing, have a more rational API for search:

search_index($sid, $type, $text, $langcode) [index text for ($type, $sid, $langcode)]
[this function already exists, for use by search plugins to call]

search_index_clear($type, $sid, $langcode) [delete an item from the index]
search_index_clear($type, $sid) ==> omit langcode, means "do all of this type/sid"
search_index_clear($type) ==> omit $sid, means "do all of this type"
[new function. the first two variations would be mostly for entity delete hooks/methods to call; the last is set up for a plugin to call. Replaces the functionality of search_reindex($type, $sid, $langcode)]

search_mark_for_reindex($type, $sid, $langcode) [mark an item as "updated, needs reindexing"]
search_mark_for_reindex($type, $sid) ==> omit langcode, means "do all of this type/sid"
search_mark_for_reindex($type) ==> omit $sid, means "do all of this type" [presumably called from a plugin]
[this function exists; would add ability to pass in just $type and add $langcode argument. First two variations would be mostly for entity update hooks/methods to call; the last is set up for plugins to call]

We should also change the order of search_index() parameters to be consistent with these functions:
search_index($type, $sid, $langcode, $text)

Also, the plugin API of SearchIndexingInterface needs an update. There is currently a method resetIndex(), which corresponds to search_mark_for_reindex() functionality. But the terminology is confusing. We should replace this by:


 /**
   * Clears the search index for this plugin.
   */
  public function indexClear();

// This next one would do what resetIndex() currently does, but renamed for less confusion.

  /**
   * Marks the search index for reindexing for this plugin.
   */
  public function markForReindex();

So if you wanted to mark the index for all plugins for reindexing, you would use the plugin manager to get a list of all active indexable search plugins, and call markForReindex() on all of them. This would let each plugin decide the appropriate action to take.

Beta phase evaluation

-->

Reference: https://www.drupal.org/core/beta-changes
Issue category Bug because the name of the existing search_reindex() function is very misleading, and it does very different things depending on its arguments. It should not be all lumped into one function and should not have that name. So this is important to fix -- it should either be a normal bug or a major task.
Reduces fragility This definitely reduces fragility in the search API, by making a consistent set of well-named functions for search plugins to use when dealing with the Core search index.
Disruption The disruption of this change is limited to contributed modules that provide Search plugins and use the Core search index. At this time I am not aware of any that have been ported to Drupal 8, and I do not think there are more than a few out there anyway. Most contributed search-related modules are for integrating Solr and other search index/retrieval engines, and do not use the core Search index.

Follow-up Items

As follow-ups to this, we should add to the search settings page the following:
a) A button to clear out the entire search index (we only have the ability to mark it for reindex currently). This was requested years ago:
#326062: Add clear search index functionality

b) For each defined search page that supports indexing, actions to clear/reindex their index.
(no issue yet, but could probably add this to that first issue)

Remaining tasks

- Agree on approach
- Write patch, including tests
- Review patch

Change Notices

The following change notices will need updates when this is committed:

a) Existing change notice https://www.drupal.org/node/2083471 will need an update. This is the master notice that tells how to migrate D7 hook_search_* to plugins. Needs to have the new NodeSearch plugin code inserted.

b) Existing change notice https://www.drupal.org/node/1737832 needs an update. It says:

search_reindex() changed from search_reindex($sid = NULL, $module = NULL, $reindex = FALSE) to search_reindex($sid = NULL, $module = NULL, $reindex = FALSE, $langcode = NULL) (defaults to reindexing/deleting all language variants of a $sid)

This needs to say:

search_reindex() was split into two functions and refactored. See https://www.drupal.org/node/2326575 for details.

Also the section on that page dealing with search_index() needs the parameters updated to search_index($type, $sid, $langcode, $text)

c) Existing change notice https://www.drupal.org/node/2173329 needs an update.... I actually think that the best thing to do with this one is to put a note at the top saying:
=====
This change notice is now obsolete, since the search_reindex() function has been refactored and removed. See
https://www.drupal.org/node/2326575 for details.
=====
and leave the rest how it is.

d) Draft change notices will need to be published.

User interface changes

None

API changes

As described in Proposed resolution.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

ianthomas_uk’s picture

An alternative resolution would be to add optional $sid and $langcode parameters to SearchIndexingInterface::resetIndex().

ianthomas_uk’s picture

Issue summary: View changes

I've added markForReindex() to the issues summary and changed the proposal to use resetIndex() with optional parameters for consistency between the methods. (previously I was proposing a removeItem() method that required an $sid.)

jhodgdon’s picture

The issue summary here is incorrect in a few respects.

a) The main problem with this function is just its name, which has been the same since Drupal 5.x I think... Its functionality has always been to clear the search index, and I agree that is name is bad. We could change that.

b)

search_reindex(null, null, $langcode)
// Clears the index for all languages

That is true and documented -- if the first two params are null the language code is ignored and I do not think this is a problem.

c)

search_reindex(null, $type, null)
// Passes $sid = null to SQL queries (I need to check what this actually results in)

Yeah that is a problem. The queries will not work. You cannot do this.

d)

search_reindex($sid, null, null)
// Passes $type = null to SQL queries (I need to check what this actually results in)

It is documented that if you include $sid you also need to include $type, so if someone is calling it this way they deserve what they get (which is that the query will not work I believe).

Anyway... until we agree on the problem I do not agree with the proposed resolution. We have traditionally had an API function for doing this. As long as search_index() is an API function, I believe we do need functions to clear the index. We can't just put this on the plugins.

jhodgdon’s picture

So, let's step back here.

We have a little API for search plugins that use the search index to use, and hypothetically for makers of UI dashboards to use.

It consists of a few functions:

a) search_index($sid, $type, $text, $langcode), which indexes $text and stores it with (sid/type/langcode) in the tables. [to be called by search plugins]

b) search_mark_for_reindex($type, $sid), which changes the timestamp on all items (all languages) matching type/sid to the current time, to indicate that they need to be reindexed sometime soon. [to be called by hooks or entity save operations related to search plugins, such as when nodes are updated]

c) The poorly-named search_reindex($sid, $type, $langcode), which performs two unrelated functions:
1. Clear an item from the search index, or a group of items with the same language (in this case, sid/type must be provided and langcode is optional). [to be called by hooks or entity delete operations related to search plugins, such as when nodes are deleted]
2. Request that all existing plugins reindex their items from the search index, if this function is called with no arguments [this is called when you click the "reindex everything" button on the search settings page, and it is done by calling the plugins' resetIndex() methods, which is also kind of poorly named.]

I think we need all 4 of these pieces in the search API.

So what I think we should actually do is to break search_reindex up into two functions:

1. search_index_delete($sid, $type, $langcode = NULL) ==> this would delete an item from the index, and would require $sid and $type to be provided

2. search_index_reset() ==> this would invoke all of the plugin resetIndex() methods and have no arguments

Thoughts?

jhodgdon’s picture

I just noticed #2211259: Move search_index() to ConfigurableSearchPluginBase and marked as a duplicate of this issue.

So, moving these API functions should be done as a group if at all. But moving them to ConfigurablePluginBase is not the right move -- they are only useful for index-using plugins, which is not the same as configurable plugins.

We could potentially make them into a Trait (see #2134513: [policy, no patch] Trait strategy) or make indexing a Service. Probably making it a Service provided by the search module would make more sense. Don't we have an issue for that already actually? I guess not.

ianthomas_uk’s picture

I have no problem with moving all these plugins as a group, but it seemed easier to move search_index() first on its own, which is why I opened #2211259: Move search_index() to ConfigurableSearchPluginBase as a followup to this.

The only function in core (except tests) that currently calls search_index() is NodeSearch::indexNode(). If we were to call search_index() directly, passing it at least a node id and the string node_search, then it would fail to call hook_node_update_index() or respect node.cron_last from state. Therefore I would suggest anyone wanting to index or reindex a node should call NodeSearch::indexNode(), and search_index() should not exist.

I may have picked the wrong plugin or trait to place them on.

jhodgdon’s picture

There will be contrib plugins using the search index that are not connected in any way to NodeSearch. (I maintain one for 7, so assuming I port the module to 8, I know this is the case). So we absolutely do not want the API for plugins to call to be located on NodeSearch, and my plugin (and other contrib plugins) that use the index should not be using NodeSearch in any way.

So we need search_index() and the other functions in this API as outlined in #4. And we need these functions either to be in search.module or to be provided as a "search.index" service.

ianthomas_uk’s picture

Half the reason I opened this was the slightly odd range of APIs available. I'll talk about NodeSearch as a specific example, but I appreciate any solution needs to work for all plugins.

To delete one node, you call search_reindex($sid, 'node_search');
There is no function to clear the node index
There is no function to clear all indexes

To reindex one node, you call search_mark_for_reindex('node_seach', $sid) // Reversed parameters
To reindex all nodes, you call NodeSearch::resetIndex();
To reindex all indexes, you call search_reindex();

This should be a consistent set of functions, whether that's procedural or OO. The easiest solution seems to be to keep them procedural. We could do this with two functions with optional parameters, but then if someone happens to pass NULL when they meant to pass a value then they could accidentally wipe out their index, so six separate functions is probably better.

I'm really struggling to get a set of good, consistent names, but we might have:
search_index_reset_item($type, $sid, $langcode = NULL);
search_index_reset($type);
search_index_reset_all();
search_index_mark_item_for_reindex($type, $sid, $langcode = NULL);
search_index_mark_for_reindex($type); // Also called by NodeSearch::resetIndex()
search_index_mark_all_for_reindex();

The delete everything and start again approach is useful if you have a corrupt index, for example if there are items in there whose entities have since been deleted but for whatever reason the index wasn't updated. If this is handled by some others means then those functions may be unnecessary.

jhodgdon’s picture

I think functions that are deleting items from the index should have "delete" in the name, not "reset". We may also need to rename methods on the plugin interface/classes.

So, that's a good thought about what should be in search.module vs. what should be in the plugin. Let's think about this. Here are some guiding principles:

a) "API functions" discussed here should either be functions located in search.module, or methods in a "search.index" service provided by the search module.

b) Indexing is definitely more complex than just a database query, and needs to be centralized in an API function for plugins to call.

c) Anything that code outside the search plugin (entity hooks, etc.) would need to call should be provided in an API function:
- Deleting an item of specified (type, sid, langcode) from the search index (some plugins may want to do it this way)
- Deleting all language items with the same (type, sid) from the search index (node delete needs to do this for all languages)
- Marking an item (type, sid, langcode) in the search index as "updated, needs reindexing" (needs to be done on entity update)
- Marking all language items with the same (type, sid) as "updated, needs reindexing (node wants to do it this way)
- Clearing out the entire search index (e.g., because it is corrupted; there should be a UI button for this on the search settings page and we have an issue about it)
- Marking the entire index for reindexing (there is an existing UI button for this on the search settings page)

d) For convenience, we may also want to provide API-level functions for plugins to call that would:
- Run a query that deletes all items of a given $type from the search index
- Run a query that marks all items of a given $type as "updated, needs reindexing"
Most likely, both of these should notify plugins that this is happening, in case they have internal bookkeeping that needs to happen as well.

e) The API functions should not assume any knowledge of what $type or $sid means. For instance:
- relation between $type and $plugin->id, $entity->type, etc.
- relation between $sid and $entity->id.
Plugins should control these meanings for the $type string they've decided to use, and they shouldn't try to do anything with another plugin's index data.

f) The API functions and plugins should both understand that the 'reindex' column in search.dataset has this specific meaning: if it is zero, the item is deemed "updated and indexed", and if it is non-zero, it represents the request time for when it was marked "needs reindexing".

I also am not sure what "reset" means in a function name -- that seems a bit ambiguous at first glance.

Given that... I think we should have:

search_index($sid, $type, $text, $langcode) [index text for ($type, $sid, $langcode)]

search_index_clear($type, $sid, $langcode) [delete an item from the index]
search_index_clear($type, $sid) ==> omit langcode, means "do all of this type/sid"
search_index_clear($type) ==> omit $sid, means "do all of this type" [presumably called from a plugin]
search_index_clear() ==> omit $type, means "do it all", via query [notify all plugins this is happening]

search_mark_for_reindex($type, $sid, $langcode) [mark an item as "updated, needs reindexing"]
search_mark_for_reindex($type, $sid) ==> omit langcode, means "do all of this type/sid"
search_mark_for_reindex($type) ==> omit $sid, means "do all of this type" [presumably called from a plugin]
search_mark_for_reindex() ==> omit $type, means "do it all", via query [notify all plugins this is happening]

In which case, we would:
- Add $langcode arg to search_mark_for_reindex(), and make it work with those combinations of args
- Remove existing search_reindex() completely and replace by search_index_clear() with those combinations of args -- and these would all be based on queries, not calling plugin methods
- Add the button to the Search settings page that would clear out the entire index

And I think we would also modify the SearchIndexingInterface. We currently have this method:

  /**
   * Takes action when the search index is going to be rebuilt.
   *
   * Modules that use updateIndex() should update their indexing bookkeeping so
   * that it starts from scratch the next time updateIndex() is called.
   */
  public function resetIndex();

This is kind of silly -- NodeSearch's implementation of this just does a query that search_mark_for_reindex(NULL) could be doing itself.

So I think we should remove this, and instead add these two functions:

  /**
   * Responds to the entire search index being cleared.
   *
   * When a request is made to delete all items from the search index, this
   * method will be called on all plugins, so that they can update their own
   * indexing bookkeeping.
   *
   * @see search_index_clear()
   */
  public function searchIndexClearAll();

  /**
   * Responds to the entire search index being marked for reindex.
   *
   * When a request is made to mark the entire search index for reindexing, this
   * method will be called on all plugins, so that they can update their own
   * indexing bookkeeping.
   *
   * @see search_mark_for_reindex()
   */
  public function searchMarkAllForReindex();

In the case of NodeSearch, both of those methods would be empty. However, some plugins (such as my Search by Page module) will need these methods, if they have separate bookkeeping for indexing, and they correspond to hooks that used to live in search.api.php in D7. For convenience, we can add empty methods on SearchPluginBase, or make a SearchIndexingPluginBase with them added.

Thoughts? Is this clear and rational enough?

ianthomas_uk’s picture

If I want to clear the index of node_search, what do I call?
- NodeSearch::searchIndexClearAll()?
- search_index_clear('node_search')?
- Both?

This is similar to my earlier question about search_index() vs NodeSearch::indexNode().

jhodgdon’s picture

RE #10... That is a good question.

I think that you are right, we need to have functions/methods that can be triggered from the UI that are plugin-aware, probably for both clearing and marking for reindex, as opposed to the API functions above, which are dumb queries.

So... How about this:

1. We keep the API functions search_index(), search_index_clear(), and search_mark_for_reindex() outlined in #9.

2. Add API functions called:
search_index_clear_plugin($plugin_id) [optional $plugin_id ==> omitting would do all plugins]
search_mark_plugin_for_reindex($plugin_id) [optional $plugin_id ==> omitting would do all plugins]

These would load the plugin(s) and call methods on the plugin. So I think we would need these 2 methods now on the plugin, in place of what I suggested in #9 and the existing resetIndex():

 /**
   * Clears the search index for this plugin.
   *
   * When a request is made to delete all items from the search index, or to
   * delete all items related to this plugin, this method will be called. It
   * should call search_index_clear($type) to remove indexed items from the
   * search database, and do anything else necessary for plugin-specific
   * indexing bookkeeping.
   *
   * @see search_index_clear()
   * @see search_index_clear_plugin()
   */
  public function searchIndexClear();

  /**
   * Marks the search index for reindexing for this plugin.
   *
   * When a request is made to mark all items from the search index for
   * reindexing, or to mark all items related to this plugin for reindexing,
   * this method will be called. It should call search_mark_for_reindex($type)
   * to mark the items in the search database, and do anything else necessary
   * for plugin-specific indexing bookkeeping.
   *
   * @see search_mark_for_reindex()
   * @see search_mark_plugin_for_reindex()
   */
  public function searchMarkForReindex();

We might also want to add a "reindex" and a "clear" action to the Actions for each Search Page in the UI.

How's that?

jhodgdon’s picture

Actually, I have one other thought.

I think maybe we should not have search_index_clear(NULL) and search_mark_for_reindex(NULL). We should instead do it by active plugin type and let the plugins have control over it, with search_index_clear_plugin() and search_mark_plugin_for_reindex(). You could call those for a particular plugin (and the plugin could call search_index_clear($type) to do the database query), or for all plugins, but we wouldn't ever run a query that would just clear out the tables.

Thoughts?

jhodgdon’s picture

Issue summary: View changes

Sigh, one more thought. We should probably do the "by plugin" stuff instead by search page entity.

OK, at this point these last few comments are a mess and I'm going to update the issue summary. See what you think?

jhodgdon’s picture

ianthomas_uk’s picture

I don't think you've really answered #10, we've now got three APIs to choose from to clear the node index:
- search_index_clear($type = 'node_search')
- search_index_clear_search_page(NodeSearch $entity)
- NodeSearch::searchIndexClear()

(in the case of node the index is shared, so there's a 1-1 mapping from $type to plugin, although that's not always true for other plugins.)

We need a single primary API. It needs to take the search page entity if we're going to allow per-search page clearing. It needs to call a method on the plugin if we're to allow plugin-specific book keeping.

If we're always going to call a method on the plugin, then it makes sense to me to make that method the primary API. Any other functions would either be helpers that are only called by search plugins, or wrappers that are never called by search plugins.

jhodgdon’s picture

I do not understand your objections in #15 at all.

As stated in the issue summary:

- search_index_clear($type) is meant to be called **from plugins***, and is a handy way for them to get the database query done that clears their particular type of data from the index, so that we don't have to have every plugin writing the query itself (it has to clear data from two tables and I think there is value in centralizing this in one API function for plugins to use, rather than making them all reproduce this functionality).

- search_index_clear_search_page($entity) is going to be calling the clear method on the plugin. It is basically a wrapper at the API function level for calling that method. We also need the API function that calls the method on *all* plugins, and I think this shortcut is also useful to have. Is that a problem?

ianthomas_uk’s picture

Ah, sorry, I'd missed that search_index_clear($type) was intended to be called from plugins only (after they've done their own book-keeping).

The answer to my question from #15 is:
- search_index_clear($type = 'node_search') is a helper only called by search plugins
- search_index_clear_search_page(NodeSearch $entity) is a wrapper never called by search plugins
- NodeSearch::searchIndexClear() is the primary API I was talking about

The problem still applies to the other two forms though: search_index_clear($type, $sid, $langcode) and search_index_clear($type, $sid). If these are called by a hook, then the entity plugin will not be given a chance to perform its own book-keeping.

We've raised some important points, but I think now the easiest way to explain my thinking might be for me to write the patch I'm imagining and get your feedback on that when you can see the whole picture.

jhodgdon’s picture

Issue summary: View changes

I woke up this morning thinking we could just get rid of the *_page* functions that I proposed in the previous issue summariy. They were intended to be just helpers/wrappers that would call into the plugin methods. I think we should just drop them, and (as I think you've been trying to get through my thick head :) ), let the plugins be the primary (and only) API for keeping track of what $type corresponds to what plugin.

Updated the issue summary accordingly...

jhodgdon’s picture

Issue tags: +beta deadline

If we're going to do this, it should be pre-beta

xjm’s picture

@jhodgdon, are you sure? It isn't a public API change for anything other than search API. Generally only things that change data schemata or the critical APIs are beta deadline.

Edit: I'd suggest beta target, but not beta deadline. Unless you as component maintainers prefer not to change this after beta.

jhodgdon’s picture

Issue tags: -beta deadline +beta target

OK.

jhodgdon’s picture

Assigned: ianthomas_uk » jhodgdon

I think I'll take this one on, in any case.

jhodgdon’s picture

Status: Active » Needs review

Here is a patch. It does what the issue summary says, and also adds tests.

I think the API is considerably clearer. The function now called search_mark_for_reindex() marks nodes for reindexing [as it did before], and the function now called search_index_clear() clears items completely out of the index. Both of these use database queries entirely and do not involve plugins directly (so they're mostly to be called from plugins and entities and such).

Someone wanting to use plugins can use the plugin manager to find the plugin and call the new methods on the plugins.

The new tests pass locally... let's see if there's anything else that this breaks (shouldn't be) and then I'll write a change node, which should make things even clearer.

jhodgdon’s picture

Let's actually upload a patch. :)

jhodgdon’s picture

Title: Fix search_reindex() » Refactor search_reindex() into separate functions
FileSize
19.16 KB
10.72 KB

Looks like I didn't break anything...

I have one more revision to make on this though. I think that the "reindex" button on the Search settings page should ask each plugin to reindex its own stuff, rather than using the DB directly -- so it should call the newly renamed method on the plugin, as it was actually doing before. This will mean that something akin to Solr could implement "indexable" but really be using its own index database, and still be able to participate in the "reindex" action from the admin UI. Added some more tests for this too, and rewrote some of the UI text about reindexing.

Also decided the new method on the indexable plugin interface had a redundant name, so I renamed it.

I'm going to draft a change notice now too... or probably two separate ones actually, one for the search.module functions and one for the interface.

Status: Needs review » Needs work

The last submitted patch, 25: 2211241-search-reindex-revamp-25.patch, failed testing.

jhodgdon’s picture

Issue summary: View changes
FileSize
19.98 KB
1.63 KB

I took a look at change notices... three existing ones need updates; updated issue summary with notes.

Drafted new change notices:
https://www.drupal.org/node/2326575 for search_reindex() refactor/removal
https://www.drupal.org/node/2326581 for the interface changes

Also in drafting the 2nd change notice I again realized that a method name was kind of redundant, so I decided to change the name of the new index clear method on the interface. So here is one more patch.

Oh, and one of the tests needed a tweak for the UI changes in the previous patch.

jhodgdon’s picture

Status: Needs work » Needs review
jhodgdon’s picture

Category: Bug report » Task
jhodgdon’s picture

Issue summary: View changes

ianthomas_uk’s picture

Reading through the patch it all looks really good, I've just got a few small points.

We've still got two sets of APIs, the methods and the functions (e.g. $plugin->indexClear() vs search_index_clear()). I don't think it's clear from the documentation on the functions that they are intended as helpers for the plugins, and should not usually be called directly. The docs on SearchIndexingInterface do make this clear from the other direction.

  1. +++ b/core/modules/search/search.module
    @@ -139,42 +139,35 @@ (docs for search_index_clear)
    + *   (optional) The plugin ID or other machine-readable type for the items to
    + *   remove from the search index. If omitted, $sid and $langcode are ignored
    + *   and the entire search index is cleared.
    

    I was going to suggest allowing $langcode to be specified on its own, but then I remembered why we have this restriction:
    - $sid is maintained by the plugin/type, so it doesn't make sense to call this with an $sid but not a $type.
    - $langcode depends on $sid, because we've no way to tell if a new language has been added to an entity that has already been indexed.

    If we want to improve that (if it's even possible), I'd do it in a separate issue.

  2. +++ b/core/modules/search/src/SearchPageListBuilder.php
    @@ -207,11 +207,11 @@ public function buildForm(array $form, FormStateInterface $form_state) {
    -      '#title' => $this->t('Indexing settings'),
    +      '#title' => $this->t('Indexing settings for default index'),
    
    +++ b/core/modules/search/src/SearchPageListBuilder.php
    @@ -329,12 +329,14 @@ public function submitForm(array &$form, FormStateInterface $form_state) {
    -    // If these settings change, the index needs to be rebuilt.
    +    // If these settings change, the Core index needs to be rebuilt.
    

    I assume "Core index" and "default index" are the same thing, and basically refer to the tables maintained by our helper functions? (Core index being the term used in comments and default index in the UI).

    This makes sense in code, and while the term isn't really explicit, it's better than just using "index" on it's own. In the UI, I don't think it makes any sense. How do I know which page types use the default index and which implement their own?

    I think we should document that if page types want to use their own index then they should implement their own support for these settings, or provide additional UI to explain/override them.

To summarise, my suggested changes are to elaborate in the documentation for functions such as search_index_clear() and to remove the new "default index" term from the UI.

Reroll attached which fixes a simple conflict in a test.

jhodgdon’s picture

Thanks for the reroll and excellent review!

I think I have addressed your comments:
- I added documentation to the two functions in search.module stating that they're either meant for use by plugins or for building a UI. We currently have a UI that calls the search_mark_for_reindex() function, and we have a long-standing feature request for a UI that would call search_index_clear() as well. So I don't want to exclude that possibility by saying those functions are only for plugins to call.
- I added a paragraph to the SearchIndexingInterface documentation to suggest that if you're making a plugin that uses its own settings or its own index, you should make that clear to users.
- Changed code comments to say "default" instead of "Core" index for consistency with the UI
- Added some explanations to the UI to explain what "default" means for settings and index. Hopefully.

Thoughts?

Status: Needs review » Needs work

The last submitted patch, 33: 2211241-search-reindex-revamp-33.patch, failed testing.

jhodgdon’s picture

Status: Needs work » Needs review
FileSize
22.78 KB

Reroll.

Status: Needs review » Needs work

The last submitted patch, 36: 2211241-search-reindex-revamp-36.patch, failed testing.

jhodgdon’s picture

Status: Needs work » Needs review
FileSize
22.79 KB
1.24 KB

Further reroll for changes in URL stuff that has been in flux lately in Core.

Status: Needs review » Needs work

The last submitted patch, 38: 2211241-search-reindex-revamp-38.patch, failed testing.

jhodgdon’s picture

Status: Needs work » Needs review
FileSize
22.79 KB

reroll.

ianthomas_uk’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
22.79 KB
+++ b/core/modules/search/src/SearchPageListBuilder.php
@@ -203,16 +203,16 @@ public function buildForm(array $form, FormStateInterface $form_state) {
+      '#description' => $this->t('The maximum number of items indexed in each pass of a <a href="@cron">cron maintenance task</a>. If necessary, reduce the number of items to prevent timeouts and memory errors while indexing. Some search page types may have their own setting for this', array('@cron' => \Drupal::url('system.status'))),

This should end in a full stop.

Other than that the patch looks good and addresses my concerns. I've added the missing full stop in the attached patch, but not changed anything else.

jhodgdon’s picture

Assigned: jhodgdon » Unassigned

Good catch! Thanks for the review and new patch.

jhodgdon’s picture

Category: Task » Bug report
Issue summary: View changes

I believe that this should properly be classified as a bug, because the name of the search_reindex() function is just wrong, either that or a major task because it "needs to be done".... if not, it at least definitely falls into the "reduces fragility" section of the flow chart on https://www.drupal.org/contribute/core/beta-changes ... I've added a "beta" section to the issue summary.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

I'm fine with this being marked as a bug.

  1. function search_index($sid, $type, $text, $langcode) {
    

    vs

    function search_index_clear($type = NULL, $sid = NULL, $langcode = NULL) {
    

    It is unfortunate that the argument order is changing. I would support changing the order of search_index to match all the new methods ie. so it is like

    function search_index($type, $sid, $langcode, $text) {
    

    There are only 4 usages in core - 3 of which are tests. That way search_index(), search_index_clear(), search_mark_for_reindex() are all consistent. search_reindex() used to be sort of consistent with search_index()

  2. +++ b/core/modules/search/search.module
    @@ -507,22 +503,41 @@ function search_index($sid, $type, $text, $langcode) {
    + * @param strong $langcode
    

    That langcode might be strong but it certainly is a string :)

ianthomas_uk’s picture

Status: Needs work » Needs review
FileSize
25.11 KB
3.56 KB

Don't be silly - this is PHP, we can't have consistent parameter order across functions ;)

This patch addresses Alex's comments, I've made sure it's covered by the CRs too.

I was concerned for a minute that people wouldn't notice the order had changed, but thinking about it indexing will be totally broken, so it should be relatively easy for contrib to spot the change and upgrade.

jhodgdon’s picture

Issue summary: View changes
Status: Needs review » Reviewed & tested by the community
FileSize
3.9 KB

That diffdiff is not very readable... Here's a proper interdiff file. The new patch looks good, and testbot agrees, so thanks! Good idea to take that one extra step and make all the API functions for Search consistent.

I updated one of the draft change notices for this new change: https://www.drupal.org/node/2326575

I also updated the issue summary to include this change, and I updated the instructions on which change notices need to be updated and how when this patch is committed.

alexpott’s picture

+++ b/core/modules/search/search.module
@@ -121,42 +121,38 @@ function search_preprocess_block(&$variables) {
+function search_index_clear($type = NULL, $sid = NULL, $langcode = NULL) {
+  $query_index = db_delete('search_index');
+  $query_dataset = db_delete('search_dataset');
+  if ($type) {
+    $query_index->condition('type', $type);
+    $query_dataset->condition('type', $type);
+    if ($sid) {
+      $query_index->condition('sid', $sid);
+      $query_dataset->condition('sid', $sid);
+      if ($langcode) {
+        $query_index->condition('langcode', $langcode);
+        $query_dataset->condition('langcode', $langcode);
+      }
     }
   }

This looks weird why is it not:

function search_index_clear($type = NULL, $sid = NULL, $langcode = NULL) {
  $query_index = db_delete('search_index');
  $query_dataset = db_delete('search_dataset');
  if ($type) {
    $query_index->condition('type', $type);
    $query_dataset->condition('type', $type);
  }
  if ($sid) {
    $query_index->condition('sid', $sid);
    $query_dataset->condition('sid', $sid);
  }
  if ($langcode) {
    $query_index->condition('langcode', $langcode);
    $query_dataset->condition('langcode', $langcode);
  }

  $query_index->execute();
  $query_dataset->execute();
}

I know you've got this documented and this was similar to the way the original function works but why?

ianthomas_uk’s picture

Status: Reviewed & tested by the community » Needs work

You can't pass $sid without a type, because two unrelated entities of different types might have the same $sid - you wouldn't want to remove indexed data for both node 5 and user 5, for example.

--

Now I think about $langcode again, I think it needs work and might need to be removed from the function totally.

The structure of the database tables is such that if we have one language of an entity indexed, then we assume that we've got all languages indexed. The update query looks for $sids that are missing from the table, and assumes that if a language is missing it's because it doesn't exist on the entity. That's normally fine, because if you add an extra language you're editing the entity so it will get picked up in a routine reindex, but if you remove a language from the database we won't know to reindex it.

Ideally we'd change NodeSearch::updateIndex() to properly handle entities that were indexed in some language only, but I think I looked at that before and couldn't do it.

It's not a problem with search_mark_for_reindex, because there's still a row in the index to say it needs to be updated. So maybe the solution is to delete the row in search_dataset and mark the row in search_index for reindexing.

jhodgdon’s picture

Status: Needs work » Reviewed & tested by the community

RE #48: $sid is meaningless without $type -- each search plugin defines what $sid means within its own $type, so if $sid is provided, you need to provide $type. Similarly, $langcode is also meaningless without $type, and we wouldn't want someone to try and clear out all the index for just one language. That would screw up plugins like NodeSearch, which (as described by Ian in #49) assumes that all languages are going to be indexed or all unindexed for a particular node.

But... I disagree with the conclusion of #49 that we should remove $langcode from our API functions. The Node module is using the Search index/database in the way you describe in its NodeSearch plugin, but that doesn't mean that the all possible plugins would use it that way, or that the generic API functions we provide should assume that someone should never want to clear out just one language/ID combination rather than the entries for all languages.

So I think if we remove $langcode we are making assumptions... Besides which, having $langcode as an argument for all three functions makes them look more similar, and also costs us next to nothing. I don't think we should make assumptions -- we want the search indexing and its database to be as generic as possible -- and I think having consistency in the API is desirable.

Provisionally setting back to RTBC, in hopes...

webchick’s picture

Assigned: Unassigned » alexpott

Back to Alex...

alexpott’s picture

Status: Reviewed & tested by the community » Fixed
Issue tags: +DX (Developer Experience)

This issue is a normal bug fix, the changes are disruptive to a small part of the search module and reduce fragility by making sense and being consistent. I feel that the win for DX and additional testing this patch adds is big enough to be allowed per https://www.drupal.org/core/beta-changes. Committed f5d180e and pushed to 8.0.x. Thanks!

Thanks for adding the beta evaluation for to the issue summary.

  • alexpott committed f5d180e on 8.0.x
    Issue #2211241 by jhodgdon, ianthomas_uk: Refactor search_reindex() into...
jhodgdon’s picture

Assigned: alexpott » Unassigned

Thanks!!!!!

I went through and updated all the change notices as noted in the issue summary. Drafts were published and there were edits to three others.

ianthomas_uk’s picture

My concern here is that at the moment is that with the only implementation we have in core, you can use the APIs we've provided exactly as documented and end up with content that will never be indexed, which seems pretty dangerous to me.

However, I agree that the APIs provided here are correct, the problem is with how NodeSearch is using them, so I've opened #2381635: Node search indexing queries are not language aware.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.