Problem/Motivation
After setting up an index with lots of content, any minor update to the search index configuration deletes the whole index. I would expect the content in the index to be marked for reindexing at most. Deleting everything requires manually triggering indexing and causes search downtime until the content is fully indexed again.
Steps to reproduce
- Create an index
- Index content
- Export the search index and make a minor change, such as changing a label
- Import the search index and all your indexed content will be cleared
Proposed resolution
Mark the content for reindexing instead of deleting/clearing it completely.
Remaining tasks
Check and create a patch.
User interface changes
-
API changes
-
Data model changes
-
Issue fork search_api_opensearch-3285438
Comments
Comment #2
varshith commented
The following patch marks the content to be reindexed when there is any change in the search index that is imported.
Comment #3
acbramley commented
The `clear()` is needed when settings change, otherwise we'll get an error when updating schema, for example.
Maybe we need a way to detect if settings have changed, and either clear or reindex based on that?
Comment #4
achap
+1 for this. I'm in the same boat with a lot of content that takes a long time to index.
Just looking at how other backends handle this:
`search_api_solr` seems to detect if settings have changed, but calls reindex rather than clear: https://git.drupalcode.org/project/search_api_solr/-/blob/4.x/src/Plugin/search_api/backend/SearchApiSolrBackend.php#L1060
`search_api_sajari` seems to not do either and just updates the schema: https://git.drupalcode.org/project/search_api_sajari/-/blob/1.0.x/src/Plugin/search_api/backend/SearchApiSajariBackend.php#L393
@acbramley When you refer to schema errors, are you talking about a mismatch between the data structure in OpenSearch and Drupal until re-indexing has occurred?
If a clear is necessary, is it possible to handle this so that search downtime while the index is rebuilt is eliminated?
Comment #5
bkildow commented
I'm hitting a similar issue when updating the index with new fields. It looks like `clear()` gets called, and further down the stack a search_api.task.updateIndex event fires and ends up calling `clear()` again, which results in an infinite loop. The stack looks like this:
Comment #6
bkildow commented
I was able to get around the infinite loop by removing the index and adding it back, instead of calling `clear()`.
Comment #7
kim.pepper
Just coming back to this: we can't change or remove field mappings on existing indexes, we can only add additional field mappings.
Comment #8
longwave
Also running into this. Even a minor change to an index config that could simply be reindexed over the existing content means the index is cleared during deployment.
`Index::postSave()` already has some logic to handle this problem; can we just rely on this? Or at least detect the situation where fields are changed/removed and only clear in that case?
Comment #9
longwave
A method we have tried to use to work around this is blue/green deployments: we have two identical indexes, and when we need to make a change we write to index 2 and ensure it is fully indexed before switching reads to use index 2, and then swap over again next time around. This adds manual deployment steps and is prone to error, though. It would be nice if this could be supported natively, but I'm not sure whether this would be better off in Search API itself or in here.
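The blue/green approach described above can be sketched roughly as follows. This is an in-memory illustration in Python only, not Search API or OpenSearch code; the class `BlueGreenIndexes` and all of its methods are hypothetical names invented for the example.

```python
class BlueGreenIndexes:
    """In-memory stand-in for two identical search indexes (hypothetical)."""

    def __init__(self):
        self.indexes = {"blue": {}, "green": {}}
        self.active = "blue"  # the index currently serving reads

    @property
    def inactive(self):
        return "green" if self.active == "blue" else "blue"

    def rebuild_inactive(self, documents):
        # Write the full content set to the index that is NOT serving reads.
        self.indexes[self.inactive] = dict(documents)

    def switch_if_complete(self, expected_count):
        # Only switch reads once the inactive index is fully populated.
        if len(self.indexes[self.inactive]) == expected_count:
            self.active = self.inactive
            return True
        return False

    def search(self, doc_id):
        # Reads always go to the active index, so there is no downtime.
        return self.indexes[self.active].get(doc_id)
```

Reads keep hitting the old index until `switch_if_complete()` succeeds, which is the property that avoids search downtime; the manual steps mentioned above correspond to calling the rebuild and the switch at the right moments.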
Comment #12
bobooon commented
Took a crack at the problem. Search API Solr offers a similar solution: a re-index is only triggered when an indexed field has changed. Tested with both in-site configuration changes and configuration sync. There might be some edge cases. Definitely needs more eyes on it.
https://git.drupalcode.org/project/search_api_solr/-/blob/4.x/src/Plugin...
Comment #14
kim.pepper
Hiding the patch because we are using MRs and GitLab CI now.
Comment #15
achap
Thanks for the MR. Checking whether fields have changed does seem like a good idea, but I think a clear rather than a re-index will be necessary, as alluded to in previous comments. From the OpenSearch docs:
See docs: https://opensearch.org/docs/latest/api-reference/index-apis/put-mapping/
So if we could detect if a field doesn't yet exist then we could use the re-index flag, and if it does already exist then we would need to clear. However that could get complicated if one field is added and another changed etc.
So just checking whether any fields already exist in the new index vs. the original, and only clearing in that case, already seems like an improvement. However, I think the clear would need to come before the calls to updateSettings and updateFieldMapping.
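A minimal sketch of that decision, in Python for illustration only (the module itself is PHP). It assumes both field sets are available as plain name-to-type dictionaries; the function name `choose_action` and the return values are hypothetical, not part of any MR.

```python
def choose_action(existing_fields, new_fields):
    """Return "none", "reindex" or "clear" for a field configuration change.

    Both arguments are assumed to be dicts of field name -> mapping type.
    """
    # An existing field whose type changed, or which was removed, cannot be
    # handled by updating the mapping in place, so the index must be cleared.
    changed_or_removed = [
        name for name, field_type in existing_fields.items()
        if new_fields.get(name) != field_type
    ]
    if changed_or_removed:
        return "clear"
    # Purely additive changes only need the content to be re-indexed.
    added = [name for name in new_fields if name not in existing_fields]
    return "reindex" if added else "none"
```

The mixed case (one field added, another changed) falls through to "clear", which is the conservative choice.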
Comment #17
achap
I took a stab at this with MR#52.
It retrieves the mappings from the OpenSearch server and compares them with the local Drupal mappings after event subscribers have fired.
I don't think comparing the original index fields to the new ones will work because the modified mappings after event subscribers have fired aren't stored anywhere and so it's not really possible to compare.
The other thing to be aware of is dynamic mappings in OpenSearch. If you don't explicitly provide a type, it will guess one based on the data that is indexed. That means that every time you click save, even if nothing has changed, it will clear the index (because there are no local mappings). This is mostly a problem if you create fields in processors etc., and the fix is just to use the IndexParamsEvent to explicitly give your custom fields the correct types. I have also found that if you incorrectly give a Drupal field that is, for example, a float a string type in Search API, then OS will override it with the float type in the mappings once data has been indexed. That will also cause clearing rather than re-indexing, so you need to be careful to assign the correct type.
This did not tackle settings (analysers etc.), only field mappings, and it doesn't negate the need for a blue/green deployment, but it should definitely help reduce the number of times you need to switch indexes.
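To illustrate the dynamic-mapping pitfall, here is a rough Python sketch (not the code in the MR, whose comparison may differ). It assumes mappings are flattened to name-to-type dictionaries; `mapping_mismatches` and the field names are hypothetical.

```python
def mapping_mismatches(server_mappings, local_mappings):
    """List fields whose server type differs from, or is missing in, the local mappings."""
    return sorted(
        name for name, server_type in server_mappings.items()
        if local_mappings.get(name) != server_type
    )


# "custom_score" was typed dynamically by the server; locally it has no explicit type.
server = {"title": "text", "custom_score": "float"}
untyped_local = {"title": "text"}
typed_local = {"title": "text", "custom_score": "float"}

needs_clear_untyped = bool(mapping_mismatches(server, untyped_local))  # mismatch on every save
needs_clear_typed = bool(mapping_mismatches(server, typed_local))      # no mismatch once typed
```

Under these assumptions, a field without an explicit local type always shows up as a mismatch, which is why assigning the type explicitly stops the repeated clearing.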
Comment #18
kim.pepper
This MR will need to be rebased onto the 3.x branch. It's currently on 1.x, which is security fixes only.
Comment #19
achap
Comment #20
kim.pepper
NW for feedback and test fails.
Comment #21
achap
Have updated the code based on feedback. Looks like that error was unrelated to my code, but I was able to trace it to the `search_api_item` table not being installed in the SpellCheckTest, so I added it there.
Comment #22
kim.pepper
This looks good to me. I'll leave it as RTBC in case anyone else has feedback over the next 24hrs.
Comment #23
achap
Setting back to needs review as I found a small edge case after the type hinting changes when the OS mappings are not set.
Comment #26
kim.pepper
Committed to 3.x. Thanks!