Problem/Motivation
There is an edge case where the _mappings for an index can be incorrect.
If you index from the command line and the index doesn't exist, `BackendClient::indexItems()` indexes the content anyway. It builds the request from the entity data and sends it to OpenSearch. OpenSearch, having no index definition, accepts the data (as it is designed to) and infers each field's type from the values it receives.
Here are a few cases that can cause the issue outlined.
- I've upgraded my OpenSearch server and cron indexed items while I was still transferring data.
- I've copied my database to a local development environment and ran `drush sapi-i` to get some test data.
- I've created a test environment in the cloud to test my PR and it didn't specifically have to update the Index entity.
- I've deleted the index intentionally and an automated indexing run beat me to my next task.
- The index failed to create but that was missed or ignored.
- Many others...
Given these can happen during the course of everyday work, I think we need to handle the case gracefully.
Steps to reproduce
A concrete example: a date field is normally indexed with the type
{
"type": "date",
"format": "strict_date_optional_time||epoch_second"
}
This _mapping is created or updated when the Index entity is saved or updated.
If you remove the index from OpenSearch and run `drush sapi-i [index]`, the field is instead indexed with the type
{
"type": "long"
}
Proposed resolution
In `BackendClient::indexItems()`, check whether the index exists and, if it does not, create it before processing the items, so that the correct mappings are in place when the first document arrives.
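The difference the check makes can be sketched as follows. This is only an illustration, not the module's code: the module is PHP, while this sketch uses Python with an in-memory stand-in for the search server. `FakeSearchServer`, `guess_type`, `index_items`, the `content` index and the `created` field are all hypothetical names, and the type guessing is a rough imitation of dynamic mapping rather than OpenSearch's actual rules.

```python
class FakeSearchServer:
    """In-memory stand-in for a search cluster (hypothetical, illustration only)."""

    def __init__(self):
        self.mappings = {}   # index name -> {field name: type}
        self.documents = {}  # index name -> list of indexed documents

    def index_exists(self, name):
        return name in self.mappings

    def create_index(self, name, mappings):
        self.mappings[name] = dict(mappings)
        self.documents[name] = []

    def index_document(self, name, doc):
        # Imitate dynamic mapping: a missing index is created on the fly and
        # any unmapped field gets a type guessed from the value sent.
        if name not in self.mappings:
            self.create_index(name, {})
        for field, value in doc.items():
            self.mappings[name].setdefault(field, guess_type(value))
        self.documents[name].append(doc)


def guess_type(value):
    """Rough imitation of type inference; bool is tested before int on purpose."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


def index_items(server, index_name, expected_mappings, items):
    """Create the index with explicit mappings if it is missing, then index."""
    if not server.index_exists(index_name):
        server.create_index(index_name, expected_mappings)
    for item in items:
        server.index_document(index_name, item)


items = [{"created": 1700000000}]  # a date sent as an epoch timestamp

# Without the check, the server guesses "long" for the timestamp.
unguarded = FakeSearchServer()
unguarded.index_document("content", items[0])

# With the check, the explicit "date" mapping exists before any document is sent.
guarded = FakeSearchServer()
index_items(guarded, "content", {"created": "date"}, items)
```

In the unguarded case the `created` field ends up typed from the data alone, which is the situation described in the steps to reproduce. In the guarded case the explicit mapping wins because it is already present when the document is indexed.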
Remaining tasks
User interface changes
API changes
Data model changes
Issue fork search_api_opensearch-3515396
Comments
Comment #2
nterbogt commented
Comment #3
nterbogt commented
Comment #4
nterbogt commented
Comment #6
nterbogt commented
Comment #7
nterbogt commented
Comment #8
kim.pepper
I agree this would be useful. OpenSearch will create an index when the first document is pushed, so this is existing behaviour. We are just making sure our mappings are correct when the index is created.
Comment #9
kim.pepper
Marking RTBC. Hoping other maintainers can review as well.
Comment #10
kim.pepper
Updated title
Comment #11
kim.pepper
Changes go into 3.x.
Comment #12
larowlan
Looks good to me
Comment #15
kim.pepper
Committed to 3.x and 2.x. Thanks!
Comment #16
kim.pepper
Released in 2.4.0
Comment #17
nterbogt commented
@mparker17 I'm pretty sure this bug affects elasticsearch_connector also.