Problem/Motivation

There is an edge case where the _mappings for an index can be incorrect.

If you index from the command line and the index doesn't exist, BackendClient::indexItems() indexes the content anyway. It builds the request for OpenSearch from the entity data and sends it. OpenSearch, having no index definition, happily accepts the data (as it is designed to) and infers each field's type from the values it receives.

Here are a few cases that can cause the issue outlined.

  • I've upgraded my OpenSearch server and cron indexed items while I was still transferring data.
  • I've copied my database to a local development environment and run drush sapi-i to get some test data.
  • I've created a test environment in the cloud to test my PR, and the PR didn't specifically need to update the Index entity.
  • I've deleted the index intentionally and an automated indexing run beat me to my next task.
  • The index failed to create but that was missed or ignored.
  • Many others...

Given these can happen during the course of everyday work, I think we need to handle the case gracefully.

Steps to reproduce

A concrete example: a date field is indexed with the type

{
  "type": "date",
  "format": "strict_date_optional_time||epoch_second"
}

This _mapping is created or updated when the Index entity is saved or updated.

If you remove the index from OpenSearch and run drush sapi-i [index], the field is instead indexed with the type

{
  "type": "long"
}
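One way to spot the mismatch described above is to compare the field types the Index entity defines with the types the server actually reports. The following is only a minimal sketch: the helper name find_mapping_drift and the field name "created" are hypothetical, and both dictionaries are assumed to have already been reduced to a field-name-to-definition shape.

```python
def find_mapping_drift(expected, actual):
    """Return {field: (expected_type, actual_type)} for fields whose types differ.

    Both arguments are assumed to map field names to mapping definitions,
    e.g. {"created": {"type": "date"}}. This shape is an assumption made
    for illustration only.
    """
    drift = {}
    for field, definition in expected.items():
        expected_type = definition.get("type")
        # A field missing from the live mapping is reported with type None.
        actual_type = actual.get(field, {}).get("type")
        if actual_type != expected_type:
            drift[field] = (expected_type, actual_type)
    return drift


# The date field from the example above, with a hypothetical field name.
expected = {
    "created": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_second",
    }
}
# What the server inferred after the index was recreated implicitly.
actual = {"created": {"type": "long"}}

print(find_mapping_drift(expected, actual))  # {'created': ('date', 'long')}
```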

Proposed resolution

In `BackendClient::indexItems()`, check whether the index exists before processing the items and, if it does not, create it first.
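The proposed control flow can be sketched as follows. This is not the module's code: InMemorySearchServer is a toy stand-in that mimics a server creating an index implicitly on first write, and index_items, its parameters, and the field name "created" are hypothetical names chosen for illustration.

```python
class InMemorySearchServer:
    """Toy stand-in for a search server that auto-creates indices on write."""

    def __init__(self):
        self.indices = {}

    def index_exists(self, name):
        return name in self.indices

    def create_index(self, name, mappings):
        # Explicit creation stores the mappings we defined.
        self.indices[name] = {"mappings": dict(mappings), "docs": []}

    def bulk_index(self, name, docs):
        # A missing index is created implicitly, without our mappings,
        # which is the situation the issue describes.
        index = self.indices.setdefault(name, {"mappings": {}, "docs": []})
        index["docs"].extend(docs)


def index_items(server, index_name, defined_mappings, docs):
    """Index docs, creating the index with the defined mappings if needed."""
    if not server.index_exists(index_name):
        server.create_index(index_name, defined_mappings)
    server.bulk_index(index_name, docs)


mappings = {
    "created": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_second",
    }
}
server = InMemorySearchServer()
index_items(server, "content", mappings, [{"created": 1700000000}])
print(server.indices["content"]["mappings"]["created"]["type"])  # date
```

Without the existence check, the same call would leave the toy index with empty mappings, which corresponds to the server falling back to inferred types.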

Remaining tasks

User interface changes

API changes

Data model changes

Comments

nterbogt created an issue. See original summary.

nterbogt’s picture

Issue summary: View changes
nterbogt’s picture

Assigned: nterbogt » Unassigned
Status: Active » Needs review
kim.pepper’s picture

I agree this would be useful. OpenSearch will create an index when the first document is pushed, so this is existing behaviour. We are just making sure our mappings are correct when the index is created.

kim.pepper’s picture

Status: Needs review » Reviewed & tested by the community

Marking RTBC. Hoping other maintainers can review as well.

kim.pepper’s picture

Title: Index _mappings can be incorrect » Indexing data when index does not exist will not use defined mappings

Updated title

kim.pepper’s picture

Version: 2.3.0 » 3.x-dev

Changes go into 3.x.

larowlan’s picture

Looks good to me

  • kim.pepper committed 514ef7a8 on 3.x authored by nterbogt
    Issue #3515396 by nterbogt: Indexing data when index does not exist will...

  • kim.pepper committed 2ac97387 on 2.x
    Issue #3515396 by nterbogt: Indexing data when index does not exist will...
kim.pepper’s picture

Status: Reviewed & tested by the community » Fixed

Committed to 3.x and 2.x. Thanks!

kim.pepper’s picture

Released in 2.4.0

nterbogt’s picture

@mparker17 I'm pretty sure this bug affects elasticsearch_connector also.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.