Problem/Motivation

Search API defines a Highlight SearchApiProcessor plugin, but it works by postprocessing the results in a simplistic way (i.e.: it doesn't recognize stopwords, synonyms, or stemming). See also https://www.drupal.org/project/issues/search_api?text=highlight&status=Open

However, ElasticSearch has its own highlighter, which understands various locales, matches on synonyms and stems, and handles stopwords correctly.

Note that I tested my patch on the Search API OpenSearch module, and after some minor changes, it worked, so I've opened #3489307: Support OpenSearch server highlighting in the search_api_opensearch queue, so that both projects can collaborate on the idea.

Proposed resolution

Add a new Search API processor that uses ElasticSearch's highlighter instead.

More specifically, add a new SearchApiProcessor plugin that:

  1. preprocesses the search query sent to ElasticSearch, adding a highlight clause that properly leverages the Highlighting API
  2. postprocesses the search results when they come back from ElasticSearch to generate an Excerpt from the highlight clause in each search hit
  3. builds a SearchApiProcessor configuration form that exposes a bunch of options (see below)
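
The first two steps can be sketched as follows. This is a minimal illustration in Python rather than the module's PHP, assuming only the documented shape of an Elasticsearch search request (a top-level `highlight` clause) and response (a `highlight` object on each hit). The function names `add_highlight_clause` and `build_excerpt`, the field names, and the sample hit are hypothetical and are not taken from the merge request.

```python
from typing import Any


def add_highlight_clause(body: dict[str, Any], fields: list[str]) -> dict[str, Any]:
    """Return a copy of the search body with a `highlight` clause added."""
    new_body = dict(body)
    new_body["highlight"] = {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        # An empty options object per field means "use the defaults".
        "fields": {name: {} for name in fields},
    }
    return new_body


def build_excerpt(hit: dict[str, Any], joiner: str = " … ") -> str:
    """Join every highlighted fragment of one search hit into an excerpt."""
    fragments: list[str] = []
    for field_fragments in hit.get("highlight", {}).values():
        fragments.extend(field_fragments)
    return joiner.join(fragments)


# Preprocessing: add the clause to a query before it is sent.
body = add_highlight_clause({"query": {"match": {"body": "drupal"}}}, ["title", "body"])

# Postprocessing: a hand-written hit shaped like a search response (illustrative only).
hit = {"_id": "1", "highlight": {"body": ["uses <em>Drupal</em> 10", "a <em>Drupal</em> site"]}}
print(build_excerpt(hit))
```

A hit without a `highlight` key yields an empty excerpt, which is the case a real processor would need to handle for results that matched on a non-highlighted field.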

Note that, because Search API OpenSearch doesn't yet support adding term_vector fields, the initial version of this patch won't support the Lucene Fast Vector Highlighter type (i.e.: fvh) and related highlighting options.
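
For context, the fvh highlighter only works on text fields whose mapping sets `term_vector` to `with_positions_offsets`. The sketch below (Python, with made-up field names and a hypothetical helper `fvh_ready_fields`) shows what such a mapping looks like and how one might check which fields qualify.

```python
from typing import Any

# Illustrative index mapping; the field names are invented for this example.
mapping = {
    "properties": {
        "title": {"type": "text"},
        "body": {"type": "text", "term_vector": "with_positions_offsets"},
    }
}


def fvh_ready_fields(mapping: dict[str, Any]) -> list[str]:
    """List the fields whose mapping stores the term vectors that fvh needs."""
    return [
        name
        for name, definition in mapping.get("properties", {}).items()
        if definition.get("term_vector") == "with_positions_offsets"
    ]


print(fvh_ready_fields(mapping))
```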

Remaining tasks

  1. Write a patch for 8.0.x-dev branch with tests
  2. Review and feedback - skipped because author is a maintainer
  3. RTBC and feedback
  4. Commit
  5. Release - released in 8.0.0-alpha4
  6. Backport to 8.x-7.x branch

User interface changes

Adds an "Elasticsearch Highlighter" processor to the page at /admin/config/search/search-api/index/YOUR_INDEX/processors with options to configure:

  1. the Fields to highlight (i.e.: choose from the list of fields in the index),
  2. the Highlighter type (i.e.: Unified or Plain),
  3. the Boundary scanner to use (if the Unified highlighter is selected; i.e.: Sentence or Word),
  4. the Boundary scanner locale to use (if the Sentence boundary scanner is selected),
  5. the Fragmenter to use (if the Plain highlighter is selected; i.e.: Simple or Span),
  6. the HTML tag to use to highlight the search term in the excerpt,
  7. the Snippet encoder (i.e.: No encoding or HTML),
  8. the Maximum number of snippets per field,
  9. the Snippet size (in characters),
  10. the Snippet size when there is no match (in characters),
  11. the Snippet order (i.e.: order they appear, or relevance),
  12. whether to show snippets from all fields, or only show snippets from fields that match the query,
  13. the text to use to join snippets together when rendering the excerpt.
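
As a rough guide to how these options could translate into a highlight clause, here is a Python sketch. The Elasticsearch parameter names (`type`, `boundary_scanner`, `boundary_scanner_locale`, `fragmenter`, `pre_tags`, `post_tags`, `encoder`, `number_of_fragments`, `fragment_size`, `no_match_size`, `order`, `require_field_match`) are real highlighting options; the settings keys, the function name `highlight_options`, and the pairing of each form option with a parameter are assumptions for illustration, not the processor's actual configuration schema.

```python
from typing import Any


def highlight_options(settings: dict[str, Any]) -> dict[str, Any]:
    """Translate hypothetical processor settings into a highlight clause."""
    tag = settings.get("tag", "em")
    clause: dict[str, Any] = {
        "type": settings.get("type", "unified"),
        "pre_tags": [f"<{tag}>"],
        "post_tags": [f"</{tag}>"],
        "encoder": settings.get("encoder", "default"),  # "default" or "html"
        "number_of_fragments": settings.get("max_snippets", 5),
        "fragment_size": settings.get("snippet_size", 100),
        "no_match_size": settings.get("no_match_size", 0),
        "order": settings.get("order", "none"),  # "none" or "score"
        "require_field_match": settings.get("require_field_match", True),
        "fields": {name: {} for name in settings.get("fields", [])},
    }
    if clause["type"] == "unified":
        # Boundary scanner options are only sent for the unified highlighter here.
        clause["boundary_scanner"] = settings.get("boundary_scanner", "sentence")
        if clause["boundary_scanner"] == "sentence" and "locale" in settings:
            clause["boundary_scanner_locale"] = settings["locale"]
    elif clause["type"] == "plain":
        # The fragmenter only applies to the plain highlighter.
        clause["fragmenter"] = settings.get("fragmenter", "span")
    return clause


unified = highlight_options({"fields": ["body"], "locale": "en-US"})
plain = highlight_options({"type": "plain", "fields": ["body"], "fragmenter": "simple"})
```

The text used to join snippets has no counterpart in the request; it would be applied on the Drupal side when the excerpt is assembled.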

API changes

Only API additions.

Data model changes

Adds a configuration in the plugin.plugin_configuration.search_api_processor.elasticsearch_connector_es_highlight namespace (see MR for details).

Original report by @thomas_henon

Hi,

I would like to use the Search API Highlight processor, but my excerpts aren't reliable without ElasticSearch backend support.
The Search API Highlight processor creates an excerpt in PHP, but stopwords and synonyms aren't supported.

Is highlighting support with ElasticSearch planned for ElasticSearch Connector?

Thanks

Comments

thomas_henon created an issue. See original summary.

thomas_henon’s picture

Issue summary: View changes
eyilmaz’s picture

Version: 8.x-6.x-dev » 8.x-7.x-dev
Attached file (new): 4.32 KB

Here is a first try, with fixed values, at supporting Elasticsearch-side highlighting for the 7.x version of the connector.

star-szr’s picture

Title: HighLighting support » Highlighting support (leverage Elasticsearch highlighting)
Assigned: thomas_henon » Unassigned
Status: Active » Needs review
Attached file (new): 9.63 KB

Submitting a new patch with a different approach. This approach puts the configuration fully in a new processor, not in the index configuration, and minimizes the other changes to the module. This approach intentionally avoids using the standard search_api Highlight processor since that plugin does a lot of heavy lifting (in other words: things that can be a significant performance drag) that we can handle on the Elasticsearch side.

In many ways this is not as "fancy" as the search_api highlighting but does try to make more direct use of the Elasticsearch highlighting functionality and improve performance for larger result sets. Hopefully it's useful to somebody else.

Edit: Also (at least in my experience) the quality of the highlights is vastly improved, especially when using the plain highlighter.

Configuration options:
- Field to highlight (no support for multi-field highlighting)
- type (unified or plain, does not support fvh)
- number_of_fragments
- fragment_size
- pre_tags
- post_tags

No interdiff since this is a fresh start.

marios anagnostopoulos’s picture

Attached file (new): 10.09 KB

Hey there. There is an issue in Search API to allow highlighting to be provided by the search back end. When this lands, all the heavy lifting you mention (and you are quite right about that) will be skipped. I would suggest not introducing an extra processor as in #4, but leveraging the existing one. I do like the addition of configuration, though.

I moved the config from patch #4 to the index third_party_settings, like the solr module does, and I removed the pre/post tag configuration since it can be retrieved from the processor's config (the Search API processor, I mean). As for the "Field to highlight" config, I think this is not something we want to be globally configurable, since different queries, views, and so on might want to highlight different fields. For this I followed the existing approach of taking the intersection of the query's and the index's fulltext fields. I also extracted this logic into a helper function that can be used when building the query_string option.
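
The field-intersection logic described here could look roughly like the Python sketch below. The function name `fields_to_highlight` is hypothetical, and treating a missing field list on the query as "all fulltext fields" is an assumption made for the example, not a statement about the patch.

```python
from typing import Optional


def fields_to_highlight(
    query_fields: Optional[list[str]],
    index_fulltext_fields: list[str],
) -> list[str]:
    """Return the index's fulltext fields that the query searches, in index order."""
    if query_fields is None:
        # Assumption: no explicit field list means every fulltext field is searched.
        return list(index_fulltext_fields)
    wanted = set(query_fields)
    return [name for name in index_fulltext_fields if name in wanted]


print(fields_to_highlight(["body", "author"], ["title", "body", "summary"]))
```

Keeping the index's order rather than the query's makes the resulting field list stable across queries, which is convenient when the same list also feeds the query_string option.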

Additionally, for retrieving the default configuration for the third-party settings, instead of writing a helper function in the module file, I thought it best to introduce a Utility class for the module.
Finally, I provide a simple update hook for the new configuration, since the module's schema has changed. (I am not 100% sure that this is needed, but I had issues on existing installations and solved them like this.)

I do not provide an interdiff, since there are many changes and whole files missing, and an interdiff would not be helpful.

Any input/review is welcome. Cheers!

marios anagnostopoulos’s picture

marios anagnostopoulos’s picture

Attached file (new): 10.3 KB

I made an oopsie with the settings in #5, so I'm re-uploading the same patch (fixed).

marios anagnostopoulos’s picture

Attached file (new): 10.29 KB
marios anagnostopoulos’s picture

Attached files (new): 11.3 KB, 2.31 KB

Reuploading #8 with a more generic check for Highlight processors (for getting the config)

In the related issue I attached, I think the idea is that no support for something like this will be introduced in Search API, so we should probably expand on #4 instead.

hikkypo’s picture

Attached file (new): 9.2 KB

Attempted to redo this patch for version 7 and Drupal 9.5.

andrechun’s picture

Attached file (new): 9.76 KB

Re-rolled the patch from #10. It was giving me a fatal error because the changes for the new src/Utility.php file were missing.

mparker17’s picture

Version: 8.x-7.x-dev » 8.0.x-dev
Assigned: Unassigned » mparker17
Status: Needs review » Needs work

I've created a patch for ElasticSearch 8 that leverages Search API's API a little better and supports a handful more options. Let's see if we can get it into the 8.0.x branch first, then backport it.

mparker17’s picture

Assigned: mparker17 » Unassigned
Status: Needs work » Needs review

I've created merge request !73. Reviews are welcome!

This seems like something we could also contribute to Search API OpenSearch!

mparker17’s picture

Title: Highlighting support (leverage Elasticsearch highlighting) » Support ElasticSearch server highlighting
Issue summary: View changes

(updated the issue summary)

mparker17’s picture

Issue summary: View changes
Status: Needs review » Needs work

Updated the issue summary, for parity with search_api_opensearch's issue #3489307: Support OpenSearch server highlighting.

Moving to Needs Work, because I think I'd like to add another test, this time against the ElasticSearch environment in CI. I also want to change the machine name of the new highlighter from elasticsearch_connector_es_highlight to elasticsearch_highlight.

mparker17’s picture

Issue summary: View changes
Status: Needs work » Reviewed & tested by the community

Yay, the test works!

star-szr’s picture

Title: Support ElasticSearch server highlighting » Support Elasticsearch server highlighting

I have tested the MR on a real site, reviewed the code, and I think it's ready to go.

+1 to RTBC.

mparker17’s picture

Issue summary: View changes

Awesome, thanks @star-szr!

  • mparker17 committed 886488ed on 8.0.x
    Issue #3077596 by mparker17, marios anagnostopoulos, star-szr, hikkypo,...
mparker17’s picture

Status: Reviewed & tested by the community » Fixed

Merged! Thanks everyone!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

mparker17’s picture

Issue summary: View changes

Updated issue summary to mention when it was released.