Problem/Motivation
Search API defines a Highlight SearchApiProcessor plugin, but it works by postprocessing the results in a simplistic way (i.e.: it doesn't recognize stopwords, synonyms, or stemming). See also https://www.drupal.org/project/issues/search_api?text=highlight&status=Open
However, ElasticSearch has its own highlighter, which understands various locales, matches on synonyms and stems, and handles stopwords correctly.
Note that I tested my patch on the Search API OpenSearch module, and after some minor changes, it worked, so I've opened #3489307: Support OpenSearch server highlighting in the search_api_opensearch queue, so that both projects can collaborate on the idea.
Proposed resolution
Add a new Search API processor that uses ElasticSearch's highlighter instead.
More specifically, add a new SearchApiProcessor plugin that:
- preprocesses the search query sent to ElasticSearch, adding a highlight clause that properly leverages the Highlighting API
- postprocesses the search results when they come back from ElasticSearch to generate an Excerpt from the highlight clause in each search hit
- builds a SearchApiProcessor configuration form that exposes a bunch of options (see below)
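The preprocess and postprocess steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the module's PHP code: the function names add_highlight_clause and build_excerpt and the default joiner are hypothetical. The highlight, pre_tags, post_tags, and fields keys are part of the ElasticSearch highlighting request format, and each hit in a response carries a highlight object that maps field names to lists of fragments.

```python
from __future__ import annotations

from typing import Any


def add_highlight_clause(
    body: dict[str, Any], fields: list[str], tag: str = "em"
) -> dict[str, Any]:
    """Preprocess step: return a copy of the search body with a highlight clause."""
    new_body = dict(body)
    new_body["highlight"] = {
        "pre_tags": [f"<{tag}>"],
        "post_tags": [f"</{tag}>"],
        # An empty object per field means "use the top-level highlight settings".
        "fields": {name: {} for name in fields},
    }
    return new_body


def build_excerpt(hit: dict[str, Any], joiner: str = " ... ") -> str:
    """Postprocess step: join the highlight fragments of one search hit."""
    snippets: list[str] = []
    for fragments in hit.get("highlight", {}).values():
        snippets.extend(fragments)
    return joiner.join(snippets)
```

A hit without a highlight object yields an empty excerpt here. How the real processor handles that case is not described in this issue.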
Note that, because Search API OpenSearch doesn't yet support adding term_vector fields, the initial version of this patch won't support the Lucene Fast Vector Highlighter type (i.e.: fvh) and related highlighting options.
Remaining tasks
- Write a patch for 8.0.x-dev branch with tests
- Review and feedback (skipped because author is a maintainer)
- RTBC and feedback
- Commit
- Release (released in 8.0.0-alpha4)
- Backport to 8.x-7.x branch
User interface changes
Adds an "Elasticsearch Highlighter" processor to the page at /admin/config/search/search-api/index/YOUR_INDEX/processors with options to configure:
- the Fields to highlight (i.e.: choose from the list of fields in the index),
- the Highlighter type (i.e.: Unified or Plain),
- the Boundary scanner to use (if the Unified highlighter is selected; i.e.: Sentence or Word),
- the Boundary scanner locale to use (if the Sentence boundary scanner is selected),
- the Fragmenter to use (if the Plain highlighter is selected; i.e.: Simple or Span),
- the HTML tag to use to highlight the search term in the excerpt
- the Snippet encoder (i.e.: No encoding or HTML)
- the Maximum number of snippets per field
- the Snippet size (in characters)
- the Snippet size when there is no match (in characters)
- the Snippet order (i.e.: order they appear, or relevance)
- whether to show snippets from all fields, or only show snippets from fields that match the query
- the text to use to join snippets together when rendering the excerpt
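As a rough illustration of how these form options could map onto ElasticSearch highlight parameters, here is a Python sketch. The HighlightOptions class, its attribute names, and its default values are hypothetical. The mapping of the "all fields vs. matching fields" option to require_field_match is an assumption, and the module's actual PHP implementation may differ. The parameter names on the ElasticSearch side (type, boundary_scanner, boundary_scanner_locale, fragmenter, encoder, number_of_fragments, fragment_size, no_match_size, order, require_field_match) are taken from the ElasticSearch highlighting API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class HighlightOptions:
    """Hypothetical container for the processor's form values."""

    fields: list[str]
    highlighter_type: str = "unified"       # "unified" or "plain"
    boundary_scanner: str = "sentence"      # "sentence" or "word" (unified only)
    boundary_scanner_locale: str = "en-US"  # used with the sentence scanner
    fragmenter: str = "span"                # "simple" or "span" (plain only)
    tag: str = "em"
    encoder: str = "default"                # "default" (no encoding) or "html"
    number_of_fragments: int = 3
    fragment_size: int = 100
    no_match_size: int = 0
    order: str = "none"                     # "none" (order of appearance) or "score"
    require_field_match: bool = True        # assumed mapping, see lead-in


def to_highlight_clause(opts: HighlightOptions) -> dict[str, Any]:
    """Build a highlight clause, adding type-specific options only when they apply."""
    clause: dict[str, Any] = {
        "type": opts.highlighter_type,
        "pre_tags": [f"<{opts.tag}>"],
        "post_tags": [f"</{opts.tag}>"],
        "encoder": opts.encoder,
        "number_of_fragments": opts.number_of_fragments,
        "fragment_size": opts.fragment_size,
        "no_match_size": opts.no_match_size,
        "order": opts.order,
        "require_field_match": opts.require_field_match,
        "fields": {name: {} for name in opts.fields},
    }
    if opts.highlighter_type == "unified":
        clause["boundary_scanner"] = opts.boundary_scanner
        if opts.boundary_scanner == "sentence":
            clause["boundary_scanner_locale"] = opts.boundary_scanner_locale
    elif opts.highlighter_type == "plain":
        clause["fragmenter"] = opts.fragmenter
    return clause
```

The join text is not sent to ElasticSearch in this sketch, since it only matters when the returned snippets are rendered into an excerpt.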
API changes
Only API additions.
Data model changes
Adds a configuration in the plugin.plugin_configuration.search_api_processor.elasticsearch_connector_es_highlight namespace (see MR for details).
Original report by @thomas_henon
Hi,
I would like to use the Search API Highlight processor, but my excerpts aren't reliable without ElasticSearch backend support.
The Search API Highlight processor creates an excerpt in PHP, but stopwords and synonyms aren't supported.
Is highlighting with ElasticSearch planned to be supported in ElasticSearch Connector?
Thanks
| Comment | File | Size | Author |
|---|---|---|---|
| #11 | highlighting-support-11-3077596.patch | 9.76 KB | andrechun |
Issue fork elasticsearch_connector-3077596
Comments
Comment #2
thomas_henon commented
Comment #3
eyilmaz
Here is a first try with fixed values to support Elasticsearch-side highlights for the 7.x version of the connector.
Comment #4
star-szr
Submitting a new patch with a different approach. This approach puts the configuration fully in a new processor, not in the index configuration, and minimizes the other changes to the module. This approach intentionally avoids using the standard search_api Highlight processor since that plugin does a lot of heavy lifting (in other words: things that can be a significant performance drag) that we can handle on the Elasticsearch side.
In many ways this is not as "fancy" as the search_api highlighting but does try to make more direct use of the Elasticsearch highlighting functionality and improve performance for larger result sets. Hopefully it's useful to somebody else.
Edit: Also (at least in my experience) the quality of the highlights is vastly improved, especially when using the plain highlighter.
Configuration options:
- Field to highlight (no support for multi-field highlighting)
- type (unified or plain, does not support fvh)
- number_of_fragments
- fragment_size
- pre_tags
- post_tags
No interdiff since this is a fresh start.
Comment #5
marios anagnostopoulos commented
Hey there. There is an issue in Search API to allow highlighting to be provided by the search back end. When this lands, all the heavy lifting you mention (and you are quite right about that) will be skipped. I would suggest not introducing an extra processor like in #4, but leveraging the existing one. I do like the addition of configuration, though.
I moved the config from patch #4 to the index third_party_settings, like the solr module does and I removed the pre/post tags configurations since they can be retrieved from the processor's config (The Search API processor I mean). As for the "Field to highlight" config, I think this is not something we want to be globally configurable since, different queries/views/whatever, might want to highlight different fields. For this I followed the existing approach of taking the intersection of the query's and the index's fulltext fields. I also extracted this logic into a helper function that can be used when building the query_string option.
Additionally for retrieving the default configuration, for the third party settings, instead of writing a helper function in the module file, I thought it best to introduce a Utility class for the module.
Finally I provide a simple update hook for the new configuration since the module's schema is changed. (I am not 100% sure that this is needed, but I had issues on existing installations and solved them like this)
I do not provide an interdiff, since there are many changes and whole files missing, and an interdiff would not be helpful.
Any input/review is welcome. Cheers!
Comment #6
marios anagnostopoulos commented
Comment #7
marios anagnostopoulos commented
I made an oopsie with the settings in #5, so I am re-uploading the same patch (fixed).
Comment #8
marios anagnostopoulos commented
Comment #9
marios anagnostopoulos commented
Reuploading #8 with a more generic check for Highlight processors (for getting the config).
In the related issue I attached, I think the idea is that no support for something like that will be introduced in Search API, so we should probably expand on #4 instead.
Comment #10
hikkypo commented
Attempted to redo this patch for version 7 and Drupal 9.5.
Comment #11
andrechun commented
Re-rolled the patch in #10. It was giving me a fatal error because the changes for the new src/Utility.php file were missing.
Comment #12
mparker17
I've created a patch for ElasticSearch 8 that leverages Search API's API a little better and supports a handful more options. Let's see if we can get it into the 8.0.x branch first, then backport it.
Comment #14
mparker17
I've created merge request !73. Reviews are welcome!
This seems like something we could also contribute to Search API OpenSearch!
Comment #15
mparker17
(updated the issue summary)
Comment #16
mparker17
Updated the issue summary, for parity with search_api_opensearch's issue #3489307: Support OpenSearch server highlighting.
Moving to Needs Work, because I think I'd like to add another test, this time against the ElasticSearch environment in CI. I also want to change the machine name of the new highlighter from elasticsearch_connector_es_highlight to elasticsearch_highlight.
Comment #17
mparker17
Yay, the test works!
Comment #18
star-szr
I have tested the MR on a real site, reviewed the code, and I think it's ready to go.
+1 to RTBC.
Comment #19
mparker17
Awesome, thanks @star-szr!
Comment #21
mparker17
Merged! Thanks everyone!
Comment #23
mparker17
Updated issue summary to mention when it was released.