Hi. In my context the Excerpt views field is displayed only in case of a full-word search.

E.g., "...I have some apples..." will be displayed if searching "some" but not if searching "som".

The item which contains this keyword is found correctly; it's just that the excerpt field itself is not displayed (returned empty).

Any ideas?
Thanks.

Comments

drunken monkey’s picture

Status: Active » Postponed (maintainer needs more info)

What backend are you using, Solr, DB or something else? What server settings? Are you using the "Highlighting" processor?

Amir Simantov’s picture

Hi and thanks :)

- I am using the search_api_db.
- "Search on parts of a word" in the server settings is enabled.
- "Highlighting" in the index settings is enabled (and works). Default options under the corresponding vertical tab.

drunken monkey’s picture

Version: 7.x-1.13 » 7.x-1.x-dev
Component: Views integration » Plugins
Category: Support request » Feature request
Status: Postponed (maintainer needs more info) » Active

Ah, OK, that's clear then – that's something that's not supported at the moment. As the "Highlight" processor and the "Database" backend are separate systems, the former cannot know how the latter achieved a match. The processor therefore just tries to match on complete words, ignoring partial matches.
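The whole-word behavior described above can be sketched as follows. This is a minimal Python illustration, not the module's PHP code; the function name `find_whole_word` is hypothetical, and it assumes only that the processor matches keywords at word boundaries, as stated in the comment.

```python
import re

def find_whole_word(text, keyword):
    """Return the first case-insensitive whole-word match of keyword, or None."""
    # \b anchors the keyword at word boundaries, so partial words never match.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(0) if match else None

text = "I have some apples"
print(find_whole_word(text, "some"))  # some
print(find_whole_word(text, "som"))   # None
```

With "som", the backend may still return the item (partial matching is enabled there), but a word-boundary search like this finds nothing to build an excerpt around.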

Adding a "highlight partial matches" option to the "Highlight" processor would probably be what you need, which makes this a feature request. As in the other issue, I'd happily accept a patch, but won't have time to work on this myself in the foreseeable future.

Amir Simantov’s picture

Category: Feature request » Support request

Thanks again. I am, however, not sure you have spotted the issue correctly. This is not a highlighting issue at all. As you describe it, it may indeed be a feature request. But this is not the issue here. The problem is that the excerpt itself is not displayed (disregarding highlighting).

drunken monkey’s picture

If nothing could be found to highlight, then there will also not be any excerpt. This is not a bug, it's expected behavior.
You can work around it with Views, e.g., by using appropriate "No results behavior" to use the shortened body text (or something else) if there is no excerpt.
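The fallback idea behind that workaround can be sketched in a few lines. This is only an illustration of the logic in Python, not how Views implements "No results behavior"; the function name `display_text` and the `max_length` parameter are hypothetical.

```python
def display_text(excerpt, body, max_length=60):
    """Prefer the excerpt; if it is empty, fall back to a shortened body."""
    if excerpt:
        return excerpt
    if len(body) <= max_length:
        return body
    # Cut the body and mark the truncation.
    return body[:max_length].rstrip() + "..."

print(display_text("", "I have some apples"))
```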

Amir Simantov’s picture

OK, I better understand now how it works (or, in our case - does not work...). Thanks for the explanation!

BTW, does Solr support it?

drunken monkey’s picture

Status: Active » Fixed

No problem.

Solr is able to highlight natively, so you don't need the "Highlight" processor and will even get excerpts/highlighting for partial matches (if you have Solr configured to return those).

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

acbramley’s picture

Thanks @drunken monkey, I got this working with your advice and a bit of rejigging:

1) Disable highlight processor on the search index
2) Ensure the following are ticked on the server: "Retrieve result data from Solr" and "Highlight retrieved data"
3) Change my view from using the Excerpt field to using "The main body text: Text (indexed)"

Now I can search for stemmed matches and I get a highlighted value when the stemmed version matches. It also means the body is still returned when there's no highlighted value.

Thanks a lot again!

thommyboy’s picture

Category: Support request » Feature request
Status: Closed (fixed) » Active

May I re-open this issue as a feature request? I think it's not very "logical" that a full-text search returns matches and afterwards does not show an excerpt because highlighting something is impossible?!
In my case I search for, let's say, "gliding" and it returns matches for "i like gliding" and "i like speedgliding", but the latter excerpt is not shown.
@drunkenmonkey I just had a quick look at the code and it seems you use a preg_match on word boundaries. Can't this simply be changed somehow? I just replaced it with stristr, which does return an excerpt (on single search terms...), and I'm not sure if there are more things to keep in mind...

drunken monkey’s picture

Sure, we can keep it as a feature request, if you like. I don't think anyone will have the time to implement this properly, though. But I'm open for patches.
However, are you using Solr? If so, you should just use Solr's built-in highlighting, which should work as expected. (Otherwise, update to the latest dev version of the Solr module, and either comment in an existing issue or create a new one if it's still not working.)
The problem of the "Highlight" processor is that it cannot know which matching logic the backend will use, and is thus unable to decide whether to highlight matches that are parts of words. But maybe a simple option for the processor would already help there?

thommyboy’s picture

Hm, I might not realize the depths of the code behind it, but there seems to be simply a preg_match searching for the search string(s?) at word beginning, and simply using stristr did somehow work in my case. This is for sure not a complete patch, but maybe I'll find the time... Using DB, not Solr, though.

drunken monkey’s picture

The problem is that this will necessarily also cause false positives, like when you search for "emo" and it highlights "tremolo". And even highlighting "emotion" might be incorrect, depending on your backend settings (regarding prefix matches).
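To make the "emo" example concrete, here is a small Python sketch of three matching modes a backend might use. The mode names and the function `matches` are hypothetical labels for illustration, not settings of any Search API backend.

```python
import re

def matches(text, keyword, mode):
    """Case-insensitive match test; mode is 'word', 'prefix', or 'infix'."""
    k = re.escape(keyword)
    patterns = {
        "word": r"\b" + k + r"\b",  # complete words only
        "prefix": r"\b" + k,        # keyword at the start of a word
        "infix": k,                 # keyword anywhere, even inside a word
    }
    return re.search(patterns[mode], text, flags=re.IGNORECASE) is not None

for word in ("emo", "emotion", "tremolo"):
    print(word, [matches(word, "emo", m) for m in ("word", "prefix", "infix")])
```

Whether highlighting "emotion" or "tremolo" counts as a false positive depends on which of these modes the backend actually used to find the item, which is exactly what the processor cannot see.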

thommyboy’s picture

why should highlighting "tremolo" be wrong at least if "search on parts of a word" is activated for the server?
I think there might still be some work to do, e.g. I have the mentioned setting enabled and it seems it does find the content.
But for autocomplete (might be another issue though) I don't get "gleitschirmfliegen" proposed when searching for "schirm" here: http://4-seasons.tv/
So the highlighting in excerpts seems to have the same "restriction" as the autocomplete suggestions (searching on word starts only).

drunken monkey’s picture

why should highlighting "tremolo" be wrong at least if "search on parts of a word" is activated for the server?

Yeah, I noticed that afterwards, too, sorry! I thought that feature was only a prefix search, not complete infix matching.
Anyways, the problem remains the same: we can't know how the underlying backend does its matching.

For the Autocomplete module: a) that's a completely different issue and b) I'd say it just feels much more natural there to just complete what was being typed, and not also search for an additional prefix to the input.
If you disagree with that, though, it should be pretty easy to implement a variant of the suggestion algorithm as a new "suggester" plugin.

jelo’s picture

I would love to see this feature as well and like thommyboy I am not sure what the issue is. As he said, the existing highlighter processor uses a regular expression on word boundary which could be changed to search within words.

It seems to me that highlighting is independent from the actual searching, i.e. there is a backend process that determines the matches based on configuration. Then a result is returned. The processor runs on the result set and simply highlights the keywords in the text fields. If this is true, then it seems irrelevant how and if the search process functions. Logically, it would still make sense to highlight any occurrences of my keywords even if that particular instance may not have contributed to the ranking during the search.

Examples/Scenarios:
Search is configured to not search on parts of a word.
Text: Lorem ipsum dolor sit amet, nec tale quaestio instructior ea, mel in dolorem tractatos.
Keyword: Lorem
Search finds this text, ranks it and, let's assume, it is the best result. Would the user not expect to see LOREM and doLOREM in bold as highlighted (even though only LOREM may have been used to determine the ranking of this result)?

Search is configured to not search on parts of a word.
Text: nec tale quaestio instructior ea, mel in dolorem tractatos.
Keyword: Lorem
Now lorem is included in dolorem, but search does not rank it due to the setting of ignoring parts of words. This result is not returned, no issues for highlighting.

Search is configured to search on parts of a word.
Text: nec tale quaestio instructior ea, mel in dolorem tractatos.
Keyword: Lorem
Now this text is returned as result, but we have no excerpt which is the case we are trying to solve. dolorem should be highlighted.

Search is configured to search on parts of a word.
Text: Lorem ipsum dolor sit amet, nec tale quaestio instructior ea, mel in dolorem tractatos.
Keyword: Lorem
Search finds this text, ranks it and now the ranked result might have a higher ranking because dolorem is considered in the algorithm, i.e. both occurrences should be highlighted anyway instead of just lorem.

Case in point: shouldn't all occurrences always be highlighted, irrespective of the search settings? The setting does not state to highlight only occurrences of words that were used for the ranking and the expectation may be to actually see highlighting in pages.

drunken monkey’s picture

As said:

Sure, we can keep it as a feature request, if you like. I don't think anyone will have the time to implement this properly, though. But I'm open for patches.

Jelle_S’s picture

Status: Active » Needs review
FileSize
1.54 KB

Patch with solution as suggested in #3.

RKopacz’s picture

@Jelle_S, you are a gem! I just finished explaining the constraint on partial word searches, that it was currently not possible! I will test this patch out on that site, which is still in dev, and report back. Thank you!

Amir Simantov’s picture

I am not sure how this patch can help here. If no excerpt is returned for a partial match, how could it be highlighted?

drunken monkey’s picture

Status: Needs review » Needs work

Great job, looks pretty good!

However, you can just use !empty($this->options['highlight_partial']) for checking for the option, and then use str_ireplace() with array arguments for the replacing instead of preg_replace() – the latter is currently only needed because we have to check for word boundaries.
But with those two changes, and if a few people report this patch working for them, I'd happily commit it. Thanks again!
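The shape of such an option can be sketched in Python as follows. This is not the patch itself: the function name `highlight` and the `<strong>` tags are hypothetical, and only the option key `highlight_partial` is taken from the comment above. The sketch also uses a regular expression in both branches (rather than a plain case-insensitive string replace) so that the original casing of the matched text is kept.

```python
import re

def highlight(text, keys, options):
    """Wrap matches of any key in <strong> tags.

    With options['highlight_partial'] set, keys are matched anywhere;
    otherwise only complete words are matched.
    """
    if not keys:
        return text
    alternation = "|".join(re.escape(k) for k in keys)
    # dict.get() truthiness plays the role of PHP's !empty() check here.
    if options.get("highlight_partial"):
        pattern = "(" + alternation + ")"
    else:
        pattern = r"\b(" + alternation + r")\b"
    return re.sub(pattern, r"<strong>\1</strong>", text, flags=re.IGNORECASE)

text = "mel in dolorem tractatos"
print(highlight(text, ["lorem"], {"highlight_partial": True}))
print(highlight(text, ["lorem"], {}))
```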

Oh, just one open question, I guess, for those who plan to use this feature: would you say just the keyword (as entered by the user) should be highlighted when found, or the complete word containing it?

jelo’s picture

I would vote for just the keyword, e.g. with "lores" as keyword, only the "lores" part of "dolores" would be highlighted.
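The two alternatives being discussed differ only in how much of the surrounding word ends up inside the highlight. A minimal Python sketch, with hypothetical function names and `<b>` tags chosen purely for illustration:

```python
import re

def highlight_keyword_only(text, keyword):
    """Highlight just the keyword, even when it sits inside a longer word."""
    pattern = "(" + re.escape(keyword) + ")"
    return re.sub(pattern, r"<b>\1</b>", text, flags=re.IGNORECASE)

def highlight_complete_word(text, keyword):
    """Highlight the complete word that contains the keyword."""
    # \w* on both sides extends the match to the surrounding word characters.
    pattern = r"(\w*" + re.escape(keyword) + r"\w*)"
    return re.sub(pattern, r"<b>\1</b>", text, flags=re.IGNORECASE)

print(highlight_keyword_only("sit dolores amet", "lores"))
print(highlight_complete_word("sit dolores amet", "lores"))
```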

RKopacz’s picture

Spoke too soon in #19. I am using version 8. Will look to see what is involved to port this to 8.

graper’s picture

This required another line change to allow the generation of the excerpt so it could then be highlighted.

graper’s picture

Seems that my patch included the changes from #18, so I'm hiding the older patch file to provide less confusion.

drunken monkey’s picture

Status: Needs work » Needs review
FileSize
3.88 KB
3.74 KB

Thanks for the new revision. Had trailing spaces in one line and one extra line that was commented out, but otherwise looks good. Please also attach an interdiff in such cases, though. It makes reviewing easier.

Anyways, here is a patch with also my suggested improvements from #21. Please test/review!

seattlehimay’s picture

Patched my 7.x install, and everything is working as hoped. Thanks for the issue and patch!

  • drunken monkey committed 5447fe0 on 7.x-1.x authored by Jelle_S
    Issue #2358065 by Jelle_S, graper, drunken monkey: Added the option for...
drunken monkey’s picture

Status: Needs review » Fixed

Great to hear, thanks for reporting back!
Committed.
Thanks a lot again, everyone!

mErilainen’s picture

Is this going to be ported to D8 version? Or is there another way to achieve it?

drunken monkey’s picture

Version: 7.x-1.x-dev » 8.x-1.x-dev
Status: Fixed » Patch (to be ported)

Oh, you're right, thanks for pointing this out!
Should of course be ported to D8.

tstoeckler’s picture

Status: Patch (to be ported) » Needs review
FileSize
3.12 KB

Here's a try at porting this to D8. I didn't actually try this out (will do that now). Also this is the first time I've looked at this code (both D7 and D8), so while the code did look somewhat similar it would be good for this to be reviewed thoroughly even though it's "just" a port.

Status: Needs review » Needs work

The last submitted patch, 32: 2358065-32-d8.patch, failed testing.

tstoeckler’s picture

Status: Needs work » Needs review
FileSize
3.71 KB
20.79 KB

Oops, sorry.

tstoeckler’s picture

FileSize
3.12 KB

Ahhh, this time without PhpStorm configuration. Sorry for the noise.

The last submitted patch, 34: 2358065-34-d8.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 35: 2358065-35-d8.patch, failed testing.

tstoeckler’s picture

Status: Needs work » Needs review
FileSize
742 bytes
3.12 KB

Sorry :-( Not my day today...

Status: Needs review » Needs work

The last submitted patch, 38: 2358065-38-d8.patch, failed testing.

tstoeckler’s picture

Status: Needs work » Needs review
FileSize
686 bytes
3.79 KB

Let's see what adding the config schema fixes.

Status: Needs review » Needs work

The last submitted patch, 40: 2358065-40-d8.patch, failed testing.

tstoeckler’s picture

Status: Needs work » Needs review
FileSize
761 bytes
3.8 KB

Wow, I sincerely hope this is green. I'm really on a streak of terrible patches...

borisson_’s picture

I think the code looks good, as far as I can see. However, I think we'll also need a test for this behavior.
At least an integration test I guess.

tstoeckler’s picture

FileSize
2.02 KB
3.8 KB

Here's a test. I just added something to HighlightTest. Let me know if that's sufficient or if there should be more.

I also realized: Since Search API is now in beta, I guess we need an upgrade path, right? Is it OK to simply load and save all indexes, (which should then include the new default config) or does that automatically trigger re-indexing of the associated content?

tstoeckler’s picture

FileSize
5.81 KB

Oops, this one has the test. Interdiff was correct, but the patch in #44 was bogus.

borisson_’s picture

Looks good. I think we do need an upgrade path (I tend to forget about that as well). Resaving doesn't automatically trigger reindexing as far as I know, so I don't think that's enough. I think it should be enough to call $index->reindex() though.

tstoeckler’s picture

Heh, I meant it the other way around. Do we actually need to trigger reindexing? I thought highlighting was just relevant during searching.

borisson_’s picture

Oh, heh. Yeah I don't think we need to trigger reindexing. Just saving should be enough.

drunken monkey’s picture

Status: Needs review » Fixed

Wow, fantastic job, thanks a lot!
Amazingly, I don't even see a tiny nit to pick here – everything perfect.
Test is also present and passes, so: committed.
Thanks a lot, again!

And no, we don't need an update path: thanks to defaultConfiguration() the default (which is the old behavior) will automatically be applied for existing indexes; and as you correctly say, there's also no need for re-indexing as the processor doesn't even run during indexing.
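The reasoning about defaults can be illustrated with a small Python sketch of merging stored settings over defaults. This only models the idea, it is not Drupal's plugin code; the function names and the `excerpt` key are hypothetical, while `highlight_partial` follows the option name used earlier in the thread.

```python
def default_configuration():
    """Defaults for the processor; the new option defaults to the old behavior."""
    return {"excerpt": True, "highlight_partial": False}

def effective_configuration(stored):
    """Overlay stored settings on the defaults so missing keys get default values."""
    config = default_configuration()
    config.update(stored)
    return config

# An index saved before the option existed has no 'highlight_partial' key.
old_index_settings = {"excerpt": True}
print(effective_configuration(old_index_settings))
```

Because the missing key falls back to `False`, an index saved before the option existed keeps behaving as before without any stored configuration being rewritten.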

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

thommyboy’s picture

thanks a lot guys!