Hi. In my context the Excerpt views field is displayed only in case of a full-word search.
E.g., "...I have some apples..." will be displayed if searching "some" but not if searching "som".
The item which contains this keyword is found correctly; it's just that the excerpt field itself is not displayed (it is returned empty).
Any ideas?
Thanks.
| Comment | File | Size | Author |
|---|---|---|---|
| #45 | 2358065-45.patch | 5.81 KB | tstoeckler |
| #26 | 2358065-26--highlight_partial_matches.patch | 3.74 KB | drunken monkey |
Comments
Comment #1
drunken monkey: What backend are you using: Solr, DB or something else? What server settings? Are you using the "Highlighting" processor?
Comment #2
Amir Simantov: Hi and thanks :)
- I am using the search_api_db.
- "Search on parts of a word" in the server settings is enabled.
- "Highlighting" in the index settings is enabled (and works). Default options under the corresponding vertical tab.
Comment #3
drunken monkey: Ah, OK, that's clear then: that's something that's not supported at the moment. As the "Highlight" processor and the "Database" backend are separate systems, the former cannot know how the latter achieved a match. The processor therefore just tries to match on complete words, ignoring partial matches.
Adding a "highlight partial matches" option to the "Highlight" processor would probably be what you need, which makes this a feature request. As in the other issue, I'd happily accept a patch, but won't have time to work on this myself in the foreseeable future.
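To make the limitation concrete, here is a minimal sketch of whole-word matching in plain PHP. It is not the processor's actual code: matches_whole_word() is a hypothetical helper, and the sketch only assumes that keywords are matched on complete words, as described above.

```php
<?php
// Illustrative sketch only; matches_whole_word() is a hypothetical helper,
// not part of Search API.
function matches_whole_word(string $text, string $keyword): bool {
  // \b anchors the keyword to word boundaries, so "som" cannot match "some".
  $pattern = '/\b' . preg_quote($keyword, '/') . '\b/iu';
  return preg_match($pattern, $text) === 1;
}

$text = '...I have some apples...';
var_dump(matches_whole_word($text, 'some')); // Complete word: something to highlight.
var_dump(matches_whole_word($text, 'som'));  // Partial word: nothing to highlight.
```

With a check like this, a result that the backend found through a partial match yields no highlightable keyword, and therefore no excerpt.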
Comment #4
Amir Simantov: Thanks again. I am, however, not sure you have spotted the issue correctly. This is not a highlighting issue at all. As you describe it, it may indeed be a feature request. But this is not the issue here. The problem is that the excerpt itself is not displayed (disregarding highlighting).
Comment #5
drunken monkey: If nothing could be found to highlight, then there will also not be any excerpt. This is not a bug, it's expected behavior.
You can work around it with Views, e.g., by using appropriate "No results behavior" to use the shortened body text (or something else) if there is no excerpt.
Comment #6
Amir Simantov: OK, I better understand now how it works (or, in our case, does not work...). Thanks for the explanation!
BTW, does Solr support it?
Comment #7
drunken monkey: No problem.
Solr is able to highlight natively, so you don't need the "Highlight" processor and will even get excerpts/highlighting for partial matches (if you have Solr configured to return those).
Comment #9
acbramley: Thanks @drunken monkey, I got this working with your advice and a bit of rejigging:
1) Disable highlight processor on the search index
2) Ensure the following are ticked on the server: "Retrieve result data from Solr" and "Highlight retrieved data"
3) Change my view from using the Excerpt field to using "The main body text: Text (indexed)"
Now I can search for stemmed matches and I get a highlighted value when the stemmed version matches. It also means the body is still returned when there's no highlighted value.
Thanks a lot again!
Comment #10
thommyboy: May I re-open this issue as a feature request? I think it's not very "logical" that a fulltext search returns matches and afterwards does not show an excerpt because highlighting something is impossible?!
In my case I search for, let's say, "gliding" and it returns matches for "i like gliding" and "i like speedgliding", but the latter excerpt is not shown.
@drunkenmonkey I just had a quick look at the code and it seems you use a preg_match on word boundaries. Can't this simply be changed somehow? I just replaced it with stristr, which does return an excerpt (on single search terms...), and I'm not sure if there are more things to keep in mind...
Comment #11
drunken monkey: Sure, we can keep it as a feature request, if you like. I don't think anyone will have the time to implement this properly, though. But I'm open for patches.
However, are you using Solr? If so, you should just use Solr's built-in highlighting, which should work as expected. (Otherwise, update to the latest dev version of the Solr module, and either comment in an existing issue or create a new one if it's still not working.)
The problem of the "Highlight" processor is that it cannot know which matching logic the backend will use, and is thus unable to decide whether to highlight matches that are parts of words. But maybe a simple option for the processor would already help there?
Comment #12
thommyboy: Hm, I might not realize the depths of the code behind it, but there seems to be simply a preg_match searching for the search string(s?) at word beginning, and simply using stristr did somehow work in my case. This is for sure not a complete patch, but maybe I'll find the time... I'm using DB, not Solr, though.
Comment #13
drunken monkey: The problem is that this will necessarily also cause false positives, like when you search for "emo" and it highlights "tremolo". And even highlighting "emotion" might be incorrect, depending on your backend settings (regarding prefix matches).
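The difference between these cases can be sketched in plain PHP. classify_match() is a hypothetical helper used only for illustration; whether a backend treats prefix or infix hits as real matches depends on its settings, which the processor cannot see.

```php
<?php
// Illustrative sketch only; classify_match() is a hypothetical helper.
// It reports how a keyword relates to a single word in the text.
function classify_match(string $word, string $keyword): string {
  $pos = stripos($word, $keyword);
  if ($pos === false) {
    return 'none';
  }
  if (strlen($word) === strlen($keyword)) {
    return 'whole';
  }
  // Position 0 means the word starts with the keyword; anything else is infix.
  return $pos === 0 ? 'prefix' : 'infix';
}

foreach (['emo', 'emotion', 'tremolo', 'apple'] as $word) {
  echo $word . ': ' . classify_match($word, 'emo') . "\n";
}
```

A plain substring search such as stristr() treats the "whole", "prefix" and "infix" cases alike, which is why it can highlight words the backend never matched.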
Comment #14
thommyboy: Why should highlighting "tremolo" be wrong, at least if "search on parts of a word" is activated for the server?
I think there might still be some work to do. E.g., I have the mentioned setting enabled and it seems it does find the content.
But for autocomplete (might be another issue, though) I don't get "gleitschirmfliegen" proposed when searching for "schirm" here: http://4-seasons.tv/
So the highlighting in excerpts seems to have the same "restriction" as the autocomplete suggestions (searching on word starts only).
Comment #15
drunken monkey: Yeah, I noticed that afterwards, too, sorry! I thought that feature was only a prefix search, not complete infix matching.
Anyways, the problem remains the same: we can't know how the underlying backend does its matching.
For the Autocomplete module: a) that's a completely different issue and b) I'd say it just feels much more natural there to complete what was being typed, and not also search for an additional prefix to the input.
If you disagree with that, though, it should be pretty easy to implement a variant of the suggestion algorithm as a new "suggester" plugin.
Comment #16
jelo: I would love to see this feature as well and, like thommyboy, I am not sure what the issue is. As he said, the existing highlighter processor uses a regular expression on word boundaries, which could be changed to search within words.
It seems to me that highlighting is independent from the actual searching, i.e. there is a backend process that determines the matches based on configuration. Then a result is returned. The processor runs on the result set and simply highlights the keywords in the text fields. If this is true, then it seems irrelevant how and if the search process functions. Logically, it would still make sense to highlight any occurrences of my keywords even if that particular instance may not have contributed to the ranking during the search.
Examples/Scenarios:
Search is configured to not search on parts of a word.
Text: Lorem ipsum dolor sit amet, nec tale quaestio instructior ea, mel in dolorem tractatos.
Keyword: Lorem
Search finds this text, ranks it and, let's assume, it is the best result. Would the user not expect to see LOREM and doLOREM in bold as highlighted (even though only LOREM may have been used to determine the ranking of this result)?
Search is configured to not search on parts of a word.
Text: nec tale quaestio instructior ea, mel in dolorem tractatos.
Keyword: Lorem
Now lorem is included in dolorem, but search does not rank it due to the setting of ignoring parts of words. This result is not returned, no issues for highlighting.
Search is configured to search on parts of a word.
Text: nec tale quaestio instructior ea, mel in dolorem tractatos.
Keyword: Lorem
Now this text is returned as result, but we have no excerpt which is the case we are trying to solve. dolorem should be highlighted.
Search is configured to search on parts of a word.
Text: Lorem ipsum dolor sit amet, nec tale quaestio instructior ea, mel in dolorem tractatos.
Keyword: Lorem
Search finds this text, ranks it and now the ranked result might have a higher ranking because dolorem is considered in the algorithm, i.e. both occurrences should be highlighted anyway instead of just lorem.
Case in point: shouldn't all occurrences always be highlighted, irrespective of the search settings? The setting does not state to highlight only occurrences of words that were used for the ranking and the expectation may be to actually see highlighting in pages.
Comment #17
drunken monkey: As said:
Comment #18
Jelle_S: Patch with the solution suggested in #3.
Comment #19
RKopacz (as a volunteer): @Jelle_S, you are a gem! I just finished explaining the constraint on partial word searches, that it was currently not possible! I will test this patch out on that site, which is still in dev, and report back. Thank you!
Comment #20
Amir Simantov: I am not sure how this patch can help here. If no excerpt is returned for a partial match, how could it be highlighted?
Comment #21
drunken monkey: Great job, looks pretty good!
However, you can just use !empty($this->options['highlight_partial']) for checking for the option, and then use str_ireplace() with array arguments for the replacing instead of preg_replace(); the latter is currently only needed because we have to check for word boundaries.
But with those two changes, and if a few people report this patch working for them, I'd happily commit it. Thanks again!
Oh, just one open question, I guess, for those who plan to use this feature: would you say just the keyword (as entered by the user) should be highlighted when found, or the complete word containing it?
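The two suggestions in this comment could look roughly like the following standalone sketch. It is not the committed patch: highlight_keys() and its parameters are hypothetical, and the `<strong>` markup is an assumption. Only the 'highlight_partial' option name, str_ireplace() and preg_replace() come from the discussion.

```php
<?php
// Illustrative sketch only; highlight_keys() is a hypothetical stand-in
// for the processor's highlighting step.
function highlight_keys(string $text, array $keys, array $options): string {
  $prefix = '<strong>';
  $suffix = '</strong>';

  if (!empty($options['highlight_partial'])) {
    // Partial matches: no word-boundary check is needed, so a plain
    // case-insensitive replacement with array arguments is enough.
    $replace = [];
    foreach ($keys as $key) {
      $replace[] = $prefix . $key . $suffix;
    }
    return str_ireplace($keys, $replace, $text);
  }

  // Whole-word matches: a regular expression is needed for the \b checks.
  $patterns = [];
  foreach ($keys as $key) {
    $patterns[] = '/\b(' . preg_quote($key, '/') . ')\b/iu';
  }
  return preg_replace($patterns, $prefix . '$1' . $suffix, $text);
}

echo highlight_keys('i like speedgliding', ['gliding'], ['highlight_partial' => TRUE]) . "\n";
echo highlight_keys('i like speedgliding', ['gliding'], []) . "\n";
```

One thing to keep in mind with this approach: str_ireplace() inserts the replacement string verbatim, so the highlighted text takes the casing of the keyword as entered, not the casing found in the original text.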
Comment #22
jelo: I would vote for just the keyword, e.g. with "lores" as the keyword, only the "lores" part of "dolores" would be highlighted.
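The two options under discussion can be compared with a small sketch. Both function names are hypothetical and the `<strong>` markup is an assumption; neither is taken from the patch.

```php
<?php
// Illustrative sketch only; both helpers are hypothetical.

// Option 1: highlight just the keyword as entered by the user.
function highlight_keyword_only(string $text, string $keyword): string {
  $pattern = '/(' . preg_quote($keyword, '/') . ')/iu';
  return preg_replace($pattern, '<strong>$1</strong>', $text);
}

// Option 2: highlight the complete word that contains the keyword.
function highlight_containing_word(string $text, string $keyword): string {
  $pattern = '/(\w*' . preg_quote($keyword, '/') . '\w*)/iu';
  return preg_replace($pattern, '<strong>$1</strong>', $text);
}

echo highlight_keyword_only('mel in dolores', 'lores') . "\n";
echo highlight_containing_word('mel in dolores', 'lores') . "\n";
```

The first variant shows the user exactly what was matched; the second produces cleaner-looking markup but hides which part of the word triggered the hit.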
Comment #23
RKopacz (as a volunteer): Spoke too soon in #19. I am using version 8. Will look to see what is involved to port this to 8.
Comment #24
graper (as a volunteer): This required another line change to allow the generation of the excerpt so it could then be highlighted.
Comment #25
graper (as a volunteer): Seems that my patch included the changes from #18, so I'm hiding the older patch file to avoid confusion.
Comment #26
drunken monkey: Thanks for the new revision. It had trailing spaces in one line and one extra line that was commented out, but otherwise looks good. Please also attach an interdiff in such cases, though. It makes reviewing easier.
Anyways, here is a patch that also includes my suggested improvements from #21. Please test/review!
Comment #27
seattlehimay: Patched my 7.x install, and everything is working as hoped. Thanks for the issue and patch!
Comment #29
drunken monkey: Great to hear, thanks for reporting back!
Committed.
Thanks a lot again, everyone!
Comment #30
mErilainen (at Wunder): Is this going to be ported to the D8 version? Or is there another way to achieve it?
Comment #31
drunken monkey: Oh, you're right, thanks for pointing this out!
Should of course be ported to D8.
Comment #32
tstoeckler: Here's a try at porting this to D8. I didn't actually try this out (will do that now). Also, this is the first time I've looked at this code (both D7 and D8), so while the code did look somewhat similar, it would be good for this to be reviewed thoroughly even though it's "just" a port.
Comment #34
tstoeckler: Oops, sorry.
Comment #35
tstoeckler: Ahhh, this time without PhpStorm configuration. Sorry for the noise.
Comment #38
tstoeckler: Sorry :-( Not my day today...
Comment #40
tstoeckler: Let's see what adding the config schema fixes.
Comment #42
tstoeckler: Wow, I sincerely hope this is green. I'm really on a streak of terrible patches...
Comment #43
borisson_: I think the code looks good, as far as I can see. However, I think we'll also need a test for this behavior.
At least an integration test I guess.
Comment #44
tstoeckler: Here's a test. I just added something to HighlightTest. Let me know if that's sufficient or if there should be more.
I also realized: since Search API is now in beta, I guess we need an upgrade path, right? Is it OK to simply load and save all indexes (which should then include the new default config), or does that automatically trigger re-indexing of the associated content?
Comment #45
tstoeckler: Oops, this one has the test. Interdiff was correct, but the patch in #44 was bogus.
Comment #46
borisson_: Looks good. I think we do need an upgrade path (I tend to forget about that as well). Resaving doesn't automatically trigger reindexing as far as I know, so I don't think that's enough. I think it should be enough to call $index->reindex(), though.
Comment #47
tstoeckler: Heh, I meant it the other way around. Do we actually need to trigger reindexing? I thought highlighting was just relevant during searching.
Comment #48
borisson_: Oh, heh. Yeah, I don't think we need to trigger reindexing. Just saving should be enough.
Comment #50
drunken monkey: Wow, fantastic job, thanks a lot!
Amazingly, I don't even see a tiny nit to pick here: everything is perfect.
Test is also present and passes, so: committed.
Thanks a lot, again!
And no, we don't need an update path: thanks to defaultConfiguration(), the default (which is the old behavior) will automatically be applied for existing indexes; and as you correctly say, there's also no need for re-indexing as the processor doesn't even run during indexing.
Comment #52
thommyboy: Thanks a lot, guys!
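For readers wondering why no update path was needed (see #50), the idea of falling back to defaultConfiguration() can be sketched in plain PHP. This is a simplified stand-in, not Drupal's plugin code: the class HighlightSketch, its constructor and the 'prefix'/'suffix' keys are assumptions made for illustration, and the real base classes may merge defaults differently.

```php
<?php
// Illustrative sketch only; HighlightSketch is a hypothetical class that
// mimics how stored configuration can be completed with defaults.
class HighlightSketch {

  protected array $configuration;

  public function __construct(array $configuration = []) {
    // Array union keeps stored values and fills in any missing keys
    // from the defaults.
    $this->configuration = $configuration + $this->defaultConfiguration();
  }

  public function defaultConfiguration(): array {
    return [
      'prefix' => '<strong>',
      'suffix' => '</strong>',
      // New option; FALSE preserves the old whole-word behavior.
      'highlight_partial' => FALSE,
    ];
  }

  public function getConfiguration(): array {
    return $this->configuration;
  }

}

// Configuration saved before the option existed has no 'highlight_partial' key.
$existing = new HighlightSketch(['prefix' => '<em>', 'suffix' => '</em>']);
var_dump($existing->getConfiguration()['highlight_partial']);
```

Because the missing key is filled in when the configuration is read, an index saved before the option existed keeps behaving as it did before.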