Didn't look into this more closely, but I'm pretty sure there is a bug in search_excerpt()
:
// Locate a space before and after this match, leaving about 60
// characters of context on each end.
$before = strpos(' ' . $text, ' ', max(0, $found_position - 61));
if ($before !== FALSE && $before <= $found_position) {
$after = strrpos(' ' . $text . ' ', ' ', min($found_position + 61, strlen($text) + 1));
if ($after !== FALSE && $after > $found_position) {
I came across this myself in the past (there's even a comment about it in the Search API code): strrpos()
has rather confusing (borderline idiotic) behavior, it doesn't specify where the search starts, but where it should end. Try it yourself:
echo strrpos('01234567890', '0', 5); // 10
So the above code always assigns strlen($text) + 1
(since we append a space) to $after
. If you want to find the closest space before the specified position ($found_position + 61
), either use substr()
to truncate the text beforehand or pass a negative offset (which, even more confusingly, behaves as it should).
Comments
Comment #1
drunken monkeyThe attached patch would implement the first alternative. But since the test suite currently doesn't catch this (provided I'm not mistaken entirely), we probably need tests, too.
Comment #3
larowlanComment #4
jhodgdonHm... This patch causes the Search Excerpt test to fail.... Do you think that the Search Excerpt test has problems, or your patch?
Comment #5
drunken monkeyMy patch – it didn't handle the end of the string correctly.
Revised patch attached, also with a new test.
Comment #7
jhodgdonAh, thanks for the test! I get it now, and plus now we will not have this bug revert in the future.
So... I tested this manually: I took your test-only patch and added a debug printout of before/after -- without your patch, we are indeed getting a very long, incorrect snippet. With the patch, obviously it's working right.
So, this is definitely a necessary patch, and it is working correctly. Many thanks!
I also took a look at Drupal 7. I don't think we need to patch there -- in Drupal 7, the code is doing:
So it's looking backwards from position 80, which is close enough (the 60 characters is only an approximate goal anyway).
Comment #8
jhodgdonOh, for D7 we haven't ported the D8 updates to this function yet. I added a note about this problem to #916086: search_excerpt() doesn't highlight words that are matched via search_simplify() and linked that issue here as "related".
Comment #9
alexpottCommitted f89d31f and pushed to 8.x. Thanks!
Comment #12
RaulMuroc CreditAttribution: RaulMuroc commentedDrupal 7 first patch. I also tried to clean the vars and instead of $p use $found_position, instead of $q user $before and instead of $s/$end use $after for better understanding.
Comment #13
RaulMuroc CreditAttribution: RaulMuroc commentedComment #15
RaulMuroc CreditAttribution: RaulMuroc commentedComment #16
RaulMuroc CreditAttribution: RaulMuroc commentedComment #17
RaulMuroc CreditAttribution: RaulMuroc commentedComment #18
jhodgdonHi Raul,... Please see the comment in #7. Unless we are fixing #916086 in Drupal 7, I do not think we need to fix this problem in Drupal 7. Are you convinced there is an actual problem in the current Drupal 7 code that we need to fix?
Comment #19
RaulMuroc CreditAttribution: RaulMuroc commentedIs possible that it doesn't highlight because of incorrect strrpos?
Comment #20
jhodgdonI don't think so. The strrpos is only being used to make sure there are plenty of characters before and after the highlighted word, not to highlight the word itself. Are you seeing a highlighting problem apart from #916086: search_excerpt() doesn't highlight words that are matched via search_simplify(), which is already a separate issue? If so, please file a new issue so we can investigate. Meanwhile I'm going to close this one again...