Didn't look into this more closely, but I'm pretty sure there is a bug in search_excerpt():

        // Locate a space before and after this match, leaving about 60
        // characters of context on each end.
        $before = strpos(' ' . $text, ' ', max(0, $found_position - 61));
        if ($before !== FALSE && $before <= $found_position) {
          $after = strrpos(' ' . $text . ' ', ' ', min($found_position + 61, strlen($text) + 1));
          if ($after !== FALSE && $after > $found_position) {

I came across this myself in the past (there's even a comment about it in the Search API code): strrpos() has rather confusing (borderline idiotic) behavior, it doesn't specify where the search starts, but where it should end. Try it yourself:

echo strrpos('01234567890', '0', 5); // 10

So the above code always assigns strlen($text) + 1 (since we append a space) to $after. If you want to find the closest space before the specified position ($found_position + 61), either use substr() to truncate the text beforehand or pass a negative offset (which, even more confusingly, behaves as it should).

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

drunken monkey’s picture

Status: Active » Needs review
FileSize
1023 bytes

The attached patch would implement the first alternative. But since the test suite currently doesn't catch this (provided I'm not mistaken entirely), we probably need tests, too.

Status: Needs review » Needs work

The last submitted patch, 1: 2301715-1--wrong_use_of_strrpos_offset.patch, failed testing.

larowlan’s picture

Issue tags: +Needs tests
jhodgdon’s picture

Status: Needs work » Postponed (maintainer needs more info)

Hm... This patch causes the Search Excerpt test to fail.... Do you think that the Search Excerpt test has problems, or your patch?

drunken monkey’s picture

Status: Postponed (maintainer needs more info) » Needs review
Issue tags: -Needs tests
FileSize
1.15 KB
2.68 KB

My patch – it didn't handle the end of the string correctly.
Revised patch attached, also with a new test.

The last submitted patch, 5: 2301715-5--wrong_use_of_strrpos_offset--tests_only.patch, failed testing.

jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

Ah, thanks for the test! I get it now, and plus now we will not have this bug revert in the future.

So... I tested this manually: I took your test-only patch and added a debug printout of before/after -- without your patch, we are indeed getting a very long, incorrect snippet. With the patch, obviously it's working right.

So, this is definitely a necessary patch, and it is working correctly. Many thanks!

I also took a look at Drupal 7. I don't think we need to patch there -- in Drupal 7, the code is doing:

          $end = substr($text . ' ', $p, 80);
          if (($s = strrpos($end, ' ')) !== FALSE) {

So it's looking backwards from position 80, which is close enough (the 60 characters is only an approximate goal anyway).

jhodgdon’s picture

Oh, for D7 we haven't ported the D8 updates to this function yet. I added a note about this problem to #916086: search_excerpt() doesn't highlight words that are matched via search_simplify() and linked that issue here as "related".

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed f89d31f and pushed to 8.x. Thanks!

  • alexpott committed f89d31f on 8.x
    Issue #2301715 by drunken monkey: Fixed search_excerpt() uses strrpos()...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

RaulMuroc’s picture

Drupal 7 first patch. I also tried to clean the vars and instead of $p use $found_position, instead of $q user $before and instead of $s/$end use $after for better understanding.

RaulMuroc’s picture

Version: 8.0.x-dev » 7.x-dev
Status: Closed (fixed) » Needs review

Status: Needs review » Needs work

The last submitted patch, 12: searchexcerpt_uses_strrpos_incorrectly-2301715-12.patch, failed testing.

RaulMuroc’s picture

Issue tags: +Needs backport to D7
RaulMuroc’s picture

RaulMuroc’s picture

Status: Needs work » Needs review
FileSize
3 KB
jhodgdon’s picture

Hi Raul,... Please see the comment in #7. Unless we are fixing #916086 in Drupal 7, I do not think we need to fix this problem in Drupal 7. Are you convinced there is an actual problem in the current Drupal 7 code that we need to fix?

RaulMuroc’s picture

Is possible that it doesn't highlight because of incorrect strrpos?

jhodgdon’s picture

Version: 7.x-dev » 8.0.x-dev
Status: Needs review » Closed (fixed)

I don't think so. The strrpos is only being used to make sure there are plenty of characters before and after the highlighted word, not to highlight the word itself. Are you seeing a highlighting problem apart from #916086: search_excerpt() doesn't highlight words that are matched via search_simplify(), which is already a separate issue? If so, please file a new issue so we can investigate. Meanwhile I'm going to close this one again...