Background: The search_excerpt() function is used by node_search() to extract an excerpt/snippet showing where in your node the keywords you search for are found. It can also be called by other modules extending Search via hook_search(). The core search doesn't support keyword stemming (e.g. if you search for "work" in core search, you will not find nodes containing "works", "worked", and "working"). But you can add a module like http://drupal.org/project/porterstemmer to add stemming to Search, so that those searches will return results.

The issue: If you do use a stemming module, you'll find that search excerpts don't show you where your keyword is found, because the excerpt function is inflexible. It only searches for exact keyword matches, and there is no way for a module to modify this behavior.
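
To make the exact-match limitation concrete, here is a minimal sketch. It assumes a simplified `\b` word boundary in place of the real $boundary pattern that search_excerpt() builds, and exact_keyword_match() is a hypothetical helper, not a core function.

```php
<?php
// Assumption: '\b' stands in for the real $boundary pattern, which core builds
// from PREG_CLASS_SEARCH_EXCLUDE and PREG_CLASS_CJK.
$boundary = '\b';

// Hypothetical helper: TRUE if $key appears in $text as a whole word.
function exact_keyword_match($boundary, $key, $text) {
  return preg_match('/' . $boundary . preg_quote($key, '/') . $boundary . '/iu', $text) === 1;
}

var_dump(exact_keyword_match($boundary, 'work', 'Hard work pays off'));   // bool(true)
var_dump(exact_keyword_match($boundary, 'work', 'She was working late')); // bool(false)
```

With a stemmer enabled, the second node would be returned by the search, but the excerpt code would not locate "working" in it.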

How to resolve: I think what needs to be done is to add a hook to search_excerpt() that modules can use to override the search_excerpt function. This would allow issues like #437084: Excerpt fails to find stemmed keyword to be resolved in stemming modules.

This is an issue in 6.x. I've filed it against 7.x, however, because the search_excerpt function appears to be identical, and it's probably more likely to get noticed in the 7.x issue queue... hope that's OK.


Comments

malc_b’s picture

I've worked out a fix for this. In D6 search module on line 1212 add in .'.*?' so the line becomes:

if (preg_match('/'. $boundary . $key . '.*?' . $boundary .'/iu', $text, $match, PREG_OFFSET_CAPTURE, $included[$key]))

And the same for line 1270 so that becomes

$text = preg_replace('/'. $boundary .'('. implode('|', $keys) .')'. '.*?' . $boundary .'/iu', '<strong>\0</strong>', $text);

The first says match the key plus any more characters up to a boundary. So this finds the right extract with the stemmed words. The second is the fix so the whole word is made bold.
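
A small standalone sketch of what the two modified patterns do, assuming `\b` as a simplified stand-in for $boundary and a made-up sample sentence:

```php
<?php
// Assumption: '\b' stands in for the real $boundary pattern.
$boundary = '\b';
$text = 'They worked and kept working.';

// Locating the excerpt: the key plus any further characters up to the next boundary.
preg_match('/' . $boundary . 'work' . '.*?' . $boundary . '/iu', $text, $match, PREG_OFFSET_CAPTURE);
// $match[0][0] is the word found, $match[0][1] its byte offset in $text.
var_dump($match[0][0]); // string(6) "worked"
var_dump($match[0][1]); // int(5)

// Highlighting: wrap each whole matched word, not just the key, in <strong>.
$keys = array('work');
$highlighted = preg_replace('/' . $boundary . '(' . implode('|', $keys) . ')' . '.*?' . $boundary . '/iu', '<strong>\0</strong>', $text);
echo $highlighted, "\n";
// They <strong>worked</strong> and kept <strong>working</strong>.
```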

jhodgdon’s picture

I am not sure that all stems are necessarily sub-strings of the full words they match. I don't know enough about how the stemming modules work to know this for sure, but in English, you should have "like" and "liking" being equivalent (the base word is "like", which is not a substring of "liking" -- is the "stemmed" keyword "lik" or "like"? I don't know... does this work in this case?). And what about things like "person" being the singular of "people"? I don't know whether stemming modules work for this or not, but if they do, your solution would not work in this case, I think.

Also, I am not sure that all substrings should be matched for all stems. For instance, if you search for "like", the words "likely" and "likewise" should not be matched, even though they both start with "like".
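
A quick sketch of that over-matching concern, again assuming `\b` in place of the real $boundary and an invented sample sentence:

```php
<?php
// Assumption: '\b' stands in for the real $boundary pattern.
$pattern = '/\blike.*?\b/iu';

// Collect every word the "key plus anything up to a boundary" pattern accepts.
preg_match_all($pattern, 'I like it; it is likely that they think likewise.', $matches);
print_r($matches[0]); // like, likely, likewise
```

All three words are accepted, although only the first is the word actually searched for.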

So I don't know if this is a good general solution. My guess is that for a general solution, you would need a hook that the stemming module could implement to modify the excerpt in some way that is appropriate for that particular stemming algorithm.

malc_b’s picture

Good point. You are right, it isn't that simple. Taking your example of like, liking, and likely: the Porter stemmer says these all have the stem "like". So all nodes containing any of those three words get a search key of "like" when cron runs. Type in any of those three words and the search key changes to "like" and finds the node, but then the excerpt code fails to find the extract. Stemming rules that reduce, say, 10 possible words to one are fair enough, but going the other way is likely to turn one word into 50, some of which will be nonsense: "nationality" stems to "nation", but "station" doesn't unstem to "stationality".

Perhaps the solution is to make multiple passes at the extract, shrinking the keys by one trailing letter each time until there is a match. Or I guess the stemmer could have a function that reduces the key down to the common root, so like, liking, likely would reduce to just lik. Or rather lik.*?, which would be the right search key. That's probably better.

malc_b’s picture

OK, new mod. Add the '.*?' . to lines 1212 and 1270 as in post #1. And in addition before line 1186

$workkeys = $keys;

insert this code:

  foreach($keys as $k => $key){
    search_invoke_preprocess($key);
    $keys[$k] = substr($key,0,-1);
  }

What that does is take the search keys as typed, stem them, and remove the last character. Of course it would be better if there was a proper hook and a stem root function, but this probably works most of the time. The stem will be the smallest word, but of course like -> liking loses the e, y becomes ies, etc., so just removing the last character of the stem probably gives the correct result most of the time, perhaps always, at least in English.
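
The heuristic can be tried outside Drupal with a sketch like the one below. Note the assumptions: toy_stem() is a hypothetical lookup table standing in for search_invoke_preprocess() with a stemmer enabled, trimmed_keys() is an invented name, and `\b` replaces the real $boundary.

```php
<?php
// Hypothetical stand-in for search_invoke_preprocess() with a stemmer enabled.
// The table only covers the words discussed in this thread.
function toy_stem($word) {
  $stems = array('like' => 'like', 'liking' => 'like', 'likely' => 'like');
  $word = strtolower($word);
  return isset($stems[$word]) ? $stems[$word] : $word;
}

// Stem each key, then drop the last character of the stem.
function trimmed_keys(array $keys) {
  foreach ($keys as $k => $key) {
    $keys[$k] = substr(toy_stem($key), 0, -1);
  }
  return $keys;
}

$keys = trimmed_keys(array('liking'));
print_r($keys); // lik

// The trimmed key plus '.*?' now finds the full word in the text.
preg_match('/\b' . $keys[0] . '.*?\b/iu', 'She is liking the new theme', $match);
var_dump($match[0]); // string(6) "liking"
```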

jhodgdon’s picture

Also: Are there languages (German?) where some stemming might use prefixes as well as suffixes? And what about irregular forms? I don't know if stemming modules work for irregular forms, but for instance, you would want to find "person" if you search for "people" in English, man/men, woman/women, etc. In all of these cases, just accounting for preprocessed-keyword-plus-suffix matching is not going to be useful.

malc_b’s picture

You could be right, but at least the above returns more sensible extracts than the current method, which returns next to none (only if you type in a word that is also a stem, if that is possible).

jhodgdon’s picture

I still think that having a heuristic like "look for a match of all but the last character of the stemmed keyword, and don't require the match to end on a word boundary" is not really going to solve the issue for all languages. And creating a patch that doesn't really fix the issue will probably not be accepted into the core of Drupal.

However, I do think it is possible to actually solve the problem. I think what is needed is to allow modules to do their own matching on the keys, wherein they could pre-process both the text and the key with their stemming algorithm. So you would replace this line in search_excerpt()

     if (preg_match('/' . $boundary . $key . $boundary . '/iu', $text, $match, PREG_OFFSET_CAPTURE, $included[$key])) {

with something like this (obviously it would need an additional } to get the loops working correctly):

   foreach (module_implements('search_excerpt_match') as $module) {
       if( module_invoke( $module, 'search_excerpt_match', $boundary, $key, $text, $match )) {

The idea would be to allow stemming modules to see if they can find a match between the given key and the text. The first module to return TRUE would be accepted (you'd want to break out of this foreach loop), or you could do something more complex like accepting the one with the first position in the text. The return value from the hook would be TRUE/FALSE, and the $match array would be passed back by reference and give the position of the match found in the original text string, just as preg_match is currently doing with PREG_OFFSET_CAPTURE. (Though it might make sense to make the hook return something other than what preg_match would return -- this is just a concept so far.) Anyway, the Search module itself could have its own implementation of the new hook_search_excerpt_match, doing what it used to do (looking for exact keys):

function search_search_excerpt_match($boundary, $key, $text, &$match, $offset = 0) {
  // $offset stands in for $included[$key], which is local to search_excerpt().
  return preg_match('/' . $boundary . $key . $boundary . '/iu', $text, $match, PREG_OFFSET_CAPTURE, $offset);
}

This is not quite complete, because the keyword highlighting at the end of search_excerpt() would also need to be modified, so that the actual word matched would be highlighted, not the "key" (which might not be present in its exact form). Probably you'd want to save the actual word that was found in the $keys array, so that at the end:

  $text = preg_replace('/'. $boundary .'('. implode('|', $keys) .')'. $boundary .'/iu', '<strong>\0</strong>', $text);

would still highlight the found words, rather than the original keywords.
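
As a rough standalone illustration of the first-match-wins dispatch described above: in this sketch a plain list of function names stands in for module_implements('search_excerpt_match'), every function name is hypothetical, and `\b` is passed as a simplified boundary.

```php
<?php
// Exact matching, as search_excerpt() does today.
function exact_excerpt_match($boundary, $key, $text, &$match) {
  return preg_match('/' . $boundary . preg_quote($key, '/') . $boundary . '/iu', $text, $match, PREG_OFFSET_CAPTURE) === 1;
}

// What a stemming module might supply; here only a crude prefix match.
function prefix_excerpt_match($boundary, $key, $text, &$match) {
  return preg_match('/' . $boundary . preg_quote($key, '/') . '.*?' . $boundary . '/iu', $text, $match, PREG_OFFSET_CAPTURE) === 1;
}

// Ask each implementation in turn; the first one that reports a match wins.
function find_excerpt_match(array $implementations, $boundary, $key, $text, &$match) {
  foreach ($implementations as $function) {
    // Calling through a variable function keeps $match passed by reference.
    if ($function($boundary, $key, $text, $match)) {
      return TRUE;
    }
  }
  return FALSE;
}

$found = find_excerpt_match(array('exact_excerpt_match', 'prefix_excerpt_match'), '\b', 'work', 'Still working on it', $match);
var_dump($found);       // bool(true)
var_dump($match[0][0]); // string(7) "working"
var_dump($match[0][1]); // int(6)
```

The word actually found ($match[0][0]) is what the highlighting step would need to use instead of the original key.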

Anyway, I might see if I can get this working, with patches for both Porter Stemmer and core Search. I think it's at least the start of a viable idea that would actually solve the problem.

jhodgdon’s picture

Assigned: Unassigned » jhodgdon
malc_b’s picture

Feel free to look at it. I agree my solution is quick and dirty and not suitable as a patch. It just improves a bad situation, for English, to a state where the error is not so noticeable.

BTW it would be useful if your patch had a D6 as well as D7 version as I'm more interested in D6.

jhodgdon’s picture

I will definitely be developing/testing in D6! The Porter Stemmer module is not out for D7 yet, though it should not be too difficult to port, since it only implements 2 hooks.

jhodgdon’s picture

Version: 7.x-dev » 6.x-dev
Status: Active » Needs review
FileSize
2.08 KB
1.51 KB
2.82 KB

I think I have it working in Drupal 6 (have temporarily set the version of this issue to D6).

Attached:
- Patch for the core Search module in D6
- Patch for the Porter Stemmer module in D6 (patch created against the 6.x development branch of Porter Stemmer -- the patch just adds a new function to the module, so you should be able to add the function to pretty much any 6.x version of Porter Stemmer).
- Patch for the D6 docs (these are in the Contrib repository)

If people can review and test these patches for Drupal 6, and if everyone likes them, I will port the search and doc patches to Drupal 7 and submit for inclusion there. I am not sure whether they will want to patch Drupal 6 or not for this issue... not sure what the policy is there.

jhodgdon’s picture

FileSize
17.47 KB

In case someone wants to test this and isn't up to speed on applying patches, the attached zip file contains a replacement search.module (put into modules/search in your Drupal installation) and a replacement porterstemmer.module (put into sites/all/modules/porterstemmer, or wherever you have your contrib modules).

ONLY FOR DRUPAL 6.x!

malc_b’s picture

OK, I'm giving this a try. Seems to work well so far.

jhodgdon’s picture

One thing I just thought of: Probably the return value of the hook should be an associative array, such as
'pos' => $p,
'keyword' => $word
rather than just a simple array ($p, $word). That would make it a bit more self-documenting.
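
For illustration only, a hook implementation returning that shape might look like the sketch below. The function name and the prefix-style pattern are assumptions and are not taken from the attached patches.

```php
<?php
// Hypothetical implementation: returns FALSE when nothing matches, otherwise
// the position of the match and the actual word found in the text.
function example_excerpt_match($key, $text, $offset = 0) {
  // Assumption: '\b' stands in for the real $boundary pattern.
  if (preg_match('/\b' . preg_quote($key, '/') . '.*?\b/iu', $text, $m, PREG_OFFSET_CAPTURE, $offset)) {
    return array('pos' => $m[0][1], 'keyword' => $m[0][0]);
  }
  return FALSE;
}

$result = example_excerpt_match('work', 'It works');
print_r($result); // pos => 3, keyword => works
```

The caller can then read $result['pos'] and $result['keyword'] without having to remember which array index holds which value.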

I plan to update the patches to do that, but not for a few days (holiday time, and I'm just about to leave on a backpacking trip).

jhodgdon’s picture

Here are updated D6 patches and zip file, using an associative array. Testing and comments welcome. I'll create a Drupal 7 patch once we're happy with Drupal 6's behavior.

jhodgdon’s picture

Another thought on the Porter Stemmer component of this patch group: It should verify that the keyword found actually stems to the searched key, after doing the substring match.
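
The verification idea can be sketched roughly as follows. This is not the code from the patch: toy_stem() is a hypothetical lookup table in place of the real stemmer, and verified_stem_match() is an invented name.

```php
<?php
// Hypothetical stand-in for the stemmer; covers only the words used below.
function toy_stem($word) {
  $stems = array('like' => 'like', 'liking' => 'like', 'likewise' => 'likewis');
  $word = strtolower($word);
  return isset($stems[$word]) ? $stems[$word] : $word;
}

// Find a candidate by substring match, then accept it only if it stems to the
// same value as the searched key; otherwise keep looking further along.
function verified_stem_match($key, $text) {
  $stem = toy_stem($key);
  $prefix = (string) substr($stem, 0, -1);
  if ($prefix === '') {
    return FALSE;
  }
  $offset = 0;
  while (preg_match('/\b' . preg_quote($prefix, '/') . '\w*/iu', $text, $m, PREG_OFFSET_CAPTURE, $offset)) {
    if (toy_stem($m[0][0]) === $stem) {
      return array('pos' => $m[0][1], 'keyword' => $m[0][0]);
    }
    // Candidate had a different stem; continue after it.
    $offset = $m[0][1] + strlen($m[0][0]);
  }
  return FALSE;
}

print_r(verified_stem_match('like', 'Likewise, we are liking it'));
// pos => 17, keyword => liking
```

"Likewise" passes the substring test but is rejected by the stem check, so the excerpt is anchored on "liking" instead.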

Here's a new patch for the Porter Stemmer module (compatible with the other patches). And a new zip file.

jhodgdon’s picture

Version: 6.x-dev » 7.x-dev
FileSize
5.01 KB

Here's a patch for Drupal 7.

Status: Needs review » Needs work

The last submitted patch failed testing.

jhodgdon’s picture

Status: Needs work » Needs review

This is very odd. The test that failed was "module dependency", and looking at that test, I do not see how this patch could have affected this test at all. So I am assuming there was something else that caused that test to fail. Requesting re-test.

jhodgdon’s picture

Assigned: jhodgdon » Unassigned

It would be great if we could get this into Drupal 7, and it would need to be before the code freeze... guess the next step would be if someone could review this patch?

Scott Reynolds’s picture

Related: #103548: Partial Search in Drupal Core. The reason the test fails now is that the search_admin_validate() function is gone. It was replaced by a proper submit() handler.

I think my solution in that issue was considerably smaller. I would consider testing that patch to see whether it resolves this issue with fewer code changes. It was just a lil bit of regex.

jhodgdon’s picture

N-grams are NOT the same as stemming algorithms at all. Stemming algorithms are language-specific ways to linguistically reduce a word to its basic root, which is done to both the search terms and the text, and may not result in an actual sub-string of the original words.

N-grams are blind substrings.

Both have their strengths, but they are not equivalent. If you want to use stemming, then n-grams will not do the same thing.
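
A tiny sketch of the difference, using a hypothetical ngrams() helper and assuming ASCII input:

```php
<?php
// All n-grams of length $n; each one is a substring of $word by construction.
// Uses byte offsets, so this sketch assumes ASCII input.
function ngrams($word, $n) {
  $grams = array();
  for ($i = 0; $i + $n <= strlen($word); $i++) {
    $grams[] = substr($word, $i, $n);
  }
  return $grams;
}

print_r(ngrams('liking', 3));       // lik, iki, kin, ing
// A stem such as "like" need not occur inside the word it was derived from.
var_dump(strpos('liking', 'like')); // bool(false)
```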

Scott Reynolds’s picture

sigh u missed the point...

In the latest patch, I had to accomplish what you are trying here, meaning highlight a full word when a part of the word was in the $keys

like so

@@ -1252,7 +1280,7 @@ function search_excerpt($keys, $text) {
       }
       // Locate a keyword (position $p), then locate a space in front (position
       // $q) and behind it (position $s)
-      if (preg_match('/' . $boundary . $key . $boundary . '/iu', $text, $match, PREG_OFFSET_CAPTURE, $included[$key])) {
+      if (preg_match('/' . $boundary .'[^' . PREG_CLASS_SEARCH_EXCLUDE . PREG_CLASS_CJK . ']*' . $key . '[^' . PREG_CLASS_SEARCH_EXCLUDE . PREG_CLASS_CJK . ']*' . $boundary . '/iu', $text, $match, PREG_OFFSET_CAPTURE, $included[$key])) {
         $p = $match[0][1];
         if (($q = strpos($text, ' ', max(0, $p - 60))) !== FALSE) {
           $end = substr($text, $p, 80);
@@ -1310,7 +1338,7 @@ function search_excerpt($keys, $text) {
   $text = (isset($newranges[0]) ? '' : '... ') . implode(' ... ', $out) . ' ...';
 
   // Highlight keywords. Must be done at once to prevent conflicts ('strong' and '<strong>').
-  $text = preg_replace('/' . $boundary . '(' . implode('|', $keys) . ')' . $boundary . '/iu', '<strong>\0</strong>', $text);
+  $text = preg_replace('/' . $boundary . '[^' . PREG_CLASS_SEARCH_EXCLUDE . PREG_CLASS_CJK . ']*' . '(' . implode('|', $keys) . ')' . '[^' . PREG_CLASS_SEARCH_EXCLUDE . PREG_CLASS_CJK . ']*' . $boundary . '/iu', '<strong>\0</strong>', $text);
   return $text;
 }

> N-grams are NOT the same as stemming algorithms at all. Stemming algorithms are language-specific ways to linguistically reduce a word to its basic root, which is done to both the search terms and the text, and may not result in an actual sub-string of the original words.

Sorry for not being clearer, but u should assume people are this dumb :-D. I was trying to point out that I had to accomplish exactly what you have here, and I did it with that regex ^^^, which is considerably smaller than what you have here, and is UTF-8 safe.

jhodgdon’s picture

The point I was making is that a "stem" as returned from a stemming algorithm is not necessarily a substring of the full word. I am not that dumb either, just unclear in my writing. :)

jhodgdon’s picture

Issue tags: +API change

Adding tag

cburschka’s picture

Version: 7.x-dev » 8.x-dev

If this really does involve an API change, we may need to push it back to D8 now, sadly...

gpk’s picture

This is rather cool.

Have sort of got it working on Drupal 6.x after a bit of hacking. I'm assuming the latest code in porterstemmer_sbp_excerpt_match() from porterstemmer 6.x-2.5 is what should be used rather than what's in porterstemmer_search_excerpt_match() from #16 above?

I hit a problem with the line
if ($foundstem == $key) {
since this test will fail if there are differences in capitalisation.

Also am I right in thinking that if an exact match for a $key is found -- in the new search_excerpt() -- then any potential matches of the stemmed $key prior to this will be missed?

[Currently I'm experimenting with a custom module which implements mymodule_preprocess_search_result() to override the default snippet - this seems to provide a practical way of getting this working in 6.x]

Thanks!

jhodgdon’s picture

Yeah, the latest code in Porter Stemmer and Search by Page can be used in combination. Search by Page invokes the hook, and Porter Stemmer implements it.

Porter Stemmer lower cases everything before it does any stemming, so maybe that takes care of the upper/lower case issue? Not sure... And I'm not sure about your other questions... will have to do some thinking (another day).

gpk’s picture

>Porter Stemmer lower cases everything before it does any stemming
It looks as though the lowercasing happens in porterstemmer_search_preprocess http://drupalcode.org/viewvc/drupal/contributions/modules/porterstemmer/..., rather than in porterstemmer_stem. So $foundstem (http://drupalcode.org/viewvc/drupal/contributions/modules/porterstemmer/...) can have caps in it, as can $key. Using

if (drupal_strtolower($foundstem) == drupal_strtolower($key)) {

at line 105 fixed this problem for me. I've opened an issue against porterstemmer for this.

#850950: capitalization can cause porterstemmer_sbp_excerpt_match() to miss matches

re. the other question, yes I need to do some proper tests for this. Probably another day!!!!

jhodgdon’s picture

Good catch! Thanks, I'll take care of that over in Porter Stemmer.

andypost’s picture

Subscribe

gpk’s picture

Status: Needs review » Needs work

Thanks jhodgdon, #850950-7: capitalization can cause porterstemmer_sbp_excerpt_match() to miss matches takes care of the capitalization problem. While I was testing that I also investigated the problem I alluded to at #27 and #29 above.

A node has content "qqQqqeateat qqQqqeating qqQqqeat hello world"

When I search for "qqqqqeat" I get the following snippet:

qqQqqeateat qqQqqeating qqQqqeat hello world

(I'm using the latest code from Search by Page instead of #15/#16.)

What seems to be happening is that the exact match is taking priority, and then, provided our excerpt length is under 256 characters, and having tried any other keys, the code only looks for any *subsequent* matches against that particular key. If the node has the words in a different order "qqQqqeat qqQqqeateat qqQqqeating hello world" then I do get qqQqqeating highlighted as well i.e. the snippet is qqQqqeat qqQqqeateat qqQqqeating hello world, because having found the exact match (the "bare keyword") there is a *subsequent* valid stem match.
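
One possible way around this, sketched here only as an illustration and not as what Search by Page does, is to treat exact and stemmed matches alike and take whichever comes first in the text. The toy_stem() and earliest_match() functions below are hypothetical.

```php
<?php
// Toy stemmer (assumption): lower-cases and strips a trailing "ing".
function toy_stem($word) {
  return preg_replace('/ing$/', '', strtolower($word));
}

// Return the earliest word at or after $offset whose stem equals the stem of
// $key, whether it is an exact match or only a stemmed one.
function earliest_match($key, $text, $offset = 0) {
  $stem = toy_stem($key);
  while (preg_match('/\w+/u', $text, $m, PREG_OFFSET_CAPTURE, $offset)) {
    if (toy_stem($m[0][0]) === $stem) {
      return array('pos' => $m[0][1], 'keyword' => $m[0][0]);
    }
    $offset = $m[0][1] + strlen($m[0][0]);
  }
  return FALSE;
}

$text = 'qqQqqeateat qqQqqeating qqQqqeat hello world';
$first = earliest_match('qqqqqeat', $text);
print_r($first); // pos => 12, keyword => qqQqqeating
```

Searching onward from the end of that match would then reach the exact "qqQqqeat" as well, so neither kind of match hides the other.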

jhodgdon’s picture

Hmmm. Thinking about how the SBP function works, that would be the case, because I think it is only invoking the preprocessor module to find matches if it doesn't find exact matches, and as you noticed, that also applies to "well, I have one match, let's see if there's another one", which always looks between the position of the match it found and the end of the string.

I have filed this as an issue in Search by Page. Thanks for your investigations! I'll see what I can do in the next few days about fixing this up. I'm so glad someone is testing all of this besides me. :)
#882328: When finding excerpts, exact matches have priority over preprocessing matches

jhodgdon’s picture

Title: search_excerpt() doesn't work well with stemming » search_excerpt() doesn't work well with stemming, diacritical accents, etc.
Version: 8.x-dev » 7.x-dev

We need to reopen this for D7. The issue is broader than just stemming, it also happens with diacritics/accents. See
#916086: search_excerpt() doesn't highlight words that are matched via search_simplify()
#731298: Searches for words with diacritics/accents: word not highlighted in results
which I've marked as duplicates of this issue. They're relevant even without stemming problems. This needs to be fixed.

mcarbone’s picture

Title: search_excerpt() doesn't work well with stemming, diacritical accents, etc. » search_excerpt() doesn't work well with search_simplify(), stemming, and diacritical accents
Status: Needs work » Needs review
FileSize
3.44 KB

Well, then, I reattach here the patch I originally wrote for #916086: search_excerpt() doesn't highlight words that are matched via search_simplify() as it addresses search_excerpt not supporting matches made via search_simplify().

I'm not convinced that we should focus on it respecting stemmed matches, since that could be handled by the contributed modules themselves. Handling stemmed excerpts is a feature to me -- not handling search_simplify and diacritics excerpting is a bug.

Lastly, I'll reiterate my point from #731298: Searches for words with diacritics/accents: word not highlighted in results that the diacritics issue has nothing to do with search_excerpt, and everything to do with mysql collation, and so perhaps should be handled separately by stripping diacritics entirely in the index. jhodgdon, you seem to disagree since you closed it as a dupe -- can you explain? Or do you think we should fix that problem in this thread anyway?

jhodgdon’s picture

Ah. Perhaps I should not have closed that other issue as a dup -- feel free to override me and reopen it. :)

Contributed modules, without a patch similar to ones attached above in previous comments, have no way to highlight matches, although they are using the API provided by the search module. So I think it's all part of the same picture.

Regarding the current patch, it needs more tests before I will believe that it works for diacritical marks. It seems currently to only be testing numbers, which are one facet of the problem.

Also, I'll need to read through this patch some more to understand what it's doing... One thing I noticed is that I think it assumes that search_simplify($key) results in exactly one word. That's not necessarily going to always be true.

mcarbone’s picture

Title: search_excerpt() doesn't work well with search_simplify(), stemming, and diacritical accents » search_excerpt() doesn't work well with search_simplify(), and stemming
Status: Needs review » Needs work

Yep, you're right: this doesn't handle quoted keywords correctly. I'll take a stab at that in the near future, and add a test for it as well. I'm not sure other tests are needed, because it's not as if I'm just testing numbers here -- I'm testing the use of search_simplify in general (which is tested in all of its variations elsewhere). This just needs to make sure that changes made by search_simplify are still excerpted.

I see your point re: contributed modules using the search API, and I think it would likely involve adding a new API call in this patch to allow other modules to have a say about excerpting. But I'm still not convinced that this should hold up the rest of this thread, which is focused on a core bug caused by search_simplify (and nothing else in core as far as I know, if you accept my diacritics argument). I don't see why a contributed module couldn't just add a new preprocess variable to search_result.tpl.php to solve this itself. Again, I'm not against adding this functionality, but I don't see it as important as the search_simplify bug fix. But when I get to the above fix, I'll take a stab at this in case it's fairly easy to do.

I re-opened #731298: Searches for words with diacritics/accents: word not highlighted in results and removed diacritics from the subject line here.

jhodgdon’s picture

Ideally, I'd like to see a test that tested highlighting of several different types of keywords that search_simplify would alter, in a larger chunk of text. If we had such a test, then we would be assured that future changes to the search module wouldn't break the desirable behavior of highlighting such things, even if it might not be totally necessary for this particular issue. More testing is good...

And yes, the patches above did introduce a new API to allow contrib modules to highlight their own words. I actually have that working using the contrib modules Search by Page and Porter Stemmer. If we did it via an API (which could be that one or maybe something simpler that just let contrib modules say "this is a variation on the keyword that should also be highlighted in the excerpt", similar to how your patch is working), then the search module could just implement the hook too, making the whole thing more modular.

As far as adding a variable to the TPL, that's a possible solution, but it would then require people to make a change in their theme's implementation of the TPL to print out that variable instead of the search excerpt calculated by the node module. So I don't think it's a very good solution myself.

And regarding multiple words, I wasn't actually referring to quoted keywords -- search_excerpt already ignores this and highlights each individual word anyway. What I was referring to was the possibility that search_simplify() could take a string like "abc,def" and return "abc def" -- i.e. replace punctuation with spaces, making one word into two words. And you'd then possibly want to highlight abc, def, and "abc,def" anywhere in the text?
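
To show the kind of splitting meant here, a sketch with a toy simplifier. toy_simplify() is an assumption made for illustration and does far less than the real search_simplify().

```php
<?php
// Toy stand-in for search_simplify() (assumption): lower-case the text and
// turn every run of punctuation or whitespace into a single space.
function toy_simplify($text) {
  return trim(preg_replace('/[\s[:punct:]]+/', ' ', strtolower($text)));
}

// One typed keyword can become several words after simplification.
$words = explode(' ', toy_simplify('abc,def'));
print_r($words); // abc, def
```

Any code that assumes one key in, one word out would miss this case.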

mcarbone’s picture

Title: search_excerpt() doesn't work well with search_simplify(), and stemming » search_excerpt() doesn't work well with search_simplify() and stemming

Well, in any case, it does turn out that quoted keywords are a problem when search_simplify gets involved. That is, if you have "one: two" in the text and search for "one two," my above patch wasn't highlighting the phrase appropriately. I've now fixed this issue, and from some tests it looks like the "abc,def" and "abc--def" situation isn't a problem. I'll add these to the testing suite.

I've looked at your patch and I agree that ideally I should combine my patch with yours. That is, search should implement search_excerpt_match to find search_simplify() related matches. However, I do think this might involve slightly tweaking your patch to allow an array of keywords to be returned, as opposed to just one, to catch the "one: two" situation mentioned above.

Assuming the merge works well, hopefully webchick/dries will be cool with adding a new API call (since it fixes a bug and it won't break any contrib functionality). If not, it wouldn't be too hard to post a version w/o it, but it would only work with search_simplify and not contributed stemmers, etc until D8.

jhodgdon’s picture

I think that if someone searches for "one two" with quotes, the search excerpt should highlight the words one and two, and not worry about the phrase "one two" (which would be highlighted anyway). I think that's what other search engines do, and I think that's what the search excerpt function used to do (isn't it?).

mcarbone’s picture

No, it doesn't currently do that when search_simplify is modifying one of the words, at least not in my current sandbox at HEAD.

To reproduce:

1) Create a node with body: "Word follows: this"

2) Run cron

3) Search for "follows this" and the node will be returned, but with nothing highlighted. If you search for "follows: this", however, it gets highlighted.

But as I said, I believe I've already solved this, because w/o search_simplify getting involved this wouldn't be an issue.

jhodgdon’s picture

"w/o search_simplify getting involved this wouldn't be an issue" ... huh?

"I believe I've already solved this" -- how?

mcarbone’s picture

Sorry, that wasn't well put. I was just stating the obvious, which is just that search_simplify() strips out punctuation and hence searching "follows" will find "follows:" -- so core is responsible for making sure that "follows this" will highlight "follows: this".
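
The mismatch can be reproduced outside Drupal with a sketch like this one, where toy_simplify() is a hypothetical, much-reduced stand-in for search_simplify():

```php
<?php
// Toy stand-in for search_simplify() (assumption): lower-case the text and
// turn every run of punctuation or whitespace into a single space.
function toy_simplify($text) {
  return trim(preg_replace('/[\s[:punct:]]+/', ' ', strtolower($text)));
}

$raw = 'Word follows: this';
$key = 'follows this';

// Looking for the key in the raw text fails because of the colon.
var_dump(stripos($raw, $key) !== FALSE);                            // bool(false)
// Comparing simplified key against simplified text succeeds.
var_dump(strpos(toy_simplify($raw), toy_simplify($key)) !== FALSE); // bool(true)
```

The harder part, which this sketch leaves out, is mapping the position found in the simplified text back to the original text so that "follows: this" can be highlighted.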

I think I've solved it in the patch I'm working on, but I still need to add more tests and integrate it with your patch by putting it inside of search_search_excerpt_match. But I've attached the latest version if you want to check it out.

mcarbone’s picture

Status: Needs work » Needs review
FileSize
10.72 KB

OK, I've added an implementation of the search_excerpt_match hook to find matches made via search_simplify(). I ended up having to rewrite the code I originally wrote, and I think the version I have now is much more robust. Hit me on IRC if you want to discuss the algorithm.

I've also added more tests -- five fail without the hook implementation.

jhodgdon’s picture

Thanks! I'll give this a thorough review in the next day or two.

jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

I finally got a chance to look this over carefully (sorry about the excessively long delay).

I think this patch is solid, and it has a solid test. Thanks for all the work you did and many iterations on the patches, mcarbone!

It's an API addition, so it better get in now or we'll have to leave this bug unfixed for D7 entirely.

webchick’s picture

Version: 7.x-dev » 8.x-dev

Sorry. I really do think it's too late for this. :( API freeze was over a year ago.

I see in #34 where this was changed back to 7.x, that this fixes some problems, but doesn't explain why this needs to happen via a new hook.

#916086: search_excerpt() doesn't highlight words that are matched via search_simplify() seems to be addressing the same issue without an API change. Is it worth looking at that again?

mcarbone’s picture

This eventually needs to happen via a new hook in order to allow proper excerpting for 3rd party stemmers, but if we can't have that in 7.x, then I think we should just stick with solving the search_simplify issue. The patch here had some algorithmic improvements over the patch over there, so I'll work on a new version that solves the search_simplify issue without the API change, and then for 8.x we can add the new hook.

jhodgdon’s picture

OK. I'm not happy about it (this change was originally proposed well before the original API freeze but I couldn't get anyone to review it, as usual), but I understand.

mcarbone: I've reopened that other issue in order to work on the no-API-change version for Drupal 7. Thanks for all your hard work on this patch...

mcarbone’s picture

Title: search_excerpt() doesn't work well with search_simplify() and stemming » search_excerpt() doesn't work well with stemming
Status: Reviewed & tested by the community » Active

This should get a new title then so as not to be confused with the newly reopened issue.

Laveena’s picture

Status: Active » Needs review

#17: 493270.patch queued for re-testing.

gpk’s picture

Status: Needs review » Needs work

Just wondering what the status of this issue is. I guess the most recent patch is at #44 but now that #916086: search_excerpt() doesn't highlight words that are matched via search_simplify() has gone in to 8.x (and 7.x) I guess #44 needs updating? Also is it best to leave #731298: Searches for words with diacritics/accents: word not highlighted in results in its own issue?

andypost’s picture

Category: bug » task

Suppose we should introduce a kind of hook to alter data before indexing and make the same for excerpt generation

jhodgdon’s picture

Category: task » bug

RE #52/53 - actually I think the patch on #916086: search_excerpt() doesn't highlight words that are matched via search_simplify(), which was put into both 8.x and 7.x, may have completely taken care of this issue. I haven't tested it yet -- has anyone else?

pwolanin’s picture

Did it fix it, or is there still a bug here?

jhodgdon’s picture

Status: Needs work » Closed (duplicate)

It should have fixed it but there was a bug in the patch. I'm working on it. Anyway this can be marked as a duplicate.