Closed (duplicate)
Project:
Porter Algorithm Search Stemmer
Version:
6.x-1.0
Component:
Miscellaneous
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
16 Jun 2009 at 08:01 UTC
Updated:
6 Jul 2009 at 22:10 UTC
Comments
Comment #1
greggles
Better title. I don't have a strong feeling about this.
Comment #2
jhodgdon
See also #437084: Excerpt fails to find stemmed keyword and #493270: search_excerpt() doesn't work well with stemming
I think Porter Stemmer is doing the right thing, indexing the root words into the search index, and then reducing search terms to root words when searching. The Search module calls the preprocessing hook when indexing and when searching, so you have to do the same thing both times.
If you did it the other way, the search index would be many times larger. E.g. a page containing "walk" would have to be indexed under "walk", "walks", and "walking".
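To make the point above concrete, here is a minimal sketch in Python (not Drupal code) of running the same preprocessing step at index time and at search time. The names `toy_stem`, `build_index`, and `search` are hypothetical, and `toy_stem` is a crude suffix stripper standing in for a real stemmer, not the Porter algorithm itself.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in for a stemmer: strips a couple of common suffixes."""
    word = word.lower()
    for suffix in ("ing", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(pages, preprocess):
    """Map each preprocessed word to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.split():
            index[preprocess(word)].add(page_id)
    return index


def search(index, query, preprocess):
    """Return ids of pages matching every query word, after the same preprocessing."""
    results = None
    for word in query.split():
        matches = set(index.get(preprocess(word), set()))
        results = matches if results is None else results & matches
    return results or set()


pages = {1: "walk", 2: "she walks home", 3: "walking is healthy"}
index = build_index(pages, toy_stem)

print(search(index, "walking", toy_stem))  # all three pages share the root "walk"
print(sorted(index))                       # only root forms are stored as keys
```

In this sketch the index holds a single "walk" entry covering all three pages, rather than separate entries for "walk", "walks", and "walking". A query for any of those forms reaches the same entry because it goes through the same preprocessing.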
Comment #3
jhodgdon
Also, there is no published algorithm for finding all possible derived forms of a word. The Porter Stemmer module stems in the standard way (i.e. standard in the Information Retrieval industry), or at least it seems to.
So I am going to go ahead and mark this as "duplicate", since the main issue reported here is that excerpts are not being found, which is covered in #493270: search_excerpt() doesn't work well with stemming.