On a fresh install of 4.7b2, after a cron run, the search_index table contains words of the correct length: a minimum of 3 characters, which should be the default. However, there are some bugs:

1. It indexes integers of any length. (This is minor.)
2. The search function will not find any terms of 3 characters or fewer, even if they are listed in the index. It does find words of 4 characters or more. I believe this is a critical bug.

At this point, the variables minimum_word_size and remove_short have not yet been saved to the database. However, once they are set (by changing them from their default values), search behaves as expected.

I also note that an error message is no longer reported when "short" words are excluded from a search. (I believe this is desirable; has it been removed?)

I am confused by the module code because there are no references to the variable remove_short other than on the settings page (i.e. even if it is set, it is ignored). Furthermore, the code contains both variable_get('minimum_word_size',3) and variable_get('minimum_word_size',4). While I appreciate this is not necessarily incorrect, it is at least confusing!
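To make the 3-versus-4 confusion concrete, here is a minimal Python sketch (not Drupal's actual PHP code) of how two callers reading the same unset variable with different fallback defaults could produce the symptom in point 2: short words get indexed but are silently dropped from queries until the setting is saved. The `variables` dict, the Python `variable_get` stand-in, `index_words`, and `search_keywords` are all hypothetical; only the variable name minimum_word_size and the two defaults come from this report.

```python
# Hypothetical stand-in for Drupal's variable table: empty means the
# search settings have never been saved (the fresh-install case).
variables: dict[str, int] = {}


def variable_get(name: str, default: int) -> int:
    """Return the saved value, or the caller-supplied fallback if unset."""
    return variables.get(name, default)


def index_words(text: str) -> set[str]:
    """Hypothetical indexer: falls back to a minimum of 3 when unset."""
    minimum = variable_get("minimum_word_size", 3)
    return {word for word in text.lower().split() if len(word) >= minimum}


def search_keywords(query: str) -> list[str]:
    """Hypothetical query parser: falls back to 4, disagreeing with the indexer."""
    minimum = variable_get("minimum_word_size", 4)
    return [word for word in query.lower().split() if len(word) >= minimum]


if __name__ == "__main__":
    print(sorted(index_words("The cat sat")))  # ['cat', 'sat', 'the']
    print(search_keywords("cat"))              # [] because "cat" is indexed but dropped from the query
    variables["minimum_word_size"] = 3         # saving the setting makes both callers agree
    print(search_keywords("cat"))              # ['cat']
```

Once a value is stored, both fallbacks are bypassed, which would explain why search starts working as soon as the setting is changed from its default.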

I've got no experience with this module myself, or I'd wade in...

Comments

simon rawson

A slight correction to the issue above: by default, the search_index table does not include 3-letter words. It should.

As a quick fix for the default word size issue, there are two places where the number 4 needs to be changed to 3.

Namely, on lines 362 and 736, change variable_get('minimum_word_size',4) to variable_get('minimum_word_size',3).

I would also suggest that line 736 should actually be variable_get('remove_short',3), but I don't know the module well enough to say for sure.

Steven

Title: minimum word size issues » dead variable: remove_short
Status: Active » Fixed

The 3/4 discrepancy was already fixed in HEAD, but the 'remove_short' variable is now simply obsolete. It was used to prevent overly broad wildcard queries in 4.6 and earlier; since then, wildcards have been removed.

In fact, since the HEAD improvements, any search query is valid as long as it contains at least one (positive) keyword that is long enough. With the default minimum of 3, the query "drupal a bb c" works, but "a bb c" does not.
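The validity rule described above can be sketched as a single check. This is a minimal Python illustration, not the patched Drupal code; `is_valid_query` is a hypothetical name, and it assumes keywords are whitespace-separated and that a leading "-" marks a negative (excluded) keyword, which never counts toward validity.

```python
def is_valid_query(query: str, minimum_word_size: int = 3) -> bool:
    """Return True if at least one positive keyword meets the minimum length.

    Hypothetical helper: assumes whitespace-separated keywords and that a
    leading "-" marks a negative keyword, which never makes a query valid.
    """
    for keyword in query.split():
        if keyword.startswith("-"):
            continue  # negative keywords do not count
        if len(keyword) >= minimum_word_size:
            return True
    return False


if __name__ == "__main__":
    print(is_valid_query("drupal a bb c"))  # True
    print(is_valid_query("a bb c"))         # False
```

Short keywords such as "a" and "bb" are not rejected outright here; they simply cannot carry the query on their own, matching the "drupal a bb c" versus "a bb c" example.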

I applied a patch which removes this obsolete variable, and adds a message to the user clarifying the minimum length rule.

Anonymous

Status: Fixed » Closed (fixed)