I'm trying to implement Fuzzy search on the title field of a few content types.

I'm using the following preprocessors in this order:

  1. stopwords
  2. ignore case
  3. tokenizer
  4. fuzzy search search settings

All of the options on those processors are at their defaults. The issue I'm seeing is that titles consisting of a single word are skipped by the tokenizer, so no ngrams are created for them. Titles with more than one word are indexed and their ngrams are created properly.

I've traced the problem to the following if statement in processor_tokenizer.inc:

if (count($arr) > 1) {
  $value = array();
  foreach ($arr as $token) {
    $value[] = array('value' => $token);
  }
}

If I change the condition on the first line to count($arr) > 0, the tokenizer behaves the way I want: all the titles are indexed and ngrammed. As it stands, a title like "example" is skipped, while "example 1" and "example 2" are indexed properly.
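To make the difference concrete, here is a minimal, self-contained PHP sketch that reproduces only the branch quoted above. The function name tokenize_value(), the $threshold parameter and the plain whitespace split are hypothetical stand-ins, not Search API's actual tokenizer code, which splits according to its own configurable settings.

```php
<?php
// Hypothetical stand-in for the quoted tokenizer branch (not Search API code).
// $threshold = 1 mimics the current check, count($arr) > 1;
// $threshold = 0 mimics the proposed check, count($arr) > 0.
function tokenize_value($value, $threshold) {
  if (is_string($value)) {
    // Assumption: split on runs of whitespace and drop empty pieces.
    $arr = preg_split('/\s+/u', trim($value), -1, PREG_SPLIT_NO_EMPTY);
    if (count($arr) > $threshold) {
      // Convert the string into a list of token arrays.
      $value = array();
      foreach ($arr as $token) {
        $value[] = array('value' => $token);
      }
    }
  }
  // Otherwise the value is returned unchanged (still a plain string).
  return $value;
}

var_dump(is_array(tokenize_value('example', 1)));   // bool(false)
var_dump(is_array(tokenize_value('example 1', 1))); // bool(true)
var_dump(is_array(tokenize_value('example', 0)));   // bool(true)
```

With the current threshold, a one-word value is left as a plain string instead of being turned into a token list, which would presumably explain why the fuzzy search processor builds no ngrams for it; with the proposed check, every non-empty value becomes a token array.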

Am I missing a configuration option or is this a real bug?

Maybe this belongs in the Search API issue queue; I'm not sure.

Comments

Neograph734

Agreed, I ran into this problem as well. But since the tokenizer processor is part of Search API, you will probably have better luck in that issue queue.