I was wondering why my search results were completely off after switching to the Fulltext Ngram type.
After looking into the Solr config, I found that Fulltext Ngram does not tokenize the way normal Fulltext does.

This is the Ngram field type in the Solr config:

    <fieldType name="edge_n2_kw_text" class="solr.TextField" omitNorms="true" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

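Because of this, a string like "Welcome to my website" is indexed as edge n-grams of the whole lowercased string ("we", "wel", "welc" and so on, up to "welcome to my website") instead of first being split into the words "welcome", "to", "my", "website", which is how normal Fulltext works. Since the query analyzer also uses KeywordTokenizerFactory, a search for a word from the middle of the text, such as "website", finds nothing.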
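The difference between the two analysis chains can be reproduced with a small Python sketch. It is a simplified emulation for illustration, not Lucene's actual code: the helper names (`keyword_tokenize`, `whitespace_tokenize`, `edge_ngrams`, `matches`) are hypothetical, `str.split()` stands in for `WhitespaceTokenizerFactory`, and a document is assumed to "match" if any analyzed query term equals an indexed term. The gram sizes mirror `minGramSize="2"` and `maxGramSize="25"` from the config above.

```python
# Simplified emulation of the Solr analysis chains discussed above.
# Not Lucene code: helper names are hypothetical and matching is reduced
# to "any analyzed query term equals an indexed term".


def keyword_tokenize(text):
    # Like KeywordTokenizerFactory: the whole input becomes one token.
    return [text]


def whitespace_tokenize(text):
    # Approximates WhitespaceTokenizerFactory: split on runs of whitespace.
    return text.split()


def lowercase(tokens):
    # Like LowerCaseFilterFactory.
    return [t.lower() for t in tokens]


def edge_ngrams(tokens, min_gram=2, max_gram=25):
    # Like EdgeNGramFilterFactory: emit prefixes of each token whose length
    # is between min_gram and max_gram. In this sketch, tokens shorter than
    # min_gram produce no grams.
    grams = []
    for token in tokens:
        for n in range(min_gram, min(max_gram, len(token)) + 1):
            grams.append(token[:n])
    return grams


def analyze_index(text, tokenize):
    # Index analyzer: tokenizer -> lowercase -> edge n-grams.
    return edge_ngrams(lowercase(tokenize(text)))


def analyze_query(text, tokenize):
    # Query analyzer: tokenizer -> lowercase (no n-grams).
    return lowercase(tokenize(text))


def matches(document, query, tokenize):
    # True if any analyzed query term is among the indexed terms.
    indexed = set(analyze_index(document, tokenize))
    return any(term in indexed for term in analyze_query(query, tokenize))


if __name__ == "__main__":
    doc = "Welcome to my website"
    print(analyze_index(doc, keyword_tokenize))
    print(analyze_index(doc, whitespace_tokenize))
    print(matches(doc, "website", keyword_tokenize))     # False
    print(matches(doc, "website", whitespace_tokenize))  # True
```

With the keyword chain every indexed term is a prefix of the entire string, so "website" cannot match; with the whitespace chain each word gets its own prefixes, so "website" and partial input such as "webs" both match.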

I changed my config to:

    <fieldType name="edge_n2_kw_text" class="solr.TextField" omitNorms="true" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

That fixed it. However, the type still lacks some of the other tokenizers/filters that the normal Fulltext type uses.

Is it supposed to work like this? If the current behavior is intended, shouldn't the type be called "String Ngram"?


Comments

jeroen.b created an issue. See original summary.

mkalkbrenner

Status: Active » Needs review

Thanks for pointing out the issue.
This patch should fix the Ngram Text type and its integration. I also added a new Ngram String type :-)
A review would be welcome!

mkalkbrenner

I modified the tests so that they don't interfere with other tests.

  • mkalkbrenner committed 2c07dc5 on 8.x-1.x
    Issue #2896432 by mkalkbrenner, jeroen.b: Fulltext Ngram does not...
mkalkbrenner

Status: Needs review » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.