I have used Apache Solr along with Search API Solr to search node title and content.

Previously substring search was not working for nodes which has special characters as title. For example, "A/An-A-8'-PB" was not working. but after I placed EdgeNgramFilterFactory in schema.xml. This works fine.

But in some cases, this too fails. For example, title is "A/R-PUP" and when I am searching with "r-pup". It returns no results. Upto "r-p", it works whereas when I am searching as "r-pu" or "r-pup", it returns no results.

But for the title "A/SP-PUP", when I am searching with "sp-pup". It works fine. I don't know in what case it fails.

Please guide me if anybody knows alternate solution other than EdgeNGramFilterFactory.

I tried NGramFilterFactory, but it was not working for special characters.

This is the code where I added EdgeNGramFilterFactory in schema.xml

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                protected="protwords.txt"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="0"
                splitOnCaseChange="0"
                preserveOriginal="1"/>
        <filter class="solr.LengthFilterFactory" min="2" max="100" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30" side="front"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30" side="back"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                protected="protwords.txt"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                splitOnCaseChange="0"
                preserveOriginal="1"/>
        <filter class="solr.LengthFilterFactory" min="2" max="100" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LengthFilterFactory" min="4" max="20" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30" side="front"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30" side="back"/>
      </analyzer>
    </fieldType>

Thanks in advance.

Comments

tajinder.minhas’s picture

Hi Ganesh,

Were you able to resolve this issue as i am facing the similar issue with search keyword as run returning results for running but runn not returning any result.