Modifying queries and index data using Apache Solr Search Integration hooks

Enhance the default Spell Checker - "Did You Mean..?"

Last updated on 30 April 2025

I have a content-specific search engine built with Solr and Nutch. I wanted my "Did you mean..?" spell checker to draw its suggestions from the body of my crawled and indexed pages instead of only the default dictionary provided. Here is an example:

A user types in: ideon

A default response is: idea

My configured response based on content is: Gideon (It's "Gideon" as in Gideon v. Wainwright, a legal case)

So, if you want to use your own crawled and indexed content as part of the dictionary behind the suggestions, the following steps should get it working for your needs. I would like to thank Dan on the Solr mailing list for this lesson.

1.) Let's make sure we are performing the analysis step at index time. Replace this in your default schema.xml:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory" />
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
          <filter class="solr.LengthFilterFactory" min="4" max="20" />
          <filter class="solr.LowerCaseFilterFactory" /> 
          <filter class="solr.RemoveDuplicatesTokenFilterFactory" /> 
     </analyzer>
</fieldType>

With this:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
     <analyzer type="index">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.StandardFilterFactory"/>
     </analyzer>
     <analyzer type="query">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.StandardFilterFactory"/>
     </analyzer>
</fieldType>

2.) Add a new field for the new spelling list:

<fields>
........... other fields here ..............
  <field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
</fields>

3.) Now use a copyField in order to push your desired content into the spellchecker:

<copyField source="content" dest="spell"/>
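
If you want suggestions drawn from more than the page body, further copyField lines can feed the same field, since spell is declared multiValued. A minimal sketch, assuming your schema also defines a title field (that name is an assumption here; substitute whatever fields your own crawl produces):

```xml
<!-- Body text of each crawled page, as in step 3 -->
<copyField source="content" dest="spell"/>
<!-- Assumed extra source: a "title" field, if your schema has one -->
<copyField source="title" dest="spell"/>
```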

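4.) Ensure that you have spellcheck.build=true set on a search request, so that the spelling dictionary is built from the new field.

5.) Reindex your content. A copyField only takes effect at index time, so existing documents contribute nothing to the spell field until they are indexed again.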
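
For reference, spellcheck.build is a request parameter, and the spell checker itself is wired up in solrconfig.xml. The fragment below is a rough sketch of what that wiring can look like, not the configuration from the original setup. The component name spellcheck, the handler name /spell, the index directory and the use of buildOnCommit are all assumptions for illustration; only the spell field and the textSpell type come from the steps on this page.

```xml
<!-- Goes inside the <config> element of solrconfig.xml -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Analyze the user's query with the same field type as the spell field -->
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- Build the dictionary from the field filled by the copyField -->
    <str name="field">spell</str>
    <!-- Assumed location for the spelling index -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- Assumed choice: rebuild the dictionary on every commit -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<!-- Hypothetical handler that runs the spell checker after the main query -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With a setup along these lines, a single request carrying spellcheck.build=true rebuilds the dictionary on demand, while buildOnCommit does the same automatically after each commit, at some cost on large indexes.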

Some Errors:

After making the modifications and restarting Solr, you may get:

<abortOnConfigurationError>false</abortOnConfigurationError>
in solrconfig.xml
-------------------------------------------------------------
org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for 'spell' ignoring:

Now this didn't happen to me, *cough*, but if it happens to you, make sure you have only one instance of this field definition in schema.xml:

 <field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
