Enhance the default Spell Checker - "Did You Mean..?"
I run a content-specific search engine built on Solr and Nutch. I wanted my "Did you mean..?" spell checker to draw its suggestions from the body of my crawled and indexed pages instead of just the default dictionary provided. Here's an example:
A user types in: ideon
A default response is: idea
My configured response based on content is: Gideon (It's "Gideon" as in Gideon v. Wainwright, a legal case)
So, if you want suggestions drawn from your own crawled and indexed content rather than a generic dictionary, follow the steps below. I would like to thank Dan on the Solr mailing list for this lesson.
1.) Let's make sure we are performing the analysis step at index time. Replace this in your default schema.xml:
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LengthFilterFactory" min="4" max="20" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
With this:
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>
2.) Add a new field for the new spelling list:
<fields>
........... other fields here ..............
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
</fields>
3.) Now use a copyField in order to push your desired content into the spellchecker:
<copyField source="content" dest="spell"/>
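If you want suggestions to draw on more than the page body, further copyField lines can feed the same spell field, which is why it is declared multiValued. A minimal sketch, assuming your schema also defines a title field (Nutch's bundled Solr schema typically does, but check yours and adjust the source name):

```xml
<!-- Assumption: a "title" field exists in your schema.xml. -->
<!-- This line goes alongside the content copyField, not instead of it. -->
<copyField source="title" dest="spell"/>
```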
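4.) Ensure that the spellcheck index actually gets built: send a request with spellcheck.build=true after indexing, and check that the spellchecker defined in solrconfig.xml reads from the new spell field.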
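For reference, a spellchecker definition in solrconfig.xml that reads from the spell field might look roughly like the sketch below. The component name, the spellchecker name "default", and the index directory are assumptions modeled on the example solrconfig.xml shipped with Solr; your file may already contain a similar block that only needs its field value changed.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Analyze the user's query with the same field type defined in step 1 -->
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <!-- Assumption: "default" and "./spellchecker" are placeholder values -->
    <str name="name">default</str>
    <!-- Build the dictionary from the field populated by the copyField -->
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- Optional: rebuild on every commit instead of passing spellcheck.build=true -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

With something like this in place, a one-off build can be triggered by adding spellcheck=true&spellcheck.build=true to a query against a request handler that includes the spellcheck component. Which handler that is depends on your setup.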
5.) Reindex your content so that the spell field is populated.
Some Errors:
After making the modifications and restarting Solr, I get:
<abortOnConfigurationError>false</abortOnConfigurationError> in solrconfig.xml
-------------------------------------------------------------
org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for 'spell' ignoring:
Now this didn't happen to me, *cough*, but if it happens to you, just make sure your schema.xml contains only one instance of this field definition:
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>