I'm trying to get partial word matching working, and I think the best way is to use the n-gram filter. I have read this post: http://drupal.org/node/1167494 and set up my schema.xml as below:

<fieldType name="text" class="solr.TextField" indexed="true" stored="true" multiValued="true" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!--
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <!-- <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                /> -->
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/> -->
        <!--[[SnowballPorterFilterFactory]]-->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>

After re-indexing my content, I have a fulltext field with values like www.someurl.com, but when I search for some or some* I get no results.

Is there something I am missing?

Also, just a note: I am using Solr 3.4.0.

Thanks.


Comments

modstore’s picture

Um, never mind, it seems I forgot to restart Solr with the new schema. All working now!

modstore’s picture

Status: Active » Closed (fixed)
pinkonomy’s picture

Hi, I need some help. In which schema.xml should I add this: the one on the Apache Solr server or the one in the Apache Solr module?
Also, where in the file should I put the above? Thanks.

modstore’s picture

When you set up the module, you need to copy the schema.xml from the module directory to your Solr directory. That is the file you will need to modify.

arnested’s picture

Thank you for sharing this. This was what I needed as well.

To make it easier for others to spot the config change mentioned in this issue, I have attached a patch to the schema.xml distributed with search_api_solr version 7.x-1.0-rc2.

geezon’s picture

Version: 7.x-1.0-rc1 » 7.x-1.0-rc3
Status: Closed (fixed) » Active

Hello, I still have a problem with partial word matching in 7.19-1.0-rc3...
schema.xml and solrconfig.xml were copied and then Solr (v.3.3.6) was restarted, but without success.
It can be checked here: praca.com.ua:8983/solr
What could be the reason? Thank you for any help.

PedroMiguel’s picture

#6 Try putting the
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
in <fieldType name="text" class="solr.TextField" indexed="true" stored="true" multiValued="true" positionIncrementGap="100">, as described in the code provided by the OP.

I was also a little confused at the beginning because another EdgeNGram filter already exists in the provided schema.xml, but you need to add it there as well.

I'm using Solr 4.4.0 on Tomcat 6 and it works fine for me (I upgraded from 1.4).

PedroMiguel’s picture

Version: 7.x-1.0-rc3 » 7.x-1.x-dev
El Alemaño’s picture

Hi PedroMiguel,
I tried the fix you described, but it is not working for me. Maybe you can help me with that. Here is the schema.xml that I am using: http://hastebin.com/yomihuyefo.xml

Thanks!

El Alemaño’s picture

Hi,
I also tried this one: http://hastebin.com/ucodiwexoz.xml

What should the workflow be? Do I need to do anything more, or just change schema.xml and then search for a partial word?

Thanks!

PedroMiguel’s picture

You need to restart Solr and re-index before searching again. With the versions and instructions above you should be ready to go.

Please note that I did this a year ago and it is a set-and-forget thing, so I don't know whether anything has changed in the last year. But before checking versions, try clearing your indexes, re-indexing and running a search.

guardian87’s picture

Dear,

I'm having the same problem when trying to enable partial word matching searches with my Solr server.

My schema.xml file clearly has the n-gram filter included:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
  </analyzer>
 

I use a view based on my node search index, with an exposed fulltext search filter that I use to search.

Any suggestions are most welcome.

Thanks in advance!

El Alemaño’s picture

FileSize
1.13 KB
El Alemaño’s picture

El Alemaño’s picture

Hi,
Patch #14 was not working for me, and I found this: http://dropbucket.org/node/255. So I made a new patch. I hope it now works as expected.

guardian87’s picture

Dear El Alemano,

I am trying to get partial word searching to work on an apache solr 3.6.2
Does this patch work for this one as well?

I am using search_api_solr module version 7.x-1.6 and used the included solr-conf.

Any suggestions are much appreciated!

Thanks in advance!

drunken monkey’s picture

I am trying to get partial word searching to work on an apache solr 3.6.2
Does this patch work for this one as well?

No, for Solr 3.x you have to make similar changes to the 3.x config files – the patch only changes the 4.x ones. Other than that, it should work exactly the same, though – just insert the new <filter> line at the end of both <analyzer> sections of the text field type, reindex – and you should be done.
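
As a rough sketch of what that change looks like (this is not the actual patch: the tokenizer and the other filters are placeholders for whatever your schema.xml already contains, and the gram sizes are simply the ones from the original post), the text field type would end up shaped something like this:

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- existing index-time filters stay as they are -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- new line, added last in the index analyzer -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- existing query-time filters stay as they are -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- new line, added last in the query analyzer -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
</fieldType>
```

As with any analyzer change, Solr has to be restarted and the content re-indexed before the new tokens exist in the index.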

NWOM’s picture

#15 worked for 5.x as well, by adding the lines manually. Thank you!

sah62’s picture

I've been trying to get partial string searches working using the documentation found here:

https://www.drupal.org/node/2009760#partial-matches

No luck so far. That page describes adding a text type definition and solr.EdgeNGramFilterFactory filter to schema_extra_types.xml and doesn't mention the need to modify schema.xml at all. Yes, I'm restarting the server and re-indexing the content after. Should I modify schema.xml as described in #15 instead?

drunken monkey’s picture

No luck so far. That page describes adding a text type definition and solr.EdgeNGramFilterFactory filter to schema_extra_types.xml and doesn't mention the need to modify schema.xml at all. Yes, I'm restarting the server and re-indexing the content after. Should I modify schema.xml as described in #15 instead?

That depends. If you're adding a separate type, you'll also have to tell the Solr module to use it, or add a new Search API type that maps to it (see search_api_solr_hook_search_api_data_type_info()).
Otherwise, yes, you'd have to modify schema.xml directly.
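
For illustration only, a separate type of that kind might look roughly like the sketch below. The name text_edge_ngram is a hypothetical placeholder (not a name the module ships), the analyzer chain is a minimal assumption rather than the one from the documentation page, and where exactly the element goes inside schema_extra_types.xml depends on the config set you copied:

```xml
<!-- Hypothetical extra field type; the name and the filter chain are assumptions -->
<fieldType name="text_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index prefixes of every token so partial words can match -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Defining the type alone changes nothing for existing fields: a field only gets this analysis once it is actually mapped to the new type, which is the extra step described above.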

chrisgross’s picture

Here's one for 3.x, which might be useful for anyone on Pantheon.

stijndmd’s picture

I must be missing something here. I have just made these changes in my schema.xml (patch #15) and made the same changes in the schema file that is actually in use.

Now I am getting an unwanted result.

What is correct / desired:
=> When I search for "foo", everything containing "foo" as part of a word gets found. (e.g. foobar is found)

What is incorrect / undesired:
=> When I search for "foobar", everything containing "foo", "oob", "oba", "bar" is found!

Has anyone encountered this?

mstrelan’s picture

@stijndmd I found that I had the best results using only the first part of the patch, i.e. only adding it to <analyzer type="index"> rather than <analyzer type="query">.
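
A minimal sketch of that index-only variant, assuming a simplified filter chain (your real schema.xml will have more filters in both analyzers; the gram sizes are the ones from the original post):

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- prefixes are generated at index time only -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no EdgeNGramFilterFactory here: the search term is matched as typed -->
  </analyzer>
</fieldType>
```

The idea is that the index already holds the prefixes of every word, so a query for "foo" finds "foobar" without further help. If the query is n-grammed as well, the search term itself is broken into shorter pieces, and each piece can match on its own, which is the likely source of the extra results.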

stijndmd’s picture

That did the trick. Thanks a bunch @mstrelan

peterpearson’s picture

@mstrelan - Thank you! I had the same issue with substrings of the search term being used as search results, often before full word matches. Sure enough, removing the EdgeNGramFilterFactory from the query analyzer for the TextField fieldtype worked a charm.

OanaIlea’s picture

Status: Active » Closed (outdated)

This issue was closed due to lack of activity over a long period of time. If the issue is still acute for you, feel free to reopen it and describe the current state.