Partial word matching [#1414838]

I'm trying to get partial word matching working, and I think the best way is to use the ngram filter. I have read this post: http://drupal.org/node/1167494 and set up my schema.xml as below:

<fieldType name="text" class="solr.TextField" indexed="true" stored="true" multiValued="true" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!--
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <!-- <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                /> -->
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/> -->
        <!--[[SnowballPorterFilterFactory]]-->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>

After re-indexing my content, I have a fulltext field that has values like www.someurl.com, but when I search for some or some* I get no results.

Is there something I am missing?

Also, note that I am using Solr 3.4.0.

Thanks.

Comment	File	Size	Author
#21	search_api_solr-partial_word_matching-1414838-21.patch	1.47 KB	chrisgross
#15	search_api_solr-partial_word_matching-1414838-8.patch	1.62 KB	El Alemaño
#14	search_api_solr-partial_word_matching_1414838_7.patch	1.13 KB	El Alemaño
#13	1414838_6.patch	1.13 KB	El Alemaño
#5	search_api_solr-partial_word_matching-1414838-5.patch	484 bytes	arnested

Comments

Comment #1

modstore commented 24 January 2012 at 02:09

Um, never mind, it seems I forgot to restart Solr with the new schema. All working now!

Comment #2

modstore commented 24 January 2012 at 02:10

Status:	Active	» Closed (fixed)

Comment #3

pinkonomy commented 20 March 2012 at 21:22

Hi, I need some help. In which schema.xml should I add this? The one on the Apache Solr server or the one in the Apache Solr module?
Also, where in the file should I put the above? Thanks.

Comment #4

modstore commented 21 March 2012 at 21:38

When you set up the module, you have to copy the schema.xml from the module directory to your Solr directory. That copy is the file you need to modify.

Comment #5

arnested at Reload commented 9 July 2015 at 06:25

File	Size
search_api_solr-partial_word_matching-1414838-5.patch	484 bytes

Thank you for sharing this. This was what I needed as well.

To make it easier for others to spot the config change mentioned in this issue, I have attached a patch against the schema.xml distributed with search_api_solr version 7.x-1.0-rc2.

Comment #6

geezon commented 2 April 2013 at 09:00

Version:	7.x-1.0-rc1	» 7.x-1.0-rc3
Status:	Closed (fixed)	» Active

Hello, I still have a problem with partial word matching in 7.19-1.0-rc3.
schema.xml and solrconfig.xml were copied and Solr (v.3.3.6) was then restarted, but without success.
It can be checked here: praca.com.ua:8983/solr
What could be the reason? Thank you for any help.

Comment #7

PedroMiguel commented 7 August 2013 at 13:23

#6: Try putting
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
inside <fieldType name="text" class="solr.TextField" indexed="true" stored="true" multiValued="true" positionIncrementGap="100">, as in the code provided by the OP.

I was also a little confused at the beginning, because another EdgeNGram filter already exists in the provided schema.xml, but you need to add it there as well.

I'm using Solr 4.4.0 on Tomcat 6 and it works fine for me (I upgraded from 1.4).

Comment #8

PedroMiguel commented 7 August 2013 at 13:30

Version:	7.x-1.0-rc3	» 7.x-1.x-dev

Comment #9

El Alemaño commented 27 October 2014 at 13:06

Hi PedroMiguel,
I tried the fix you described, but it is not working for me. Maybe you can help me with that. Here is the schema.xml that I am using: http://hastebin.com/yomihuyefo.xml

Thanks!

Comment #10

El Alemaño commented 27 October 2014 at 13:12

Hi,
I also tried this one: http://hastebin.com/ucodiwexoz.xml

What should the workflow be? Do I need to do anything else, or do I just change the schema.xml and try searching for a partial word?

Thanks!

Comment #11

PedroMiguel commented 27 October 2014 at 17:04

You need to restart Solr and re-index before searching again. With the versions and instructions above you should be ready to go.

Please note that I did this a year ago and it is a set-and-forget kind of thing, so I don't know whether anything has changed in the last year. Before checking versions, though, try clearing your indexes, re-indexing, and running a search.

Comment #12

guardian87 commented 31 October 2014 at 14:12

Dear,

I'm having the same problem when trying to enable partial word matching searches with my Solr server.

My schema.xml file clearly has the ngram filter included:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
  </analyzer>

I use a view based on my node search index, with an exposed fulltext search filter that I use to search.

Any suggestions are most welcome.

Thanks in advance!

Comment #13

El Alemaño commented 7 November 2014 at 13:34

File	Size
1414838_6.patch	1.13 KB

Comment #14

El Alemaño commented 16 November 2014 at 17:56

File	Size
search_api_solr-partial_word_matching_1414838_7.patch	1.13 KB

Comment #15

El Alemaño commented 26 November 2014 at 12:39

File	Size
search_api_solr-partial_word_matching-1414838-8.patch	1.62 KB

Hi,
Patch #14 was not working for me, and I found this: http://dropbucket.org/node/255. So I made a new patch. I hope it now works as expected.

Comment #16

guardian87 commented 3 December 2014 at 13:57

Dear El Alemano,

I am trying to get partial word searching to work on Apache Solr 3.6.2.
Does this patch work for that version as well?

I am using search_api_solr module version 7.x-1.6 and used the included solr-conf.

Any suggestions are much appreciated!

Thanks in advance!

Comment #17

drunken monkey commented 6 December 2014 at 08:37

"I am trying to get partial word searching to work on an apache solr 3.6.2
Does this patch work for this one as well?"

No, for Solr 3.x you have to make similar changes to the 3.x config files – the patch only changes the 4.x ones. Other than that, it should work exactly the same, though – just insert the new <filter> line at the end of both <analyzer> sections of the text field type, reindex – and you should be done.
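
A rough sketch of what that description amounts to, for readers without the patch at hand. It assumes the new <filter> line is the EdgeNGramFilterFactory line quoted earlier in this issue (the patch itself is not reproduced in this thread), and the tokenizer and LowerCaseFilterFactory lines only stand in for whatever your own schema.xml already contains:

```xml
<!-- Sketch only. Gram sizes are copied from the snippet at the top of this issue;
     the existing filters of your text field type stay as they are. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- ... your existing index-time filters ... -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- New line, added last in the index analyzer -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- ... your existing query-time filters ... -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- New line, added last in the query analyzer -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
</fieldType>
```

After changing the file, Solr has to be restarted and the content re-indexed, as noted in earlier comments.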

Comment #18

NWOM commented 15 July 2016 at 11:21

#15 worked for 5.x as well, by adding the lines manually. Thank you!

Comment #19

sah62 commented 8 August 2016 at 18:55

I've been trying to get partial string searches working using the documentation found here:

https://www.drupal.org/node/2009760#partial-matches

No luck so far. That page describes adding a text type definition and solr.EdgeNGramFilterFactory filter to schema_extra_types.xml and doesn't mention the need to modify schema.xml at all. Yes, I'm restarting the server and re-indexing the content after. Should I modify schema.xml as described in #15 instead?

Comment #20

drunken monkey as a volunteer commented 22 August 2016 at 09:54

"No luck so far. That page describes adding a text type definition and solr.EdgeNGramFilterFactory filter to schema_extra_types.xml and doesn't mention the need to modify schema.xml at all. Yes, I'm restarting the server and re-indexing the content after. Should I modify schema.xml as described in #15 instead?"

That depends. If you're adding a separate type, you'll also have to tell the Solr module to use it, or add a new Search API type that maps to it (see search_api_solr_hook_search_api_data_type_info()).
Otherwise, yes, you'd have to modify schema.xml directly.
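
For illustration, a separate type of the kind discussed here might look roughly like the sketch below. The type name text_partial is hypothetical, and the analyzer chain is modelled on the snippets in this thread rather than taken from the documentation page. Defining the type does nothing on its own: nothing is indexed with it until a field is mapped to it, which is the extra step described above.

```xml
<!-- Sketch only: "text_partial" is a made-up name, not something the module ships. -->
<fieldType name="text_partial" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index prefixes of each token so that partial words can match -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```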

Comment #21

chrisgross commented 26 April 2017 at 17:52

File	Size
search_api_solr-partial_word_matching-1414838-21.patch	1.47 KB

Here's one for 3.x, which might be useful for anyone on Pantheon.

Comment #22

stijndmd commented 22 May 2017 at 10:03

I must be missing something here. I have just made these changes in my schema.xml (patch #15) and made the same changes in the schema file that is actually in use.

Now I am getting an unwanted result.

What is correct / desired:
=> When I search for "foo", everything containing "foo" as part of a word gets found. (e.g. foobar is found)

What is incorrect / undesired:
=> When I search for "foobar", everything containing "foo", "oob", "oba", "bar" is found!

Has anyone encountered this?

Comment #23

mstrelan commented 14 June 2017 at 06:16

@stijndmd I found that I had the best results using only the first part of the patch, i.e. only adding it to <analyzer type="index"> rather than <analyzer type="query">.
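
A minimal sketch of that index-only arrangement, assuming the added line is the EdgeNGramFilterFactory filter quoted at the top of this issue (patch #15 is not reproduced in this thread and may use a different filter or gram sizes). The likely reason it helps: when the gram filter also runs in the query analyzer, the search term itself is cut into fragments, and each fragment can match indexed grams from unrelated words.

```xml
<!-- Sketch only: existing filters of the text field type are elided. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- ... your existing index-time filters ... -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Grams are generated at index time only -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- ... your existing query-time filters; no gram filter here, so the
         search term is matched whole against the indexed grams ... -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```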

Comment #24

stijndmd commented 13 July 2017 at 15:02

That did the trick. Thanks a bunch @mstrelan

Comment #25

peterpearson commented 11 January 2018 at 23:18

@mstrelan - Thank you! I had the same issue with substrings of the search term being used as search results, often before full word matches. Sure enough, removing the EdgeNGramFilterFactory from the query analyzer for the TextField fieldtype worked a charm.

Comment #26

OanaIlea at bio.logis Genetic Information Management GmbH commented 8 August 2019 at 11:25

Status:	Active	» Closed (outdated)

This issue was closed due to lack of activity over a long period of time. If the issue is still acute for you, feel free to reopen it and describe the current state.