Errors in apachesolr_search when changing schema.xml to only support whole words [#1320574]

I have been using Apache Solr Search together with Apache Solr Attachments for many days without problems, until today, when
every search attempt started producing:

warning: strtotime() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 414.
warning: strtotime() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 415.
warning: Illegal offset type in isset or empty in /Library/WebServer/Drupal/drupal-6.22/includes/path.inc on line 64.
warning: Illegal offset type in /Library/WebServer/Drupal/drupal-6.22/includes/path.inc on line 69.
warning: htmlspecialchars_decode() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 432.
warning: mb_strlen() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/includes/unicode.inc on line 409.
warning: htmlspecialchars() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/includes/bootstrap.inc on line 856.

Where

apachesolr_search.module

414:      $doc->created = strtotime($doc->created);
415:      $doc->changed = strtotime($doc->changed);

432:	 'title' => htmlspecialchars_decode($doc->title, ENT_QUOTES),
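
The warnings say that $doc->created, $doc->changed and $doc->title arrive as arrays where strings are expected. The thread does not establish why; one possibility (an assumption, not confirmed here) is that Solr returns those fields as multi-valued. A minimal sketch of a guard follows. The helper name example_first_scalar() and the sample field values are hypothetical and not part of apachesolr:

```php
<?php
/**
 * Hypothetical helper (not part of apachesolr): return a single string
 * from a Solr field value that may be a scalar or an array of values.
 */
function example_first_scalar($value, $default = '') {
  if (is_array($value)) {
    // Multi-valued response: take the first element, if there is one.
    $value = count($value) ? reset($value) : $default;
  }
  return is_scalar($value) ? (string) $value : $default;
}

// Stand-in for a search result document; the field values are made up.
$doc = new stdClass();
$doc->created = array('2011-10-25T06:19:00Z');
$doc->title = array('Procurement &amp; policy');

// The calls from lines 414 and 432, with the guard applied first.
$created = strtotime(example_first_scalar($doc->created));
$title = htmlspecialchars_decode(example_first_scalar($doc->title), ENT_QUOTES);
```

A guard like this would only silence the warnings; it would not explain why the values became arrays in the first place.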

Earlier today I had experimented with different schema.xml variants (see #1319516: How can i enforce whole word matching only (disable partial word and related word matching)), which may or may not be implicated.

After including the textgen field type (from the main Solr Search example) in the Drupal schema.xml, and substituting
all occurrences of type="text" with type="textgen", I reran the system with a clean Solr server start, cleared both
the main Solr index and the Attachments index, and used 'Delete the index' as well as 'Re-index'.

This seemed to work fine, although I was not happy with the hits it was giving me on words split across lines (it was giving hits on 'requirement' for a search on 'require'), so I decided to tune the schema.xml by adjusting the generateWordParts parameter:

        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>

I carefully reran, requesting deletion and reindexing for the entire search system. But soon after it was running, the above warning messages started occurring on every search (note that the search results are still served and make sense, both content hits and attachment hits).

Then I tried reverting the schema to generateWordParts="1" (as it is in the original schema.xml) and the error still occurred, so I don't think that parameter is the cause.

I have tried deleting the attachments cache also, as well as the main Drupal cache. Problem still occurs on every search.

I have nothing at all otherwise in the Drupal module system.

Until playing with the schema.xml I had indexed the entire site of over 20000 nodes with 800 file attachments without any problems at all.

Now as soon as any indexing has taken place the error above occurs.

Very glad for help,

Webel

Comments

Comment #1

webel commented 25 October 2011 at 06:19

Update: I reverted to the schema.xml as distributed with Solr Search Integration, restarted with deletion and reindexing, and the problem vanished. It would seem to have something to do with the replacement of the type 'text' with the borrowed type 'textgen'.

From the original schema.xml of the main Solr example:

    <!-- A general unstemmed text field - good if one does not know the language of the field -->
    <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

From schema.xml from Apache Solr Integration:

<schema name="drupal-1.4" version="1.2">
..
    <!-- A text field that uses WordDelimiterFilter to enable splitting and matching of
        words on case-change, alpha numeric boundaries, and non-alphanumeric chars,
        so that a query of "wifi" or "wi fi" could match a document containing "Wi-Fi".
        Synonyms and stopwords are customized by external files, and stemming is enabled.
        Duplicate tokens at the same position (which may result from Stemmed Synonyms or
        WordDelim parts) are removed.
        -->
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                protected="protwords.txt"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                protected="protwords.txt"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Apart from the textgen type not having the stemmer, and not removing duplicates, the only difference seems to be that it has splitOnCaseChange="0" and nothing about preserveOriginal.

I realise this involves the schema.xml and knowledge of Solr, but I am still treating this as a Drupal Apache Solr Integration bug, since I can't see how these settings should be able to provoke the error I am observing.

I would be most grateful for specific help on this (and I am certain there are other Solr Integration users who would like a functioning example of how to switch off stemming, which example I could provide if we can overcome this problem).

Webel

Comment #2

webel commented 25 October 2011 at 06:23

PS: And textgen is missing the solr.MappingCharFilterFactory.

I will try another approach, I will try working from the 'text' type commenting out the solr.SnowballPorterFilterFactory bits etc., preserving the other bits.

Comment #3

webel commented 25 October 2011 at 08:54

EDIT: POSTSCRIPT: THE "SOLUTION" REFERENCED HERE COULD NOT BE REPRODUCED, SEE LATER COMMENTS

Ok I seem to have got it working, as reported as a solution to my own support request at http://drupal.org/node/1319516#comment-5158076 (#1319516: How can i enforce whole word matching only (disable partial word and related word matching))

I report this success cautiously, as I have not indexed all of my site, and it will take some days to do so.

So the culprit would seem to be either:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

or:

<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

That is to say, they seem to be needed to avoid the error I encountered.

May I ask the maintainer(s) to please consider this before closing. I would like to understand it, and I still think it can fairly be called a bug or issue.

Webel

Comment #4

webel commented 25 October 2011 at 07:31

I spoke too soon, I made some further changes to the schema.xml (to try to deal with hyphenated line broken word matches) and it's back:

    warning: strtotime() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 414.
    warning: strtotime() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 415.
    warning: Illegal offset type in isset or empty in /Library/WebServer/Drupal/drupal-6.22/includes/path.inc on line 64.
    warning: Illegal offset type in /Library/WebServer/Drupal/drupal-6.22/includes/path.inc on line 69.
    warning: htmlspecialchars_decode() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 432.
    warning: mb_strlen() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/includes/unicode.inc on line 409.
    warning: htmlspecialchars() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/includes/bootstrap.inc on line 856.
    warning: strtotime() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 414.
    warning: strtotime() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 415.
    warning: Illegal offset type in isset or empty in /Library/WebServer/Drupal/drupal-6.22/includes/path.inc on line 64.
    warning: Illegal offset type in /Library/WebServer/Drupal/drupal-6.22/includes/path.inc on line 69.
    warning: htmlspecialchars_decode() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 432.
    warning: mb_strlen() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/includes/unicode.inc on line 409.
    warning: htmlspecialchars() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/includes/bootstrap.inc on line 856.

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
<!-- ORIGINAL: 
generateWordParts="1" 
Switch OFF to prevent matches like 'require' on hyphenated line broken 'require-ment'
catenateNumbers="1"
generateNumberParts="1"
-->                
        <filter class="solr.WordDelimiterFilterFactory"
                protected="protwords.txt"
                generateWordParts="0"
                generateNumberParts="0"
                catenateWords="1"
                catenateNumbers="0"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!--filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/-->
        <!-- Webel: switch off Porter-stemmer algorithm to enforce whole word match -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
<!--ORIGINAL                
 generateNumberParts="1"
 catenateWords="0"
-->
        <filter class="solr.WordDelimiterFilterFactory"
                protected="protwords.txt"
                generateWordParts="0"
                generateNumberParts="0"
                catenateWords="1"
                catenateNumbers="0"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- ORIGINAL filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/-->
        <!-- Webel: switch off Porter-stemmer algorithm to enforce whole word match -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Note the change I made to catenateWords in the analyzer type="query", switching it from 0 to 1, to match the analyzer type="index".

The reason I am doing this (trying it rather blindly) is that I am trying to prevent hits on words broken across a line. For example, 'procure' keeps yielding hits on 'procure-ment' when the latter is spread across 2 lines and hyphenated. Sounds rather pedantic I am sure, but it is what my client wants (to prevent).

I've also switched off generateNumberParts in both analyser elements.

I will revert back to the adapted schema.xml I posted at: http://drupal.org/node/1319516#comment-5158076 and see whether the problem vanishes.

Comment #5

webel commented 25 October 2011 at 08:55

Well this is fairly driving me nuts.

The very schema.xml 'text' adaptation that I posted as definitely working (and which I let run, indexing lots of nodes and file attachments) is now giving the original error. The results I am reporting here are clearly inconsistent, and I can't draw clear conclusions about which parameters may be causing the problem.

I am being extremely careful each time I run a new schema.xml:

- I stop the cron job.

- I stop the Solr server, and then rerun it.

- I go to the Solr admin Search index tab and I select both 'Re-index all content' and 'Delete all content' (although I probably only need to delete).

- I go to Solr admin File attachments tab and I select both 'Delete files from index' and 're-index all files' (probably redundant).

- I do not always clear the attachments extracted text cache, as I can't see how this could possibly be responsible.

- I then restart cron, or I just use manual cron invokes.

Sometimes I get the initially reported error as soon as at least a few search hits are available; sometimes I do not.

The process is seemingly erratic, and isolating the factors is very time consuming.

I managed earlier today to get the "whole word" schema.xml snippet for 'text' at http://drupal.org/node/1319516#comment-5158076 to run ok without the error, indexing lots of nodes and file attachments, no worries. But then I played with the schema.xml and since then I can't restore my success. Something is affecting the state, something somewhere gets, for want of a better expression, "stuck".

Comment #6

nick_vh commented 25 October 2011 at 08:27

Interesting topic, please keep us updated!

Comment #7

webel commented 25 October 2011 at 08:45

I reverted once again to the original schema.xml distributed with Solr Search Integration and it definitely did not occur. I indexed 100s of nodes and file attachments, no worries.

Then I introduced again this minimal change. Every attribute is the same as in the Drupal schema.xml, except I have commented out the stemmer part:


    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                protected="protwords.txt"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!--filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/-->
        <!-- Webel: switch off Porter-stemmer algorithm to enforce whole word match -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                protected="protwords.txt"
                generateNumberParts="1"
                generateWordParts="1"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- ORIGINAL filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/-->
        <!-- Webel: switch off Porter-stemmer algorithm to enforce whole word match -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Note that not even the generateNumberParts or catenateWords attributes (that I had previously been playing with) are changed. And once again, after a completely clean restart (see above), it fails, depending on the word.

For example, it fails for a search on 'atoms' but not for a search on 'atom' (both give hits), where 'Did you mean:' offered 'atoms'.

It fails for a search on 'procure', which returns lots of file attachment hits.

For every file attachment hit displayed it gives exactly once:

warning: strtotime() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 414.
warning: strtotime() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 415.
warning: Illegal offset type in isset or empty in /Library/WebServer/Drupal/drupal-6.22/includes/path.inc on line 64.
warning: Illegal offset type in /Library/WebServer/Drupal/drupal-6.22/includes/path.inc on line 69.
warning: htmlspecialchars_decode() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 432.
warning: mb_strlen() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/includes/unicode.inc on line 409.
warning: htmlspecialchars() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/includes/bootstrap.inc on line 856.

I would be most grateful if somebody could try out the "whole word" schema.xml snippet above and see whether they can reproduce this bug.

Comment #8

webel commented 25 October 2011 at 08:51

@Nick_vh and maintainers.

I have spent all day on this, there is nothing more I can do myself to overcome this problem.

Could you please see whether you can get it to run with the stemmer algorithm commented out,
and please try to figure out what could be causing the warning message chain.

It is clearly a genuine bug, and I am very much in need of your help.

Webel

Comment #9

webel commented 25 October 2011 at 08:57

KNOWN FACT: the problem has not arisen once with the original schema.xml distributed with Apache Solr Search Integration, only with it slightly adapted, such as commenting out the stemmer in the 'text' field type.

Comment #10

nick_vh commented 25 October 2011 at 09:06

Hi webel. Thanks for your very descriptive and thorough explanation. However I'd like you to go one step further and also provide a patch that can actually prevent the errors from popping up. This would make the code more robust and allow for further customizability.

Do you think you could make it?

Comment #11

webel commented 25 October 2011 at 09:29

When the problem is present it gives hits on both regular content and attachments.

But it does not display the node link for the hits, nor the hit snippet text.

For File attachment type it does provide the mime type download link.

I therefore think it is not due to Attachments module.

Comment #12

nick_vh commented 25 October 2011 at 10:16

Title:

On search: warning: strtotime() expects parameter 1 to be string, array given in apachesolr_search.module on line 414

» Notices in apachesolr_search when changing schema.xml to only support whole words

Changed the title

Comment #13

webel commented 26 October 2011 at 09:24

@Nick_vh Firstly thanks for replies and for your work on this module.

You wrote:

"Thanks for your very descriptive and thorough explanation."

It's not really an explanation; it's a description of the circumstances under which the problem arises.

"However I'd like you to go one step further and also provide a patch that can actually prevent the errors from popping up. This would make the code more robust and allow for further customizability."

I have no insight at all into what actually causes the problem, and it is extremely difficult to get diagnostics on it (unless I start debugging cron and down). I now need help from the authors of the code, who might figure out what is causing the errors as reported.

"Do you think you could make it?"

No. I need the help of those already familiar with the code.

Webel

Comment #14

webel commented 27 October 2011 at 07:20

Title:

Notices in apachesolr_search when changing schema.xml to only support whole words

» Errors in apachesolr_search when changing schema.xml to only support whole words

Changed title from "Notices in apachesolr_search .." to "Errors in apachesolr_search .."

When the "warnings" appear the search results no longer display the node title links or the snippet portion with the highlighted hit(s)

Comment #15

nick_vh commented 30 October 2011 at 14:44

Status:

Active

» Postponed (maintainer needs more info)

@webel, I am a bit unclear about what we can do to fix it. Could you textually summarize our options here? I'll try to discuss it with pwolanin and we'll come back to you. Currently I'm marking this as postponed because it is not in our current roadmap to support this level of customizations.

Comment #16

nick_vh commented 30 October 2011 at 14:44

Status:

Postponed (maintainer needs more info)

» Postponed

Comment #17

webel commented 31 October 2011 at 02:18

@Nick_vh

I appreciate your continued feedback.

"@webel, I am a bit unclear of what we can do to fix it. Could you textually summarize our options here?"

I do not understand what you mean by your "options", and I do not know how I could possibly make the situation clearer.

When the stemmer part of the schema.xml is commented out errors arise as given above.
Your main option would be to find out why _simplifying_ the schema breaks the module.

This would involve trying it on a site of your own somewhere, if possible with Attachments: restart Solr with the Porter stemmer part commented out, reindex the site, and see whether you can reproduce the error.

You could also see whether you can figure out how the errors could arise in code, i.e. why I get:

warning: strtotime() expects parameter 1 to be string, array given in /Library/WebServer/Drupal/drupal-6.22/sites/all/modules/apachesolr/apachesolr_search.module on line 414.

It's a very complex chain of events from search indexing through the search that could give rise to this, and it is extremely difficult for somebody not intimate with the code to diagnose it, no matter how capable or fluent in PHP and Drupal that outsider may be.
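
One way to narrow this down without stepping through the whole chain is to check which fields of a returned document actually hold arrays. The sketch below assumes a result document object like the $doc used on lines 414-432; the helper name example_array_valued_fields(), the sample values, and the inclusion of a 'path' field (inferred from the path.inc warnings) are assumptions for illustration only:

```php
<?php
/**
 * Hypothetical diagnostic (not part of apachesolr): list the fields of a
 * result document that hold arrays although single values are expected.
 */
function example_array_valued_fields($doc, array $expected_single) {
  $suspects = array();
  foreach ($expected_single as $name) {
    if (isset($doc->$name) && is_array($doc->$name)) {
      // Record how many values came back for this field.
      $suspects[$name] = count($doc->$name);
    }
  }
  return $suspects;
}

// Stand-in document; values are made up for illustration.
$doc = new stdClass();
$doc->title = array('Atoms', 'Atoms');
$doc->created = array('2011-10-25T06:19:00Z');
$doc->path = 'node/1';

$suspects = example_array_valued_fields($doc, array('title', 'created', 'changed', 'path'));
// $suspects maps field name => number of values: title => 2, created => 1.
```

Logging such a list for each hit would show whether the arrays carry one value or several, which helps tell a field returned as multi-valued apart from values that were indexed more than once.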

"I'll try to discuss in pwolanin and we'll come back to you. Currently I'm marking this as postponed because it is not in our current roadmap to support this level of customisations."

This is not a high level of customisation, it is a very basic level of customisation, namely just switching off the stemmer to facilitate whole word search (as required by my client), while maintaining other benefits of Solr search such as integration with Solr Attachments, content type filters etc.

Webel

Comment #18

webel commented 31 October 2011 at 02:22

Please unpostpone, please help by diagnosing how the errors could be caused by the code.

Comment #19

nick_vh commented 2 November 2011 at 09:40

Category:	bug	» feature
Priority:	Major	» Normal

@webel

Since you are using Drupal 6 there is not much I personally can do for you in the short term. We are planning to refactor the Drupal 6 release so it is in line with the Drupal 7 version. Changing the schema is not a small modification, so the help we can give you is minimal. I'd advise you to hire a professional who can help you sort out the problems and hopefully return with a patch file.

Comment #20

nick_vh commented 3 November 2011 at 00:03

Status:

Postponed

» Closed (works as designed)

Suggestion:
Index the content and modify the qf parameters to use a different field:

  <!-- Unstemmed text fields for full text - the relevance of a match depends on the length of the text -->
   <dynamicField name="tus_*" type="text_und" indexed="true"  stored="true" multiValued="false" termVectors="true"/>
   <dynamicField name="tum_*" type="text_und" indexed="true"  stored="true" multiValued="true" termVectors="true"/>

So in the update index hook you add it to the document:

 $document->tus_content = 'whatever you want to be unstemmed';

And in the query alter you add a qf param and remove the ones you do not want:

 $query->addParam('qf', 'tus_content^40');

This way you don't need to alter the schema for unstemmed results
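
Put together, the suggestion might look roughly like the sketch below. It is an illustration under assumptions, not the module's documented usage: the hook names and replaceParam() are assumed from the module's later API and should be checked against apachesolr.api.php in the installed version (the Drupal 6 1.x branch used in this thread may name them differently), "mymodule" is a placeholder, and using the node title as the source text is only for illustration. It also assumes the tus_* dynamic field and its text_und type are present in the schema.xml in use.

```php
<?php
/**
 * Index time: copy the text that must match whole words only into an
 * unstemmed dynamic field. "mymodule" and the choice of the node title
 * as the source text are placeholders.
 */
function mymodule_apachesolr_index_document_build($document, $entity, $entity_type, $env_id) {
  if ($entity_type == 'node' && isset($entity->title)) {
    $document->tus_content = $entity->title;
  }
}

/**
 * Query time: search only the unstemmed field.
 */
function mymodule_apachesolr_query_alter($query) {
  // Assumed behaviour: replaceParam() drops the existing qf values and
  // sets the new one in their place.
  $query->replaceParam('qf', 'tus_content^40');
}
```

The content has to be reindexed after adding the index hook, since the new field is only populated when a document is sent to Solr.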

Comment #21

cpliakas commented 14 September 2012 at 18:59

Oooooh. The approach in #1320574-20: Errors in apachesolr_search when changing schema.xml to only support whole words is really slick! I like it. Kind of like a dynamic protwords.txt file!
