This is a tough bug to describe, so it is best illustrated with an example.

As setup, do the following:

  • Run through a clean install of Drupal
  • Install the latest stable version of the Apache Solr Search Integration module (7.x-1.2 as of this post)
  • Configure Solr 3.5, and use the default config files included with the 7.x-1.2 release
  • Create a node with the title "test" and the body "test content"
  • Index the content via the "Index all queued content" button on the status page

When you execute a search for the following keywords, you get the following results:

  • content: 1 result
  • CONTENT: 1 result
  • Content: 1 result
  • CONTent: 1 result
  • ContenT: 1 result
  • contenT: no results
  • conTENT: no results

So the pattern that I see is that searches fail when they start with a lower-case letter but have an upper-case letter anywhere after it.


Comments

cpliakas’s picture

Just some more info, I am getting the same behavior when executing the searches via the Solr admin interface, so I am shifting focus to the configs and Solr version and not as much on the Drupal code.

cpliakas’s picture

Experiencing same behavior using stock configs on Solr 3.6.2 and 4.2.1.

cpliakas’s picture

Priority: Major » Normal

So it looks like the issue is with the "splitOnCaseChange" attribute in the analyzer. Changing it to "0" resolves the issue, as it no longer splits words at capital letters in the middle of the word. Makes sense.

The attached diff highlights this in the 3.x configs.

There are a few issues tangentially related to this, but no discussions on whether this is indeed the desired behavior by default. So, given that this was reported to us as a bug, is it "works as intended" or should we discuss changing "splitOnCaseChange" to be 0 by default?

cpliakas’s picture

Forgot to add diff ...

cpliakas’s picture

As a quick hack in case it helps anyone, a workaround in a contributed module would be to implement hook_apachesolr_query_alter() and force the keywords to be lowercase.

/**
 * Implements hook_apachesolr_query_alter().
 */
function mymodule_apachesolr_query_alter($query) {
  // Lowercase the keywords so the query analyzer sees no case change to split on.
  $query->replaceParam('q', drupal_strtolower($query->getParam('q')));
}

cpliakas’s picture

Nick_vh proposed a cool solution...

Keep the splitOnCaseChange=1 in the analyzer for the field but set it to 0 for the query's analyzer. If you had "conTent" in the source text, you could search for "content", "con", "tent", and "conTent", and they would all match the source text.
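For anyone who wants to picture what that would look like in schema.xml, here is a rough sketch. It is not the field type shipped with the module: the name "text_example" is made up, and the tokenizer and the other filter attributes are only assumptions chosen to keep the fragment short. The only part that matters is that splitOnCaseChange differs between the index and query analyzers.

```xml
<!-- Illustrative only: "text_example" is a hypothetical field type name,
     not the one used in the module's stock schema.xml. -->
<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: split on case changes, so "conTent" is indexed as
       "con", "tent" and the catenated "content". -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: do not split on case changes, so "conTent" stays a
       single token and is simply lowercased to "content". -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="0"
            splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a setup along these lines, the query side never turns a mixed-case keyword into a multi-token phrase, while the index side still stores the individual parts.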

pwolanin’s picture

Using the analysis interface with Solr 3.6.2, I don't see the result you describe.

Nick_vh’s picture

I can actually confirm this is not working the way you expect.

Say you have a node containing "BiCycle" and you expect to find it when searching for "BiCycle": it will not show up.

With splitOnCaseChange="1" in the query analyzer, this returns no results:

http://localhost:8983/solr/core0/select?q=pauLatim&debugQuery=true&rows=1&qf=content&fl=id
<str name="parsedquery">
+DisjunctionMaxQuery((content:"(paulatim pau) latim")~0.01) DisjunctionMaxQuery((content:"(paulatim pau) latim"~15^2.0)~0.01)
</str>
<str name="parsedquery_toString">
+(content:"(paulatim pau) latim")~0.01 (content:"(paulatim pau) latim"~15^2.0)~0.01
</str>

With splitOnCaseChange="0" in the query analyzer, it returns the expected result:

http://localhost:8983/solr/core0/select?q=pauLatim&debugQuery=true&rows=1&qf=content&fl=id
<str name="parsedquery">
+DisjunctionMaxQuery((content:paulatim)~0.01) DisjunctionMaxQuery((content:paulatim^2.0)~0.01)
</str>
<str name="parsedquery_toString">
+(content:paulatim)~0.01 (content:paulatim^2.0)~0.01
</str>

You can clearly see that the query generated in the first case is not the wanted behavior.

cpliakas’s picture

Should we move this thread to the common configurations sandbox?

Nick_vh’s picture

Project: Apache Solr Search » Apache Solr Common Configurations
Version: 7.x-1.x-dev »

That's just a word! Moved

pwolanin’s picture

Locally, analysis.jsp tells me that "BiCycle" as both the index and query value for a text field matches. Does it indicate that for you?

cpliakas’s picture

Just to figure out what angle you are coming from, are you debating that there is actually a problem or are you trying to use analysis.jsp to figure out the root issue? I want to make sure we are on the same plane so my response makes sense :-)

pwolanin’s picture

Status: Active » Fixed

committed

pwolanin’s picture

Status: Fixed » Active

oops, wrong issue - not fixed.

@Chris - I'm mostly trying to be sure we're on the same page. When you use the analysis tool do you see a match or not? If the analysis tool is giving a different answer than a query, we need to understand why there is a mismatch.

cpliakas’s picture

I don't remember whether the analysis tool is giving the same results. I think the results are the same, but I am not able to confirm at this time.

To me, whether there is a mismatch in the analysis tool is a secondary concern. If we can apply a fix so that end users get the results they expect, then that should be our primary focus since it has the most widespread impact, whereas fixing a potential mismatch with the analysis tool doesn't really affect end users and has a very limited effect on administrators. I understand there are root causes and technical details, but I am hoping that we take an incremental approach to stop the bleeding and then resolve the root cause and ensure the foundation is in order.

Are there any drawbacks to the proposed fixes that would prevent us from adopting them to stop the bleeding?

Nick_vh’s picture

Status: Active » Needs review

Updated patch

Nick_vh’s picture

Committed, as this really makes a lot of sense. It does require re-indexing, so we should make note of that when releasing this into drupal-4.2-solr-4.x and drupal-4.2-solr-3.x.

Nick_vh’s picture

Status: Needs review » Fixed

cpliakas’s picture

Awesome. Thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.