This is a tough bug to describe, so it is best illustrated with an example.
As setup, do the following:
- Run through a clean install of Drupal
- Install the latest stable version of the Apache Solr Search Integration module (7.x-1.2 as of this post)
- Configure Solr 3.5, and use the default config files included with the 7.x-1.2 release
- Create a node with the title "test" and the body "test content"
- Index the content via the "Index all queued content" button on the status page
When you execute a search for the following keywords, you get the following results:
- content: 1 result
- CONTENT: 1 result
- Content: 1 result
- CONTent: 1 result
- ContenT: 1 result
- contenT: no results
- conTENT: no results
So the pattern that I see is that searches fail when they start with a lower-case letter but have an upper-case letter anywhere after it.
Comment | File | Size | Author |
---|---|---|---|
#16 | 1984410-16.patch | 1.71 KB | Nick_vh |
#4 | split-on-case-change-1984410-3.diff | 831 bytes | cpliakas |
Comments
Comment #1
cpliakas commented: Just some more info, I am getting the same behavior when executing the searches via the Solr admin interface, so I am shifting focus to the configs and Solr version and not as much on the Drupal code.
Comment #2
cpliakas commented: Experiencing the same behavior using stock configs on Solr 3.6.2 and 4.2.1.
Comment #3
cpliakas commented: So it looks like the issue is with the "splitOnCaseChange" attribute in the analyzer. Changing it to "0" resolves the issue, as it no longer splits words at the capital letters in the middle of the word. Makes sense.
The attached diff highlights this in the 3.x configs.
There are a few issues tangentially related to this, but no discussions on whether this is indeed the desired behavior by default. So, given that this was reported to us as a bug, is it "works as intended" or should we discuss changing "splitOnCaseChange" to be 0 by default?
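To make the splitting concrete, here is a minimal Python sketch of what a split-on-case-change step does to a keyword. It is a simplified model, not Solr's actual filter: the function name `split_on_case_change` is made up for illustration, it only looks at ASCII lower-to-upper transitions, and it does not try to reproduce the exact result counts listed in the issue summary.

```python
import re


def split_on_case_change(token):
    """Split a token at every lower-case to upper-case transition.

    Simplified, hypothetical model of a split-on-case-change step;
    the real Solr filter has more rules (digits, delimiters, etc.).
    """
    # Insert a space at each zero-width lower->upper boundary, then split.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token).split()


for keyword in ["content", "CONTENT", "Content", "contenT", "conTENT"]:
    print(keyword, "->", split_on_case_change(keyword))
```

Keywords without a lower-to-upper transition come back as a single token, while "contenT" and "conTENT" are broken into two parts, which is where the query starts to differ from what the user typed.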
Comment #4
cpliakas commented: Forgot to add diff ...
Comment #5
cpliakas commented: As a quick hack in case it helps anyone, a workaround in a contributed module would be to implement hook_apachesolr_query_alter() and force the keywords to be lowercase.
Comment #6
cpliakas commented: Nick_vh proposed a cool solution...
Keep the splitOnCaseChange=1 in the analyzer for the field but set it to 0 for the query's analyzer. If you had "conTent" in the source text, you could search for "content", "con", "tent", and "conTent", and they would all match the source text.
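A rough Python sketch of why the asymmetric setup works, under stated assumptions: the index side splits on case change and also keeps the original token, both sides lowercase, and the query side does no splitting at all. The helper names (`index_tokens`, `query_tokens`, `matches`) are hypothetical and this is only a toy model of the analyzers, not the actual Solr configuration.

```python
import re


def split_on_case_change(token):
    """Split at lower->upper transitions (simplified, hypothetical model)."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token).split()


def index_tokens(text):
    """Index side: keep the original word plus its case-change parts, lowercased."""
    tokens = set()
    for word in text.split():
        tokens.add(word.lower())  # assumption: original token is preserved
        for part in split_on_case_change(word):
            tokens.add(part.lower())
    return tokens


def query_tokens(keywords):
    """Query side: no splitting, only lowercasing."""
    return [word.lower() for word in keywords.split()]


def matches(indexed, keywords):
    """A query matches if every query token is present in the index."""
    return all(token in indexed for token in query_tokens(keywords))


indexed = index_tokens("conTent")
for keywords in ["content", "con", "tent", "conTent"]:
    print(keywords, "matches:", matches(indexed, keywords))
```

In this model all four searches match, because the query "conTent" is simply lowercased to "content" instead of being turned into separate "con" and "tent" terms.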
Comment #7
pwolanin commented: Using the analysis interface with Solr 3.6.2, I don't see the result you describe.
Comment #8
Nick_vh commented: I can actually confirm this is not working the way you would expect.
Say you have a node containing "BiCycle". You would expect to find it when searching for "BiCycle", but it does not show up.
With splitOnCaseChange="1" in the query this has no results
With splitOnCaseChange="0" in the query it has the expected result
You can see clearly that the generated query does not give the wanted behavior
Comment #9
cpliakas commented: Should we move this thread to the common configurations sandbox?
Comment #10
Nick_vh commented: That's just a word! Moved
Comment #11
pwolanin commented: Locally, analysis.jsp tells me that BiCycle as both the index and query value for a text field matches. Does it indicate that for you?
Comment #12
cpliakas commented: Just to figure out what angle you are coming from, are you debating that there is actually a problem or are you trying to use analysis.jsp to figure out the root issue? I want to make sure we are on the same plane so my response makes sense :-)
Comment #13
pwolanin commented: committed
Comment #14
pwolanin commented: oops, wrong issue - not fixed.
@Chris - I'm mostly trying to be sure we're on the same page. When you use the analysis tool do you see a match or not? If the analysis tool is giving a different answer than a query, we need to understand why there is a mismatch.
Comment #15
cpliakas commented: I don't remember whether the analysis tool is giving the same results. I think the results are the same, I am just not able to confirm at this time.
To me, whether there is a mismatch in the analysis tool is a secondary concern. If we can apply a fix so that the end users get the results they expect, then that should be our primary focus since it has the most widespread impact, whereas fixing a potential mismatch with the analysis tool doesn't really affect end users and has a very limited effect on administrators. I understand there are root causes and technical details, but I am hoping that we take an incremental approach to stop the bleeding and then resolve the root cause and ensure the foundation is in order.
Are there any drawbacks to the proposed fixes that would prevent us from adopting them to stop the bleeding?
Comment #16
Nick_vh commented: Updated patch
Comment #17
Nick_vh commented: Committed, as this really makes a lot of sense. It does require re-indexing, so we should make note of that when releasing this into drupal-4.2-solr-4.x and drupal-4.2-solr-3.x
Comment #18
Nick_vh
Comment #19
cpliakas commented: Awesome. Thanks!