Updated: Comment #N

Problem/Motivation

The core Search module allows the use of AND (implicit) and OR (explicit) in keyword/node search. For instance, you can search for "boy dog" and it will look for nodes with both boy and dog in them; you can search for "boy OR dog" and it will find nodes with either.

This is normally working fine, but in some cases, due to the way the "first pass" of the query works, if you use the same keyword twice in the query (which is actually a valid thing to want do), the search fails even though it shouldn't.

For example, if you create a node with the following body:

I created this page with some very interesting content in it.

Here are some search results, all for queries that should find the node -- keep in mind that AND is implied unless you explicitly type OR, and that parens are not recognized:

this
found page
this OR that
found page
this OR that this
found page
this OR that this OR foo
DID NOT FIND
this OR that this OR interesting
found page
this OR foo this OR interesting
found page
this OR foo this OR foo
found page

All of these are correct except the "this OR that this OR foo", which is not working.

Proposed resolution

Fix the SearchQuery class so that these searches all work. See comment #3 for an explanation of what the root cause of the bug is -- basically that the "first pass" query is incorrectly saying that the search doesn't work and aborting the search. The solution is to define a search with multiple OR clauses as "not simple" so that it will fall through to doing the full query (which does work correctly).

Remaining tasks

Get the patch reviewed and committed (it works and has tests).

User interface changes

None, except that more searches will work correctly.

API changes

None.

Original report by @spartlow

Search combines keywords by AND except where adjacent keywords explicitly use OR. So this search:
match1 OR miss1 match2 OR miss2
is treated as:
(match1 OR miss1) AND (match2 OR miss2)
and will return a node that has match1 and match2 in it (but happens to not have miss1 or miss2 in it).

However, if the same exact keyword is used in two OR statements such as:
match1 OR miss1 match1 OR miss2
then that same node is NOT returned. This is the bug.

Interestingly the following search WILL find that node. It's just when an additional OR is added that it fails.
match1 OR miss1 match1

Background:
I'd like to use search hooks to conditionally add certain keywords to searches. Theoretically, I'd want to do (user_key1 user_key2) OR (my_key). So instead I have to add "OR my_key" to each of the user's search keywords, which I suspect would work if it wasn't for this bug. I'm still looking for a work-around.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

jhodgdon’s picture

Version: 7.17 » 8.x-dev
Issue tags: +Needs backport to D7
jhodgdon’s picture

Sorry, I meant to add a comment here.

This is very odd.

I tested this in Drupal 8 with a node I created with the following body:

I created this page with some very interesting content in it.

Here are some search results:

this
found page
this OR that
found page
this OR that this
found page
this OR that this OR foo
DID NOT FIND
this OR that this OR interesting
found page
this OR foo this OR interesting
found page
this OR foo this OR foo
found page

Anyway... it's odd. The bug is confirmed and still exists in 8.x.

jhodgdon’s picture

Issue summary: View changes

Added additional info.

jhodgdon’s picture

Issue summary: View changes
Status: Active » Needs review
FileSize
4.3 KB

I think I figured out what the cause of the problem is.

During SearchQuery::parseSearchExpression(), the entered search expression is parsed into words and AND/OR query conditions.

So let's say we have a node whose text is just

foo

If we enter search expression

foo OR baz foo OR bar

which is supposed to be interpreted as

(foo OR baz) AND (foo OR bar)

This should match. And the query conditions that are created are correct.

However, during the pre-query step, SearchQuery sets up a "normalization" query to figure out whether to even run the full query. To do this, during parsing it keeps track of a list of words that have to match, and of how many matches it needs to find to even continue with the full query.

For the AND/OR case envisioned in SearchQuery, where all the words are distinct, you would make a list of 4 distinct words and know that you need to find two matches to continue (one for each of the OR pieces). But in this case, it only finds 3 distinct words (foo, baz, bar) and still thinks it needs to find 2 matches (one for each of the OR pieces). But this fails because only one of those 3 words is a match.

The only way to fix it is to not require the full number of matches to continue in the case of a combined AND/OR query. There are already member variables that check whether the query is "complex" and we can use them to decide not to impose this "we need 2 words matching to continue" criterion.

However... I tried to fix this and my fix didn't work... uploading the patch for now and maybe someone can see what is going on, or maybe I'll get back to it. Also added tests for this problem in the patch, but they do not pass locally even with the code patch. Sigh.

Status: Needs review » Needs work

The last submitted patch, 3: 1842226-firsttry.patch, failed testing.

jhodgdon’s picture

FileSize
3.79 KB

Dang, stray chunk got into that patch. Sorry. Trying again.

jhodgdon’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 5: 1842226-5.patch, failed testing.

jhodgdon’s picture

OK, that's the failing test. Now we just need to figure out how to fix the bug.

jhodgdon’s picture

Doh! I had a typo in the tests: minum vs. minim. Patch is otherwise the same.

So on my machine, the tests pass with the patch, and fail without the patch. Let's make sure the test bot agrees...

jhodgdon’s picture

Status: Needs work » Needs review

Doh again. :)

Status: Needs review » Needs work

The last submitted patch, 9: 1842226-9-test-only.patch, failed testing.

jhodgdon’s picture

Status: Needs work » Needs review

That test was supposed to fail, and it failed in the right places to illustrate the bug.

jhodgdon’s picture

9: 1842226-9.patch queued for re-testing.

jhodgdon’s picture

This still needs to be reviewed. Should be straightforward... has a test... fix is pretty simple...

jhodgdon’s picture

Issue summary: View changes

Added an issue summary.

jhodgdon’s picture

9: 1842226-9.patch queued for re-testing.

jhodgdon’s picture

9: 1842226-9.patch queued for re-testing.

The last submitted patch, 9: 1842226-9.patch, failed testing.

jhodgdon’s picture

FileSize
3.81 KB

Straight reroll.

jhodgdon’s picture

19: 1842226-18.patch queued for re-testing.

pwolanin’s picture

Status: Needs review » Needs work

This variable naming or logic doesn't make sense:

+        // If we had already found one OR, this is another one AND-ed with the
+        // first, meaning it is not a simple query.
+        if ($simple_or) {
+          $this->simple = FALSE;
+        }

If it's simple it's NOT simple?

I don't think we usually include issue numbers in comments, or core would just be a long list of them:

See issue #1842226.

jhodgdon’s picture

Assigned: Unassigned » jhodgdon

On the case...

jhodgdon’s picture

Status: Needs work » Needs review
FileSize
6.7 KB
4.49 KB

The code was correct, but I agree with #21 that the local variable names were confusing. So I took the opportunity to go through this whole parseSearchExpression() method and clean up the comments and local variable names a bit. Hope that is OK -- there were a lot of local variables with "or" and "and" in the name that I thought were confusing.

Also took out mentions of the issue number.

pwolanin’s picture

Looks better. Seems like this might be redundant?

+        // If we had already found one OR, this is another one AND-ed with the
+        // first, meaning it is not a simple query.
+        if ($has_or) {
+          $this->simple = FALSE;
+        }

At least, the comment implies it is, given this other code:

-    if ($simple_and && $simple_or) {
+    if ($has_and && $has_or) {
       $this->simple = FALSE;
     }
jhodgdon’s picture

RE #24... no, they are actually for two separate cases.

a) Search query is something like
cat OR mouse dog OR chicken
In this case, the first step of parsing gives you something like:

array(
  array('cat', 'mouse'), // an OR
  array('dog', 'chicken'), // another OR
);

So in the lower loop, $has_or gets set to TRUE the first time through, and then this code

+        if ($has_or) {
+          $this->simple = FALSE;
+        }

the second time through says "Oh, we already had one OR, here is another one, so this is not a "simple" query.

b) A search query something like:
cat OR mouse dog chicken
which gets parsed to:

array(
  array('cat', 'mouse'), // OR
  'dog', // AND
  'chicken' // AND
);

so that get $has_or set to TRUE from the first array, and $has_and set to TRUE for the other two words. Then

   if ($has_and && $has_or) {
       $this->simple = FALSE;
     }

says "This query had both AND and OR clauses, so it's not simple".

Does that make sense?

jhodgdon’s picture

FileSize
6.59 KB

Reroll for PSR-4

hughmccracken’s picture

Re #26 - I repeated the search with the single node test page:

this
found page
this OR that
found page
this OR that this
found page
this OR that this OR foo
found page
this OR that this OR interesting
found page
this OR foo this OR interesting
found page
this OR foo this OR foo
found page

Looks good to me!

jhodgdon’s picture

Thanks for testing muzza299! Do you want to mark this "reviewed and tested by the community" then?

mgifford’s picture

I tested with the patch and without.

My only page just had "I repeated the search with the single node test page" as the body.

My query was "search OR the single OR test page" which I think should emulate the only instance that wasn't working before.

I don't know why the phrase "the search" is bold since it isn't in the search. That's there both with or without the patch.

I also really have no idea why <a title="View user profile." href="/user/1" lang="" about="/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">admin</a> - 09/23/2014 - 10:45 - 0 comments shows up, but it's in the page without the patch too.

Can you tell me where I got my search wrong. You can edit the content in both instances above for the next few hours.

jhodgdon’s picture

Thanks for testing!

You have the word "the" in your search query, as well as the word "search", which is why both words are highlighted in the results.

The "view user profile" thing is a completely separate issue: #2297711: Fix HTML escaping due to Twig autoescape

And just to clarify... This issue's bug is that if you have the same word in your query twice, results do not happen. So you'd need to do something like "search OR single search OR test". Your test query does not have the same word in there twice. I think the issue summary explains it nicely...

mgifford’s picture

I still can't replicate the bug.. Here's the same instance above (still with 4hrs left on it):

http://sa41d02fd7a6a027.s2.simplytest.me/search/node?keys=search+OR+sing...

http://sfe7665a808d27f8.s2.simplytest.me/search/node?keys=search+OR+sing...

I get the same result with or without the patch.

Please feel free to edit the content or the search query in the example above and show me what you mean..

jhodgdon’s picture

Well, I am still able to follow the steps in the issue summary and reproduce this issue. There is also a failing test

jhodgdon’s picture

Sorry, I hit save before I meant to. There is also a failing test in comment #9. I do not know what your simplytest.me sites are doing, or how they are set up, but I can still reproduce the issue by following the steps in the issue summary...

mgifford’s picture

I just loaded Core + the patch in #26 as per:

http://simplytest.me/project/drupal/8.0.x?patch[]=https://www.drupal.org...

If you're logged in you've got 6hrs to test, evaluate, share.

There's a common u/p so it is really easy to pass around and see configurations.

I've looked through the summary. I don't see steps in the form of 1, 2, 3, 4...

By using SimplyTest.me we can look at the exactly the same interface and make sure that we're talking about the same thing. They are both still active [40 minutes left] or you can set up your own.

I can't repeat it.

jhodgdon’s picture

The issue summary gives some sample content and some sample searches. If I put in that sample content, run cron to index for searching, and run the sample searches, I get the same results as reported in the issue summary.

If you are not getting these on simplytest.me, without the patch, I do not know what you did. I am still able to reproduce the original bug report without the patch.

If you made a simplytest.me with the patch, I would assume the problems are fixed with the patch -- the patch has tests with it, which pass with the patch, and I have personally tested the patch and it fixes the problem.

mgifford’s picture

As I said in #33, I tested with & without the patch.

jhodgdon’s picture

OK, I just did the following:
a) Went to simplytest.me
b) Selected Drupal Core version 8.0.x
c) Installed with Standard profile
d) Created a Basic page node with title "Page" and body "I created this page with some very interesting content in it."
e) Went to admin/config/system/cron and ran cron.
f) Went to search/node
g) Searched for
this OR that this OR foo

The node was not found in the results.

h) As a reality check, searched for:
this OR that

The node was found.

My simplytest.me will expire in 12 minutes so I will not bother to paste the URL here. But maybe you can follow those instructions and verify that ***without the patch*** this bug still exists. Then you can proceed to add the patch and follow the same sequence and verify that the search works.

Or you could read the patch and the tests that it adds. There is a test-only patch in #9 that demonstrates that the tests fail without the patch (you can still look at the results there). The latest patch (with tests and code fix) passes.

I don't know what else to suggest. The steps to reproduce the issue are in the issue summary at the top of this page...

pwolanin’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: +Amsterdam2014

Given the clear test case here and clear understanding of the bug, I think this is rtbc.

alexpott’s picture

Version: 8.0.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Committed ca306bf and pushed to 8.0.x. Thanks!

  • alexpott committed ca306bf on 8.0.x
    Issue #1842226 by jhodgdon | spartlow: Fixed Search OR statements don't...
jhodgdon’s picture

Assigned: jhodgdon » Unassigned

Thanks for the commit! I'm probably not going to port this to 7.x any time soon, so... any takers?

  • alexpott committed ca306bf on 8.1.x
    Issue #1842226 by jhodgdon | spartlow: Fixed Search OR statements don't...

  • alexpott committed ca306bf on 8.3.x
    Issue #1842226 by jhodgdon | spartlow: Fixed Search OR statements don't...

  • alexpott committed ca306bf on 8.3.x
    Issue #1842226 by jhodgdon | spartlow: Fixed Search OR statements don't...
Rajab Natshah’s picture

Title: Search OR statements don't work if same keyword is used » Search COR statements don't work if same keyword is used
Rajab Natshah’s picture

Title: Search COR statements don't work if same keyword is used » Search OR statements don't work if same keyword is used
aerozeppelin’s picture

Status: Patch (to be ported) » Needs review
FileSize
840 bytes
5.44 KB

Porting this to D7.

The last submitted patch, 51: test-only-fail-1842226-51.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 51: 1842226-51.patch, failed testing.

  • alexpott committed ca306bf on 8.4.x
    Issue #1842226 by jhodgdon | spartlow: Fixed Search OR statements don't...

  • alexpott committed ca306bf on 8.4.x
    Issue #1842226 by jhodgdon | spartlow: Fixed Search OR statements don't...