Multiple content in the search result, when using multiple terms to filtering [#1863672]

When I wanted to use multiple terms to filter indexed contents on a custom search page which was created in Views, I sadly realised that one node was displayed more than once and the pagination started to behave strange.

I found related issues to the search API module:

#1734470: Views pager behaves strange with multiple-filter
#1840086: Multiple select on indexed taxonomy terms & Views items per page
#1860438: Duplicate query results causing views results with less-than configured items per page

After several days of code digging, I found if the module uses the distinct parameter in db_select in the search method of the SearchApiDbService class, we can avoid this strange behavior.

Comment	File	Size	Author
#19	1863672-19--use_distinct.patch	3.04 KB	drunken monkey
#14	search_api_test-support-for-custom-key-filters-and-listvalues-1863672-14.patch	20.74 KB	bago
#14	search_api_db-testing-support-for-custom-key-filters-and-listvalues-1863672-14.patch	5.79 KB	bago
#9	1863672-6--use_distinct.patch	1.76 KB	drunken monkey
#6	1863672-6--use_distinct.patch	1.76 KB	drunken monkey
#1	search_api_db-mutiple_nodes_in_search_results-1863672.patch	487 bytes	k.dani

Comments

Comment #1

k.dani commented 11 December 2012 at 12:58

Title:	Multiple content in result, when using multiple terms to filtering	» Multiple content in the search result, when using multiple terms to filtering
Status:	Active	» Needs review

Status	File	Size
new	search_api_db-mutiple_nodes_in_search_results-1863672.patch	487 bytes

I attached a patch which solves the problem.

Comment #1.0

agileadam

Maine, USA

commented 8 January 2013 at 20:25

Issue summary:

View changes

change the text

Comment #2

agileadam

Maine, USA

commented 8 January 2013 at 20:31

I came to the same exact solution after some more testing. I wish I would've known you already resolved the issue! I guess it was a good troubleshooting experience :)

At least now I can feel better about the solution knowing someone else came to the same one.

Thanks for posting the patch!
I've updated those other issues accordingly.

Comment #3

k.dani commented 17 January 2013 at 13:46

You are welcome! :) I hope it is useful for everybody having the same problem!

Comment #4

batdesign commented 28 March 2013 at 10:36

Anyone know if this in the dev release? The release notes only state 'Contains the latest developments of the module' which is about as useful as a chocolate teapot.

Any plans to roll this in?

Comment #5

rjacobs commented 5 April 2013 at 22:26

Version:

7.x-1.0-beta4

» 7.x-1.x-dev

@batdesign, the dev release isn't really a "release" per se, it just represents that most recent snapshot of a branch in the working git repository. Because of this the dev release notes are typically not specific to any change or update. You would need to look at the commit history in the repository to see this, such as what's listed at:

http://drupalcode.org/project/search_api_db.git/shortlog/refs/heads/7.x-1.x

Nothing related to this patch has been committed yet, and may not be until a maintainer personally reviews this or it can be confirmed as "ready to be committed" by multiple testers. I just tested this (against the most recent dev) and can confirm that it fixes the problem described, so we certainly have multiple confirming reviews, which is good. What I'm not sure about is if this patch introduces any performance issues or any problems for cases where a query may actually want duplicates to be displayed. I suppose that the use of DISTINCT in the query may be something that needs to be setup as configurable in the view before this fix could be accepted?

Perhaps someone with a bit more knowledge of this module's architecture may be able to confirm this... and we could then mark it as "ready to be committed" so it gets some attention by maintainers.

Comment #6

drunken monkey

he/him

German

Vienna, Austria

commented 25 May 2013 at 15:06

Status	File	Size
new	1863672-6--use_distinct.patch	1.76 KB

Thanks, rjacobs, for helping with this explanation! Always good to have help with such questions.
Also many thanks to k.dani, for reporting this issue and providing a great patch! I just have a few additions: a few other cases in which I think a DISTINCT would be needed. Also, for OR conditions on multi-valued fields, we don't have to join the table repeatedly. Without the DISTINCT, this still won't make results correct, but it improves performance.

So, please have a look aat the attached patch and see if it, too, works for you.
Thanks again!

Comment #7

bago commented 30 May 2013 at 10:22

I think the && $filter->getConjunction() === 'AND' is wrong because you can't know at what level of the filter hierarchy you are and you don't know if that table has been previously added for an AND or an OR.

It is safer to leave that condition as it was before.

Comment #8

drunken monkey

he/him

German

Vienna, Austria

commented 30 May 2013 at 17:34

I think the && $filter->getConjunction() === 'AND' is wrong because you can't know at what level of the filter hierarchy you are and you don't know if that table has been previously added for an AND or an OR.

Have you got an example of a query where this fails? I have one of the other variant failing, so some change is necessary in any case.

Comment #9

drunken monkey

he/him

German

Vienna, Austria

commented 30 May 2013 at 18:18

Status	File	Size
new	1863672-6--use_distinct.patch	1.76 KB

Re-post of #6 to (hopefully) get automated testing.

Comment #10

bago commented 3 June 2013 at 10:51

Have you got an example of a query where this fails? I have one of the other variant failing, so some change is necessary in any case.

I've looked at the test suite to understand how to provide a real test case but I didn't find a simple way to run a search with my custom filter.

fieldA = 1 AND fieldA = 2
is logically identical to
(fieldA = 1 OR false) AND (fieldA = 2 OR false)

In the first case when you add the second condition you find that the conjunction is AND and you add a new join. In the second case when you add the second condition you see the conjuction is OR and you reuse the table producing a different result.

Your choice to reuse or not a table should only depend on the field you are joining and not on the current filter/conjunction because you don't know who added the table or whatever could happen later.

Let's make another example:
(fieldA = 1 OR somethingelse) AND (fieldA = 2 AND somethingelse)
vs
(fieldA = 2 AND somethingelse) AND (fieldA = 1 OR somethingelse)
In the first case you add a new join, in the second case you reuse the previous table but in fact they are the same filter.

Comment #11

drunken monkey

he/him

German

Vienna, Austria

commented 3 June 2013 at 21:03

I've looked at the test suite to understand how to provide a real test case but I didn't find a simple way to run a search with my custom filter.

$q = search_api_query('YOUR_INDEX_ID');
$q->condition('my_field', 1)
$f = $q->createFilter('OR');
$f->condition('my_field2', 'foo');
$f->condition('my_field2', NULL);
$results = $q->execute();

For details, see the documentation for the SearchApiQueryInterface interface in includes/query.inc.
Please try to reproduce your theory in practice and then report back with the code and an explanation of the contents of each used field (single-/multi-valued, type).

It might very well be that you're right, but with the complexity of the DB code in the module it's hard to tell just by discussing it.

Comment #12

bago commented 4 June 2013 at 10:58

I know how to make a custom query (that's what I do in my module having the issue I described). What I don't know is how to do that in the test suite, as I see everything in search_api_db.test runs drupalGet/drupalPost and the real queries are written in the search_api_test module. I believe I have to improve the search_api_test module so to declare an index with "optional multiple list fields", and my "|CUSTOM|" like search call: right?
Or it is better that I reproduce what search_api_test module does (hook_menu/hook_entity_info/hook_entity_property_info/hook_search_api_service_info and custom SearchApiService) in search_api_db.test ? (or maybe there is a simpler way to do that?)

Update: I've currently hacked the search_api_test.module with:
- added a "|CUSTOM|*" type of query that takes a base64(serialized($keys)) input query (so you can pass whatever query you need).
- added an optional "keywords" field of type "list" and the relative form field (as a #multiple select input).
This way I can now write the needed tests in search_api_db.test.

I hope this is the right way to do this and I hope I will be able to produce a proofing test soon.

Comment #13

drunken monkey

he/him

German

Vienna, Austria

commented 4 June 2013 at 12:56

Ah, OK. Sorry, I understood you wrong previously. I actually also didn't mean that you need to write a proper test case – but I guess it would be a big Plus. I already thought that I should work more on including test cases for fixed bugs to prevent later regressions, especially for this module (where it happens too often that fixes in one part unforeseeably break a completely different part), but I rarely have the time and motivation to really do so.

Your approach seems sound, though if you are already modifying the search_api_test.module, I'd suggest copying it to this module altogether, renaming it to search_api_db_test.module and using that one in our tests instead.
Also, I'm not sure we really need the workaround with the page GETs to issue searches on the index – if we could get rid of that part, this would also be great. But that's of course not your problem, you can just add the tests like you planned to. If in doubt, just make things more flexible rather than just adding a very specific use case. (The idea with serialized keys is already a very good step, I think.)

Good luck, in any case, and thanks for your effort!

Comment #14

bago commented 4 June 2013 at 17:33

Status	File	Size
new	search_api_db-testing-support-for-custom-key-filters-and-listvalues-1863672-14.patch	5.79 KB
new	search_api_test-support-for-custom-key-filters-and-listvalues-1863672-14.patch	20.74 KB

Here is what I came with:
- a patch for the search_api_test module so to add support for custom queries and add a list typed field to the search_api_test entity
- a patch for the search_api_db.test so to add new tests proving my theory. You can see a "TODO:" comment on the failing test.
(I'm not sure the patch format is right: this comes from my subversion with "svn diff"... I can provide a different format, but not now, so I wanted to share my progress in the mean time)

I think the changes to the test module are generic enough to be committed there (maybe I should open a search_api issue for this?) as they can be used also by other backends.

To move this even further: many of the tests from search_api_db.test could be made generic and run identical by each backend (the idea is that the search api should provide tests and the backend should be able to pass them, don't you think?) Maybe an abstract class in search_api_test.module could have them and then the single backends could extend from this class (with a configurable constructor so to only test supported features).

This patch doesn't provide a fix to the issue, it just provide a proofing test. The "TODO:" marked test is the only test failing on my local copy where I applied the patch from #9 (and also #2007872: Missing results when using OR filters).

The new "|CUSTOM|" directive make the "|COMPLEX|" directive obsolete: this could be removed from the search_api_test.module if this has been written only for this backend.

Next step?

Comment #15

4 June 2013 at 17:33

Status:

Needs review

» Needs work

The last submitted patch, search_api_db-testing-support-for-custom-key-filters-and-listvalues-1863672-14.patch, failed testing.

Comment #16

drunken monkey

he/him

German

Vienna, Austria

commented 4 June 2013 at 19:49

I think the changes to the test module are generic enough to be committed there (maybe I should open a search_api issue for this?) as they can be used also by other backends.

To move this even further: many of the tests from search_api_db.test could be made generic and run identical by each backend (the idea is that the search api should provide tests and the backend should be able to pass them, don't you think?) Maybe an abstract class in search_api_test.module could have them and then the single backends could extend from this class (with a configurable constructor so to only test supported features).

Hm, yes, you're right, this could be useful for other modules, it's not very specific and it's definitely flexible.
However, as most backends integrate with some external software (Solr, Sphinx, Xapian, …) which mostly can't be tested this way, probably few to no other modules would use this.
Also, depending on an external test module has significant disadvantages when it comes to the test bot here in the issue queue. If an update should still be necessary to the test module, it wouldn't be picked up by the test bot until the next point release (which I myself only recently found out in #1993504: Test seems to use older version of a dependency).

Oh, also, I just tested and it seems it's possible to simply execute a search in the test class, no need to have the page callback for searching. So, forget all that, I'll just rewrite the test class to use no page callback at all and you can add your tests in the normal fashion, without having to change the search_api_test module.

Then we could probably also remove "|COMPLEX|" from the page callback, which would be nice, too, for the simple reason of code clarity.

Regarding your patch, there were a couple of things I noticed. I'll look into the whole thing later, though, when you can provide a complete patch, with the updated way of searching (I'll hopefully have a patch ready tomorrow for the updated test class).

+++ search_api_test.module      (copia locale)
@@ -64,7 +67,12 @@
+    $res->keywords = explode(',', $res->keywords);

This should apply array_filter(), too, or you can't have an empty array. Testing exactly that, however, would probably be interesting.
Hm, while we're at it, we might also set empty fields to NULL, so we can test NULL values, too. For indexing and searching empty fields, there have been numerous errors in the past, I think.

The best option here might be to apply all these conversions at form submit/item creation time (and, in case of the keywords, serializing them) and then only unserializing the keywords when loading.
On the other hand, it's hardly a performance problem in the case of tests, and if it works well this way, why not?

+++ search_api_test.module      (copia locale)
@@ -200,6 +234,19 @@
+function search_api_test_list_callback($data, array $options, $name, $type, $info) {
+  $name = isset($info['schema field']) ? $info['schema field'] : $name;
+  if (is_array($data)) {
+    $res = is_array($data[$name]) ? $data[$name] : explode(',',$data[$name]);
+  } else {
+    $res = is_array($data->$name) ? $data->$name : explode(',',$data->$name);
+  }
+  return $res;

Why is this necessary, shouldn't the code in search_api_test_load() suffice?

Comment #17

bago commented 5 June 2013 at 08:20

If no one but search_api_db is depending on search_api_test.module then why don't we move that module in search_api_db?

About the 2 code snippet you quoted the first one was my first attempt and it didn't work, so I moved to the second one and forgot to remove the first one.

I don't know why the load is not called, and I don't understand if that search_api_test_load is an implementation of https://api.drupal.org/api/drupal/modules!node!node.api.php/function/hoo... or not as the signature is different and the behaviour is not what the hook_load expects.

About the explode+array_filter I admit I wasn't aware of the empty string behaviour (something new to learn after 15 years of PHP ;-) )! And I agree that we should probably better serialize/unserialize the keywords property, or maybe we should serialize the whole entity using a simple 2 column table with id+data where data is the serialization of the whole entity: this way we have better control on how we want an entity to be built. BTW: All of this stuff is about the search_api_test.module: should we move part of this discussion to search_api issue queue or should we move/copy the test module to the search_api_db module? You are the owner of both project, so it should be an easy agreement ;-)

What can I do to help? Should I simply wait for you to refactor the test suite?

PS: while you're at test refactoring I think you should remove most of the "t()" calls in the test assert messages (and keep only the ones required to do the text comparisons). IIRC the testing guide says t() shouldn't be used there.

Comment #18

drunken monkey

he/him

German

Vienna, Austria

commented 6 June 2013 at 12:35

Phew. OK, it cost a lot of sweat and tears, but I managed to refactor the test suite. Came across a nasty little bug in the Search API in the process, which was the main reason I didn't manage to finish this yesterday. That and a very, very stupid coding mistake I should have had figured out in a minute.
Anyways, please see #2013609: Change the workflow of the module's tests for the refactoring, and for a list of issues it depends on.
I also moved our Search API-related discussion to #2013581-5: Add multi-valued field to test module – you're right, that doesn't really concern this issue.

PS: while you're at test refactoring I think you should remove most of the "t()" calls in the test assert messages (and keep only the ones required to do the text comparisons). IIRC the testing guide says t() shouldn't be used there.

Ah, thanks, I didn't know that! Or I did and then forgot again, somehow this sounds familiar … It makes sense, in any case.
Anyways, I now removed the unnecessary t() calls. Quite some work …

(Update on the issue at hand will follow shortly …)

Comment #19

drunken monkey

he/him

German

Vienna, Austria

commented 6 June 2013 at 14:24

Status:

Needs work

» Needs review

Status	File	Size
new	1863672-19--use_distinct.patch	3.04 KB

OK, I could see the tests failing, you were right. This really is a problem for nested filters, since we base our decission on joining on whether there was a join already for the whole query, not just for the conditions on this particular filter.
The attached patch should finally fix this. Just removing the && $filter->getConjunction() === 'AND' would have apparently worked, too, but would have performed significantly worse (in my tests up to 25% slower) due to unnecessary joins.
Please test/review if the patch now works as intended! (The tests in #2013609-5: Change the workflow of the module's tests all pass now, in any case.)

Comment #20

6 June 2013 at 14:26

Status:

Needs review

» Needs work

The last submitted patch, 1863672-19--use_distinct.patch, failed testing.

Comment #21

bago commented 6 June 2013 at 16:00

I'm not sure I will be able to test this before the next week as it involve applying a lot of patches to search_api/search_api_db. I hope to see a new search_api release with #2013581: Add multi-valued field to test module and #2012706: Fix $reset parameter for load functions soon ;-) so that we can see tests running/passing here.

I reviewed the patch and it makes sense: i fear that this final patch will bring back my performance problems #2007874] that I solved using more conditions in the "ON" clause, but we'll see that once I put it in production the whole patched thing.

Thank you for the great work you did!

Comment #22

drunken monkey

he/him

German

Vienna, Austria

commented 7 June 2013 at 13:57

I reviewed the patch and it makes sense: i fear that this final patch will bring back my performance problems #2007874: Performance issues with multivalued (list) fields and multiple OR filter that I solved using more conditions in the "ON" clause, but we'll see that once I put it in production the whole patched thing.

Hm, it should at least be a lot better than without the patch. So, yes, please just see how it goes and then maybe update your patch there. But we should naturally first make it correct, before we make it fast(er).

Thank you for the great work you did!

Thank you for your meticulous testing and review of the patch! It's always great to get such accurate feedback.

Currently, there is something severely wrong with the test bot for this module anyhow. See the failing test here (and in all other issues): it simply fails to enable the required modules and terminates.
However, I don't think we need to wait for the next Search API release. I already ran the tests locally and it came out fine. I'd just let the patch lie here for a few days and then, if no-one objects, commit it. A new Search API release probably won't be created until the end of the month, and I'd like to have this committed before that.

Comment #23

bago commented 7 June 2013 at 14:58

I've built search_api and search_api_db starting from the latest devs and applying the following patches:
#2012706: Fix $reset parameter for load functions
./search_api/2012706-1--reset_parameter_for_load_functions.patch
#2013581: Add multi-valued field to test module
./search_api/2013581-6--test_entity_keywords_field.patch
#1863672: Multiple content in the search result, when using multiple terms to filtering
./search_api_db/1863672-19--use_distinct.patch
#2013609: Change the workflow of the module's tests
./search_api_db/2013609-5--test_overhaul--do-not-test.patch
#2012688: Fix left-over data when executing two searches in a row
./search_api_db/2012688-1--reset_ignored_and_warnings.patch

The test suite passes here, too.

I tested the modules in dev environment and it worked fine, so I just pushed to 1 production server.
I also tested the query that was hanging my server in #2007874: Performance issues with multivalued (list) fields and multiple OR filter and it is working (1-2 seconds).

Comment #24

bago commented 7 June 2013 at 15:00

Status:

Needs work

» Needs review

Comment #25

drunken monkey

he/him

German

Vienna, Austria

commented 7 June 2013 at 15:13

Excellent, thanks again! Good to know.

Comment #26

bago commented 13 June 2013 at 16:44

Status:

Needs review

» Reviewed & tested by the community

1 week in production and everything is fine.

Comment #27

drunken monkey

he/him

German

Vienna, Austria

commented 14 June 2013 at 13:34

Status:

Reviewed & tested by the community

» Fixed

Excellent, good to hear. Thanks again for your help!
Committed.

Comment #28

drunken monkey

he/him

German

Vienna, Austria

commented 14 June 2013 at 14:00

Excellent, good to hear. Thanks again for your help!
Committed.

Comment #29

28 June 2013 at 14:00

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Comment #29.0

(not verified) commented 28 June 2013 at 14:00

Issue summary:

View changes

Updates links to other issues to use project issue automatic linking instead of absolute URLs.

Multiple content in the search result, when using multiple terms to filtering

Comments

Comment #1

Comment #1.0

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

Comment #10

Comment #11

Comment #12

Comment #13

Comment #14

Comment #15

Comment #16

Comment #17

Comment #18

Comment #19

Comment #20

Comment #21

Comment #22

Comment #23

Comment #24

Comment #25

Comment #26

Comment #27

Comment #28

Comment #29

Comment #29.0

Referenced by

News items

Our community

Documentation

Drupal code base

Governance of community