Closed (fixed)
Project:
Search API Database Search
Version:
7.x-1.x-dev
Component:
Code
Priority:
Major
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
11 Dec 2012 at 12:48 UTC
Updated:
28 Jun 2013 at 14:00 UTC
Jump to comment: Most recent file
Comments
Comment #1
k.dani commentedI attached a patch which solves the problem.
Comment #1.0
agileadamchange the text
Comment #2
agileadamI came to the same exact solution after some more testing. I wish I would've known you already resolved the issue! I guess it was a good troubleshooting experience :)
At least now I can feel better about the solution knowing someone else came to the same one.
Thanks for posting the patch!
I've updated those other issues accordingly.
Comment #3
k.dani commentedYou are welcome! :) I hope it is useful for everybody having the same problem!
Comment #4
batdesign commentedAnyone know if this in the dev release? The release notes only state 'Contains the latest developments of the module' which is about as useful as a chocolate teapot.
Any plans to roll this in?
Comment #5
rjacobs commented@batdesign, the dev release isn't really a "release" per se, it just represents that most recent snapshot of a branch in the working git repository. Because of this the dev release notes are typically not specific to any change or update. You would need to look at the commit history in the repository to see this, such as what's listed at:
http://drupalcode.org/project/search_api_db.git/shortlog/refs/heads/7.x-1.x
Nothing related to this patch has been committed yet, and may not be until a maintainer personally reviews this or it can be confirmed as "ready to be committed" by multiple testers. I just tested this (against the most recent dev) and can confirm that it fixes the problem described, so we certainly have multiple confirming reviews, which is good. What I'm not sure about is if this patch introduces any performance issues or any problems for cases where a query may actually want duplicates to be displayed. I suppose that the use of DISTINCT in the query may be something that needs to be setup as configurable in the view before this fix could be accepted?
Perhaps someone with a bit more knowledge of this module's architecture may be able to confirm this... and we could then mark it as "ready to be committed" so it gets some attention by maintainers.
Comment #6
drunken monkeyThanks, rjacobs, for helping with this explanation! Always good to have help with such questions.
Also many thanks to k.dani, for reporting this issue and providing a great patch! I just have a few additions: a few other cases in which I think a
DISTINCTwould be needed. Also, forORconditions on multi-valued fields, we don't have to join the table repeatedly. Without theDISTINCT, this still won't make results correct, but it improves performance.So, please have a look aat the attached patch and see if it, too, works for you.
Thanks again!
Comment #7
bago commentedI think the && $filter->getConjunction() === 'AND' is wrong because you can't know at what level of the filter hierarchy you are and you don't know if that table has been previously added for an AND or an OR.
It is safer to leave that condition as it was before.
Comment #8
drunken monkeyHave you got an example of a query where this fails? I have one of the other variant failing, so some change is necessary in any case.
Comment #9
drunken monkeyRe-post of #6 to (hopefully) get automated testing.
Comment #10
bago commentedI've looked at the test suite to understand how to provide a real test case but I didn't find a simple way to run a search with my custom filter.
fieldA = 1 AND fieldA = 2
is logically identical to
(fieldA = 1 OR false) AND (fieldA = 2 OR false)
In the first case when you add the second condition you find that the conjunction is AND and you add a new join. In the second case when you add the second condition you see the conjuction is OR and you reuse the table producing a different result.
Your choice to reuse or not a table should only depend on the field you are joining and not on the current filter/conjunction because you don't know who added the table or whatever could happen later.
Let's make another example:
(fieldA = 1 OR somethingelse) AND (fieldA = 2 AND somethingelse)
vs
(fieldA = 2 AND somethingelse) AND (fieldA = 1 OR somethingelse)
In the first case you add a new join, in the second case you reuse the previous table but in fact they are the same filter.
Comment #11
drunken monkeyFor details, see the documentation for the
SearchApiQueryInterfaceinterface inincludes/query.inc.Please try to reproduce your theory in practice and then report back with the code and an explanation of the contents of each used field (single-/multi-valued, type).
It might very well be that you're right, but with the complexity of the DB code in the module it's hard to tell just by discussing it.
Comment #12
bago commentedI know how to make a custom query (that's what I do in my module having the issue I described). What I don't know is how to do that in the test suite, as I see everything in search_api_db.test runs drupalGet/drupalPost and the real queries are written in the search_api_test module. I believe I have to improve the search_api_test module so to declare an index with "optional multiple list fields", and my "|CUSTOM|" like search call: right?
Or it is better that I reproduce what search_api_test module does (hook_menu/hook_entity_info/hook_entity_property_info/hook_search_api_service_info and custom SearchApiService) in search_api_db.test ? (or maybe there is a simpler way to do that?)
Update: I've currently hacked the search_api_test.module with:
- added a "|CUSTOM|*" type of query that takes a base64(serialized($keys)) input query (so you can pass whatever query you need).
- added an optional "keywords" field of type "list" and the relative form field (as a #multiple select input).
This way I can now write the needed tests in search_api_db.test.
I hope this is the right way to do this and I hope I will be able to produce a proofing test soon.
Comment #13
drunken monkeyAh, OK. Sorry, I understood you wrong previously. I actually also didn't mean that you need to write a proper test case – but I guess it would be a big Plus. I already thought that I should work more on including test cases for fixed bugs to prevent later regressions, especially for this module (where it happens too often that fixes in one part unforeseeably break a completely different part), but I rarely have the time and motivation to really do so.
Your approach seems sound, though if you are already modifying the
search_api_test.module, I'd suggest copying it to this module altogether, renaming it tosearch_api_db_test.moduleand using that one in our tests instead.Also, I'm not sure we really need the workaround with the page GETs to issue searches on the index – if we could get rid of that part, this would also be great. But that's of course not your problem, you can just add the tests like you planned to. If in doubt, just make things more flexible rather than just adding a very specific use case. (The idea with serialized
keysis already a very good step, I think.)Good luck, in any case, and thanks for your effort!
Comment #14
bago commentedHere is what I came with:
- a patch for the search_api_test module so to add support for custom queries and add a list typed field to the search_api_test entity
- a patch for the search_api_db.test so to add new tests proving my theory. You can see a "TODO:" comment on the failing test.
(I'm not sure the patch format is right: this comes from my subversion with "svn diff"... I can provide a different format, but not now, so I wanted to share my progress in the mean time)
I think the changes to the test module are generic enough to be committed there (maybe I should open a search_api issue for this?) as they can be used also by other backends.
To move this even further: many of the tests from search_api_db.test could be made generic and run identical by each backend (the idea is that the search api should provide tests and the backend should be able to pass them, don't you think?) Maybe an abstract class in search_api_test.module could have them and then the single backends could extend from this class (with a configurable constructor so to only test supported features).
This patch doesn't provide a fix to the issue, it just provide a proofing test. The "TODO:" marked test is the only test failing on my local copy where I applied the patch from #9 (and also #2007872: Missing results when using OR filters).
The new "|CUSTOM|" directive make the "|COMPLEX|" directive obsolete: this could be removed from the search_api_test.module if this has been written only for this backend.
Next step?
Comment #16
drunken monkeyHm, yes, you're right, this could be useful for other modules, it's not very specific and it's definitely flexible.
However, as most backends integrate with some external software (Solr, Sphinx, Xapian, …) which mostly can't be tested this way, probably few to no other modules would use this.
Also, depending on an external test module has significant disadvantages when it comes to the test bot here in the issue queue. If an update should still be necessary to the test module, it wouldn't be picked up by the test bot until the next point release (which I myself only recently found out in #1993504: Test seems to use older version of a dependency).
Oh, also, I just tested and it seems it's possible to simply execute a search in the test class, no need to have the page callback for searching. So, forget all that, I'll just rewrite the test class to use no page callback at all and you can add your tests in the normal fashion, without having to change the search_api_test module.
Then we could probably also remove "|COMPLEX|" from the page callback, which would be nice, too, for the simple reason of code clarity.
Regarding your patch, there were a couple of things I noticed. I'll look into the whole thing later, though, when you can provide a complete patch, with the updated way of searching (I'll hopefully have a patch ready tomorrow for the updated test class).
This should apply
array_filter(), too, or you can't have an empty array. Testing exactly that, however, would probably be interesting.Hm, while we're at it, we might also set empty fields to
NULL, so we can testNULLvalues, too. For indexing and searching empty fields, there have been numerous errors in the past, I think.The best option here might be to apply all these conversions at form submit/item creation time (and, in case of the keywords, serializing them) and then only unserializing the keywords when loading.
On the other hand, it's hardly a performance problem in the case of tests, and if it works well this way, why not?
Why is this necessary, shouldn't the code in
search_api_test_load()suffice?Comment #17
bago commentedIf no one but search_api_db is depending on search_api_test.module then why don't we move that module in search_api_db?
About the 2 code snippet you quoted the first one was my first attempt and it didn't work, so I moved to the second one and forgot to remove the first one.
I don't know why the load is not called, and I don't understand if that search_api_test_load is an implementation of https://api.drupal.org/api/drupal/modules!node!node.api.php/function/hoo... or not as the signature is different and the behaviour is not what the hook_load expects.
About the explode+array_filter I admit I wasn't aware of the empty string behaviour (something new to learn after 15 years of PHP ;-) )! And I agree that we should probably better serialize/unserialize the keywords property, or maybe we should serialize the whole entity using a simple 2 column table with id+data where data is the serialization of the whole entity: this way we have better control on how we want an entity to be built. BTW: All of this stuff is about the search_api_test.module: should we move part of this discussion to search_api issue queue or should we move/copy the test module to the search_api_db module? You are the owner of both project, so it should be an easy agreement ;-)
What can I do to help? Should I simply wait for you to refactor the test suite?
PS: while you're at test refactoring I think you should remove most of the "t()" calls in the test assert messages (and keep only the ones required to do the text comparisons). IIRC the testing guide says t() shouldn't be used there.
Comment #18
drunken monkeyPhew. OK, it cost a lot of sweat and tears, but I managed to refactor the test suite. Came across a nasty little bug in the Search API in the process, which was the main reason I didn't manage to finish this yesterday. That and a very, very stupid coding mistake I should have had figured out in a minute.
Anyways, please see #2013609: Change the workflow of the module's tests for the refactoring, and for a list of issues it depends on.
I also moved our Search API-related discussion to #2013581-5: Add multi-valued field to test module – you're right, that doesn't really concern this issue.
Ah, thanks, I didn't know that! Or I did and then forgot again, somehow this sounds familiar … It makes sense, in any case.
Anyways, I now removed the unnecessary
t()calls. Quite some work …(Update on the issue at hand will follow shortly …)
Comment #19
drunken monkeyOK, I could see the tests failing, you were right. This really is a problem for nested filters, since we base our decission on joining on whether there was a join already for the whole query, not just for the conditions on this particular filter.
The attached patch should finally fix this. Just removing the
&& $filter->getConjunction() === 'AND'would have apparently worked, too, but would have performed significantly worse (in my tests up to 25% slower) due to unnecessary joins.Please test/review if the patch now works as intended! (The tests in #2013609-5: Change the workflow of the module's tests all pass now, in any case.)
Comment #21
bago commentedI'm not sure I will be able to test this before the next week as it involve applying a lot of patches to search_api/search_api_db. I hope to see a new search_api release with #2013581: Add multi-valued field to test module and #2012706: Fix $reset parameter for load functions soon ;-) so that we can see tests running/passing here.
I reviewed the patch and it makes sense: i fear that this final patch will bring back my performance problems #2007874] that I solved using more conditions in the "ON" clause, but we'll see that once I put it in production the whole patched thing.
Thank you for the great work you did!
Comment #22
drunken monkeyHm, it should at least be a lot better than without the patch. So, yes, please just see how it goes and then maybe update your patch there. But we should naturally first make it correct, before we make it fast(er).
Thank you for your meticulous testing and review of the patch! It's always great to get such accurate feedback.
Currently, there is something severely wrong with the test bot for this module anyhow. See the failing test here (and in all other issues): it simply fails to enable the required modules and terminates.
However, I don't think we need to wait for the next Search API release. I already ran the tests locally and it came out fine. I'd just let the patch lie here for a few days and then, if no-one objects, commit it. A new Search API release probably won't be created until the end of the month, and I'd like to have this committed before that.
Comment #23
bago commentedI've built search_api and search_api_db starting from the latest devs and applying the following patches:
#2012706: Fix $reset parameter for load functions
./search_api/2012706-1--reset_parameter_for_load_functions.patch
#2013581: Add multi-valued field to test module
./search_api/2013581-6--test_entity_keywords_field.patch
#1863672: Multiple content in the search result, when using multiple terms to filtering
./search_api_db/1863672-19--use_distinct.patch
#2013609: Change the workflow of the module's tests
./search_api_db/2013609-5--test_overhaul--do-not-test.patch
#2012688: Fix left-over data when executing two searches in a row
./search_api_db/2012688-1--reset_ignored_and_warnings.patch
The test suite passes here, too.
I tested the modules in dev environment and it worked fine, so I just pushed to 1 production server.
I also tested the query that was hanging my server in #2007874: Performance issues with multivalued (list) fields and multiple OR filter and it is working (1-2 seconds).
Comment #24
bago commentedComment #25
drunken monkeyExcellent, thanks again! Good to know.
Comment #26
bago commented1 week in production and everything is fine.
Comment #27
drunken monkeyExcellent, good to hear. Thanks again for your help!
Committed.
Comment #28
drunken monkeyExcellent, good to hear. Thanks again for your help!
Committed.
Comment #29.0
(not verified) commentedUpdates links to other issues to use project issue automatic linking instead of absolute URLs.