Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
In testing #681760: Try to improve performance and eliminate duplicates caused by node_access table joins, we found that we could not replicate this proposed Drupal 6 query syntax in Drupal 7.
SELECT n.nid, n.sticky, n.created FROM node n WHERE (EXISTS (SELECT na.nid FROM node_access na WHERE n.nid = na.nid AND (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 0 AND na.realm = 'domain_site') OR (na.gid = 0 AND na.realm = 'domain_id'))))) AND ( n.promote = 1 AND n.status = 1 )ORDER BY n.sticky DESC, n.created DESC LIMIT 0, 10;
When using $query->condition(), the fieldname is required, so we are stuck with something like:
$query->condition('n.nid', $subquery, 'EXISTS'), which returns malformed SQL.
SELECT n.nid, n.sticky, n.created FROM node n WHERE ( n.promote = 1 AND n.status = 1 ) AND (n.nid EXISTS (SELECT na.nid FROM node_access na WHERE n.nid = na.nid AND (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 0 AND na.realm = 'domain_site') OR (na.gid = 0 AND na.realm = 'domain_id'))))) ANDORDER BY n.sticky DESC, n.created DESC LIMIT 0, 10;
The documentation at http://drupal.org/node/310086 suggests that EXISTS is not properly supported by DBTNG:
Subselects are generally useful only in two cases: Where the subselect results in only a single row and value returned and the operator is =, <, >, <=, or >=; or when the subselect returns a single column of information and the operator is IN. Most other combination would result in a syntax error.
So, how to fix?
Comment | File | Size | Author |
---|---|---|---|
#42 | drupal-1001242-42-revise-exists-test.patch | 2.05 KB | Steven Jones |
#41 | 1001242-41-revise-exists-test.patch | 1.89 KB | bfroehle |
#34 | 1001242-select-exists.patch | 7.6 KB | agentrickard |
#29 | 1001242-select-exists.patch | 7.6 KB | Crell |
#12 | 1001242-exists-with-tests.patch | 7.5 KB | agentrickard |
Comments
Comment #1
pwolanin CreditAttribution: pwolanin commentedCan you supply an empty string as the field name?
update: Ken says no.
Comment #2
pwolanin CreditAttribution: pwolanin commentedSome similar:
"WHERE NOT EXISTS"
"WHERE NOT (boolean expression)"
but generally this is an unusual construction it seems.
Comment #3
pwolanin CreditAttribution: pwolanin commentedSeems like to do this in a sane way we need to make the condition interface more complete, e.g.:
Comment #4
webchickThis sounds fairly non-obtrusive, so as long as there are tests, I'm comfortable squeezing this in to make our node access queries performant.
Comment #5
Crell CreditAttribution: Crell commentedI second Angie's comment. With unit tests this should be a no-break addition so I'm comfortable with it. We want fast node access queries. :-)
Comment #6
webchickIncidentally, can we get confirmation that EXISTS isn't some weird MySQL-ism? I never ran into that clause before.
Comment #7
Crell CreditAttribution: Crell commentedIt certainly appears to be ANSI:
http://www.firstsql.com/iexist3.htm
http://www.databasejournal.com/features/mssql/article.php/1438001/ANSI-J...
http://www.oreillynet.com/pub/a/network/2002/04/08/compsets.html
http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx
(There's actually some interesting data in those links for someone working on this patch to consider...)
Comment #8
agentrickardEXISTS is ANSI standard.
http://www.sqlite.org/lang_expr.html [Edit, better link.]
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries....
http://www.postgresql.org/docs/7.4/interactive/functions-subquery.html
Comment #9
agentrickardHere ya go, tests.
If this goes in, #681760: Try to improve performance and eliminate duplicates caused by node_access table joins needs revision to remove patch duplication.
NOTE: The basic db_select() syntax is not tested for other conditionals in core. Only for simple '=' operations. So this patch was held to a higher standard than the rest of core. Tests for items like BETWEEN are limited to Update queries. Someone else needs to fix that in another issue.
Comment #10
Crell CreditAttribution: Crell commentedWouldn't pwolanin's approach in #3 be cleaner? That seems like it would be far more self-documenting, and make it easier to support the various variants of EXISTS that chx tells me, er, exist. (SOME, ALL, ANY.)
Comment #11
agentrickardI knew you would say that. Doing so puts that squarely on you, IMO.
Comment #12
agentrickardThis is as far as I can go. The rest is a black box of OO Fu.
Something actually has to implement adding the EXISTS clause to the query string.
Comment #13
chx CreditAttribution: chx commentedANSI SQL has more quantifiers (namely ANY/SOME, ALL) we are not adding support for them because we ... can't. And that's fine. Noone uses them anyways :) Your query looks like
SELECT * FROM node WHERE nid > ANY (SELECT nid FROM node_access)
. Read http://www.postgresql.org/docs/8.3/static/functions-subquery.html for more. Edit: this makes ANY and ALL unsupportable with just condition so we would need methods for these two.So at this point adding more methods Edit: for just EXISTS is a farce and pointless. I am RTBC'ing #
89. It has tests and does what we need. Making exists a new method is entirely pointless -- we will do that alongside with ANY/SOME and ALL in Drupal 8. With an eye on backport.Comment #14
chx CreditAttribution: chx commentedSorry I changed the title before I decided what I wanted :)
Comment #15
pwolanin CreditAttribution: pwolanin commentedGicen that most of us have never even though of using "EXISTS" in a query, and we are still not sure whether it performs better for the node access use than "IN", I think the simpler patch in #9 is probably the right thing for now.
Comment #17
chx CreditAttribution: chx commentedWe had a long chat on IRC about this, let me try to sum up. The problem is that we are two RCs in and I really would like to keep with current syntax. That's possible with
->condition('', $subquery, 'EXISTS')
and it's really similar to the currentIN
support. This is as far as I would go with Drupal 7.Now, you can't do that with
->condition('nid', $subquery, '> ANY')
because you would need to map> ANY
and all such constructs separately. That's IMO not acceptable as it would immediately triple the size of mapConditionOperator needlessly. So for complete quantifier support we need new method(s) or changing the condition function to be->condition('', $subquery, '>', 'ANY')
and then figure out mapping but that's Drupal 8 territory.Comment #18
agentrickard@pwolanin
Were you ever able to do any query/load testing on the difference between IN and EXISTS in #681760: Try to improve performance and eliminate duplicates caused by node_access table joins? I suppose I can fire up 10,000 nodes and give it a shot.
Comment #19
webchick1) This still needs Crell's sign-off.
2) Do we still need this? #681760: Try to improve performance and eliminate duplicates caused by node_access table joins is talking about IN queries now, which we already have support for.
Comment #20
agentrickardI'm waiting for @pwolanin or another power SQL user to tell us exactly which query format performs better. I also think we might have a COUNT query problem, since the current patch in #681760: Try to improve performance and eliminate duplicates caused by node_access table joins creates a COUNT(*) query with _two_ subselects.
Comment #21
pwolanin CreditAttribution: pwolanin commentedI think we want this in our tool kit even if we decide to go with IN for the other patch.
Comment #22
Crell CreditAttribution: Crell commentedWe do want to have the utility methods. Relying on the implementation detail that right now a '' happens to work does not make EXISTS supported. It's a clever hack. Even if we use that fact internally it should not be exposed to callers.
Assigning to me to handle the extra methods.
Comment #23
chx CreditAttribution: chx commentedWell then.
Comment #24
agentrickard@chx
If we bump that to D8, then the other patch has to get blessed using IN syntax. Which seems fine to me, but someone who is more critical of database performance really needs to stress-test that.
Comment #25
Crell CreditAttribution: Crell commented*sigh*
Comment #26
agentrickard@Crell, AFAIK the EXISTS v. IN doesn't make a bit of performance difference. But I should not have final say.
Comment #27
webchickWhen I talked to Narayan about this the other day, he said he thought the MySQL query optimizer would automatically convert IN to EXISTS but he wasn't totally sure.
Comment #28
agentrickardIt seems to. Naryan and I were discussing that in IRC. But using a native EXISTS is likely better, since it avoids this conversion.
Comment #29
Crell CreditAttribution: Crell commentedWith a utility method. This is based on agentrickard's patch in #9, so credit accordingly.
The reason why we need a method for exists() but don't need one for ANY etc. is that the others do require a field. EXISTS does not. Therefore it is perfectly natural to do ->condition($field, $subquery, '< ANY') or ->condition($field, $subquery, '> ALL') just as we already routinely do ->condition($field, $subquery, 'IN'). However, EXISTS has no field, and we should not rely on the quirk of the current implementation that an empty string for $field just happens to work right now. Instead we wrap that into a utility method that takes advantage of that fact, offering a better separation between interface and implementation.
While MySQL may optimize EXISTS into IN or vice-versa, that doesn't mean every database will do so. Better to support the real SQL syntax and let the specific database handle it. That's what it's good at.
Comment #30
chx CreditAttribution: chx commentedYou still can't do '< ANY' because there is no mapping for it and it needs a mapping and we do nto want to add a mapping for every combination of possible operation + some/any/all.... So you need methods or a double op condition which is too big a change for now. So is #29.
Comment #31
Crell CreditAttribution: Crell commentedmapConditionOperator() is not a whitelist, just a translation list. If you map an operator it doesn't know about, it just upper-cases it and turns it into the array structure the compiler expects. I do believe your statement that we cannot do "< ANY" is false.
And seriously, we've done far worse to an RC before in the past 2 weeks.
Comment #32
chx CreditAttribution: chx commentedyou need prefix/postfix ( ) for < ANY to work and those come from mapping.
Comment #33
andypostI prefer using EXISTS from my oracle's times - if internal dataset is bigger then 10-20 rows then EXISTS gives more performance and eats less memory. I've googled around postgesql and it seems they are adopted same technique for IN and EXISTS
So +1 for commiting this! I think there would be follow-ups to optimize some queries by changing IN into EXISTS
Comment #34
agentrickardThere were two tabs in the patch, this works as expected.
Comment #35
agentrickardNote that this is not really an API change. It is a correction of an oversight, since EXISTS and NOT EXISTS are ANSI standard.
Comment #36
chx CreditAttribution: chx commentedFor the sake of the node patch and peace, let it be but I am going to use this issue as a major argument for backporting ANY/SOME/ALL support into D7.1
Comment #37
Crell CreditAttribution: Crell commentedGrrr Eclipse and the 15 places you have to change a setting to get rid of tabs...
For the record, I am not adverse to ANY/ALL backports at a later date if they can be done cleanly. (And chx's point in #32 is valid.)
Comment #38
webchickI'm not adverse to that either as long as they're similar to this patch: minimally invasive, doesn't break any existing code, documented, comes with ample tests. ;)
Committed to HEAD. Thanks!
Comment #39
webchickMeh. This broke SQLite:
http://ci.drupal.org/result.php?job=drupal-test-sqlite&build=2011-01-03_...
Fail Other database_test.test 1712
Fetched name is correct using EXISTS query.
Comment #40
bfroehle CreditAttribution: bfroehle commentedHere's my analysis of the failing SQLite test.
The SQL test in question, testExistsSubquerySelect(), relies on two tables,
{test}
and{test_people}
.It then builds and runs the following query:
In both MySQL and SQLite, this returns all rows from
{test}
... however the ordering is different and this breaks the test.In SQLite this returns:
and it MySQL
I think this test code is not what the author intended -- I think the goal was to just select one row from
{test}
.Comment #41
bfroehle CreditAttribution: bfroehle commentedI've revised and simplified the test to now test the following query:
Tests pass locally in both MySQL and SQLite.
Comment #42
Steven Jones CreditAttribution: Steven Jones commentedSeems like the original intention of the exists test was to only select rows from test that have a matching name in the test_people table, i.e. just George.
Attached patch corrects the tests to do that.
Passing on PostgreSQL.
(This doesn't need to be assigned to Crell any more)
Comment #43
bfroehle CreditAttribution: bfroehle commentedThe patch in #42 is ready to go. It clearly captures the test author's original intent and adds some comments so it's clear what it is testing for. I've verified that SQLite passes on my local machine, Steven Jones verifies it works for PostgreSQL, and the testbot checked MySQL, so we are all set.
As a nice side benefit, this will make the SQLite Continuous Integration testing pass.
Comment #44
webchickAwesome work folks, thanks!!
Committed to HEAD.
Comment #45
Crell CreditAttribution: Crell commentedNice Sleuthing, guys!
Comment #46
agentrickardAnd correct, btw, my intent was to return only the 'George' row.