With the new 6.x mysql requirements (see 105855), we can now use sub-queries instead of temporary tables. Drupal currently does not allow sub-queries. And the implementation of db_rewrite_query actually prevents them from working.
Should we allow sub-queries? I hope so. Sub queries will eliminate many needs for db_query_temporary, which in turn will allow these queries to be cached. There's a performance gain to be had in making this work. The one I've been working on, is in the search engine. I think that if we can get this to work, page 2 3 4, etc, will use cached results. As it works now, going from page to page of the search results has to re-run the entire query.
Although there are other use cases, I'm only concerned with sub-queries as part of a JOIN. This is a somewhat simplified example of the actual query:
SELECT node.nid, temp_vfs.score, temp_vfs.score AS score FROM node node INNER JOIN (SELECT n.nid, i.score FROM node n LEFT JOIN search_index i ON n.nid=i.sid WHERE i.fromsid=0 AND i.word IN ('lorem') GROUP BY n.nid HAVING COUNT(*)=1) temp_vfs ON node.nid = temp_vfs.nid WHERE node.status = '1'
The problem is that db_rewrite_query does a blind str_replace on the WHERE in this statement and adds the rewritten JOIN clause to both uses of WHERE. It's not a simple problem. db_rewrite_query also looks for GROUP and LIMIT which can occur in the sub-query.
As this currently works, when the user is not logged in, node_access adds JOIN information, and I end up with the following incorrect query (simplified again):
SELECT node.nid, temp_vfs.score FROM node node INNER JOIN (SELECT n.nid, i.score FROM node n LEFT JOIN search_index i ON n.nid=i.sid INNER JOIN node_access na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( i.fromsid=0 AND i.word IN ('lorem') ) GROUP BY n.nid HAVING COUNT(*)=1) temp_vfs ON node.nid = temp_vfs.nid INNER JOIN node_access na ON na.nid = node.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 1 AND na.realm = 'workflow_access') OR (na.gid = 0 AND na.realm = 'workflow_access_owner'))) AND ( (node.status = '1') )
I think that the only technical problem with this SQL is that the sub-query portion references node.nid when it should be n.nid. But the node_access JOIN in the sub-query wasn't expected or needed. I think that it would be better to limit the db_rewrite_query only to SQL not part of a sub-query.
I plan on offering a solution, but I first want to document the problem.
See also #143888 (support sub-queries in views) and views_fastsearch 5.x-dev release notes.
| Comment | File | Size | Author |
|---|---|---|---|
| #38 | db_rewrite_php4_Drupal63.diff | 1.92 KB | emok |
| #34 | db_rewrite_sql.patch | 1.02 KB | ingo86 |
| #26 | db_rewrite_sql.patch | 2.33 KB | chx |
| #23 | db_rewrite_sql.patch | 2.27 KB | chx |
| #21 | db_rewrite_sql.patch | 2.34 KB | chx |
Comments
Comment #1
moshe weitzman commentedone obvious solution is to not send anything through db_rewrite_sql that should be rewritten. the caller can inject his subquery afterwards. if the subquery has to be rewritten, he can send that through separately.
an alternate solution is for the caller to add a hint in the db_rewrite_sql() call and replace some token with the suquery with example_rewrite_sql(). in other words, implement the hook yourself and inject there.
there may be better options though.
Comment #2
douggreen commentedThis patch is an improvement over the current implementation in that it will properly handle all cases where the query (not the sub-query) has a WHERE clause. It does this by looking for the position of the last WHERE in the clause and does not change anything before it.
For the one case that this does not solve, that of a query that uses sub-queries with WHERE's in the sub-query but no WHERE in the query, the developer can easily resolve this by simply adding "WHERE (1)" to the query.
Comment #3
douggreen commentedThe above patch was rolled against 5.x, but works on 6.x too.
Comment #4
douggreen commentedpgsql supports sub-queries as of 7.1 and mysql supports sub-queries as of 4.1. The current (5.x) Drupal requirements for pgsql of 7.3 already supports sub-queries and the coming 6.x requirements for mysql will support sub-queries.
Comment #5
webchickThis is a patch.
Comment #6
catchwhich no longer applies.
Comment #7
douggreen commentedRe-rolled for the latest 6.x Drupal Beta 4. Because both databases minimum versions support subqueries, contrib modules will try to use subqueries. Shipping 6.x without subquery support is IMO a bug so I've changed this from a feature request to a bug.
Comment #9
douggreen commentedComment #10
douggreen commentedWhile re-reading the comments, I'd like to slightly contradict what I said in #2 above. I think that this is actually an adequate long term fix. With this fix, if someone did have a db_query SQL string that didn't have a WHERE clause in the primary SQL, could simple add
WHERE (1), and thus get around the restriction.To also clarify, I do not think that this is not a problem with the new search implementation... I was able to implement that without subqueries, except for the COUNT(*) version, that I don't think is affected by this issue.
Comment #11
catchEr, could you clarify - it's not a problem, or it is a problem?
subqueries are lovely and it'd be great to get this in, but I'm not sure how to test this..
Comment #12
douggreen commentedThis is NOT a problem with the new search implementation. Sorry if this was confusing. I only made this comment in #10 because the initial issue refers to views_fastsearch, and I didn't want people to think that this was a bug in the new search implementation. It's not. This is a bug for anyone who wants to write subqueries with WHERE clauses.
It's kinda hard to test, because it changes db_rewrite_query. To test, first I'd like to make sure that it doesn't break any existing queries. I'd apply the patch and run a full suite of tests, although that's never going to be exhaustive. Maybe keep this patch installed for a day while you test other things. Then I'd create a custom module that has a subquery that has a WHERE in the query and a WHERE in the subquery, and make sure that works.
What would really be helpful, is a code review from a seasoned developer. So that someone else can review the code, this is what it does:
First, it finds the last position of WHERE in the query. The current code finds the first position. Finding the first position of WHERE fails when you have a WHERE in your subquery. For example, this would fail with the current code, but will work with the proposed patch. This is just an example, you'd probably never write this:
SELECT node.nid FROM node INNER JOIN (SELECT uid FROM node WHERE uid > 1) u ON node.uid = u.uid WHERE node.uid > 1Second, it only looks for GROUP, HAVING, ORDER and LIMIT after the last WHERE clause. This example looks for nodes from (non-admin uid=1) user's that have authored two or more nodes:
SELECT node.nid FROM node INNER JOIN (SELECT uid FROM node WHERE uid > 1 GROUP BY uid HAVING COUNT(*) > 1) u ON node.uid = u.uid WHERE node.uid > 1The GROUP BY clause is in the subquery, not the primary query, so the search for GROUP should fail. And the replacement string will properly be based on the primary queries WHERE clause.
The proposed patch's test for WHERE is not perfect. For example, in the above examples, we really don't need the second WHERE clause -- it would be functionally equivalent without it. But the primary query must always have a WHERE clause. In cases that you don't really need the WHERE clause, I propose that developers just add
WHERE 1. For example,SELECT node.nid FROM node INNER JOIN (SELECT uid FROM node WHERE uid > 1) u ON node.uid = u.uid WHERE 1This writeup helped me, because I think I found a bug in my patch. Attached is a new patch.
Comment #13
catchThanks for the write-up. I can confirm that testing with all core modules enabled, adding, deleting users, nodes, taxonomy terms etc. is fine. Complex queries like search and tracker are unaffected. Wish I could help with the seasoned developer bit!
Comment #14
moshe weitzman commentedi am a bit bold tonight and calling this critical. IMO, we upped system requirements in part so we can use subqueries in Drupal6. we can't have a key function like db_rewrite_sql() actively discourage their use. once can't avoid that function - it enforces our whole node access control. every Views query uses it as well.
Comment #15
chx commentedThe WHERE you need is not the first nor the last.
SELECT nid FROM (SELECT nid FROM term_node WHERE tid = 1) AS x INNER JOIN (SELECT nid FROM term_node WHERE tid = 1) AS y USING (nid) INNER JOIN foo USING (nid) WHERE (a = B) OR (weight IN (SELECT weight FROM allowed_weights WHERE (allow = 1))) AND q = 1and I guess I could write even uglier pieces. The WHERE you need is the one outside of balanced parentheses, in other words^(?P<anonymous_view>(?:[^()]++|\((?P>anonymous_view)\))*).*?WHERE. However, we want the last ofGROUP|ORDER|LIMITand there is no point in puttingHAVINGin there because that requires the presence ofGROUP.Comment #16
chx commentedAs the anonymous_view subpattern had a * quantifier having a ? after it is not needed.
Comment #17
catchOK well I won't pretend to know what's going on in #16, but it doesn't break anything I tested either (adding/deleting nodes, users, taxonomy terms, searching, tracker etc.)
Comment #18
gerhard killesreiter commentedI think the patch removes support for "HAVING".
Comment #19
chx commentedBut that's fine as said in my original post, GROUP BY comes before HAVING. Anyways, the patch above found the first group/order/limit instead of the last. We would need strrpos for strings. That's PHP5. We can emulate.
Comment #20
catchNo breakage with #19 either.
Comment #21
chx commentedNo need to retest, this is a trivial cleanup:
$needleis not used so I removed it.Comment #22
douggreen commentedI like where he's going with this, but I'm sorry to report that chx's patch doesn't work, yet mine does.
I wrote a simple module that has the following function (called from a menu handler):
You'll also need the info file:
Look at the db_rewrite_sql that's displayed in the drupal_set_message. chx's version adds the WHERE clause to the subquery, whereas my version adds it to the final WHERE clause. This is what you should see:
Comment #23
chx commentedTry this and still note that your test case does not use a subquery after the main while. Note this version is also a cleanup: uses one less preg.
Comment #24
douggreen commentedI'm not sure what you mean by "your test case does not use a subquery after the main while," but this version does pass my test.
I notice that you dropped the comment about why you're doing the strrev (
// PHP 4 does not support strrpos for strings. We emulate it.). I think the comment added something for me. Did you remove it intentionally?It's too bad that we're still supporting php4 with Drupal 6.x -- that's some convoluted code to do the strrev check. gophp5!
If catch can make sure that this doesn't break anything else, I think it's RTBC. I can live without the comment, although I do think it helped.
Comment #25
catchadding/deleting nodes, adding/deleting user, adding/deleting term all fine. No time for more comprehensive testing until later, possibly tomorrow - so wouldn't hurt for an additional review from someone else.
Comment #26
chx commentedNo change at all, just added back that comment.
SELECT nid FROM node WHERE nid IN (SELECT nid FROM somewhere WHERE foo = 'bar')this kind of subquery will make your version fail IMO.Comment #27
douggreen commented@chx, agreed, you're patch is better. I've also confirmed that it works on the second type of subselect, that you've shown above.
Comment #28
gábor hojtsyThanks for the patch and for all the testing. Hopefully we can make this a bit more nicer come Drupal 7 on PHP 5.
Comment #29
Anonymous (not verified) commentedAutomatically closed -- issue fixed for two weeks with no activity.
Comment #30
killes@www.drop.org commentedhttp://drupal.org/node/222128
Comment #31
killes@www.drop.org commentedTo elaborate: This patch has broken the use of HAVING without GROUP BY which is IMO perfectly valid SQL. http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt does not mention that GROUP BY would be required of HAVING is used:
Comment #32
killes@www.drop.org commentedmoving
Comment #33
killes@www.drop.org commentedMoving back to D6, chx explained that db_rewrite_sql will be removed in D7.
Comment #34
ingo86 commentedHi everyone,
Here is a patch in order to enable HAVING clause without GROUP BY clause. It's what #31 report.
Comment #35
wedge commentedI was bit by this today and #26 solved my issue. Drupal5 postgresql 8.3. Thanks chx and doug!
Comment #36
colanYes, definitely #26++! I used it on 5.7 (without problems) to solve a huge MySQL error I was getting. I'm not sure what can go into point releases, but would it be possible to also release this as a bug fix for 5 when it finally gets resolved? I'm writing code for some 5.x contrib modules that will depend on this patch.
Comment #37
emok commented(Should I have opened a new bug for this?)
db_rewrite_sql in Drupal 6.3 seems to be similar to #26. With it I get different results between PHP 4.3.11 and PHP 5.2.3. I've nailed the problem down to the actual
preg_match-call. So I think there is some change in how PHP works between these versions, which affects queries where we do not want to use the lastWHEREof the string.On PHP 5.2.3 I get the expected:
While on PHP 4.3.11 the output is:
The erroneous PHP4 output means that db_rewrite_sql will put additional WHERE-code (like node access restrictions) in the inner query ({term_node} in my case). In certain views this can cause a SQL error.
Any ideas about how to overcome this difference, so that PHP4 is equally supported? On PHP5 #26 (and probably #34) works fine.
Comment #38
emok commentedI now have a solution to #37 so that
db_rewrite_sqlworks the same way on PHP 4.3.11 as on PHP 5.2.3. This version is probably a bit slower, 60-90% when only measuring the affected lines, but I doubt it would be worth using different code for PHP4 and PHP5.The attached patch applies to the Drupal 6.3 release. If desired, #34 can still be applied on top of this.
Comment #39
abautu commentedI posted something related here (http://drupal.org/node/162970#comment-1009000) before notice that issue is a duplicate for this one.
Comment #40
arhak commentedsubscribing
Comment #41
codecowboy commentedObviously this isn't a major issue in D6 as a year without activity indicates. Maybe this issue should be won't fixed as community focus is shifting to D7 and code freeze and D6 seems to be humming along fine without this patch.
Comment #42
douggreen commentedAgreed, but I do think this is a problem. But I've done very little D6 work ... been stuck on 5 mostly and I hope my focus will be on 7 soon.
Comment #43
poshaughnessy commentedThanks very much for the patches here, I applied emok's (#38) and it fixed a real headache I was having with SQL syntax errors which seem to have been down to how the original pattern worked. I'm afraid I haven't been able to figure out why it has fixed it (I'm new to PHP and Drupal) but my post explains the issue in case anyone's interested: http://drupal.org/node/585458.