Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
I need to construct some complex SQL statements having UNION
or UNION ALL
inside. It looks missing in the documentation... if it has been implemented.
For e.g.
"SELECT ll.*
FROM {linkchecker_links} ll
INNER JOIN (
SELECT lid FROM (
SELECT DISTINCT ll.lid
FROM {node} n
INNER JOIN {node_revisions} r ON r.vid = n.vid
INNER JOIN {linkchecker_nodes} ln ON ln.nid = n.nid
INNER JOIN {linkchecker_links} ll ON ll.lid = ln.lid AND ll.last_checked <> %d AND ll.status = %d AND ll.code NOT IN (" . db_placeholders($ignore_response_codes, 'int') . ")
WHERE n.uid = %d OR r.uid = %d
UNION
SELECT DISTINCT ll.lid
FROM {comments} c
INNER JOIN {linkchecker_comments} lc ON lc.cid = c.nid
INNER JOIN {linkchecker_links} ll ON ll.lid = lc.lid AND ll.last_checked <> %d AND ll.status = %d AND ll.code NOT IN (" . db_placeholders($ignore_response_codes, 'int') . ")
WHERE c.uid = %d
) q1
) q2 ON q2.lid = ll.lid"
I have completly no idea how I should add UNION inside...
Aside - I'm new to D7 DB api, but the rest of this SQL code looks not that easy to convert. It would be great if someone could provide some help with the rest...
Comment | File | Size | Author |
---|---|---|---|
#50 | 557318-2.patch | 8.77 KB | cha0s |
#48 | 557318_8.patch | 8.04 KB | hass |
#46 | 557318_7.patch | 8.08 KB | hass |
#44 | 557318_6.patch | 8.09 KB | hass |
#38 | 557318.patch | 8.73 KB | cha0s |
Comments
Comment #1
Crell CreditAttribution: Crell commentedFor static queries (using db_query()), there is no change. You just write a static string with UNION or whatever else you want. Placeholders are now handled automatically in static queries if you pass in an array rather than a scalar as the parameter.
For dynamic queries using db_select(), UNION is not yet supported. I never figured out how to do that nicely. If you want to send in a patch I'm game to try and sneak that in before code freeze. :-)
Comment #2
hass CreditAttribution: hass commentedThank you for quick feedback.
My main issue with the above query is the
db_placeholders($ignore_response_codes, 'int')
. I have no idea how could looks like in a static fashion.db_placeholders()
is gone and I do not know how to convert to D7... If you could give me an example it would be very helpful.I'm still not sure what the difference is between a static and dynamic query and when I should use on or the other... I was told on IRC complicated querys need to use db_select()... so I expect I have nearly only complicated querys next to me for the reason of db_placeholders() :-(
Comment #3
hass CreditAttribution: hass commentedI'm sorry about the UNION implementation... as I'm still trying to upgrade a module... I do not have the background to develop the core database API stuff :-(. But having no UNION support would be critical for linkchecker module. I may never get a D7 version out if this is not possible. :-(((
Comment #4
Crell CreditAttribution: Crell commentedFor placeholders:
$result = db_query("SELECT ... WHERE nid IN (:nids)", array(':nids' => array(1, 2, 3));
The rest is magic.
You should use the select builder for queries *whose structure will change at runtime*, either because they need to vary per database, or conditionally based on user input, or because they need node access, or they're using a pager or tablesort or something, etc. If it's one hideously long and complicated query whose structure does not change, then you can just use db_query().
Comment #5
hass CreditAttribution: hass commentedOh my goodness... Thank you *very* much. I overcomplicated the most query's :-(. This magic was my problem and this is not documented or I missed it... we should add such an "IN" example to the DB API doc pages (http://drupal.org/node/310086)... not only the "easy" statements...
Comment #6
hass CreditAttribution: hass commentedCrell: Is it possible that extend('PagerDefault')->extend('TableSort') is required to build a pager? I've found this in core very often and there is *no* pager in core that is not build this way. Looks like I need to move all Pager with/o TableSort to db_select() !?
Comment #7
Crell CreditAttribution: Crell commentedYou should be able to do a pager with just PagerDefault, not TableSort. That was tested when extenders were added. In practice nearly all pagers also use tables, and tablesort, but that's not a requirement of pagers.
That said, yes, pagers should be done using PagerDefault. The old pager_query() is deprecated and should be removed as soon as we're sure core is not using it anywhere anymore.
Comment #8
hass CreditAttribution: hass commentedBut how can I use PagerDefault without db_select()? With db_select I cannot use UNION... :-(
Comment #9
hass CreditAttribution: hass commented@Crell: The only thing I can come out is the below... but it's not working at all and misses
$subquery3
and$subquery4
. Are you able to work on the UNION stuff, please? I can test, but cannot develop the driver. I really hope that such a critical thing can get in after code freeze if it breaks modules working in past!? "nicely" is something that we can nail out later... but "working" is required.This gives me:
If you have an idea how to build a pager query without db_select() - PLEASE shed some light on me. I'm fully lost here and I do not like to discontinue the linkchecker module only for a core DB API that is missing a small feature. :-(
Comment #10
Crell CreditAttribution: Crell commentedLet's try this patch. Seems to work and has a unit test. :-)
Comment #12
hass CreditAttribution: hass commented*GREAT* - I will give this a try in a few hours. What is about "UNION ALL"?
Comment #13
Crell CreditAttribution: Crell commentedI don't understand how Union All vs Union distinct works, to be honest. :-) I've only ever just used UNION.
Comment #14
Crell CreditAttribution: Crell commentedTagging.
Comment #15
hass CreditAttribution: hass commentedI cannot test. Latest core code has a broken install process and the next oldest version I have is ~8 days old and the patch does not apply to this old code :-(.
Comment #16
Crell CreditAttribution: Crell commentedCross-posted. Retagging.
Comment #17
zzolo CreditAttribution: zzolo commentedPatch looks good, but I definitely am in the opion that UNION should be avoided, and if needed can be done with the db_query function anyway.
As far as UNION ALL
* http://www.techonthenet.com/sql/union_all.php
I'm not sure UNION DISTINCT is always supported. I know in MSSQL, its simply a matter of using a distinct in one of the queries. I think most SQL implementations are selecting distinct rows by default.
But again, I feel these can be handled in the db_query range, since UNIONS are pretty expensive.
Comment #18
hass CreditAttribution: hass commentedI used my 1 week old core version with the latest database folder to install and apply the patch. By this I was able to get this working as expected and changed in the #9 example the line
to
what resulted in the expected SQL statement:
This patch solves my UNION only issue, but I believe we should have UNION ALL, too.
Comment #19
moshe weitzman CreditAttribution: moshe weitzman commentedPasses its own Beatles test and the bot is green. Code looks good. RTBC
Comment #20
cha0s CreditAttribution: cha0s commentedI got a patch for not only UNION but UNION ALL too. Hopefully it can get pushed in.
Yes, the code is a bit dupe'y, but really I don't think it can be all that much better without making it less clear.
Thanks!
Comment #21
hass CreditAttribution: hass commentedCodewise it looks like you have missed something the the
__clone()
. I'm also not sure if it makes more sense to have something like->union($queryObject, 'ALL')
as it may not duplicate so much code and looks like the condition logic.Comment #22
Crell CreditAttribution: Crell commentedAbsolutely it should be a unified list, if for no other reason than the patch in #20 means you can never have a UNION ALL before a UNION in the same query. The order does matter. Rather, the $union array should store not just a query object but also the type of union to do for each.
That said, I still don't actually understand what the difference between UNION, UNION ALL, and UNION DISTINCT is. Until someone explains that to me, the patch in #10 is still the approach we want to use.
Comment #23
cha0s CreditAttribution: cha0s commentedThere were some problems with indentation.
Comment #24
hass CreditAttribution: hass commented@Crell: There is a link in #17 that explain the difference between UNION and UNION ALL. UNION ALL does not merge the same entries into one. If UNION only is used the *same* rows from different tables are merged into one row. This is not the case with UNION ALL. You see in the latest test an update for this...
or with other words:
I'm not sure what UNION DISTINCT does... maybe not DB independed - never heard about it. Sounds like normal UNION.
Comment #25
hass CreditAttribution: hass commentedt('UNION ALL correctly preserved duplicates.')
sounds wrong... it need to be UNION and does NOT preserve duplicates... I believe you have flippedt('UNION correctly discarded duplicates.')
witht('UNION ALL correctly preserved duplicates.')
Comment #26
hass CreditAttribution: hass commentedUNION and UNION DISTINCT seems to be the same, see http://dev.mysql.com/doc/refman/5.1/en/union.html.
Comment #27
hass CreditAttribution: hass commentedSounds like UNION DISTINCT is defined in the Standards and UNION not... maybe we should better go with the Standard :-)
Comment #28
cha0s CreditAttribution: cha0s commentedGood points, hass. I fixed the backwards strings for the unit tests, and also regular union() now uses the more specific UNION DISTINCT.
Comment #29
hass CreditAttribution: hass commentedWhy are we not merge UNION with UNION ALL and UNION DISTINCT by using
->union($selectQuery, 'ALL')
,->union($selectQuery, 'DISTINCT')
and so on. This would also fix the issue Crell pointed out.Comment #30
cha0s CreditAttribution: cha0s commentedOkay, I got the different union types unified under the union() call, also aggregating the relevant info in the $union variable in the query to get rid of the duplication.
The $union variable is now an array of arrays, the child arrays containing 2 keys, 'type', which is either '', 'ALL', or 'DISTINCT' (mapping to UNION, UNION ALL, and UNION DISTINCT, respectively) defaulting to '' (UNION), and 'query' which is the query object.
Updated the tests for UNION and UNION ALL, noting that the UNION DISTINCT test is not done, since UNION and UNION DISTINCT are semantically equal.
Comment #31
hass CreditAttribution: hass commentedI've found a few code style issues...
1. We may need to remove this from the inline documentation as it's only correct for UNION and UNION DISTINCT.
2. "An array of additional Select queries." documentation also have some good space we can use until the 80 letters limit per line is reached :-). Currently max 50 is used.
3. A few trailing space issues need to be fixed. One is above "// Ensure we only get 2 records." others in the tests and so on.
4. "explicity" -> explicitly
Code wise it looks good... I will try to test.
Comment #32
hass CreditAttribution: hass commentedPatch attached does only change the documentation as written in #31.
I've tested UNION, UNION ALL, UNION DISTINCT with linkchecker and all works as expected (statements have been verified with latest core / devel query output).
Comment #33
cha0s CreditAttribution: cha0s commentedYeah, I noticed the comment margin was a bit conservative, but I was just trying to follow the rest of the docs in the general area.
P.S. Nice work. :)
Comment #34
Crell CreditAttribution: Crell commentedSweet work, folks! A few small notes, though.
1) The docblock for the $union property seems confusing. It should be a better description of the data structure, vis, "an array with each item representing a query to union. Each entry itself is a 2-key associative array. The 'type' key is..." etc.
2) The default handling should not happen in the toString() method. Instead, set the default value of $type to "DISTINCT". That way, all union statements will always specify what type of union to use. That's a bit more self documenting, and let us remove the ugly ternary from the toString() block.
3) The __clone() implementation was not updated for the changed $union structure.
Let's get those fixed and we can get this puppy in.
Comment #35
hass CreditAttribution: hass commentedAfter a quick search around... I'm currently not for adding distinct by default. It looks not very well supportet. Sqlite, mssql may not understand it. No idea what about oracle. Does pdo themself help us here?
Comment #36
cha0s CreditAttribution: cha0s commentedYeah, unfortunately I don't think UNION DISTINCT is implemented across the board, so we should probably leave UNION as the default... I know, the ternary is gross. I've fixed it in this patch.
In addition, I've seen two other query types in a couple database, INTERSECT and EXCEPT, which function like UNIONS, with the operator working like PHP's array_intersect() and array_diff() respectively. I don't mind if this is too much to push, and I know for a fact it isn't universally supported, but should we give users at least the option? If we decide to add it, I'll write unit tests (unless some other kind soul does it first! ;))
I have also cleared up the docs as specified.
Is there any value in actually validating the type argument? My latest patch does that, but that also I don't mind if it's removed... Here's the patch:
Comment #37
hass CreditAttribution: hass commentedI like the validation idea, too. We may also think about automatically changing UNION DISTINCT into UNION only if a developer use it. If we know for sure it's the same like UNION and someone uses it, but it's not working in all DB servers we should care about this... I'm not sure what evangelism rule apply here for Drupal, but it shouldn't be a developers decision to make code incompatible.
The sqlite docs are only talking about UNION and UNION ALL. I have no sqlite with me and cannot verify if DISTINCT nevertheless works... MsSQL2005 documentation also does not have anything about UNION DISTINCT... standard or not... we better go with UNION only for UNION DISTINCT - only to be save and for compatibility reasons. I cannot say anything about INTERSECT, MINUS and EXCEPT, but if this is only available on Oracle I'm not sure if we should support it!???
I like APIs that work flawless and 100% DB independent...
Comment #38
cha0s CreditAttribution: cha0s commentedYeah, INTERSECT and EXCEPT are oddballs, but I saw them in SQLite documentation as well. I suppose we could add them pretty trivially later if deadly necessary..? I don't imagine they're used much anyways, honestly.
I also took your (good) suggestion and made 'DISTINCT' an alias for 'UNION'. I think that is a better way to keep things cross-db... only thing is if we find out there's some crazy db that doesn't treat UNION and UNION DISTINCT as the same.
Comment #39
hass CreditAttribution: hass commentedUNION MINUS (Oracle) is also such a candidate...
I hope Crell can comment on the evangelism?
Comment #40
lambic CreditAttribution: lambic commentedPostgreSQL supports INTERSECT and EXCEPT. We use EXCEPT quite a lot for data import but it's probably less likely to be used in a non-batch environment.
Comment #41
cha0s CreditAttribution: cha0s commentedI think we should stick to the bare minimum of UNION implementations, at least for now....
The implementations in my last patch cover those that seem to be the most common, I'd say personally that if the more exotic UNION ops are needed, we should just let people invoke them statically... I guess I'm just stretching to think of some case where Module X is thinking "I really need to modify this UNION MINUS call that Module Y is making"...
I think I agree with hass, in that the DB API should be as universal as possible, and UNION, UNION ALL, and UNION DISTINCT (with our new alias in my last patch) are the most universal of all.
Comment #42
cha0s CreditAttribution: cha0s commentedSetting to needs review.
Comment #43
Crell CreditAttribution: Crell commentedI think we can easily kill 2 birds with one stone.
Don't validate the incoming union type. Check for just "DISTINCT" and fold that to empty string. Anything else, allow through as is. Don't put the "UNION" keyword into the array.
Then on compile, just put whatever the user entered in as a string after the DISTINCT keyword in the toString method.
That means if someone is working on a DB they know supports a more esoteric UNION variant, they can use it without having to muck about with the core code. The worst side effect is that we may get two spaces instead of one in the query string itself, which is really not an issue. SQL is whitespace insensitive.
Comment #44
hass CreditAttribution: hass commentedUpdated patch. I hope we are done now.
Comment #46
hass CreditAttribution: hass commentedUpdated patch.
Comment #47
Crell CreditAttribution: Crell commentedWe don't need the ternary. It just makes the code uglier. This should work, and be more readable:
Also, I just noticed the change in the comment above that. "UNION and co." is a rather colloquial expression. Even I had to think twice to figure out what it meant. Let's just say "UNION is a little odd" (or whatever). We're referring to UNION in all its various forms as being odd.
Comment #48
hass CreditAttribution: hass commentedUpdated patch.
Comment #49
hass CreditAttribution: hass commentedThis code is RTBC. Do we need to get this in *before* code freeze? It cannot wait for D8...
Comment #50
cha0s CreditAttribution: cha0s commentedI think I disagree on one part of the last patch, that is, why are we forcing 'UNION', when this type of thing can use just INTERSECT and EXCEPT (these don't use the UNION keyword). I think if we check for our favorites, and then pass everything completely through, that should all work. If I have 'UNION WOW' in my db, 'UNION WOW' would be the arg to union().
Comment #51
Crell CreditAttribution: Crell commentedAccording to webchick this is a code slush-friendly patch, but I'd still like to get it in. :-)
I guess with the patch in #50, alternate query combiners besides UNION work, but aren't technically supported. The three possible variants of UNION are all supported, however. I hate that there's a switch statement in the otherwise pretty DB layer, but, yeah. :-P
So #50 is ready, IMO. Over to you, webchick.
Comment #52
hass CreditAttribution: hass commentedTHX
I wonder how it's possible to make
and
database independent without switch statements. We may need to fold TRIM() to LTRIM(RTRIM(' string ')) and I'm not sure what we need to do with SUBSTRING. But this is OT here... Give me a hint if we should fix these issues in a two follow up issues :-)
Comment #53
lambic CreditAttribution: lambic commentedPostgres:
BTRIM(), RTRIM(), LTRIM(), TRIM([leading | trailing | both] [characters] from string)
SUBSTR(), SUBSTRING()
There are lots of SQL functions that differ among RDBMSs (don't get me started on date handling), it's ambitious to try to abstract everything.
Comment #54
Crell CreditAttribution: Crell commentedPlease stay on topic. There are already other threads that deal with the trainwreck that is cross-DB SQL function support. Just get this committed and move on.
Comment #55
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks.
Comment #56
hass CreditAttribution: hass commented@Dries: THX!!!
@Crell: Can you point us to the thread, please? I cannot find it...
Comment #57
Crell CreditAttribution: Crell commentedSee #167335: Handle database-specific SQL functions/operators by simple abstraction for a long and melodramatic discussion of SQL function handling. :-) Please read carefully before diving in, as it's a substantially harder problem than you think.
Oh yes, and w00t!
Comment #59
drewish CreditAttribution: drewish commentedI just realized the one thing this patch didn't handle was ORDER BY clauses. They end up placed ahead of the UNION rather than afterwards.
#1145076: UNION queries don't support ORDER BY clauses
Comment #60
drewish CreditAttribution: drewish commentedGot another UNION related issue: #1145080: SelectQuery::countQuery() incompatible with UNION queries