Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
when editing the "distinct" query attribute, the module comments the option with the phrase "You can use this to try and remove duplicates from a view, though it does not always work."
i will tell you why.
because the Views module always adds the table's primary key to the queried fields.
the primary key is always distinct by definition, thus it always generates distinct rows!
i would say it NEVER works, not "not always".
please fix this, don't compulsorily add the primary key to the query if the user didn't select it among Fields.
Comment | File | Size | Author |
---|---|---|---|
#36 | views-7.x-3.x-dev-add_pure_distinct-1.patch | 2.92 KB | thekevinday |
#23 | views-7.x-3.x-DISTINCT-1.patch | 618 bytes | thekevinday |
Comments
Comment #1
dawehnerPlease reconsider
Comment #2
vv75vv CreditAttribution: vv75vv commentedwhat to reconsider? Views always adds the table's PK to the selected fields even if it would fuck things up completely.
Comment #3
vv75vv CreditAttribution: vv75vv commentedbug still present in -dev.
Comment #4
vv75vv CreditAttribution: vv75vv commentedin views_plugin_query_default.inc there is such a snippet:
basically meaning that there WERE attempts to fix this priorly!
for some reason, commenting just this part doesn't do the trick. needs deeper digging.
Comment #5
vv75vv CreditAttribution: vv75vv commentedi fixed this issue on my installation of drupal.
edit views_plugin_query_default.inc:
1. comment out this:
and this:
2. find the context of "$query = "SELECT" (very end of function query()) and change it as follows:
probably i miss something and this is not all what should be changed, but it works in my case.
Comment #6
merlinofchaos CreditAttribution: merlinofchaos commented1) This is not critical by any definition of critical.
2) This change would change the behavior of all queries currently using the DISTINCT key in a way that would not be beneficial. We can't change the existing behavior without risking breaking behavior on a quarter million installations. I'm sure you are interested only in how it behaves on your system, but I have to be concerned with how it works on all systems. Trust me when I say that row level DISTINCT is not the intent of the checkbox.
3) There isn't a patch file attached. See drupal.org documentation for how to create patches.
4) Belligerent behavior is not an acceptable way of interacting in the issue queues. We're human beings and this is open source volunteer work. Things change through contribution and cooperation, not attack.
5) At best, at *best* we could change it to a radio and allow the user to select row/field level distinct and behave appropriately. I'll consider this as an option, but only if you can treat the issue and the community volunteers with respect.
Comment #7
vv75vv CreditAttribution: vv75vv commentedguys, calm down, there's no "field-level" for DISTINCT. your parenthesis after DISTINCT are treated just as expression delimiters and can be safely omitted.
the only drawback could be if other modules assume the PK is always present in the result set, without them specifying it explicitly.
as of patches... i dont have the toolchain as i have no plans to involve much with drupal. and even if i'd have, don't assume my changes correct without analyzing throughly.
also, there's a $this->add_groupby($base_field_alias) in the place where i commented just one line (very beginning of function query()). i think it should be omitted too, thus elimination out the entire if(...){...} near the comment "Check query distinct value". GROUP BY is for cases when functions are applied, like COUNT(*), and is orthogonal to DISTINCT.
Comment #8
vv75vv CreditAttribution: vv75vv commentedComment #9
merlinofchaos CreditAttribution: merlinofchaos commentedComment #10
Damien Tournoud CreditAttribution: Damien Tournoud commentedActually, on the 7.x version, Views uses a straight DISTINCT instead of the GROUP BY it should be using (as I recommended back in #863478-8: Distinct is not supported by dbtng. I guess something changed at one point, but I cannot figure out what.
Comment #11
vv75vv CreditAttribution: vv75vv commentedi would correct your words a bit ("Column-level distinct is a myth").
not a myth. a PROFANITY!
Comment #12
AlexisWilke CreditAttribution: AlexisWilke commentedThat would explain all the problems with the missing GROUP BY necessary with PostgreSQL when a DISTINCT is used. The proposed solution sounds like a way to fix the long standing psql problem.
Thank you.
Alexis Wilke
Comment #13
vv75vv CreditAttribution: vv75vv commentedAlexis, GROUP BY is when you use functions (count() is the simplest example). DISTINCT is for screening repetitive rows in the final result set. Putting parenthesis after the DISTINCT keyword does not magically transform it ino a column function.
Comment #14
AlexisWilke CreditAttribution: AlexisWilke commentedqwe123qwe,
#1173960: PostgreSQL incompatibility (DISTINCT missing)
If you tell me that PostgreSQL is not following the wrongness of some other databases, then I will agree. 8-)
Anyway, from what I see in the faulty statements, you have both: DISTINCT and FIRST or COUNT. I guess the DISTINCT keyword is not itself the cause, but I know that when I turn off the Distinct row flag in the Views interface, the problem goes away... which is why I blame the DISTINCT keyword. But...
Reading the documentation of psql 9.x, it looks like a valid ORDER BY could resolve the problem with the DISTINCT:
http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
However, if you read the documentation of the GROUP BY instruction:
http://www.postgresql.org/docs/current/static/sql-select.html#SQL-GROUPBY
then you see that all the resulting fields appearing in the SELECT must be included in the GROUP BY list. That makes sense because otherwise the group may be broken.
All of this to say that your fix is about a field being magically added to the list of fields in the SELECT list, but not in the GROUP BY list. If it really isn't necessary, then that would solve the problem in the referenced bug which no one has addressed yet. If it is necessary, then that the field which causes problem in #1173960: PostgreSQL incompatibility (DISTINCT missing) and it is the solution to that other problem.
Thank you.
Alexis
Comment #15
vv75vv CreditAttribution: vv75vv commentedAlexis, the bug was revealed while trying to build a very simple view with distinct IP addresses from the accesslog, which must be achievable based on the current feature set of Views. Try to build such a view and you'll see what's the problem. It's 2 minutes, c'mon. :)
For some reason, Views is treating DISTINCT as a function and not as a keyword, also it compulsorily adds the PK even if it's not selected among Fields and would fuck things up.
Comment #16
asb CreditAttribution: asb commentedMaybe related: #1234048: Distinct requires workaround with bad usability to make 6.x-3.x-dev work like Views 2
Comment #17
cheshire137 CreditAttribution: cheshire137 commentedFor me, it was the following code, starting on line 1142:
Comment out the first two lines in the if block, like so:
The file was at modules/views/plugins/views_plugin_query_default.inc.
Comment #18
chrisgross CreditAttribution: chrisgross commentedCould we please ignore the disrespectful people and get a legitimate update out for this? I'm on enterprise installs with strict rules about patching and I desperately need this to work.
Comment #19
sorensong CreditAttribution: sorensong commentedShould I not count on being able to use DISTINCT? Just to be clear?
Comment #20
asb CreditAttribution: asb commentedAs far as I understand it, there are workarounds to mimic the "DISTINCT" behaviour from Views 2, but the "DISTICNCT" checkbox in Views 3 works differently (or does not work at all, depending on your version of Drupal core, and your definition what "DISTINCT" is).
For me, "DISTINCT" mosty means to filter out "duplicates" from result sets, with limited control over what records are being filtered out. In most of my Views I could get the Views 2 behaviour back under Views 3. However, be prepared having to edit every View that relies on "DISTINCT", and also be prepared that Views 3 won't work on every Drupal installation (there are good chances of getting a global WSOD, so make proper backups before updating).
Comment #21
sorensong CreditAttribution: sorensong commentedThanks asb.
Thanks for clarifying. I do indeed mean 'reducing' duplicates. Not having any luck finding any modules that give me this behavior.
Comment #22
asb CreditAttribution: asb commentedNo additional modules needed; a workaround for Views 3 on D6 is here: http://drupal.org/node/1234048#comment-4873846; I'm using this on several sites, and it works finde.
This might workaround might work as well for Views 3 on D7, or it might collide with #863478: Distinct is not supported by dbtng; I don't know since I'm not running any D7 sites in production.
Comment #23
thekevinday CreditAttribution: thekevinday commented#17 solved the issue for me.
What seems to be happening in my case is that when the distinct property is set, the said code manually inserts the base_field into the table and group by.
In the case of loading and presenting a node, this would cause there two be two 'nid' references.
This will then trigger a database ambiguity SQL error.
Applying #17's changes solves my problem by not arbitrarily adding fields to the query.
For the postgresql notes in #14, see: #1241280: Group columns (additional) = Entity ID results in sql error + aggregation type MIN, MAX, AVG and SUM don't work and #1331056: Regression: Improper use of GROUP BY statement produces ambiguous column error.
Keep in mind this issue and those two are 3 separate issues that all happen about the same time, so try not to confuse them and go crazy as I did...
Attached is the patch form of #17.
What I am mostly wondering is why does views arbitrarily add select fields when DISTINCT is set?
Comment #24
thekevinday CreditAttribution: thekevinday commentedComment #25
danielb CreditAttribution: danielb commented+ 1, we were promised the ability to do distinct queries in Views 3. The whole automatically adding the base_field in, with no programmatic way of removing it, totally kills it. If you want to do a query to find out the range of used values in field_something, and you have a bajillion nodes, and the field has only two unique values set on those bajillion nodes, the query should return two records.... not a bajillion records.
I haven't tested the patch, but because I am very familiar with the code in question as it is a frequent issue that comes up, I know it is spot on. However the real question is what effect it would have on configurations/modules that rely on the current behaviour.
Personally I would add another flag like
$this->force_distinct
or$this->no_auto_base_field
- I dunno use your imagination. If this flag is set on the view, then those two lines are not run.Obviously because the Views module is primarily designed for listing records from the base table, i.e. nodes, users, etc... So an assumption has been made that it will always be used for this purpose.
Comment #26
AlexisWilke CreditAttribution: AlexisWilke commenteddanielb,
There is no need to add something like that in the code. If you're worried that $node->nid not be part of the list of fields, go back to your view and add it there! This solves the problem we have by adding the field where it needs to be (in the SELECT, GROUP BY, and ORDER BY,) and the problem you could potentially have (since I'm sure there is a reason for it.)
Now, if I were to use my imagination following in your steps, then I would implement adding a field for the DISTINCT keyword to work "as expected" using the normal way of adding a field as a field and not as a hack like it is now.
However, I do not think it is necessary because the user can decide whether what should be unique is what he has, or whether it is should also include nid, uid, etc. Actually, adding those fields generates a result that's less unique (two nodes with the exact same titles will both appear which would not be what I'm trying to achieve if I only wanted DISTINCT titles.)
Thank you.
Alexis
Comment #27
danielb CreditAttribution: danielb commentedThere are almost half a million existing installations of Views, and several times more actual Views displays configured. Many of which may rely on this functionality. Not all users of views know how to use the user interface, many views are provided by modules, and some views queries are manipulated by modules and modelled on the current behaviour of views.
Are you going to go and tell all these people the same thing you just told me? You have to consider the big surprise people will get when the module updater in Drupal breaks one or more of their views.
Comment #28
asb CreditAttribution: asb commented@danielb: I also hate it when essential modules like Views cause damage "just" because their development progresses. When updating from Views 1 to Views 2, I had to manually edit every single View, and at that time my sites did not even remotely relied on Views as they do nowadays. Every major update causes weeks or even months of tedious maintenance, for every Drupal site, and every single View. It costs a lot of work, and when you are working commercially, a lot of money also. Core and module updates can cause a lot of frustration and desperation. If you don't know the experience yet, try to migrate a five years old CCK-based site to D7, and you'll learn the meaning of data loss and pain.
However, the crucial part here is that acceptable workarounds do exist, they are properly documented, and they are findable on d.o. Alterations like this are necessary to allow development to progress, and Views 3 is definitely worth it. And always keep in mind that Views are just ways to display content; you won't lose real data, you just have to learn how to fix some displays. The "DISTINCT" issue can be handled, as far as DBTNG allows on D7, and on D6 it's simply a lot of work. Not less, but also not more than that. If this "DISTINCT" issue is your biggest problem with Views, system updates, or Drupal in general, you are a very lucky guy.
Comment #29
danielb CreditAttribution: danielb commentedAdding a new feature that breaks views to Views 3 is not the same as adding a feature between Views 1 and Views 2, the major version change suggests such a breakage could occur. I should have said it wouldn't be something that would concern me if it was for Views 4 (not that I want to wait for Views 4 for this change!)
It is an assumption on your part that Views are just ways to display content. Many modules integrate with Views and make decisions, such as creating/updating/deleting data, or providing access controls based on the data returned by the view.
Just to put it into perspective; the Views "DISTINCT" issue was most recently brought to my attention again by a user whose whole site was inaccessible (WSOD) because of this limitation (hence my post #25 about 'a bajillion' nodes). It might be a little problem for you, but a big problem for someone else.
I'm not going to push for it, it isn't my problem, just putting it out there for consideration.
Comment #30
merlinofchaos CreditAttribution: merlinofchaos commentedDISTINCT has always been on the base field and base field only. So when DISTINCT is added, the base field is assumed.
Views 2 always added the base field.
Views has never NEVER NEVER allowed DISTINCT on anything except for the base field.
Anyone trying to say differently is lying. Anyone trying to say that by "no longer allowing this" we have taken a feature away is lying, presumably to pressure us into adding a feature that we never ever promised.
Perhaps I am in a sour mood, but give me one reason why I should interact with you? You were never promised any such thing. If you believe you were, here is my offer: You may return the product you purchased from us and you may have the money you gave me for the product back. I am not, however, particularly enthused to see users trying to railroad stuff in by lying about prior capabilities and demanding capabilities and being told promises were made.
If you want promises about future features, I highly recommend you check out a commercial CMS where you actually are a paying customer and have the ability to tell your vendor that they did not deliver on their promises.
There is a legitimate issue here: Views' use of DISTINCT is crappy. The patch may work, but somehow in all the bitching, moaning, whining and complaining, absolutely no one attempted to address point #2 in comment #6. Which is hilarious, since one of those same people is trying to say that the functionality has changed.
Comment #31
merlinofchaos CreditAttribution: merlinofchaos commentedI'm also curious how this DISTINCT issue can cause whitescreens. Was that just rhetoric or are you discussing a bug that hasn't been actually discussed here?
Comment #32
asb CreditAttribution: asb commentedThis issue is getting a bit chatty, and it probably doesn't make much sense to discuss a WSOD mentioned in #29, as danielb referred to a report from a 3rd party, not from his own experience. Probably someone updated to Views 3 and got a WSOD; we all know that there is a good chance that this happens, but most probably this WSOD has nothing to do with the "DISTINCT" issue. If Daniel speaks in behalf of a client and wants to follow up on this, I'd suggest to open a separate issue with proper description of the problem (e.g. output of Apache's and PHP's error logs), and maybe a link to this issue (if there are really indications that the WSOD has something to do with "DISTINCT").
Regarding this issue (#1234592), we do have two relevant statements about the technical problems of the implementation of "DISTINCT" in Views (#6 and #14), and this issue should probably stick to addressing these problems. #17 attempts this, and #23 provides a proper patch file that needs technical review.
Users that are updating from Views 2 to Views 3 might encounter a different behaviour of their existing Views; this problem is pragmatically "solved" this workaround in #1234048: Distinct requires workaround with bad usability to make 6.x-3.x-dev work like Views 2. It is fully functional for the views-6.x-3.x branch.
Thirdly there is the DBTNG issue (e.g. #863478: Distinct is not supported by dbtng) which is only relevant for views-7.x-3.x on D7; a patch for this exists, and has been committed to the views-7.x-3.x branch more than one year ago. If the DBTNG architecture still imposes restrictions related to the DISTINCT handling in Views, either #863478 needs to be re-opened, or a separate issue should be filed.
Last but no least there is an amourphous meta issue from the user's point of view, which is allowed to ignore the technical ramifications of Drupal's DBTNG, the specific implementations of a rDBMS, base fields, or stuff like row/column-level distinct in SQL statements. This issue is about sane ways to handle stuff like duplicates, and it needs to abstract as well from "what does the DISTINCT checkbox in Views 3 do" as from "how does the PostgreSQL documentation explains DISTINCT", and it probably should start rather with something like usability mock-ups than with patches to the code. For example, duplicate items in result sets can be a) discarded (that's what most users are used to from Views 2), or b) grouped into sections ("Grouping" in Views 2/3, which puts certain items from a result set under a unique heading), or grouped into sets (within one item from the result set, like it is done with multi-value CCK fields, "start with item no...", "limit to..."). My point here is, that we maybe should consider less how to adapt a SQL operator into Views, but more think about smarter ways to present the data the user expects to see. Discarding items from the result set, based on whatever base field, is neither the only, nor necessarily the best approach to do this.
Comment #33
danielb CreditAttribution: danielb commentedhaha, classic earl
See #25. I'm using views as a query builder by providing a custom display handler. Patch in #23 does solve it, I am merely concerned about side effects (in fact post #6 2nd point is the same concern, except I doubled it to half a million installations). asb said views are only used for display, so it shouldn't matter, but actually views queries can be used for many things.
PS my gf always gets up me for claiming she promised stuff...
Comment #34
merlinofchaos CreditAttribution: merlinofchaos commentedNote that a lot of what people want can, in Views 3, be done using aggregation, rather than distinct.
If you have 2 unique values out of a bajillion records, and you just want those values, that's what aggregation does, right? Potentially better than distinct.
Comment #35
merlinofchaos CreditAttribution: merlinofchaos commentedPlease don't patronize me. It's things like this that have made me less and less interested in community work as time has gone on. I'm more or less burnt out on the entire project due to a never ending stream of issues like this one. If you have nothing nice to say, that's fine, but you know how that old saying goes.
Comment #36
thekevinday CreditAttribution: thekevinday commentedIn my opinion, the views module wraps SQL with an interface, so when using words like distinct that already have meaning, one should not redefine the term for convenience of end users. To make things "understandable" to end users who do not know SQL while not misdirecting those educated in SQL, you should come up with new words.
If the base field is a primary key, then adding the base field to the query will result in making each row 'unique'.
But what the distinct sql value does is it removes duplicate rows (as opposed to making them unique by leaving duplicate rows and adding some unique value to each row).
If a user wanted to add the base field to the view, then they already have the ability to do so.
Which means that, unless the base field is not a primary key, distinct never functioned so users who have it enabled are getting the same thing as to if they have it disabled. (and the base field gets added, which is a problem in itself).
From what I think I am understanding is that 2 years ago, dbtng did not support SQL distinct.
As far as I can tell, SQL distinct is supported now.
However, seeing as you want to support this behavior that I, and possibly others, believe is broken behavior, here is a shiny new patch that adds a new term called "Pure Distinct".
The default for "Pure Distinct" is off, so all existing views out there that depend on the broken behavior will be unaffected by this patch.
For those who want to use the SQL distinct, as defined by the SQL standards, then they must select the "Pure Distinct" after they select the "Distinct" checkbox.
Please review the patch.
Comment #37
tim.plunkettI think this patch is reasonable; it provides a workaround for this problem and it doesn't modify the behavior of existing Views.
Here is an exported view and the query with just the regular setting:
http://paste.pocoo.org/show/559780
And one with the new option:
http://paste.pocoo.org/show/559779
The goal of the view was to print out each content type's name, but without this new option, it would print it out once per node.
Comment #38
dawehnerThe patch doesn't chance existing behavior, that's very important and good to know.
@earl
Please tell whether you accept such a feature in general
Comment #39
monil-dupe CreditAttribution: monil-dupe commentedYou mean the path doesn't work well?
Comment #40
dawehnerEarl agreed so let's get this in.
Thanks thekevinday for the none-destructive approach. Committed to 7.x-3.x