when editing the "distinct" query attribute, the module comments the option with the phrase "You can use this to try and remove duplicates from a view, though it does not always work."

i will tell you why.
because the Views module always adds the table's primary key to the queried fields.
the primary key is always distinct by definition, thus it always generates distinct rows!

i would say it NEVER works, not "not always".

please fix this, don't compulsorily add the primary key to the query if the user didn't select it among Fields.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

dawehner’s picture

Status: Active » Postponed (maintainer needs more info)

Please reconsider

vv75vv’s picture

what to reconsider? Views always adds the table's PK to the selected fields even if it would fuck things up completely.

vv75vv’s picture

Version: 6.x-3.0-alpha3 » 6.x-3.x-dev
Status: Postponed (maintainer needs more info) » Active

bug still present in -dev.

vv75vv’s picture

in views_plugin_query_default.inc there is such a snippet:

/**
 * -- we no longer want the base field to appear automatigically.
    if ($base_field) {
      $this->fields[$base_field] = array(
        'table' => $base_table,
        'field' => $base_field,
        'alias' => $base_field,
      );
    }
 */

basically meaning that there WERE attempts to fix this priorly!
for some reason, commenting just this part doesn't do the trick. needs deeper digging.

vv75vv’s picture

i fixed this issue on my installation of drupal.

edit views_plugin_query_default.inc:

1. comment out this:

/*vv if (!empty($field['distinct'])) {
        $string = "DISTINCT($string)";
      }
*/

and this:

    // Check query distinct value.
    if (empty($this->no_distinct) && $this->distinct && !empty($this->fields)) {
//vv  $base_field_alias = $this->add_field($this->base_table, $this->base_field, NULL, array('distinct' => TRUE));
      $this->add_groupby($base_field_alias);
    }

2. find the context of "$query = "SELECT" (very end of function query()) and change it as follows:

    $where = $this->condition_sql();

    if ($this->distinct) {
      $diststr = 'DISTINCT ';
    }
    else {
      $diststr = '';
    }
    $query = "SELECT $diststr\n" . implode(",\n", array_merge($distinct, $fields)) . "\n FROM {" . $this->base_table . "} $this->base_table \n$joins $where $groupby $having $orderby";

    $replace = array('&gt;' => '>', '&lt;' => '<');
    $query = strtr($query, $replace);

    return $query;

probably i miss something and this is not all what should be changed, but it works in my case.

merlinofchaos’s picture

Priority: Critical » Normal
Status: Active » Closed (won't fix)

1) This is not critical by any definition of critical.
2) This change would change the behavior of all queries currently using the DISTINCT key in a way that would not be beneficial. We can't change the existing behavior without risking breaking behavior on a quarter million installations. I'm sure you are interested only in how it behaves on your system, but I have to be concerned with how it works on all systems. Trust me when I say that row level DISTINCT is not the intent of the checkbox.
3) There isn't a patch file attached. See drupal.org documentation for how to create patches.
4) Belligerent behavior is not an acceptable way of interacting in the issue queues. We're human beings and this is open source volunteer work. Things change through contribution and cooperation, not attack.
5) At best, at *best* we could change it to a radio and allow the user to select row/field level distinct and behave appropriately. I'll consider this as an option, but only if you can treat the issue and the community volunteers with respect.

vv75vv’s picture

guys, calm down, there's no "field-level" for DISTINCT. your parenthesis after DISTINCT are treated just as expression delimiters and can be safely omitted.

the only drawback could be if other modules assume the PK is always present in the result set, without them specifying it explicitly.

as of patches... i dont have the toolchain as i have no plans to involve much with drupal. and even if i'd have, don't assume my changes correct without analyzing throughly.

also, there's a $this->add_groupby($base_field_alias) in the place where i commented just one line (very beginning of function query()). i think it should be omitted too, thus elimination out the entire if(...){...} near the comment "Check query distinct value". GROUP BY is for cases when functions are applied, like COUNT(*), and is orthogonal to DISTINCT.

vv75vv’s picture

Status: Closed (won't fix) » Active
merlinofchaos’s picture

Status: Active » Closed (won't fix)
Damien Tournoud’s picture

Version: 6.x-3.x-dev » 7.x-3.x-dev
Status: Closed (won't fix) » Active

Actually, on the 7.x version, Views uses a straight DISTINCT instead of the GROUP BY it should be using (as I recommended back in #863478-8: Distinct is not supported by dbtng. I guess something changed at one point, but I cannot figure out what.

vv75vv’s picture

i would correct your words a bit ("Column-level distinct is a myth").
not a myth. a PROFANITY!

AlexisWilke’s picture

That would explain all the problems with the missing GROUP BY necessary with PostgreSQL when a DISTINCT is used. The proposed solution sounds like a way to fix the long standing psql problem.

Thank you.
Alexis Wilke

vv75vv’s picture

Alexis, GROUP BY is when you use functions (count() is the simplest example). DISTINCT is for screening repetitive rows in the final result set. Putting parenthesis after the DISTINCT keyword does not magically transform it ino a column function.

AlexisWilke’s picture

qwe123qwe,

#1173960: PostgreSQL incompatibility (DISTINCT missing)

If you tell me that PostgreSQL is not following the wrongness of some other databases, then I will agree. 8-)

Anyway, from what I see in the faulty statements, you have both: DISTINCT and FIRST or COUNT. I guess the DISTINCT keyword is not itself the cause, but I know that when I turn off the Distinct row flag in the Views interface, the problem goes away... which is why I blame the DISTINCT keyword. But...

Reading the documentation of psql 9.x, it looks like a valid ORDER BY could resolve the problem with the DISTINCT:

  http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT

However, if you read the documentation of the GROUP BY instruction:

  http://www.postgresql.org/docs/current/static/sql-select.html#SQL-GROUPBY

then you see that all the resulting fields appearing in the SELECT must be included in the GROUP BY list. That makes sense because otherwise the group may be broken.

All of this to say that your fix is about a field being magically added to the list of fields in the SELECT list, but not in the GROUP BY list. If it really isn't necessary, then that would solve the problem in the referenced bug which no one has addressed yet. If it is necessary, then that the field which causes problem in #1173960: PostgreSQL incompatibility (DISTINCT missing) and it is the solution to that other problem.

Thank you.
Alexis

vv75vv’s picture

Alexis, the bug was revealed while trying to build a very simple view with distinct IP addresses from the accesslog, which must be achievable based on the current feature set of Views. Try to build such a view and you'll see what's the problem. It's 2 minutes, c'mon. :)

For some reason, Views is treating DISTINCT as a function and not as a keyword, also it compulsorily adds the PK even if it's not selected among Fields and would fuck things up.

asb’s picture

cheshire137’s picture

For me, it was the following code, starting on line 1142:

if (empty($this->no_distinct) && $this->distinct && !empty($this->fields)) {
      $base_field_alias = $this->add_field($this->base_table, $this->base_field);
      $this->add_groupby($base_field_alias);
      $distinct = TRUE;
    }

Comment out the first two lines in the if block, like so:

if (empty($this->no_distinct) && $this->distinct && !empty($this->fields)) {
      /*$base_field_alias = $this->add_field($this->base_table, $this->base_field);
      $this->add_groupby($base_field_alias);*/
      $distinct = TRUE;
    }

The file was at modules/views/plugins/views_plugin_query_default.inc.

chrisgross’s picture

Could we please ignore the disrespectful people and get a legitimate update out for this? I'm on enterprise installs with strict rules about patching and I desperately need this to work.

sorensong’s picture

Should I not count on being able to use DISTINCT? Just to be clear?

asb’s picture

As far as I understand it, there are workarounds to mimic the "DISTINCT" behaviour from Views 2, but the "DISTICNCT" checkbox in Views 3 works differently (or does not work at all, depending on your version of Drupal core, and your definition what "DISTINCT" is).

For me, "DISTINCT" mosty means to filter out "duplicates" from result sets, with limited control over what records are being filtered out. In most of my Views I could get the Views 2 behaviour back under Views 3. However, be prepared having to edit every View that relies on "DISTINCT", and also be prepared that Views 3 won't work on every Drupal installation (there are good chances of getting a global WSOD, so make proper backups before updating).

sorensong’s picture

Thanks asb.

Thanks for clarifying. I do indeed mean 'reducing' duplicates. Not having any luck finding any modules that give me this behavior.

asb’s picture

No additional modules needed; a workaround for Views 3 on D6 is here: http://drupal.org/node/1234048#comment-4873846; I'm using this on several sites, and it works finde.

This might workaround might work as well for Views 3 on D7, or it might collide with #863478: Distinct is not supported by dbtng; I don't know since I'm not running any D7 sites in production.

thekevinday’s picture

FileSize
618 bytes

#17 solved the issue for me.
What seems to be happening in my case is that when the distinct property is set, the said code manually inserts the base_field into the table and group by.
In the case of loading and presenting a node, this would cause there two be two 'nid' references.
This will then trigger a database ambiguity SQL error.

Applying #17's changes solves my problem by not arbitrarily adding fields to the query.

For the postgresql notes in #14, see: #1241280: Group columns (additional) = Entity ID results in sql error + aggregation type MIN, MAX, AVG and SUM don't work and #1331056: Regression: Improper use of GROUP BY statement produces ambiguous column error.
Keep in mind this issue and those two are 3 separate issues that all happen about the same time, so try not to confuse them and go crazy as I did...

Attached is the patch form of #17.

What I am mostly wondering is why does views arbitrarily add select fields when DISTINCT is set?

thekevinday’s picture

Status: Active » Needs review
danielb’s picture

+ 1, we were promised the ability to do distinct queries in Views 3. The whole automatically adding the base_field in, with no programmatic way of removing it, totally kills it. If you want to do a query to find out the range of used values in field_something, and you have a bajillion nodes, and the field has only two unique values set on those bajillion nodes, the query should return two records.... not a bajillion records.
I haven't tested the patch, but because I am very familiar with the code in question as it is a frequent issue that comes up, I know it is spot on. However the real question is what effect it would have on configurations/modules that rely on the current behaviour.

Personally I would add another flag like $this->force_distinct or $this->no_auto_base_field - I dunno use your imagination. If this flag is set on the view, then those two lines are not run.

What I am mostly wondering is why does views arbitrarily add select fields when DISTINCT is set?

Obviously because the Views module is primarily designed for listing records from the base table, i.e. nodes, users, etc... So an assumption has been made that it will always be used for this purpose.

AlexisWilke’s picture

danielb,

There is no need to add something like that in the code. If you're worried that $node->nid not be part of the list of fields, go back to your view and add it there! This solves the problem we have by adding the field where it needs to be (in the SELECT, GROUP BY, and ORDER BY,) and the problem you could potentially have (since I'm sure there is a reason for it.)

Now, if I were to use my imagination following in your steps, then I would implement adding a field for the DISTINCT keyword to work "as expected" using the normal way of adding a field as a field and not as a hack like it is now.

However, I do not think it is necessary because the user can decide whether what should be unique is what he has, or whether it is should also include nid, uid, etc. Actually, adding those fields generates a result that's less unique (two nodes with the exact same titles will both appear which would not be what I'm trying to achieve if I only wanted DISTINCT titles.)

Thank you.
Alexis

danielb’s picture

f you're worried that $node->nid not be part of the list of fields, go back to your view and add it there!

There are almost half a million existing installations of Views, and several times more actual Views displays configured. Many of which may rely on this functionality. Not all users of views know how to use the user interface, many views are provided by modules, and some views queries are manipulated by modules and modelled on the current behaviour of views.

Are you going to go and tell all these people the same thing you just told me? You have to consider the big surprise people will get when the module updater in Drupal breaks one or more of their views.

asb’s picture

@danielb: I also hate it when essential modules like Views cause damage "just" because their development progresses. When updating from Views 1 to Views 2, I had to manually edit every single View, and at that time my sites did not even remotely relied on Views as they do nowadays. Every major update causes weeks or even months of tedious maintenance, for every Drupal site, and every single View. It costs a lot of work, and when you are working commercially, a lot of money also. Core and module updates can cause a lot of frustration and desperation. If you don't know the experience yet, try to migrate a five years old CCK-based site to D7, and you'll learn the meaning of data loss and pain.

However, the crucial part here is that acceptable workarounds do exist, they are properly documented, and they are findable on d.o. Alterations like this are necessary to allow development to progress, and Views 3 is definitely worth it. And always keep in mind that Views are just ways to display content; you won't lose real data, you just have to learn how to fix some displays. The "DISTINCT" issue can be handled, as far as DBTNG allows on D7, and on D6 it's simply a lot of work. Not less, but also not more than that. If this "DISTINCT" issue is your biggest problem with Views, system updates, or Drupal in general, you are a very lucky guy.

danielb’s picture

Adding a new feature that breaks views to Views 3 is not the same as adding a feature between Views 1 and Views 2, the major version change suggests such a breakage could occur. I should have said it wouldn't be something that would concern me if it was for Views 4 (not that I want to wait for Views 4 for this change!)

always keep in mind that Views are just ways to display content; you won't lose real data, you just have to learn how to fix some displays.

It is an assumption on your part that Views are just ways to display content. Many modules integrate with Views and make decisions, such as creating/updating/deleting data, or providing access controls based on the data returned by the view.

If this "DISTINCT" issue is your biggest problem with Views, system updates, or Drupal in general, you are a very lucky guy.

Just to put it into perspective; the Views "DISTINCT" issue was most recently brought to my attention again by a user whose whole site was inaccessible (WSOD) because of this limitation (hence my post #25 about 'a bajillion' nodes). It might be a little problem for you, but a big problem for someone else.

I'm not going to push for it, it isn't my problem, just putting it out there for consideration.

merlinofchaos’s picture

DISTINCT has always been on the base field and base field only. So when DISTINCT is added, the base field is assumed.

Views 2 always added the base field.

Views has never NEVER NEVER allowed DISTINCT on anything except for the base field.

Anyone trying to say differently is lying. Anyone trying to say that by "no longer allowing this" we have taken a feature away is lying, presumably to pressure us into adding a feature that we never ever promised.

we were promised the ability to do distinct queries in Views 3

Perhaps I am in a sour mood, but give me one reason why I should interact with you? You were never promised any such thing. If you believe you were, here is my offer: You may return the product you purchased from us and you may have the money you gave me for the product back. I am not, however, particularly enthused to see users trying to railroad stuff in by lying about prior capabilities and demanding capabilities and being told promises were made.

If you want promises about future features, I highly recommend you check out a commercial CMS where you actually are a paying customer and have the ability to tell your vendor that they did not deliver on their promises.

There is a legitimate issue here: Views' use of DISTINCT is crappy. The patch may work, but somehow in all the bitching, moaning, whining and complaining, absolutely no one attempted to address point #2 in comment #6. Which is hilarious, since one of those same people is trying to say that the functionality has changed.

merlinofchaos’s picture

I'm also curious how this DISTINCT issue can cause whitescreens. Was that just rhetoric or are you discussing a bug that hasn't been actually discussed here?

asb’s picture

This issue is getting a bit chatty, and it probably doesn't make much sense to discuss a WSOD mentioned in #29, as danielb referred to a report from a 3rd party, not from his own experience. Probably someone updated to Views 3 and got a WSOD; we all know that there is a good chance that this happens, but most probably this WSOD has nothing to do with the "DISTINCT" issue. If Daniel speaks in behalf of a client and wants to follow up on this, I'd suggest to open a separate issue with proper description of the problem (e.g. output of Apache's and PHP's error logs), and maybe a link to this issue (if there are really indications that the WSOD has something to do with "DISTINCT").

Regarding this issue (#1234592), we do have two relevant statements about the technical problems of the implementation of "DISTINCT" in Views (#6 and #14), and this issue should probably stick to addressing these problems. #17 attempts this, and #23 provides a proper patch file that needs technical review.

Users that are updating from Views 2 to Views 3 might encounter a different behaviour of their existing Views; this problem is pragmatically "solved" this workaround in #1234048: Distinct requires workaround with bad usability to make 6.x-3.x-dev work like Views 2. It is fully functional for the views-6.x-3.x branch.

Thirdly there is the DBTNG issue (e.g. #863478: Distinct is not supported by dbtng) which is only relevant for views-7.x-3.x on D7; a patch for this exists, and has been committed to the views-7.x-3.x branch more than one year ago. If the DBTNG architecture still imposes restrictions related to the DISTINCT handling in Views, either #863478 needs to be re-opened, or a separate issue should be filed.

Last but no least there is an amourphous meta issue from the user's point of view, which is allowed to ignore the technical ramifications of Drupal's DBTNG, the specific implementations of a rDBMS, base fields, or stuff like row/column-level distinct in SQL statements. This issue is about sane ways to handle stuff like duplicates, and it needs to abstract as well from "what does the DISTINCT checkbox in Views 3 do" as from "how does the PostgreSQL documentation explains DISTINCT", and it probably should start rather with something like usability mock-ups than with patches to the code. For example, duplicate items in result sets can be a) discarded (that's what most users are used to from Views 2), or b) grouped into sections ("Grouping" in Views 2/3, which puts certain items from a result set under a unique heading), or grouped into sets (within one item from the result set, like it is done with multi-value CCK fields, "start with item no...", "limit to..."). My point here is, that we maybe should consider less how to adapt a SQL operator into Views, but more think about smarter ways to present the data the user expects to see. Discarding items from the result set, based on whatever base field, is neither the only, nor necessarily the best approach to do this.

danielb’s picture

haha, classic earl

I'm also curious how this DISTINCT issue can cause whitescreens. Was that just rhetoric or are you discussing a bug that hasn't been actually discussed here?

See #25. I'm using views as a query builder by providing a custom display handler. Patch in #23 does solve it, I am merely concerned about side effects (in fact post #6 2nd point is the same concern, except I doubled it to half a million installations). asb said views are only used for display, so it shouldn't matter, but actually views queries can be used for many things.

PS my gf always gets up me for claiming she promised stuff...

merlinofchaos’s picture

Note that a lot of what people want can, in Views 3, be done using aggregation, rather than distinct.

If you have 2 unique values out of a bajillion records, and you just want those values, that's what aggregation does, right? Potentially better than distinct.

merlinofchaos’s picture

haha, classic earl

Please don't patronize me. It's things like this that have made me less and less interested in community work as time has gone on. I'm more or less burnt out on the entire project due to a never ending stream of issues like this one. If you have nothing nice to say, that's fine, but you know how that old saying goes.

thekevinday’s picture

In my opinion, the views module wraps SQL with an interface, so when using words like distinct that already have meaning, one should not redefine the term for convenience of end users. To make things "understandable" to end users who do not know SQL while not misdirecting those educated in SQL, you should come up with new words.

If the base field is a primary key, then adding the base field to the query will result in making each row 'unique'.
But what the distinct sql value does is it removes duplicate rows (as opposed to making them unique by leaving duplicate rows and adding some unique value to each row).
If a user wanted to add the base field to the view, then they already have the ability to do so.
Which means that, unless the base field is not a primary key, distinct never functioned so users who have it enabled are getting the same thing as to if they have it disabled. (and the base field gets added, which is a problem in itself).

From what I think I am understanding is that 2 years ago, dbtng did not support SQL distinct.
As far as I can tell, SQL distinct is supported now.

However, seeing as you want to support this behavior that I, and possibly others, believe is broken behavior, here is a shiny new patch that adds a new term called "Pure Distinct".
The default for "Pure Distinct" is off, so all existing views out there that depend on the broken behavior will be unaffected by this patch.
For those who want to use the SQL distinct, as defined by the SQL standards, then they must select the "Pure Distinct" after they select the "Distinct" checkbox.

Please review the patch.

tim.plunkett’s picture

Title: DISTINCT never works » Provide a secondary DISTINCT option that doesn't add the base field
Category: bug » feature
Status: Needs review » Reviewed & tested by the community

I think this patch is reasonable; it provides a workaround for this problem and it doesn't modify the behavior of existing Views.

Here is an exported view and the query with just the regular setting:
http://paste.pocoo.org/show/559780

SELECT DISTINCT node.type AS node_type, node.nid AS nid
FROM 
{node} node

And one with the new option:
http://paste.pocoo.org/show/559779

SELECT DISTINCT node.type AS node_type
FROM 
{node} node

The goal of the view was to print out each content type's name, but without this new option, it would print it out once per node.

dawehner’s picture

The patch doesn't chance existing behavior, that's very important and good to know.

@earl
Please tell whether you accept such a feature in general

monil-dupe’s picture

You mean the path doesn't work well?

dawehner’s picture

Status: Reviewed & tested by the community » Fixed

Earl agreed so let's get this in.

Thanks thekevinday for the none-destructive approach. Committed to 7.x-3.x

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.