I have a date field in a custom 'events' node type. It is a multiple required field, with the 'site' timezone setting and the selector widget.

I have a view that displays a list view of events, and even with node:distinct set in filters I get duplicate listings of nodes that have multiple dates set. Any ideas?

CommentFileSizeAuthor
#6 date_view_0.txt1.63 KBDarren Oh
#3 date_view.txt2.49 KBDarren Oh
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Darren Oh’s picture

Title: required multiple date shows duplicate in views » Nodes with multiple dates duplicated in views
Component: Miscellaneous » Code
Assigned: Unassigned » Darren Oh

I found that removing Date from the sort criteria eliminates duplicates. Investigating.

KarenS’s picture

I've never run into this. Did you have it set to 'group multiple values' and it still did that? I can't see any reason why a sort setting would have any effect either. See if you can create a very simple, reproducible, illustration where you see this behavior when 'group multiple values' is set and post an export of the view.

Darren Oh’s picture

FileSize
2.49 KB

Here's an export of my view. Reverting CVS commit 63756 seems to fix the problem on MySQL sites, but I'm told it makes the query invalid on PostgreSQL sites.

KarenS’s picture

I know that commit well, it broke a number of things for me. The commit won't be reversed since it's needed to fix a postgres problem, we have to find a work-around. The problem, generally, is that Views now adds anything in the sort to the group-by and vice versa which sometimes give you results you don't intend. So far I've only seen problems on argument summary views, not on full views,. The fix for summary views was to not use the option 'summary-sort by view', which now breaks badly. This is going to take something else. I'll have to first see if I can use this export to create a view that replicates the problem, then see what I can come up with.

KarenS’s picture

Does it still break if you take the 'location' filter out? It is easiest to work with if I can get this down to a view that only has Date elements.

Darren Oh’s picture

FileSize
1.63 KB

OK, took out all filters. Still broken. I did notice that there is a sort option on the field setting. Using that instead of the datestamp sort gets around the problem.

Darren Oh’s picture

Please ignore my workaround. It wasn't sorting, and when I chose the sortable option I got an SQL error.

KarenS’s picture

Project: Date » Content Construction Kit (CCK)
Component: Code » Views Integration

I got a bit further on figuring this out and I see where the problem is coming from but am not sure what to do about it. It turns out it is a problem in the CCK Views integration, or in Views itself, depending on your point of view. Here's the summary:

1) If you want multiple value CCK fields to group together, you use the content_views_field_handler_group. That handler ignores the usual way of adding fields to the view, which would have created a new table row for each multiple value and instead skips past that step and intervenes just once to do its own query and concatenate the results together.

2) All that works just fine unless you add a sort to the view. In that case, Views adds the sort field to the view again, even though it was already in the fields list. That causes another instance of the content_views_field_handler_group() to execute, which creates a second table row.

Not sure if the basic problem is that Views should check if the field is already there and not add it if it is, or if CCK needs to do some more logic to keep from executing its code more than once in a view.

OpenChimp’s picture

I would love to see a solution for this. Intuitively, I would think that it would be better to deal with this within Views rather than CCK. The problem seems to lie in the fact that once you try to sort by a field with multiple values it doesn't know which one to listen to. I don't just get a second record output, but a record for each value in the field I am sorting by.

I can certainly see why it would get confused. In many cases it doesn't make any sense to sort by a multi-value field (If it were a text field, how would I know which to listen to - "apples","bananas","oranges" - is this an A, B or O item?)

But there are times when it's very desirable to sort by multi-value fields. My particular case is setting up events which have multiple performance times. I can display those events on a calendar, but we also want to be able to provide a page that lists the upcoming performances in a teaser format, and it would be best if the order of those teasers went from the events that are coming up soonest to the ones further in the future (Ascending Sort Order). But right now, this is impossible because any sorting using the multi-value date field creates multiple teasers for a given event. The workaround is to sort by publication date, hoping that the most currently published items are the ones of most interest and using stickies for the featured items. Obviously, there are a lot of issues with this method.

A solution?
Since it seems impossible to reliably sort by a field that passes in multiple values, perhaps we need a way to specify which of the values the query should use to do the sort. Should it use the first value in the list, the last value, or what? In the case of dates, it would be nice to be able to tell it to use the date closest to the current time (Next, Previous). That way, if an item has multiple values, then the resulting list could feature the next upcoming node at the top of the list. In other words, if ProdA and ProdB have interlaced performance times, then the one that shows at the top of the Productions page would be determined by whichever one has the next performance coming up.

Maybe this could be handled with additional sort options for multi-value fields. I notice that date fields have the "Granularity" option for sorting. Could we also have a "Multi" option that gave users the ability to specify which of the multiple values to use use for sorting?

What do you think? Am I way off base or is this useful in a broader sense?

capellic’s picture

I am running into this issue as well. I submitted a bug (http://drupal.org/node/186417) and KarenS replied that it was a duplicate of this bug. I have a hack that seems to work - but it is a direct edit to the views.module file - but who knows what else it might be breaking. Follow the link above to see what I found/did.

Any idea of when this issue will be resolved?

capellic’s picture

Title: Nodes with multiple dates duplicated in views » Non-interference hack

I came up with a hack for nodes that overrides the duplicates and it's all done from the site's template.php file. This code keeps track of all the node IDs that have been added to the content. If it happens upon a duplicate, it throws skips it by returning nothing.

I still have my other hack in for the block view - I couldn't figure out where I could apply the same sort of filter I put below. Any tips would be greatly appreciated.

1. Add theme_node to your template.php file from the includes/theme.inc filder

2. Rename the function yourtheme_node

3. Just before the function starts, make this declaration

$node_nids = array();

4. Then, at the top of the yourtheme_node function, put the following. You will need to substitute the 'events' string to whatever it is you named your view.

// Keep track of nodes that have already been added to the array when 
// type == 'events'.  Throw throw it out if it has already been added.
global $node_nids;
if ($node->type == 'events')
{
    if ($node_nids)
    {
        if (in_array($node->nid, $node_nids))
        {
            return;
        }
    }
}
$node_nids[] = $node->nid;
Darren Oh’s picture

Title: Non-interference hack » Nodes with multiple dates duplicated in views
Assigned: Darren Oh » Unassigned
Rowanw’s picture

Capellic, your code removes the repeated nodes from the view, but it also ignores my node template (node-type.tpl.php). Is there anyway to make it use the template file?

FWIW: I'm sorting the view with the CCK Date field.

fworsley’s picture

I am experiencing the same problem, but I am not using CCK. If I remove any sort criteria from my view it displays fine. If I add Date Created as the sort criteria then I get duplicates. I'm not sure what I did to suddenly trigger this. I did not previously have this problem. I've been working on the site most of today and noticed this problem just now as I was about to go home.

Today I installed the votingapi and fivestar modules and configured them. I also uploaded some files to a node using the previously installed upload module. Any idea how using these modules could trigger this problem?

Removing the uploaded files doesn't make a difference. I will try disabling the voting/fivestar modules tomorrow to see if that gets rid of the problem.

fworsley’s picture

Sorry, it turns out I was sorting the view by "Comment: Created Date" and since I had added several comments today this now caused the problem. If I sort by "Node: Created Date" then it works fine.

I still don't think that sorting by comment created date should cause the nodes to show up more than once. Not sure if that is just unintuitive default behaviour or related to this bug.

Darren Oh’s picture

Now that I've done a bit of work to make GROUP BY queries compatible with PostgreSQL, the PostgreSQL way makes more sense. With MySQL, I always wondered which row's values would be used for the group. PostgreSQL simply refuses to accept such vague queries. You must add all the columns for which you request single values to the GROUP BY clause, and use aggregate functions to request values from the rest. So we can solve this problem by sorting on an aggregate.

Views supports arbitrary sort criteria with the handler setting. In other words, CCK needs a "group multiple values" option for sort criteria.

OpenChimp’s picture

It's great to see some solutions for this coming about. I'm still curious what the results of sorting on an aggregate will be. Which of the values will be used for the sorting?

fworsley brought up an issue above which I would certainly consider related to this. It would be great to be able to create a view that you could sort by the commenting date so that you could display all the nodes that have recently been commented on, but you'd have to look at the most recent date. Would the aggregate be able to do this?

Above I tried to mention this, but I'm not sure how clear I was in my thinking, but especially for events that have multiple times, it's important to be able to sort a view by the next upcoming date, not just the first or last performance time. Will the aggregate functions have this kind of flexibility?

Darren Oh’s picture

Title: Sorting duplicates nodes with multiple dates » Nodes with multiple dates duplicated in views

By adding an option setting to the sort, we can tell the sort handler to aggregate the field in any number of ways: highest value, lowest value, lowest value greater than a particular value. I must say, though, since none of the projects I'm working on at the moment require this, you shouldn't expect a patch from me any time soon.

dellis’s picture

I'm experiencing the same issue and haven't got it working properly yet....

Darren Oh’s picture

Title: Nodes with multiple dates duplicated in views » Sorting duplicates nodes with multiple dates

Changing the title to better reflect the source of the problem.

yched’s picture

Title: Nodes with multiple dates duplicated in views » Sorting duplicates nodes with multiple values

I don't think this is specific to date fields

Rowanw’s picture

Title: Sorting duplicates nodes with multiple values » Sorting duplicate nodes with multiple values

typo.

Although it could be just poorly phrased...

yched’s picture

Title: Sorting duplicate nodes with multiple values » Sorting duplicates nodes with multiple values

Actually, this is not a typo. Sorting on a field with multiple values does duplicate the nodes...
Feel free to propose a better wording :-)

Darren Oh’s picture

Related issue: 222255.

Triskelion’s picture

I tracked this down. It is an issue for the Location project, and well as whoever constructed the mysql query to retreive node data for views.

The query is constructed as 'SELECT DISTINCT(node.nid), node.title .....' giving the reader the false sense of expected behaviour that it will retrieve unique records based on nid. (I hope MySQL decides to have a syntax error thrown in future releases. :-) To understand the folly, rewrite the query as 'SELECT DISTINCT(node.nid),(node.title),.......' MySQL will return distinct rows, not rows with distinct nid's! The braces are meaningless! (Test it and weep)

When the query includes 'FROM node node LEFT JOIN location location ON node.vid = location.eid' you may begin to see duplicates. The location module fills its table with blank records for some reason. You might have a hundred blank lid's with the same eid. Without the DISTINCT keyword in the above query, you would see hundreds of rows. With it you only see two, the one you want, and one with blank location information. #222255: No way to add sort handler is NOT a related issue.

On a per-instance basis, I am deleting the blank records by hand. The issue has been reported in the location issue queue at #268281: User locations not deleted with user.

Darren Oh’s picture

DISTINCT is behaving as expected according to the SQL spec. This issue is properly handled by using GROUP BY, HAVING, and aggregate functions in the query. CCK might be able to provide a reasonable default for modules that fail to do this.

bones’s picture

I just ran in to this issue when updating a site to Drupal 5, it all worked as I liked in 4.7! Following what capallic mentioned in his bug report, it does not make sense to me that the 'multiple field' is added to the GROUP BY statement. Removing it does seem to do the trick and get result I am after.
I think that if you are selecting DISTINCT nodes and you wish to sort the 'multiple field' then it should not be in the GROUP BY statement. It will then be ordered by the first value of the field, which in a lot of cases should be adequate, but overall much better than returning multiple nodes when you have asked for distinct ones!
From what I can work out this could either be added to views or put in a sort handler for CCK.
All it needs in views.inc is line 415 to be changed to
if ($this->groupby && !$this->distinct) {
There is probably much more to it all than that, but I thought I would add my bits as I feel this is a very important issue and would like to see it fixed.
Hope this helps!

Darren Oh’s picture

Let's clarify something: DISTINCT is meant to eliminate identical results. Eliminating results in which only one field is identical is a MySQL extension which makes it impossible to be certain which result will be returned.

Views would work for us as is if CCK provided a sensible default sort handler. I have recently had some experience in writing handlers, so I might be able to do this soon.

eeyorr’s picture

Just curious if anyone found a work around.

caver456’s picture

Version: 5.x-1.x-dev » 6.x-2.2

subscribing
same problem here, drupal 6.10, views 6.x.2.5, cck 6.x.2.2, date 6.x.2.x-dev (apr 1 2009, no foolin')
Thanks for all your work!

eeyorr’s picture

Version: 6.x-2.2 » 5.x-1.x-dev
capellic’s picture

My posted above worked in Drupal 5/Views 1, but not in Drupal 6/Views 2. Plus, the hack was the theming layer, not really the right place for it.

The solution I came up with is a bit better, a mini module that takes care of the duplicates by purging rows with duplicate node IDs from the Views result set. I would hope that this could be a checkbox in Views at some point.

I've posted the recipe to my blog: http://capellic.com/blog/cure-duplicate-nodes-in-a-view

This would be a solution with any view that is displaying a node more than once due to whatever reason.

castawaybcn’s picture

@capellic
I tried your module but unfortunately it does not seem to work, I am getting this error message instead:
function viewsremoveduplicates_views_pre_render(&$view, &$query) { if ($view->name == 'my_view_name') { foreach ($view->result as $row) { if (!in_array($row->nid, $used_nids)) { $new_view_result[] = $row; $used_nids[] = $row->nid; } } $view->result = $new_view_result; } }

Also, my test site breaks after enabling your module... Easily fixed by just browsing back and disabling it though.

I dont' know any php code myself, but I was wondering if the code could also be modified to accept more than one view, something like:
if ($view->name == 'view1' or $view->name == 'view2')

david.a.king’s picture

#27 - bones - just wondering if this does anything undesirable in the background that i haven't noticed.. cos as far as i can see it solves the problem.. (so thank you!)

KarenS’s picture

Status: Active » Closed (won't fix)

The D5 version is no longer being supported. Sorry.