I've noticed that when you first import a feed, the items are not in the correct order for feeds that do not supply publishing dates for items.

Cause:
-----
When an item is imported, if no publishing date can be found for the item, the item's timestamp attribute is assigned the current time in number of seconds from UNIX Epoch.

If more than one item is imported at a time from a particular feed, they are assigned the same timestamp since processing for the feed takes less than a second.

When items are extracted from the database for display, they are ordered by timestamp only. Items with the same timestamp are ordered by some internal method determined by the database, which is essentially random.

Solution:
--------
When retrieving items from the database, order items by timestamp as is done now, but also order items with the same timestamp by item id (iid) DESC. This can be accomplished in the sql when retreiving the items from the database by simply adding an "iid DESC" as an additional sort on the items.

The attached patch changes the sql in the following functions which retrieve timstamp DESC ordered items from the database:

aggregator_bundle_block()
aggregator_feed_block()
aggregator_tag()
aggregator_page_last()
aggregator_page_feed()
aggregator_page_bundle()

The patch was done on a fresh export of cvs HEAD.

CommentFileSizeAuthor
#1 timestamp-iid_0.patch3.39 KBeap1935
timestamp-iid.patch3.38 KBeap1935
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

eap1935’s picture

FileSize
3.39 KB
eap1935’s picture

Heh. Ignore the first attachement. It will break the module. Oops. The second one will work.

Dries’s picture

Are you sure items with the same timestamp are ordered randomly? Is that behavior documented somewhere?

We've used the following snippet (taken from aggregator.module) for years:

  /*
  ** We reverse the array such that we store the first item last,
  ** and the last item first.  In the database, the newest item
  ** should be at the top.
  */

  $items = array_reverse($items);
eap1935’s picture

I'm using a recentish (compiled about four months ago from the most recent source) mysql.

The array_reverse() call has no impact on the bug, but the comment you've quoted states your assumption: the insertion order of records into the database determines the order in which they will be retrieved.

This is not a reasonable assumption. From the evidence of my senses, the actual behavior is that the mysql database (at least) retrieves records based on some internal ordering, perhaps organized for some kind of optimization.

If you don't believe there's a problem, do what I did. Take a simple feed at random off of http://syndic8.com. The feed should not provide timestamp information. The second feed I tried from the "random" box had no timestamp info. prweb provides hundreds of feeds of this type.

http://www.prweb.com/xml/techrobotics.xml

Import a feed into the aggregator and compare the order presented by Drupal to the order of the actual xml. Apply the patch. Difference? There sure is on my system.

The call to array_reverse()is a good idea. Reversing the array retrieved from the xml parser makes sure items are inserted into the database in order from oldest to newest. Since iid is incremented on each database insertion, iid is an index of the age of an item.

eap1935’s picture

Yes, I'm sure. You can't assume the items will come out of the database in the same order they were put in. It's broken.

Dries’s picture

I tried three freeds (with no date information) to reproduce this problem, yet to no avail. Can anyone else reproduce this? Can you share the URL of a feed that triggers this problem?

eap1935’s picture

I dunno what's going on then. I'm using a recent version of mysql as the database. The URL included in the comment above, picked at random from syndic8.com appeared out of order in a fresh install of CVS HEAD on my system.

http://www.prweb.com/xml/techrobotics.xml

Thanks for looking.

Dries’s picture

I could reproduce the problem with the feed you provided (probably because it contains many news items?). Committed to both HEAD and the DRUPAL-4-4 branch. Thanks.

Anonymous’s picture