I've noticed that when you first import a feed, the items are not in the correct order for feeds that do not supply publishing dates for items.
Cause:
-----
When an item is imported, if no publishing date can be found for the item, the item's timestamp attribute is assigned the current time in number of seconds from UNIX Epoch.
If more than one item is imported at a time from a particular feed, they are assigned the same timestamp since processing for the feed takes less than a second.
When items are extracted from the database for display, they are ordered by timestamp only. Items with the same timestamp are ordered by some internal method determined by the database, which is essentially random.
Solution:
--------
When retrieving items from the database, order items by timestamp as is done now, but also order items with the same timestamp by item id (iid) DESC. This can be accomplished in the sql when retreiving the items from the database by simply adding an "iid DESC" as an additional sort on the items.
The attached patch changes the sql in the following functions which retrieve timstamp DESC ordered items from the database:
aggregator_bundle_block()
aggregator_feed_block()
aggregator_tag()
aggregator_page_last()
aggregator_page_feed()
aggregator_page_bundle()
The patch was done on a fresh export of cvs HEAD.
Comment | File | Size | Author |
---|---|---|---|
#1 | timestamp-iid_0.patch | 3.39 KB | eap1935 |
timestamp-iid.patch | 3.38 KB | eap1935 | |
Comments
Comment #1
eap1935 CreditAttribution: eap1935 commentedComment #2
eap1935 CreditAttribution: eap1935 commentedHeh. Ignore the first attachement. It will break the module. Oops. The second one will work.
Comment #3
Dries CreditAttribution: Dries commentedAre you sure items with the same timestamp are ordered randomly? Is that behavior documented somewhere?
We've used the following snippet (taken from
aggregator.module
) for years:Comment #4
eap1935 CreditAttribution: eap1935 commentedI'm using a recentish (compiled about four months ago from the most recent source) mysql.
The array_reverse() call has no impact on the bug, but the comment you've quoted states your assumption: the insertion order of records into the database determines the order in which they will be retrieved.
This is not a reasonable assumption. From the evidence of my senses, the actual behavior is that the mysql database (at least) retrieves records based on some internal ordering, perhaps organized for some kind of optimization.
If you don't believe there's a problem, do what I did. Take a simple feed at random off of http://syndic8.com. The feed should not provide timestamp information. The second feed I tried from the "random" box had no timestamp info. prweb provides hundreds of feeds of this type.
http://www.prweb.com/xml/techrobotics.xml
Import a feed into the aggregator and compare the order presented by Drupal to the order of the actual xml. Apply the patch. Difference? There sure is on my system.
The call to array_reverse()is a good idea. Reversing the array retrieved from the xml parser makes sure items are inserted into the database in order from oldest to newest. Since iid is incremented on each database insertion, iid is an index of the age of an item.
Comment #5
eap1935 CreditAttribution: eap1935 commentedYes, I'm sure. You can't assume the items will come out of the database in the same order they were put in. It's broken.
Comment #6
Dries CreditAttribution: Dries commentedI tried three freeds (with no date information) to reproduce this problem, yet to no avail. Can anyone else reproduce this? Can you share the URL of a feed that triggers this problem?
Comment #7
eap1935 CreditAttribution: eap1935 commentedI dunno what's going on then. I'm using a recent version of mysql as the database. The URL included in the comment above, picked at random from syndic8.com appeared out of order in a fresh install of CVS HEAD on my system.
http://www.prweb.com/xml/techrobotics.xml
Thanks for looking.
Comment #8
Dries CreditAttribution: Dries commentedI could reproduce the problem with the feed you provided (probably because it contains many news items?). Committed to both HEAD and the DRUPAL-4-4 branch. Thanks.
Comment #9
(not verified) CreditAttribution: commented