Closed (duplicate)
Project:
Leech
Version:
4.7.x-1.3
Component:
leech
Priority:
Normal
Category:
Bug report
Assigned:
Reporter:
Created:
27 Dec 2006 at 20:54 UTC
Updated:
13 Jun 2007 at 21:02 UTC
Hi
I installed the module and everything works, but I get duplicate articles from the feed URL: the same news item appears twice.
Thanks
Comments
Comment #1
alex_b commented
Hi toma,
Can you check whether the two articles lead to the same original source article? Could you post the original URLs of the two articles?
alex
Comment #2
toma commented
Hi,
Thanks for your reply.
The two articles came from the same source; you can see them on my test website (in French).
The source feed:
http://www.blogelle.com/beaute-femme
Source URL http://www.beaute-femme.org/news/rss.php
Last checked 1 minute 8 seconds ago
Time until next refresh 2 hours 58 minutes remaining
Example of duplicate articles:
http://www.blogelle.com/la-maison-en-bois-habitat-de-demain
http://www.blogelle.com/la-maison-en-bois-habitat-de-demain-0
When I leech the feed, I get this:
10 item(s) added, 0 duplicate(s) found.
Errors
You can see here
www.blogelle.com
Comment #3
alex_b commented
Thanks toma,
The error occurs when a duplicate entry is inserted into url_alias. What's strange is that, before that error, you should have gotten a similar one from leech itself. I checked
http://www.blogelle.com/la-maison-en-bois-habitat-de-demain
http://www.blogelle.com/la-maison-en-bois-habitat-de-demain-0 and saw that BOTH point to
http://www.beaute-femme.org/blog-femme/maison-en-bois-52 - the same URL - leech really should catch that.
Can you post the entries for the two articles (first two URLs here) in the leech_news_item table? You should be able to identify them by their node id.
thank you - alex
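To illustrate the check described above, here is a minimal sketch of duplicate detection keyed on the original source URL. It is written in Python for brevity and is not leech's actual code; `normalize_url` and `is_duplicate` are hypothetical names, and the normalization rules (lowercase scheme and host, no fragment, no trailing slash) are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return a comparable form of a URL (assumed rules, not leech's)."""
    parts = urlsplit(url.strip())
    # Treat "/path/" and "/path" as the same resource.
    path = parts.path.rstrip("/") or "/"
    # Scheme and host are case-insensitive; the fragment is dropped.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def is_duplicate(item_url: str, known_urls: set) -> bool:
    """True if an item with the same original URL is already known."""
    return normalize_url(item_url) in known_urls
```

With a check like this, the two nodes above would count as one item, because both resolve to http://www.beaute-femme.org/blog-femme/maison-en-bois-52.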
Comment #4
toma commented
Hi,
Thanks for your help. I have copied and pasted the table rows for the two nodes:
http://www.blogelle.com/la-maison-en-bois-habitat-de-demain (id 986)
http://www.blogelle.com/la-maison-en-bois-habitat-de-demain-0 (id 987)
Comment #5
funana commented
When I leech a feed for the first time, there are often a lot of duplicate entries. If I manually delete the duplicate items and leech again (or let cron leech them), no duplicates are produced.
I don't know if this helps, but it seems that this mainly occurred on Atom feeds. But I am not sure...
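One possible explanation for duplicates that appear only on the first leech (an assumption, not something confirmed in this thread) is that items are compared only against what was stored before the fetch began, so an entry repeated inside a single fetch slips through. A minimal Python sketch of a loop that avoids this follows; `add_items` and the `"link"` key are hypothetical and this is not leech's code.

```python
def add_items(batch, stored_links):
    """Add the items of one fetch and return (added_items, duplicate_count).

    stored_links is updated as soon as an item is accepted, so a repeated
    entry later in the same batch is also recognized as a duplicate.
    """
    added = []
    duplicates = 0
    for item in batch:
        link = item["link"]
        if link in stored_links:
            duplicates += 1
            continue
        stored_links.add(link)
        added.append(item)
    return added, duplicates
```

Calling it a second time with the same batch and the same `stored_links` set adds nothing and counts every entry as a duplicate, which matches the behaviour reported for later leeches.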
Comment #6
funana commented
Oops, sorry for changing the subject. Changed it back to "duplicate content".
Comment #7
alex_b commented
I cannot reproduce your error. Look here:
http://leechgroups.devseed.org/leech/feed/1122 - leech 4.7-dev (identical dupe handling), PHP 4.4.4/MySQL 4.1.20
What happens if you turn off the pathauto module? Try to rule out any negative interactions with other modules (ideally on a fresh installation).
This is a peculiar error. I would like to get to the bottom of it. Thanks for sticking with it.
Comment #8
toma commented
I deleted all previous entries, disabled the pathauto module, and leeched the feed again.
Just 3 nodes were added! I tried other feeds and they work fine; the Yahoo terms service also works, with no duplicate content.
Comment #9
aron novak
Have you found that leech isn't compatible with the pathauto module? If pathauto is disabled, does leech work as you expect?
Comment #10
alex_b commented
Feed item duplication issue is fixed: http://drupal.org/node/135333