I am using Feeds plus Feeds Tamper to create nodes on my website from an RSS feed published by another website.

My website is running the current 6.x versions of Drupal, Feeds, and Feeds Tamper. The other website is running the current 6.x versions of Drupal and Views. Both websites run numerous other third-party modules besides those listed.

I am using Feeds Tamper to process the taxonomy terms within the RSS feed. There can be several taxonomy terms per feed item. Feeds Tamper converts the list of terms into an array that it supplies to Feeds. This is working.

The problem is duplicates: when a feed item has already been used to create a node on my website, Feeds creates another node for the same item on the next import. It makes no difference how I set things up in the feed importer - I get duplicates whether I specify "don't update existing nodes," "replace existing nodes," or "update existing nodes."

In the mapping section for the feed importer, I have the "guid" field set as the "unique field." I am working with a custom content type created via CCK, and "guid" is one of the custom fields. I am getting duplicates even though I have the "unique field" box checked for the "guid" field.

When I look at my website's database in phpMyAdmin, I see some interesting results. For the custom type, "external news," there is a "content_type_external_news" table. It has all of the custom fields for that type except for one - the GUID. There is a separate table, "content_field_guid," that holds the values of the GUID field for the "external news" custom type plus an additional custom type - "events."

When I look at the values for the GUID field, it is null for every record in the "content_field_guid" table - regardless of whether the GUID applies to an event node or to an external news node. So I am wondering if this is part of the problem - that Feeds is expecting the GUID field to be in the same table as the rest of the custom fields. [Table "content_type_external_news" instead of "content_field_guid."]

But there is something even stranger happening. Someone else created a feed importer for the events. Her feed importer does not create duplicate event nodes when her feed importer is run. To troubleshoot, I created an exact copy of her feed importer from scratch so that I could see if there was some difference between her importer and mine. My importer creates duplicate event nodes, even though it is processing the same feed and feed items.

I exported the configurations for each feed importer to text files. I then ran the "Beyond Compare" program to display the differences between the two files - in other words, the differences between the two feed importers' configurations. The only differences shown were in the feed importers' names and IDs. Otherwise, they were an exact match.
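For anyone who wants to repeat this comparison without Beyond Compare, here is a rough sketch in Python using the standard difflib module. The function name config_differences and the two sample strings are made up for illustration; they do not show the real Feeds export format.

```python
import difflib


def config_differences(text_a, text_b):
    """Return only the added/removed lines between two exported configs."""
    diff = list(
        difflib.unified_diff(
            text_a.splitlines(), text_b.splitlines(), lineterm="", n=0
        )
    )
    # The first two diff lines are the "---" / "+++" file headers; skip them.
    # Hunk markers ("@@ ... @@") start with neither "+" nor "-".
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Hypothetical exports that differ only in the importer name.
her_export = "name = her_importer\nunique = guid\n"
my_export = "name = my_importer\nunique = guid\n"

for line in config_differences(her_export, my_export):
    print(line)
```

If the two exports really differ only in name and ID, those are the only lines this prints.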

Additionally, the accounts used to create the two feed importers are configured the same. Both have the same two role memberships, one of which grants every possible permission.

There is one other difference that might be significant. In the "feeds_node_item" table, I see that the "guid" field has been filled for four events. [There are four events in the feed.] Otherwise, for all other records the "guid" field is null.
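A check like this can also be scripted rather than read off by eye. The sketch below uses an in-memory SQLite table as a stand-in for "feeds_node_item" (the real site is on MySQL), reduced to three columns. I am assuming the "id" column holds the importer's ID. The importer names and rows are invented for illustration and are not data from my site.

```python
import sqlite3


def count_missing_guids(conn):
    """Count rows with a NULL or empty guid, grouped by importer id."""
    rows = conn.execute(
        "SELECT id, COUNT(*) FROM feeds_node_item "
        "WHERE guid IS NULL OR guid = '' "
        "GROUP BY id ORDER BY id"
    ).fetchall()
    return dict(rows)


# Stand-in table with made-up rows: one importer stores guids, one does not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feeds_node_item (nid INTEGER, id TEXT, guid TEXT)")
conn.executemany(
    "INSERT INTO feeds_node_item VALUES (?, ?, ?)",
    [
        (1, "importer_a", "guid-1"),
        (2, "importer_a", "guid-2"),
        (3, "importer_b", None),
        (4, "importer_b", None),
    ],
)

print(count_missing_guids(conn))
```

An importer that shows up in this result is one whose items Feeds has no stored GUID for.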

Comments

SteveMM’s picture

Issue summary: View changes

Author caught a few typos.

lquessenberry’s picture

I am having the same problem. I have a CSV (homemade in nature) with no GUIDs. The data does have a unique ID for every row, which is awesome considering that this is a roughly 20-year-old data file. There are 21000 entries and over 70 fields. I have mapped the ID field to NID in the mapper and set it to unique. This creates all the listings I need, but when I import again to make changes to the content, it just makes more of the same nodes.

joaomachado’s picture

I seem to be having the same issue...any new updates on this?

Here is my use case:
Using 6.x-1.0-beta11 and patch: reset-real-targets-6x-996808-17.patch

I have a node type with 10 fields but with two sources. Each source handles different fields of the node type.
I use the Node Title as the GUID.

Source 1: Field1, Field2, Field3, Field4, Field5, Field6
Source 2: Field7, Field8, field9, Field10

When attempting to use the second source to update the node, it creates a duplicate.

Should I be using the dev version or has this not been addressed yet?

selwynpolit’s picture

I am experiencing duplicate records being created from a Twitter feed. This only seems to happen on the first import - after that, when new tweets are added, each one is imported only once. Kind of an edge case. It happens with the built-in feed when I give it my Twitter URL of https://api.twitter.com/1/statuses/user_timeline.rss?screen_name=selwynp...

I hope this is helpful.

martin.l’s picture

Priority: Major » Critical

I have the same problem.

I am importing from the field "Item URL (link)" to the field "URL" and using this field as a unique field.

Still the import creates duplicate nodes (especially on initial import).

edminn’s picture

+1

twistor’s picture

Component: Feeds Import » Documentation
Priority: Critical » Normal
Status: Active » Postponed (maintainer needs more info)

@SteveMM

The GUID target is not a field that you create; it is a database column supplied by Feeds. The same goes for the URL target. Your feed needs to have a unique ID per feed item, mapped to either GUID or URL and set as unique, in order for Feeds to keep track of the items it is importing.
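To show why an unmapped or empty GUID leads to duplicates, here is a rough sketch of unique-target matching in Python. This is not the actual Feeds code (Feeds is PHP); import_items, tracked, and nodes are hypothetical names, and the dictionary stands in for the tracking column that Feeds supplies.

```python
def import_items(items, tracked, nodes, unique_key="guid"):
    """Import items, skipping those whose unique value was seen before.

    `tracked` maps a unique value to a node id (a stand-in for the tracking
    column); `nodes` is the list of created nodes. Returns how many nodes
    this run created.
    """
    created = 0
    for item in items:
        value = item.get(unique_key)
        # Only a non-empty unique value can be matched to an earlier import.
        if value and value in tracked:
            continue
        nodes.append(item)
        if value:
            tracked[value] = len(nodes)  # node ids are 1-based here
        created += 1
    return created


tracked, nodes = {}, []
with_guid = [{"guid": "a"}, {"guid": "b"}]
without_guid = [{"title": "x"}]

print(import_items(with_guid, tracked, nodes))     # first run
print(import_items(with_guid, tracked, nodes))     # second run, same feed
print(import_items(without_guid, tracked, nodes))  # first run, no guid
print(import_items(without_guid, tracked, nodes))  # second run, no guid
```

In this sketch the items with a GUID are created once and skipped on the second run, while the item without one is created again on every run.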

@lquessenberry, that sounds like a different issue and possibly a bug.

@joaomachado, that is a different issue, could you create a new issue?

@selwynpolit do you have the GUID field mapped and marked as unique?

Everybody, I will need more information to help.

twistor’s picture

Issue summary: View changes

More minor changes. Easier for me to spot flaws in "view" mode than when editing.

twistor’s picture

Status: Postponed (maintainer needs more info) » Closed (cannot reproduce)