I have many importers that parse external XML into nodes and set the URL and GUID correctly. However, when the importers check their sources again, some of them create duplicate nodes, ignoring the "unique target" option.
Is anybody else having this problem?
Comments
Comment #1
pedrorocha commented
Comment #2
danmuzyka commented
I am having the same issue on 6.x-1.0-beta10. I checked the feeds_node_item table and confirmed that multiple records, each created by a different parent feed node, have identical URLs and GUIDs, even though I selected both of those fields as unique targets in the importer configuration. Then I deleted all of my imported nodes, imported again from one of my feed nodes, and edited that feed node to change its feed URL to match that of another one of my feed nodes. Bingo! When I imported from the first feed node again, I did NOT get duplicate item nodes.
So it seems that the "unique target" setting only works for feed item nodes that share the same parent feed node. I imagine that importing content from several feeds on the same site, where some items may appear in more than one feed, is a fairly common use case. For instance, just now I was trying to import my del.icio.us bookmarks from several feeds based on my tags. In the past, I worked on a news site that has an agreement with Bloomberg News allowing it to import RSS feeds from multiple Bloomberg article categories, and sometimes the same article appeared in two or more feeds. Fortunately, in that case the feeds populated different parts of the site, so the point was moot.
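A toy model may make the scoping described above concrete. This is hypothetical Python, not Feeds' PHP code: it assumes only what the comment observes, namely that the stock lookup matches an item on its GUID and its parent feed node together. The class `ImportedItems`, the helper `run()`, and the sample GUIDs are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ImportedItems:
    """Toy model of an importer's item table (hypothetical, not Feeds' real code)."""
    per_feed: bool  # True: uniqueness is scoped to the parent feed node
    _seen: dict = field(default_factory=dict)
    _next_nid: int = 1

    def _key(self, feed_nid, guid):
        # The scoped lookup includes the parent feed node; the global one ignores it.
        return (feed_nid, guid) if self.per_feed else guid

    def import_item(self, feed_nid, guid):
        """Return the nid for this item, creating a new 'node' only if no match exists."""
        key = self._key(feed_nid, guid)
        if key not in self._seen:
            self._seen[key] = self._next_nid
            self._next_nid += 1
        return self._seen[key]

    def node_count(self):
        return self._next_nid - 1


def run(per_feed):
    store = ImportedItems(per_feed=per_feed)
    # Two feed nodes (nid 10 and 11) whose feeds share one article.
    feeds = {10: ["article-a", "article-b"], 11: ["article-b", "article-c"]}
    for _ in range(2):  # import twice to show that re-imports are stable
        for feed_nid, guids in feeds.items():
            for guid in guids:
                store.import_item(feed_nid, guid)
    return store.node_count()


print(run(per_feed=True))   # 4: article-b is created once per feed node
print(run(per_feed=False))  # 3: article-b is created only once
```

Re-running either import does not add nodes, which matches the report: duplicates appear only across different parent feed nodes, never within one.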
Thanks in advance for any help anyone can provide!
Comment #3
johnv commented
There are a lot of posts about unique targets. Regarding your question, the following comment explains why the behaviour is correct (one GUID per feed, not per node type): http://drupal.org/node/761076#comment-2802256
Comment #4
danmuzyka commented
@johnv, thanks for the quick reply. I see that in the comment you pointed to, @alex_b states that this behavior is deliberate; however, the business logic behind that decision does not make sense to me. @smscotten makes a similar point in http://drupal.org/node/661606#comment-3799942. If there is a reason why it is better to test uniqueness only against nodes with the same parent feed node rather than against all imported nodes, I don't understand it.
If there are other issues or comments arguing in favor of the current approach, could you point a few out? Maybe I just overlooked them. Thanks again for your help.
Comment #5
johnv commented
@Dan, I am just a user, not a maintainer of this module. In my case, I specify file names as the source, so I can create one super-importer for different 'feeds'. And you're right: when I automate that, I'll run into problems.
But since alex_b describes this as 'works as designed', perhaps you should change this issue to a feature request instead of a bug report.
Comment #6
danmuzyka commented
@johnv, sure thing. I guess I assumed that you were a close friend or colleague of alex_b, or at least had been using this module long enough to have particular insight into the rationale behind the current functionality. I'll change the category to feature request if you think that makes more sense. I'm also renaming the issue for clarity.
Comment #7
johnv commented
Also check this issue, which already contains a patch for exactly this: Attach multiple importers to one content type.
Comment #8
EvanDonovan commented
Sorry for reopening this, but based on the comments in #661606-14: Support unique targets in mappers and those that follow, I think it is necessary. It was incredibly surprising to me to discover that the GUID is only a GUID for a specific feed source.
johnv, I am not sure whether #634462: Attach multiple importers to one content type (in D6) actually addresses my needs, since it is more about having multiple feed URLs on a single node, whereas I would like to create them as separate nodes for ease of administration on my site. (There could be over a hundred of them.)
For anyone who doubts that GUIDs are currently unique only within a feed source (feed_nid in my case): create multiple feed nodes that pull from the same feed URL, run them, and then try the following query:
That query will show that there are duplicates, and that they arise from the different feed sources.
While I see how making GUIDs specific to a feed source could make sense in some cases (standard RSS feeds, where the same URL could show up in multiple feeds and you might want it from both), it doesn't make sense in others (an XML datasource against which you are running multiple API queries).
I think there ought to be a way to have a GUID that is really a GUID: for this field, the code in FeedsNodeProcessor's existingItemId() method would query with SELECT DISTINCT and without filtering on feed_nid. That would enforce uniqueness across all feeds. (To handle legacy data, the query could instead select nids ordered by created date, ascending, and take only the first nid returned. That way, it would always match the oldest one.)
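The lookup proposed above can be sketched against an in-memory SQLite table. This is a hypothetical Python illustration, not the code in FeedsNodeProcessor: the table is a simplified stand-in named after feeds_node_item, the `created` column is folded into it for brevity (in Drupal the creation date lives on the node itself), and the function names are made up. It contrasts a per-feed lookup with the legacy-friendly global lookup that orders by created date and keeps the first nid.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the item-tracking table (simplified, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feeds_node_item ("
        " nid INTEGER PRIMARY KEY, feed_nid INTEGER, guid TEXT, created INTEGER)"
    )
    # Legacy data: the same GUID was already imported by two different feed nodes.
    conn.executemany(
        "INSERT INTO feeds_node_item (nid, feed_nid, guid, created) VALUES (?, ?, ?, ?)",
        [(1, 10, "item-42", 1000), (2, 11, "item-42", 2000), (3, 10, "item-7", 1500)],
    )
    return conn


def existing_item_id_per_feed(conn: sqlite3.Connection, feed_nid: int, guid: str) -> Optional[int]:
    """Per-feed lookup: a match only counts if it came from the same feed node."""
    row = conn.execute(
        "SELECT nid FROM feeds_node_item WHERE feed_nid = ? AND guid = ?",
        (feed_nid, guid),
    ).fetchone()
    return row[0] if row else None


def existing_item_id_global(conn: sqlite3.Connection, guid: str) -> Optional[int]:
    """Global lookup: ignore feed_nid; with legacy duplicates, return the oldest nid."""
    row = conn.execute(
        "SELECT nid FROM feeds_node_item WHERE guid = ? ORDER BY created ASC LIMIT 1",
        (guid,),
    ).fetchone()
    return row[0] if row else None


conn = make_db()
print(existing_item_id_per_feed(conn, 12, "item-42"))  # None: feed 12 would create a third copy
print(existing_item_id_global(conn, "item-42"))        # 1: the oldest existing node is matched
```

Ordering by `created` and limiting to one row means that, where duplicates already exist, re-imports keep updating the oldest node instead of matching an arbitrary one.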
Does anyone else need this? I would even suggest that this new GUID be called "GUID" in the interface, since it is more consistent with the standard meaning of "GUID". The other one could be called "Feed-Specific GUID" or something similar (I know that's a clunky name, but I can't think of anything better offhand).
Comment #9
EvanDonovan commented
This is what I am using for now in my existingItemId() method in FeedsNodeProcessor:
Since this is a hack to the module code, I suppose the correct approach would be to create a processor that inherits from FeedsNodeProcessor. But I think my proposal in #661606-18: Support unique targets in mappers would also work, and it would avoid creating an entire new class just for this one thing.
Comment #10
twistor commented