This is a follow up from #350409: Checking for uniqueness via a CCK field which stores an id .

Some parsers like CSV parser don't detect URLs and they create GUIDs from random patterns (Parser CSV md5's a cvs file's row).

If there was a mapper for GUID and URL this problem would go away. Users could define which element of the feed item contains GUID and/or URL and thus define how to check for the uniqueness of an item.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

burgs’s picture

This would be easy if guid came through as $field_name into the hook_feedapi_mapper function.

Anyone got any ideas of how to fix this?

tauno’s picture

You can update the guid in $node->feedapi->feed_item->options->guid, but it looks like the unique check has already been made by the time the feed element mappers do their work. See Step 4 of _feedapi_invoke_refresh() in feedapi/feedapi.module.

alex_b’s picture

Category: task » feature

That's true. This feature request goes very deep into FeedAPI mapper's architecture... My initial post wasn't very clear on that. Properly flagging this as a feature request.

alex_b’s picture

Title: Create a URL/GUID mapper » Mapper for FeedAPI URL/GUID
alex_b’s picture

Version: 6.x-1.x-dev » 6.x-2.x-dev

This feature will go hand in hand with #539752: Support 'unique' conditions: for full integration with feedapi_node we will need this mapper, and for implementing unique checking right, we will need #539752: Support 'unique' conditions.

Bumping this to 2.x.

Aron Novak’s picture

I see two possible ways here to implement this feature:
1) The mapper remains node-dependent as the feedapi_mapper_node.inc is node-dependent. Then manipulating the node object is fine. As tauno wrote, $node->feedapi->feed_item->options->guid is the place where the guid can be put. However if i consider other processors (like FeedAPI Data), it's not such a good idea.
2) Manipulating the $feed_element variable. This is a totally new approach, until now, FeedAPI Mapper only did feed_element -> node mapping. What i propose here is a feed_element -> feed_element mapping, for example foo1 field of the feed element will be copied to the guid of the feed element. In this case this mapper can be node-independent. This would mean big changes in the architecture.

At first attempt, i'd say 1) is fine, because with the unique conditions feature, guid is not really a critical value anymore, because the mapper can override it.

Aron Novak’s picture

Patch for approach 1), see #6. Basically this works when mapping to a node object.

Aron Novak’s picture

Status: Active » Needs review

The patch needs review, the ideas in #6 needs to be discussed.

alex_b’s picture

Status: Needs review » Needs work

#6: I am not sure whether I understand the practical advantages of 2).

#7: url is options->original_url, not url. I'm getting bitten by this one all the time, too.

Aron Novak’s picture

Status: Needs work » Needs review
FileSize
1.58 KB

#9: yeah, original_url. It's really sick that the db schema and the form name convention is different in FeedAPI :(
I corrected that one and also i updated the mapper according to #539752: Support 'unique' conditions

alex_b’s picture

Project: Feed Element Mapper » FeedAPI
Version: 6.x-2.x-dev » 6.x-1.x-dev
Component: Code » Code feedapi_node
Status: Needs review » Needs work

O, on a second read, this should be actually part of feedapi_node module - sorry, should have seen this when reviewing #7.

Once this is refactored as a patch to feedapi_node, we should be good to go.

We should be able to commit this without #536666: FeedAPI Mapper 2 compatibility.

Aron Novak’s picture

Ahha, this needs a little tweak in feedapi_mapper.module, as i see. Take a look on #541526: Ability to use mappers from other modules
The new patch will land soon.

Aron Novak’s picture

Status: Needs work » Needs review
FileSize
1.57 KB
alex_b’s picture

Status: Needs review » Needs work

Will need work once #539752-8: Support 'unique' conditions is fixed - can we move this into the .module file so that we don't need custom loaders like #541526: Ability to use mappers from other modules. ?

Aron Novak’s picture

Status: Needs work » Needs review
FileSize
2.04 KB
alex_b’s picture

Status: Needs review » Needs work
+  // Only map nodes.
+  if (!$node_type = feedapi_mapper_get_item_node_type($feed_node)) {
+    return;
+  }

These lines should go away after [##541814] is committed.

alex_b’s picture

Please adjust these comments:

// Currently only does site-wide deduping
// Check for any duplicates, decide in _feedapi_node_save() whether to look for in-feed duplicates or any duplicates.

// Both fields support unique-ness
// Both fields support testing for unique items.

Aron Novak’s picture

Status: Needs work » Needs review
FileSize
2.06 KB

#541814: Breaks when enabling multiple processors is committed, so i updated the patch as well.

Aron Novak’s picture

Fixed a bug in #18, the array is keyed by the item ids, not the feed ids.

alex_b’s picture

Status: Needs review » Needs work

Ouch - if we commit this we lock feedapi_node into feedapi_mapper 2.x.

We have 3 options: either we repatch this to add it to feedapi_mapper 2.x or we branch for feedapi 2.x or we build a check for whether $active_processor is a string and thus determine the API version - I lean towards the latter.

Aron Novak’s picture

Project: FeedAPI » Feed Element Mapper
Component: Code feedapi_node » Code

I'd not like to branch. It would be nice to support both mapper branches.

"$active_processor is a string and thus determine the API version " This seems to be problematic.
Let's examine the function signature:
feedapi_node_feedapi_mapper($op, $feed_node, $active_processor, $node = NULL, $feed_element = array()
function node_feedapi_mapper($op, $node, $feed_element = array(), $field_name = '', $sub_field = '') {

Mapper 1.x third argument: $feed_element = array(), but can be a string also: if (is_string($feed_element) || is_numeric($feed_element)) { - from node mapper.
Mapper 2.x third argument: $active_processor, always a string

So just is_string($active_processor) is not helpful.
I move this under feedapi_mapper/mappers/.

Aron Novak’s picture

Status: Needs work » Fixed

I also fixed a tiny bug in the returning stucture.
I replaced

$nids[$item->nid] = $item->feed_nid;

to

$nids[$item->nid][] = $item->feed_nid;

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.