This is a follow up from #350409: Checking for uniqueness via a CCK field which stores an id .

Some parsers like CSV parser don't detect URLs and they create GUIDs from random patterns (Parser CSV md5's a cvs file's row).

If there was a mapper for GUID and URL this problem would go away. Users could define which element of the feed item contains GUID and/or URL and thus define how to check for the uniqueness of an item.

Comments

burgs’s picture

This would be easy if guid came through as $field_name into the hook_feedapi_mapper function.

Anyone got any ideas of how to fix this?

tauno’s picture

You can update the guid in $node->feedapi->feed_item->options->guid, but it looks like the unique check has already been made by the time the feed element mappers do their work. See Step 4 of _feedapi_invoke_refresh() in feedapi/feedapi.module.

alex_b’s picture

Category: task » feature

That's true. This feature request goes very deep into FeedAPI mapper's architecture... My initial post wasn't very clear on that. Properly flagging this as a feature request.

alex_b’s picture

Title: Create a URL/GUID mapper » Mapper for FeedAPI URL/GUID
alex_b’s picture

Version: 6.x-1.x-dev » 6.x-2.x-dev

This feature will go hand in hand with #539752: Support 'unique' conditions: for full integration with feedapi_node we will need this mapper, and for implementing unique checking right, we will need #539752: Support 'unique' conditions.

Bumping this to 2.x.

aron novak’s picture

I see two possible ways here to implement this feature:
1) The mapper remains node-dependent as the feedapi_mapper_node.inc is node-dependent. Then manipulating the node object is fine. As tauno wrote, $node->feedapi->feed_item->options->guid is the place where the guid can be put. However if i consider other processors (like FeedAPI Data), it's not such a good idea.
2) Manipulating the $feed_element variable. This is a totally new approach, until now, FeedAPI Mapper only did feed_element -> node mapping. What i propose here is a feed_element -> feed_element mapping, for example foo1 field of the feed element will be copied to the guid of the feed element. In this case this mapper can be node-independent. This would mean big changes in the architecture.

At first attempt, i'd say 1) is fine, because with the unique conditions feature, guid is not really a critical value anymore, because the mapper can override it.

aron novak’s picture

StatusFileSize
new1.1 KB

Patch for approach 1), see #6. Basically this works when mapping to a node object.

aron novak’s picture

Status: Active » Needs review

The patch needs review, the ideas in #6 needs to be discussed.

alex_b’s picture

Status: Needs review » Needs work

#6: I am not sure whether I understand the practical advantages of 2).

#7: url is options->original_url, not url. I'm getting bitten by this one all the time, too.

aron novak’s picture

Status: Needs work » Needs review
StatusFileSize
new1.58 KB

#9: yeah, original_url. It's really sick that the db schema and the form name convention is different in FeedAPI :(
I corrected that one and also i updated the mapper according to #539752: Support 'unique' conditions

alex_b’s picture

Project: Feed Element Mapper » FeedAPI
Version: 6.x-2.x-dev » 6.x-1.x-dev
Component: Code » Code feedapi_node
Status: Needs review » Needs work

O, on a second read, this should be actually part of feedapi_node module - sorry, should have seen this when reviewing #7.

Once this is refactored as a patch to feedapi_node, we should be good to go.

We should be able to commit this without #536666: FeedAPI Mapper 2 compatibility.

aron novak’s picture

Ahha, this needs a little tweak in feedapi_mapper.module, as i see. Take a look on #541526: Ability to use mappers from other modules
The new patch will land soon.

aron novak’s picture

Status: Needs work » Needs review
StatusFileSize
new1.57 KB
alex_b’s picture

Status: Needs review » Needs work

Will need work once #539752-8: Support 'unique' conditions is fixed - can we move this into the .module file so that we don't need custom loaders like #541526: Ability to use mappers from other modules. ?

aron novak’s picture

Status: Needs work » Needs review
StatusFileSize
new2.04 KB
alex_b’s picture

Status: Needs review » Needs work
+  // Only map nodes.
+  if (!$node_type = feedapi_mapper_get_item_node_type($feed_node)) {
+    return;
+  }

These lines should go away after [##541814] is committed.

alex_b’s picture

Please adjust these comments:

// Currently only does site-wide deduping
// Check for any duplicates, decide in _feedapi_node_save() whether to look for in-feed duplicates or any duplicates.

// Both fields support unique-ness
// Both fields support testing for unique items.

aron novak’s picture

Status: Needs work » Needs review
StatusFileSize
new2.06 KB

#541814: Breaks when enabling multiple processors is committed, so i updated the patch as well.

aron novak’s picture

StatusFileSize
new2.06 KB

Fixed a bug in #18, the array is keyed by the item ids, not the feed ids.

alex_b’s picture

Status: Needs review » Needs work

Ouch - if we commit this we lock feedapi_node into feedapi_mapper 2.x.

We have 3 options: either we repatch this to add it to feedapi_mapper 2.x or we branch for feedapi 2.x or we build a check for whether $active_processor is a string and thus determine the API version - I lean towards the latter.

aron novak’s picture

Project: FeedAPI » Feed Element Mapper
Component: Code feedapi_node » Code

I'd not like to branch. It would be nice to support both mapper branches.

"$active_processor is a string and thus determine the API version " This seems to be problematic.
Let's examine the function signature:
feedapi_node_feedapi_mapper($op, $feed_node, $active_processor, $node = NULL, $feed_element = array()
function node_feedapi_mapper($op, $node, $feed_element = array(), $field_name = '', $sub_field = '') {

Mapper 1.x third argument: $feed_element = array(), but can be a string also: if (is_string($feed_element) || is_numeric($feed_element)) { - from node mapper.
Mapper 2.x third argument: $active_processor, always a string

So just is_string($active_processor) is not helpful.
I move this under feedapi_mapper/mappers/.

aron novak’s picture

Status: Needs work » Fixed

I also fixed a tiny bug in the returning stucture.
I replaced

$nids[$item->nid] = $item->feed_nid;

to

$nids[$item->nid][] = $item->feed_nid;

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.