FEMP 2 brings the possiblity to define the uniqueness of a mapping target. I. e. when a mapping target is unique, the implementer of the mapping target will make sure that only one mapping target with a given value exists in the target system.

Right now, the implementation is flawed: it assumes that any target can be unique, which is not the case. FeedAPI and FeedAPI mapper don't know anything about the mapping targets and it is the active processor's responsibility to heed the 'unique' flag.

We should introduce a $op 'unique supported' and 'unique'. Mappers that return TRUE on 'unique supported' are being called with $op = 'unique' when they've been picked. On $op = 'unique' they need to return TRUE if the item is unique, FALSE if not. (Alternatively, we could merge the 'unique supported' flag into the return value of 'list'. Not sure about that though.)

Further, this change will require a change to the mapping UI. If some mapping targets support being unique, others don't we'll have to change setting the unique flag when adding a mapping to setting the unique flag in a separate step. Only like that we can show a unique checkbox or not depending on its availability.

I. e.: instead of

a) picking a mapping, checking unique or not and then
b) adding a mapping to the mapping set,

you would rather

a) pick a mapping,
b) add it to the mapping set and then you would see a checkbox for uniqueness depending on whether the mapping target you chose supports 'unique' or not.
c) You can then check the box and hit save to save the mapping.

Thinking about this more, the 'Save' button on this form will cause some confusion, as people will inevitably think that the entire mapping won't have effect until they hit save, not just the unique flags. We should think about meeting this expectation by creating mappings in an edit cache and saving them to feedapi_mapper when the user hits 'save'.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

alex_b’s picture

Priority: Normal » Critical

This is critical for the first release of FEMP 2.

Aron Novak’s picture

Assigned: Unassigned » Aron Novak

Things to do here:

Add a feedapi_mapper_unique() function to feedapi_mapper.module. This function should utilize feedapi_mapper_get_uniques() to retrieve the list of user-configured unique fields, then call the mappers. In the end, the function should return the array of item IDs (the duplicated items's IDs, for example a bunch of node IDs). If that array is empty, the item is unique.

At the mappers:
They should implement hook_feedapi_mapper() $op = 'unique supported' and $op = 'unique'. They should use feedapi_get_settings() do know if they should do site-wide or feed-wide deduping. 'unique supported' means that field can be marked to be unique, with $op = 'unique', the hook should return the array of item IDs (the duplicated items's IDs, for example a bunch of node IDs).

At the UI:

a) pick a mapping,
b) add it to the mapping set and then you would see a checkbox for uniqueness depending on whether the mapping target you chose supports 'unique' or not.
c) You can then check the box and hit save to save the mapping.

- just as described at the initial post

alex_b’s picture

"They should use feedapi_get_settings() do know if they should do site-wide or feed-wide deduping." - I would add this later. At first we can assume that site wide deduping is required. If this is not a performance problem, we should leave it that way for simplicity's sake.

Aron Novak’s picture

Status: Active » Needs review
FileSize
5.93 KB
alex_b’s picture

Status: Needs review » Needs work

#4: Awesome. Read the patch, here are my observations:

A

feedapi_mapper_unique_toggle() will need to use a token to avoid XSS attacks (I know, minor, but still). We should use FAPI for toggling (i. e. make the toggle link a form submission).

B

If there are no uniques in feedapi_mapper_unique() array_pop() will be executed on an empty array and return NULL.

C

feedapi_mapper_unique() needs better documentation: what are the parameters, what return values to expect, when is this function being used?

Aron Novak’s picture

#5: alright, valid issues. new patch is attached.

Aron Novak’s picture

Status: Needs work » Needs review
alex_b’s picture

Status: Needs review » Needs work

2 problems:

A For effective cross feed deduping we need feed_nids in the array. We've talked about this on IM and found that a mapper that implements 'unique' should return a value in the format array([id] => array([feed_id], [feed_id], [feed_id]).

B The function merges all results into one array. This points to a bigger issue where FEMP does not know which mappers react on which processors - I've opened an issue here #541814: Breaks when enabling multiple processors - once this is in, we should add a $processor flag to the function signature of feedapi_mapper_unique() that indicates which mappers to query.

Aron Novak’s picture

Status: Needs work » Needs review
FileSize
9.14 KB

Here is a new patch, covers all the thoughts from #8.

I preserved that we want AND logic between the fields, but now it won't cause any problems, because of the processor-specific changes.
Also i changed some of the tests to provide mappings by default to fill title and body, i assume this is not chating at the tests :)
All tests are okay.

Aron Novak’s picture

Aron Novak’s picture

Status: Needs review » Needs work

A tiny improvement is still needed:
call_user_func_array('array_intersect_key', $fields_ids);
This is not appropiate, the array of feed ids should be handled somehow. Needs some concepting how to do it right.

Aron Novak’s picture

Status: Needs work » Needs review
FileSize
8.94 KB

After lots of thinking, i think this is not a task for FEMP. We decided that cross-feed deduping and simple deduping is not something that should be handled here. I added one more layer to the returning array, the serial numbered array is about the fields.
Depending patches need to be updated.

Aron Novak’s picture

Improved version of #12
The result can be a numeric value (or any other kind) as well for some kind of simplier processors, so the return value of the mappers at $op = 'unique' is valid if it's not empty().
Also feedapi_mapper_unique() returns FALSE to indicate that the user did not configure any unique fields, this means that feedapi's built-in logic should be used.

alex_b’s picture

Great.

Springing to the eye:

Array structure:
+ *   array(
+ *    [0] => array(itemid1 => array(feedid1, feedid2))
+ *    [1] => array(itemid2 => array(feedid1, feedid2))

this is not correct anymore.

Otherwise looking very good. Need to review further.

Aron Novak’s picture

#14: Yep, good catch, this is only true in the case of node processor.

alex_b’s picture

Status: Needs review » Needs work

More nitpicking:

1 $url_toggle_unique should be $toggle_unique_path, $url_delete $delete_path - it's a 'path' and the natural order of words makes the variable easier to read. Further it's consistent with $feed_path
2 Don't use the ?: ternary form if the statement doesn't fit neatly on a line
3 There is an indentation problem around "if ($unique_supported == TRUE) {"
4 $unique = 'N/A'; - shouldn't it be t('N/A') ?
5 feedapi_mapper_unique_toggle() should be feedapi_mapper_toggle_unique()
6 "Collects the ids of the duplicated items of $item. Utilized from FeedAPI processors." should be "Collects ids of duplicate items of $item. Most commonly used by FeedAPI processors to determine whether given item is unique."
7 "Feed item to decide about." -> "Feed item to decide on."
8 @return of feedapi_mapper_unique() should be:


@return
  An array containing duplicate items per unique elements, FALSE if no unique elements were defined. The exact format of the duplicate items is up to the implementer of feedapi_mapper($op = 'unique'):

array(
  [0] => array(/* duplicates for unique element 1, e. g. URL - format up to implementer */),
  [1] => array(/* duplicates for unique element 2, e. g. GUID - format up to implementer */),
  // ...
  )

For instance, feedapi_node processor expects arrays where the key is the item's node id and the value is an array of feeds the feed item is associated with:

array(
  [0] => array(itemid1 => array(feedid1, feedid2)),
  [1] => array(itemid2 => array(feedid1, feedid2)),
  )

9 Let's use better names for $fields_ids and $dup_ids: $result and $duplicates
10 There is an indentation problem with both patches to .test files

I really love how neat feedapi_mapper_unique() is now.

Aron Novak’s picture

2 Don't use the ?: ternary form if the statement doesn't fit neatly on a line - in which line? FeedAPI Mapper does this elsewhere anyway :)
Others seems to be fixed.
Tests passed.

alex_b’s picture

"2 Don't use the ?: ternary form if the statement doesn't fit neatly on a line - in which line? FeedAPI Mapper does this elsewhere anyway :)" - we should fix that...

Aron Novak’s picture

please, which line?
I know that we should fix that, that's why i did not modify the status of the patch.

Aron Novak’s picture

Status: Needs work » Needs review
FileSize
9.59 KB

Fixed.
Also there was a non-related snippet in #17 about the date mapper, removed.

alex_b’s picture

Status: Needs review » Reviewed & tested by the community

Beautiful. RTBC.

Aron Novak’s picture

Status: Reviewed & tested by the community » Fixed
Aron Novak’s picture

Status: Fixed » Closed (fixed)