The mapper/content_taxonomy.inc file handles Feeds mapping of multiple comma-separated term names to a CCK taxonomy field target. However, if the mapped string contains duplicate terms, it adds identical terms multiple times to a multi-valued CCK taxonomy field.
For example, 'Asia, America,Asia,    Europe' becomes:
Array
(
[0] => Asia
[1] => America
[2] => Asia
[3] => Europe
)

I don't think there is a use case for adding the same term to a node multiple times, is there? I assumed not and made this little patch, which removes these duplicates when splitting the string. It also splits on a comma with any number of spaces around it.
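For anyone wanting to see the idea without applying the patch, here is a minimal sketch of the splitting and de-duplication logic. It is written in Python for illustration only; the real mapper is PHP, and this is not the patch's code. The function name split_terms is hypothetical, and it assumes term names are compared exactly (case-sensitive).

```python
import re
from typing import List


def split_terms(value: str) -> List[str]:
    """Split a comma-separated string of term names (hypothetical helper).

    Any whitespace around the commas is tolerated, empty pieces are
    dropped, and repeated names are removed while keeping the order in
    which each name first appears. Comparison is exact (case-sensitive).
    """
    # Split on a comma surrounded by optional whitespace.
    parts = re.split(r"\s*,\s*", value.strip())
    seen = set()
    terms = []
    for name in parts:
        if name and name not in seen:  # skip empty pieces and repeats
            seen.add(name)
            terms.append(name)
    return terms


print(split_terms("Asia, America,Asia,    Europe"))
# ['Asia', 'America', 'Europe']
```

Note that this only de-duplicates within a single mapped string; it knows nothing about values already stored in the field.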

I hope this helps someone.


Comments

NealB-1

I ran into the same issue. Thank you.

I think you are eliminating duplicates at the wrong point. If the field already contains a term, your solution does not prevent it from being added again.

NealB-1

I came up with a couple of additional wrinkles:

Here's how I'm using the content_taxonomy mapper. My input data has a series of fields like this:

row    main_type  is_big  is_red  is_heavy
row1   big        TRUE    FALSE   TRUE
row2   heavy      FALSE   TRUE    TRUE

I'm trying to roll all of those boolean fields up into a single taxonomy field. I'm using Feeds Tamper to replace each TRUE with the corresponding term and each FALSE with nothing. The main_type value also goes into the same taxonomy field, which creates duplicates: in row1, both main_type and is_big yield 'big', and in row2, both main_type and is_heavy yield 'heavy'. Duplicates don't make sense here, so they should be eliminated.
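The roll-up can be sketched as follows, in Python for illustration only. It assumes the hypothetical column-to-term mapping in FLAG_TERMS (the actual term names aren't given above), mimics the Tamper step by turning TRUE into the term name and FALSE into an empty string, and then drops empty values and duplicates.

```python
from typing import Dict, List

# Hypothetical mapping from boolean source columns to term names.
FLAG_TERMS = {"is_big": "big", "is_red": "red", "is_heavy": "heavy"}


def roll_up(row: Dict[str, str], flag_terms: Dict[str, str]) -> List[str]:
    """Collect main_type plus one term per TRUE flag, without duplicates."""
    candidates = [row["main_type"]]
    for column, term in flag_terms.items():
        # Mimics the Tamper replacement: TRUE -> term name, FALSE -> nothing.
        candidates.append(term if row[column] == "TRUE" else "")
    result: List[str] = []
    for name in candidates:
        if name and name not in result:  # skip empties and repeats
            result.append(name)
    return result


row1 = {"main_type": "big", "is_big": "TRUE", "is_red": "FALSE", "is_heavy": "TRUE"}
row2 = {"main_type": "heavy", "is_big": "FALSE", "is_red": "TRUE", "is_heavy": "TRUE"}
print(roll_up(row1, FLAG_TERMS))  # ['big', 'heavy']
print(roll_up(row2, FLAG_TERMS))  # ['heavy', 'red']
```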

The second wrinkle involves multigroups. I'm not sure exactly what the status of CCK3 multigroups is, but I am using them for my project. If the content_taxonomy field is part of a multigroup, it can make sense to have duplicates, because each entry is meaningful in the context of its particular row in the multigroup. For instance:

multigroup    relationship  first_name  last_name
multigroup1   brother       Jeb         Bush
multigroup2   brother       Neil        Bush

In this table, relationship would be implemented as a content_taxonomy field. Clearly, duplicates are not a problem here.

So it's not obvious what the behavior with respect to duplicates should be. In most situations, duplicates are probably meaningless and should not be preserved. In multigroups, they make sense. However, it would probably be difficult to build multigroups using Feeds anyway, so perhaps multigroups will never be an issue.

Other thoughts?

NealB-1

Status: Active » Needs review
File size: 2.49 KB

Here is a new patch. There are two changes:

  1. Duplicates are prevented, even if they are from different passes.
  2. I moved the check that enforces the limit on the number of values so that it runs before a term is added. Previously it ran after, which, whatever the original reasoning, clearly doesn't work here.
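The two changes can be illustrated with a small Python sketch; it is not the patch itself. It assumes the field's existing values are a plain list of term names and that max_values is the field's limit, with None standing in for "unlimited" (a hypothetical convention, not how CCK stores the setting). Duplicates are checked against everything already in the field, and the limit is checked before each add.

```python
from typing import List, Optional


def add_terms(field_values: List[str], new_terms: List[str],
              max_values: Optional[int]) -> List[str]:
    """Append new_terms to field_values in place (hypothetical helper).

    max_values=None means no limit. Returns the updated list.
    """
    for name in new_terms:
        # Enforce the limit BEFORE adding, so the field never exceeds it.
        if max_values is not None and len(field_values) >= max_values:
            break
        # Compare against all current values, including those added by
        # earlier passes (e.g. other source columns mapped to this field).
        if name not in field_values:
            field_values.append(name)
    return field_values


field: List[str] = []
add_terms(field, ["Asia", "America"], max_values=3)            # first pass
add_terms(field, ["Asia", "Europe", "Africa"], max_values=3)   # second pass
print(field)  # ['Asia', 'America', 'Europe']
```

Here the second pass skips 'Asia' because the first pass already added it, and stops before 'Africa' because the field has reached its limit of three values.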

I've tested the duplicate removal, but I haven't tested the enforcement of the limit on the number of values.

XiaN Vizjereij

Subscribe

twistor

Status: Needs review » Closed (outdated)