Ideally, Feeds' handling of taxonomy term fields would include an option to import only incoming data that matches existing terms, instead of the free tagging it effectively does now. In the meantime, a Tamper plugin can filter the input down to only the values that match existing terms.

For example, if the imported field data were "My Tag, My Other Tag, My Random Made-Up Tag" and the chosen vocabulary only contained terms named "My Tag" and "My Other Tag," then this tamper plugin would filter the input to "My Tag, My Other Tag."
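The filtering described above can be sketched in plain PHP. This is only an illustration: `filter_to_existing_terms()` is a hypothetical helper, and the hard-coded `$existing` array stands in for the term names a real plugin would load from the chosen vocabulary.

```php
<?php
// Hypothetical helper (not part of Feeds Tamper): keep only the incoming
// values that exactly match an existing term name.
function filter_to_existing_terms(array $incoming, array $existing) {
  // array_intersect() preserves the original keys, so re-index the result.
  return array_values(array_intersect($incoming, $existing));
}

// Split the comma-separated field data and trim surrounding whitespace.
$incoming = array_map('trim', explode(',', 'My Tag, My Other Tag, My Random Made-Up Tag'));

// Assumed contents of the chosen vocabulary.
$existing = array('My Tag', 'My Other Tag');

$filtered = filter_to_existing_terms($incoming, $existing);
echo implode(', ', $filtered), "\n";
```

Note that the comparison is an exact, case-sensitive string match, so "my tag" would not match "My Tag" in this sketch.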


Comments

biwashingtonial’s picture

Assigned: biwashingtonial » Unassigned
Status: Active » Needs review

Here's a first draft. Works in preliminary testing. I wrote it in a hurry, so we can play Where's Waldo with the dumb glitch I might have left in it.

Bastlynn’s picture

Status: Needs review » Needs work

Found a bug with this - having to do with your use of static. +1 for caching, -1 for... caching.
If you have multiple maps going from the same field to different taxonomies, you'll end up mapping all of them against the first one declared.
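To illustrate the kind of bug described here, the sketch below assumes the first draft cached its term list in a single static variable that was not keyed by vocabulary (the patch itself is not shown in this thread, so that is a guess). The function names, vocabularies, and terms are made up, and `load_term_names()` stands in for the database query.

```php
<?php
// Stand-in for the database lookup; the vocabularies and terms are made up.
function load_term_names($vocabulary) {
  $data = array(
    'colors' => array('Red', 'Blue'),
    'sizes' => array('Small', 'Large'),
  );
  return isset($data[$vocabulary]) ? $data[$vocabulary] : array();
}

// Problematic: one shared static is filled on the first call and then
// reused for every vocabulary requested afterwards.
function allowed_terms_shared($vocabulary) {
  static $allowed;
  if (!isset($allowed)) {
    $allowed = load_term_names($vocabulary);
  }
  return $allowed;
}

// Safer: the cache is keyed by vocabulary, so each one gets its own list.
function allowed_terms_keyed($vocabulary) {
  static $allowed = array();
  if (!isset($allowed[$vocabulary])) {
    $allowed[$vocabulary] = load_term_names($vocabulary);
  }
  return $allowed[$vocabulary];
}

allowed_terms_shared('colors');
$shared = allowed_terms_shared('sizes');

allowed_terms_keyed('colors');
$keyed = allowed_terms_keyed('sizes');

echo implode(', ', $shared), "\n"; // Red, Blue (the first vocabulary's terms)
echo implode(', ', $keyed), "\n";  // Small, Large
```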

Bastlynn’s picture

Perhaps this instead?

function feeds_tamper_existing_terms_callback($result, $item_key, $element_key, &$field, $settings) {
  // Cache the allowed term names per vocabulary, so that mappings to
  // different vocabularies do not share one list.
  static $allowed = array();
  if (!isset($allowed[$settings['vocabulary']])) {
    // Fetch all term names in the configured vocabulary.
    $allowed[$settings['vocabulary']] = db_query(
      'SELECT td.name
        FROM {taxonomy_vocabulary} AS v
        INNER JOIN {taxonomy_term_data} AS td ON td.vid = v.vid
        WHERE v.machine_name = :machine_name
        GROUP BY td.name;',
      array(':machine_name' => $settings['vocabulary'])
    )->fetchCol();
  }
  if (is_array($field)) {
    // Multi-valued input: keep only the values that are existing term names.
    $field = array_intersect($field, $allowed[$settings['vocabulary']]);
  }
  else {
    // Single-valued input: blank it out if it is not an existing term name.
    $field = in_array($field, $allowed[$settings['vocabulary']]) ? $field : '';
  }
}

biwashingtonial’s picture

You found Waldo! Thanks. :)

Re-rolling.

David_Rothstein’s picture

Status: Needs work » Needs review

I needed this feature, and the above patch seemed to work well in my limited testing. Moving to "needs review".

The code should probably use drupal_static() rather than static, but that's relatively minor...

twistor’s picture

Status: Needs review » Needs work

Looking good. There are some minor whitespace issues. It will need tests though, and that will be a bit harder than the normal tests.

ChristopheDG’s picture

Would your solution fix the following issue too:

I import a feeds line with a category field "saddles/parts".
I have an existing vocabulary with hierarchical terms:
machinery - parts
saddles - parts

After import and exploding the category field into taxonomy, I find all the saddle parts nodes mapped to the machinery parts.
Apparently the hierarchical term structure isn't taken into account. "Parts" under "Saddles" should first be looked for.

David_Rothstein’s picture

I ran into an issue with the above patch. It has an inconsistency in that for single-valued fields, it replaces non-existent terms with an empty string, but for multi-valued fields, it filters them completely out of the array (rather than replacing them). This can cause problems with multivalued fields being mapped to the wrong place (in particular I ran into this when importing fields contained within a field collection; fields that were supposed to be attached to one field collection item were getting imported into a different one, due to the missing array items).

The attached reroll fixes that by consistently using 0's to represent missing terms in both cases.
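A small standalone sketch of the difference, using made-up values: filtering removes entries from a multi-valued field, while a placeholder keeps one entry per original position. This is not the code from the reroll, only an illustration of the idea behind it.

```php
<?php
// Made-up allowed term names and incoming multi-valued field data.
$allowed = array('Red', 'Blue');
$field = array('Red', 'Purple', 'Blue');

// Filtering: the unmatched value disappears, so the array gets shorter.
$filtered = array_intersect($field, $allowed);

// Placeholder: every position is kept, and unmatched values become 0.
$placeheld = array();
foreach ($field as $delta => $name) {
  $placeheld[$delta] = in_array($name, $allowed, TRUE) ? $name : 0;
}

echo count($filtered), "\n";  // 2
echo count($placeheld), "\n"; // 3
```

With the placeholder approach, the value at position 2 still lines up with whatever else is imported at position 2, which is what matters when several multi-valued sources are expected to stay in step.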

Still "needs work", because per #6 there are no tests.

bibo’s picture

I tested the patch and it works flawlessly. I would say this plugin is rather commonly needed, and I hope it gets into the module soon. It just needs someone to write the tests.

@ChristopheDG: I'd say that's a separate issue. You should probably not have several terms with the same name in the vocabulary (and/or only one parent per term).

acidpotato’s picture

This is a great plugin and exactly what I was looking for, so thank you so much for the work so far. One interesting addition to this plugin would be the ability to identify variations in spelling or presentation in the source and capture them in the target value.

Currently I am importing data from the websites of a couple of wine stores. I am importing into the term field from the "wine title" field. To give an example, the same wine could be named in different ways at different stores: 2009 Stags Leap Cab Sauv Russian River or 2009 Stags Leap Cab Sauv Russian River Valley. So to summarize -

Entity - Wine
Vocabulary - Wine Region
- Term - California
Sources - 2009 Stags Leap Cab Sauv Russian River Valley and 2009 Stags Leap Cab Sauv Russian River

Is it possible to create an input box to allow the source data to have variations that will map to the same terms in the target field? So users can enter in the input box -
Russian River Valley|California
Russian River|California

And both of these values will be mapped to California. This of course would be in addition to source data "California" being mapped to target term California.
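As a rough sketch of how such an input box could be interpreted, the code below parses "source|term" lines into a lookup map and then resolves a value against it. Both `parse_alias_map()` and `resolve_term()` are hypothetical names, not existing plugin functions, and the sketch assumes the incoming value is already just the region text (for example "Russian River"), not the full wine title.

```php
<?php
// Hypothetical: turn "source|term" lines into an alias => term lookup map.
function parse_alias_map($text) {
  $map = array();
  foreach (preg_split('/\R/', $text) as $line) {
    $parts = explode('|', $line, 2);
    if (count($parts) !== 2) {
      // Skip lines without a pipe separator.
      continue;
    }
    $alias = trim($parts[0]);
    $term = trim($parts[1]);
    if ($alias !== '' && $term !== '') {
      $map[$alias] = $term;
    }
  }
  return $map;
}

// Hypothetical: translate an alias, then keep the value only if it is an
// existing term name; otherwise return 0 as a "no term" placeholder.
function resolve_term($value, array $map, array $existing) {
  if (isset($map[$value])) {
    $value = $map[$value];
  }
  return in_array($value, $existing, TRUE) ? $value : 0;
}

$map = parse_alias_map("Russian River Valley|California\nRussian River|California");
$existing = array('California');

var_dump(resolve_term('Russian River', $map, $existing)); // string(10) "California"
var_dump(resolve_term('California', $map, $existing));    // string(10) "California"
var_dump(resolve_term('Napa', $map, $existing));          // int(0)
```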

This feature will allow for additional flexibility for users and wider usage of this plugin, especially when one doesn't have control over source data. I'll buy a drink for anyone who comes up with a solution!

acidpotato’s picture

Ahh... what I described above can be achieved by first using the plugin from #1623560: find replace by List or #1525540: Find & Replace multiple, and then applying the existing taxonomy term plugin. It has worked very well for me so far, so thanks again!

tyler-durden’s picture

I can also confirm the patch in #8 works beautifully. This was my major issue with feeds, and it's solved. THANKS!!

mErilainen’s picture

Works for me also

7wonders’s picture

Number 8 works great. If you use the synonyms module you can use the below for an "existing term OR synonym" version of the same.

/**
 * @file
 * Taxonomy filter only allowing terms with synonyms
 */

$plugin = array(
  'form' => 'feeds_tamper_synonyms_form',
  'callback' => 'feeds_tamper_synonyms_callback',
  'name' => 'Only existing terms/or synonyms',
  'category' => 'Taxonomy',
  'single' => 'direct',
  'multi' => 'direct',
);

function feeds_tamper_synonyms_form($importer, $element_key, $settings) {
  $form = array();
  $options = array();
  foreach (taxonomy_vocabulary_get_names() as $machine_name => $data) {
    $options[$machine_name] = $data->name;
  }
  $form['vocabulary'] = array(
    '#type' => 'select',
    '#title' => t('Vocabulary whose terms are allowed'),
    '#options' => $options,
    '#required' => TRUE,
    '#default_value' => isset($settings['vocabulary']) ? $settings['vocabulary'] : '',
  );
  return $form;
}

function feeds_tamper_synonyms_callback($result, $item_key, $element_key, &$field, $settings) {
  static $allowed = array();
  if (!isset($allowed[$settings['vocabulary']])) {
    // Fetch all term names in the configured vocabulary.
    $allowed[$settings['vocabulary']] = db_query(
      'SELECT td.name
        FROM {taxonomy_vocabulary} AS v
        INNER JOIN {taxonomy_term_data} AS td ON td.vid = v.vid
        WHERE v.machine_name = :machine_name
        GROUP BY td.name;',
      array(':machine_name' => $settings['vocabulary'])
    )->fetchCol();
  }

  // Replace input values that match neither an existing term name nor a
  // synonym with integer 0, so that they will be skipped by the taxonomy
  // field mapping in taxonomy_feeds_set_target().
  $vocabulary = taxonomy_vocabulary_machine_name_load($settings['vocabulary']);
  if (is_array($field)) {
    foreach ($field as &$term) {
      $tid = synonyms_get_term_by_synonym($term, $vocabulary, 0);
      if (!empty($tid)) {
        // A term was found: pass its term ID on to the mapper.
        $term = $tid;
      }
      elseif (!in_array($term, $allowed[$settings['vocabulary']])) {
        $term = 0;
      }
    }
    // Break the reference left behind by the foreach loop.
    unset($term);
  }
  else {
    // Apply the same lookup to single-valued input.
    $tid = synonyms_get_term_by_synonym($field, $vocabulary, 0);
    if (!empty($tid)) {
      $field = $tid;
    }
    elseif (!in_array($field, $allowed[$settings['vocabulary']])) {
      $field = 0;
    }
  }
}

twistor’s picture