I've inherited a site with a very big hierarchical taxonomy:

Vocabulary name: categories
-- term: company name
---- many child terms
-- term: country
---- many child terms
-- term: issue
---- many child terms

I realized it would be easier to create Search facets and Views with appropriate content if I set up each of these parent terms

company
country
issue

as their own Vocabularies.

I created the new Vocabularies and used Taxonomy Manager to shift all the sub-terms to their new vocabularies.

I then updated the Article node content type to be associated with the vocabularies

company
country
issue

Unfortunately I also noticed the sub-term relationship between the Article nodes (all 4,000 of them) and the Terms was now lost in the display.

I've been searching for a way to update the vocabulary references for the Article nodes, but even after looking at the taxonomy module in Drupal 7 I'm still not sure how the vocabulary is associated with the node. I see that new field_data_field_machine-name tables are created for each new Vocabulary, and that this is where the term ID and node/entity ID are associated. But where does the node/vocabulary relationship get established? And how can I update the Vocabulary for a node automatically when I move a term from one vocabulary to another?

I also haven't found this exact question in the many forums I've been looking in. I wasn't sure whether this one is related: http://groups.drupal.org/node/214928. I recently moved this site from Drupal 6 to 7, so it doesn't help that there are a lot of dead tables from D6 cluttering up the database.

So I just would like to know if it's possible to simply do what I want to do?

thank you.

Comments

dawnbuie’s picture

Could anyone tell me at least if this is expected behaviour?

Does no one else need to split up one big hierarchical vocabulary into a few smaller ones without losing all node references? I find it strange no one else has written about this. I see it has been an issue with Term Merge: http://drupal.org/node/1253616

thank you.

garphy’s picture

I think it has to do with the fact that the node -> term relationship is a field in D7.
That field explicitly states which vocabulary the referenced terms belong to.
If terms are moved from one vocabulary to another, there should be:
1. another field on the content type to actually store the new reference
2. some code to actually carry the references over from the old field to the new one.

mh86’s picture

Well, it's difficult to find out in which fields the term might be used, and I don't think there is a generally valid way to deal with it.
The data itself is definitely not lost; there are no changes to the node (or any other entity) - term relationship. Furthermore, a taxonomy term reference field can have multiple vocabularies (although there is no interface for it).
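
To illustrate the point, here is a minimal sketch in plain PHP of how a Drupal 7 term reference field records its allowed vocabularies in the field settings. The field name, the vocabulary names and the helper `example_field_allows_vocabulary()` are hypothetical, chosen only for illustration.

```php
<?php
// Shape of a Drupal 7 taxonomy term reference field definition, as kept
// (serialized) in the field_config table. All names here are made up.
$field = array(
  'field_name' => 'field_categories',
  'type' => 'taxonomy_term_reference',
  'settings' => array(
    'allowed_values' => array(
      array('vocabulary' => 'categories', 'parent' => 0),
    ),
  ),
);

/**
 * Hypothetical helper: does this field accept terms from the vocabulary?
 */
function example_field_allows_vocabulary(array $field, $machine_name) {
  if (empty($field['settings']['allowed_values'])) {
    return FALSE;
  }
  foreach ($field['settings']['allowed_values'] as $allowed) {
    if ($allowed['vocabulary'] === $machine_name) {
      return TRUE;
    }
  }
  return FALSE;
}

// The field data tables only store term IDs. The vocabulary is checked
// against the field settings, so a term moved to "country" no longer
// matches a field that only allows "categories".
var_dump(example_field_allows_vocabulary($field, 'categories'));
var_dump(example_field_allows_vocabulary($field, 'country'));
```

Adding a second entry to `allowed_values` is what "multiple vocabularies" would look like at the data level.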

mh86’s picture

There is a duplicate issue: #1598226: switching vocabs on terms should move terms to different fields
The author there says he has some basic code, so you can ask him as well.

ronan.orb’s picture

I'm not sure if this is the correct topic, but I had a lot of trouble moving terms around. I haven't checked the code base of Taxonomy Manager, but each field keeps the tid in the field data tables and the vocabulary in the field settings. I ended up with a mess of fields that had lost their terms because the vocabulary had changed. I had about 10 taxonomy reference fields.

I ended up writing scripts which tried to fix that.

So be aware! A simple move of a term from one vocabulary to another will mess up your field data.

larskoeie’s picture

This is the not-pretty code I used for solving the problem described at #1598226: switching vocabs on terms should move terms to different fields. It was meant as a proof-of-concept for the customer to see if it solved the issue (which it did). I'm going to rewrite it and move it to hook_taxonomy_manager_term (probably) if the customer wants to buy it.

The code is inserted into taxonomy_manager.admin.inc, function taxonomy_manager_switch, line 1975

<?php
  // $from_voc, $to_voc and $tid are expected to be set by the surrounding
  // code in taxonomy_manager_switch().
  $from_vocabulary = taxonomy_vocabulary_load($from_voc);
  $to_vocabulary = taxonomy_vocabulary_load($to_voc);

  // Find pairs of term reference fields attached to the same bundle, where
  // one field mentions the source vocabulary and the other the destination.
  $fields = db_query('
    SELECT i1.entity_type, i1.bundle, f1.field_name AS field_from, f2.field_name AS field_to, f1.data AS data_from, f2.data AS data_to
      FROM {field_config} AS f1
      LEFT JOIN {field_config_instance} AS i1 ON (f1.field_name = i1.field_name)
      LEFT JOIN {field_config_instance} AS i2 ON (i1.bundle = i2.bundle)
      LEFT JOIN {field_config} AS f2 ON (f2.field_name = i2.field_name)
      WHERE f1.type = \'taxonomy_term_reference\' AND f1.data LIKE :from
        AND f2.type = \'taxonomy_term_reference\' AND f2.data LIKE :to',
    array(
      ':from' => '%"' . db_like($from_vocabulary->machine_name) . '"%',
      ':to' => '%"' . db_like($to_vocabulary->machine_name) . '"%',
    )
  );

  foreach ($fields as $field_obj) {
    $field_from = $field_obj->field_from;
    $field_to = $field_obj->field_to;
    $data_from = unserialize($field_obj->data_from);
    $data_to = unserialize($field_obj->data_to);

    // The LIKE match is loose, so confirm that the source field really
    // allows the source vocabulary.
    $go = FALSE;
    if (is_array($data_from['settings']['allowed_values'])) {
      foreach ($data_from['settings']['allowed_values'] as $delta => $data) {
        $go |= ($data['vocabulary'] == $from_vocabulary->machine_name);
      }
    }
    if (!$go) {
      continue;
    }

    // Same check for the destination field and vocabulary.
    $go = FALSE;
    if (is_array($data_to['settings']['allowed_values'])) {
      foreach ($data_to['settings']['allowed_values'] as $delta => $data) {
        $go |= ($data['vocabulary'] == $to_vocabulary->machine_name);
      }
    }
    if (!$go) {
      continue;
    }

    // Every row of this bundle in the source field table. The query is not
    // filtered by $tid; the filtering happens in PHP below.
    $entities = db_query('
      SELECT entity_type, entity_id, language, delta
        FROM {field_data_' . $field_from . '}
        WHERE bundle = :bundle',
      array(':bundle' => $field_obj->bundle)
    );

    foreach ($entities as $entity) {
      switch ($entity->entity_type) {
        case 'node':
          $ent = node_load($entity->entity_id);
          break;
        case 'user':
          $ent = user_load($entity->entity_id);
          break;
        default:
          // Only nodes and users are handled; skip other entity types.
          continue 2;
      }

      // Remove the term from the from-field.
      $temp = $ent->$field_from;
      foreach ($temp as $lang => $values) {
        foreach ($values as $delta => $v) {
          if ($v['tid'] == $tid) {
            unset($temp[$lang][$delta]);
          }
        }
      }
      $ent->$field_from = $temp;

      // Add the term to the to-field, unless it is already there.
      $temp = $ent->$field_to;
      $exists = FALSE;
      foreach ($temp as $lang => $values) {
        foreach ($values as $delta => $v) {
          $exists |= ($v['tid'] == $tid);
        }
      }
      if (!$exists) {
        $temp[$entity->language][] = array('tid' => $tid);
        $ent->$field_to = $temp;
      }

      switch ($entity->entity_type) {
        case 'node':
          node_save($ent);
          break;
        case 'user':
          user_save($ent);
          break;
      }
    }
  }
} // Unmatched in this excerpt: closes a block opened in the surrounding function.
?>
David Lesieur’s picture

Status: Active » Needs review
Attachment: new file, 5.48 KB

Here's an attempt at a generic solution. It requires the Entity API module.

  • Patch supports any entity type that has taxonomy term reference fields affected by the term move. This is not limited to nodes and users.
  • Patch supports all languages (if any) in the affected entities.
  • If an affected entity supports revisions, only the default revision gets modified. Older revisions are unaffected.

Basically, the logic goes like this:

  1. Identify the [source and destination] fields affected by the term move.
  2. For each moved term, identify entities that are referring to the term.
  3. For each found entity, remove term from source field, and add to destination field.

A source field is any term reference field that is configured for allowing terms from the source vocabulary. Similarly, a destination field is any term reference field that is configured for allowing terms from the destination vocabulary.

An entity could have multiple source fields for the same source vocabulary. In that case, the terms get removed from all source fields where they appear. Similarly, an entity could have multiple destination fields for the same destination vocabulary. In that case, the terms get added to all the destination fields. It is probably rare to encounter any of those cases in real life.

It is also possible that an entity has no fields matching the destination vocabulary. In that case, the terms get deleted from their source field(s) and, therefore, the entity loses its references to those terms. I guess Taxonomy Manager could show some warnings saying that moving terms across vocabularies should be done with care, only when all entities have fields for the destination vocabulary!
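
The logic above can be sketched in plain PHP, with an entity modelled as a nested array (field name => language => list of `array('tid' => N)`). This is not the patch itself; the function `example_move_term_refs()` and the field names are hypothetical, and loading or saving real entities is left out.

```php
<?php
/**
 * Hypothetical sketch: move references to $tids from the source fields to
 * the destination fields of one entity, modelled as a plain array.
 */
function example_move_term_refs(array $entity, array $source_fields, array $dest_fields, array $tids) {
  foreach ($source_fields as $source) {
    if (empty($entity[$source])) {
      continue;
    }
    foreach ($entity[$source] as $langcode => $items) {
      foreach ($items as $delta => $item) {
        if (!in_array($item['tid'], $tids)) {
          continue;
        }
        // Remove the term from the source field.
        unset($entity[$source][$langcode][$delta]);
        // Add it to every destination field, avoiding duplicates.
        foreach ($dest_fields as $dest) {
          $existing = array();
          if (!empty($entity[$dest][$langcode])) {
            foreach ($entity[$dest][$langcode] as $dest_item) {
              $existing[] = $dest_item['tid'];
            }
          }
          if (!in_array($item['tid'], $existing)) {
            $entity[$dest][$langcode][] = array('tid' => $item['tid']);
          }
        }
      }
      // Re-index the deltas after removals.
      $entity[$source][$langcode] = array_values($entity[$source][$langcode]);
    }
  }
  return $entity;
}

// Example: term 5 moves from "categories" to "country"; term 9 stays put.
$node = array(
  'field_categories' => array('und' => array(array('tid' => 5), array('tid' => 9))),
  'field_country' => array('und' => array()),
);
$node = example_move_term_refs($node, array('field_categories'), array('field_country'), array(5));
```

Passing an empty list of destination fields reproduces the data-loss case described above: the term is removed from the source field and added nowhere.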

A shortcoming of the patch is that it could exceed the maximum execution time if thousands of nodes are affected by a term move, since saving an entity is a costly operation. Perhaps we should look into batching those operations?
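
As a rough idea of what batching could look like, the sketch below splits the affected entity IDs into chunks and builds a definition in the shape Drupal 7's `batch_set()` accepts. The function names, the chunk size and the title are assumptions for illustration; the actual entity processing is only hinted at in a comment.

```php
<?php
/**
 * Hypothetical helper: build a Batch API definition that processes the
 * affected entities in chunks instead of in a single request.
 */
function example_term_move_batch(array $entity_ids, $chunk_size = 50) {
  $operations = array();
  foreach (array_chunk($entity_ids, $chunk_size) as $chunk) {
    // Each operation is array(callback name, arguments).
    $operations[] = array('example_term_move_process_chunk', array($chunk));
  }
  return array(
    // Inside Drupal the title would normally go through t().
    'title' => 'Updating term references',
    'operations' => $operations,
  );
}

/**
 * Hypothetical batch operation; the Batch API passes $context by reference.
 */
function example_term_move_process_chunk(array $chunk, &$context) {
  foreach ($chunk as $entity_id) {
    // Load the entity, move its term references and save it here.
    $context['results'][] = $entity_id;
  }
}

// Inside Drupal the result would be handed to batch_set(); here it is
// only built so the structure can be inspected.
$batch = example_term_move_batch(range(1, 120), 50);
```

Each chunk runs in its own request, which keeps any single request under the execution time limit at the cost of a longer overall run.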

David Lesieur’s picture

The previous patch was missing descendant terms when moving a term with children. Fixed.

ñull’s picture

I tried the patch in #8, which indeed keeps the relationship. I created the field before using taxonomy_manager to move the tags to the other vocabulary. I didn't try what happens when no field has been created beforehand. Will this patch take care of that, or will it warn that a field needs to be created? I mean, it should be perfectly safe. I don't think the intention of this patch is to add a feature for deleting tags, because that functionality is already available in the manager. Therefore best would be when it refuses the move when no destination field is available.

The only other problem I see is that the entities are in fact all updated in the process. This is not only slow, but it also changes the update date. Although technically this might be correct, seen from the taxonomy point of view there was no change in the node itself. The same tag applies to the node; only the ancestry of the tag changed. A change in the tag ancestry should not affect the entity date, in my opinion. It is a big problem for content moderation, where the change date is crucial. In this I see the disadvantage of the reference fields in Drupal 7, and I think that fixing that would cost too much.

David Lesieur’s picture

Thanks for the tests and review!

"Therefore best would be when it refuses the move when no destination field is available."

I agree, silently losing some data is not so satisfactory.

I have not checked yet if Taxonomy Manager has any mechanism at the moment for refusing a term move. We'd need such a mechanism to also give the user some feedback about why the operation was refused.

"The only other problem I see is that the entities are in fact all updated in the process."

The impact of the date change on a great number of entities is a valid issue. Unfortunately I see no way around this problem except ugly hacks that I'd rather avoid.

Approaches like #6 (using direct SQL queries) would not exhibit this problem, but I think bypassing the field and entity APIs is not the most maintainable way to implement the feature.

Unless a better idea comes up, I'd stick with the current behavior, although for it to really be technically correct, it should probably create a new revision on entity types that are revision-enabled. Also, documentation and/or UI should let users know that moving a term across vocabularies is not an operation to take lightly.

dwaine’s picture

Many thanks for your work.

The patch in #8 works on my installation.

I found that a contributed module (Link Checker) needed to be disabled while Taxonomy Manager is in use.

I also verified that thousands of nodes (about 8k) combined with multiple term relationships (6 vocabularies are referenced in my content type) will cause a process that exceeds the PHP timeout.

ñull’s picture

"Approaches like #6 (using direct SQL queries) would not exhibit this problem"

While #6 does use queries, I read in the code that in the end it does an entity save, which will affect the updated date too. I agree that API calls are the cleanest solution and the only way to go.

dwaine’s picture

The patch in #8 works for my installation. Unfortunately it times out as there are many DB reads and writes when the number of nodes is high or the number of term associations is high.

By utilizing a drush script I avoided the timeout issue. Unfortunately the time to move all terms and references for my use case exceeded 24 hours. I believe the high overhead of utilizing entity_save() for each term reference change was the root cause of the long execution time.

The next step was to write a drush function that avoided most of the DB abstraction layers and talked more directly to the DB. This code is not production quality. I have not tested this code outside one local development environment. If you choose to use this code against your data, please back up your data first.

This script offers three drush functions:

taxman-tid utilizes functions included in the Taxonomy Manager module and moves one term from one vocabulary to another. This function was more for testing than functionality due to its 'one term at a time' limit.

taxman-vid utilizes functions included in the Taxonomy Manager module. This is the safe (and slow) script for moving all terms from one vocabulary into another vocabulary. It moves all top level terms to the top level of the target vocabulary. The child terms follow their parents. If the version of Taxonomy Manager is patched via the patch in comment #8 above then term references will be updated. (This assumes the node contains a term reference field for the target vocabulary. This term reference needs to be created before running the drush script.)

taxman-vid-sql offers the same functionality as taxman-vid. However, it gets the work done in a faster and riskier manner by manipulating the DB tables more directly. A 'drush cc all' might be a good idea after the script finishes.

Remember, this code is not production quality.

dwaine’s picture

Attachment: new file, 8.44 KB

Attached file will need to be named 'taxman.drush.inc' when placed in your Drupal environment.

ñull’s picture

I somehow found this post: Saving node's fields without saving the node itself

By using field_attach_update('node', $node); instead of node_save we might solve the time stamp problem and the speed problem at the same time. If I have the time I'll give it a try.
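
A minimal sketch of that idea, assuming a bootstrapped Drupal 7 site: the helper name `example_save_node_fields()` is hypothetical, while `field_attach_presave()`, `field_attach_update()` and `entity_get_controller()` are Drupal 7 core functions. It is untested against this patch, and since it bypasses node_save(), hooks such as hook_node_update() would not fire.

```php
<?php
/**
 * Hypothetical helper: persist only the field values of a node, skipping
 * node_save() so that the node's "changed" timestamp is left alone.
 */
function example_save_node_fields($node) {
  // Let field types prepare their values, as node_save() would.
  field_attach_presave('node', $node);
  // Write the field tables without touching the node table.
  field_attach_update('node', $node);
  // Drop the cached copy so later node_load() calls see the new values.
  entity_get_controller('node')->resetCache(array($node->nid));
}
```

The caller would load the node, change the term reference fields on it, and then pass it to this helper instead of node_save().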

le72’s picture

Any update?

dillix’s picture

I'm also still waiting for an update.

cimo75’s picture

The patch works, but I haven't tried yet to move 1000s of nodes.
S.

dillix’s picture

I moved over 1k nodes with this patch!