We're using migrate to move data from a D5 site to D7. There was substantial use of node revisions and those would need to stay intact. I've looked through all the examples and also external resources, but couldn't really find an applicable example.

I know migrate is not targeted at Drupal->Drupal migrations, but the legacy site didn't make use of CCK etc and has data fairly well contained in a few custom tables. I'd also think that even with other non-Drupal data sources there often could be a requirement to fit data into the node revision model.

Any pointers are really appreciated.

#24 noderevision-1298724-24.patch (4.65 KB, asherry): SimpleTest on MySQL passed, 148 passes
#14 noderevision-1298724-14.patch (4.29 KB, cthos): SimpleTest on MySQL passed, 148 passes
#13 noderevision-1298724-13.patch (3.68 KB, cthos): SimpleTest on MySQL passed, 148 passes


moshe weitzman’s picture

AFAIK, there is no support for migrating old revisions in migrate. You could patch our entity/node destination handler to support that, or write a new handler which does this. This is going to be needed for #1052692: New import API for major version upgrades (was migrate module), so I'd love others to chime in with tips about how this could be accomplished.

mcarbone’s picture

Subscribe. This is something that has been requested for a custom CMS -> Drupal migration I am working on, so it has a bigger use case than just Drupal -> Drupal migrations.

mikeryan’s picture

Category: support » feature
berenddeboer’s picture

Argggh, gotta do this too. As this must be done, I'll post some notes on what I did to get something.

mikeryan’s picture

Component: Documentation » Code

It's an interesting problem - as I said today in another issue, the Migration class is fundamentally based on a 1-to-1 correspondence between source rows and destination rows, so having a node migration import from multiple revisions into multiple revisions is unnatural. It suggests to me that revisions should have their own destination handler, derived from the existing MigrateDestinationNode. It would add 'vid' to the fields(), set 'revision' appropriately... I don't have time to try it now - could it really be that simple? Probably not, but that's where I would start...

darrenmothersele’s picture

I'm just starting to look at a migration that needs revision history. I'd be interested to hear if anyone made any progress on this, and I'll post details of my efforts.

mikeryan’s picture

Version: 7.x-2.2 » 7.x-2.x-dev

I haven't come back to it, and it's not on my near horizon. I'm certainly open to a patch, though.

pixlkat’s picture

I also needed to do this as I had an old database I was trying to import into Drupal. I created two content types, one of which was a container for the other one (I might have done this differently, but this is how it worked out). The child nodes have multiple revisions per node. I set systemOfRecord = Migration::DESTINATION, set revision=1 in the field mapping, and mapped the date from my revisions table to $node->changed. In my case, there is only one person who was doing data entry on the old system, so I didn't care about the revision_uid, but I assume you could map that as well.

I got it to work after I made some changes to my migration classes and one change to MigrateDestinationNode.

In node.inc, the following code appears in the import() method:

    if (isset($node->changed)) {
      $changed = MigrationBase::timestamp($node->changed);
    }

In my case, this didn't quite work as this is called before $this->prepare($node, $row) which is where my new value for changed is set. I added the above call (wrapped in a test to see if systemOfRecord == Migration::DESTINATION -- does this matter?) after the $this->prepare($node, $row) line. I got the correct revision dates then; apparently changed is not set prior to that so it used the current date/time.

My data had the revisions in a separate table, with both an autoincrement primary key and a specific revision_id field which was the id per parent id. I used a combination of the id of the parent and this revision_id as my source key. Once I did this I got all 2593 revisions appropriately attached to my 338 nodes.
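
A composite source key of that kind can be written as a two-column key schema. The following is a minimal sketch, not pixlkat's actual code: the column names `parent_id` and `revision_id` and the helper `example_source_key_for()` are assumptions made for illustration. In Migrate 2 an array shaped like `$source_key` is what gets passed as the source-key argument of `MigrateSQLMap`.

```php
<?php
// Hypothetical legacy columns: parent_id identifies the legacy record,
// revision_id numbers that record's revisions (restarting for each parent).
$source_key = array(
  'parent_id' => array(
    'type' => 'int',
    'unsigned' => TRUE,
    'not null' => TRUE,
    'description' => 'ID of the parent record in the legacy table',
  ),
  'revision_id' => array(
    'type' => 'int',
    'unsigned' => TRUE,
    'not null' => TRUE,
    'description' => 'Per-parent revision number in the legacy table',
  ),
);

/**
 * Extract only the key columns from a source row (illustrative helper).
 */
function example_source_key_for(array $row, array $source_key) {
  // Keeps the columns named in the key schema and drops everything else.
  return array_intersect_key($row, $source_key);
}
```

Neither column is unique on its own here; only the pair identifies one revision, which is why both columns go into the key.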

Thanks for this module.

darrenmothersele’s picture

I got this working, I'll post my code when I get a chance later.

I basically created a new Destination handler (called Revision, subclassed from Node) and defined a new key schema on vid. I then create a node migration that imports all nodes at first revision, then a revision migration that imports all revisions. It worked quite nicely on moving Atrium content to Drupal 7.
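
The two-pass split described here (first revision per node, then everything else) can be pictured with plain PHP arrays. This is only a sketch of the selection logic under assumed row shapes (`nid` and `vid` keys); it is not Darren's code, and `example_split_revisions()` is a hypothetical name. In a real migration the same split would be made in the source queries of the two migrations.

```php
<?php
/**
 * Split source revision rows into the two passes.
 *
 * @param array $rows
 *   Each row is an array with 'nid' and 'vid' keys.
 *
 * @return array
 *   'nodes' holds the earliest revision of each nid (node migration);
 *   'revisions' holds every later revision (revision migration).
 */
function example_split_revisions(array $rows) {
  // Sort by nid, then vid, so the first row seen for a nid is its earliest.
  usort($rows, function ($a, $b) {
    if ($a['nid'] != $b['nid']) {
      return $a['nid'] < $b['nid'] ? -1 : 1;
    }
    if ($a['vid'] == $b['vid']) {
      return 0;
    }
    return $a['vid'] < $b['vid'] ? -1 : 1;
  });

  $passes = array('nodes' => array(), 'revisions' => array());
  $seen = array();
  foreach ($rows as $row) {
    if (!isset($seen[$row['nid']])) {
      // First (lowest-vid) row for this node.
      $seen[$row['nid']] = TRUE;
      $passes['nodes'][] = $row;
    }
    else {
      $passes['revisions'][] = $row;
    }
  }
  return $passes;
}
```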

mikeryan’s picture

I'm looking forward to seeing your work, I expect to have a need for revision migration in the near future...


arscan’s picture

Hi Darren -- did you publish that code somewhere? It would save me the trouble of writing it myself because I'd like to keep revisions as well.


darrenmothersele’s picture

Sorry, took a while for me to get to this, but I have posted my revision migration code here.
It needs some work, as it throws a few warning messages, but I'm using it successfully to migrate all nodes and revisions from Atrium to Drupal 7.

cthos’s picture

Status: Active » Needs review
Attached: noderevision-1298724-13.patch (3.68 KB). SimpleTest on MySQL passed, 148 passes.

I've taken the work @darrenmothersele has done here, and modified it a bit to work into a patch. The destids that were getting saved into the map table weren't actually revisions, so rollback would not have functioned properly before.

Update isn't likely to work too well since there's not a good way to update node revisions via the API.

Finally, I added a property check on $row to tell MigrateDestinationNode not to compare destination ids to the incoming nid, since with revisions they're not going to match what's in the map table.

cthos’s picture

Attached: noderevision-1298724-14.patch (4.29 KB). SimpleTest on MySQL passed, 148 passes.

After chatting with @mikeryan a bit, moved the property check from $row to be a class level parameter.

Ryan Weal’s picture

Wow, impressive that #14 mostly works out of the box! I applied it and then ran my node migration. It brought the body field as expected and it brought in my taxonomy field as well... but only for new versions since the last full import.

I have extended migrate_d2d classes in code so it only brought the latest version of the node... then I went to the D6 site and modified some data. Re-running the import (without rollback) picked up the *new* revisions.

So I must now override the default query to get the oldest rather than the latest revision, and let the new handler do what it should to get the revisions since that revision. I'll be back with more feedback as I get more test results.

Ryan Weal’s picture

My dbtng query overrides are a bit rough so I just went and overrode the migrate_d2d/d6/node.inc file directly, changing the joins to node_revision to be n.nid = nr.nid both for the main query and for the subfield (CCK) query.

The results are pretty good... I did an initial import that had some problems as I forgot to update the CCK query. I am pretty sure I rolled back, but now my import is REALLY slow and it seems to be updating all the revisions from the last run. There were only 71 nodes in this group but it is likely going to take a full hour to update each revision (there are a lot).

Tomorrow I'll look into the rollbacks and see if they are working properly or not. I tried using an old snapshot (so a clean install) but it is still a slow process, 257732 "items" for 71 nodes!

Ryan Weal’s picture

Also: I'm getting major timeout errors now every ~10 nodes. I have to run in the UI for this project and this is the first thing that has caused timeouts. I'm calling it a day!

Ryan Weal’s picture

I have it working! Though I will need to revisit it so that I can clean up the code.

Essentially what I have set up is the following:

A node migration, set to use the original version of the node. I modified migrate_d2d (the source of my node migrations) with the following:

    $query->innerJoin('node_revisions', 'nr', 'n.nid=nr.nid'); // modified
    $query->groupBy('nr.nid'); // new
    $query->addExpression('MIN(nr.vid)', 'vid'); // new

Then I created a new migration for the content type's revisions. I did get it working using a heavily modified version of the example in the blog post... but word to the wise: that blog post uses the old API. So you may be better off writing a new migration from scratch.

In particular, the api updates to that example would require:

  • Removing parameters from the constructors, using (array $arguments) and ($arguments) for the parent call
  • Removing the ambiguity with the vid field, and adding an alias in the mapping
  • Adding a source query with map_joinable FALSE
  • Abstract. I'm not abstract enough to deal with this, so just a normal class for now.

... in getting this far I realize I may have taken a completely different approach than what was intended here... but I'm seeing sample data and things are running smoothly.

Ryan Weal’s picture

The handler class is working quite well. I was able to augment my migrate_d2d queries by using the following snippet of code in my node migrations:

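    protected function query() {
      // Get the default query from the parent migrate_d2d class.
      $query = parent::query();

      // Unset the parent's join to the node_revisions table
      // (assumes the parent query uses the alias 'nr').
      $tables =& $query->getTables(); // chx says this is the way
      unset($tables['nr']);

      // Replace the join statement and filter down to the earliest revision.
      $query->innerJoin('node_revisions', 'nr', 'n.nid=nr.nid');
      $query->groupBy('nr.nid'); // needed for MIN(), as in the earlier snippet
      $query->addExpression('MIN(nr.vid)', 'vid');
      return $query;
    }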

Right now I'm in the process of manually adding fields into the migration. It would be nice to have a migrate_d2d class to tap into as the fields come through so nicely when getSourceFieldInfo() has data.

Though this has not been tested by our team yet I'm thinking this is really close to RTBC.

Ryan Weal’s picture

Rollbacks seem to be an issue for the added fields, including field_data_body. I cannot replicate it every time, but it seems to fail more often if running the rollbacks as a batch, rather than doing the revision migration *then* the node migration.

Ryan Weal’s picture

I'm down to my last two issues with this patch. The first seems to be noted above and has code to target it... the "changed" (updated) date attached to each revision ($row->timestamp).

I have in my code a complete function that is getting called which should be (and is) updating the node_revision table, however, once the migration is finished ALL the items get updated to the time of import. I even query the node_revision table in the complete function to see if it is working... and it is! Then it gets overridden by something else. Really confusing... any tips on this would certainly help.

The second issue is less of a concern - the original revision (from the node migration) has the title of the last revision, not the original. This is of minor concern to me and is probably more specific to my custom code than the patch.

Ryan Weal’s picture

Status: Needs review » Reviewed & tested by the community

Marking this as RTBC because revisions do come through and it would be useful to have this patch as-is available for creation of migration classes.

I was able to use this patch in conjunction with migrate_d2d by overriding the base query in the node migrations (see note above), and for the revision migrations I extended DrupalNode6Migration so that I could re-use the same field definitions with my node classes.

One thing I was not able to do was to get the UID and the "changed" value to work, so all revisions appear to have been created by the person running migrate at the time of the migration. I feel like I've tried everything to get around that... it won't work in prepareRow, prepare, complete, parent::complete. I have spent probably half of my time on those two fields with no resolution. I realize this is something that was actually a bug with Drupal 7 that has been "fixed" but I can't find any way to get it to work. I even tried looking at migrate/plugins/destinations/node.inc and tried to force the thing to work in the import function... no such luck. I noted the difference between uid and revision_uid but both fail. If I bang my head on my desk WORKBENCH (I was banging my head on workbench_moderation) any more over this I'm going to need to go to the hospital!

The most frustrating part of all is that it works, each revision showing the correct data, until the migration terminates... then it resets everything after the fact back to my user and the migration time. Ugh!

I ended up doing a prepareRow to set the $node->log message and adding the date and user that performed the revision. I hate this "solution" but it will have to work.
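
A workaround of this kind could look roughly like the helper below. It is a sketch only, not the code used in the migration: the function name, the message wording and the UTC formatting are all assumptions, and it presumes the resulting string ends up in the revision's log field through a field mapping.

```php
<?php
/**
 * Build a revision log message that records the original author and date.
 *
 * @param string $original_log
 *   The log message from the source revision (may be empty).
 * @param string $author_name
 *   Name of the user who made the revision on the source site.
 * @param int $timestamp
 *   Unix timestamp of the source revision.
 *
 * @return string
 *   The log text to store with the migrated revision.
 */
function example_revision_log($original_log, $author_name, $timestamp) {
  // gmdate() keeps the output independent of the server's timezone.
  $note = sprintf(
    'Originally revised by %s on %s UTC.',
    $author_name,
    gmdate('Y-m-d H:i', $timestamp)
  );
  $original_log = trim((string) $original_log);
  // Keep the source log text when there is one; otherwise use the note alone.
  return $original_log === '' ? $note : $original_log . ' (' . $note . ')';
}
```

This keeps the information visible on the revisions tab even when the revision's own uid and timestamp get overwritten.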

PS. Darren's blog post above has been moved to http://darrenmothersele.com/blog/2012/07/16/migrating-node-revisions-dru...

... anyway, thanks for this patch. It certainly saved me a ton of time. If anyone can explain what I missed with the revision_uid and changed fields I'm very keen to hear it.

Ryan Weal’s picture

Found out why my dates and revision owners were not coming through: workbench_moderation module. Special thanks to @Fengtan for finding it!

We should be able to get around that issue using this: https://drupal.org/node/1343112

... so this patch is definitely definitely RTBC!

asherry’s picture

Attached: noderevision-1298724-24.patch (4.65 KB). SimpleTest on MySQL passed, 148 passes.

So I believe that this destination handler is meant for a node revision migration that is dependent on a separate node migration.
If that's correct I think I had some issues with rollback. If you delete all the migrated node revisions, it leaves the node not attached to the correct revision, which makes the node look like it doesn't exist. It just means you can't safely rollback a node revision migration without also rolling back the node migration.

At least unless there is something I missed. I added some code to the patch that worked for me.

asherry’s picture

Status: Reviewed & tested by the community » Needs review

Changing the status just in case the extra code is deemed necessary. I haven't figured out how to get patch #14 to work without it though.

Ryan Weal’s picture

I haven't tested the updated patch in #24 yet but I did spend some time revisiting the previous work I had done to try to preserve the revision VIDs... and I'm pleased to report it is possible with a bit of a hack.

If you want to preserve the vids you can copy the implementation of node_save($node) into your own module and name it something like mymodule_revisions_save($node). In your implementation delete the "vid" handling where it unsets the vid. That entire block of code. Then go back to the migrate module and replace the node_save call in the node.inc file that this patch updates to have it use your mymodule_revisions_save($node). Doing this, rather than patching the core node module, will remove the risk of putting this code into production so your revisions will continue to work even if you forget to disable migrate after you deploy.

Upon implementing preserved vids I discovered a possible bug in this patch: it continues to import the original revision even though it is already there. So it needs a test to filter out the first item.

Sorry no code samples yet... left that code at the office this evening. I could easily be coerced into creating a new blog post on this. This is one area where migrate is in need of some docs... for another night. I have another migration to do before morning...

Ryan Weal’s picture

The problems I noted above regarding timestamps and owners were caused by workbench_moderation. We're now using preImport and postImport to disable that module (ignoring dependencies).

alias_hierarchy and link_checker are also candidates for disabling during the import. FYI.

Ryan Weal’s picture

We ended up going even further with this patch... hard to believe, I know.

Now we preserve the VIDs... which has proven to be quite difficult, only because it broke the migrate map for a while, but we have now managed to get past that issue.

To achieve preserved VIDs, we had to do a few things:

  • Copy node_save to our own implementation, comment out the vid setting/unsetting aspect of node_save,
  • Patch migrate (or, umm, update this patch) to call our custom node_save for the node destination that this patch relies on,
  • Do an addFieldMapping for vid = vid similarly to how you would preserve nid,
  • Lastly, and CRITICALLY, in our custom node_revision migrations we had to add $query->orderBy('n.changed'); to our base query. This was very difficult to debug! Huge thanks to fengtan for finding it.
Ryan Weal’s picture

One additional (last?) note on this patch:

The patch imports all revisions... but if you have a node migration already and you import the original... then you have two copies of the original. We discovered this a bit late in the process so we just dealt with it in prepareRow of our custom node_revision migrations:

    $result = db_select("node_revision", "nr")->fields("nr", array("vid"))->condition("nid", $row->nid)->condition("vid", $row->vid)->execute()->fetchField();
    if ($result) {
      return FALSE; // it was the original revision.
    }
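
The same decision can be written as a standalone function, which makes it easy to exercise outside Drupal. This is an illustrative sketch with assumed names and array shapes; in the migration itself the lookup is the db_select() query, and the comparison only makes sense when source vids are preserved in the destination, as in the setup described above.

```php
<?php
/**
 * Decide whether a source revision row should be imported.
 *
 * @param array $row
 *   Source row with 'nid' and 'vid' keys.
 * @param array $existing
 *   Map of nid => list of vids already present in the destination.
 *
 * @return bool
 *   FALSE to skip the row (the same convention prepareRow() uses),
 *   TRUE to import it.
 */
function example_keep_revision_row(array $row, array $existing) {
  $nid = $row['nid'];
  if (isset($existing[$nid]) && in_array($row['vid'], $existing[$nid])) {
    // Already imported by the node migration, so skip the duplicate.
    return FALSE;
  }
  return TRUE;
}
```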

Though it would likely be a ton of work, it would be even better to approach this differently - by keying ALL node migrations on vid and importing revisions by default. In our case, that would have involved patching migrate, migrate_d2d and all our custom migrations. Perhaps something for a future major version of migrate.

mikeryan’s picture

Component: Code » Documentation
Status: Needs review » Needs work
Issue tags: +Migrate 2.6

Committed, thanks! Now, who wants to add documentation under https://drupal.org/node/1006988?

  • Commit 7326cb5 on 7.x-2.x by mikeryan:
    Issue #1298724 by darrenmonthersele,cthos,asherry: Add a node revision...
geerlingguy’s picture

Status: Needs work » Needs review

I've added a new documentation child page: MigrateDestinationNodeRevision. I'm planning on testing this myself, but in lieu of that, anybody else can review/edit the page. If it's all good, let's mark this fixed.

Remon’s picture

So shall this ticket be marked "fixed" yet?

dsnopek’s picture

Adding related issue for adding support in migrate_d2d: #1834016: Add support for migrating node revisions

Leeteq’s picture

This is absolutely great.
It could benefit from more eyes and confirmations, though, as well as from inviting more documenters, so let's keep this issue open for a while longer.

mikeryan’s picture

Status: Needs review » Fixed

Time to mark this fixed - any further issues (including documentation) should be opened as fresh issues.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.