Most large, complex sites will need to script Drupal 7 upgrades and run them over and over, tweaking and testing until they are ready to migrate the production site. Right now you can't really script this, so I think we should add a drush command that people can use.

Here is a rough first pass. I am calling each operation in a separate PHP process, since PHP fatals are pretty inevitable for many field types (especially right now!). I skipped the batch functionality, since it doesn't add much here and is worse for error isolation (it runs more operations per process). I needed to move the $options construction to a separate function; we may want to do the same for the _content_migrate_batch_process_create_fields() and _content_migrate_batch_process_migrate_data() functions.
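For concreteness, here is a hedged sketch of the shape this takes. The command names, callbacks, and the use of content_migrate_get_options() as the field list are illustrative, not the committed code:

```php
/**
 * Implements hook_drush_command().
 * (Illustrative sketch; the committed content_migrate.drush.inc may differ.)
 */
function content_migrate_drush_command() {
  $items['content-migrate-fields'] = array(
    'description' => 'Migrate all D6 CCK fields to D7.',
  );
  $items['content-migrate-field-structure'] = array(
    'description' => 'Create the D7 structure for a single field.',
    'arguments' => array('field_name' => 'The field to migrate.'),
  );
  return $items;
}

function drush_content_migrate_fields() {
  // Run each field in its own PHP process, so a fatal in one field
  // type's migration does not abort the whole run.
  foreach (array_keys(content_migrate_get_options()) as $field_name) {
    drush_invoke_process_args('content-migrate-field-structure', array($field_name));
  }
}
```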


Comments

KarenS’s picture

Great start! I was hoping someone would take this on. I'm very happy to re-structure the code as necessary to support drush.

KarenS’s picture

Status: Needs review » Fixed

I committed this much. Please feel free to supply patches for further changes.

KarenS’s picture

Status: Fixed » Needs work

I guess I confused things by marking this fixed. This much is committed, but it all still needs work.

Owen Barton’s picture

Status: Needs work » Needs review

Here is a further patch (including the content_migrate.drush.inc file in the patch - it didn't actually make it into the last commit).

This is now pretty far along - we now have commands to check migration status as well as migrate and rollback all fields, or just a selection. We have more useful status messages along the way too.

I think the last step is to move the content_migrate_get_options(), content_migrate_rollback(), _content_migrate_batch_process_create_fields() and _content_migrate_batch_process_migrate_data() to a separate file (content_migrate.inc or content_migrate.api.inc?) for clarity - the latter 2 functions could lose the underscore and "_batch" in their names. Any thoughts here?

KarenS’s picture

Actually the drush file is already there, in the /includes directory. But I'll swap this one in.

We need another step -- the process to create the new fields is all done in a single pass, which is too much if there are lots of fields. We need to create fields in batches too, as well as moving the data in batches. I haven't had time to make this change, but I assume the drush script will need something similar.

I'm open to moving more things into separate files, but we already have content_migrate.api.php so calling it something with 'api' in the name is going to be confusing I think.

Owen Barton’s picture

Thanks!

The Drush script creates each field in a separate process, and the data for each field is also migrated in a separate process. I am not sure it's worth using smaller batches than that in Drush, since it is not subject to the kinds of timeouts webserver PHP typically has (in a webserver you obviously do need fairly small batches). For Drush, the separate processes are mainly there so it can continue even if a particular field creation errors out.

dww’s picture

Doing every instance of every field in a separate process sure slows it down. ;) At least during development and testing, I found I needed to use the content migrate UI to have any kind of turn-around on my migration so I could get things done. I ended up having one window with:

drush content-migrate-rollback -y; drush wd-del all -y

which I ran regularly, but then I'd still end up triggering the test migrations via the UI. Not sure if it's worth reconsidering this design to make it more efficient...

That said, this is great. It's so nice to have drush support for this!

Thanks y'all...
-Derek

Owen Barton’s picture

Perhaps we should add an option to run either or both of the structure and data migrations in the same process. This should be easy enough - just add a drush_get_option() check and an if/then/else to run the function directly rather than call it via drush_invoke_process_args() and the command callback. That would give you the option of more resilient (but slower and potentially less accurate) migrations, or faster migrations that are liable to stop the whole show if there is an error. I think both are valid use cases depending on the quality of the field migrations you are using, the size of your data, the number of times you need to run it, etc.
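A minimal sketch of what that check could look like; the --in-process option name and the callback names here are hypothetical:

```php
// Hypothetical --in-process option: run the migration in the current
// process (faster, but one fatal stops everything) instead of spawning
// a sub-process per field (slower, but resilient to per-field fatals).
if (drush_get_option('in-process', FALSE)) {
  drush_content_migrate_field_data($field_name);
}
else {
  drush_invoke_process_args('content-migrate-field-data', array($field_name));
}
```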

KarenS’s picture

I made the change to the UI to process one field at a time. We need that in the UI for sites that have lots of fields. We already have had reports of timeouts in that case.

I think we have to assume that many people using drush will have large datasets with lots of fields and lots of nodes. So I think our drush handling should focus on making it easy to migrate a lot of data with as little interaction as possible.

Boobaa’s picture

Looks like drush content-migrate-fields migrates the field data of only one node. @Bálint Kléri said calling batch_set() should be done differently. Going to investigate this and report back if that's the case (hopefully with a patch).

EDIT: I actually misunderstood @KarenS in #5. Firstly, drush content-migrate-fields (esp. drush content-migrate-field-data) has nothing to do with the Batch API. Secondly, @KarenS seems to consider this the expected behavior, though she didn't say so explicitly.

So now it looks like I have to forge a patch that makes drush content-migrate-fields, esp. drush content-migrate-field-data, work as if it were called from the UI via the Batch API. Shouldn't be hard, though, since we are not burdened by webserver timeouts in a drush call.

Boobaa’s picture

First things first: I'm not quite sure this is a "feature request"; it sounds more like a "bug report" to me. Anyway, I'm the little fish here, so I'm trying not to pee in the aquarium.

Attached is a patch that solves my issue (by "solves" I mean I have tried it with several fields at once and it worked like a charm) by calling _content_migrate_batch_process_migrate_data() again and again until it has finished its job when called from drush, much as the Batch API would. OTOH I'm not doing extra rounds with the Batch API, for the same reasons stated above: drush commands run from the shell, not a webserver, so they basically have no timeout; so let's migrate all the field data at once!
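The pattern here is just the Batch API "finished" convention driven from a plain loop: a worker processes a slice per call and reports progress as a float in $context['finished'], and the driver keeps calling it until that reaches 1. Here is a standalone illustration with a stub worker standing in for _content_migrate_batch_process_migrate_data(); all names in it are hypothetical:

```php
<?php
// Stub worker following the Batch API convention: process a slice per
// call, track position in $context['sandbox'], report progress in
// $context['finished'] (0 to 1).
function demo_migrate_slice(array $items, $per_pass, array &$context) {
  if (!isset($context['sandbox']['position'])) {
    $context['sandbox']['position'] = 0;
    $context['results'] = array();
  }
  $slice = array_slice($items, $context['sandbox']['position'], $per_pass);
  foreach ($slice as $item) {
    $context['results'][] = $item; // "Migrate" one record.
  }
  $context['sandbox']['position'] += count($slice);
  $context['finished'] = $context['sandbox']['position'] / count($items);
}

// Drush-style driver: keep calling the worker until it reports done,
// instead of letting the webserver's Batch API schedule the passes.
$items = range(1, 10);
$context = array('sandbox' => array(), 'finished' => 0);
do {
  demo_migrate_slice($items, 3, $context);
} while ($context['finished'] < 1);

print count($context['results']); // prints 10
```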

Owen Barton’s picture

This looks sensible to me. Have you confirmed that "<" is correct, rather than "<=" (I haven't checked, but the latter seems more common in this kind of usage)?

Boobaa’s picture

Well, there is another problem with the patch, especially on big sites with thousands of nodes: memory consumption. Attached is a patch that achieves the same results, but this time using the Drush Batch API. This time I have checked that it converts everything (in my _tiny_ case), and I have the impression it's even faster than the previous one. OTOH it doesn't produce as much output, either.
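For reference, the drush side of running a Drupal batch is short. A hedged sketch, assuming the operations array mirrors what content_migrate's UI code already builds:

```php
// Reuse the batch definition content_migrate already builds for the UI,
// then hand it to drush's batch runner instead of the browser-based one.
$batch = array(
  'operations' => array(
    array('_content_migrate_batch_process_migrate_data', array($field_name)),
  ),
  'title' => t('Migrating field data'),
);
batch_set($batch);
// Drush processes the batch in backend processes, with no webserver
// timeout and bounded memory per pass.
drush_backend_batch_process();
```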

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

Wow, hardly anyone knows about or understands Drush's batch API. Kudos to you ... I didn't test this, but I don't think we have to: we are reusing all the batch logic that content_migrate sets up, and the batch processing itself is unit tested by drush. So, RTBC.

gdl’s picture

I was seeing the same behavior described in #10. For any invocation of drush content-migrate-fields <field_name>, the data in that field was migrated for only a single node.

The patch in #13 worked to migrate multiple fields, including text, text_long, list_text, list_boolean, number_integer, and file, on a site of about 75k nodes.

Massive thanks to Boobaa for the patch!

moshe weitzman’s picture

Status: Reviewed & tested by the community » Fixed

Committed to master. Thanks.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.