I'm confused by the behavior I'm experiencing when running drush migrate-import with the --update option.
Expected behavior
The drush command description for --update says: "In addition to processing unprocessed items from the source, update previously-imported items with new data". My understanding is that this should essentially re-import ALL items: entities created by previous imports will be updated by overwriting them with their source items' data.
Experienced behavior
I have a migration for which I have already run the import, and status shows all items have been imported. When I re-run migrate-import myMigration --update, I get: Processed 0 (0 created, 0 updated, 0 failed, 0 ignored).
Upon querying the map table, I found that all rows were marked with needs_update = 1. So that part worked, but somehow the query for source items to be processed returns nothing...
My guess at what's going on
I did some digging and noticed that in MigrateSourceSQL::performRewind() the OR needs_update = 1 condition is added only if the map table is joinable. However, the condition on the highwater field is always added (whenever a highwater field is defined). Therefore, some of the items marked needs_update can be excluded from the query results because they are "below" the highwater mark. As far as I understand, the corresponding map row's data is added later in MigrateSource::next(), which pulls in the needs_update status, but that doesn't help for the rows that were already axed by the highwater condition in the query.
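To make the suspected interaction concrete, here is a simplified Python model of the filtering described above. It is not the actual MigrateSourceSQL::performRewind() code: SourceRow, select_rows, the changed field and the dict-based map table are hypothetical stand-ins, and the model assumes the highwater condition is a strict "greater than" comparison that is ANDed with the map-table condition.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SourceRow:
    source_id: int
    changed: int  # hypothetical highwater field, e.g. a "last changed" timestamp


def select_rows(
    rows: List[SourceRow],
    map_table: Dict[int, dict],
    highwater: Optional[int],
    map_joinable: bool = True,
) -> List[SourceRow]:
    """Simplified model of the source query described above (not the real Migrate code)."""
    selected = []
    for row in rows:
        # The highwater condition is always applied when a highwater field is defined...
        if highwater is not None and row.changed <= highwater:
            continue
        # ...but "never imported OR needs_update = 1" is only applied when the map is joinable.
        if map_joinable:
            entry = map_table.get(row.source_id)
            if entry is not None and entry["needs_update"] != 1:
                continue
        selected.append(row)
    return selected


# Every item was imported on an earlier run, so the stored highwater mark is the newest value.
rows = [SourceRow(1, 100), SourceRow(2, 200), SourceRow(3, 300)]
# --update has flagged every map row...
map_table = {r.source_id: {"needs_update": 1} for r in rows}
# ...yet the highwater condition removes all of them before needs_update is considered.
print(len(select_rows(rows, map_table, highwater=300)))  # 0
```

Under these assumptions the model reproduces the "Processed 0" result: every row is flagged, but none of them is newer than the highwater mark, so none reaches the later per-row needs_update check.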
So, is this the intended default behavior of the --update option? I did find #2312075: Add an --ignore-highwater option to allow full update without having to manually change the db, which unearthed the highwater limitation here, but its patch providing the --ignore-highwater option feels (to me) like a workaround to make --update behave as it seems to be documented, i.e. operating on all items. If --update is designed to include only items that are new or updated since the last run, that seems to duplicate the basic migrate-import behavior.
I'm categorizing this as a support request in case I'm not understanding the intentions here. Thanks in advance!
Comment | File | Size | Author |
---|---|---|---|
#7 | migrate_import_update-2379289-7.patch | 2.37 KB | mikeryan |
Comments
Comment #1
tea.time
Comment #2
tea.time
By the way, a quick workaround I found is to run
drush migrate-import myMigration --update
which sets all rows to needs_update = 1, then
drush migrate-import myMigration --needs-update
which directly grabs (up to 10000) items marked needs_update = 1 to be re-imported, regardless of the highwater mark or anything else.
Comment #3
mikeryan
Yeah, the logic around handling queries is pretty convoluted, to deal with all the possibilities that could affect what gets selected: --update, --idlist, highwater marks, track_changes, ... Of course, we can't incorporate needs_update from the map table when the map table is not joinable. We don't want to include the highwater mark in the query if there are items below the highwater mark needing update. Perhaps we could query the map table and, if there are any needs_update=1 rows, leave the highwater mark out of the query, letting MigrateSource::rewind() enforce the highwater mark... But messing with that complex logic is not something I want to risk at this stage in Migrate 2.6; let's pick it up with 2.7.
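A rough Python sketch of the approach suggested above might look like the following. It is only a model under stated assumptions: the function and field names are hypothetical, both the SQL query and the later per-row highwater check are simulated with list filters, and it is not the patch attached in #7.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SourceRow:
    source_id: int
    changed: int  # hypothetical highwater field


def is_flagged(map_table: Dict[int, dict], source_id: int) -> bool:
    """True if the (hypothetical) map row exists and is marked needs_update = 1."""
    entry = map_table.get(source_id)
    return entry is not None and entry["needs_update"] == 1


def select_rows_proposed(
    rows: List[SourceRow], map_table: Dict[int, dict], highwater: Optional[int]
) -> List[SourceRow]:
    """Hypothetical model of the proposal above, not the real Migrate code."""
    # Step 1: check the map table first; if anything is flagged, drop highwater from the query.
    any_flagged = any(e["needs_update"] == 1 for e in map_table.values())
    query_highwater = None if any_flagged else highwater

    # Step 2: the simulated SQL query: optional highwater AND (never imported OR flagged).
    candidates = [
        r for r in rows
        if (query_highwater is None or r.changed > query_highwater)
        and (r.source_id not in map_table or is_flagged(map_table, r.source_id))
    ]

    # Step 3: the simulated per-row check: enforce the highwater mark, except for flagged rows.
    return [
        r for r in candidates
        if highwater is None or r.changed > highwater or is_flagged(map_table, r.source_id)
    ]


rows = [SourceRow(1, 100), SourceRow(2, 200), SourceRow(3, 300), SourceRow(4, 400)]
# Rows 1-3 were imported earlier (highwater = 300); only row 1 is flagged for update.
map_table = {1: {"needs_update": 1}, 2: {"needs_update": 0}, 3: {"needs_update": 0}}
print([r.source_id for r in select_rows_proposed(rows, map_table, highwater=300)])  # [1, 4]
```

The design point being modeled is that the highwater mark still filters in the normal case (nothing flagged), so ordinary incremental imports keep the cheaper query; only when flagged rows exist does the query widen, and the per-row check keeps unflagged old rows out.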
Comment #4
mikeryan
Time to start tagging issues to be addressed in the next release.
Comment #5
mikeryan
Comment #6
wonder95
I had an issue all ready to submit asking why --update wasn't updating previously migrated content, and I saw the workaround in #2 right before submitting it. It works like a charm.
Comment #7
mikeryan
I think I've got this sussed; letting the testbot double-check...
Comment #9
mikeryan