In the course of hammering out #2545672: Handle various migration interruption scenarios, I noticed that isComplete() (which is meant to identify whether a given migration has completed processing all source items) looks like this:
public function isComplete() {
return $this->getMigrationResult() === static::RESULT_COMPLETED;
}
However, this is not reliable. The migration result represents whether a given migration run as requested completed - a run that was requested to be partial (using --limit or --idlist, say) reports a COMPLETED result, but that does not mean that all source records for the migration have been processed. Also, when performing continuous migrations (as in a daily feed), there may now be new source records available to a dependent migration, and the current migration should not run until those records have been processed.
isComplete() should be implemented as it was in D7 migrate - it should return TRUE iff the number of available source records (source plugin count()) is equal to the number of processed records (id map processedCount()).
Comments
Comment #2
mikeryanLetting testbot demonstrate, it'll definitely fail. The problem here is that our tests are basically mocking the bare minimum of data for dependent migrations so they can focus on the migration at hand, and the fix ups that bare minimum. Specifically, that the id mappings we're faking for the dependent migrations don't reflect the dependent migrations' actual source data, which didn't previously matter. There are more urgent issues to deal with, so I'll let this be for the moment.
Comment #4
benjy CreditAttribution: benjy at PreviousNext commentedAre we removing the RESULT_COMPLETED status then? If it doesn't mean the migration is complete then lets get rid of it entirely.
Comment #5
mikeryanNo, RESULT_COMPLETED is still used to indicate that (only) the current operation has completed (in particular it is returned from import() so the UI/drush command knows if the requested operation is done or not). It does not mean that the migration in question has "completed" in the sense that all source rows have been processed.
The RESULT constants were always meant to represent the result of the current operation (i.e., they're ephemeral) - I think someone misunderstood them to have a more persistent meaning, which led to this situation. It really wouldn't make much sense to persist them in state at all, except that now we're using that persistence as part of the interruption processing...
Comment #6
benjy CreditAttribution: benjy at PreviousNext commentedWell I think something has to be renamed. Having an isComplete method on an entity that doesn't use an internal status called RESULT_COMPLETE is both madness and a massive DX WTF.
We should probably put together a plan of action which ends up with us having methods and status that don't mean different things and confuse even the core migrate developers.
Comment #7
mikeryanPerhaps if we renamed isComplete() to allRowsProcessed() (or something like that)?
Comment #8
mikeryanUpdated with the method renamed to allRowsProcessed(). Still need to fix tests with incomplete id mapping fixtures.
Comment #9
benjy CreditAttribution: benjy at PreviousNext commentedYes I very much like that, much better!
Given the new method, can't isComplete() stay with the current implementation? I much prefer to have methods on the entity rather than contrib and other coding having to do $migration->getResult === RESULT_* etc
Comment #10
mikeryanThe result is returned from import() (and soon from rollback()), contrib uses the return result to report the result of the operation. At this point (with the introduction of allRowsProcessed()), the only reason the result isn't just a local $result in the import() and rollback() functions (i.e., the only reason it gets set in state at all) is so interruptMigration() can indicate the result to be returned on interruption. Actually, the setMigrationResult() at the end of import() (and coming in rollback()) should be removed, because nobody's looking at it after then - indeed, the result should probably be cleared from state right after getMigrationResult() in the interrupt case.
Comment #11
benjy CreditAttribution: benjy at PreviousNext commentedComing back to this allRowsProcessed() still makes me smile, it's nice to understand a method just by reading it :)
Maybe we should revisit the interrupt stuff, if we could remove the status or maybe rename it so it's clear it's used only for interrupt i think that would be a good DX improvement.
Comment #14
mikeryanWTF? Retesting...
Yes, the interrupt stuff originally had a separate state variable migrate_interruption_result or some such, but someone suggested just using the existing migrate_result. So, since we're obsoleting any non-interruption use of migrate_result, should I just go ahead and rename it in this issue so the narrower use case is clear? Maybe if I snatch some time from UI work this week I'll do that...
Comment #16
mikeryanOh, duh, that test never passed, was thinking of another issue... #2554321: Clean up Migrate's test suite, by fixing all the tests to really run dependent migrations instead of using incomplete fixtures, should take care of most of the errors we're seeing here, so postponing on that.
Comment #18
mikeryanThis is also blocked on #2570541: Track simple configuration migrations like normal migrations - right now, with the isComplete() patch all migrations dependent on variable migrations will fail because isComplete() will incorrectly fail, because variable migrations don't properly use the map table. So, I believe once that and the test cleanup patch are committed, this patch should pass tests (or at least, the only failures should be tests incorrectly assuming the current broken behavior).
Comment #19
mikeryanHere's a version that removes the migration result in favor of the more specialized interruption result, and also adds the ability to clear the interruption result once it's been fetched. Again, no point in testing this until the two dependent issues are committed.
Comment #20
phenaproximaUnblorked.
Comment #22
phenaproximaComment #23
mikeryanYep, I'll get right on that (and write up a change record too).
Comment #24
mikeryanComment #26
mikeryan*sigh*... The test failures above are exposing real bugs elsewhere: #2577155: Some source plugins produce duplicate rows. This patch won't pass tests until that's fixed, so we need to postpone this again...
Comment #29
phenaproximaComment #30
mikeryanUnblocked.
Comment #33
mikeryanThis patch is an *excellent* canary for problems elsewhere - this one is that the upload test is still using a manually-created fixture for the file migration it depends on, and fixing isComplete() exposes that the fixture is incomplete.
Comment #34
mikeryanFound an answer to the upload test problem. Since it actually only tests a subset of files, it shouldn't need to import them all, so we need to tell it not to check for completeness - this is exactly what the --force option on drush migrate-import is for (#2432991: Implement --force option on migrate-import), so we use the same technique that --force does to skip the check.
This should *finally* go green - I'll work on the change record in the meantime.
Comment #35
mikeryanDraft change record added.
Comment #36
phenaproximaLook good to me, this does. RTBC assuming all DrupalCI bots pass.
Comment #37
webchickCommitted and pushed to 8.0.x. Thanks!
Comment #40
mikeryanComment #42
quietone CreditAttribution: quietone at PreviousNext commentedpublish the cr