Closed (fixed)
Project:
Drupal core
Version:
8.0.x-dev
Component:
migration system
Priority:
Major
Category:
Bug report
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
2 Jul 2015 at 19:13 UTC
Updated:
11 Sep 2015 at 21:04 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
phenaproximaComment #2
phenaproximaAdded the ability to opt-out of joining on the map table. This has no use case yet, but I need to be able to disable it in order to complete #2507607: [META] Replace Cthulhu-forsaken load plugins with migration builders.
Comment #3
phenaproximaRe-rolled without the high-water fixes (those are now in #2485385: Move highwater field support to the source plugin, and do not expose its internals on MigrationInterface) or ability to prevent joining on the map table. This patch fixes the handling of idlist in SqlBase, and removes empty idlists from the source unit tests.
Comment #4
phenaproximaComment #5
phenaproximaComment #6
phenaproximaAdded a test, but it revealed additional problems (namely that the count() method never filtered on the idlist, since the filtering was done in initializeIterator(), which meant that PHPUnit tests using idlist would not pass) and required some fairly extensive changes to SqlBase.
Comment #7
chx commentedPlease do not credit me, all I changed is a sizeof to count. In general we do not use PHP provided aliases to functions.
Comment #8
phenaproximaComment #9
phenaproximaComment #10
chx commentedOK the patch moving code around mislead me, this is not correct.
The priority order is in SourcePluginBase::next() and idList is way way higher than idmap. If idList is specified then idmap needs to be ignored.
Comment #11
phenaproximaFixed.
Comment #12
phenaproximaFixed some old/inconsistent stuff in SourcePluginBase::next().
Comment #13
phenaproximaRemoved a few defunct bits of configuration from some source unit tests.
Comment #14
chx commentedEdit: + // \Drupal\migrate\Plugin\migrate\source\SourcePluginBase::__construct( ensures the idlist is single long. comment needs to be removed but I am sure we will find more reasons to reroll.
Comment #15
phenaproximaCreated a follow-up issue for multi-value key handling. #2529744: SourcePluginBase does not handle multi-value IDs
Comment #16
phenaproximaUpgrading to major and marking as Migrate critical, because this bug affects every single source plugin currently in use. The only reason it has slipped past us is because there is no test coverage of ID filtering.
Comment #18
chx commentedEdit: Funny drupal.org glitch, the patch uploaded here is https://www.drupal.org/files/issues/2522012_17.patch the interdiff is https://www.drupal.org/files/issues/interdiff_14284.txt and this is likely to be the one to be committed.
Comment #19
phenaproximaTo be clear, the patch in the IS should be committed, not the one in #13.Comment #20
chx commentedComment #21
phenaproximaRe-rolled to re-delete a bit of a cruft from some of the source tests.
Comment #22
benjy commentedGiven this, i'm wondering if we should either remove idList entirely until we find a way to bring it back or just fix it all here? It's only going to be utilised by contrib?
Nothing here throws an exception?
Is joining alias.field like this correct? Eg, will it impact other database drivers? I know this is SqlBase but we have made fixes previously so it works with MS SQL and the like.
Does this mean that we never join on the map table if we have an id list? That doesn't seem right?EDIT: this is the correct behaviour SourcePluginBase::next() handles processing the row.Why do we call prepareQuery() now? Is it so count is correct when you provide an id list?
Comment #23
googletorp commentedComment #24
phenaproximaRe-rolled again, fixing #22-2. This patch refactors SqlBase::initializeIterator() a fair bit -- it splits map table joining and high-water support into their own methods, and calls them in prepareQuery(), so the interdiff will probably be fairly useless :( Sorry...I really don't like needlessly monolithic functions.
Comment #28
phenaproxima--Chumbawamba
Ahem...so, it turns out that certain source plugins are not compatible with having the optimization logic in prepareQuery(). (These are usually the ones that override initializeIterator().) In these cases, I changed them to call query() instead of prepareQuery(), since they don't really benefit from the optimizations applied by prepareQuery().
Comment #29
phenaproximaRe-implemented query tags and metadata in the source plugins altered in #28 by splitting it out into a new method, SqlBase::addQueryInfo().
Comment #30
phenaproxima@chx pointed out bad logic (I didn't know || was greedy) when checking if joinMapTable() or addHighWaterProperty() succeeded.
Comment #31
benjy commentedSeparating out into the two separate methods as we discussed in IRC looks good from the patch but it's hard to follow. I'll try review this properly locally later today.
Comment #32
mikeryanSo, an important point to note is that the ID list is purely a runtime concept - on a given migration run you will be able to say something like
And on that run only import those two specific records from the source. The ID list is not a fundamental alteration of the logical query - the count() method should always return the count of all available source records (not that it would be likely to be called in this context anyway).
Keep in mind that other source plugins need to implement ID lists by iterating the full source looking for matching IDs, and this represents the logic of the feature. Adding the list of IDs directly to the SQL query is purely an optimization available only to the SQL source plugin and should not be treated as a "real" modification to the logical query represented by the source.
Comment #33
phenaproximaOkay -- count() will no longer try to optimize the base query.
Comment #35
phenaproximaThis is postponed on #2499793: Several migrate_drupal migrations fatal error on count(), for the same reasons detailed in https://www.drupal.org/node/2485385#comment-10113792. The source tests need to be refactored so that counts can be independent of the expected results.
Comment #36
mikeryanMy feeling is that the idlist should not use configuration, since it's strictly a run-time option - attached is a version with a simple setter. Anticipating #2529744: SourcePluginBase does not handle multi-value IDs being implemented as a follow-up, I've also changed the ID list to an array of id key arrays, rather than an array of scalar IDs, so the API won't have to change with that follow-up (it will just need to figure out how to handle multi-value IDs when they do come in).
As previously, waiting on the count() patch to get the testing straightened out.
Comment #37
mikeryanAlso, see this patch being used in the real world at #2432977: Implement --idlist option on migrate-import.
Comment #38
phenaproximaUnblocked.
Comment #40
phenaproximaRe-rolled.
Comment #42
phenaproximaFixed. MigrateTestCase now has an $idList property, which will be passed to the source plugin's setIdList() method in MigrateSqlSourceTestCase.
I also removed the pointless constructor from SqlBase -- technically patch noise, but you know what they say about cleanliness -- and, more importantly, an assert from MigrateTestCase::queryResultTest(). The assert in question was checking the source iterator's count against the number of expected results, which is incorrect because, according to @mikeryan, the source plugin's count() is always supposed to return the full count of rows in the source data, before any filtering. Whereas the expected results may well have been filtered and/or massaged before the assert is reached, but after setUp() has run (NodeIdListTest being a great example).
Comment #43
benjy commentedLooking good, the split up into new methods seems like a nice clean-up, few nitpicks and questions below:
Missing param docs.
Is array[] valid syntax for an array of arrays? Change which seems to be just syntax.
Nice!
Is checking the highwater against an empty string still a thing? I thought we made that NULL ages ago.
It's a bit worrying that every source has to remember to addQueryInfo()? Seems like something that could easily be overlooked?
Comment #44
mikeryanI discussed this with phenaproxima a day or two ago, sorry I didn't update the issue - I believe we should be able to remove idlist handling entirely from core and put it in contrib's hands if we implement a prepareRow event as a followup to #2535458: Dispatch events at key points during migration. I wouldn't abandon the approach in this issue entirely pending a POC, but I wouldn't invest significant time in it either until we see if that works as I imagine. The basic idea would be the clients could listen to prepareRow events and skip rows there if they don't match the idlist (the general functionality), and they could also do a query alter on SQL sources to optimize the queries up front.
Comment #45
mikeryanTo expand on my last thought - I see idlist as purely a front-end feature, and with some core tweaks that would be generally useful (rather than narrowly targeted to idlist handling) I think the contrib side can do idlists itself. On the core side, the main thing that I think is missing is a way to skip a row without recording it in the ID map - right now prepareRow(), when receiving a FALSE return from a hook, unconditionally records it as STATUS_IGNORED in the map table, which isn't appropriate in the idlist case. My thinking is to add a boolean to MigrateSkipRowException to indicate whether this particular row skippage should be recorded as "ignored", or the skip should simply be done silently. So, this patch as I'm conceiving it at the moment would extend the exception and add handling of the exception to prepareRow(), as well as removing the existing broken idlist stuff.
On the contrib side, the response to prepareRow() would check the incoming source ID(s) against the idlist, and if they don't match throw the don't-record-as-ignore MigrateSkipRowException - this provides the general functionality for all source plugins. Next step would be altering SQL plugin queries as an optimization in that case.
Let's see how this all works out...
Comment #46
mikeryanHere it is - the functional changes (adding a boolean option to MigrateSkipRowException, and catching and handling it in prepareRow) are actually fairly small and simple, the bulk of the line changes are boilerplate - removing idlist references from the tests in particular, moving prepareRow() calls in source plugins and removing their returns (no existing source plugins skip rows), adding a couple parent::prepareRow($row) calls to source plugins that were missing them, and removal of the previous (non-functional) idlist handling re-indenting a chunk of code.
Comment #47
mikeryanAlso, note that the patch #2432977: Implement --idlist option on migrate-import uses this to implement --idlist for the drush migrate-import command.
Comment #48
mikeryanLooking at writing tests, #2544690: Create new source plugin taking data from plugin config would be very handy...
Comment #49
mikeryanLet's get #2544690: Create new source plugin taking data from plugin config in and use it for tests.
Comment #50
mikeryanIn the meantime, I realized removing the returns from the specific source plugins' parent::prepareRow() was wrong - I was thinking propagating the skips was now event-driven, but these are actually above the place where the events are caught (SourcePluginBase::prepareRow(), a.k.a. 'parent' in those calls). So, they need to propagate the FALSE that results from a hook throwing the event, up to next(). I still want to refactor that and just use events completely up the chain, but I'll revisit that when following up in #2488836: Refactor prepareRow() to add events in addition to hook - for now, we'll keep the existing behavior and just do what's needed to allow contrib to throw the skip events (and in particular to implement the idlist functionality).
I did keep the ContactCategory, Upload, and D7 File prepareRow() fixes (they weren't calling the parent at all), as well as the missing return in CckFieldValues - one might argue fixing the existing prepareRow() bugs isn't in scope, but since those bugs prevent the row skips we're dealing with from working for those sources, I'm comfortable fixing them here.
A nascent test module is in here, waiting on #2544690: Create new source plugin taking data from plugin config to provide the test data.
Comment #51
benjy commentedI know this is pre-existing but is there an issue for itemLimit we can link here?
Comment #52
mikeryanActually, we should take the @todo for itemlimit out, the plan is to also do that entirely in contrib: #2432983: Implement --limit option on migrate-import. That is dependent on #2545672: Handle various migration interruption scenarios, which could use a review (hint hint;).
Comment #53
mikeryanRetitled to reflect the current direction. Building on the status/interruption patches recently committed, we need to extend MigrateSkipRowException to permit skipping a row without writing the map table, and we also need to be able to handle that exception at prepareRow() time.
Comment #54
phenaproximaSetting to NR so we can get moving on this again.
Comment #56
phenaproximaWell, we all saw that coming.
Comment #57
mikeryanRerolled. Hoping #2544690: Create new source plugin taking data from plugin config gets in soon so we can have a nice simple test, but I can write an ugly one if need be...
Comment #58
phenaproximaAt a glance, looks good to me. I have a few relatively minor complaints (below), and I'd like @benjy or @chx to review as well before commit.
This reads like "record (noun) in map". That's confusing. Can this be renamed to $shouldBeRecorded or something a bit clearer?
Ditto. Maybe this should be something like shouldBeRecorded(). $exception->shouldBeRecorded() or $exception->isRowIgnored()
Can you add a comment above this line? It's a bit hard to follow.
Comment #59
mikeryanOK, I've implemented a test without the benefit of the nice source plugin, so it has its own.
I'm having trouble coming up with a clearer name that recordInMap, anyone else have any thoughts?
Comment #60
phenaproximaLooks good to me.
Comment #61
mikeryan#2544690: Create new source plugin taking data from plugin config is in (woohoo!), so I'll reroll the test here to make use of the new source plugin.
Comment #62
mikeryanHere we go, interdiff shows how much nicer the embedded_data plugin makes testing.
Comment #63
phenaproximaTwo things. First, I'd like @benjy or @chx to sign off before we commit. Second, I'd like my comments in #58 addressed; I think it is poor (although certainly not earth-shatteringly awful) DX to call the "record in map" flag recordInMap. It sounds like a noun, but it is a verb. My vote is for shouldBeRecorded() or shouldRecordInMap().
Comment #64
benjy commentedDidn't thoroughly look at the tests but the rest looks OK and removing idList from core is fine by me.
Looking in SourcePluginBase::prepareRow() it does all the map stuff, which is nothing like what most source plugins will do in prepareRow(). Could we move that follow-up so calling parent::prepareRow() in source plugins isn't so critical? Maybe that code could move to an event after #2488836: Refactor prepareRow() to add events in addition to hook
Did we have an issue for this?
Comment #65
mikeryanYes, I definitely want to follow up with that issue to cleanup prepareRow() - even if I can't get buy-in to the event concept, it needs some refactoring (see also #2558047: [meta] Dealing with conditions during migration (messaging/exceptions) is confusing and inconsistent).
re: "@todo Support itemlimit." - I should have removed that comment with #2545672: Handle various migration interruption scenarios, the item limit feature is now implemented entirely in contrib as a result.
Comment #66
mikeryanRemoved obsolete references to itemlimit.
Comment #67
mikeryanComment #68
phenaproximaLooks fine and dandy to me.
Comment #69
webchickI was a bit skeptical about this, since instead of fixing idlist, it's removing it. However, Mike assures me that there's a sister patch in the Migrate Plus queue to add it back, and benjy says in #64 that he's ok with this not being in core. It still seems like a really useful feature to me, so maybe we can try and add it back in 8.1.x or so.
Committed and pushed to 8.0.x. Thanks!