Problem/Motivation

Migrate supposedly has a feature where one can provide a list of source ids to be migrated. This does not work with the current SqlBase source.

Proposed resolution

Fix it. Fix other glitches found.

Remaining tasks

Review, commit, party!

User interface changes

API changes

Data model changes

idlist moves under source configuration.

CommentFileSizeAuthor
#67 interdiff.txt3.34 KBmikeryan
#67 remove_broken_idlist-2522012-67.patch38.63 KBmikeryan
#66 interdiff.txt1.45 KBmikeryan
#66 remove_broken_idlist-2522012-66.patch38.66 KBmikeryan
#62 interdiff.txt1.94 KBmikeryan
#62 remove_broken_idlist-2522012-62.patch38.63 KBmikeryan
#59 interdiff.txt6.51 KBmikeryan
#59 remove_broken_idlist-2522012-59.patch39.8 KBmikeryan
#57 remove_broken_idlist-2522012-57.patch36.35 KBmikeryan
#50 interdiff.txt14.82 KBmikeryan
#50 idlist_does_not_work_at-2522012-50.patch37.53 KBmikeryan
#46 idlist_does_not_work_at-2522012-46.patch50.71 KBmikeryan
#42 interdiff-2522012-40-42.txt2.98 KBphenaproxima
#42 2522012-42.patch39.15 KBphenaproxima
#40 2522012-40.patch36.99 KBphenaproxima
#36 interdiff.txt5.65 KBmikeryan
#36 idlist_does_not_work_at-2522012-36.patch36.75 KBmikeryan
#33 interdiff-2522012-30-33.txt491 bytesphenaproxima
#33 2522012-33.patch36.99 KBphenaproxima
#30 interdiff-2522012-29-30.txt1.23 KBphenaproxima
#30 2522012-30.patch37.25 KBphenaproxima
#29 interdiff-2522012-28-29.txt3.66 KBphenaproxima
#29 2522012-29.patch37.16 KBphenaproxima
#28 interdiff-2522012-24-28.txt2.43 KBphenaproxima
#28 2522012-28.patch35.96 KBphenaproxima
#24 interdiff-2522012-21-24.txt8.39 KBphenaproxima
#24 2522012-24.patch33.53 KBphenaproxima
#21 interdiff-2522012-17-21.txt2.17 KBphenaproxima
#21 2522012-21.patch28.05 KBphenaproxima
2522012_17.patch27.04 KBchx
interdiff.txt4.41 KBchx
#14 interdiff.txt4.31 KBchx
#14 2522012_14.patch25.43 KBchx
#13 interdiff-2522012-12-13.txt2.17 KBphenaproxima
#13 2522012-13.patch24.48 KBphenaproxima
#12 interdiff-2522012-11-12.txt1.51 KBphenaproxima
#12 2522012-12.patch23.47 KBphenaproxima
#11 interdiff-2522012-7-11.txt5.93 KBphenaproxima
#11 2522012-11.patch22.02 KBphenaproxima
#7 2522012_7.patch26.53 KBchx
#7 interdiff.txt717 byteschx
#6 interdiff-2522012-3-6.txt8.25 KBphenaproxima
#6 2522012-6.patch26.53 KBphenaproxima
#3 2522012-3.patch19.06 KBphenaproxima
#2 interdiff-2522012-0-2.txt771 bytesphenaproxima
#2 2522012-2.patch26.11 KBphenaproxima
sqlbase_bugfixes.patch25.57 KBphenaproxima

Comments

phenaproxima’s picture

Issue summary: View changes
phenaproxima’s picture

StatusFileSize
new26.11 KB
new771 bytes

Added the ability to opt-out of joining on the map table. This has no use case yet, but I need to be able to disable it in order to complete #2507607: [META] Replace Cthulhu-forsaken load plugins with migration builders.

phenaproxima’s picture

Title: SqlBase is buggy » SqlBase does not set idlist condition correctly
StatusFileSize
new19.06 KB

Re-rolled without the high-water fixes (those are now in #2485385: Move highwater field support to the source plugin, and do not expose its internals on MigrationInterface) or ability to prevent joining on the map table. This patch fixes the handling of idlist in SqlBase, and removes empty idlists from the source unit tests.

phenaproxima’s picture

Issue summary: View changes
phenaproxima’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests
phenaproxima’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
StatusFileSize
new26.53 KB
new8.25 KB

Added a test, but it revealed additional problems (namely that the count() method never filtered on the idlist, since the filtering was done in initializeIterator(), which meant that PHPUnit tests using idlist would not pass) and required some fairly extensive changes to SqlBase.

chx’s picture

Title: SqlBase does not set idlist condition correctly » idlist does not work at all with SqlBase
Status: Needs review » Reviewed & tested by the community
StatusFileSize
new717 bytes
new26.53 KB

Please do not credit me, all I changed is a sizeof to count. In general we do not use PHP provided aliases to functions.

phenaproxima’s picture

Issue summary: View changes
phenaproxima’s picture

Issue summary: View changes
chx’s picture

Status: Reviewed & tested by the community » Needs work

OK the patch moving code around mislead me, this is not correct.

The priority order is in SourcePluginBase::next() and idList is way way higher than idmap. If idList is specified then idmap needs to be ignored.

phenaproxima’s picture

Status: Needs work » Needs review
StatusFileSize
new22.02 KB
new5.93 KB

Fixed.

phenaproxima’s picture

StatusFileSize
new23.47 KB
new1.51 KB

Fixed some old/inconsistent stuff in SourcePluginBase::next().

phenaproxima’s picture

StatusFileSize
new24.48 KB
new2.17 KB

Removed a few defunct bits of configuration from some source unit tests.

chx’s picture

StatusFileSize
new25.43 KB
new4.31 KB

Edit: + // \Drupal\migrate\Plugin\migrate\source\SourcePluginBase::__construct( ensures the idlist is single long. comment needs to be removed but I am sure we will find more reasons to reroll.

phenaproxima’s picture

phenaproxima’s picture

Priority: Normal » Major
Issue tags: +Migrate critical

Upgrading to major and marking as Migrate critical, because this bug affects every single source plugin currently in use. The only reason it has slipped past us is because there is no test coverage of ID filtering.

Status: Needs review » Needs work

The last submitted patch, 14: 2522012_14.patch, failed testing.

chx’s picture

Edit: Funny drupal.org glitch, the patch uploaded here is https://www.drupal.org/files/issues/2522012_17.patch the interdiff is https://www.drupal.org/files/issues/interdiff_14284.txt and this is likely to be the one to be committed.

phenaproxima’s picture

To be clear, the patch in the IS should be committed, not the one in #13.

chx’s picture

Issue summary: View changes
phenaproxima’s picture

StatusFileSize
new28.05 KB
new2.17 KB

Re-rolled to re-delete a bit of a cruft from some of the source tests.

benjy’s picture

  1. +++ b/core/modules/migrate/src/Plugin/migrate/source/SourcePluginBase.php
    @@ -198,6 +190,23 @@ protected function getModuleHandler() {
    +      // @TODO: Add support for multiple IDs https://www.drupal.org/node/2529744
    +      if (count($this->getIds()) != 1) {
    +        throw new \LogicException('Source IDs can only be specified by a list if there is only one id.');
    +      }
    

    Given this, i'm wondering if we should either remove idList entirely until we find a way to bring it back or just fix it all here? It's only going to be utilised by contrib?

  2. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -103,6 +103,21 @@ protected function prepareQuery() {
    +    // keys, so throw an exception if the key is multi-value.
    

    Nothing here throws an exception?

  3. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -103,6 +103,21 @@ protected function prepareQuery() {
    +          $field = $definition['alias'] . '.' . $field;
    

    Is joining alias.field like this correct? Eg, will it impact other database drivers? I know this is SqlBase but we have made fixes previously so it works with MS SQL and the like.

  4. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -116,19 +131,8 @@ protected function initializeIterator() {
    +    if (!$this->getIdList()) {
    

    Does this mean that we never join on the map table if we have an id list? That doesn't seem right? EDIT: this is the correct behaviour SourcePluginBase::next() handles processing the row.

  5. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -200,7 +204,7 @@ protected function initializeIterator() {
    +    return $this->prepareQuery()->countQuery()->execute()->fetchField();
    

    Why do we call prepareQuery() now? Is it so count is correct when you provide an id list?

googletorp’s picture

Status: Needs review » Needs work
phenaproxima’s picture

Status: Needs work » Needs review
StatusFileSize
new33.53 KB
new8.39 KB

Re-rolled again, fixing #22-2. This patch refactors SqlBase::initializeIterator() a fair bit -- it splits map table joining and high-water support into their own methods, and calls them in prepareQuery(), so the interdiff will probably be fairly useless :( Sorry...I really don't like needlessly monolithic functions.

Status: Needs review » Needs work

The last submitted patch, 24: 2522012-24.patch, failed testing.

phenaproxima queued 24: 2522012-24.patch for re-testing.

The last submitted patch, 24: 2522012-24.patch, failed testing.

phenaproxima’s picture

Status: Needs work » Needs review
StatusFileSize
new35.96 KB
new2.43 KB

I get knocked down! But I get up again! You're never gonna keep me down!

--Chumbawamba

Ahem...so, it turns out that certain source plugins are not compatible with having the optimization logic in prepareQuery(). (These are usually the ones that override initializeIterator().) In these cases, I changed them to call query() instead of prepareQuery(), since they don't really benefit from the optimizations applied by prepareQuery().

phenaproxima’s picture

StatusFileSize
new37.16 KB
new3.66 KB

Re-implemented query tags and metadata in the source plugins altered in #28 by splitting it out into a new method, SqlBase::addQueryInfo().

phenaproxima’s picture

StatusFileSize
new37.25 KB
new1.23 KB

@chx pointed out bad logic (I didn't know || was greedy) when checking if joinMapTable() or addHighWaterProperty() succeeded.

benjy’s picture

Separating out into the two separate methods as we discussed in IRC looks good from the patch but it's hard to follow. I'll try review this properly locally later today.

mikeryan’s picture

Status: Needs review » Needs work

So, an important point to note is that the ID list is purely a runtime concept - on a given migration run you will be able to say something like

drush mi articles --idlist=45,890

And on that run only import those two specific records from the source. The ID list is not a fundamental alteration of the logical query - the count() method should always return the count of all available source records (not that it would be likely to be called in this context anyway).

Keep in mind that other source plugins need to implement ID lists by iterating the full source looking for matching IDs, and this represents the logic of the feature. Adding the list of IDs directly to the SQL query is purely an optimization available only to the SQL source plugin and should not be treated as a "real" modification to the logical query represented by the source.

phenaproxima’s picture

Status: Needs work » Needs review
StatusFileSize
new36.99 KB
new491 bytes

Okay -- count() will no longer try to optimize the base query.

Status: Needs review » Needs work

The last submitted patch, 33: 2522012-33.patch, failed testing.

phenaproxima’s picture

Status: Needs work » Postponed

This is postponed on #2499793: Several migrate_drupal migrations fatal error on count(), for the same reasons detailed in https://www.drupal.org/node/2485385#comment-10113792. The source tests need to be refactored so that counts can be independent of the expected results.

mikeryan’s picture

StatusFileSize
new36.75 KB
new5.65 KB

My feeling is that the idlist should not use configuration, since it's strictly a run-time option - attached is a version with a simple setter. Anticipating #2529744: SourcePluginBase does not handle multi-value IDs being implemented as a follow-up, I've also changed the ID list to an array of id key arrays, rather than an array of scalar IDs, so the API won't have to change with that follow-up (it will just need to figure out how to handle multi-value IDs when they do come in).

As previously, waiting on the count() patch to get the testing straightened out.

mikeryan’s picture

Also, see this patch being used in the real world at #2432977: Implement --idlist option on migrate-import.

phenaproxima’s picture

Status: Postponed » Needs review

Unblocked.

Status: Needs review » Needs work

The last submitted patch, 36: idlist_does_not_work_at-2522012-36.patch, failed testing.

phenaproxima’s picture

Status: Needs work » Needs review
StatusFileSize
new36.99 KB

Re-rolled.

Status: Needs review » Needs work

The last submitted patch, 40: 2522012-40.patch, failed testing.

phenaproxima’s picture

Status: Needs work » Needs review
StatusFileSize
new39.15 KB
new2.98 KB

Fixed. MigrateTestCase now has an $idList property, which will be passed to the source plugin's setIdList() method in MigrateSqlSourceTestCase.

I also removed the pointless constructor from SqlBase -- technically patch noise, but you know what they say about cleanliness -- and, more importantly, an assert from MigrateTestCase::queryResultTest(). The assert in question was checking the source iterator's count against the number of expected results, which is incorrect because, according to @mikeryan, the source plugin's count() is always supposed to return the full count of rows in the source data, before any filtering. Whereas the expected results may well have been filtered and/or massaged before the assert is reached, but after setUp() has run (NodeIdListTest being a great example).

benjy’s picture

Looking good, the split up into new methods seems like a nice clean-up, few nitpicks and questions below:

  1. +++ b/core/modules/migrate/src/Plugin/MigrateSourceInterface.php
    @@ -62,4 +62,12 @@ public function __toString();
    +   * @param array[] $id_list
    +   */
    +  public function setIdList(array $id_list);
    

    Missing param docs.

  2. +++ b/core/modules/migrate/src/Plugin/migrate/source/SourcePluginBase.php
    @@ -84,11 +84,12 @@
    -   * @var array
    +   * @var array[]
        */
    -  protected $idList = array();
    +  protected $idList = [];
    

    Is array[] valid syntax for an array of arrays? Change which seems to be just syntax.

  3. +++ b/core/modules/migrate/src/Plugin/migrate/source/SourcePluginBase.php
    @@ -303,14 +306,14 @@ public function rewind() {
    -      $row = new Row($row_data, $this->migration->getSourcePlugin()->getIds(), $this->migration->get('destinationIds'));
    +      $row = new Row($row_data, $this->getIds());
    

    Nice!

  4. +++ b/core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
    @@ -92,103 +87,152 @@ protected function select($table, $alias = NULL, array $options = array()) {
    +    if (isset($high_water_property['name']) && ($high_water = $this->migration->getHighWater()) !== '') {
    

    Is checking the highwater against an empty string still a thing? I thought we made that NULL ages ago.

  5. +++ b/core/modules/migrate_drupal/src/Plugin/migrate/source/d6/FieldInstancePerViewMode.php
    @@ -22,7 +22,9 @@ class FieldInstancePerViewMode extends ViewModeBase {
    +    $query = $this->query();
    +    $this->addQueryInfo($query);
    +    $result = $query->execute();
    

    It's a bit worrying that every source has to remember to addQueryInfo()? Seems like something that could easily be overlooked?

mikeryan’s picture

I discussed this with phenaproxima a day or two ago, sorry I didn't update the issue - I believe we should be able to remove idlist handling entirely from core and put it in contrib's hands if we implement a prepareRow event as a followup to #2535458: Dispatch events at key points during migration. I wouldn't abandon the approach in this issue entirely pending a POC, but I wouldn't invest significant time in it either until we see if that works as I imagine. The basic idea would be the clients could listen to prepareRow events and skip rows there if they don't match the idlist (the general functionality), and they could also do a query alter on SQL sources to optimize the queries up front.

mikeryan’s picture

Status: Needs review » Needs work

To expand on my last thought - I see idlist as purely a front-end feature, and with some core tweaks that would be generally useful (rather than narrowly targeted to idlist handling) I think the contrib side can do idlists itself. On the core side, the main thing that I think is missing is a way to skip a row without recording it in the ID map - right now prepareRow(), when receiving a FALSE return from a hook, unconditionally records it as STATUS_IGNORED in the map table, which isn't appropriate in the idlist case. My thinking is to add a boolean to MigrateSkipRowException to indicate whether this particular row skippage should be recorded as "ignored", or the skip should simply be done silently. So, this patch as I'm conceiving it at the moment would extend the exception and add handling of the exception to prepareRow(), as well as removing the existing broken idlist stuff.

On the contrib side, the response to prepareRow() would check the incoming source ID(s) against the idlist, and if they don't match throw the don't-record-as-ignore MigrateSkipRowException - this provides the general functionality for all source plugins. Next step would be altering SQL plugin queries as an optimization in that case.

Let's see how this all works out...

mikeryan’s picture

Status: Needs work » Needs review
Issue tags: +Needs tests
StatusFileSize
new50.71 KB

Here it is - the functional changes (adding a boolean option to MigrateSkipRowException, and catching and handling it in prepareRow) are actually fairly small and simple, the bulk of the line changes are boilerplate - removing idlist references from the tests in particular, moving prepareRow() calls in source plugins and removing their returns (no existing source plugins skip rows), adding a couple parent::prepareRow($row) calls to source plugins that were missing them, and removal of the previous (non-functional) idlist handling re-indenting a chunk of code.

mikeryan’s picture

Also, note that the patch #2432977: Implement --idlist option on migrate-import uses this to implement --idlist for the drush migrate-import command.

mikeryan’s picture

mikeryan’s picture

Status: Needs review » Postponed
mikeryan’s picture

StatusFileSize
new37.53 KB
new14.82 KB

In the meantime, I realized removing the returns from the specific source plugins' parent::prepareRow() was wrong - I was thinking propagating the skips was now event-driven, but these are actually above the place where the events are caught (SourcePluginBase::prepareRow(), a.k.a. 'parent' in those calls). So, they need to propagate the FALSE that results from a hook throwing the event, up to next(). I still want to refactor that and just use events completely up the chain, but I'll revisit that when following up in #2488836: Refactor prepareRow() to add events in addition to hook - for now, we'll keep the existing behavior and just do what's needed to allow contrib to throw the skip events (and in particular to implement the idlist functionality).

I did keep the ContactCategory, Upload, and D7 File prepareRow() fixes (they weren't calling the parent at all), as well as the missing return in CckFieldValues - one might argue fixing the existing prepareRow() bugs isn't in scope, but since those bugs prevent the row skips we're dealing with from working for those sources, I'm comfortable fixing them here.

A nascent test module is in here, waiting on #2544690: Create new source plugin taking data from plugin config to provide the test data.

benjy’s picture

  1. +++ b/core/modules/migrate/src/Plugin/migrate/id_map/Sql.php
    @@ -754,7 +754,7 @@ public function destroy() {
    -   * @todo Support idlist, itemlimit.
    +   * @todo Support itemlimit.
    

    I know this is pre-existing but is there an issue for itemLimit we can link here?

mikeryan’s picture

Actually, we should take the @todo for itemlimit out, the plan is to also do that entirely in contrib: #2432983: Implement --limit option on migrate-import. That is dependent on #2545672: Handle various migration interruption scenarios, which could use a review (hint hint;).

mikeryan’s picture

Title: idlist does not work at all with SqlBase » Remove broken idlist handling, replace with more robust exception handling
Status: Postponed » Active

Retitled to reflect the current direction. Building on the status/interruption patches recently committed, we need to extend MigrateSkipRowException to permit skipping a row without writing the map table, and we also need to be able to handle that exception at prepareRow() time.

phenaproxima’s picture

Status: Active » Needs review

Setting to NR so we can get moving on this again.

Status: Needs review » Needs work

The last submitted patch, 50: idlist_does_not_work_at-2522012-50.patch, failed testing.

phenaproxima’s picture

Issue tags: +Needs reroll

Well, we all saw that coming.

mikeryan’s picture

Status: Needs work » Needs review
Issue tags: -Needs reroll
StatusFileSize
new36.35 KB

Rerolled. Hoping #2544690: Create new source plugin taking data from plugin config gets in soon so we can have a nice simple test, but I can write an ugly one if need be...

phenaproxima’s picture

Status: Needs review » Needs work

At a glance, looks good to me. I have a few relatively minor complaints (below), and I'd like @benjy or @chx to review as well before commit.

  1. +++ b/core/modules/migrate/src/MigrateSkipRowException.php
    @@ -12,4 +12,35 @@
    +  protected $recordInMap;
    

    This reads like "record (noun) in map". That's confusing. Can this be renamed to $shouldBeRecorded or something a bit clearer?

  2. +++ b/core/modules/migrate/src/MigrateSkipRowException.php
    @@ -12,4 +12,35 @@
    +  public function getRecordInMap() {
    

    Ditto. Maybe this should be something like shouldBeRecorded(). $exception->shouldBeRecorded() or $exception->isRowIgnored()

  3. +++ b/core/modules/migrate/src/Plugin/migrate/source/SourcePluginBase.php
    @@ -189,19 +184,29 @@ protected function getModuleHandler() {
    +      $skip = ($result_hook && in_array(FALSE, $result_hook)) || ($result_named_hook && in_array(FALSE, $result_named_hook));
    

    Can you add a comment above this line? It's a bit hard to follow.

mikeryan’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
StatusFileSize
new39.8 KB
new6.51 KB

OK, I've implemented a test without the benefit of the nice source plugin, so it has its own.

I'm having trouble coming up with a clearer name that recordInMap, anyone else have any thoughts?

phenaproxima’s picture

Looks good to me.

mikeryan’s picture

Status: Needs review » Needs work

#2544690: Create new source plugin taking data from plugin config is in (woohoo!), so I'll reroll the test here to make use of the new source plugin.

mikeryan’s picture

Status: Needs work » Needs review
StatusFileSize
new38.63 KB
new1.94 KB

Here we go, interdiff shows how much nicer the embedded_data plugin makes testing.

phenaproxima’s picture

Two things. First, I'd like @benjy or @chx to sign off before we commit. Second, I'd like my comments in #58 addressed; I think it is poor (although certainly not earth-shatteringly awful) DX to call the "record in map" flag recordInMap. It sounds like a noun, but it is a verb. My vote is for shouldBeRecorded() or shouldRecordInMap().

benjy’s picture

Didn't thoroughly look at the tests but the rest looks OK and removing idList from core is fine by me.

  1. +++ b/core/modules/file/src/Plugin/migrate/source/d6/Upload.php
    @@ -47,6 +47,7 @@ public function prepareRow(Row $row) {
    +    return parent::prepareRow($row);
    
    +++ b/core/modules/file/src/Plugin/migrate/source/d7/File.php
    @@ -71,6 +71,7 @@ public function prepareRow(Row $row) {
    +    return parent::prepareRow($row);
    

    Looking in SourcePluginBase::prepareRow() it does all the map stuff, which is nothing like what most source plugins will do in prepareRow(). Could we move that follow-up so calling parent::prepareRow() in source plugins isn't so critical? Maybe that code could move to an event after #2488836: Refactor prepareRow() to add events in addition to hook

  2. +++ b/core/modules/migrate/src/Plugin/migrate/id_map/Sql.php
    @@ -754,7 +754,7 @@ public function destroy() {
    +   * @todo Support itemlimit.
    

    Did we have an issue for this?

mikeryan’s picture

Could we move that follow-up so calling parent::prepareRow() in source plugins isn't so critical? Maybe that code could move to an event after #2488836: Refactor prepareRow() to use events and exceptions

Yes, I definitely want to follow up with that issue to cleanup prepareRow() - even if I can't get buy-in to the event concept, it needs some refactoring (see also #2558047: [meta] Dealing with conditions during migration (messaging/exceptions) is confusing and inconsistent).

re: "@todo Support itemlimit." - I should have removed that comment with #2545672: Handle various migration interruption scenarios, the item limit feature is now implemented entirely in contrib as a result.

mikeryan’s picture

StatusFileSize
new38.66 KB
new1.45 KB

Removed obsolete references to itemlimit.

mikeryan’s picture

StatusFileSize
new38.63 KB
new3.34 KB
phenaproxima’s picture

Status: Needs review » Reviewed & tested by the community

Looks fine and dandy to me.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

I was a bit skeptical about this, since instead of fixing idlist, it's removing it. However, Mike assures me that there's a sister patch in the Migrate Plus queue to add it back, and benjy says in #64 that he's ok with this not being in core. It still seems like a really useful feature to me, so maybe we can try and add it back in 8.1.x or so.

Committed and pushed to 8.0.x. Thanks!

  • webchick committed f72ef07 on 8.0.x
    Issue #2522012 by phenaproxima, mikeryan, chx, benjy: Remove broken...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.