Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
- Create a migration where initializeIterator yields. This is a great fit and works well.
- Now add this as a requirement to another migration.
- SourcePluginBase::count() gets called which in turn does
$count = $this->getIterator()->count();
which fatals as Generator does not implement count(). - See $this->skipCount at the top of the method, add
protected $skipCount = TRUE
to the class. - Observe the constructor unconditionally overriding this value.
- Tear your hair out.
Proposed resolution
Uh, I dunno, at least check whether the iterator is Countable before trying to count the bloody thing. And/or only pick up the skip_count from the configuration if it is set. This is one of those D7 vestiges which got ported and abandoned to bite us later.
Remaining tasks
doCount could use a little doxygen.
Write a test source plugin which uses yield and try to count it and assert it is -1. Check the configuration keys are picked up properly.
User interface changes
API changes
Data model changes
Comment | File | Size | Author |
---|---|---|---|
#42 | requiring_a_migration-2684567-42.patch | 5.12 KB | gnuget |
#34 | requiring_a_migration-2684567-34.patch | 5.12 KB | gnuget |
#34 | 2684567_interdiff-31-34.txt | 2.44 KB | gnuget |
#31 | 2684567_31_only_test.patch | 2.69 KB | gnuget |
#22 | interdiff-20-22.txt | 1.07 KB | gnuget |
Comments
Comment #2
chx CreditAttribution: chx at Integral Vision Ltd commentedHow can I credit this to Integral Vision. Odd. Anyways, here's how it could look.
Comment #3
chx CreditAttribution: chx at Integral Vision Ltd commentedThis organization system drupal.org has is one of most confusing UIs I've ever seen. If the organization is only usable for credits if you enter it precisely then why is there no info saying so? I had "Integral Vision" prior now I have "Integral Vision Ltd" and it works. Anyways, fixing credit.
Comment #4
chx CreditAttribution: chx at Integral Vision Ltd commentedEdit: nevermind.
Comment #5
heddn+1 on approach. Just skip counting, if
Comment #6
heddnThe downside, contrib and maybe core doesn't always implement \Countable, even though it has a count() method. It would be safer to check for count() on the object than trust that all classes implement the interface. For BC reasons at least.
Comment #7
chx CreditAttribution: chx at Integral Vision Ltd commentedCore most of the time extends from SqlBase which has its own count(). EmbeddedDataSource and EmtptySource both return ArrayIterator() which is both Countable and also sets an example for contrib. I wouldn't be so worried.
Comment #8
heddnmigrate_source_csv might be the only one affected then. migrate_source_json seems to return \ArrayIterator. And I'm reworking it right now so we'll be sure to add \Countable to it. I don't know the status for xml, I seem to think it is less mature.
Comment #9
chx CreditAttribution: chx at Integral Vision Ltd commentedOh I found a fantastic trick in the csv source! Thanks, here's a better patch. I don't particularly think I've heard of iterator_count before or if I did it was so long ago I didn't remember.
This both fixes the issue AND keeps full backwards compatibility.
Edit: we need to review the code flow to ensure the source plugin is recreated before/after counting this way because generators are not resetable. Might not worth the bothering?
Comment #10
dawehnerOh, this is a pretty neat patch. Using a generator was always pratical already, just some admin features via drush didn't worked. It is great that we fix it.
IMHO we should add a test with a generator to ensure that generators continue to work from end to end.
Comment #11
benjy CreditAttribution: benjy at PreviousNext commentedAlso still need to address the feedback in #9 regarding iterator_count messing the internal pointer of the iterator.
Comment #12
chx CreditAttribution: chx at Integral Vision Ltd commentedOr ignore #9 because of it.
Comment #13
heddnI really like #9. That's how I unknowingly got around the problem in csv, because there is/was an inherent (hidden) assumption that getIterator() returned an Iterator that was Countable. This approach just hides that.
The warning on iterator_count about not retaining the position of the pointer is a little scary. So we should do some testing in that area.
Comment #14
chx CreditAttribution: chx at Integral Vision Ltd commentedSo iterator_count is only usable if we re-instantiate the iterator because the way it counts, the only way it possibly can count is to exhaust the iterator. But, turns out we have a method for that... this is the non-scary version of the patch and through the layers of caching it should not be horribly slow but also, what slow, this is migrate.
If you have something like a remote thing where count just takes way too long then by all means add $skipCount to your class or implement your own count. There's no real regression here because the only case when it takes degenerate long previously fataled so there surely was some coding around that already like
Comment #15
gnugetI'm going to work on this.
Comment #16
edysmpComment #17
gnugetI'm still working on the tests.
Sorry this took me more time than I expected because in my country just finished the holiday season, hope be able to finish my tests this week.
Comment #18
gnugetSorry, I have been having very hectic weeks and not sure if I'm gonna be able to work on this soon so I switch back the ticket to unassigned.
Comment #19
gnugetComment #20
gnugetOk I created a couple migration tests.
Both using a generator instead of an iterator and the second migration has as a dependency the first one.
Comment #22
gnugetI fixed some typos.
Comment #23
vasi CreditAttribution: vasi at Evolving Web commentedHi gnuget, thanks for the tests. However, they don't test the actual thing we want to test. From the summary: "Create a migration where initializeIterator yields."
That means we want to use a PHP generator: http://php.net/manual/en/language.generators.overview.php
Comment #24
vasi CreditAttribution: vasi at Evolving Web commentedComment #25
gnugetActually, it does:
The
initializeIterator
useGeneratorObject
which use:I had to do it in this way because the rewind method in the generators doesn't work as expected so I had to wrap the generator in another object and fix the rewind.
Also in the
MigrateUsingGenerator
I made two tests, one using the generator and a second one using as a dependency a migration which uses the generator plugin.Thanks for your review!
Comment #26
mikeryanI don't see how reorganizing this code is in scope for this patch? Oh, now I do, so skipCount (and cacheCounts) is only overridden if explicitly in configuration. However, cacheKey is not supposed to be a boolean - that's being fixed in #2723115: Count cache configuration does not work as expected - so shouldn't be part of the loop here.
s/generator/Generator/
Coding standard for member variables is $countriesFile/$statesFile.
The test seems more elaborate than necessary to test the actual problem being fixed (counts) - presumably in response to dawehner's comment "IMHO we should add a test with a generator to ensure that generators continue to work from end to end." Maybe that should be a follow-up.
Speaking of which... I don't see the counts being tested?
Comment #27
vasi CreditAttribution: vasi at Evolving Web commentedHmm. I've tried making migrations with just a generator in initializeIterator(), eg:
I didn't need to wrap it in an iterator. Did you try that? What went wrong?
I think it could make your tests much simpler. The simpler the better :)
Comment #28
gnugetHi Vasi.
As far as I remember when I wrote this the bug is only reproducible when the migration depends on another migration which uses generators.
I will fix the points of #26 and try to make this simpler
Thanks for your time and review!
Comment #29
vasi CreditAttribution: vasi at Evolving Web commentedYes, in my testing I was able to make migrationA use a generator (like the code I pasted above), and then migrationB depend on it, and that triggered the bug from the summary. So it would make a great test!
I didn't see any secondary issue with rewind().
Comment #30
gnugetAlright, I will start fresh here, I will move my end-to-end test to a follow up, and here I will just test the
counts
value.So, I attach a reroll of #14. and in my next patch I will include the test.
Comment #31
gnugetHi Vasi, you were right, I was overthinking this, This time I was able to write a much simpler patch, thanks for your help!
Tests and the patch.
Comment #32
gnugetComment #34
gnugetI improved a little the comments and renamed the generator class.
Comment #36
mikeryanOK, looks good to me...
Comment #37
heddnJust did a quick run through this again and refreshed myself why the fancy checking is necessary. It is because SourcePluginBase implements \Countable, therefore absolving any modules that extend from it to implement the count() method. But not all iterators are Countable, therefore the need to iterator_count() instead. A source plugin that yields would also be in that same camp.
+1 on RTBC.
Comment #41
mikeryanComment #42
gnugetHere a #34 reroll.
Comment #43
chx CreditAttribution: chx at Integral Vision Ltd commentedBack to RTBC then.
Comment #44
mikeryanNoting here (as chx did in the other issue) this overlaps with #2598670: Source count caching does not work for SQL sources, except it names the new method doCount rather than computeCount. If this patch renamed SqlBase::count() to SqlBase::doCount() the other issue could be closed as a duplicate.
Comment #46
chx CreditAttribution: chx at Integral Vision Ltd commentedComment #47
alexpottCommitted and pushed ae1f599 to 8.3.x and 1d5f0f4 to 8.2.x. Thanks!
Committed to 8.2.x since this is a migrate patch and the module is still experimental.
Fixed on commit.
Comment #51
quietone CreditAttribution: quietone at PreviousNext commentedpublish change record