Problem/Motivation
Caching of source counts is implemented in SourcePluginBase::count(), which obtains (when necessary) the real count using $this->getIterator()->count(). SqlBase implements count() itself as $this->query()->countQuery()->execute()->fetchField(), overriding SourcePluginBase::count() so caching can't happen.
Proposed resolution
If we simply remove SqlBase::count(), will $this->getIterator()->count() get the right count? Even if it does, this is going to be significantly slower than using countQuery().
In the Drupal 7 version of Migrate, the underlying count() implementation which did caching would call the source plugin's computeCount() method when it needed the current real count. I think that's a better model, but it's disruptive at this stage of the game (requiring any source plugins being developed now in contrib/custom contexts to add a computeCount() method). Or... SourcePluginBase could implement a default computeCount() as $this->getIterator()->count(), and SqlBase could override that with $this->query()->countQuery()->execute()->fetchField().
Remaining tasks
None.
User interface changes
N/A
API changes
protected computeCount() method added, which should be overridden instead of public count() for source plugins with custom count implementations.
Data model changes
N/A
| Comment | File | Size | Author |
|---|---|---|---|
| #30 | 2598670-30.drupal.Source-count-caching-does-not-work-for-SQL-sources.patch | 5.17 KB | mikelutz |
| #25 | source_count_caching-2598670-25.patch | 2.17 KB | jofitz |
| #25 | interdiff-20-25.txt | 1.24 KB | jofitz |
| #20 | source_count_caching-2598670-20.patch | 2.11 KB | imiksu |
| #17 | source_count_caching-2598670-17.patch | 1.99 KB | mikeryan |
Comments
Comment #2
mikeryanPretty simple, no change necessary to existing source plugins, although any that are implementing count() now and thus not getting the benefit of caching should rename their count() method to computeCount().
Comment #4
mikeryanWell, the testbot tells us nothing useful, but locally I get "Symfony\Component\DependencyInjection\Exception\InvalidArgumentException: The service definition "cache.migrate" does not exist." in MigrateSqlSourceTestCase.php. Which means that by allowing Sql sources to be cached we're exposing that the Sql source tests aren't mocking all they should...
Comment #5
mikeryanComment #9
mikeryanComment #10
yogeshmpawarI have rerolled the patch for 8.2.x version.
Comment #11
mikeryanThanks for the reroll @Yogesh Pawar!
Looking back at this, how did I make SqlBase::computeCount() public? Fixing that...
Comment #12
mikeryanI've added a draft change record - this change does not break BC, but it's worth noting to anyone who had been overriding count() that they're now better off overriding computeCount().
Comment #13
mikeryanComment #14
chx commentedDoes the other counting issue help here?
Comment #15
mikeryanThe other issue introduces "doCount()", which is pretty much the same as this issue's "computeCount()", but does not address the original problem here of SqlBase overriding count() and thus losing caching. If that issue changed SqlBase::count() to SqlBase::doCount(), then this one becomes redundant.
FWIW, I do prefer computeCount to doCount, but no biggie.
Comment #16
alexpott#2684567: Requiring a migration w/ a source plugin using a generator fatals has been committed - but it did not rename the method - that should be done here...
Comment #17
mikeryanHere's a reroll using doCount() from #2684567: Requiring a migration w/ a source plugin using a generator fatals. Setting to review just to run it through the testbot as-is - it should explicitly test that SqlBase is caching counts, and I think that test will best be done extending MigrateSqlSourceTestBase from #2791119: Write meaningful Migrate source tests, so I'll postpone once the tests run.
Comment #18
chx commentedthe @return typehint contradicts the text. And I didn't know it could be an array. Didn't we use ArrayIterator for those cases?
now this needs a good comment.
Comment #19
heddnBased on #18, marking needs work. Also tagging Novice so someone can pick that up and provide the updated docs changes.
Comment #20
imiksuI was testing if patch still applies and it had some offsets and fuzz. Updated the patch.
Comment #21
imiksuWe'll work on this today.
Comment #23
mikeryanRetesting.
Comment #24
mikeryanComment #25
jofitzUpdated the documentation as recommended in #18.
Comment #26
mikeryanThanks Jo - setting back to needs work for tests.
Comment #30
mikelutzI wrote a test, but I have some concerns on this. A quick glance at the code suggests that we are caching the count keyed by a hash of the source plugin id. But we have migrations that use the same plugin with different configurations to get different counts, like d7_node and d7_node_translations, which both use the d7_node plugin. Is there any chance of cross contamination here? Should we be basing the key off the entire source plugin configuration array?
Comment #31
heddnYes, there is a chance. We should cache on the full config hash.
Comment #33
mikeryanWow, this still isn't in...
As for the cache key, that's out-of-scope here, and we have the `cache_key` option to work around it. If we want to change the default cache_key, that should be a separate issue.
Comment #35
mikelutzI opened #3092227: SourcePluginBase should cache based on a hash of the source configuration, not just the plugin id. because the bad cache key is out of scope of this issue. At minimum this needs to be postponed on that, as I see no need to open up new buggy caching here. I'm not completely convinced this should even be done. The documentation on count states:
So I expect the lack of implementation here is intentional. I'm not sure how much speed gain we realistically get, a single count query versus a cache lookup, and all the overhead of checking the cache first? Anyway, significant savings or no, I don't mind the change, but we definitely need to fix the cache key first.
Comment #36
mikeryan@mikelutz: Oh, this is definitely valuable, a count query can be quite expensive depending on db engine/storage engine/query complexity. On my current project, we have a SQL Server source, over a hundred migrations, and a lot of complex queries with some returning over 600k rows. We use this patch, and
drush mswith a warm cache runs in a few seconds, as opposed to several minutes after a cache refresh.Comment #37
mikelutzAlright, I'm sold on the change, let's just fix the default cache key first.
Comment #38
heddnComment #42
quietone commentedLooks like this was fixed in a later issue, #3190815: Source count caching broken: impossible to enable source count caching for SqlBase-based source plugins (plus, unneeded cache I/O).