Problem/Motivation

Caching of source counts is implemented in SourcePluginBase::count(), which obtains (when necessary) the real count using $this->getIterator()->count(). SqlBase implements count() itself as $this->query()->countQuery()->execute()->fetchField(), overriding SourcePluginBase::count() so caching can't happen.

Proposed resolution

If we simply remove SqlBase::count(), will $this->getIterator()->count() get the right count? Even if it does, this is going to be significantly slower than using countQuery().

In the Drupal 7 version of Migrate, the underlying count() implementation which did caching would call the source plugin's computeCount() method when it needed the current real count. I think that's a better model, but it's disruptive at this stage of the game (requiring any source plugins being developed now in contrib/custom contexts to add a computeCount() method). Or... SourcePluginBase could implement a default computeCount() as $this->getIterator()->count(), and SqlBase could override that with $this->query()->countQuery()->execute()->fetchField().

Remaining tasks

None.

User interface changes

N/A

API changes

protected computeCount() method added, which should be overridden instead of public count() for source plugins with custom count implementations.

Data model changes

N/A

Comments

mikeryan created an issue. See original summary.

mikeryan’s picture

Status: Active » Needs review
StatusFileSize
new1.92 KB

Pretty simple, no change necessary to existing source plugins, although any that are implementing count() now and thus not getting the benefit of caching should rename their count() method to computeCount().

Status: Needs review » Needs work

The last submitted patch, 2: source_count_caching-2598670-2.patch, failed testing.

mikeryan’s picture

Well, the testbot tells us nothing useful, but locally I get "Symfony\Component\DependencyInjection\Exception\InvalidArgumentException: The service definition "cache.migrate" does not exist." in MigrateSqlSourceTestCase.php. Which means that by allowing Sql sources to be cached we're exposing that the Sql source tests aren't mocking all they should...

mikeryan’s picture

Status: Needs work » Needs review
StatusFileSize
new2.8 KB
new755 bytes

Status: Needs review » Needs work

The last submitted patch, 5: source_count_caching-2598670-5.patch, failed testing.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

mikeryan’s picture

Issue tags: +Needs reroll
yogeshmpawar’s picture

Status: Needs work » Needs review
StatusFileSize
new2.8 KB

I have rerolled the patch for 8.2.x version.

mikeryan’s picture

Issue tags: -Needs reroll
StatusFileSize
new2.8 KB
new463 bytes

Thanks for the reroll @Yogesh Pawar!

Looking back at this, how did I make SqlBase::computeCount() public? Fixing that...

mikeryan’s picture

I've added a draft change record - this change does not break BC, but it's worth noting to anyone who had been overriding count() that they're now better off overriding computeCount().

mikeryan’s picture

Issue summary: View changes
chx’s picture

Does the other counting issue help here?

mikeryan’s picture

The other issue introduces "doCount()", which is pretty much the same as this issue's "computeCount()", but does not address the original problem here of SqlBase overriding count() and thus losing caching. If that issue changed SqlBase::count() to SqlBase::doCount(), then this one becomes redundant.

FWIW, I do prefer computeCount to doCount, but no biggie.

alexpott’s picture

Status: Needs review » Needs work

#2684567: Requiring a migration w/ a source plugin using a generator fatals has been committed - but it did not rename the method - that should be done here...

mikeryan’s picture

Status: Needs work » Needs review
Issue tags: +Needs tests
Related issues: +#2791119: Write meaningful Migrate source tests
StatusFileSize
new1.99 KB

Here's a reroll using doCount() from #2684567: Requiring a migration w/ a source plugin using a generator fatals. Setting to review just to run it through the testbot as-is - it should explicitly test that SqlBase is caching counts, and I think that test will best be done extending MigrateSqlSourceTestBase from #2791119: Write meaningful Migrate source tests, so I'll postpone once the tests run.

chx’s picture

+   * @return \Traversable
+   *   A traversable object or array

the @return typehint contradicts the text. And I didn't know it could be an array. Didn't we use ArrayIterator for those cases?

-  public function count() {
+  protected function doCount() {
     return $this->query()->countQuery()->execute()->fetchField();

now this needs a good comment.

heddn’s picture

Status: Needs review » Needs work

Based on #18, marking needs work. Also tagging Novice so someone can pick that up and provide the updated docs changes.

imiksu’s picture

Issue tags: +Novice
StatusFileSize
new2.11 KB

I was testing if patch still applies and it had some offsets and fuzz. Updated the patch.

Your branch is up-to-date with 'origin/8.2.x'.
$ curl https://www.drupal.org/files/issues/source_count_caching-2598670-17.patch | patch -p1
patching file core/modules/migrate/src/Plugin/migrate/source/SourcePluginBase.php
Hunk #1 succeeded at 174 (offset 11 lines).
patching file core/modules/migrate/src/Plugin/migrate/source/SqlBase.php
Hunk #1 succeeded at 232 (offset -6 lines).
patching file core/modules/migrate/tests/src/Unit/MigrateSqlSourceTestCase.php
Hunk #1 succeeded at 2 with fuzz 2.
Hunk #2 succeeded at 132 (offset 28 lines).
imiksu’s picture

Issue tags: +DCampBaltics

We'll work on this today.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

mikeryan’s picture

Status: Needs work » Needs review

Retesting.

mikeryan’s picture

Status: Needs review » Needs work
jofitz’s picture

Status: Needs work » Needs review
Issue tags: -Novice
StatusFileSize
new1.24 KB
new2.17 KB

Updated the documentation as recommended in #18.

mikeryan’s picture

Status: Needs review » Needs work

Thanks Jo - setting back to needs work for tests.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

mikelutz’s picture

Version: 8.6.x-dev » 8.7.x-dev
Status: Needs work » Needs review
Issue tags: -Needs tests, -DCampBaltics
StatusFileSize
new5.17 KB

I wrote a test, but I have some concerns on this. A quick glance at the code suggests that we are caching the count keyed by a hash of the source plugin id. But we have migrations that use the same plugin with different configurations to get different counts, like d7_node and d7_node_translations, which both use the d7_node plugin. Is there any chance of cross contamination here? Should we be basing the key off the entire source plugin configuration array?

heddn’s picture

Status: Needs review » Needs work

Yes, there is a chance. We should cache on the full config hash.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

mikeryan’s picture

Status: Needs work » Needs review

Wow, this still isn't in...

As for the cache key, that's out-of-scope here, and we have the `cache_key` option to work around it. If we want to change the default cache_key, that should be a separate issue.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

mikelutz’s picture

Status: Needs review » Postponed

I opened #3092227: SourcePluginBase should cache based on a hash of the source configuration, not just the plugin id. because the bad cache key is out of scope of this issue. At minimum this needs to be postponed on that, as I see no need to open up new buggy caching here. I'm not completely convinced this should even be done. The documentation on count states:

   * @param bool $refresh
   *   (optional) Whether or not to refresh the count. Defaults to FALSE. Not
   *   all implementations support the reset flag. In such instances this
   *   parameter is ignored and the result of calling the method will always be
   *   up to date.

So I expect the lack of implementation here is intentional. I'm not sure how much speed gain we realistically get, a single count query versus a cache lookup, and all the overhead of checking the cache first? Anyway, significant savings or no, I don't mind the change, but we definitely need to fix the cache key first.

mikeryan’s picture

@mikelutz: Oh, this is definitely valuable, a count query can be quite expensive depending on db engine/storage engine/query complexity. On my current project, we have a SQL Server source, over a hundred migrations, and a lot of complex queries with some returning over 600k rows. We use this patch, and drush ms with a warm cache runs in a few seconds, as opposed to several minutes after a cache refresh.

mikelutz’s picture

Alright, I'm sold on the change, let's just fix the default cache key first.

heddn’s picture

Title: Source count caching does not work for SQL sources » [PP-1] Source count caching does not work for SQL sources

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

quietone’s picture

Title: [PP-1] Source count caching does not work for SQL sources » Source count caching does not work for SQL sources
Status: Postponed » Closed (duplicate)
Issue tags: +Bug Smash Initiative