Problem/Motivation

Large operations dealing with hundreds or thousands of nodes (e.g. a migration of 1000+ nodes, a view that can load 1000+ nodes, etc.) seem to get exponentially slower as more nodes are processed, and all signs indicate more and more time is spent stuck in mergeTags(), and specifically in array_unique() calls.

In one case, testing a normal content migration, running the migration on 100-500 nodes takes only a couple minutes (at a rate of ~300/min), whereas processing a full set of 2000+ nodes takes 50+ min (at a rate of ~20/min).

Digging deeper into XHProf reports and using backtraces, it looks like > 90% of the time is spent in array_unique/mergeTags calls that are caused by calls to PathautoPattern->applies() for each entity, which triggers a seeming exponential increase in contexts that are checked per entity as more and more entities are created/updated.

Proposed resolution

TBD.

Remaining tasks

TBD.

User interface changes

N/A

API changes

N/A

Data model changes

N/A

Original Issue

I'm still doing some investigation, but from all my debugging it looks like something that happens during the course of Pathauto's operations during pathauto_entity_insert seems to really start bogging down Migrations after a thousand or two items have been imported. During the beginning of a migration, when there are only a few paths in the system/nodes of this type, nodes are imported at a rate of ~300/min. By the end, nodes are importing at a rate of maybe 5-10/min.

I'm attaching a callgraph of a representative migration run (I've been doing a bunch of shorter runs with --limit=100 to test how slow things start getting).

You can see that things get bogged down during calls to array_unique(), and it looks like the calls stem from Pathauto applying its pattern during node save, which then calls Context::addCacheableDependency, which calls CacheableMetadata::merge, then Cache::mergeTags (which does the array_unique call).

I'm still investigating further on my end—I've found that if I limit the number of items I import at a time, the migration seems to be much faster/more consistent, so I don't want to make any false conclusions yet. Just wanted to document this here and attach the callgraph in case it does turn out to be something we could fix in Pathauto (or with CacheTags).

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

geerlingguy created an issue. See original summary.

Berdir’s picture

Looks like something crazy is going on with merging cache contexts over and over again.

While triggered by pathauto, this might need to be fixed in core. Adding the related core issue.

Berdir’s picture

geerlingguy’s picture

I'm still doing some more debugging, but one quick note: it looks like the memory isn't the issue—it's pretty reliably staying under 60-70 MB peak no matter how many items I import. I'm going to continue doing more digging.

One note—I was asking Wim Leers about the cache tags, and he said it could be that I have certain dev environment settings set in my local.settings.php, and that could cause huge slowdowns:

- that's because you're running this with assertions enabled
- You've got settings.local.php enabled, *and* in there you've got assertions enabled (which is the default, with BLT)
- in other words, you're in developer mode
- follow the arrows in opposite direction: array_unique() is called by Cache::mergeTags(), which is called by assert()
- it's great to have assertions on while developing/testing, not while you are running actual migrations

So another thing to test in once I'm back into this later.

catch’s picture

Haven't reviewed the code, but from the description this sounds like it could be related to #2744715: Generated URLs should only have cache contexts on GET/HEAD - i.e. the cacheability safeguards kicking in when they shouldn't be.

geerlingguy’s picture

More test results:

  • ~41 minutes with default dev environment configuration (ASSERT_ACTIVE = TRUE.
  • ~30 minutes with ASSERT_ACTIVE = FALSE.

Looking at XHProf's log, it seems that array_unique() takes up ~36 min of excl. wall time—Drupal\Component\Assertion\Inspector::assertAll() takes about ~2 min.

So it is annoying and soaks up a little extra time, but not much in comparison to the cache tag stuff. I'm going to see if the patch in #2744715: Generated URLs should only have cache contexts on GET/HEAD helps at all.

geerlingguy’s picture

After running some more tests with and without #2744715: Generated URLs should only have cache contexts on GET/HEAD, that seems to have not made a major difference. I'm now running a migration without a pathauto pattern configured for this content type to see if things are faster without Pathauto in the loop.

geerlingguy’s picture

Things sped up a bit when I deleted the Pathauto alias—this was a modified migration from the example benchmarked above, so the absolute numbers are different, but:

  • With pathauto alias: ~72m
  • Without pathauto alias: ~56m

Note that array_unique() was still by far the biggest burden, even without a pathauto pattern configured (excl wall time of 2,226,133,624 microsec / ~37m!). So maybe the fault doesn't lie with Pathauto in particular... but a similar callgraph of Pathauto events was generated (see attached).

geerlingguy’s picture

Title: Pathauto path generation gets exponentially slower during content migration » mergeTags() exponentially slows down operations with large numbers of nodes
Project: Pathauto » Drupal core
Version: 8.x-1.x-dev » 8.2.x-dev
Component: Code » cache system
Category: Support request » Bug report
Issue summary: View changes

It seems something is really slow with mergeTags() and the caching processing in general, when dealing with more than a few entities. Another person emailed me and explicitly called out mergeTags() as being the hot path/slowest part of a view render that pulled in a couple thousand nodes...

I'm beginning to think this may be a core issue and could have a very major impact on anything that deals with > 500 or so nodes at a time :/

wendellw’s picture

I seem to be experiencing this issue, or an issue related to this, in views. You can experience this issue by going here: http://simplytest.me/ and using devel generate to generate on Drupal 8 about 3500 nodes on a content type (add some extra fields to the content type).

Next, to experience the issue, go into views and set up a view of that content type to output an html table with a trimmed body field, title, extra fields, etc. It will take minutes+ to return.

Testing on our own system, an inordinate amount of time is spent in mergeTags. Disabling caching and it is faster, but still very (very, very) slow.

There may be deeper issues than just mergeTags but that seems like a good place to start as that is where our call table is spending upwards of half its time.

We also tried adding memcache for drupal 8 (which required a patch from an issue in their issue queue) but we did not have a lot of time to experiment. We created our own module to do views-like work for this one narrow use case as we did not have a lot of time to chase this bug down but will monitor this space eagerly. If there is anything specific anyone would like us to do, let us know, but it seems easy to replicate the results on stock drupal + devel generate.

Wim Leers’s picture

18K or 57K calls to array_unique() shouldn't be this slow… unless it's given enormous arrays. Did you look into that already? I think something like this could be helpful:

diff --git a/core/lib/Drupal/Core/Cache/Cache.php b/core/lib/Drupal/Core/Cache/Cache.php
index 2d0c220..5ed7944 100644
--- a/core/lib/Drupal/Core/Cache/Cache.php
+++ b/core/lib/Drupal/Core/Cache/Cache.php
@@ -54,6 +54,7 @@ public static function mergeContexts(array $a = [], array $b = []) {
    *   The merged array of cache tags.
    */
   public static function mergeTags(array $a = [], array $b = []) {
+    \Drupal::logger('cache_tags_perf_debug')->debug(count($a) . ',' . count($b));
     assert('\Drupal\Component\Assertion\Inspector::assertAllStrings($a) && \Drupal\Component\Assertion\Inspector::assertAllStrings($b)', 'Cache tags must be valid strings');
 
     $cache_tags = array_unique(array_merge($a, $b));

For now, my money is on \Drupal\Core\Plugin\Context\ContextHandler::applyContextMapping(). For every call to it, addCacheableDependency() is being called 4 times. That's a bit suspicious.


Rather than just saying "18K or 57K calls is nothing", let's just prove it: https://3v4l.org/X97m4/perf#tabs runs this code:

<?php

$tag_merge_count = 50000;

$a = [];
$b = ['foo', 'bar', 'baz'];
$c = ['llamas', 'are', 'awesome'];

$iterations = $tag_merge_count/4;
for ($i = 0; $i < $iterations; $i++) {
    array_unique(array_merge($a, $b)); // empty + non-empty
    array_unique(array_merge($b, $a)); // non-empty + empty
    array_unique(array_merge($b, $b)); // non-empty + non-empty, with 100% collision
    array_unique(array_merge($b, $c)); // non-empty + non-empty, with 0% collision
}

As you can see, I'm simulating each of the different edge cases 12.5K times, good for a total of 50K array_unique() calls. This takes 0.14 seconds!

So, clearly, something must be generating an insane amount of cache tags.

mpp’s picture

Hi, see my remarks in https://www.drupal.org/node/2662044.
I'm experiencing similar performance degradation with migratie and pathauto.

Some things I noted:
* Performance degrades gradually
* Disabling pathauto results in significant speed
* Running optimise table improved speed - the indexes in the migrate tables are quite large and may need to be optimized regularly
* Migrating translations using an event subscriber (postprocessing) is faster than using a separate migration script for translations (as is the best practice in Drupal core)

I find it weird that the wall time would be in a static function as I'd suspect a database schema issue somewhere. Or either a false (cache) optimisation (more cache invalidations and write overhead than reads), the latter could explain the incremental degradation and the calls to array_unique.

If I'm on my laptop I'll have a look at the call graphs.

EDIT: seems that array_unique can be quite slow after all.
See http://www.puremango.co.uk/2010/06/fast-php-array_unique-for-removing-du... & http://stackoverflow.com/questions/8321620/array-unique-vs-array-flip.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

geerlingguy’s picture

Just as an aside (for my future debugging), it looks like up until processing > 1,000 rows at a time, array_unique isn't the main slowdown. Once we get to around 1k-1.5k items, that's when that function starts rising to the top of the Excl. Wall Time chart above items like copy, file_exists, curl_exec, etc.

geerlingguy’s picture

I decided to just dump to the screen the full $a and $b arrays that were passed to mergeTags() to see what could possibly be so slow... it looks like mergeTags() is called twice per migrated node, once with a structure like:

$a = ['node:123'];
$b = [
  'node:1',
  'node:2',
  'node:3',
  'node:4',
  'node:5',
  etc...
];

Then immediately again with both $a and $b containing the same thing, except for $b also containing the result of the first array_unique() above.

So on every successive node import, array_unique is called:

  1. With an array $a (with 1 item) and an array $b (with n items)
  2. With an array $a (with n items/same as $b last time) and an array $b (with n + 1 items/result from array_unique's 1st call)
  3. (See edit below...)

I've been letting an import run for about 30 minutes, it's up to around 1000 items per array_unique() call now. I'll see if there's some inflection point where things start going insane. Maybe a cache table where the index or column length isn't optimized for such a huge set of cache tags and things get crazy-bogged-down?

[Edit: So... as the migration is going on longer, I'm noticing that instead of having just two calls to array_unique() per node, there are now dozens of calls to mergeTags() per node, with array $a having the single node ('node:123') and $b having all the nodes up to that point.

So something in the process is causing mergeTags() to get called dozens (so far... maybe hundreds once we get near the end) of times. And this would make some sense in terms of raw numbers—if we call array_unique on arrays with a couple thousand items a few hundred times per second... that can be a bit slow.]

geerlingguy’s picture

A few other notes from the MySQL query log (per node):

FROM 
cachetags cachetags
WHERE ( (tag = 'node:7011') )
      288 Query UPDATE cachetags SET invalidations=invalidations + 1

FROM 
key_value key_value
WHERE ( (name = '7011') AND (collection = 'pathauto_state.node') )
      288 Query UPDATE key_value SET value='i:1;'

		  288 Query	INSERT INTO cache_entity (cid, expire, created, tags, checksum, data, serialized) VALUES ('values:node:20247', '-1', '1471459323.972', 'entity_field_info node_values', '976', 'O:23:\"Drupal\\node\\Entity\\Node\ [...]

		  288 Query	DELETE FROM cache_entity 
WHERE  (cid IN  ('values:node:20247'))
		  288 Query	SELECT 1 AS expression
FROM 
key_value key_value
WHERE ( (name = '20247') AND (collection = 'pathauto_state.node') )
		  288 Query	INSERT INTO key_value (name, collection, value) VALUES ('20247', 'pathauto_state.node', 'i:1;')

So it looks like there's some cache flushing/rebuilding going on during each node update/save, but nothing out of the ordinary, and no slow queries (at least not yet). Still monitoring the process and will debug further.

geerlingguy’s picture

Also, digging around, it seems Drupal\pathauto\Entity\PathautoPattern::applies also appears in the call stack a lot, so I'll see if there's any other juicy details that can help there.

catch’s picture

@geerlinguy could you post a backtrace from the mergeTags() calls (Exception::getTraceAsString() probably easiest)? It'd be good to know where the different calls are coming from.

geerlingguy’s picture

@catch - Will do! Kicked off another run now. Annoying that this only happens if I import 1000+ nodes :P

geerlingguy’s picture

So, almost every single one of the stack traces followed the pattern:

Aug 17 23:26:52 local drupal: http://local.example.com|1471476412|cache_tags|127.0.0.1|http://local.example.com/||0||#0 /var/www/site/docroot/core/lib/Drupal/Core/Cache/CacheableMetadata.php(114): Drupal\Core\Cache\Cache::mergeTags(Array, Array)
#1 /var/www/site/docroot/core/lib/Drupal/Core/Plugin/Context/Context.php(145): Drupal\Core\Cache\CacheableMetadata->merge(Object(Drupal\Core\Cache\CacheableMetadata))
#2 /var/www/site/docroot/core/lib/Drupal/Core/Plugin/Context/Context.php(175): Drupal\Core\Plugin\Context\Context->addCacheableDependency(Object(Drupal\Core\Plugin\Context\Context))
#3 /var/www/site/docroot/core/lib/Drupal/Core/Plugin/ContextAwarePluginBase.php(73): Drupal\Core\Plugin\Context\Context::createFromContext(Object(Drupal\Core\Plugin\Context\Context), Object(Drupal\node\Entity\Node))
#4 /var/www/site/docroot/modules/contrib/pathauto/src/Entity/PathautoPattern.php(346): Drupal\Core\Plugin\ContextAwarePluginBase->setContextValue('node', Object(Drupal\node\Entity\Node))
#5 /var/www/site/docroot/modules/contrib/pathauto/src/PathautoGenerator.php(270): Drupal\pathauto\Entity\PathautoPattern->applies(Object(Drupal\node\Entity\Node))
#6 /var/www/site/docroot/modules/contrib/pathauto/src/PathautoGenerator.php(310): Drupal\pathauto\PathautoGenerator->getPatternByEntity(Object(Drupal\node\Entity\Node))
#7 /var/www/site/docroot/modules/contrib/pathauto/pathauto.module(110): Drupal\pathauto\PathautoGenerator->updateEntityAlias(Object(Drupal\node\Entity\Node), 'update')
#8 [internal function]: pathauto_entity_update(Object(Drupal\node\Entity\Node))
#9 /var/www/site/docroot/core/lib/Drupal/Core/Extension/ModuleHandler.php(402): call_user_func_array('pathauto_entity...', Array)
#10 /var/www/site/docroot/core/lib/Drupal/Core/Entity/EntityStorageBase.php(169): Drupal\Core\Extension\ModuleHandler->invokeAll('entity_update', Array)
#11 /var/www/site/docroot/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php(418): Drupal\Core\Entity\EntityStorageBase->invokeHook('update', Object(Drupal\node\Entity\Node))
#12 /var/www/site/docroot/core/lib/Drupal/Core/Entity/EntityStorageBase.php(470): Drupal\Core\Entity\ContentEntityStorageBase->invokeHook('update', Object(Drupal\node\Entity\Node))
#13 /var/www/site/docroot/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php(304): Drupal\Core\Entity\EntityStorageBase->doPostSave(Object(Drupal\node\Entity\Node), true)
#14 /var/www/site/docroot/core/lib/Drupal/Core/Entity/EntityStorageBase.php(395): Drupal\Core\Entity\ContentEntityStorageBase->doPostSave(Object(Drupal\node\Entity\Node), true)
#15 /var/www/site/docroot/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php(761): Drupal\Core\Entity\EntityStorageBase->save(Object(Drupal\node\Entity\Node))
#16 /var/www/site/docroot/core/lib/Drupal/Core/Entity/Entity.php(364): Drupal\Core\Entity\Sql\SqlContentEntityStorage->save(Object(Drupal\node\Entity\Node))
#17 /var/www/site/docroot/core/modules/migrate/src/Plugin/migrate/destination/EntityContentBase.php(109): Drupal\Core\Entity\Entity->save()
#18 /var/www/site/docroot/core/modules/migrate/src/Plugin/migrate/destination/EntityContentBase.php(90): Drupal\migrate\Plugin\migrate\destination\EntityContentBase->save(Object(Drupal\node\Entity\Node), Array)
#19 /var/www/site/docroot/core/modules/migrate/src/MigrateExecutable.php(233): Drupal\migrate\Plugin\migrate\destination\EntityContentBase->import(Object(Drupal\migrate\Row), Array)
#20 [internal function]: Drupal\migrate\MigrateExecutable->import()
#21 /var/www/site/vendor/drush/drush/includes/drush.inc(720): call_user_func_array(Array, Array)
#22 /var/www/site/vendor/drush/drush/includes/drush.inc(711): drush_call_user_func_array(Array, Array)
#23 /var/www/site/docroot/modules/contrib/migrate_tools/migrate_tools.drush.inc(280): drush_op(Array)
#24 [internal function]: _drush_migrate_tools_execute_migration(Object(Drupal\migrate\Plugin\Migration), 'doctor', Array)
#25 /var/www/site/docroot/modules/contrib/migrate_tools/migrate_tools.drush.inc(238): array_walk(Array, '_drush_migrate_...', Array)
#26 [internal function]: drush_migrate_tools_migrate_import('doctor')
#27 /var/www/site/vendor/drush/drush/includes/command.inc(373): call_user_func_array('drush_migrate_t...', Array)
#28 /var/www/site/vendor/drush/drush/includes/command.inc(224): _drush_invoke_hooks(Array, Array)
#29 [internal function]: drush_command('doctor')
#30 /var/www/site/vendor/drush/drush/includes/command.inc(192): call_user_func_array('drush_command', Array)
#31 /var/www/site/vendor/drush/drush/lib/Drush/Boot/BaseBoot.php(67): drush_dispatch(Array)
#32 /var/www/site/vendor/drush/drush/includes/preflight.inc(66): Drush\Boot\BaseBoot->bootstrap_and_dispatch()
#33 /var/www/site/vendor/drush/drush/drush.php(12): drush_main()
#34 @main

This jives with the tracing I was doing in XHProf—more than 90% of the time is spent in mergeTags() calls that derive from Pathauto's PathautoPattern->applies() calls. And I think having some nodes that also add taxonomy terms, etc. might exacerbate the issue. But the craziest thing is, this content type purposely doesn't have a pattern, in case that can help speed things up. So regardless, it looks like Pathauto does some crazy context work on every node, and if you process more than one node at a time, they are all cached into a giant array through pathauto_entity_update().

Glancing quickly through Pathauto's code, it's a little too complex for me to see a quick/patchy fix, and I might not get time to work on a fix right now regardless.

But it does seem that this might be the fault of Pathauto, like I originally presumed—I may try uninstalling it entirely and see if that clears up the performance issue (so that I can be 100% sure) for a test run.

geerlingguy’s picture

Title: mergeTags() exponentially slows down operations with large numbers of nodes » PathautoPattern->applies() exponentially slows down operations with large numbers of nodes
Project: Drupal core » Pathauto
Version: 8.3.x-dev » 8.x-1.x-dev
Component: cache system » Code
Issue summary: View changes

With Pathauto uninstalled, the entire migration takes ~10 minutes (instead of ~50-60). I'm 99% sure that the problem lies inside that module, as XHProf now doesn't show array_unique() or mergeTags() in the top 500 function calls by excl. wall time.

Moving back to the Pathauto queue.

Berdir’s picture

Well, it's not that easy.

As you can see, pathauto doesn't do anything with cache tags. That happens some levels below that. We just call setContextValue(), which is a method from core, which we have to call.

Pathauto isn't interested in cache tags, this also has nothing to do with queries, those cache tags are in fact never stored anywhere, they are just collected.

I guess what we can do is override \Drupal\Core\Plugin\ContextAwarePluginBase::setContextValue() to avoid collecting any cache tags.

Not sure if core should do something, not sure if something can be done other than micro-optimisations in that method.

Wim Leers’s picture

Agreed; it's the Context system that needs further scrutiny.

Berdir’s picture

For now, this should help pathauto.

Status: Needs review » Needs work

The last submitted patch, 25: avoid-cacheablity-metadata-merging-2765729-25.patch, failed testing.

Berdir’s picture

Had a debug left-over in there, also fixed the schema fail, that's in HEAD and I'll likely commit that separatey.

Status: Needs review » Needs work

The last submitted patch, 27: avoid-cacheablity-metadata-merging-2765729-27.patch, failed testing.

geerlingguy’s picture

Nitpick ahead:

+++ b/src/Plugin/pathauto/AliasType/EntityAliasTypeBase.php
@@ -213,4 +214,15 @@ class EntityAliasTypeBase extends ContextAwarePluginBase implements AliasTypeInt
+

Extra newline.

I'm testing the patch now.

geerlingguy’s picture

Success! With this patch, the migration only takes ~10 minutes (same as without Pathauto enabled).

array_unique() is still in the top four functions, and takes 22s total time, but that's a lot better than before—before patch, average was 60-90 minutes. After patch, average is 9 minutes 30 seconds! (Only did two runs for each... I don't have all the time in the world.)

  • Berdir committed 445624c on 8.x-1.x
    Issue #2765729 by Berdir, geerlingguy: PathautoPattern->applies()...
Berdir’s picture

Status: Needs work » Fixed

Thanks for testing, committed.

As written, I still think we should create a core issue to investigate this in the context/cache systems and whether we can optimize this. Can you create an issue for that?

Wim Leers’s picture

As written, I still think we should create a core issue to investigate this in the context/cache systems and whether we can optimize this.

+1

geerlingguy’s picture

As written, I still think we should create a core issue to investigate this in the context/cache systems and whether we can optimize this. Can you create an issue for that?

Agreed... but I'm heading into surgery this week, and am wrapping up a lot of things right now as I'm not sure if it'll be a week or longer before I can get back into contributions. I would like to officially pass that torch to someone else (maybe @Wim Leers?) as I'm sure this issue will be far down on my radar once I'm back in the game :)

That said, thanks again for the fix, this is a lifesaver for that large migration!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

The last submitted patch, 25: avoid-cacheablity-metadata-merging-2765729-25.patch, failed testing.