Problem/Motivation
Large operations dealing with hundreds or thousands of nodes (e.g. a migration of 1000+ nodes, a view that can load 1000+ nodes, etc.) seem to get exponentially slower as more nodes are processed, and all signs indicate more and more time is spent stuck in mergeTags()
, and specifically in array_unique()
calls.
In one case, testing a normal content migration, running the migration on 100-500 nodes takes only a couple minutes (at a rate of ~300/min), whereas processing a full set of 2000+ nodes takes 50+ min (at a rate of ~20/min).
Digging deeper into XHProf reports and using backtraces, it looks like > 90% of the time is spent in array_unique/mergeTags calls that are caused by calls to PathautoPattern->applies()
for each entity, which triggers a seeming exponential increase in contexts that are checked per entity as more and more entities are created/updated.
Proposed resolution
TBD.
Remaining tasks
TBD.
User interface changes
N/A
API changes
N/A
Data model changes
N/A
Original Issue
I'm still doing some investigation, but from all my debugging it looks like something that happens during the course of Pathauto's operations during pathauto_entity_insert
seems to really start bogging down Migrations after a thousand or two items have been imported. During the beginning of a migration, when there are only a few paths in the system/nodes of this type, nodes are imported at a rate of ~300/min. By the end, nodes are importing at a rate of maybe 5-10/min.
I'm attaching a callgraph of a representative migration run (I've been doing a bunch of shorter runs with --limit=100
to test how slow things start getting).
You can see that things get bogged down during calls to array_unique()
, and it looks like the calls stem from Pathauto applying its pattern during node save, which then calls Context::addCacheableDependency
, which calls CacheableMetadata::merge
, then Cache::mergeTags
(which does the array_unique call).
I'm still investigating further on my end—I've found that if I limit the number of items I import at a time, the migration seems to be much faster/more consistent, so I don't want to make any false conclusions yet. Just wanted to document this here and attach the callgraph in case it does turn out to be something we could fix in Pathauto (or with CacheTags).
Comment | File | Size | Author |
---|---|---|---|
#27 | avoid-cacheablity-metadata-merging-2765729-27.patch | 1.68 KB | Berdir |
#25 | avoid-cacheablity-metadata-merging-2765729-25.patch | 1.64 KB | Berdir |
#9 | xhprof-callgraph-slow-pathauto-no-alias.png | 540.4 KB | geerlingguy |
xhprof-callgraph-slow-pathauto.png | 1.97 MB | geerlingguy |
Comments
Comment #2
BerdirLooks like something crazy is going on with merging cache contexts over and over again.
While triggered by pathauto, this might need to be fixed in core. Adding the related core issue.
Comment #3
BerdirComment #4
BerdirComment #5
geerlingguy CreditAttribution: geerlingguy commentedI'm still doing some more debugging, but one quick note: it looks like the memory isn't the issue—it's pretty reliably staying under 60-70 MB peak no matter how many items I import. I'm going to continue doing more digging.
One note—I was asking Wim Leers about the cache tags, and he said it could be that I have certain dev environment settings set in my
local.settings.php
, and that could cause huge slowdowns:So another thing to test in once I'm back into this later.
Comment #6
catchHaven't reviewed the code, but from the description this sounds like it could be related to #2744715: Generated URLs should only have cache contexts on GET/HEAD - i.e. the cacheability safeguards kicking in when they shouldn't be.
Comment #7
geerlingguy CreditAttribution: geerlingguy commentedMore test results:
ASSERT_ACTIVE
=TRUE
.ASSERT_ACTIVE
=FALSE
.Looking at XHProf's log, it seems that
array_unique()
takes up ~36 min of excl. wall time—Drupal\Component\Assertion\Inspector::assertAll()
takes about ~2 min.So it is annoying and soaks up a little extra time, but not much in comparison to the cache tag stuff. I'm going to see if the patch in #2744715: Generated URLs should only have cache contexts on GET/HEAD helps at all.
Comment #8
geerlingguy CreditAttribution: geerlingguy commentedAfter running some more tests with and without #2744715: Generated URLs should only have cache contexts on GET/HEAD, that seems to have not made a major difference. I'm now running a migration without a pathauto pattern configured for this content type to see if things are faster without Pathauto in the loop.
Comment #9
geerlingguy CreditAttribution: geerlingguy commentedThings sped up a bit when I deleted the Pathauto alias—this was a modified migration from the example benchmarked above, so the absolute numbers are different, but:
Note that
array_unique()
was still by far the biggest burden, even without a pathauto pattern configured (excl wall time of 2,226,133,624 microsec / ~37m!). So maybe the fault doesn't lie with Pathauto in particular... but a similar callgraph of Pathauto events was generated (see attached).Comment #10
geerlingguy CreditAttribution: geerlingguy at Acquia commentedIt seems something is really slow with
mergeTags()
and the caching processing in general, when dealing with more than a few entities. Another person emailed me and explicitly called outmergeTags()
as being the hot path/slowest part of a view render that pulled in a couple thousand nodes...I'm beginning to think this may be a core issue and could have a very major impact on anything that deals with > 500 or so nodes at a time :/
Comment #11
wendellw CreditAttribution: wendellw commentedI seem to be experiencing this issue, or an issue related to this, in views. You can experience this issue by going here: http://simplytest.me/ and using devel generate to generate on Drupal 8 about 3500 nodes on a content type (add some extra fields to the content type).
Next, to experience the issue, go into views and set up a view of that content type to output an html table with a trimmed body field, title, extra fields, etc. It will take minutes+ to return.
Testing on our own system, an inordinate amount of time is spent in mergeTags. Disabling caching and it is faster, but still very (very, very) slow.
There may be deeper issues than just mergeTags but that seems like a good place to start as that is where our call table is spending upwards of half its time.
We also tried adding memcache for drupal 8 (which required a patch from an issue in their issue queue) but we did not have a lot of time to experiment. We created our own module to do views-like work for this one narrow use case as we did not have a lot of time to chase this bug down but will monitor this space eagerly. If there is anything specific anyone would like us to do, let us know, but it seems easy to replicate the results on stock drupal + devel generate.
Comment #12
Wim Leers18K or 57K calls to
array_unique()
shouldn't be this slow… unless it's given enormous arrays. Did you look into that already? I think something like this could be helpful:For now, my money is on
\Drupal\Core\Plugin\Context\ContextHandler::applyContextMapping()
. For every call to it,addCacheableDependency()
is being called 4 times. That's a bit suspicious.Rather than just saying "18K or 57K calls is nothing", let's just prove it: https://3v4l.org/X97m4/perf#tabs runs this code:
As you can see, I'm simulating each of the different edge cases 12.5K times, good for a total of 50K
array_unique()
calls. This takes 0.14 seconds!So, clearly, something must be generating an insane amount of cache tags.
Comment #13
mpp CreditAttribution: mpp as a volunteer and at AmeXio commentedHi, see my remarks in https://www.drupal.org/node/2662044.
I'm experiencing similar performance degradation with migratie and pathauto.
Some things I noted:
* Performance degrades gradually
* Disabling pathauto results in significant speed
* Running optimise table improved speed - the indexes in the migrate tables are quite large and may need to be optimized regularly
* Migrating translations using an event subscriber (postprocessing) is faster than using a separate migration script for translations (as is the best practice in Drupal core)
I find it weird that the wall time would be in a static function as I'd suspect a database schema issue somewhere. Or either a false (cache) optimisation (more cache invalidations and write overhead than reads), the latter could explain the incremental degradation and the calls to array_unique.
If I'm on my laptop I'll have a look at the call graphs.
EDIT: seems that array_unique can be quite slow after all.
See http://www.puremango.co.uk/2010/06/fast-php-array_unique-for-removing-du... & http://stackoverflow.com/questions/8321620/array-unique-vs-array-flip.
Comment #15
geerlingguy CreditAttribution: geerlingguy at Acquia commentedJust as an aside (for my future debugging), it looks like up until processing > 1,000 rows at a time, array_unique isn't the main slowdown. Once we get to around 1k-1.5k items, that's when that function starts rising to the top of the Excl. Wall Time chart above items like copy, file_exists, curl_exec, etc.
Comment #16
geerlingguy CreditAttribution: geerlingguy at Acquia commentedI decided to just dump to the screen the full $a and $b arrays that were passed to
mergeTags()
to see what could possibly be so slow... it looks likemergeTags()
is called twice per migrated node, once with a structure like:Then immediately again with both
$a
and$b
containing the same thing, except for$b
also containing the result of the firstarray_unique()
above.So on every successive node import, array_unique is called:
$a
(with 1 item) and an array$b
(with n items)$a
(with n items/same as$b
last time) and an array$b
(with n + 1 items/result from array_unique's 1st call)I've been letting an import run for about 30 minutes, it's up to around 1000 items per array_unique() call now. I'll see if there's some inflection point where things start going insane. Maybe a cache table where the index or column length isn't optimized for such a huge set of cache tags and things get crazy-bogged-down?
[Edit: So... as the migration is going on longer, I'm noticing that instead of having just two calls to array_unique() per node, there are now dozens of calls to mergeTags() per node, with array $a having the single node (
'node:123'
) and $b having all the nodes up to that point.So something in the process is causing mergeTags() to get called dozens (so far... maybe hundreds once we get near the end) of times. And this would make some sense in terms of raw numbers—if we call array_unique on arrays with a couple thousand items a few hundred times per second... that can be a bit slow.]
Comment #17
geerlingguy CreditAttribution: geerlingguy at Acquia commentedA few other notes from the MySQL query log (per node):
So it looks like there's some cache flushing/rebuilding going on during each node update/save, but nothing out of the ordinary, and no slow queries (at least not yet). Still monitoring the process and will debug further.
Comment #18
geerlingguy CreditAttribution: geerlingguy at Acquia commentedAlso, digging around, it seems
Drupal\pathauto\Entity\PathautoPattern::applies
also appears in the call stack a lot, so I'll see if there's any other juicy details that can help there.Comment #19
catch@geerlinguy could you post a backtrace from the mergeTags() calls (Exception::getTraceAsString() probably easiest)? It'd be good to know where the different calls are coming from.
Comment #20
geerlingguy CreditAttribution: geerlingguy at Acquia commented@catch - Will do! Kicked off another run now. Annoying that this only happens if I import 1000+ nodes :P
Comment #21
geerlingguy CreditAttribution: geerlingguy at Acquia commentedSo, almost every single one of the stack traces followed the pattern:
This jives with the tracing I was doing in XHProf—more than 90% of the time is spent in mergeTags() calls that derive from Pathauto's
PathautoPattern->applies()
calls. And I think having some nodes that also add taxonomy terms, etc. might exacerbate the issue. But the craziest thing is, this content type purposely doesn't have a pattern, in case that can help speed things up. So regardless, it looks like Pathauto does some crazy context work on every node, and if you process more than one node at a time, they are all cached into a giant array throughpathauto_entity_update()
.Glancing quickly through Pathauto's code, it's a little too complex for me to see a quick/patchy fix, and I might not get time to work on a fix right now regardless.
But it does seem that this might be the fault of Pathauto, like I originally presumed—I may try uninstalling it entirely and see if that clears up the performance issue (so that I can be 100% sure) for a test run.
Comment #22
geerlingguy CreditAttribution: geerlingguy at Acquia commentedWith Pathauto uninstalled, the entire migration takes ~10 minutes (instead of ~50-60). I'm 99% sure that the problem lies inside that module, as XHProf now doesn't show
array_unique()
ormergeTags()
in the top 500 function calls by excl. wall time.Moving back to the Pathauto queue.
Comment #23
BerdirWell, it's not that easy.
As you can see, pathauto doesn't do anything with cache tags. That happens some levels below that. We just call setContextValue(), which is a method from core, which we have to call.
Pathauto isn't interested in cache tags, this also has nothing to do with queries, those cache tags are in fact never stored anywhere, they are just collected.
I guess what we can do is override \Drupal\Core\Plugin\ContextAwarePluginBase::setContextValue() to avoid collecting any cache tags.
Not sure if core should do something, not sure if something can be done other than micro-optimisations in that method.
Comment #24
Wim LeersAgreed; it's the Context system that needs further scrutiny.
Comment #25
BerdirFor now, this should help pathauto.
Comment #27
BerdirHad a debug left-over in there, also fixed the schema fail, that's in HEAD and I'll likely commit that separatey.
Comment #29
geerlingguy CreditAttribution: geerlingguy at Acquia commentedNitpick ahead:
Extra newline.
I'm testing the patch now.
Comment #30
geerlingguy CreditAttribution: geerlingguy at Acquia commentedSuccess! With this patch, the migration only takes ~10 minutes (same as without Pathauto enabled).
array_unique()
is still in the top four functions, and takes 22s total time, but that's a lot better than before—before patch, average was 60-90 minutes. After patch, average is 9 minutes 30 seconds! (Only did two runs for each... I don't have all the time in the world.)Comment #32
BerdirThanks for testing, committed.
As written, I still think we should create a core issue to investigate this in the context/cache systems and whether we can optimize this. Can you create an issue for that?
Comment #33
Wim Leers+1
Comment #34
geerlingguy CreditAttribution: geerlingguy at Acquia commentedAgreed... but I'm heading into surgery this week, and am wrapping up a lot of things right now as I'm not sure if it'll be a week or longer before I can get back into contributions. I would like to officially pass that torch to someone else (maybe @Wim Leers?) as I'm sure this issue will be far down on my radar once I'm back in the game :)
That said, thanks again for the fix, this is a lifesaver for that large migration!