Problem/Motivation
loadEntityByUuid() is really slow as entity queries are slow. It's a query builder API on top of another query builder, involving all kinds of services and hooks. Right now, we even run access alters on it, see #2878483: loadEntityByUuid() should skip access checks.
For most entity types, that's unfortunate, but not a huge deal. Except block_content. Because every placed block on a page results in an uncached loadEntityByUuid() (see #2479459: BlockRepository::getVisibleBlocksPerRegion() does an uncached entity query on every request) as part of access checking. So that means even on pages that a block is not actually shown.
Placing a lot of block_content blocks for a very specific purpose on specific pages is a pretty common thing to do to solve all kinds of problems, especially by not so experienced people. On one of the sites I'm currently looking at, I'm counting 10 such blocks. That's 10 entity queries on *every* page.
Proposed resolution
Luckily, there's also a pretty limited amount of block_content entities on sites, at least not as much as storing a list of them all by UUID is a problem (we load them *all* completely to create block plugin derivates for examples.).
So what we can do is add a cache collector implementation that loads them in a single cache get.
Curently implementing this implicitly through a new entity storage handlers for block_content that overrides loadbyProperties(). Other approaches would be possible as well, including calling that service directly from the plugin as well as thinking about more generic solutions.
Remaining tasks
* Do we need to worry about *changing* UUID's? Is that something we support for existing entities?
* Tests (what exactly? unit tests of the storage? we have plenty of indirect integration tests)
User interface changes
API changes
Data model changes
Comment | File | Size | Author |
---|---|---|---|
#16 | block_content-load-by-uuid-cache-2878673-16-interdiff.txt | 589 bytes | Berdir |
#16 | block_content-load-by-uuid-cache-2878673-16.patch | 8.44 KB | Berdir |
#12 | block_content-load-by-uuid-cache-2878673-12.patch | 8.42 KB | Berdir |
#7 | Selection_153.png | 9.5 KB | Berdir |
#5 | block_content-load-by-uuid-cache-2878673-5-interdiff.txt | 488 bytes | Berdir |
Comments
Comment #2
BerdirComment #3
larowlanLooks good, makes sense - few observations
, pretty sure access control handlers deny updates to UUID field, so we're good there.
derivatives?
Is it worth instead using getAggregateQuery here and getting the whole uuid => id map in one query?
c/p error
Missing param comment
nit ===
Comment #4
Berdir1. Yeah, looks like derivates kind of exists but is very seldomly used (https://forum.wordreference.com/threads/derivative-derivate.858795/). Might be a german thing that it sounds ok in my head :)
2. We could, but we'd also load UUID's that we never need and it wouldn't be a cache collector anymore, just a service that caches.
3./4. Every time I have to do this nonsense I think about opening a policy issue to make docblocks on constructors optional :)
5. I have different opinions on strict type checks than most core developers, but sure, can change that ;) *
I also forgot to actually specify the storage, so the patch above didn't do anything :)
Interdiff created with interdiff, didn't have the previous changes committed.
* I'm pretty sure I have seen more bugs in core *because* of type safe checks than the other way round :)
Comment #5
BerdirAnd now I apparently forgot how to correctly implement a resolveCacheMiss() method, it's too late to do performance optimizations :)
Comment #6
larowlan5: But then there are things like https://blog.sucuri.net/2017/02/content-injection-vulnerability-wordpres... ;)
Comment #7
BerdirOk, based on a quick test on an actual site with 6 placed blocks for block_content entities, I'm saving about 5.7ms, spending 1.2 instead of 7ms in loadEntityByUuid().
Not a huge amount but also saves a bit of memory and that's happening on every request that's not a dynamic page cache hit.
I wouldn't be surprised to see sites with a few dozen block_content placements, I also have a case where we have a frontpage that consists mostly of block_content blocks in page_manager, that should also benefit from this. I'd expect the time to grow more or less linear with more blocks, will do some more tests later.
Comment #8
dawehnerThis change reminds me about #2690747: [PLAN] Create an index of UUIDs
You seem to have accidentally stopped in the middle of the
Should we check whether we get something as result back?
Comment #9
BerdirYeah, #2690747: [PLAN] Create an index of UUIDs is related, but this is still a useful improvement IMHO as having the index/table would still result in a query per UUID but this is a single cache get. But we can possibly remove the @todo that I added. Wondering if we'd suddenly stop calling loadByProperties() for an uuid load with that issue, though, maybe we should switch to an approach that explicitly uses the lookup in the block plugin.
2. I was thinking about caching misses as well on purpose, but thinking about it again, I don't think that's needed and might cause problems as we would need to invalidate when new entities are created. Will remove.
Comment #11
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedThis seems to be very close, here's a few nitpicks that I found:
Ids -> IDs ;)
Wraps over 80 cols.
Should we use
\Drupal\Component\Uuid\Uuid::isValid()
instead of the is_string() check? Or it doesn't matter in this case?Comment #12
BerdirUpdated. Refactored now to make it a specific service that we use explicitly in the block content plugin. I think that makes the API easier and is also less risky to do something unexpected.
interdiff is twice as big as the patch, so not included :)
Comment #13
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedLooks very clean and simple!
Comment #14
edurenye CreditAttribution: edurenye at ENDPHASYS Technologies commentedRTBC +1
Looks great!
Can it be added to d8.4?
Comment #15
catchFor backport it'd be good to explicitly mark the cache collector as @internal and the service as private in services.yml
Comment #16
Berdir@catch: Making it @internal makes sense but private services can't be used with ContainerFactoryPluginInterface. And even if we'd use it in an actual service, that would mean that each service who would use it would receive its own instance, would would likely cause the weirdest issues with a cache collector service.
Comment #17
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented#16 makes sense, back to rtbc :)
Comment #20
catchCommitted/pushed to 8.5.x and cherry-picked to 8.4.x. Thanks!
Comment #21
Wim LeersHurray! This should help #2479459 — see #2479459-95: BlockRepository::getVisibleBlocksPerRegion() does an uncached entity query on every request.