Problem/Motivation
If you didn't do the extra work to add fields as Base fields (who does on most projects?) then it results in N number of select queries to retrieve the data. Even when these fields have a cardinality of 1.
SqlContentEntityStorage::loadFromDedicatedTables
I was building a Performance test for a client project and it immediately jumped out at me that all the fields on the user were adding their own dedicated select query. In my case, this was on a user account. But it could just easily be on a node or some other entity. In many, many cases those fields are all single cardinality. So instead of N number of select queries, we could do a single (albeit potentially nasty) JOIN and retrieve all the field data for those in a single query instead of round tripping in a foreach for each of the fields.
Steps to reproduce
The SQL looks like:
'SELECT "t".* FROM "user__field_field1 "t" WHERE ("entity_id" IN (2)) AND ("deleted" = 0) AND ("langcode" IN ("en", "und", "zxx")) ORDER BY "delta" ASC'
'SELECT "t".* FROM "user__field_field2" "t" WHERE ("entity_id" IN (2)) AND ("deleted" = 0) AND ("langcode" IN ("en", "und", "zxx")) ORDER BY "delta" ASC'
'SELECT "t".* FROM "user__field_field3" "t" WHERE ("entity_id" IN (2)) AND ("deleted" = 0) AND ("langcode" IN ("en", "und", "zxx")) ORDER BY "delta" ASC'
'SELECT "t".* FROM "user__field_field4" "t" WHERE ("entity_id" IN (2)) AND ("deleted" = 0) AND ("langcode" IN ("en", "und", "zxx")) ORDER BY "delta" ASC'
'SELECT "t".* FROM "user__field_field5" "t" WHERE ("entity_id" IN (2)) AND ("deleted" = 0) AND ("langcode" IN ("en", "und", "zxx")) ORDER BY "delta" ASC'
'SELECT "t".* FROM "user__field_field6" "t" WHERE ("entity_id" IN (2)) AND ("deleted" = 0) AND ("langcode" IN ("en", "und", "zxx")) ORDER BY "delta" ASC'
The resulting UNION SQL would look something like the following if we needed to handle ALL the things. But if we only looked at cardinality 1 fields, scroll lower for a cleaner approach.
SELECT
'field1' as field_name,
t1.*
FROM "user__field_field1" t1
WHERE t1."entity_id" IN (2)
AND t1."deleted" = 0
AND t1."langcode" IN ('en', 'und', 'zxx')
UNION ALL
SELECT
'field2' as field_name,
t2.*
FROM "user__field_field2" t2
WHERE t2."entity_id" IN (2)
AND t2."deleted" = 0
AND t2."langcode" IN ('en', 'und', 'zxx')
UNION ALL
SELECT
'field3' as field_name,
t3.*
FROM "user__field_field3" t3
WHERE t3."entity_id" IN (2)
AND t3."deleted" = 0
AND t3."langcode" IN ('en', 'und', 'zxx')
UNION ALL
SELECT
'field4' as field_name,
t4.*
FROM "user__field_field4" t4
WHERE t4."entity_id" IN (2)
AND t4."deleted" = 0
AND t4."langcode" IN ('en', 'und', 'zxx')
UNION ALL
SELECT
'field5' as field_name,
t5.*
FROM "user__field_field5" t5
WHERE t5."entity_id" IN (2)
AND t5."deleted" = 0
AND t5."langcode" IN ('en', 'und', 'zxx')
UNION ALL
SELECT
'field6' as field_name,
t6.*
FROM "user__field_field6" t6
WHERE t6."entity_id" IN (2)
AND t6."deleted" = 0
AND t6."langcode" IN ('en', 'und', 'zxx')
ORDER BY field_name, delta ASC
The better aproach, to only pull back cardinality 1 field values. The rest could be done as a union all or the current individual selects.
SELECT
t1.bundle,
t1.deleted,
t1.entity_id,
t1.revision_id,
t1.langcode,
t1.delta,
t1.field_field1,
t2.field_field2,
t3.field_field3,
t4.field_field4,
t5.field_field5,
t6.field_field6
FROM "user__field_field1" t1
LEFT JOIN "user__field_field2" t2
ON t1.entity_id = t2.entity_id AND t2.deleted = 0 AND t2.langcode IN ('en', 'und', 'zxx')
LEFT JOIN "user__field_field3" t3
ON t1.entity_id = t3.entity_id AND t3.deleted = 0 AND t3.langcode IN ('en', 'und', 'zxx')
LEFT JOIN "user__field_field4" t4
ON t1.entity_id = t4.entity_id AND t4.deleted = 0 AND t4.langcode IN ('en', 'und', 'zxx')
LEFT JOIN "user__field_field5" t5
ON t1.entity_id = t5.entity_id AND t5.deleted = 0 AND t5.langcode IN ('en', 'und', 'zxx')
LEFT JOIN "user__field_field6" t6
ON t1.entity_id = t6.entity_id AND t6.deleted = 0 AND t6.langcode IN ('en', 'und', 'zxx')
WHERE t1.entity_id = 2
AND t1.deleted = 0
AND t1.langcode IN ('en', 'und', 'zxx')
Which would have an example result set of:
+--------+---------+-----------+-------------+----------+-------+--------------+--------------+--------------+--------------+--------------+--------------+
| bundle | deleted | entity_id | revision_id | langcode | delta | field_field1 | field_field2 | field_field3 | field_field4 | field_field5 | field_field6 |
+--------+---------+-----------+-------------+----------+-------+--------------+--------------+--------------+--------------+--------------+--------------+
| user | 0 | 2 | 10 | en | 0 | john@ex.com | 555-1234 | New York | Admin | 2024-01-15 | Active |
+--------+---------+-----------+-------------+----------+-------+--------------+--------------+--------------+--------------+--------------+--------------+
Proposed resolution
Instead of running a database query for every single-cardinality field table, run a single query that joins all the field tables.
Screenshots of improvements
Unpatched

With Patch

Unpatched

With Patch

Remaining tasks
User interface changes
Introduced terminology
API changes
Data model changes
Release notes snippet
| Comment | File | Size | Author |
|---|---|---|---|
| #46 | withpatch.png | 17.71 KB | nicxvan |
| #46 | unpatched.png | 8.18 KB | nicxvan |
| #46 | executeWithPatch.png | 192.97 KB | nicxvan |
| #46 | executeWithoutPatch.png | 190.99 KB | nicxvan |
Issue fork drupal-3551308
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
heddnComment #3
heddnFull disclosure, the example SQL queries and results were assisted via Claude AI.
Comment #4
heddnIf we carve out only the single cardinality fields, we could really get a fair amount of improvement on queries. Especially for the 80% situation where most fields are single language and single cardinality, created using the Field UI.
Comment #5
heddnComment #6
catchComment #7
catchBumping this to major, this would potentially save dozens/hundreds of database queries on a page when entity caches are cold, and that can be the case for tens of thousands or more entities/pages if the site has a decent amount of content.
I think we should start with:
and we could open a second issue to look at optimizing multiple cardinality fields after this one.
Comment #9
catchGot something more or less working.
Comment #10
catchThe remaining functional test failure was a bug in ResourceTestBase. Another version of what #2066993: Use magic methods to sync container property to \Drupal::getContainer in functional tests is trying to prevent, but because it sets the entity storage as a class property, the fix from that issue was circumvented.
What was happening is the test was loading an entity, with an entity storage that didn't have knowledge of its field columns - this works by accident in HEAD because the current query is against the tables with a wildcard on the fields, whereas the new query specifies which columns it wants (and didn't get any back). I did not try to fix the entire test, just the bit that causes the bug, but we should open a follow-up to change all the class property services in that test.
Comment #11
catchComment #12
catchUpdated the rest of the performance tests, ready for review I think.
Comment #13
catchWith this MR compared to 11.2.x (e.g. with other already-committed 11.3.x performance fixes applied) we're down to less than 50% of queries on the node page cool cache test - and the 'cool cache' scenario is very realistic for sites with a decent amount of content - e.g. site-wide caches built but all the individual content pages not in the render/dynamic/page cache yet.
Comment #14
godotislateNice work! The difference in the performance stats is impressive.
Some comments on the MR.
Comment #15
catchJust to clarify #13 is relative to 11.2.x not HEAD, this saves about 20 queries in some Umami tests although it could easily be 50 on real sites with a lot of configurable fields on a node type, and sometimes that's multiplied by multiple separate entity loads on a page. However this issue is what's tipping us under 50% compared to 11.2. The other pending one that will have a big impact (and on similar types of requests) is #3529464: Make menu trail behaviour in SystemMenuBlock optional.
I think I either incorporated or replied to all the MR comments.
Comment #16
catchHere's an EXPLAIN against Umami:
I think we can simplify the query more, going to try that and do an EXPLAIN - but probably won't update the performance tests until the new version has had a review.
Comment #17
catchOK yeah - if add $table.langcode = $base_table.langcode to the join, and make the only IN (:langcodes) on the base table, then we get one row per langcode, not one row per field per langcode which means a much smaller result set from MySQL and also less to iterate over in PHP.
Here's the EXPLAIN of the query before this change:
The actual query gives 16 rows of results.
Here's the after:
So the explain looks the same pretty much, but the newer version avoids lots of duplicate rows in the result set.
Comment #18
catchAnd.... we can't reliably do that because 'langcode' is only guaranteed to exist on the field tables, not the base tables, which don't have to define it, as demonstrated by multiple kernel test failures.
Comment #19
godotislateI think this looks good now.
Comment #20
catchThanks for the RTBC but I think I have a way to do #17 without too much complexity or breakage.. Trying quickly now.
Comment #21
catchPushed commits for #17/20 - ran entity + field module kernel tests and the performance tests that should be affected and things look good - see what a full run says.
I think there are a couple of possible follow-ups to open here:
1. We could try to optimize multiple-cardinality fields per the issue summary - if there's more than one multiple-cardinality field it will save a query per-field per-entity load. this could be done internally to the same method so no API implication, it'll be as hard or harder than this issue was.
2. I think with more refactoring we could combine the query from ::loadFromSharedTables() and the query we're changing here - it would need an additional join on the data table and then mapping the fields back from that table, but fundamentally it does a very similar thing. This would save one query per entity load, where there's a shared table for the entity type. This would probably involve deprecating one or both methods since the naming would all be wrong, but the extra join and processing logic might turn out to be not too much extra on top of what we're doing in this issue.
Comment #22
catchNope. #21 changes the semantics of the query relative to what's in HEAD, whereas the previous approach apparently does not. Failures in migration + content_translation tests.
So... let's go back to a langcode condition on each field table. Was worth a try but also not worth holding this up on what could be a complete dead end.
This means everything's the same as code that got RTBCed in #19 again.
Comment #23
godotislateI've confirmed that the MR has the same code as when it was RTBC'd previously, so this can go back to RTBC.
Comment #24
alexpottLooking at the queries in the performance tests....
Should we be looking to combine these two. Otherwise as the performance test shows - for users we've got the same numbers of queries but we've a replace a query on a single table to a multiple table query.
Comment #25
catchOpened #3561960: Combine the database queries in ::loadFromSharedTables() and ::loadFromDedicatedTables() for #21.2/#24.
Comment #26
alexpottThe other question I have is what happens when we have 100 or so single cardinality fields on an entity - yes we will get 99 less queries which sounds fantastic but do we have problems with limits like mysql's max_allowed_packet?
I would be good to have some idea of where the limits lie and perhaps a change record so we can make any necessary recommendations. Also if these single cardinality fields are all blobs of HTML we're likely to have some interesting issues.
Comment #27
catchSo I am not too worried about client-side max allowed packet. It's a big query to send, but it's no worse than a lot of views queries - don't think it'll hit 16mb.
However we could easily run into max_allowed_packet issues on the server side, where the mysql default is 32mb. Anecdotally I semi-regularly here about people managing to load thousands of entities at once.
With the current implementation, the same field values appear multiple times in the results - this is what I was trying to fix in https://git.drupalcode.org/project/drupal/-/merge_requests/14048/diffs?c... - which you would hope would lead to them appearing only once each, but instead leads to some of them appearing zero times in those tests. There's only a handful of test failures with that approach, so it could be bad test content or some kind of tweak needed, but didn't work out what the problem was yet.
Even if we resolved that, there's the chance of an entity type having one text field that can have loads of text in it, and also having lots of small fields, and this tipping multiple entity loads that were previously OK over the edge. An example would be d.o issue comments where you can have 500 comments loaded at a time.
A workaround would be to chunk the fields, so instead of doing all of them, we group the fields into groups of ten, and do exactly the same logic on each group of 10. This would still reduce 100 queries to 10, just not to 1.
Comment #28
catchMore max_allowed_packet thoughts.
The main thing that causes entity queries to go over max_allowed_packet is long text/blob fields, and it's unusual to have more than one of those on an entity bundle. So if a field has e.g. 2mb of text, how many kb is even 100 fields that are not blobs going to be? Usually not very much on top of the 2mb.
So the bigger risk here is the duplicate rows that #17 was trying to fix, because that increases the amount of bytes that come back from MySQL even though the actual data set is not that much bigger. So revisiting that again - was able to fix the two kernel tests, see if that improves the rest.
Comment #29
catchAfter going back and forth on it more than a couple of times, MR is now a green version of the #17 - just needed to join on the revision data table where available, instead of only the revision table, and that allows it all to work.
This gets us back down to one row per-language, per-entity, instead of multiple.
Comment #30
godotislateText with summary fields have 2 blob columns, and even when it is deprecated/removed, people might start using a separate text field for summary text. That being said, generally the contents of a summary field should be a lot shorter than the main text, but even if it were, that gets us to around 4MB total? Which should still be OK?
Nice catch on the missing revision data table join. New changes look good.
Comment #31
catchAdded an explanation of the additional cache get on the MR. On that specific request the only entity being loaded is the user, all the other performance tests either load other entities as well (which already requires that cache item in HEAD) or no entities at all, so are unaffected in either direction.
Also created a change record noting the possibility of max_allowed_packet issues:
https://www.drupal.org/node/3562172
Then did a bit of investigation about how much we might run into max_allowed_packet issues:
This is what the $row variable looks like for two rows of the node load query on the umami front page:
I put the array structure into gedit and it comes out at 2kb. If I cut out the rest of the array structure other than the field_recipe_instruction + bundle to get closer to the result in HEAD, it comes out as 660bytes.
So that means a handful of fields including a varchar adding about 1.4kb per language, per entity.
If we account for ten times as many fields, so about 30-40, that's 14kb per language per entity. We could have 30 languages, that would make it 420kb per entity. However 50 fields on 30 translations seems very extreme. And you could still load 50+ of those at once before running into problems.
I also copy and pasted an example 8,000 word essay into gedit, checked the word count to make sure it really was 8k words, and that worked out at 52kb. So a single text field with a 'long read' level of content in it is still more bytes to transfer than dozens or hundreds of small fields.
Comment #32
catchLooking at that makes me realise we can drop the 'deleted' column from the query, it's only ever going to be 0.
And we can also drop 'bundle' because that's already available in the 'values' array.
That reduces the original 2kb for the result in #31 down to 1.6kb, so instead of 1.4kb extra data transferred compared to the text field, down to 1kb.
Comment #33
catchAfter #32 realised we can also skip the langcode column - this is all made possible by the join change from #17.
So each row (i.e. per language for an entity) now looks like:
The total bytes for the original array structure (in the text editor) was
2212the total bytes for the slimmed down array is1302, so a bit over half the size.An additional side-effect of this is that it should reduce the bytes transferred over MySQL across the total request - because the multiple queries we're replacing included all the columns for each field, and the new one only includes the field value. If you're using redis for caching, that's potentially quite a lot of overall network traffic to MySQL saved. No I'm not planning to quantify this precisely, but maybe 900 bytes per language per entity with the number of fields in Umami, a lot more with a lot more fields. :)
I've pushed that change without updating the performance tests, because that's quite time consuming for this issue and I've learned my lesson from updating queries that had to change an hour later before. But if all other tests pass, I'll update them.
Comment #34
catchThat works but we can also remove the ID from the field table query too now, the actual query we're sending is now looking a lot more reasonable, as well as the result set being much smaller, so double checking that locally now. Glad I held off updating the 25 or so query strings in performance tests for a bit...
Comment #35
catchBefore the recent changes:
Current status:
Comment #36
amateescu commented@catch, I think it should be safe to exclude all "extra" columns (
DefaultTableMapping::getExtraColumns()) since they only repeat the entity data, not the field data we're looking for here.Edit: just looked at the latest changes, and the MR is now only adding the columns from
::getColumnNames(), which is great!Comment #37
catchtl;dr of the latest changes:
1. We join on the langcode instead of adding a condition on the langcode to each table.
2. We only add the actual field columns as fields to the query, none of the other fields on the field tables like id, bundle, langcode because these can be retrieved from elsewhere.
This means that the query only returns 1 row per language per entity (the minimum), and also each row contains the minimum amount of data we need to build the field record.
Because we're avoiding loading bundle, langcode, id, deleted etc. for each field, even though the size of the result from the query we execute is bigger than one of the individual field queries would have been, the total bytes returned across all queries should be reduced along with the number of queries - i.e. because we load the fields in one go, we don't need to load duplicate data that's common to all of the fields.
This contrasts with earlier approaches on the issue where it was either more or about the same.
MR is green again.
I think we could possibly open another follow-up to look into removing the 'bundle' column from the field tables because it duplicates the base/revision table, but not sure if that is actually possible or not given entity queries. We'd probably need to refactor the multiple cardinality field queries a bit firstr. A proof of concept would tell us if it's used or not otherwise.
Comment #38
godotislateI pulled the lastest locally again and set a breakpoint. The resulting single cardinality query on the Umami front page looks like this cast to string, which essentially matches #35 and the change in the performance test:
I think this looks good. Added needs follow up tag for #37, to look at optimizing multiple cardinality query, or dropping the bundle column from field tables, or both.
Made a very minor edit to the CR.
Comment #39
catchI had already opened #3561960: Combine the database queries in ::loadFromSharedTables() and ::loadFromDedicatedTables() but failed to link it.
Had a very cursory look at the bundle column stuff, and it might not be blocked on that issue at all, so opened #3562341: Consider dropping the 'bundle' column from field tables as a separate follow-up to explore that one.
Comment #40
catchComment #41
alexpottGetting this in 11.x nice and early so we can hopefully find out if this causes issues earlier rather than later...
Committed 223992a and pushed to 11.x. Thanks!
Comment #44
godotislateCR needs to be published: https://www.drupal.org/node/3562172
Comment #45
catchPublished the CR! Thanks for the various reviews here, I was pleased just to get it working at all, but the refinements I hope got us somewhere very efficient. Hopefully learnings here will make #3561960: Combine the database queries in ::loadFromSharedTables() and ::loadFromDedicatedTables() easier to work on too.
Comment #46
nicxvan commentedSpent a few hours doing performance tests over slack so I'm adding some of those screenshots here for posterity.
Forgot I didn't comment on the work I did here, it probably qualifies for credit on this issue, but I'll defer to the committers.
Comment #47
heddnAdding screenshots to the IS for posterity.
Comment #48
heddnComment #49
mondrakeThis seems to have broken Build tests on PHP 8.3 with
Error: Call to undefined function Drupal\Core\Entity\Sql\array_any() in Drupal\Core\Entity\Sql\SqlContentEntityStorage->loadFromDedicatedTables() (line 1289 of /tmp/build_workspace_5a5e3e47a81f62fac6f1a5e00acc18f2K8Lil7/project/web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php)array_anywas introduced in PHP 8.4.Comment #50
catchWe can switch back to array_filter() easily enough to workaround the polyfill not being loaded in build tests, then open a follow-up to look into what's going on with the build tests.
Comment #52
catchOpened an MR for that.
Comment #53
mondrakeLooks good to me.
Comment #54
xjmCommitted the hotfix to 11.x. Thanks!
Comment #58
nicxvan commentedCreated the follow up here: https://www.drupal.org/project/drupal/issues/3563359
Bit of a stub for now, but we can track it there.