Problem/Motivation
Entity queries have a "simple mode" which does not group by entity revision ID and entity ID and in some cases can be significantly more performant. It is not generally applicable because without the explicit grouping the same entity (revision) rows will appear multiple times in the case of multi-value fields and this messes with various expectations.
However, if we know the query is limited to a single result and there is no offset, this does not matter so we can still enable simple mode to avoid the performance penalty.
Steps to reproduce
Run
(string) \Drupal::entityTypeManager()->getStorage('node')->getQuery()
->accessCheck(FALSE)
->sort('changed', 'DESC')
->range(0, 1);
The result is:
SELECT "base_table"."vid" AS "vid", "base_table"."nid" AS "nid", max(node_field_data.changed) AS "expression"
FROM
"node" "base_table"
LEFT JOIN "node_field_data" "node_field_data" ON "node_field_data"."nid" = "base_table"."nid"
GROUP BY "base_table"."vid", "base_table"."nid"
ORDER BY "expression" DESC
LIMIT 1 OFFSET 0
Proposed resolution
Enable simple query mode for limit 1 queries with no offset.
The changes the resulting query to be the following:
SELECT "base_table"."vid" AS "vid", "base_table"."nid" AS "nid", "node_field_data"."changed" AS "changed"
FROM
"node" "base_table"
LEFT JOIN "node_field_data" "node_field_data" ON "node_field_data"."nid" = "base_table"."nid"
ORDER BY "node_field_data"."changed" DESC
LIMIT 1 OFFSET 0
Remaining tasks
User interface changes
-
Introduced terminology
-
API changes
-
Data model changes
-
Release notes snippet
Issue fork drupal-3549946
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
- 3549946-simple-entity-query-limit-1
changes, plain diff MR !13387
Comments
Comment #3
tstoecklerAdded MR and updated issue summary to be explicit about not allowing an offset either, as that's important.
Comment #4
smustgrave commentedTempted to tag for tests coverage to show this is needed.
But could we get profiling stats to show this is an improvement please
Comment #5
tstoecklerHmm... I'm a bit confused. Not removing the tag but I don't quite understand why this would need profiling. As can be seen from the change in
OpenTelemetryNodePagePerformanceTestthe query does get changed in the correct way with the patch and just from looking at it you can see that it's simpler and technically more performant, and thus I would consider the change correct just on that alone.However, it's not going to make any measurable difference in most setups. Even with a crazy amount of content I'm not sure you will see a reliably measurable difference. How I found this was on a site with a lot of nodes with a query with a whole bunch of conditions where we want to find the (1) node with the latest changed timestamp among the nodes that meet the conditions. Without this patch the query took ~6 seconds, the patch took that down to ~.5 seconds. Needless to say, the fact that this query is so slow (I'm quite aware that even .5 seconds for a query that finds a single result is hugely problematic) is a problem on our end and something that cannot be resolved just with this patch, but since it's such a nicely contained fix and it fits quite well into the existing API (i.e. I would understand your reservation better if this patch were introducing the concept of a simple query mode) I don't understand the reservation or the need for explicit profiling.
Maybe you can elaborate more explicitly what results you would expect from the profiling that would validate this patch in your eyes. Thanks!
Comment #6
berdirYes, it will not be possible to profile this in a reliable way, but less GROUP BY is always good.
The impact is visible in one query in the performance tests, no existing tests break, I don't think there is a straightforward way to test this explicitly as we afaik do not test anywhere what kind of query are actually generated by entity queries, just their results and that doesn't change.
Didn't fully think this through yet, so just setting back to needs review for now.
Comment #7
smustgrave commentedWouldn't we see an impact in the OpenTelementary test?
Comment #8
berdirYou do see a difference, the executed query changed. Performance tests assert which queries run and how many. We do not assert any time constraints, because that is too variable. Even if we would, with only a handful of nodes, we're talking about nanoseconds if anything at all. If you run those tests locally, you'll see likely see seconds of differences between each run, depending on what your system as well as those cloud kubernetes servers are doing at the same time. In other words, the savings of this are a tiny fraction of the differences this test has without any changes.
But if you have a million revisions, then things will look quite different, as you can see in the example in #5, where those queries then run 12x faster and save 5.5s.
Comment #9
smustgrave commentedOkay I'll leave for you to review then
Comment #10
berdirI had another look at the altered query. It's a revision query, so there can be multiple different revisions there, and it has a sort on it. I tried to think of a case where there could be a problem with this with revisions or sorting or something, but I couldn't really think of one. Limit 1 is still Limit one and should always result in the same ID being returned.
FWIW, I doubt that specific query results in an improvement at all, there's still the sort and joins and what not. But it also certainly doesn't make it worse.
Comment #11
catchI think with #5 it would not hurt to have the before/after
EXPLAINfrom your site with lots of content - but also this is obviously simpler so doesn't seem to be required.Comment #14
catchLet's go ahead here, it's definitely going to be an improvement, how much will depend on which query is running and the data on the site.
Committed/pushed to 11.x and cherry-picked to 11.3.x, thanks!
Comment #17
tstoecklerAwesome, thanks! Sorry, for missing to follow-up to #11, but agreed that it's an improvement either way, however minuscule it ends up being on any given site...