Problem/Motivation

Entity queries have a "simple mode" which does not group by entity revision ID and entity ID and in some cases can be significantly more performant. It is not generally applicable because without the explicit grouping the same entity (revision) rows will appear multiple times in the case of multi-value fields and this messes with various expectations.

However, if we know the query is limited to a single result and there is no offset, this does not matter so we can still enable simple mode to avoid the performance penalty.

Steps to reproduce

Run

(string) \Drupal::entityTypeManager()->getStorage('node')->getQuery()
  ->accessCheck(FALSE)
  ->sort('changed', 'DESC')
  ->range(0, 1);

The result is:

SELECT "base_table"."vid" AS "vid", "base_table"."nid" AS "nid", max(node_field_data.changed) AS "expression"
FROM
"node" "base_table"
LEFT JOIN "node_field_data" "node_field_data" ON "node_field_data"."nid" = "base_table"."nid"
GROUP BY "base_table"."vid", "base_table"."nid"
ORDER BY "expression" DESC
LIMIT 1 OFFSET 0

Proposed resolution

Enable simple query mode for limit 1 queries with no offset.

The changes the resulting query to be the following:

SELECT "base_table"."vid" AS "vid", "base_table"."nid" AS "nid", "node_field_data"."changed" AS "changed"
FROM
"node" "base_table"
LEFT JOIN "node_field_data" "node_field_data" ON "node_field_data"."nid" = "base_table"."nid"
ORDER BY "node_field_data"."changed" DESC
LIMIT 1 OFFSET 0

Remaining tasks

User interface changes

-

Introduced terminology

-

API changes

-

Data model changes

-

Release notes snippet

Issue fork drupal-3549946

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

tstoeckler created an issue. See original summary.

tstoeckler’s picture

Title: Use simple query mode for entity queries with limit 1 » Use simple query mode for entity queries with limit 1 and no offset
Issue summary: View changes
Status: Active » Needs review

Added MR and updated issue summary to be explicit about not allowing an offset either, as that's important.

smustgrave’s picture

Status: Needs review » Needs work
Issue tags: +needs profiling

Tempted to tag for tests coverage to show this is needed.

But could we get profiling stats to show this is an improvement please

tstoeckler’s picture

Hmm... I'm a bit confused. Not removing the tag but I don't quite understand why this would need profiling. As can be seen from the change in OpenTelemetryNodePagePerformanceTest the query does get changed in the correct way with the patch and just from looking at it you can see that it's simpler and technically more performant, and thus I would consider the change correct just on that alone.

However, it's not going to make any measurable difference in most setups. Even with a crazy amount of content I'm not sure you will see a reliably measurable difference. How I found this was on a site with a lot of nodes with a query with a whole bunch of conditions where we want to find the (1) node with the latest changed timestamp among the nodes that meet the conditions. Without this patch the query took ~6 seconds, the patch took that down to ~.5 seconds. Needless to say, the fact that this query is so slow (I'm quite aware that even .5 seconds for a query that finds a single result is hugely problematic) is a problem on our end and something that cannot be resolved just with this patch, but since it's such a nicely contained fix and it fits quite well into the existing API (i.e. I would understand your reservation better if this patch were introducing the concept of a simple query mode) I don't understand the reservation or the need for explicit profiling.

Maybe you can elaborate more explicitly what results you would expect from the profiling that would validate this patch in your eyes. Thanks!

berdir’s picture

Status: Needs work » Needs review
Issue tags: -needs profiling

Yes, it will not be possible to profile this in a reliable way, but less GROUP BY is always good.

The impact is visible in one query in the performance tests, no existing tests break, I don't think there is a straightforward way to test this explicitly as we afaik do not test anywhere what kind of query are actually generated by entity queries, just their results and that doesn't change.

Didn't fully think this through yet, so just setting back to needs review for now.

smustgrave’s picture

Wouldn't we see an impact in the OpenTelementary test?

berdir’s picture

You do see a difference, the executed query changed. Performance tests assert which queries run and how many. We do not assert any time constraints, because that is too variable. Even if we would, with only a handful of nodes, we're talking about nanoseconds if anything at all. If you run those tests locally, you'll see likely see seconds of differences between each run, depending on what your system as well as those cloud kubernetes servers are doing at the same time. In other words, the savings of this are a tiny fraction of the differences this test has without any changes.

But if you have a million revisions, then things will look quite different, as you can see in the example in #5, where those queries then run 12x faster and save 5.5s.

smustgrave’s picture

Okay I'll leave for you to review then

berdir’s picture

Status: Needs review » Reviewed & tested by the community

I had another look at the altered query. It's a revision query, so there can be multiple different revisions there, and it has a sort on it. I tried to think of a case where there could be a problem with this with revisions or sorting or something, but I couldn't really think of one. Limit 1 is still Limit one and should always result in the same ID being returned.

FWIW, I doubt that specific query results in an improvement at all, there's still the sort and joins and what not. But it also certainly doesn't make it worse.

catch’s picture

I think with #5 it would not hurt to have the before/after EXPLAIN from your site with lots of content - but also this is obviously simpler so doesn't seem to be required.

  • catch committed 3da14ca4 on 11.3.x
    fix: #3549946 Use simple query mode for entity queries with limit 1 and...

  • catch committed 4ee8d86b on 11.x
    fix: #3549946 Use simple query mode for entity queries with limit 1 and...
catch’s picture

Version: 11.x-dev » 11.3.x-dev
Status: Reviewed & tested by the community » Fixed

Let's go ahead here, it's definitely going to be an improvement, how much will depend on which query is running and the data on the site.

Committed/pushed to 11.x and cherry-picked to 11.3.x, thanks!

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.

tstoeckler’s picture

Awesome, thanks! Sorry, for missing to follow-up to #11, but agreed that it's an improvement either way, however minuscule it ends up being on any given site...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.