After upgrading to 8.4 we noticed a performance decrease on one of our sites. The page that is taking minutes to load is a multilanguage page with several field collections and revisions enabled.

Some debugging lead me to the change in issue 2864995.

This change introduces a latestRevision() method which is used by QuickEdit (this explains why it is only when logged in). On a field collection that has about 4000 revisions, this query takes seconds to run.

If I understand correctly the change introduces a self-join to the base revision table. Because of this, for each revision in the table, a set of revisions + next revisions is added to the result. From this result only the row that has no next revision (is null) is used (in my case 7000000 rows are searched for the one that has no next revisions).

base_table.revision_id base_table_2.revision_id
... ...
335916 335986
335916 336066
335986 336066
336066 NULL

Maybe we can think of a more efficient way of querying the latest revision?

Note: Please do not create patches or re-roll previous patches. Use the Merge Request functionality.

CommentFileSizeAuthor
#139 2950869-139.patch13.16 KB_pratik_
#134 interdiff-133-134.txt8.8 KBRassoni
#134 2950869-134.patch13.3 KBRassoni
#133 interdiff_131-133.txt1021 bytespooja saraah
#133 2950869-133.patch5.63 KBpooja saraah
#131 reroll_diff_124-131.txt4.14 KBravi.shankar
#131 2950869-131.patch5.59 KBravi.shankar
#124 entity-slow-query-2950869-124.patch5.64 KBsaranya ashokkumar
#85 node_revision_lenovo_stored_proc.txt1.72 KBbryanmanalo
#85 drupal-core-node-revision-lenovo.patch1.48 KBbryanmanalo
#81 2950869-81.patch5.62 KBalexpott
#81 71-81-interdiff.txt1.61 KBalexpott
#79 71-79-interdiff.txt1.4 KBTimmy_Cos
#79 2950869-79.patch5 KBTimmy_Cos
#71 2950869-71.patch5 KBalexpott
#71 70-71-interdiff.txt2.78 KBalexpott
#70 2950869-70.patch4.57 KBalexpott
#70 2950869-70.test-only.patch1.49 KBalexpott
#69 2950869-69.patch4.11 KBalexpott
#69 67-69-interdiff.txt5.53 KBalexpott
#67 2950869-67.patch3.61 KBalexpott
#67 65-67-interdiff.txt873 bytesalexpott
#65 2950869-65-optimizing-revision-query.patch2.96 KBabhishek-anand
#64 2950869-64-optimizing-revision-query.patch2.93 KBabhishek-anand
#63 2950869-63-optimizing-revision-query.patch2.01 KBabhishek-anand
#59 2950869-59-optimizing-revision-query.patch1.09 KBadam.weingarten
#58 2950869-58-optimizing-revision-query.patch1017 bytesadam.weingarten
#55 Seconds to create each new translation of a node.png33.69 KBmatsbla
#52 interdiff-52.txt621 bytesamateescu
#52 2950869-52-for-review.patch22.11 KBamateescu
#52 2950869-52-combined.patch91.65 KBamateescu
#50 interdiff-50.txt3.88 KBamateescu
#50 2950869-50-for-review.patch22.11 KBamateescu
#48 interdiff-48.txt705 bytesamateescu
#48 2950869-48-for-review.patch18.95 KBamateescu
#48 2950869-48-combined.patch81.29 KBamateescu
#46 2950869-46-combined.patch77 KBamateescu
#44 2950869-44-combined.patch77.03 KBamateescu
#41 interdiff-41.txt3.89 KBamateescu
#41 2950869-41-for-review.patch18.26 KBamateescu
#41 2950869-41-combined.patch74.5 KBamateescu
#38 interdiff-38.txt1.75 KBamateescu
#38 2950869-38-for-review.patch14.37 KBamateescu
#38 2950869-38-combined.patch93.02 KBamateescu
#36 2950869-36-combined.patch92.01 KBamateescu
#34 2950869-34-combined.patch87.53 KBamateescu
#32 interdiff-32.txt1.37 KBamateescu
#32 2950869-32.patch13.36 KBamateescu
#32 2950869-32-combined.patch65.52 KBamateescu
#30 2950869-28-update-workspace-queries.txt2.74 KBamateescu
#30 2950869-28-update-the-query-for-retrieving-latest-revisions.txt1.75 KBamateescu
#30 2950869-28-change-entity-query-to-use-expressions-instead-of-fields.txt8.93 KBamateescu
#30 2950869-28.patch12.64 KBamateescu
#30 2950869-28-combined.patch65.17 KBamateescu
#25 explain-latest-revision-with-revision-parent.png22.5 KBamateescu
#25 explain-latest-revision-head.png22.22 KBamateescu
#6 explain revision select.txt1.81 KBcorneboele

Issue fork drupal-2950869

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

corneboele created an issue. See original summary.

Berdir’s picture

Priority: Normal » Major

I think we discussed adding an explicit flag or so, sounds like we might need to re-think that.

Is ther possibly an index that we could add? Indexes also always result in an additional cost, though.

Making this major for now. A performance regression in a minor is a problem.

timmillwood’s picture

corneboele’s picture

@timmillwood Hmm, I think it has the same title, but is a different issue. That one is more views related. Also when I update to 8.5, the problem persists.

catch’s picture

Version: 8.4.x-dev » 8.6.x-dev
Issue tags: +Performance

Switching version. Tagging for performance.

Could you post an EXPLAIN for the query?

corneboele’s picture

FileSize
1.81 KB
catch’s picture

Priority: Major » Critical

Marking #2957334: Slow query > 30 seconds in quickedit module as duplicate and bumping this to critical in its stead.

camoa’s picture

My customer found the issue on a similar query:

SELECT base_table.vid AS vid, base_table.nid AS nid
FROM
node_revision base_table
LEFT OUTER JOIN node_revision base_table_2 ON base_table.nid = base_table_2.nid AND base_table.vid < base_table_2.vid
INNER JOIN node_field_data node_field_data ON node_field_data.nid = base_table.nid
WHERE (base_table_2.nid IS NULL) AND (node_field_data.nid = '666');

Where 666 is the home page, if we explain this query:

+----+-------------+-----------------+------+----------------------------------------------+-----------+---------+----------------------------+------+--------------------------------------+
| id | select_type | table           | type | possible_keys                                | key       | key_len | ref                        | rows | Extra                                |
+----+-------------+-----------------+------+----------------------------------------------+-----------+---------+----------------------------+------+--------------------------------------+
|  1 | SIMPLE      | node_field_data | ref  | PRIMARY,node__id__default_langcode__langcode | PRIMARY   | 4       | const                      |   10 | Using index                          |
|  1 | SIMPLE      | base_table      | ref  | node__nid                                    | node__nid | 4       | const                      | 3328 | Using where; Using index             |
|  1 | SIMPLE      | base_table_2    | ref  | PRIMARY,node__nid                            | node__nid | 4       | amdcom_prod.base_table.nid |   82 | Using where; Using index; Not exists |
+----+-------------+-----------------+------+----------------------------------------------+-----------+---------+----------------------------+------+--------------------------------------+

camoa’s picture

the left join makes the results for 3k node_revision records for the home page nid to get over 5M results when just running:

select base_table.vid AS vid, base_table.nid AS nid  FROM  node_revision base_table  LEFT OUTER JOIN node_revision base_table_2 ON base_table.nid = base_table_2.nid AND base_table.vid < base_table_2.vid where base_table.nid = '666';

This seems to be the code culprit for that left outer join:
core/lib/Drupal/Core/Entity/Query/Sql/Query.php Lines 125 to 129

    // Add a self-join to the base revision table if we're querying only the
    // latest revisions.
    if ($this->latestRevision && $revision_field) {
      $this->sqlQuery->leftJoin($base_table, 'base_table_2', "base_table.$id_field = base_table_2.$id_field AND base_table.$revision_field < base_table_2.$revision_field");
      $this->sqlQuery->isNull("base_table_2.$id_field");
    }

Why a left join and not the revision_timestamp? I assume it has to do with current revision?

Berdir’s picture

Beside improving the query itself, we are also calling it way too often.

There doesn't seem to be any kind of static caching for this (e.g. in \Drupal\Core\Entity\ContentEntityBase::isLatestRevision or in the storage), but you can see that quickedit_preprocess_field() and quickedit_entity_view_alter() is calling it not just for each entity but again for each *field* being rendered for that entity. So we're not just doing this call once but maybe 10 times (or 30, if you have a lot of fields being displayed) on a site with quickedit enabled.

And in \Drupal\content_moderation\aEntityOperations::entityView(), we are *first* checking if it's the latest revision and only after that check if it's the default revision. I think we can also optimize that a lot and if the entity is the published default revision, then return early and save lots of queries and moderation entity loading.

Berdir’s picture

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

AMDandy’s picture

@berdir, those are the same issue. Is there another issue for this somewhere?

Berdir’s picture

Oops. It was referenced as a child issue in the sidebar: #2983837: Optimize EntityOperations::entityView().

Wim Leers’s picture

#3 linked to #2949939: The latest_revision ViewsFilter has poor performance, which in turn linked to #2865579: Rewrite the 'Latest revision' views filter and remove the revision_tracker table. Adding those to the related issues to help resolve this.

I'll leave it to Workflow Initiative Coordinators to decide whether it's okay to mark #2949939: The latest_revision ViewsFilter has poor performance as a duplicate of this.

Wim Leers’s picture

amateescu’s picture

#2949939: The latest_revision ViewsFilter has poor performance is about the Views query for the latest revision, while this one is about the Entity Query part, so they're not really duplicates..

Wim Leers’s picture

Issue tags: +Needs title update

If they're not duplicates, let's make that clear in the issue titles: let's make one clearly about Entity Query, and one clearly about Views queries.

Wim Leers’s picture

Wim Leers’s picture

Title: latestRevision() very slow with lots of revisions » Entity queries querying the latest revision very slow with lots of revisions
Issue tags: -Needs title update

#17: updated #2949939: The latest_revision ViewsFilter has poor performance issue title, now also updated this. Hope you like it!

catch’s picture

On a field collection that has about 4000 revisions, this query takes seconds to run.

This is going to be a problem independent of the static caching, I think whether it's critical or not depends on whether we consider 4,000 to be a 'reasonable' number of revisions. Or if not, what a reasonable number of revisions would be, and how bad the query is with that number.

This is an editor-facing bug that happens when just browsing around a site - (i.e. not under admin/*) so tend towards leaving it critical.

effulgentsia’s picture

Separate from this issue, but just a shameless request for input on #2997931: Revision tables are queried even when loading the default revision. Thanks.

matsbla’s picture

Another way to avoid the problem might be to not create field revisions when the field data hasn't changed.
#2960887: Do not create field revisions when field data hasn't changed

matsbla’s picture

amateescu’s picture

The patch for adding a revision parent field proposed in #2725523: Add a revision_parent field to revisionable entities is able to fix this performance problem nicely.

I created 6065 revisions for a node, and here's a comparison for latest revision queries:

HEAD:

SELECT base_table.vid AS vid, base_table.nid AS nid
FROM node_revision base_table
LEFT OUTER JOIN node_revision base_table_2 ON base_table.nid = base_table_2.nid AND base_table.vid < base_table_2.vid
INNER JOIN node_field_data node_field_data ON node_field_data.nid = base_table.nid
WHERE (base_table_2.nid IS NULL) AND (node_field_data.nid = 4)
 Showing rows 0 - 0 (1 total, Query took 3.0554 seconds.)

With parent revision field:

SELECT MAX(base_table.vid) AS vid, base_table.nid AS nid
FROM node_revision base_table
LEFT OUTER JOIN node_revision base_table_2 ON (base_table_2.revision_parent__target_id = base_table.vid)
INNER JOIN node_field_data node_field_data ON node_field_data.nid = base_table.nid
WHERE (base_table_2.vid IS NULL) AND (node_field_data.nid = 4)
Showing rows 0 - 0 (1 total, Query took 0.0200 seconds.)

Wim Leers’s picture

NICE! 🤘

Any reason we shouldn't postpone this issue on that one?

amateescu’s picture

If we decide that #25 is an acceptable solution for this issue, then yes, we should mark this as postponed and raise that one to critical :)

amateescu credited mheip.

amateescu’s picture

I've been working in the last couple of days to implement the approach described in #25, and I'm happy to say that it turned out rather nicely!

I've split the patch into three distinct parts so I can explain better each of them:

2950869-28-change-entity-query-to-use-expressions-instead-of-fields.txt

This is the main change that is needed in order to have the ability to end up with a query like the one in #25. The current entity query code builds up a database query using both fields (e.g. base_table.id) and expressions (mostly for aggregated queries), so the major problem with transforming the revision ID field (base_table.vid) into an expression (MAX(base_table.vid)) is that when the database layers builds the SQL query string, it outputs the fields first and then the expressions.

The problem with that behavior is that as soon as we add more columns to the SQL query (for example by adding a sort to the EQ), the generated query ends ups with something like this: SELECT base_table.nid AS nid, field_table.sorted_field AS sorted_field, MAX(base_table.vid) AS vid, which in turn means that we don't have any way of knowing the position of the vid expression in the select statement. That position is very important because we would need to pass it to fetchAllKeyed() in \Drupal\Core\Entity\Query\Sql\Query::result() in order to keep the result of an EQ as an array of entity IDs keyed by revision IDs.

Therefore, the only solution that I could come up with is to use expressions everywhere instead of fields. This also has the following advantages:

2950869-28-update-the-query-for-retrieving-latest-revisions.txt

The changes needed to update the query for retrieving the latest revision are minimal, as can be seen in this part of the patch.

2950869-28-update-workspace-queries.txt

The changes needed in the EQ override from the Workspaces module are only code simplifications, and they are also fixing a bug as a bonus: #2987956: Entity queries with sorting and ordering are not working in a non-default workspace. I'm adding credit to the folks who worked on that patch because it helped me shape the proposed solution detailed above.


Another interesting thing to point out is that I've managed (by accident really) to improve even further the performance of the new query by about 40% or so for very large data sets, by adding the GROUP BY base_table.vid, base_table.nid clause every time for latest revision queries.

For a node with 50,000 revisions, the new query without GROUP BY was executed in ~0.1185 seconds, and with GROUP BY it drops to ~0.0709 seconds. Note that the query performance numbers from #25 were for a node with "only" 6,000 revisions.

The last submitted patch, 30: 2950869-28-combined.patch, failed testing. View results

amateescu’s picture

The last submitted patch, 32: 2950869-32-combined.patch, failed testing. View results

amateescu’s picture

All the fails have been fixed in #2725523-54: Add a revision_parent field to revisionable entities, so here's a new combined patch which includes the "for review" one from #32 and the combined one from the blocker issue.

Status: Needs review » Needs work

The last submitted patch, 34: 2950869-34-combined.patch, failed testing. View results

amateescu’s picture

Rerolled the combined patch to include the latest patch from the other issue.

An Vu’s picture

Hi, @amateescu:
Could you convert the patch for Drupal version 8.4.5.
Thank you.

amateescu’s picture

@An Vu, I can't, sorry. It would be better to update your site to 8.6.x, that way the patch could be ported more easily too.

Fixed a problem with the main column of the revision parent field, which was hard-coded in the query.

The last submitted patch, 38: 2950869-38-combined.patch, failed testing. View results

The last submitted patch, 38: 2950869-38-combined.patch, failed testing. View results

amateescu’s picture

The inline block test fails have been fixed by the revert of #3033686-24: Saving Layout override will revert other field values to their values when the Layout was started. and the other can be fixed by moving the unit test methods to a kernel test.

The last submitted patch, 41: 2950869-41-combined.patch, failed testing. View results

amateescu’s picture

The blocker for #2725523: Add a revision_parent field to revisionable entities has been committed, so I queued a re-test for the combined patch from #41 and it should come back green :)

amateescu’s picture

#2725523: Add a revision_parent field to revisionable entities was improved quite a bit recently, so here's an updated combined patch.

Status: Needs review » Needs work

The last submitted patch, 44: 2950869-44-combined.patch, failed testing. View results

amateescu’s picture

Status: Needs work » Needs review
FileSize
77 KB

That patch needed a reroll, so this one needs one too :)

Status: Needs review » Needs work

The last submitted patch, 46: 2950869-46-combined.patch, failed testing. View results

amateescu’s picture

Status: Needs work » Needs review
FileSize
81.29 KB
18.95 KB
705 bytes

Most of the fails were fixed in the blocking issue, but we need to fix a test here as well.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

amateescu’s picture

Added some test coverage to ensure that latest revision queries return the proper result when we have revision branches, and it turns out that we need to perform an additional join on the merge parent column.

The blocker issue needs some updates so not including a combined patch for now.

dixon_’s picture

Status: Needs review » Needs work

I did some manual testing by first applying #2725523-152: Add a revision_parent field to revisionable entities and then the latest patch here.

But I'm getting SQL errors upon installing Workspace module after a fresh site install (I sent the dump to @amateescu in a DM).

Overall the code looks good, the resulting patch is also a nice cleanup IMO! Here are a few small things I found after an initial quick review:

+  /**
+   * Stores the sql expressions used to build the sql query.
+   *
+   * @var array
+   *   An array of expressions.
+   */
+  protected $sqlExpressions = [];
+

1. Nitpick: "sql" should be in capital letters.


-      // Since the query is against the base table, we have to take into account
-      // that the revision ID might come from the workspace_association
-      // relationship, and, as a consequence, the revision ID field is no longer
-      // a simple SQL field but an expression.
-      $this->sqlFields = [];
-      $this->sqlExpressions[$revision_field] = "COALESCE(workspace_association.target_entity_revision_id, base_table.$revision_field)";
-      $this->sqlExpressions[$id_field] = "base_table.$id_field";
+      $this->sqlExpressions[$revision_field]['expression'] = "COALESCE(workspace_association.target_entity_revision_id, base_table.$revision_field)";
+      $this->sqlExpressions[$revision_field]['sql_fields'][] = 'workspace_association.target_entity_revision_id';

2. I don't think we should hard-code the workspace_associaiton base table name here?

amateescu’s picture

Status: Needs work » Needs review
FileSize
91.65 KB
22.11 KB
621 bytes

Thanks for the review!

Re #51:

1. Oops, fixed :)
2. That table name is hardcoded in HEAD as well, so we're not changing the behavior here. We could open a followup to change it, but given that the workspace association is marked as internal we should be able to rely on a fixed table name. And we'll probably get rid of it anyway in #2983639: [PP-1] Figure out a way to not join the workspace_association table for every duplicate base table join of an entity query.

dixon_’s picture

Looks good! The installation issues I experiences was due to my own local environment, which I sorted now.

The patch seems to be working really well now! I'm gonna give it all another look and more manual testing with the Workspace module installed later today.

amateescu’s picture

Status: Needs review » Postponed

This doesn't seem to have any chance of getting into 8.7.x anymore, so let's postpone it on #2725523: Add a revision_parent field to revisionable entities.

matsbla’s picture

I'm trying to make some research on how D8 can scale better for websites with many many languages. I want to elaborate on the performance gains in #25. I'm using a test from Stress Test Add Translations that creates 1 node and then try add 300 translations to that node while measuring the performance for every new translation added. I've tested how many seconds it takes to create each translation on 8.7.0-rc1 and then I've tested it with 2950869-52-combined.patch from #52. Here you can see the performance differences:
Seconds to create each new translation of a node
To me it seems like this performance gain also will make D8 scale a lot better for websites with very many languages!

adam.weingarten’s picture

Wanted to give a bump to this issue for criticality and patching in 8.7. We are encountering significant performance issues on a pre-prod site with only 10k node revisions.

Most enterprise sites could have 100k to 1M revisions. This suggests that this could be a nasty core regression with broad applicability.

_doyle_’s picture

We are seeing this issue becoming more apparent across several sites as well running 8.7. Would definitely like to see this patch prioritized.

adam.weingarten’s picture

I was reviewing the offending query and was looking for a solution for 8.7 that was more limited in scope than the parent revision idea. How does this look? Am I oversimplifying the solution.

The original query took 28s
The revised version took 11ms

adam.weingarten’s picture

catch’s picture

Status: Postponed » Needs work
seanlo’s picture

are there final patch to this issue? I do have a big performance problem in my production site. I have almost near 400K nodes and over 1M revisions. My Drupal version is 8.7.2. When I applied the patch of comment 48, I got the error page about SQL. Anything that I miss before I apply this patch.

dksdev01’s picture

I am also having slowness issue after 8.7.x upgrade, looks like for translated pages NodeViewController.php always check access for entity.node.delete_form. That call getLatestTranslationAffectedRevision() and getLatestRevisionId(), where we have slow query. This happens on every page load and even not cached.

abhishek-anand’s picture

Fixed the hardcorded revision_id and handled expressions in a way similar to sqlFields.

abhishek-anand’s picture

abhishek-anand’s picture

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

alexpott’s picture

Status: Needs work » Needs review
FileSize
873 bytes
3.61 KB

In #65 the result of a latest revision query is [ID => REVISION_ID] where in HEAD it is [REVISION_ID => ID] so we need to maintain this.

Status: Needs review » Needs work

The last submitted patch, 67: 2950869-67.patch, failed testing. View results

alexpott’s picture

Status: Needs work » Needs review
FileSize
5.53 KB
4.11 KB
  1. +++ b/core/lib/Drupal/Core/Entity/Query/Sql/Query.php
    @@ -129,11 +137,11 @@ protected function prepare() {
    +      $this->sqlExpressions[$revision_field] = "MAX(base_table.$revision_field)";
    

    Manually supplying our own alias scares me slightly. I think we should use $this->sqlQuery->addExpression() to add the expression and store the alias for use in sorting.

  2. +++ b/core/lib/Drupal/Core/Entity/Query/Sql/Query.php
    @@ -264,7 +275,13 @@ protected function result() {
    +    // @todo
    +    if ($this->latestRevision && $this->entityType->getKey('revision')) {
    +      $result = array_flip($result);
    +    }
    +    return $result;
    

    So this is symptomatic of the fact we're changing the order of the ID fields as they're selected. In HEAD the revision ID is always the key. Rather than do this swapping around we can move the normal ID into an expression too.

alexpott’s picture

So the patch in #69 fails Thunder's tests. This is because of a query in \Drupal\scheduler\SchedulerManager::publish() that does

      $query = $this->entityTypeManager->getStorage('node')->getQuery()
        ->exists('publish_on')
        ->condition('publish_on', $this->time->getRequestTime(), '<=')
        ->condition('type', $scheduler_enabled_types, 'IN')
        ->latestRevision()
        ->sort('publish_on')
        ->sort('nid');

The problem is the additional sort - with the patch this results in us adding a field that is not part of a group by - and we need the field there to sort on it. The way we're using max() is completely incompatible with adding an extra sort like this.

I think if we detect an extra sort we're going to need to fallback existing behaviour. The test-only patch should pass and proves we're missing test coverage in core.

alexpott’s picture

Here's a potential solution... pretty ugly. I'm also wondering about \Drupal\workspaces\EntityQuery\Query and \Drupal\workspaces\EntityQuery\QueryTrait::prepare() not sure they are playing along that nicely with latestRevision in HEAD let alone with this patch.

amateescu’s picture

The last submitted patch, 70: 2950869-70.patch, failed testing. View results

id.aleks’s picture

Status: Needs review » Reviewed & tested by the community

The patch in comment #71 is applied cleanly and solved the issue for me.
@alexpott Thanks for your work. Maybe the patch is not ideal, but it solves the issue. That's why I'm changing the issue status to RTBC.

catch’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/lib/Drupal/Core/Entity/Query/Sql/Query.php
@@ -129,11 +137,24 @@ protected function prepare() {
+      }, TRUE);

Needs a comment explaining why this is necessary.

@amateescu I'm still not sure what to do about that issue.

Krzysztof Domański’s picture

sjohn19’s picture

Just checking to see if there was any updates to the issue since we had upgraded to 8.7.9 ND have been seeing slow SQL queries causing 524 errors. Any patches that I can use in prod

catch’s picture

@sjohn19 if you look at the last few comments you can see that the most recent review was less than a week ago, and that there's a patch from two weeks ago in #71, patch testing is welcome.

Timmy_Cos’s picture

I've applied this patch in a development environment and ran into a few troubles :(

I have a query that looks for latest revisions subject to a field-not-null condition. With this patch the condition is applied before filtering to the MAX revision id; resulting in the query fetching some revisions that meet the condition but are not truly the latest.

I've put together a quick fix that works by doing the MAX operation within a subquery. This is working for me but I'm not too experienced here (first comment!) so I'm not sure how this goes for efficiency/compatibility/etc.

Worth noting that I'm working off of Drupal 8.7, not 8.9.

alexpott’s picture

@Timmy_Cos thanks for finding this edge case. Rather than adding a sub select - which will lose the performance gains - we need to add this to the modes where optimisation is not possible.

alexpott’s picture

Status: Needs work » Needs review
FileSize
1.61 KB
5.62 KB

Here's a test the proves what @Timmy_Cos has pointed out. The optimised path breaks expectations... so I think @amateescu is correct that the current approach is wrong.

alexpott’s picture

So we could "fix" #81 with something like:

diff --git a/core/lib/Drupal/Core/Entity/Query/Sql/Query.php b/core/lib/Drupal/Core/Entity/Query/Sql/Query.php
index 27e65f548f..b4897e4285 100644
--- a/core/lib/Drupal/Core/Entity/Query/Sql/Query.php
+++ b/core/lib/Drupal/Core/Entity/Query/Sql/Query.php
@@ -140,7 +140,8 @@ protected function prepare() {
     // Use max and group by to only return the latest revision in the most
     // optimal way.
     if ($this->latestRevision && $revision_field) {
-      $can_optimise = array_reduce($this->sort, function ($carry, $sort) use ($id_field, $revision_field) {
+      $can_optimise = empty($this->condition->conditions());
+      $can_optimise = $can_optimise && array_reduce($this->sort, function ($carry, $sort) use ($id_field, $revision_field) {
         return $carry && ($sort['field'] == $id_field || $sort['field'] == $revision_field);
       }, TRUE);

But that would limit the usefulness of the optimisation to be pretty moot for most situations.

Status: Needs review » Needs work

The last submitted patch, 81: 2950869-81.patch, failed testing. View results

Timmy_Cos’s picture

@alexpott Damn, I knew a subquery wouldn't be as good as the fully optimised version but I'd hoped it would at least beat the original table join. Upon closer inspection I suppose the two are roughly equivalent; maybe I'll try some experiments with a bigger database if I get some time.

bryanmanalo’s picture

Sharing our work around because #52 patch is postponed until 9.1.x.

1. Created a new table that has only two columns, nid and vid. The intention of this table is that it will contain the latest nid vid pair and that there will only be one record for a nid to keep the table small.

Created MySQL Triggers to update the table with the latest nid-vid upon every insert/delete to node_revision. Instead of a trigger, this part can also be achieved using hook_entity_insert/delete in a custom module.

2. Patch the query to join to this new table instead of a self join to node_revision table.

ravi.shankar’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 85: drupal-core-node-revision-lenovo.patch, failed testing. View results

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

esolitos’s picture

We have also encountered this issue on a multilingual site (5 languages) which heavily uses Paragraphs with revisions enabled and workflows (with of course QuickEdit / Contextual links enabled). We were experiencing query times of up to 60 seconds with reasonably powerful database servers.

We briefly tested the patches but er encountered some issues, and the stored procedure suggested in #85 didhn't really help, probably because we had over 200 000 paragraphs.

The only solution we found was to clear as many revisions as possible, thanks to node_revision_delete we were able to tweak the cleanup and go from 200k paragraphs to almost 1/10th, this of course is a temporary solution as the more the node count grows, the more the paragraphs will increase again.

Berdir’s picture

Wondering what kind of queries you have for "latest paragraph revision"? Because I don't think the paragraphs.module does that by default, maybe in combination with quickedit that's trying to find the editable entity? But paragraphs doesn't really support quickedit anyway, so maybe we can find a way to just completely opt out from that.

merauluka’s picture

Throwing my findings into the mix. After reading this article, it seems promising to use a subquery in the WHERE statement instead of adding an entire join against a potentially quite large table. When I performed this modification to the views query I was running (running directly into SQL, not through the UI) I found the query time was substantially smaller for a larger site.

Article: https://explainextended.com/2010/05/27/left-join-is-null-vs-not-in-vs-no...

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

yakoub’s picture

for @bryanmanalo on comment #85 your patch works against table `node_revision` but the concerned method `latestRevision` is required to work with any entity and not only the Node entity .
also note that `node` table already contains the latest vid value, so your new table and procedures are redundant .

moreover drupal doesn't use sql procedures for CRUD operations,
the reason in my opinion that it doesn't allow abstraction and extensiblilty by object oriented code .

yakoub’s picture

a join is required @merauluka since the query does not execute against a single entity id which means result set may contain multiple entity record .

yakoub’s picture

this whole development branch from issue#2864995 is very wrong, the latestRevision method should never have been introduced .

first of all the joining of tables happens already here : https://git.drupalcode.org/project/drupal/-/blob/9.2.x/core/lib/Drupal/C...
and here : https://git.drupalcode.org/project/drupal/-/blob/9.2.x/core/lib/Drupal/C...

which implements functionality of "all revisions" vs "current revision" .
for example if all revisions is needed then base table is `node_revisions`, while if current revision only then base table is `node` which has column vid for the current revision .

there is no latest revision logic in core, the base table for `node` entity for example contains the current revision column vid so no need for additional sql filtering to get it.
if some contrib module wants to introduce new revision moderation logic, then it should add new columns to the base table of every entity or add a new table of its own to follow the revisions logic it wants to support, this is not core's responsibility .

drupal core has "current revision" logic, and that is it .
if someone want to add new logic for selecting revisions then core is not expected to support it .

note that test introduced by that issue uses `$entity->isDefaultRevision(FALSE)` deliberately to break the node table vid column, which is supposed to host the current (latest) vid value .
instead of breaking vid column on base table, add a new column `vid_draft` or `vid_latest` and save your logic there .

yakoub’s picture

generally the moderation of entity revisions requires a "finite state machine" table, and that is where the relevant revision id should be saved for every moderation state record .
you can not achieve revision moderation by bad performance sql joining of the revision table .

Sam152’s picture

@yakoub, I believe there is an existing issue for what you are describing.

yakoub’s picture

@Sam152 i will read that issue,
but from my current findings the "latest revision" can be found after enabling the `content_moderation` and then the table `content_moderation_state_field_data` contains column `content_entity_revision_id` which is exactly the "latest" revision described in original issue#2864995 .

if core Entity Query doesn't support querying the table of content_moderation, then it should be extended to allow so or instead use a new specialized content_moderation Query class .
but this sql "left join" + "isnull" approach is not only bad performance but also inconsistent with what "latest revision" may mean for different moderation contexts .

yakoub’s picture

either way all the group by aggregate max patches are wrong in my opinion because it might prevent further filtering on other fields and and joining referenced entities .
EDIT: actually i just tried it and it worked, so never mind .

yakoub’s picture

there is a separation between Query and QueryAggregate under namespace Drupal\Core\Entity\Query\Sql
therefore introducing aggregation to the Query class like in above max() patches seems wrong .
if you want to aggregate to reach the latest revision, then use QueryAggregate and not Query .

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Anybody’s picture

neclimdul’s picture

So i spent my morning looking at this problem and going back and forth through the solutions trying to figure out how to make then work but something just wasn't sitting right? The outer join just felt wrong so I started trying to piece together _why_ it was doing it and now I feel a bit dumb asking this 3 years later but is this logic even correct? I mean its correct for what its doing but why would you ever care if a revision is the latest? Latest doesn't actually have any meaning.

Since quickedit seems to be the key trigger of this lets use it as an example and why this logic seems broken. In the alter hooks it checks if the entities being rendered are the "latest revision" and only if it is it adds the quick edit links. #2903524: Don't add quickedit to displays where entities have a forward revision. where at least some of this logic spawns from shows this was added because quickedit itself doesn't take the revision into account and acts on the current revision so iterating with it on other revisions overwrites and destroys the current revision. Easy enough.

The problem is that the "latest" logic here is _not_ the active logic. You can recreate this pretty easily.

1) Install 9.3
2) Enable content moderation.
3) Attach the content moderation workflow to the article type.
4) Create a published article. Quickedit should work.
5) Edit and save the update as a draft. Quickedit breaks.

The new draft is the latest but it is _not_ the active. Worse, if you click the latest tab you'll actually _see_ the quickedit attributes added to the html. This only seems to not function and clobber the existing content because I guess contextual links are kinda working correctly and not showing up here? Not sure.

Anyways, yeah, the performance is terrible but it seems like we shouldn't care and move everything to use a more accurate method.

neclimdul’s picture

2hrs ago me was very naive. Those observations are correct but the implications weren't clear to me at the time. I was basically coming to the conclusion that amateescu was right all along and the assumptions around revisions are problematic and open up a slew of workflow/content moderation issues. specifically i guess the rabbit hole mentioned in #25.

So much for a fresh set of eyes seeing a way to buckled down and knock out this issue.

Alex seems to be 100% right the optimizaion falls apart and completely misses the mark on the issue's goal since the main use of the prepare logic is from ContentEntityStorageBase::getLatestRevisionId and similar methods which pass the entityid as a condition. Maybe we can special case just that one condition but the entire optimization is just getting super brittle.

Based on the struggles of those issues though I'm still wondering if maybe there's some stop gap between now and a revamped revisioning system that can optimize "get the latest revision id" apart from "apply a highly generalized filter for the latest revisions" and at least address the bulk of the critical performance issues at the root of this.

Berdir’s picture

Maybe we could have a separate table that tracks the latest revision id per entity id and language (if an entity type is translatable). Could be an optional but standard feature of the entity storage handler. Not sure if we can mix that with other conditions/entity query, but we could maybe use it at least for the common use cases of just getting the latest revision of a give entity? Updating existing entity types to add that is certainly going to be tedious, but since it's entirely optional, we could maybe do that as some sort of cron index thing combined with a flag so we know when it's ready to be used?

Not great, but might be a feasible thing to implement while I'm not sure if and when we're going to see certain major entity revision storage related improvements land.

imclean’s picture

Maybe we could have a separate table that tracks the latest revision id per entity id and language (if an entity type is translatable)

#23 links to an issue where I suggested something similar for field revisions #2960887-7: Do not create field revisions when field data hasn't changed:

Rather than assume the latest revision is the correct one perhaps an index could be kept, at the expense of added complexity. For example, storing the corresponding ER revisions in a separate table.

There's some further discussion of the idea in that issue.

Fabianx’s picture

I agree that the root cause is #2864995: Allow entity query to query the latest revision.

This method makes no sense, because the latestRevision is indeed meaningless, e.g. once you add workspaces or any kind of publishing that's not just making a former draft revision "live".

latestRevision is meaningless, because it is very much dependent on the context of things and how the revisions table is used.

Hence it is only useful for content moderation right now, which is a really limited use case -- but if only that is used then yes, the latest revision is the one that is highest -- but even then probably one query per entity, which just returns the highest vid is likely much faster instead of this JOIN as MAX() is a really fast operation on a database, but a LEFT OUTER JOIN is not as it's a cross-join on all revisions.

This behavior even shows up in an umami loadtest, which does not have many revisions.

As for the quickedit use case, unless I miss something usually I want to edit what I see -- regardless if that is in a workspace, in a draft, a published version or anywhere else.

So maybe one solution is that quickedit should not call this method in the first place -- regardless of the performance of the query.

Edit:

A tracking table is nothing else than a perfect cache. Right now one of the problems of that query in how it's used is also that quickedit calls it with just one entity id (unless I miss something) and calls it again and again, but it's not really optimized for that.

However this would be perfectly cacheable (even with Drupal's normal cache).

So there is several things:

1. Philosophically: Does it make sense to keep it or should it be content moderation specific? Do we optimize the 95% case or the 5% case? (e.g. in CPS we did the deliberate decision to do two entity loads when loading a specific revision from a workspace -> makes the 5% case slower, but has the property of leaving the 95% case untouched)

2. Quickedits usage of this method: Does it even make sense? Or can it be removed.

3. Practically: Let's cache the quickedit query using appropriate entity cache tags and maybe the problem is solved for most practical cases.

4. Practically: Optimize for the less entities loaded case and figure out which revisions to load first, then add those manually. (could even be a temp table that can be joined -- those can be much faster still than a slow join -- or a IN() condition)

4. Perfectly: Let's ensure we never need to run that query (might not even be needed once cache is warm)

Edit 2:

#2903524: Don't add quickedit to displays where entities have a forward revision. is not even to blame as the used query is really sensible and likely a much better candidate to bring back:

+  $revision_ids = $entity_type_manager
+    ->getStorage($entity->getEntityTypeId())
+    ->getQuery()
+    ->allRevisions()
+    ->condition($entity_definition->getKey('id'), $entity->id())
+    ->sort($entity_definition->getKey('revision'), 'DESC')
+    ->range(0, 1)
+    ->execute();
+  return $entity->getLoadedRevisionId() == array_keys($revision_ids)[0];

e.g. a SORT is a good idea, the range() is a good idea and the condition ensures this is really easy. As long as an appropriate index is on there (PRIMARY is enough) this is easy. Even easier is to use MAX(), but the explain shows that MySQL / MariaDB is smart enough to end up with the same explain in the end.

So the simplest to solve this is to NOT use the latestRevision() interface on the getQuery(), but a specialized case for just one entity_id.

Berdir’s picture

Unless you use workspaces, "I want to edit what I see" is just not how editing entities works at the moment. You can't edit a published revision right now if there is a draft. Content moderation and how it works might on paper just be one of many possible use cases, but it is by far the most common one, and this concept of latest revision editing is baked into core API's, not just content moderation.

And sure, what I'm proposing is a sort of cache aka normalized table, the difference to our regular cache API is that we would, after a possible initial index period, update it on write and never have to run those original complex queries in the first place. So again, yes, in theory it is a cache, but it's something quite different from how caching works in practice in Drupal for most things. There is no cache clearing, no cache misses/building up a cache, beside the updating existing sites use case.

Fabianx’s picture

Unless you use workspaces, "I want to edit what I see" is just not how editing entities works at the moment. You can't edit a published revision right now if there is a draft. Content moderation and how it works might on paper just be one of many possible use cases, but it is by far the most common one, and this concept of latest revision editing is baked into core API's, not just content moderation.

Why can't I edit the draft instead? Is that not what I want as a user with a big message that this is an existing draft and if I need to start fresh I need to do something about it?

And sure, what I'm proposing is a sort of cache aka normalized table, the difference to our regular cache API is that we would, after a possible initial index period, update it on write and never have to run those original complex queries in the first place. So again, yes, in theory it is a cache, but it's something quite different from how caching works in practice in Drupal for most things. There is no cache clearing, no cache misses/building up a cache, beside the updating existing sites use case.

Yes, I was being pragmatical, but further analysis revealed that the query of getLatestRevisionId() can be fixed quite easily.

It's a severe performance regression when latestRevision() was introduced and while that query is fixable [and horrible right now, because the isNULL means that all those columns need to exist to then be blown away -- why not use an INNER JOIN so that those rows never appear in the first place?], it's much less of a concern if the only user is:

getLatestRevisionId(), which can just use the original specialized query to solve the critical issue. (see my edit)

Fabianx’s picture

To optimize the original query the simplest is to transform it to:

SELECT base_table.nid, MAX(base_table.vid) FROM node_revision base_table GROUP BY nid;

which can use an index:

+------+-------------+------------+-------+---------------+-----------+---------+------+------+-------------+
| id   | select_type | table      | type  | possible_keys | key       | key_len | ref  | rows | Extra       |
+------+-------------+------------+-------+---------------+-----------+---------+------+------+-------------+
|    1 | SIMPLE      | base_table | index | NULL          | node__nid | 4       | NULL |   40 | Using index |
+------+-------------+------------+-------+---------------+-----------+---------+------+------+-------------+

It would just mean that using another "group by" with latestRevision() could be incompatible [though any sorting adds the entity_id anyway as one group], so I think that should be fine.

The nice thing is that once this is filtered by rows (to just one entity_id), it only looks at "number of revisions" rows in the index, which means that:

SELECT base_table.nid, MAX(base_table.vid) FROM node_revision base_table WHERE entity_id = 42 GROUP BY nid;

is the same as:

SELECT base_table.nid, MAX(base_table.vid) FROM node_revision base_table WHERE entity_id = 42;

is the same as:

SELECT base_table.nid, base_table.vid FROM node_revision base_table WHERE entity_id = 42 ORDER BY vid DESC LIMIT 0,1;

which is the original query that worked well. So if that query works, then we don't even need to specialize the query for getLatestRevisionId() anymore. Given how often it's called it's probably _still_ a good idea to add a cache - because ideally during a typical Drupal read request we only want to hit redis / memcache.

While a tracking table is also still a good idea, fixing the query for now is likely the better approach given this is critical.

Berdir’s picture

You still need to include the language aspect there, as each language can have a different draft and revision id as its latest version.

Fabianx’s picture

Here is a patch, which does the trick.

diff --git a/core/lib/Drupal/Core/Entity/Query/Sql/Query.php b/core/lib/Drupal/Core/Entity/Query/Sql/Query.php
index de25f69358..f9431f0a76 100644
--- a/core/lib/Drupal/Core/Entity/Query/Sql/Query.php
+++ b/core/lib/Drupal/Core/Entity/Query/Sql/Query.php
@@ -132,8 +132,15 @@ protected function prepare() {
     // Add a self-join to the base revision table if we're querying only the
     // latest revisions.
     if ($this->latestRevision && $revision_field) {
-      $this->sqlQuery->leftJoin($base_table, 'base_table_2', "[base_table].[$id_field] = [base_table_2].[$id_field] AND [base_table].[$revision_field] < [base_table_2].[$revision_field]");
-      $this->sqlQuery->isNull("base_table_2.$id_field");
+      // @todo Make it possible to put an expression before a field in core.
+      unset($this->sqlFields["base_table.$revision_field"]);
+      unset($this->sqlFields["base_table.$id_field"]);
+
+      $this->sqlQuery->addExpression("MAX(\"base_table\".\"$revision_field\")", $revision_field);
+      $this->sqlQuery->addExpression("\"base_table\".\"$id_field\"", $id_field);
+
+      $group_by = "base_table.$id_field";
+      $this->sqlGroupBy[$group_by] = $group_by;
     }
 
     if ($this->accessCheck) {

As it depends on fetchAllKeyed() there is two ways to get this to work, neither are really pretty:

- Ensure that in core expressions can come before normal addField() fields
- Use expressions for all fields instead if latestRevision is asked for [like I do here, as sqlFields seems to be internal this might be okay to just do in "finish()" method?

- Or ensure that result() is not using fetchAllKeyed() if latestRevision() is used and find `vid_max` or similar manually.

Overall it feels that likely doing this transformation later is better, but I stop here as I feel there is enough for the Community to work on to get this properly fixed.

---

With 5000 revisions this is a 100% performance improvement (from 73 seconds down to ~ 100 ms).

So well worth it.

Fabianx’s picture

Status: Needs work » Active

#111: getLatestTranslationAffectedRevisionId() is not affected by the slowness as it uses the old sort method. So ironically it might be a bug in quickedit of using the wrong function as it directly calls getLatestRevisionId(), but does not take care of the language ...

Overall the MAX should work well also for additional conditions however.

amateescu’s picture

@Fabianx,

It would just mean that using another "group by" with latestRevision() could be incompatible [though any sorting adds the entity_id anyway as one group], so I think that should be fine.

#70 provides some extra test coverage which shows that this approach is not working for all cases.

Also, for the @todo in your MR, see 2950869-28-change-entity-query-to-use-expressions-instead-of-fields.txt from #30 :)

Fabianx’s picture

#79: That's incorrect, your subquery is much better already, lets look:

Let r be the number of revisions and n the number of nodes.

The current solution has always O(r^2) and O(n^2) complexity, which means if you have 5000 revisions, you need to create a table with 25000 rows, then filter it. It also cannot be optimized further.

Your solution has O(n) complexity but is not affected by the number of revisions, because while the subquery needs to take the MAX() of all vids and group it by nid, we only need to filter the vid table once and MAX() is a fast operation.

And often this can be materialized by the database.

With millions of nodes this might get slower, but still has linear vs quadratic complexity (so still a n-times improvement).

It is however the worst case for when the entity_id is specified, though a simple hack could be to include any simple id_field conditions to optimize for that special case in the subquery (if the entity ids are passed as IDs e.g. as in the quickedit case).

This also is a generic replacement for the original code and not only for simple queries as it filter the node_revision table to just the latest revisions.

So it's already much much much better than what exists currently.

---

Another way is to use:

SELECT base_table.vid AS vid, base_table.nid AS nid FROM node_revision base_table INNER JOIN (SELECT nid, MAX(vid) AS max_vid FROM node_revision GROUP BY nid) subquery ON base_table.nid = subquery.nid AND base_table.vid = subquery.max_vid;

which only profits from the case where the base_table.nid is specified and else also needs a filesort, but the subquery with the hack to include the entity_id is probably more profitable anyway.

Given how hard this is to fix, I suggest:

1. Fix getLatestRevisionId() to use a dedicated query again using sort(DESC) and limit -> fixes quickedit
2. Get the subquery approach from #79 into core -> O(n) is bad, but O(n^2)+O(r^2) is way way way worse even with as little as 5000 revisions.
3. Work on dedicated mapping table entity_latest_revision and track any non-default entity write there, join it in once it exists.

Unless 3 happens fast, 1 and 2 are still good in-between solutions.

Fabianx’s picture

Status: Active » Needs review
Issue tags: +Needs tests

Someone should add the tests from alexpott, but besides that, this MR is ready for review.

Fabianx’s picture

Issue tags: -Needs tests

Thx ravi!

This should be ready now!

catch’s picture

Changes here look very straightforward but do we have a follow-up to look at the quickedit usage?

neclimdul’s picture

Just a note, there is actually another bit core that triggers this performance problem. Content moderation hacks routes in its route subscriber to force moderated entity type routes to load the latest revision. This means that local task building for nodes ends up triggering this behavior as well. Pretty sure it uses getLatestRevisionId by way of the translation method so it would get the optimized improvement here.

I think this is generally hidden by caching and I didn't bring it up because this was already critical and active so didn't add anything but if we're looking for things to test the impact that's one of them.

neclimdul’s picture

      $this->sqlQuery->condition("base_table.$revision_field", $revision_subquery, " IN ");

" IN "should use single quotes

Fabianx’s picture

daffie’s picture

Status: Needs review » Needs work
saranya ashokkumar’s picture

daffie’s picture

@saranya This patch uses a merge request instead of a patch. For more info about merge request see https://www.drupal.org/docs/develop/git/using-git-to-contribute-to-drupa...

dksdev01’s picture

Any update on this issue? thanks

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

stephencamilo’s picture

Status: Needs work » Closed (won't fix)
dpi’s picture

James Marks’s picture

The patch in #124 solved a site timeout problem caused by hung queries on a site with 600+ nodes but more than 1,000,000 revisions. It would be good to get this merge request resolved

ravi.shankar’s picture

Added reroll of patch #124 on Drupal 9.4.x.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

pooja saraah’s picture

Fixed Failed commands on #131
Attached patch against drupal 9.5.x

Rassoni’s picture

Status: Needs work » Needs review
FileSize
13.3 KB
8.8 KB

Fixed Failed commands on #133

Status: Needs review » Needs work

The last submitted patch, 134: 2950869-134.patch, failed testing. View results

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

heddn’s picture

Issue tags: +Needs reroll

This needs to be rebased on 10.1.

_pratik_’s picture

Assigned: Unassigned » _pratik_
_pratik_’s picture

Assigned: _pratik_ » Unassigned
Status: Needs work » Needs review
FileSize
13.16 KB

Reroll for 10.1.x
Thanks

Anybody’s picture

@_pratik_: Patch fails to apply. Also it would make a lot more sense to update the MR and fix the review comments instead of creating further patches. Thanks!

neclimdul’s picture

Patch conflicts because there's a bunch of unrelated changes to EntityQueryTest. Merge request seems fine and doesn't need a reroll. Also, I think the merge request is actually the newer code so should we even be re-rolling it? Fabian branched his development in to the merge request but for some reason the patch keeps getting rerolled and I'm not sure why.

jonathan1055’s picture

The patch gets re-rolled probably partly because the patch files are displayed at the top of the issue, ahead of the MR. Also not every developer has the knowledge for how to pull a MR branch, update, and push back. However, we must encourage everyone to use the Merge Request functionality, as it is so much easier to keep the changes up-to-date. I've hidden the files, and added a note in the issue summary.

smustgrave’s picture

Status: Needs review » Needs work
Issue tags: +Needs Review Queue Initiative

This issue is being reviewed by the kind folks in Slack, #needs-review-queue-initiative. We are working to keep the size of Needs Review queue [2700+ issues] to around 400 (1 month or less), following Review a patch or merge request as a guide.

@jonathan1055 is correct. If an issue is being worked on in the MR it just adds noise to continue with patches. MRs are also easier to review.

At this time we will need a MR for D10 though addressing the open threads in the current. Recommend starting a new one vs updating the existing.

Also starting in #134 the patch size almost tripled.

lucassc made their first commit to this issue’s fork.

lucassc’s picture

Status: Needs work » Needs review

Hi! I opened a MR for 10.1.x, which was not available first from the "Create new branch" link at the top of the issue. So I followed this doc to add 10.1.x as the basis for my issue branch 2950869-optimizing-revision-queries.

Not sure if this is the correct way, I'm novice.

Please, review.

lucassc’s picture

We had comments in the old MR that still need to be discussed.

core/lib/Drupal/Core/Entity/Query/Sql/Query.php:

// Add a self-join to the base revision table if we're querying only the
// latest revisions.

@daffie said "This comment needs an update as it no longer reflects what we are doing."

core/tests/Drupal/KernelTests/Core/Entity/EntityQueryTest.php:

$results = $this->queryResults = $this->storage
  ->getQuery()
  ->latestRevision()
  ->notExists("$figures.color")
  ->accessCheck(TRUE)
  ->execute();
$expected = [16 => '4', 8 => '8', 20 => '12'];
$this->assertSame($expected, $results);

// Update an entity.
$entity = EntityTestMulRev::load(4);
$entity->setNewRevision();
$entity->$figures->color = 'red';
$entity->save();

$results = $this->queryResults = $this->storage
  ->getQuery()
  ->latestRevision()
  ->notExists("$figures.color")
  ->accessCheck(TRUE)
  ->execute();
$expected = [8 => '8', 20 => '12'];
$this->assertSame($expected, $results);

@daffie said: "We need to do a change that keeps it in the result and therefore we see the changed revision id. The current change removes it from the result."

@neclimdul replied "I'm not sure what this is asking. What result would we be able to see the revision ID in?"

catch’s picture

Priority: Critical » Major

Now that quickedit is no longer in core, this is less likely to cause timeouts during normal site operation - so moving back down to major, but of course we should still fix it.

smustgrave’s picture

Status: Needs review » Needs work

Removing reroll as @lucassc you seemed to get it correct.

But there does appear to be a test failure.

magtak’s picture

I work for Acquia Support and I came across this issue while debugging a use case where all the revisionable entities (Site Studio layout_canvas entities) were set to be sequentially re-saved (Site Studio rebuild).

Looking at the query,

...LEFT OUTER JOIN "cohesion_layout_revision" "base_table_2" ON "base_table"."id" = "base_table_2"."id" AND "base_table"."revision" < "base_table_2"."revision"...

The LEFT OUTER JOIN and the '<' conditional operator will match rows proportionate to the number of revisions an ID has, this will create new rows in an exponential way.

For the case I was looking at in particular, a 10h+ operation that resulted in 1000+ second SQL queries at times which locked the revisions table and thus blocked any content editing, finished in about 35 minutes with the patch applied with barely any queries taking over 1 second. The items generating those 1000+ second queries were processed instantly.

I checked our fleet's logs and there are multiple sites impacted by this as I can see thousands of slow ( > 1s) queries generated by the code above.

Patch #133 worked for the client I was working on given their Drupal version (9.5.x).

While we will be advising clients to use patches from this thread if they encounter this issue, this fix should be high priority and should be backported to all affected Drupal branches (8, 9, 10).

Dries’s picture

Great to see we are getting closer to a solution.

In addition to improving the queries, it might not be a bad idea to show a warning on /admin/reports/status when there are entities with more than $number revisions. At a minimum, we should inform a site administrator about this.

As a next step, I would not be opposed to a new feature in core to prune revision after $number days/weeks/months/years/never. We should tackle that in a separate issue though.

Anybody’s picture

Hey @Dries, cool to have your feedback here! :)
+1 on that (while the limit should be configurable in config, at least in settings.php)

I guess that should be combined with the work from: #2925532: Provide bulk action to delete all old revisions of an entity
#53 there please.

So if you create or find a task for that revision prune in core, please link both. It would really make a lot of sense to at least have a full-featured Entity revisioning API in core. So contrib can implement more specific concepts, using a consistent API for all entity types. Like bulk deletion manually (checkboxes in revisions tab and as views bulk operation), by age, by limit, etc. if that's too specific for core.

Thanks :)

heddn’s picture

As FYI, https://www.drupal.org/project/node_revision_delete has this feature in contrib. The hardest part of removing revisions is when the site has multiple languages installed. The way revisions are stored w/ multiple languages is complicated to say the least. You can't just remove the last 5 revisions of a node. Because you'll loose a revision that is published as an alternative language.

joel_osc’s picture

Also as an FYI we are experiencing performance issues due to revisions and wanted to use the well-written node_revision_delete module, but found that not only was multilingual problematic, things got worse with content moderation and trying to ensure that you don't prune forward revisions. Also, we had a requirement to preserve all historic 'published' revisions in all languages. We have contributed this module: https://www.drupal.org/project/cm_revision_delete to help meet these requirements. Cheers!

magtak’s picture

I second @heddn 's concern about multilingual sites.
While working on this bug I used both

- node_revision_delete and
- entity_reference_revisions (*)

... to prune both node revisions ( (*) but also "orphaned entities" that I believe do *not* get deleted when the node revision does) and it did remove revisions in a way that wouldn't make sense for the governance of multilingual sites.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

rp7’s picture

FWIW, I work on a high traffic Drupal installation & the DB CPU usage skyrocketed to 100% after applying the patch from https://git.drupalcode.org/project/drupal/-/merge_requests/715. The DB (and the site) went down eventually.

This was (a part of) the query causing the high CPU usage (for some reason the AWS RDS UI won't show me the whole query):

SELECT "base_table"."vid" AS "vid", "base_table"."nid" AS "nid"
FROM
"node_revision" "base_table"
INNER JOIN "node" "node" ON "node"."nid" = "base_table"."nid"
LEFT JOIN "content_moderation_state_field_data" "cmsfd" ON cmsfd.content_entity_type_id = 'node' AND cmsfd.content_entity_id = base_table.nid AND cmsfd.content_entity_revision_id = base_table.vid
WHERE ("base_table"."vid" IN (SELECT MAX(base_table.vid) AS "expression"
FROM
"node_revision" "base_table"
GROUP BY base_table.nid)) AND ("node"

The project has about ~500K node revisions. It's multilingual & uses content moderation.

We do have a hook_query_entity_reference_alter() implementation doing the following though:

    $entity_type_id = $query->getMetaData('entity_type');
    $entity_type = $this->entityTypeManager->getDefinition($entity_type_id);
    if (!$this->moderationInformation->isModeratedEntityType($entity_type)) {
      return;
    }

    // Alter the query in order to limit the returned results to entities of
    // which their current revision is either 'published' or 'draft'.
    $query->addJoin('LEFT', 'content_moderation_state_field_data', 'cmsfd', "cmsfd.content_entity_type_id = :entity_type_id AND cmsfd.content_entity_id = base_table.{$entity_type->getKey('id')} AND cmsfd.content_entity_revision_id = base_table.{$entity_type->getKey('revision')}", [
      ':entity_type_id' => $entity_type_id,
    ]);
    $moderation_state_condition = $query->orConditionGroup();
    $moderation_state_condition->condition('cmsfd.moderation_state', [
      'published',
      'draft',
    ], 'IN');
    // Add a NULL condition on the moderation state so we don't exclude entities
    // of bundles that are not moderated.
    $moderation_state_condition->isNull('cmsfd.moderation_state');
    $query->condition($moderation_state_condition);

So probably there's an incompatibility that I haven't figured out yet. Just wanted to post this to inform people (which are doing custom stuff like above) to be cautious.