Problem/Motivation
This is a follow-up for: #2753971: ContentEntityStorageBase::deleteRevision() function does not remove node_field_revision entries. Some sites could have revision data in the {node_field_revision}
table that was not deleted because of #2753971.
In other words: revisions that have been deleted, have actually been deleted from {node_revision}
, but not from {node_field_revision}
. (Or, generally speaking: {ENTITY_TYPE_ID_revision}
and {ENTITY_TYPE_ID_field_revision}
.)
#2753971 was fixed in Drupal 8.3.2 (see https://www.drupal.org/project/drupal/releases/8.3.2), and so any site that had lots of revisions deleted before it updated to 8.3.2, will have lots of stale entries in {ENTITY_TYPE_ID_field_revision
, which this upgrade path will clean up.
Proposed resolution
- Write an upgrade path that delete revisions from
{node_field_revision}
that don't actually match with any of the revisions in{node_revision}
. - Make this upgrade path use the Batch API to avoid possible performance issues on sites with heavy use of revisions (and hence potentially many stale records in
{node_field_revision}
that need to be deleted).
Remaining tasks
Write the patchWrite tests
User interface changes
None
API changes
None
Data model changes
None
Comment | File | Size | Author |
---|---|---|---|
#69 | 2869568-69.patch | 8.93 KB | quietone |
#69 | interdiff-68-69.txt | 3.38 KB | quietone |
#68 | 2869568-68.patch | 7.93 KB | quietone |
#68 | interdiff-51-68.txt | 5.74 KB | quietone |
#51 | interdiff-2869568-49-51.txt | 3.1 KB | dagmar |
Comments
Comment #2
catchThanks. Bumping this to major and bug report since it's at least possible that entity queries with allRevisions() will return invalid results.
Comment #3
Eric_A CreditAttribution: Eric_A commentedComment #4
cilefen CreditAttribution: cilefen commentedComment #5
BerdirHere's a first patch that does this.
Tested this locally with a node with some deleted revisions that also had paragraphs, so two entity types with revisions. I also set the batch size to 1 to test that this works.
Downside is that could be fairly slow on a site with a lot of nodes, I'll test it soon on a big site.
Comment #7
Berdirwrong core version.
This is a bit interesting, actually, wondering how we're going to add this to 8.3 and 8.4, due to the different update function number, so most/all sites would end up running it twice? We could add it as the same 83xx update in 8.4 as well maybe?
Comment #9
BerdirComment #10
BerdirAnd here's a basic test.
Comment #11
BerdirThis should fix that test fail.
Comment #12
BerdirDid test the update function an site with 600k nodes. Worked fine but only found a handful of rows to delete, clearly deleting revisions is not done often on that site.
The query is pretty slow, took about 2.4s, so if you have a look of deleted revisions on a site, this might take a considerable amount of time. Without the order, it was about 0.9s, with just the sort on nid, it was about 1.1s. Which is actually enough for what's needed here (optimize the deletions. It might also be possible to write a delete query which uses a set of OR conditions but not sure if that's worth the complexity.
Comment #16
dawehnerUsing a number to reference to an entity type is a bit weird, to say the least. Maybe renaming 'current' to something more descriptive could help.
Comment #17
BerdirThis is exactly the same logic as in system_update_8400(). Agreed that it's a bit weird, but it's a list of entity type ids with keys and this is the key. We could do something like current_key. But I think it's a pretty nice way of looping through that list I think.
Comment #18
dagmarSome small comments in addition to to the two messages from the syntax checker:
var_dump here
Two extra lines here
$database
is defined before, we could use that right?This can fit into two lines.
Comment #19
BerdirThanks for the reviews. Cleaned that up and added some more comments.
Comment #21
BerdirTestbot, I'm so very sorry.
Comment #22
dagmarIs this correct? you are using $sandbox['current'] inside the
if()
but $sandbox['current_key'] in the rest of the code.Also, would be nice to trigger the tests on Sqlite and PostgreSQL just to be sure this work fine in all the backends.
Comment #23
Berdir#NotMyDay ;)
It works for most tests as they only have one matching entity type, so we're finished after one round anyway. But yeah, obviously it is going to cause another endless loop on at least some tests.
Comment #25
dagmarThanks Berdir. This is almost RBTC for me, but I have some extra comments that in my opinion can improve the code quality.
I can't find where this is used. I think we can delete it.
Would be nice to have at least one more revision "not the default" and ensure is not deleted.
Sorry but this seems a bit cryptic to me. Specially the return with all those conditions. Any chance you can make it more clear?
Comment #26
Berdir1. Yeah, all those properties and the trait was copied, not needed indeed.
2. 9 is actually not the default revision, ther's also 10. I've added asserts for 10 now as well, which also has translations.
3. Not exactly sure what the to change, this is basically exactly the same pattern as in the update function above, I added some more comments for now.
Comment #28
BerdirThat fail was unrelated.
Comment #29
dagmarThanks Berdir. Some final comments.
Extra debug here.
This can fit into:
I think this can be simplified to
Comment #30
BerdirThis is getting embarassing, thanks for all the reviews.
Fixed 1 and 2, I'm not sure that your suggestion for 3 is simpler, I kind of like the additional structure of my approach and it is the same as the one from the update function above.
I also talked to @alexpott and he said we only need to add this to 8.4.x and not backport to 8.3.x unless we have reports about those inconsistencies breaking sites.
Comment #31
dagmarI tested this from creating a node in 8.3.x, deleting some revisions, upgrading to 8.4.x with the patch and running the updates. Is working as advertised.
Comment #32
tstoecklerOnly one of these is actually being asserted on.
Not sure this is correct due to the fallback we do in SqlContentEntityStorage. I would think that entity types that do not specify a revision data table but are revisionable and translatable still suffer from this.
entiy -> entity
Also, shouldn't this use $settings['entity_batch_updates'] instead of hardcoding 100?
Why do we need the ID? Shouldn't revision IDs be unique regardless of IDs?
Comment #33
BerdirWell, as I said in #30, embarassing ;)
1. The first line was a copy & paste error, removed.
2. I was wondering about the falback, yes. I rewrote it a bit looking for revisionable and translatable but also making sure that those tables really exist in case your site is in a weird state.
3. .. entity is the hardest word...
4. Somehow that fact doesn't want to stay in my brain. Of course and that also means we can considerably simplify the update function.
Especially given 4, I'm not sure we should respect entity_update_batch_size. We are technically not doing entity updates but just raw delete queries and now only a single delete query for those 100 records. I'm even wondering if we want to increase that amount further now to 200 or so. I suppose we could use the setting with a different default but new sites have it by default in settings.php. Thoughts?
Now I'm bold and I'll upload the patch without re-running the tests locally, lets see how that goes :)
Comment #35
BerdirI knew that was a mistake.
Comment #36
tstoecklerNice!!!
Yeah you're right regarding entity_update_batch_size. Not sure if we should increase it to 200, but as you already point out above, while a lot of sites may have thousands or millions of revisions, I don't think many sites will have thousands or millions of deleted revisions. So I don't think it's a big deal either way.
Thanks!!
Comment #38
BerdirRandom fail?
Comment #39
Wim Leers:O \o/ Thanks!
Comment #41
catchI don't think this needs a CR, the table size is going to be significantly larger than the data tables after the update which is the main thing.
I thought about test coverage with a non-node entity type, but I think it's reasonable to test just with node here.
Committed 9a16601 and pushed to 8.4.x. Thanks!
Comment #43
xjmThis is failing on postgres (or it did at least once), so I've reverted it:
https://www.drupal.org/pift-ci-job/693265
Also, I had to go back and find and check on https://www.drupal.org/pift-ci-job/675973 as well. It's not the same but it might have been and would have meant a different way of approaching the fail. :) (@Berdir, when there are suspected random fails, please link the failing result and check #2829040: [meta] Known intermittent, random, and environment-specific test failures.)
In general, we really should test things that touch storage tables against all databases before commit. I'll queue a postgres test.
Thanks!
Comment #44
xjmAlso, should this issue be critical?
Comment #46
vijaycs85Had to reroll the patch again 8.5.x and it is
greenfails onPHP 7 & PostgreSQL 9.1
Comment #47
dagmarIt seems we have to be explicit about the order of the records. Lets try with this.
Comment #49
dagmarNow that everything fails consistently, this patch should fix the issue.
Comment #50
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedWe could put a
@see system_update_8404()
in this method's doc block.The first check here for
getKey()
seem unnecessary becauseisRevisionable()
does the same check internally.Nit: over 80 chars.
Another nit: It looks like "to" can be moved above without exceeding the 80 chars limit.
I think we should mention the actual name of the update function here (system_update_8404()) instead of a d.o link.
Comment #51
dagmarThanks @amateescu.
Comment #52
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedPerfect! :)
Comment #53
catchSince it's a batched update, do we want to add 101 test rows to the data for the update test to double check the batching works?
Comment #56
vickey CreditAttribution: vickey commentedRemoved irrelevant comments.
Comment #57
Berdir> Please do the needful.
From https://en.wiktionary.org/wiki/needful:
Commonly found in phrases such as "please do the needful", which occur commonly in Indian English but are held as archaic in other dialects. Global interactions between English speakers have to some extent led to these phrases being seen as stereotypical of Indian English and parodied by speakers of other dialects.
I'd suggest to avoid that sentence, it sounds very demanding and "not nice" to me.
I see how it would be a problem for you, but this is a patch that was not committed, so not sure why you even noticed this, there isn't really a reason to apply this to a site?
Comment #58
vickey CreditAttribution: vickey commentedKindly excuse me. I have removed my comments, will post it in relevant issue.
Comment #68
quietone CreditAttribution: quietone at PreviousNext commentedJust updating the patch.
Comment #69
quietone CreditAttribution: quietone at PreviousNext commentedAdded another 100 rows to the test data (#53) and more assertions on the data.
Comment #70
smustgrave CreditAttribution: smustgrave at Mobomo commentedApplied patch #69
Ran updb and that ran without issue.
Tests all green and was previously RTBC #52 so restoring that.
Comment #71
quietone CreditAttribution: quietone at PreviousNext commentedComment #73
catchChanges in #69 look good, committed/pushed to 11.x, thanks!
Comment #74
znerol CreditAttribution: znerol commentedSmall follow-up. The test added here doesn't run #3394186: SqlContentEntityStorageRevisionDataCleanupTest does not run.