Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
When a reference to a composite entity was removed in an earlier revision, it is not removed from the database on host delete.
Proposed resolution
Remove composite entities from the database on host delete, if the entities are not referenced anywhere else.
Remaining tasks
- Write a patch
- Review
- Commit
User interface changes
None.
API changes
None.
Data model changes
Composite entities are removed from the database on host delete, if the entities are not referenced anywhere else.
Comment | File | Size | Author |
---|---|---|---|
#40 | 2953650-40-38-interdiff.txt | 18.82 KB | mbovan |
#40 | 2953650-40.patch | 21.89 KB | mbovan |
| |||
#38 | interdiff-2953650-38-36.txt | 1.28 KB | mbovan |
#38 | 2953650-38.patch | 20.3 KB | mbovan |
#38 | 2953650-38-3016388-60-COMBINED.patch | 55.99 KB | mbovan |
|
Comments
Comment #2
idebr CreditAttribution: idebr at ezCompany commentedAttached test-only patch shows the current behavior.
Comment #4
BerdirYes, that's a known issue, but please provide a separate scenario for that in the test. There are several issues in paragraphs related to this.
The scenario should also test doing that with revisions and without, because the entity can only be deleted if all revisions that used it have been deleted.
Comment #5
BerdirSee also #2771531: [META] Remove obsolete composite entities from existing storage (garbage collection)
Comment #6
idebr CreditAttribution: idebr at ezCompany commentedOn host entity delete, the EntityReferenceRevisionsItem is no longer called for empty fields, so I implemented a hook_entity_delete.
#4 Added testCompositeDeleteAfterRemovingReference and testCompositeDeleteAfterRemovingReferenceWithRevisions to EntityReferenceRevisionsCompositeTest
I did not include an interdiff, since it is essentially a new patch.
Comment #9
idebr CreditAttribution: idebr at ezCompany commentedIt doesnt seem unreasonable to enable the Field module for the failing test.
Comment #11
mtodor CreditAttribution: mtodor at Thunder commentedNice work! The database looks way cleaner with this patch when paragraphed articles are deleted. ;)
I would just add one or two nitpicks.
It would be nicer to pull out
\Drupal::entityTypeManager()
outside foreach loop in separate variable.I don't know how the module works in details. So, I'm just concluding this from code => here we are checking for existence of
entity_revision_parent_type_field
andentity_revision_parent_id_field
, but not forentity_revision_parent_field_name_field
. Why not?And also, maybe there is no need for creating new variables.
In general, it looks good.
Comment #12
seanBRerolled patch #9
Comment #13
BerdirAs we actually have an entity here, the usual pattern is to just do something like this:
foreach ($entity->getFieldDefinitions() as $field_name => $field_definition) {
I think the performance impact of that is minimal, it's a loop over an array of objects and a single method call on them.
if ($field_definition->getType() != 'entity_reference_revisions') {
continue;
}
Comment #14
BerdirPlus, it would also work in case entity types have base fields, like the paragraphs_library_item entity.
Comment #15
johnchqueAddressing suggestions of comments above. :)
Comment #16
johnchqueAdding test only patch again. :)
Comment #18
BerdirSimilar to the manual cleanup issue, weird things could happen if an entity changes its parent.
Imagine a scenario where an entity switches the parent but then that new entity is deleted.
So you have P(arent)1.1 referencing C(omposite)1.1 (id 1, revision 1).
Now, for example using Paragraphs drag & drop, that composite entity is moved to a new parent P2, which is stored as a new revision, so now we have this:
P1.1 => C1.1
P2.1 => C1.2
Then P2 is removing the reference again in a ne revision, resulting in P2.2 not referencing it.
Now we delete P2, which triggers this hook which would find C1 as a referenced composite entity, but P1.1 still uses the old revision, which means we are not allowed to delete it.
Another example could be an incorrectly cloned entity where two entities point to the same composite.
So we would, again, need to check this revision-by-revison, but that could take a very long time. Or at least look for any other entity referencing it, also kinda slow.
One option would be to move the actual deleting into a background queue process.
Lets start by adding these cases as additional test coverage.
this change is possibly not needed anymore now as we no longer call field.module functions.
Comment #19
johnchqueAddressing changes from previous comments. :)
We will definitely need to create a revision checking for avoiding data lose.
Comment #21
johnchqueWorking on this. Will have a patch tomorrow. :)
Comment #22
johnchqueFirst attempt with a cron approach.
Comment #24
johnchqueForgot the interdiff.
Comment #25
johnchqueOK, updating tests, updating logic for adding the composite entities. :) One scenario failing since we are not yet using allRevisions in the query when adding to the queue. :) Will fix tomorrow.
Comment #27
Berdirthis is a plugin, so we can use createInstance() to inject this properly.
this query now also exists twice, so we could refactor it into a dedicated method as well, can also wait until the other issue is committed if that's easier.
this is to run cron through a http request, so we don't need that here.
Another case that we need is the reverse of this. Same until the end, but then delete the first node *first* and then run cron.
The entity should still exist, *but* the first revision should get deleted as that one is no longer in-use.
loading it again doesn't seem necesary, you still have $node from before.
what happens if we do this *without* first deleting the reference to it?
Does it get deleted then?
That would be a bug and would indicate that we want to adjust the existing delete logic to also use the new queue.
And if it doesn't get deleted at all, then we should also adjust it and add it to the queue there.
Comment #28
johnchqueSo addressing changes of previous comments.
About 6, yes, it gets deleted, so switching now to be added to the queue.
Also, fixed the first test method by running cron.
Found a problem with the tests, when creating a composite referenced by Parent1, then remove the reference in a new revision of Parent1, then create Parent2 with the revision. Both revisions of the composite get their parent id updated. Not sure how that even happens.
Comment #30
miro_dietikerThat would mean that we can not trust the parent id and all Paragraphs that have been moved across hierarchy have stale inconsistent data. This would confirm also why i experienced issues with reverting, but never figured out how to fully reproduce.
However, a follow-up could then also be to recalculate the parent field data in revisions. Otherwise we need to implement more delete-skip conditions to avoid stale references don't lead to data loss...
Comment #31
johnchqueWe just found out that the fields for the composite test entity are not revisionable. Will work on that too.
Comment #32
johnchqueAnd yes, we need to recalculate parent fields, however we would need to check all parent entities with ERR fields because if there was a change of parent, the parent_type field updated all revisions so we might not know who and what type the previous parent was.
Comment #33
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedConnecting with the related Paragraphs issues.
Comment #34
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedRerolled #28, fixed code style issues, fixed dependency inejction issues, cleanup.
This depends on #3016388: Manual cleanup process for obsolete composite entities so I am connecting it as a related issue.
Comment #36
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedFixed the queue ID.
Comment #38
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedThis patch should fix the test fails.
Comment #39
BerdirShould we maybe also improve the check here to support modules extending from us?
If so, I'm again worried about performance at this point, as we'd have to actually check every field. Maybe it's actually better to check the class and that it is an instanceof ERRItem?
strange class name/id, I guess copied from aggregator module. Maybe entity_reference_revisions_orphan_purger as ID and just OrphanPurgerWorker as class name?
this comment is probably copied, we don't have a batch here. Also would be good to reference the core issue.
We're loading all revisions of a single paragraph here, so wondering if in some extreme cases with hundreds of revisions, this could be an issue, but this is cron/queue, so it should be fine if it takes a few seconds. We've seen in the batch profiling that even with blackfire on, we can process ~300 paragraph revisions in about a second.
this part is almost identical to the code in \Drupal\entity_reference_revisions\EntityReferenceRevisionsOrphanPurger::deleteOrphansBatchOperation(), apart from the batch progress tracking there. Maybe we can put it in a public method on the service and return if we deleted and if just revision or the whole entity?
We seem to have several very similar/identical method descriptions, maybe we can improve that a bit?
Comment #40
mbovan CreditAttribution: mbovan at MD Systems GmbH commentedThanks for the review!
#39:
Comment #41
BerdirLooks good, committed.