Problem/Motivation

When a reference to a composite entity was removed in an earlier revision, it is not removed from the database on host delete.

Proposed resolution

Remove composite entities from the database on host delete, if the entities are not referenced anywhere else.

Remaining tasks

  1. Write a patch
  2. Review
  3. Commit

User interface changes

None.

API changes

None.

Data model changes

Composite entities are removed from the database on host delete, if the entities are not referenced anywhere else.

CommentFileSizeAuthor
#40 2953650-40-38-interdiff.txt18.82 KBmbovan
#40 2953650-40.patch21.89 KBmbovan
#38 interdiff-2953650-38-36.txt1.28 KBmbovan
#38 2953650-38.patch20.3 KBmbovan
#38 2953650-38-3016388-60-COMBINED.patch55.99 KBmbovan
#36 interdiff-2953650-36-34.txt1.96 KBmbovan
#36 2953650-36.patch19.44 KBmbovan
#36 2953650-34-3016388-58-COMBINED.patch54.65 KBmbovan
#34 interdiff-2953650-34-28.txt10.11 KBmbovan
#34 2953650-34.patch19.45 KBmbovan
#34 2953650-34-3016388-58-COMBINED.patch54.66 KBmbovan
#28 interdiff-2953650-25-28.txt9.49 KBjohnchque
#28 2953650-28.patch18.07 KBjohnchque
#28 2953650-28-COMBINED.patch50.34 KBjohnchque
#25 interdiff-2953650-22-25.txt6.02 KBjohnchque
#25 2953650-25.patch13.39 KBjohnchque
#25 2953650-25-COMBINED.patch45.52 KBjohnchque
#24 interdiff-2953650-19-22.txt6.28 KBjohnchque
#22 2953650-22.patch12.22 KBjohnchque
#22 2953650-22-combined.patch44.35 KBjohnchque
#19 interdiff-2953650-16-19.txt5.3 KBjohnchque
#19 2953650-19.patch9.24 KBjohnchque
#16 2953650-16.patch6 KBjohnchque
#16 2953650-16-test-only.patch3.59 KBjohnchque
#15 interdiff-2953650-12-15.txt2.96 KBjohnchque
#15 2953650-15.patch6 KBjohnchque
#12 2953650-12.patch6.84 KBseanB
#9 remove_obsolete_composite_entities-2953650-9.patch7.46 KBidebr
#9 remove_obsolete_composite_entities-2953650-9-test-only.patch2.52 KBidebr
#9 interdiff-6-9.txt1.07 KBidebr
#6 remove_obsolete_composite_entities-2953650-6.patch6.31 KBidebr
#6 remove_obsolete_composite_entities-2953650-6-test-only.patch2.52 KBidebr
#2 entity_reference_revisions-historical_composite_entities_not_removed-2953650-2-test-only.patch1 KBidebr
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

idebr created an issue. See original summary.

idebr’s picture

Attached test-only patch shows the current behavior.

Status: Needs review » Needs work

The last submitted patch, 2: entity_reference_revisions-historical_composite_entities_not_removed-2953650-2-test-only.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Berdir’s picture

Yes, that's a known issue, but please provide a separate scenario for that in the test. There are several issues in paragraphs related to this.

The scenario should also test doing that with revisions and without, because the entity can only be deleted if all revisions that used it have been deleted.

Berdir’s picture

idebr’s picture

On host entity delete, the EntityReferenceRevisionsItem is no longer called for empty fields, so I implemented a hook_entity_delete.

#4 Added testCompositeDeleteAfterRemovingReference and testCompositeDeleteAfterRemovingReferenceWithRevisions to EntityReferenceRevisionsCompositeTest

I did not include an interdiff, since it is essentially a new patch.

The last submitted patch, 6: remove_obsolete_composite_entities-2953650-6-test-only.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Status: Needs review » Needs work

The last submitted patch, 6: remove_obsolete_composite_entities-2953650-6.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

idebr’s picture

The last submitted patch, 9: remove_obsolete_composite_entities-2953650-9-test-only.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

mtodor’s picture

Nice work! The database looks way cleaner with this patch when paragraphed articles are deleted. ;)

I would just add one or two nitpicks.

  1. +++ b/entity_reference_revisions.module
    @@ -218,3 +220,68 @@ function entity_reference_revisions_form_field_ui_field_storage_add_form_alter(a
    +    $target_entity_type = \Drupal::entityTypeManager()->getDefinition($target_entity_type_id);
    ...
    +      $entities = \Drupal::entityTypeManager()->getStorage($target_entity_type_id)->loadMultiple($entity_ids);
    

    It would be nicer to pull out \Drupal::entityTypeManager() outside foreach loop in separate variable.

  2. +++ b/entity_reference_revisions.module
    @@ -218,3 +220,68 @@ function entity_reference_revisions_form_field_ui_field_storage_add_form_alter(a
    +    if ($target_entity_type->get('entity_revision_parent_type_field') && $target_entity_type->get('entity_revision_parent_id_field')) {
    +      $parent_type_field = $target_entity_type->get('entity_revision_parent_type_field');
    +      $parent_id_field = $target_entity_type->get('entity_revision_parent_id_field');
    +      $parent_name_field = $target_entity_type->get('entity_revision_parent_field_name_field');
    

    I don't know how the module works in details. So, I'm just concluding this from code => here we are checking for existence of entity_revision_parent_type_field and entity_revision_parent_id_field, but not for entity_revision_parent_field_name_field. Why not?

    And also, maybe there is no need for creating new variables.

In general, it looks good.

seanB’s picture

Rerolled patch #9

Berdir’s picture

+++ b/entity_reference_revisions.module
@@ -221,6 +222,71 @@ function entity_reference_revisions_form_field_ui_field_storage_add_form_alter(a
+
+  /** @var \Drupal\Core\Entity\ContentEntityInterface $entity */
+  $entity_reference_revisions_fields = entity_reference_revisions_load_field_storage_config($entity->getEntityTypeId());
+  foreach ($entity_reference_revisions_fields as $err_field) {
+    if ($entity->hasField($err_field->getName()) === FALSE) {
+      continue;
+    }

As we actually have an entity here, the usual pattern is to just do something like this:

foreach ($entity->getFieldDefinitions() as $field_name => $field_definition) {

I think the performance impact of that is minimal, it's a loop over an array of objects and a single method call on them.

if ($field_definition->getType() != 'entity_reference_revisions') {
continue;
}

Berdir’s picture

Plus, it would also work in case entity types have base fields, like the paragraphs_library_item entity.

johnchque’s picture

Addressing suggestions of comments above. :)

johnchque’s picture

Adding test only patch again. :)

The last submitted patch, 16: 2953650-16-test-only.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Berdir’s picture

Status: Needs review » Needs work
  1. +++ b/entity_reference_revisions.module
    @@ -222,6 +223,45 @@ function entity_reference_revisions_form_field_ui_field_storage_add_form_alter(a
    +    if ($target_entity_type->get('entity_revision_parent_type_field') && $target_entity_type->get('entity_revision_parent_id_field')) {
    +      $parent_type_field = $target_entity_type->get('entity_revision_parent_type_field');
    +      $parent_id_field = $target_entity_type->get('entity_revision_parent_id_field');
    +      $parent_name_field = $target_entity_type->get('entity_revision_parent_field_name_field');
    +      $entity_ids = \Drupal::entityQuery($target_entity_type_id)
    +        ->condition($parent_type_field, $entity->getEntityTypeId())
    +        ->condition($parent_id_field, $entity->id())
    +        ->condition($parent_name_field, $field_name)
    +        ->execute();
    +      if (empty($entity_ids)) {
    +        continue;
    +      }
    +      $entities = $entity_type_manager->getStorage($target_entity_type_id)->loadMultiple($entity_ids);
    +      foreach ($entities as $entity) {
    +        $entity->delete();
    +      }
    +    }
    

    Similar to the manual cleanup issue, weird things could happen if an entity changes its parent.

    Imagine a scenario where an entity switches the parent but then that new entity is deleted.

    So you have P(arent)1.1 referencing C(omposite)1.1 (id 1, revision 1).

    Now, for example using Paragraphs drag & drop, that composite entity is moved to a new parent P2, which is stored as a new revision, so now we have this:

    P1.1 => C1.1
    P2.1 => C1.2

    Then P2 is removing the reference again in a ne revision, resulting in P2.2 not referencing it.

    Now we delete P2, which triggers this hook which would find C1 as a referenced composite entity, but P1.1 still uses the old revision, which means we are not allowed to delete it.

    Another example could be an incorrectly cloned entity where two entities point to the same composite.

    So we would, again, need to check this revision-by-revison, but that could take a very long time. Or at least look for any other entity referencing it, also kinda slow.

    One option would be to move the actual deleting into a background queue process.

    Lets start by adding these cases as additional test coverage.

  2. +++ b/tests/src/Kernel/Plugin/migrate/destination/EntityReferenceRevisionsDestinationTest.php
    @@ -32,6 +32,7 @@ class EntityReferenceRevisionsDestinationTest extends KernelTestBase implements
         'entity_reference_revisions',
         'entity_composite_relationship_test',
    +    'field',
         'user',
         'system',
         'text',
    @@ -242,7 +243,7 @@ class EntityReferenceRevisionsDestinationTest extends KernelTestBase implements
    
    @@ -242,7 +243,7 @@ class EntityReferenceRevisionsDestinationTest extends KernelTestBase implements
        * @dataProvider destinationFieldMappingDataProvider
        */
       public function testDestinationFieldMapping(array $data) {
    -    $this->enableModules(['node', 'field']);
    +    $this->enableModules(['node']);
         $this->installEntitySchema('node');
    

    this change is possibly not needed anymore now as we no longer call field.module functions.

johnchque’s picture

Status: Needs work » Needs review
FileSize
9.24 KB
5.3 KB

Addressing changes from previous comments. :)

We will definitely need to create a revision checking for avoiding data lose.

Status: Needs review » Needs work

The last submitted patch, 19: 2953650-19.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

johnchque’s picture

Assigned: Unassigned » johnchque

Working on this. Will have a patch tomorrow. :)

johnchque’s picture

Status: Needs work » Needs review
FileSize
44.35 KB
12.22 KB

First attempt with a cron approach.

The last submitted patch, 22: 2953650-22-combined.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

johnchque’s picture

FileSize
6.28 KB

Forgot the interdiff.

johnchque’s picture

OK, updating tests, updating logic for adding the composite entities. :) One scenario failing since we are not yet using allRevisions in the query when adding to the queue. :) Will fix tomorrow.

The last submitted patch, 25: 2953650-25-COMBINED.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Berdir’s picture

  1. +++ b/src/Plugin/QueueWorker/AggregatorCompositeRevisions.php
    @@ -0,0 +1,66 @@
    +    // Check the usage of data and remove when not used.
    +    /** @var \Drupal\entity_reference_revisions\EntityReferenceRevisionsOrphanPurger $purger_service */
    +    $purger_service = \Drupal::service('entity_reference_revisions.orphan_purger');
    ...
    +    $composite_type = \Drupal::entityTypeManager()->getDefinition($data['entity_type_id']);
    

    this is a plugin, so we can use createInstance() to inject this properly.

  2. +++ b/src/Plugin/QueueWorker/AggregatorCompositeRevisions.php
    @@ -0,0 +1,66 @@
    +        $count = $composite_storage->getQuery()
    +          ->allRevisions()
    +          ->condition($composite_type->getKey('id'), $composite_revision->id())
    +          ->count()
    +          ->accessCheck(FALSE)
    +          ->execute();
    +        if ($count <= 1) {
    +          $composite_revision->delete();
    

    this query now also exists twice, so we could refactor it into a dedicated method as well, can also wait until the other issue is committed if that's easier.

  3. +++ b/tests/src/Kernel/EntityReferenceRevisionsCompositeTest.php
    @@ -21,6 +22,7 @@ class EntityReferenceRevisionsCompositeTest extends EntityKernelTestBase {
     
       use ContentTypeCreationTrait;
       use NodeCreationTrait;
    +  use CronRunTrait;
    

    this is to run cron through a http request, so we don't need that here.

  4. +++ b/tests/src/Kernel/EntityReferenceRevisionsCompositeTest.php
    @@ -420,4 +431,202 @@ class EntityReferenceRevisionsCompositeTest extends EntityKernelTestBase {
    +  /**
    +   * Tests the composite entity is not deleted when having multiple parents.
    +   *
    +   * Includes revisions on the host entity.
    +   */
    +  public function testCompositeDeleteAfterChangingParent() {
    

    Another case that we need is the reverse of this. Same until the end, but then delete the first node *first* and then run cron.

    The entity should still exist, *but* the first revision should get deleted as that one is no longer in-use.

  5. +++ b/tests/src/Kernel/EntityReferenceRevisionsCompositeTest.php
    @@ -420,4 +431,202 @@ class EntityReferenceRevisionsCompositeTest extends EntityKernelTestBase {
    +
    +    // Delete the parent of the previous revision.
    +    $node = $this->getNodeByTitle('First node');
    +    $node->delete();
    

    loading it again doesn't seem necesary, you still have $node from before.

  6. +++ b/tests/src/Kernel/EntityReferenceRevisionsCompositeTest.php
    @@ -420,4 +431,202 @@ class EntityReferenceRevisionsCompositeTest extends EntityKernelTestBase {
    +    // Test that the composite entity is not deleted when the duplicate is
    +    // deleted.
    +    $duplicate_node->delete();
    +    $composite = EntityTestCompositeRelationship::load($composite->id());
    +    $this->assertNotNull($composite);
    

    what happens if we do this *without* first deleting the reference to it?

    Does it get deleted then?

    That would be a bug and would indicate that we want to adjust the existing delete logic to also use the new queue.

    And if it doesn't get deleted at all, then we should also adjust it and add it to the queue there.

johnchque’s picture

So addressing changes of previous comments.

About 6, yes, it gets deleted, so switching now to be added to the queue.

Also, fixed the first test method by running cron.

Found a problem with the tests, when creating a composite referenced by Parent1, then remove the reference in a new revision of Parent1, then create Parent2 with the revision. Both revisions of the composite get their parent id updated. Not sure how that even happens.

The last submitted patch, 28: 2953650-28-COMBINED.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

miro_dietiker’s picture

Both revisions of the composite get their parent id updated.

That would mean that we can not trust the parent id and all Paragraphs that have been moved across hierarchy have stale inconsistent data. This would confirm also why i experienced issues with reverting, but never figured out how to fully reproduce.

However, a follow-up could then also be to recalculate the parent field data in revisions. Otherwise we need to implement more delete-skip conditions to avoid stale references don't lead to data loss...

johnchque’s picture

We just found out that the fields for the composite test entity are not revisionable. Will work on that too.

johnchque’s picture

And yes, we need to recalculate parent fields, however we would need to check all parent entities with ERR fields because if there was a change of parent, the parent_type field updated all revisions so we might not know who and what type the previous parent was.

mbovan’s picture

Connecting with the related Paragraphs issues.

mbovan’s picture

Title: Composite entities are not removed on host delete if reference was removed in prior revision » [PP-1] Composite entities are not removed on host delete if reference was removed in prior revision
Related issues: +#3016388: Manual cleanup process for obsolete composite entities
FileSize
54.66 KB
19.45 KB
10.11 KB

Rerolled #28, fixed code style issues, fixed dependency inejction issues, cleanup.

This depends on #3016388: Manual cleanup process for obsolete composite entities so I am connecting it as a related issue.

The last submitted patch, 34: 2953650-34-3016388-58-COMBINED.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

mbovan’s picture

The last submitted patch, 36: 2953650-34-3016388-58-COMBINED.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

mbovan’s picture

This patch should fix the test fails.

Berdir’s picture

Title: [PP-1] Composite entities are not removed on host delete if reference was removed in prior revision » Composite entities are not removed on host delete if reference was removed in prior revision
Status: Needs review » Needs work
  1. +++ b/entity_reference_revisions.module
    @@ -285,3 +287,47 @@ function entity_reference_revisions_entity_revision_create(ContentEntityInterfac
    +  foreach ($entity->getFieldDefinitions() as $field_name => $field_definition) {
    +    if ($field_definition->getType() != 'entity_reference_revisions') {
    +      continue;
    +    }
    

    Should we maybe also improve the check here to support modules extending from us?

    If so, I'm again worried about performance at this point, as we'd have to actually check every field. Maybe it's actually better to check the class and that it is an instanceof ERRItem?

  2. +++ b/src/Plugin/QueueWorker/EntityReferenceRevisionsAggregator.php
    @@ -0,0 +1,131 @@
    +/**
    + * Removes composite revisions that are no longer used.
    + *
    + * @QueueWorker(
    + *   id = "entity_reference_revisions_aggregator",
    + *   title = @Translation("Entity Reference Revisions Aggregator"),
    + *   cron = {"time" = 60}
    + * )
    + */
    +class EntityReferenceRevisionsAggregator extends QueueWorkerBase implements ContainerFactoryPluginInterface {
    +
    

    strange class name/id, I guess copied from aggregator module. Maybe entity_reference_revisions_orphan_purger as ID and just OrphanPurgerWorker as class name?

  3. +++ b/src/Plugin/QueueWorker/EntityReferenceRevisionsAggregator.php
    @@ -0,0 +1,131 @@
    +
    +    // Get the next batch of revision ids from the selected entity type.
    +    // @todo Entity queries on all revisions with a revision ID condition are
    +    //   not working, use a direct sql query
    +    $entity_revision_ids = $this->database->select($composite_type->getRevisionTable(), 'r')
    +      ->fields('r', [$composite_revision_key])
    

    this comment is probably copied, we don't have a batch here. Also would be good to reference the core issue.

    We're loading all revisions of a single paragraph here, so wondering if in some extreme cases with hundreds of revisions, this could be an issue, but this is cron/queue, so it should be fine if it takes a few seconds. We've seen in the batch profiling that even with blackfire on, we can process ~300 paragraph revisions in about a second.

  4. +++ b/src/Plugin/QueueWorker/EntityReferenceRevisionsAggregator.php
    @@ -0,0 +1,131 @@
    +      // If this is the default revision of the composite entity, check if there
    +      // are other revisions. If there are not, delete the composite entity.
    +      if ($composite_revision->isDefaultRevision()) {
    +        $count = $composite_storage->getQuery()
    +          ->allRevisions()
    +          ->condition($composite_type->getKey('id'), $composite_revision->id())
    +          ->count()
    +          ->accessCheck(FALSE)
    +          ->execute();
    +        if ($count <= 1) {
    +          $composite_revision->delete();
    +        }
    +      }
    +      else {
    +        // Delete the revision if this is not the default one.
    +        $composite_storage->deleteRevision($composite_revision->getRevisionId());
    

    this part is almost identical to the code in \Drupal\entity_reference_revisions\EntityReferenceRevisionsOrphanPurger::deleteOrphansBatchOperation(), apart from the batch progress tracking there. Maybe we can put it in a public method on the service and return if we deleted and if just revision or the whole entity?

  5. +++ b/tests/src/Kernel/EntityReferenceRevisionsCompositeTest.php
    @@ -439,8 +452,269 @@ class EntityReferenceRevisionsCompositeTest extends EntityKernelTestBase {
    +   * Tests the composite entity is not deleted when having multiple parents.
    +   *
    +   * Includes revisions on the host entity.
    +   */
    +  public function testCompositeDeleteRevisionAfterChangingParent() {
    

    We seem to have several very similar/identical method descriptions, maybe we can improve that a bit?

mbovan’s picture

Status: Needs work » Needs review
FileSize
21.89 KB
18.82 KB

Thanks for the review!

#39:

  1. ✔️Fixed.
  2. ✔️Fixed.
  3. ✔️Improved
  4. ✔️Makes sense. I extracted it to a new method on the service.
  5. Hopefully improved/cleared tests.
Berdir’s picture

Status: Needs review » Fixed

Looks good, committed.

  • Berdir committed ae23b70 on 8.x-1.x authored by mbovan
    Issue #2953650 by yongt9412, mbovan, idebr, seanB, Berdir: Composite...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.