Problem/Motivation

There is a set of situations where composite entities (such as Paragraphs) are not deleted from the database:
- #3016388: Manual cleanup process for obsolete composite entities
- #2834374: Support deletion of composite entities when parent field is translatable
- #2953650: Composite entities are not removed on host delete if reference was removed in prior revision

Once these issues are resolved, this issue will provide the ability to remove obsolete composite entities.

Proposed resolution

Delete all composite entities from existing storage that are no longer relevant through either:

  1. An (involuntary) database-update
  2. A batch from the Status report

Composite entities pass all checks from the related issues:
- Not used in the default revision of the host entity
- Not used in the any translation of the host entity
- Not used in earlier revisions of the host entity

Remaining tasks

  1. Write a patch
  2. Review
  3. Commit

User interface changes

To be determined.

API changes

None.

Data model changes

Obsolete composite entities are removed from existing storage

Comments

miro_dietiker created an issue. See original summary.

miro_dietiker’s picture

Priority: Normal » Minor

So we have host revision and host delete working now.
#2771519: support host delete
#2771523: support host revision delete

This issue is then more about stale data from the time before.
We could offer a clean-up as an update function if we want to offer this to everyone.
Otherwise we could also offer a drush command that checks if there are stale entities or revisions.

Demoting to minor as it's not a true integrity issue. MD Systems will not work on this.

berdir’s picture

not sure about update function, you're forced to run those, if anything, I'd offer it as a form/status report something that starts a batch.

We still have at least two other related issues open, about deleting removed translations and deleting on translatable fields, I don't think those issues were already committed?

idebr’s picture

Title: delete / garbage collect composite entities when no more used » Remove obsolete composite entities from existing storage
Issue summary: View changes
miro_dietiker’s picture

Title: Remove obsolete composite entities from existing storage » [META] Remove obsolete composite entities from existing storage (garbage collection)

The common term for such an action is garbage collection and as i'm sometimes searching through the issue queue, it should remain there.

miro_dietiker’s picture

Priority: Minor » Major

Promoting as a meta topic according to the roadmap.

miro_dietiker’s picture

Priority: Major » Critical

One more. :-)

idebr’s picture

Status: Active » Needs review
StatusFileSize
new279.56 KB

Attached patch provides the following elements:
- A database fixture with orphaned entities from different host entity types (revisionable/non-revisionable, translatable/non-translatable field)
- A 'Delete orphaned composite entities' form available at /admin/config/entity-reference-revisions/delete-orphans'
- An UpdatePathTestBase test that checks valid composite entities remain and orphaned entities are deleted

My first iteration called the code from an update hook, but our production database with 100k paragraphs would cause a longer maintenance period than acceptable. This form can be called after regular maintenance is completed, since it does not affect regular requests.

Status: Needs review » Needs work
berdir’s picture

+++ b/src/Form/OrphanedCompositeEntitiesDeleteForm.php
@@ -0,0 +1,243 @@
+   *   The context array.
+   */
+  public static function deleteOrphansBatchOperation($entity_type_id, array &$context) {
+    $entity_type = \Drupal::entityTypeManager()->getDefinition($entity_type_id);
+    $entity_type_key = $entity_type->getKey('id');
+    $entity_storage = \Drupal::entityTypeManager()->getStorage($entity_type_id);

Yes, offering a batch is something that I would suggest as well.

Will review in depth later, just some quick input/thoughts.

* Would be nice to have the actual checking if a paragraph is still used in a service or so. One idea I have is that on saving, we maintain a background queue that checks and deletes orphaned paragraphs, then we want to re-use the main logic
* A problem we identified is that the parent fields are non-revisionable and we are missing a revision ID parent reference. This becomes really nasty if you move a paragraph around. e.g. you might have it on the node as parent, then you move it into a nested paragraph. Later, you delete that nested paragraph with its children. Then your query will report that the child paragraph is now orphaned. But you still have an old node revision that uses the paragraph directly and we have no way to find that.

idebr’s picture

Status: Needs work » Needs review
StatusFileSize
new2.91 KB
new276.49 KB

Removed the EntityReferenceRevisionsAutocompleteWidgetTest from #2955240: entity_reference_autocomplete widget generates error when autocreating entities that caused the test failures

miro_dietiker’s picture

My idea was that we stay in this issue on a meta level and do implementations in child issues.

idebr’s picture

#12 Other than the [META] prefix you added in #5, this issue does not have a meta goal or child issues. I'd be happy to help out with issue administration though, if that leaves time for you to look at the patches instead? ;-)

seanb’s picture

StatusFileSize
new14.3 KB

Rerolled patch since it wouldn't apply anymore.

Status: Needs review » Needs work

The last submitted patch, 14: 2771531-14.patch, failed testing. View results

seanb’s picture

StatusFileSize
new275.6 KB

Forgot --binary...

berdir’s picture

So, we managed to get some support for working on this and will try to bring it forwards.

Next steps:

* Lets keep this as a meta issue, move the patches from this issue to a new one, something like "Manual cleanup process for obsolete composite entities"
* Update the issue summary, mention the related issues, there are at least two still open, one about translatable fields and the other about deleting old, no longer used entities when a host entity is deleted.
* I'm not really happy about the approach for testing. I expect we will need to review the cases and add more while working on it and also over time. Dumps are fine for upgrade path tests, where the old data never changes, but in this case, it likely will. I would recommend that we convert that dump to regular insert queries. If you extract the dump and search for the relevant tables for the entities used in this test, you could as a starting point just take those insert queries and add them into a set up method directly in the test. Then make the test pass again.
* As commented before, I'm still very worried about deleting anything that might still be used. With recent fancy paragraph features drag and drop across fields, it is possible that different revisions of the same paragraph have different parents. So I would propose we change the query to instead query through all revisions, and then check if that specific revision is still being used. That will take longer, but we can likely also delete a lot of revisions of paragraphs that are still used. If a revision that we find is unused and is the default revision, delete it if there are no other revisions of that entity left.

johnchque’s picture

Issue summary: View changes
Status: Needs work » Active

Updating IS.

johnchque’s picture

@idebr could you give us a detailed description of what entities and bundles you had created? That would speed up the process to update the tests.

berdir’s picture

Issue summary: View changes
berdir’s picture

Priority: Critical » Normal

So, the current 3 issues have been dealt with, we've improved automatic cleanup and have an option to clean up existing sites manually. There are a bunch of follow-ups, and there is one thing that we didn't really look at and that is cases when not using revisions and expecting paragraphs to be cleaned up immediately when saving and no longer using one.

We could possibly support that by triggering a queue-based check when an entity is saved but not as a new revision. that will however not fully fix issues of people that incorrectly rely on e.g. views-based list of paragraphs with conditions on the parent field, but that shouldn't be done anyway.

Anyway, downgrading to normal for now.

dqd’s picture

Awesome progress here! +1 for all the hard work on it. Not an easy task. Revisions are always an underated complex bundle of things to get sorted, what users often slidely miss.

and have an option to clean up existing sites manually.

Where is this documented? Where is this option?

Sorry for my objection but I am not sure if I agree with downgrading... Apart from the fact that I am not a fan of downgrading an issue just because of parts being solved, since the status should not reflect the progress but the initial risk the issue has caused. And that does not change by solving an issue. Critical / Fixed would be better then?

berdir’s picture

It's a meta, the specific issues which are referenced above have been fixed, see #3016388: Manual cleanup process for obsolete composite entities on the manual cleanup.

dqd’s picture

@Berdir: Thanks for pointing to the manual cleanup issue one more time. But I was rather thinking about something "official" on D.O. what helps the average user to a) decide if he wants to start using modules with ERR this way, or b) who is in the middle of orphaned paragraphs or similar types using ERR and looking for solutions. Maybe me or somebody of my team could start a documentation page reflecting the linked solution and the in #67 mentioned help info topic in the module help pages installed? Orphaned revision or other data entries in DB are not obvious for all user levels. That's what scares me to downgrade such initiatives before we are 100% sure that the user knows what he enters and should be - let's say - provided with a kindly warning? Similar to "Experimental". Just thinking ...

Apart from that: Do you say that [META] prio status should be handled different regarding its initial prio level than issues? Wouldn't it make sense to keep the initial priority level to reflect the importance the contributors gave that issues? Never mind. Sorry for OT. All the best and Greetings from Berlin. Stay safe!

hmartens’s picture

Thank you for all your hard work on this issue, it is much appreciated! I am looking forward to this fixing the issue I am experiencing where Views is showing deleted paragraph items

Thank you.

karimbou’s picture

Hello, so basically in D8.9x or D9.x I'm having issues regarding these orphan paragraphs (item_paragraph) they are nested in one parent paragraph (list_paragraph)
So basically, when a contributor or content manager deletes these paragraphs they can't be seen on the node after saving, so that's alright and working as expected.

But I have a view that lists all these paragraphs based on it's parent (uuid) inside a node. And I keep seeing all orphaned and previously deleted paragraphs I can't understand why, they're not supposed to be publish or be displayed. What is missing on my view to filter these out ?
Of cours deleting manually or with a cron all orphan paragraphs isn't an option here ... can't be done in a normal content moderation workflow.
Thanks,

berdir’s picture

This is never going to fix that, with revisions and so on, there are always going to be those old paragraphs around. Either don't use views, there isn't really a need for that, you should be able to achieve what you want with paragraph and field templates or check the issue queue for the issue that improves views so that you can do correct joins with views from the parent to the referenced paragraphs.

karimbou’s picture

The idea behind the view and the paragraphs, is to be able to list and sorting the paragraphs based on contribution order on edit, but also by adding dropdown filters to filter the view of paragraph.

berdir’s picture

If you consider them standalone, sortable things then we usually build that as a separate node or entity type, they will be edited on their own then, for example with a reference to some other type/term/something to filter them by. If commonly only single entries are edited or added that will also speed that up.

Either way, again, if you want that approach you need a correct views integration, that's the problem, not orphans.

Aramithran’s picture

I understand we can try different approaches than resorting to Views. However, what can we do about the problem of the database being populated by entries that were supposed to be deleted but weren't.
Could there be a script that deletes the paragraph instance from the database when the user removes it from the node?

umit’s picture

Is there any development regarding this issue?
This issue is a big problem on complex sites.

Following

bensti’s picture

same problem, any update?

azovsky’s picture

Priority: Normal » Major
Issue tags: +orphan paragraphs

Sorry, but I would like to raise the priority level for this issue. This is (orphan paragraphs) a big problem for data analysis on large sites.

berdir’s picture

Priority: Major » Normal
Issue tags: -orphan paragraphs

Per #21, there are only some minor and edge case situations here, plus things like the still broken views integration.

Are you certain your "orphans" aren't just data from old revisions? that's expected and obviously still required to be kept until the host revision is deleted.

If you have specific scenarios that do not work as expected, then create a new issue with steps to reproduce and add it to this meta.

alfthecat’s picture

I have to phase out paragraphs because of this issue. I've tried using eca to delete the orphaned paragraphs which doesn't work. Need to move to eck. I think this is an important issue.

umit’s picture

"/admin/config/system/delete-orphans" you can clean orphans from this section.

alfthecat’s picture

@umit thanks, I was not aware. In my case though, I need to actually be able to react to the deletion event. I use paragraphs as a ui to automatically create commerce products and their variations. When the paragraph is deleted by an end user, so should those products and variations. That's not possible due to this issue.

steinmb’s picture

Read though this issue, including #34, that mention there is a few separate issues that needs to be fixed. What are they, and where are they? I think the IS neeed to be updated to include what ever issues left. We might avoid noise in here, and it might give the pending issue more eyes.

solideogloria’s picture

Curious, is there a way to set a specific paragraph type as not revisioned? I have some that are revisioned and some that shouldn't be.

junaidpv’s picture

@umit Thank you!

moshe weitzman’s picture

I think this issue could be closed.