Problem/Motivation

Images/files uploaded to a file/image fields on entities with multiple revisions are never deleted, even when the entity is deleted. Because the file usage counts for the revisions is never released.

Steps to reproduce:

  1. create an article node with an image A uploaded to its image field
  2. verify that the uploaded file A has its status column in the file_managed table set to 1,
    and it has 1 usage in the file_usage table
  3. edit the node, check the "Create new revision" checkbox, remove the original image A and upload a new image B, save
  4. verify that the uploaded file B has its status column in the file_managed table set to 1,
    and it has 1 usage in the file_usage table. Verify that the rows for file A are unchanged in both tables.
  5. delete the article
  6. verify that the uploaded file B has its status column in the file_managed table set to 0, and it has no row anymore in the file_usage table
    • Expected result: file A has its status column in the file_managed table set to 0, and it has no row anymore in the file_usage table
    • Actual result: file A has its status column in the file_managed table still set to 1, and it still has n1 usage in the file_usage table
  7. run cron 6 hours later (this is the default value of the temporary_maximum_age setting in system.file)
  8. verify that uploaded file B is now removed (because usage = 0)
    • expected result: the uploaded file A is still present (because usage > 0)
    • actual result: the uploaded file A is absent (because usage = 0)
  9. Proposed resolution

    TBD, see #126 for an summary of comments #1–#125.

    Remaining tasks

    TBD

    User interface changes

    None.

    API changes

    TBD

    Data model changes

    TBD

CommentFileSizeAuthor
#119 1239558-diff-114-119.txt763 bytesvijaycs85
#119 1239558-119.patch33.35 KBvijaycs85
#114 1239558-114.patch33.49 KBchipway
#111 interdiff-1239558-102-111.txt9.19 KBchipway
#111 1239558-111.patch33.5 KBchipway
#102 1239558-102.patch33.62 KBalexpott
#102 99-102-interdiff.txt3.01 KBalexpott
#99 1239558-99.patch31.26 KBalexpott
#99 97-99-interdiff.txt9.2 KBalexpott
#97 1239558-97.patch22.74 KBalexpott
#97 93-97-interdiff.txt3.57 KBalexpott
#93 interdiff.txt829 bytesclaudiu.cristea
#93 1239558-93.patch23.34 KBclaudiu.cristea
#90 1239558-90.patch21.88 KBalexpott
#90 88-90-interdiff.txt4.92 KBalexpott
#88 1239558-88.patch17.62 KBalexpott
#88 78-88-interdiff.txt2.3 KBalexpott
#78 interdiff.txt2.94 KBclaudiu.cristea
#78 1239558-78.patch16.3 KBclaudiu.cristea
#76 1239558-76.patch16.21 KBalexpott
#76 74-76-interdiff.txt5.34 KBalexpott
#74 1239558-74.patch12.18 KBalexpott
#74 71-74-interdiff.txt2.44 KBalexpott
#71 1239558-71.patch10.03 KBalexpott
#71 70-71-interdiff.txt6.99 KBalexpott
#70 interdiff.txt1.6 KBclaudiu.cristea
#70 1239558-70.patch5.48 KBclaudiu.cristea
#67 interdiff.txt1.55 KBclaudiu.cristea
#67 1239558-67.patch5.44 KBclaudiu.cristea
#65 1239558-65.patch5.34 KBalexpott
#65 62-65-interdiff.txt651 bytesalexpott
#62 1239558-62.patch5.32 KBalexpott
#53 interdiff.txt4.99 KBamateescu
#53 1239558-53.patch7.58 KBamateescu
#46 deleting_a_node_with-1239558-46.patch8.58 KBmtodor
#43 deleting_a_node_with-1239558-43.patch7.33 KBmtodor
#39 test-deleting_a_node_with-1239558-39.patch3.07 KBmtodor
#15 add_file_usage_deletion_to_node_deletion.patch0 bytesjoshua.albert
#10 node_system-delete_node_revision_hook-1239558-10.patch956 byteschaby
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

aaron’s picture

Project: D7 Media » Drupal core
Version: 7.x-2.x-dev » 8.x-dev
Component: Code » image system
Issue tags: +Needs backport to D7

I've tested, and this appears to be a core issue.

Dave Reid’s picture

Project: Drupal core » D7 Media
Version: 8.x-dev » 7.x-2.x-dev
Component: image system » Code
Issue tags: -Needs backport to D7

Are you able to delete the file using media/[file id]/delete?

aaron’s picture

Project: D7 Media » Drupal core
Version: 7.x-2.x-dev » 8.x-dev
Component: Code » file system
Issue tags: +Needs backport to D7

most likely part of the file system, rather.

aaron’s picture

steps to reproduce w/o media module, from a new installation of drupal:

1) create an article post with an image file
2) edit the post, check to create a new revision, remove the original image and upload a new image, and submit
3) edit and delete the post
4) you'll see the new image removed from the files/field/image folder, yet the original remains
5) examining the file_managed table in mysql, you'll see an entry for the original.

aaron’s picture

you'll also see an entry in the file_usage table.

aaron’s picture

Component: file system » node system

thinking about it, it's most likely in the node system, to do with revisions not properly decrementing the count in file_usage when deleting.

aaron’s picture

Title: Unable to delete media » Deleting a node with revisions does not release file usage

oddly, deleting a single revision does indeed properly release the file. it's only when you delete a node with revisions that this occurs.

pwolanin’s picture

Does this only happen with media module, or other files too?

idiotzoo’s picture

Happens with other files too. Behaviour is exactly the same with a CCK file field.

chaby’s picture

Hi,

Agree with #6 and i faced with this trouble too in 7.x (having orphans files with revision enable and a image field which heavy used file element and process).

I have understand that when a node is delete, it did't delete revisions associated by invoking appropriate hooks (node_revision_delete and field_attach_delete_revision) for example by using node_revision_delete().
As a result, hook_field_revision_delete aren't called and file module did'nt decrement file usage.
Indeed, you need to delete revision one by one and then delete node itself !

Or implements a hook node delete and do what the node module should do (?) : load revisions and for each of them, execute node_revision_delete() (or do it manually whatever).
E.g :

function MYMODULE_node_delete($node) {
  $revisions = node_revision_list($node);
  foreach ($revisions as $revision) {
    //By executing this method, it will call revision hook
    //File field revision hook is called and decrement file usage properly.
    //If no other nodes and revisions use this file, it will be deleted.
    node_revision_delete($revision);
  }
}

Here is a quick unstable 7.x patch for example.
But the real problem is : modules invoke in node delete maybe need to load revisions data about the current node beeing deleted (for e.g field revision value). But mostly do it already rather than using revision hook ?
By deleting revisions first, it will just breaks these invoked modules behaviors...
And i'm sure that i miss a lot of things...!

chaby’s picture

forgot to say that in an extrem use case, if we have 10000 revisions to delete in one shot per node to delete in a huge transaction, sure some bad behavior will happen on the server! And this is maybe one reason why it haven't been done...
maybe using a cron to create/update a limited queue to delete orphans file (forced) is a better solution (from file_usage table, make a left join on node table and select nid where nid is null)...

Dave Reid’s picture

Status: Active » Needs work
Issue tags: +Needs tests

Have a patch but it seems to need work and tests.

gilsbert’s picture

Hi.

We need this solution urgently for D7.

Is there a patch that we may test?

Thank you very much.

Regards,
Gilsberty

capellic’s picture

Issue summary: View changes

Is there a module out there that will clean-up orphaned files? Or a recipe for using VBO to view a list of orphaned files with the checkboxes to delete them?

I did some research and found these resources -- thought I have not yet tried them:

https://www.drupal.org/project/auditfiles
https://www.drupal.org/node/733258#comment-5582764

joshua.albert’s picture

I submitted a patch for the same issue (https://www.drupal.org/node/2393641), but quickly marked it as a duplicate when I found this (the patch here makes more sense as it allows for node_revision_delete() hooks).

@capellic - I've used the following SQL to remove the orphans manually:

DELETE from file_usage WHERE fid IN (select fid from (select fu.fid as fid from file_usage fu LEFT JOIN node n ON n.nid = fu.id WHERE n.nid IS NULL) as tt);
joshua.albert’s picture

joshua.albert’s picture

Is there any reason why we couldn't also add this? After all, if the node is deleted, so should all revisions and all file use references to the file, correct?

I supposed using only the hooks allows for modules more control, but as I see it, if a node is deleted, there shouldn't be any references in the file_usage table which, of course, references the non-existence node id.

db_delete('file_usage')
       ->condition('id', $nids, 'IN')
       ->execute();
joshua.albert’s picture

After some testing with the core patch that invokes revision deletion, I've verified that, at least in my case, the file_usage table record is still not being released. The code above tests out to work correctly. It removes the records from the file_usage table and allows the file to be deleted. This includes deleting files when deleting nodes, due to the fact that node deletion invokes file deletion.

FreekyMage’s picture

Just confirming that #10 works (although line numbers are different now and it might not be the best solution as the author says) and is still needed in D7.38

catch’s picture

Version: 8.0.x-dev » 8.1.x-dev
Priority: Normal » Major

Bumping to major since this is a data-integrity issue.

alexpott’s picture

Issue tags: +Triaged core major

Discussed with @catch, @effulgentsia, @xjm, @amateescu, @plach, @berdir, @swentel, and @fago. We agreed this issue is major.

Wim Leers’s picture

Holy cow, we've had this bug since 2011, and it also exists in Drupal 7?!? This is impressive.

xjm’s picture

Status: Needs work » Active

I also confirmed this issue is still a problem while working on #2725415: Text Editor module fails to track usage of images uploaded in text_with_summary fields, allows uploaded images to be deleted.

Also setting to active since the existing patch on the issue has no bearing on D8. For D7, we can create a separate followup issue postponed on this one, per the updated backport policy: #2462101: [policy, no patch] Update backport policy to account for 8.x minor versions and 7.x divergence.

tame4tex’s picture

Component: node system » file.module

I would like to propose that this is the responsibility of the file module not the node module.

catch’s picture

This patch will fix the bug for later deletions, but the existing file usage records are going to remain corrupted - we don't have any way to rebuild file usage from scratch, the API doesn't allow for this.

So while the files themselves don't get lost in this case (in fact it makes them almost impossible to lose), we are losing data in file usage that can never be repaired. So I think this should be promoted to critical.

xjm’s picture

Priority: Major » Critical

#25 seems worth discussing to me. Promoting to critical for now for visibility since it is a data integrity problem, although it is kind of borderline since it's theoretically possible to manually check the uses, then confirm that the referenced thing does not exist. I'm leaving the major triage tag on purpose in case we re-demote it.

Edit: I should say that with revisions on by default in 8.2.x, this feels more critical to me there since it becomes a whole lot worse for new sites, but I might consider it major for 8.1.x.

xjm’s picture

#2732393: Decide if and how to try to save incorrectly orphaned images uploaded in text_with_summary fields from #2725415 also has a need to rebuild usage data (but for the opposite reason), so adding that as a related issue here. It's possible that both of these issues could have data updates resolved by the same functionality.

xjm’s picture

OTOH is this actually so hard to clean up? All you have to do is an inner join on the usage table and relevant entity tables. If no entities listed in the usage exist, then that's a list of files to mark for purge.

Wim Leers’s picture

catch’s picture

That works for editor module, but afaik this affects every file_usage implementation that could possibly be applied to any revisionable entity - due to the way that revisions are deleted. So I don't think it's an equivalent case for which generic code can be written - we'd need to know what editor, file/image and any other modules are doing.

catch’s picture

Well I guess we could write an update for each type we know about in core to clear those out, then a change record for contrib. So not impossible to do something.

David_Rothstein’s picture

I marked #2393641: File usage records for files referenced in older node revisions are not removed on node deletion (the Drupal 7 issue) postponed on this one.

#10 and #11 look like an accurate description of the problem for both Drupal 7 and Drupal 8 (revisions don't get formally deleted when an entity is deleted) but I agree with some of the concerns mentioned there.

So another way to do it could just be to have File module, when it reacts to an entity being deleted, try to find all files referenced by all revisions of the entity (rather than just the current version of the entity) and try to delete (D7) or delete usage records for (D8) those.

jelo’s picture

I added a number of related issues. Files in revisions have lots of issues and maybe this should be considered as a whole to see how this area can be improved. The related issues mention potential access/security issues as well.

catch’s picture

Deleting individual revisions when we delete an entity is probably the most robust way to deal with this and avoid similar issues in other places. That really needs #89181: Use queue API for node and comment, user, node multiple deletes though since revisions are unbounded.

catch’s picture

Opened an issue to discuss that.

jonathanshaw’s picture

Looks like you forgot to reference the issue @catch?

catch’s picture

Whoops, was trying to add to related issues mostly likely. Here it is: #2743345: Revisions should be individually deleted when deleting entities

mtodor’s picture

I'm currently working on this issue. There is at least test - I'll attach it, before fix.

mtodor’s picture

Here is patch with test for issue.

xjm’s picture

Status: Active » Needs review

Thanks @mtodor! Setting NR to trigger testbot and demonstrate the test.

The last submitted patch, 10: node_system-delete_node_revision_hook-1239558-10.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 39: test-deleting_a_node_with-1239558-39.patch, failed testing.

mtodor’s picture

Here is patch with possible solution for problem. It's full patch with Test included.

To describe solution in short:
- new method is added to 'file.usage' service (deleteForEntity)
- that method is executed after all fields are deleted for entity, to cleanup file relations for that entity

In future some better generic solution should be introduced, for example: to chain deletion and propagate to entity revision, where revision would take care of proper deletion of it's fields and so on. In short to delegate responsibility to elements that should be aware of made action and handle it accordingly. I don't know is that part of #2743345: Revisions should be individually deleted when deleting entities.

mtodor’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 43: deleting_a_node_with-1239558-43.patch, failed testing.

mtodor’s picture

Status: Needs work » Needs review
FileSize
8.58 KB

Here is new solution of problem. This option actually replaces code that is not doing proper deletion of entity related files.

In process of changing I have found other issue: when field is deleted for that entity type, file usage still stays unchanged for not active revision of entity. Test for that problem is added too.

Status: Needs review » Needs work

The last submitted patch, 46: deleting_a_node_with-1239558-46.patch, failed testing.

xjm’s picture

@alexpott, @catch, @effulgentsia, @Cottser, and I discussed this issue today and agreed to handle it as a critical issue. While it is not data loss per se, orphaned files are a data integrity problem and could also result in problems like unintended information disclosure (since site users have no reason to expect that files attached to something they deleted would still exist.) This is also another data integrity problem that becomes worse the longer it exists, so it does make sense to explore a hotfix as @alexpott and I discussed in person with @mtodor at the sprint.

Thanks all! Looks like the next step is to work more on @mtodor's patch from #46 and then review it.

Gábor Hojtsy’s picture

Issue tags: +Media Initiative
Gábor Hojtsy’s picture

Issue tags: -Media Initiative +D8Media

Fix to use the right media tag. Sorry.

xjm’s picture

This is also critical for phase A of the workflow initiative.

The last submitted patch, 46: deleting_a_node_with-1239558-46.patch, failed testing.

amateescu’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
FileSize
7.58 KB
4.99 KB

The approach from the latest patches can not work because we have to take translations into account and they are not stored in the file_usage table.

Instead, I managed to write a tricky aggregated entity query which gives us the file IDs used by all revisions of an entity, with optional filtering on a specific translation. This works fine for all uses cases but one: when we delete the field, the query can not find the renamed field tables anymore.

Not sure how to fix that yet but I'm posting what I have so far to get some feedback on the approach :)

Status: Needs review » Needs work

The last submitted patch, 53: 1239558-53.patch, failed testing.

jibran’s picture

+++ b/core/modules/file/src/Plugin/Field/FieldType/FileFieldItemList.php
@@ -80,9 +81,31 @@ public function postSave($update) {
+    foreach (File::loadMultiple($fids) as $file) {

We should have an empty $fids check else we'll load all the files.

amateescu’s picture

We should have an empty $fids check else we'll load all the files.

Nope, passing an empty array to loadMultiple() makes it return an empty array, so we're good.

The last submitted patch, 53: 1239558-53.patch, failed testing.

amateescu’s picture

Status: Needs work » Postponed

Discussed this issue with @catch today and we concluded that we should postpone it on the only viable solution that we have so far: deleting individual revisions.

In order for that to happen, we need at least two things: use the Queue API for deletion (#89181: Use queue API for node and comment, user, node multiple deletes) and have some kind of archiving capabilities in order make any entity 'unpublished' until the deletion process is finished (#2725463: WI: Add StatusItem field type to store archived state).

xjm’s picture

If this is postponed on that, then those would need to be critical too I think.

Postponing this on those issues worries me though; the first has been stuck for years.

catch’s picture

I don't think there's a viable solution that can be done without piggybacking off those issues though - anything we come up with will be pretty fragile, which is not what we want. amateescu spent two days trying to fix this without deleting individual revisions, spoke to me about it in irc, and we ended up with this plan.

This is almost a duplicate of #89181: Use queue API for node and comment, user, node multiple deletes really, and the specific bug here has been around since 2010/11 when file_usage was added in the first place.

From the issue summary there:

When deleting large datasets the delete multiple API does not invoke all dependent hook_[node|comment|user]_delete.
e.g. deleting a node does not invoke hook_comment_delete()

i.e. fixing that issue, then applying it to revisions, should mean this issue only needs test coverage.

I've bumped #89181: Use queue API for node and comment, user, node multiple deletes to critical for now.

alexpott’s picture

Status: Postponed » Needs work

Discussed with @catch, @xjm, @effulgentsia, @Cottser and @webchick. Whilst #89181: Use queue API for node and comment, user, node multiple deletes is a good solution to this problem maybe not doing revision deletes before an entity delete causes problems for contrib that are worse than the potential timeout issues if we do this.

Re-opening the issue to explore doing entity revision deletes to fix it.

alexpott’s picture

Status: Needs work » Needs review
FileSize
5.32 KB

Something like this.

alexpott’s picture

We could improve the query to load revision IDs if we could exclude default revision ID if the entity is the default revision. However we can;t do that because #2766135: EntityQuery with condition on the revision field leads to wrong results

catch’s picture

  1. +++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
    @@ -312,6 +312,29 @@ protected function doPostSave(EntityInterface $entity, $update) {
    +  public function delete(array $entities) {
    

    Had to double check, but this is called by SqlContentEntityStorage which wraps its parent::delete() call in a transaction, so individually deleting the revisions still happens within a single transaction.

  2. +++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
    @@ -312,6 +312,29 @@ protected function doPostSave(EntityInterface $entity, $update) {
    +          // The delete revision will be deleted later.
    

    s/delete/default/

    But if we have the entity, don't we also have the default revision ID available, so could do an if check instead?

Otherwise this looks great. It will fall over when entities have hundreds or thousands of revisions (and I've seen one of those recently, albeit via a bug), but that's an issue we can deal with separately - at least in that case with luck the transaction gets rolled back, which is better than just bits of an entity getting deleted silently.

alexpott’s picture

Re #64.1 well we can't be sure that the entities passed are the default revision - so how would we know every entities default revision. Ideally what we could do is exclude the revision IDs for entities that are default revisions that are passed in. However we can't do that because of #2766135: EntityQuery with condition on the revision field leads to wrong results. And even if we could we'd still need the exception handling.

The other option is to load all the revisions here and check but that is exactly what happens in \Drupal\Core\Entity\ContentEntityStorageBase::deleteRevision so that seems meh.

One thing to explore might be a loadMultipleRevision.

catch’s picture

Re #64.1 well we can't be sure that the entities passed are the default revision

Good point.

claudiu.cristea’s picture

Isn't this faster?

EDIT: I wonder how the 'OR' operator works. Probably is not documented.

catch’s picture

IN is a tidier query whether it's faster or not.

dawehner’s picture

Status: Needs review » Reviewed & tested by the community

One thing I am wondering: Could we add an update path, which cleans up the file_usage table? That sounds though like a major follow up or so, to be honest.

claudiu.cristea’s picture

Check if there are entities earlier. In this way we avoid the call to parent::delete() that is checking again the same thing.

@dawehner, added #2788571: Cleanup file_usage table after #1239558.

alexpott’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
6.99 KB
10.03 KB

Well we shouldn't be loading revisions one by one - not doing that is the biggest performance optimisation we can be making here.

alexpott’s picture

Issue tags: +Needs tests

I'm also pretty sure that we should have some more explicit testing of this.

Status: Needs review » Needs work

The last submitted patch, 71: 1239558-71.patch, failed testing.

alexpott’s picture

Status: Needs work » Needs review
FileSize
2.44 KB
12.18 KB

Pretty sure we just exposed a nasty bug in \Drupal\Core\Entity\Sql\SqlContentEntityStorage::loadFromDedicatedTables()

alexpott’s picture

Hmmm \Drupal\Core\Entity\Sql\SqlContentEntityStorage::loadFromSharedTables is going to have the same problem but it looks like we don't have a test to catch that. And now were doing something different from loadRevision... mapFromStorageRecords is just meant to work on records keyed by entity ID and not revision ID... ho hum.

alexpott’s picture

But that is fixable :)

The last submitted patch, 74: 1239558-74.patch, failed testing.

claudiu.cristea’s picture

Some cleanup in tests.

I'm also pretty sure that we should have some more explicit testing of this.

@alexpott, what exactly do you think it need some additional testing?

amateescu’s picture

  1. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -438,17 +438,67 @@ protected function getFromStorage(array $ids = NULL) {
    +   * {@inheritdoc}
    +   */
    +  protected function loadMultipleRevisions(array $revision_ids) {
    

    Where does this inherit the doc from? And.. why not make it public?

  2. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -438,17 +438,67 @@ protected function getFromStorage(array $ids = NULL) {
    +      $revision_ids = $this->cleanIds($revision_ids);
    

    \Drupal\Core\Entity\ContentEntityStorageBase::cleanIds() relies on the entity 'id' key but we're acting on revision IDs.

  3. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -644,18 +703,20 @@ protected function buildPropertyQuery(QueryInterface $entity_query, array $value
    +  protected function buildQuery($ids, $revision_id = FALSE, $revision_ids = []) {
    

    This feels a bit icky :) Since it is a protected method, why don't we change the existing $revision_id param to also accept an array? Or only an array?

  4. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -644,18 +703,20 @@ protected function buildPropertyQuery(QueryInterface $entity_query, array $value
    -      $query->join($this->revisionTable, 'revision', "revision.{$this->idKey} = base.{$this->idKey} AND revision.{$this->revisionKey} = :revisionId", array(':revisionId' => $revision_id));
    
    @@ -692,6 +753,13 @@ protected function buildQuery($ids, $revision_id = FALSE) {
    +    if ($revision_id) {
    +      $query->condition("revision.{$this->revisionKey}", $revision_id);
    +    }
    

    Could this change have any performance impact on the query?

alexpott’s picture

Re #79
1. Because making it public can happen in another issue that uses it where appropriate. Maybe we should mark it internal till that followup issue. It inherits from the abstract base class
2. Yerp - needs fixing
3. Because BC but it is all protected
4. If it does our indexes are totally out of whack... I'll run an explain.

Fabianx’s picture

Status: Needs review » Needs work

Needs work for more tests, alexpott will need to describe which ones.

catch’s picture

Just a note #1239558: Deleting an entity with revisions and file/image field does not release file usage of non-default revisions, causing files to linger went in yesterday with a @todo for this issue. If revisions are enabled in the test for removing a translation, this issue prevents the file usage from being removed so the tests fail. We should be able to enable revisions in that test as part of fixing this bug (although it would be good to keep a non-revision variant of the test so both cases are covered).

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

phillamb168’s picture

Just a note to anyone who happens to come across this issue while trying to figure out why deleting node revisions doesn't also delete the files associated with those revisions, if you're using workbench moderation you'll need this patch plus the one found here: https://www.drupal.org/node/2353491

I won't link the issue unless somebody thinks it's worthwhile, they're not directly related but it does seem to be a common issue.

TechnoTim2010’s picture

It is an issue, not sure how common though, it is the first time I have encountered it in five years, but we have a suite of 93 sites, with issue raised on 2 of them but not ironically the sites using Workbench Moderation.

As they increase usage and complexity of use, I expect this to be raised as more and more of an issue. Currently the solution is to adopt an image naming format to differentiate older images from current ones.

Regards

Tim

alexpott’s picture

So we need more tests to ensure the hooks are being fired in the order we expect. Looking at the previous comments the things that needs addressing is #79.2 and #79.4.

alexpott’s picture

Status: Needs work » Needs review
FileSize
2.3 KB
17.62 KB

Addressing #79.2

Status: Needs review » Needs work

The last submitted patch, 88: 1239558-88.patch, failed testing.

alexpott’s picture

Here's some additional tests around hook firing during entity revision delete. Also fixed the DbLogTest. That one is super interesting. Our hook_*_load hooks take lists of entities keyed by entity ID - not so good for multi-loading revisions. We should probably change that is D9. Or introduce *_load_revision() and deprecate *_load() in D8.

catch’s picture

#1730874: Add support for loading multiple revisions at once is still open, although that won't help with the hook signature.

claudiu.cristea’s picture

Status: Needs review » Needs work
claudiu.cristea’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 93: 1239558-93.patch, failed testing.

Berdir’s picture

  1. +++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
    @@ -312,6 +312,53 @@ protected function doPostSave(EntityInterface $entity, $update) {
    +  protected function loadMultipleRevisions(array $revision_ids) {
    

    loadMultipleRevision() kind of sounds better to me, but not sure. or loadRevisionMultiple() ? the issue below used loadRevisions()...

    the problem with adding this as a protected method is that adding it as a public method later on in #1730874: Add support for loading multiple revisions at once isn't possible anymore as that would be an API change for subclasses.

  2. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -438,17 +438,71 @@ protected function getFromStorage(array $ids = NULL) {
    +      foreach ($revisions as &$revision) {
    

    why are we doing this by reference? the only reason for that would be able to replace the object, that doesn't happen?

  3. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -438,17 +438,71 @@ protected function getFromStorage(array $ids = NULL) {
    +   * @param array|null $revision_ids
    +   *   If not empty, return entities that match these IDs. Return all entities
    +   *   when NULL.
    ...
    +  protected function doLoadMutlipleRevisions(array $revision_ids) {
    

    I think null can only be paassed in if you default to NULL when you have a type hint. definitely for objects, haven't tested array.

    possibly just copied code, I don't think loading all revisions of all entities is something we need to support?

  4. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -644,18 +707,20 @@ protected function buildPropertyQuery(QueryInterface $entity_query, array $value
    +   * @param $revision_ids
    +   *   Revision IDs to load.
    

    missing type (yes, the exising params too)

  5. +++ b/sites/default/default.settings.php
    @@ -144,6 +144,11 @@
    + *
    + * Per-table prefixes are deprecated as of Drupal 8.2, and will be removed in
    + * Drupal 9.0. After that, only a single prefix for all tables will be
    + * supported.
    

    unrelated?

alexpott’s picture

Status: Needs work » Needs review
FileSize
3.57 KB
22.74 KB

Thanks for the review @Berdir

  1. Multiple and then a single thing doesn't make much sense. I've made the method public and @internal. We can remove the @internal in #1730874: Add support for loading multiple revisions at once
  2. No good reason - fixed
  3. I agree - loading all revisions of all entities seems excessive. However that makes the API a bit weird considering how loadMultiple() works. I've removed for now since adding this back in in #1730874: Add support for loading multiple revisions at once would be simple.
  4. Fixed
  5. Think that is only part of @claudiu.cristea's patch in #93 and not in his interdiff... and therefore not on my branch.

This will still because I'm not sure that delete translations fires the entity revision delete hooks :(

Status: Needs review » Needs work

The last submitted patch, 97: 1239558-97.patch, failed testing.

alexpott’s picture

Status: Needs work » Needs review
FileSize
9.2 KB
31.26 KB

Here's an improvement of the test coverage that is failing because #93. Once revisions are enabled on a content type removing a translation will not cause any deletes or changes in file usage because removing a translation will only remove the translation from the revision that is being saved. Therefore the file is still being used by the old revision. I've discussed this with @hchonov on IRC and he agreed that calling removeTranslation on an entity should not result in an entity revision delete firing.

So the assumption that the just changing the flag on the content type made in #93 is what we need to do is incorrect. The patch attached adds test cases where revisions are enabled and where they are not.

hchonov’s picture

I've done a bit of digging and come across ContentEntityStorageBase::invokeFieldMethod, which when called for the "postSave" method will invoke the FieldItemList::delete method on each removed translation, which is a bit confusing as the doc of the method FieldItemList::delete says :

"This method is called during the process of deleting an entity, just before values are deleted from storage."

So currently deleting an entity translation still results in calling the field delete method on each removed translation, which is independent of if we are saving the default or a non-default revision with a removed translation.

P.S. This means that FileFieldItemList::delete will be called and should decrement the usage count of the referenced file.

hchonov’s picture

Here is the path how it comes to invoke the delete method on each field of a removed translation:

1. EntityStorageBase::save ->
2. ContentEntityStorageBase::doPostSave ->
3. EntityStorageBase::doPostSave ->
4. ContentEntityStorageBase::invokeHook(update/insert) ->
5. ContentEntityStorageBase::invokeFieldPostSave ->
6. ContentEntityStorageBase::invokeFieldMethod('postSave')

and the last method has the following in its body :


    // We need to call the delete method for field items of removed
    // translations.
    if ($method == 'postSave' && !empty($entity->original)) {
      $original_langcodes = array_keys($entity->original->getTranslationLanguages());
      foreach (array_diff($original_langcodes, $langcodes) as $removed_langcode) {
        $translation = $entity->original->getTranslation($removed_langcode);
        $fields = $translation->getTranslatableFields();
        foreach ($fields as $name => $items) {
          $items->delete();
        }
      }
    }
alexpott’s picture

Okay after discussion translation deletion a bit more and creating #2815779: Deleting a translation leaves behind orphaned revisions. I've added further testing here to show what's happens with revisions, translations and file counting.

timmillwood’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs tests

I've not followed the issue too closely, but the patch in #102 looks pretty good and has nice test coverage. So going to RTBC to see if we can get things moving again.

claudiu.cristea’s picture

I'm not sure that we can fix #2788571: Cleanup file_usage table after #1239558 afterwards. I think we should fix it here.

timmillwood’s picture

Status: Reviewed & tested by the community » Needs work

ok, back to needs work then to add a cleanup for the file_usage table.

anairamzap’s picture

Assigned: Unassigned » anairamzap

Hi all,
I have tested this and:

  • I can reproduce the problem following the instructions
  • Patch in #102 applied correctly and fixed the problem with revisions and orphan files
  • I would like to help on the file_usage table cleanup (detailed in: #2788571). I'm working on it right now and I have some doubts I hope someone can answer:
  1. I think we can cleanup the table creating a function that returns a query (draft query pasted below). If this is correct, where should we place this function? In file/src/FileUsage/ files?
  2. If previous point is not correct, maybe you can suggest another approach so I can investigate?
$query = db_delete('file_usage', 'fu'); 
$query->leftJoin('node', 'n', 'n.nid=fu.id'); 
$query->isNull('n.nid');
$query->execute();



Will assign the issue to myself so everyone is aware I'm working on this. I hope that's ok.

Thanks,

m

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

kattekrab’s picture

@anairamzap - you've had this assigned to yourself for 2 months. Any progress? Or should we un-assign it at this point?

anairamzap’s picture

Assigned: anairamzap » Unassigned

Hi @kattekrab,
I was waiting for an answer, but given that no movement has been done in this issue I have unassigned it.

Cheers,
m

chipway’s picture

Issue tags: +DevDaysSeville

Working on this at DrupalDevDays Seville.

chipway’s picture

First, I had to re-roll the patch.
I had an issue when creating the interdiff (because of lines number been different in 2 places).
Here is the re-rolled patch.
Then I will work on adding the file_usage table cleanup.

paranojik’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 111: 1239558-111.patch, failed testing.

chipway’s picture

Status: Needs work » Needs review
FileSize
33.49 KB

Same patch as 111 + fix of failed test.

chipway’s picture

Status: Needs review » Needs work

So patch 1239558-114.patch passes tests.
We still have to work on adding the file_usage table cleanup.
I hadn't time in Seville and hope to do this soon.

dagmar’s picture

Status: Needs work » Needs review

In #2753971: ContentEntityStorageBase::deleteRevision() function does not remove node_field_revision entries we have a similar situation as was ok to create a follow up to clean the tables later. I think we can do the same here.

Manuel Garcia’s picture

Patch looks good, just a small nitpick:

+++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
@@ -1132,21 +1203,24 @@ protected function loadFromDedicatedTables(array &$values, $load_from_revision)
+       // if (!isset($values[$row->entity_id][$field_name][$langcode])) {
+       //LC   $values[$row->entity_id][$field_name][$langcode] = [];

We should remove these lines if they're not necessary.

vijaycs85’s picture

Issue summary: View changes

Overall, looks good. +1 to RTBC.

  1. +++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
    @@ -335,6 +335,57 @@ protected function doPostSave(EntityInterface $entity, $update) {
    +   * @internal
    

    nice.

  2. +++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
    @@ -644,18 +704,20 @@ protected function buildPropertyQuery(QueryInterface $entity_query, array $value
    +  protected function buildQuery($ids, $revision_id = FALSE, $revision_ids = []) {
    

    Minor: Ideally default of $revision_id should be NULL instead of FALSE as it holds integer value. but not changed by this issue. so we can ignore it.

vijaycs85’s picture

matthieuscarset’s picture

I've follow the steps described in the issue summary.
Without the patch, files are not deleted.
After applying the patch, files are correctly deleted physically and marked as to be deleted (status=0 in database).

+1 for RTBC

timmillwood’s picture

Status: Needs review » Reviewed & tested by the community
tstoeckler’s picture

Since this affects the storage I queued a bunch of tests on non-primary environments.

Manuel Garcia’s picture

@tstoeckler++

catch’s picture

Status: Reviewed & tested by the community » Needs work

https://www.drupal.org/pift-ci-job/696689 looks like a real postgres failure.

plach’s picture

Issue tags: +PostgreSQL
Wim Leers’s picture

Title: Deleting a node with revisions does not release file usage » Deleting an entity with revisions and file/image field does not release file usage of non-default revisions, causing files to linger
Issue summary: View changes
Issue tags: -D8Media +Media Initiative, +Entity Field API
Related issues: +#391330: File Field for Core, +#2743345: Revisions should be individually deleted when deleting entities, +#2892377: Document relationship of entities, entity revisions and entity translations, +#89181: Use queue API for node and comment, user, node multiple deletes

I went through the entire issue.

The following discussions/comments were especially important:

  1. @aaron's steps to reproduce in #4 from August 2011 (!!! — this is almost 6 years ago!) are still perfectly reproducible!
  2. @chaby's initial solution in #10 plus caveat in #11 about his solution not working when there are many revisions.
  3. @joshua.albert asking in #17 why simply removing all file usages for a certain node is not sufficient — which is … a very good point obviously. The funny thing is that we appear to have a hook implementation for that in file.module: file_file_predelete(). But its implementation is a @todo. Turns out this TODO was added in #391330: File Field for Core, almost 8 years ago. @catch approved it there, assuming there'd be a follow-up. That follow-up never happened.
  4. From #23–#31, rebuilding all the file usage data from scratch is being considered.
  5. @David_Rothstein explained the problem very clearly in #32: revisions don't get formally deleted when an entity is deleted, by which he means revisions are not each individually explicitly deleted.
  6. @catch opened an issue in #34-#37 to formally delete each revision: #2743345: Revisions should be individually deleted when deleting entities
  7. @mtodor implemented a patch in #43 that uses the approach that @jsohua.albert proposed in #17. He had to expand the API of \Drupal\file\FileUsage\FileUsageInterface to make this possible.
  8. @amateescu explains in #53 that the approach taken in #43 (and hence proposed in #17) cannot work, because translations are not taken into account. (This reminds me of #2892377: Document relationship of entities, entity revisions and entity translations.) He proposes an alternative approach, which works for translations, but fails in a different edge case.
  9. After discussing it outside this issue, @catch and @amateescu decided in #58 that this most be postponed on deleting individual revisions: #2743345: Revisions should be individually deleted when deleting entities, which is blocked on #89181: Use queue API for node and comment, user, node multiple deletes.
  10. In #61 through #66, @alexpott unpostponed this issue, and provided yet another approach: making ContentEntityStorageBase::delete() iterate over all revisions, timeout problems be damned, because many sites won't run in timeout issues, only sites with many revisions for an individual entity. @catch reviewed and +1'd.
  11. In #67–#99, this is refined further.
  12. @hchonov dug into the code and explains in #100 and #101 that deleting a translation calls FieldItemList::delete(), even though that is meant to only be called when deleting the entity (i.e. deleting all translations, not just a single translation).
  13. From #103 onwards, we're on the verge of RTBC'ing, but we want the update path to fix the file_usage table first (in #104 + #105), and then shifted back to "nah let's do it in a follow-up" (in #119 + #121).

Why is it suddenly okay to do this in a follow-up, and why wasn't it before?


Updated issue summary, updated issue tags, updated issue title, updated related issues.

Wim Leers’s picture

I also stepped through the code with a debugger. Based on the >100 comments above, many of which show very smart people attempting solutions but then running into dead ends … that shows that the semantics & architecture of deleting entities vs deleting translations vs deleting revisions is still a weak area of the Entity/Field API.

When deleting an entity (with no translations)
$entity->delete() calls SqlContentEntityStorage::delete() which calls EntityStorageBase::delete() which calls ContentEntityStorageBase::doDelete() which calls ContentEntityStorageBase::invokeFieldMethod(), which iterates over all translations and invokes the delete() method on all FieldItemList objects. After ContentEntityStorageBase::doDelete() called invokeFieldMethod(), it also calls doDeleteFieldItems(), which is what e.g. deletes all revisions for this entity.
In case of deletion, FieldItemList::delete() should iterate over all revisions, (so it's not ContentEntityStorageBase::invokeFieldMethod() that should iterate over all revisions), because the docs of \Drupal\Core\Field\FieldItemListInterface::delete() say:
  /**
   * Defines custom delete behavior for field values.
   *
   * This method is called during the process of deleting an entity, just before
   * values are deleted from storage.
   */
  public function delete();

That's only removing the usage of the default revision, not of all files across all revisions. It's not reasonable to expect some hook to be invoked for every revision, because there might be hundreds, and most things don't need to do per-revision clean-up. As the documentation indicates, this is the method that is called when deleting an entity. Therefore if this field type does revision-specific stuff, it also is itself responsible for doing revision-specific clean-up.

When deleting a non-default entity translation
(Because deleting the default entity translation is equivalent with deleting the entire entity — the UI makes this clear fortunately.)
@amateescu pointed out in #53 that this is wrong because it handles translations incorrectly. But as @hchonov pointed out in #100/#101: it's wrong that translation deletion calls FieldItemList::delete(), because it's designed to be used for "entire entity deletion, which also means all translations".
I stepped through this too. ContentEntityDeleteForm() calls $entity->removeTranslation($langcode) and then calls $entity->save(). Via the call chain shown in #101, this eventually results in FieldItemList::delete() in being called. Which is wrong per the \Drupal\Core\Field\FieldItemListInterface::delete() docs.

Therefore I think that the current patch is wrong.

I think we need to fix two bugs:

  1. Deleting an entity with a file/image field does not release all usages of those files by the entity that is being deleted.
  2. Entity translation deletion is implemented incorrectly.

The current patch only fixes the first bug, and does it in a way that makes it harder to fix the second bug.

Multiple solutions can be thought of for both of those:

    1. Fix FileFieldItemList::delete() by making it iterate over all revisions, and deleting those individual usages.
    2. Fix FileFieldItemList::delete() by deleting all usages of this file by this entity in one go (implemented by @mtodor in #43, based on #17).
    3. Equivalent with #2, but implemented more generically: remove FileFieldItemList::delete()'s logic to deal with file usage altogether, and instead move that into hook_entity_delete, which means it will also work for e.g. editor.module
      file usage and #2826127: Usage of Files Referenced in Entity Reference Fields Not Tracked. Note that this was always the plan in #391330: File Field for Core, 8 years ago, it just never happened…
    1. Fix \Drupal\Core\Entity\ContentEntityStorageBase::invokeFieldMethod()'s special casing: stop calling delete() on all fields when removing a translation. We have \Drupal\Core\Field\FieldItemListInterface::deleteRevision() and we have hook_entity_translation_delete(). It seems we're missing \Drupal\Core\Field\FieldItemListInterface::deleteTranslation()?

    Related: #2787187: Data loss: Deleting a translation of an entity deletes all file_usage entries for files referenced by the entity fixed the case of "deleting a translation causes all file usage to be fixed", but this too was due to translation deletion calling FieldItemList::delete(), so that attempted to change the logic in FieldItemList::delete() to only delete usage in case the default translation is being deleted, but … that too is wrong! When file A is used in translation X and file B is used in translation Y, then deleting the non-default translation means that file B's usage will never reach zero. Reopened that issue because of that: #2787187-47: Data loss: Deleting a translation of an entity deletes all file_usage entries for files referenced by the entity .


All in all, I think it's important that Entity/Field API subsystem maintainers make a decision here that considers all ramifications. Either we go with a different patch than the current one that fixes both bugs, or we go with the current patch, but then we must fix the documentation of \Drupal\Core\Field\FieldItemListInterface::delete() and document how to detect whether a single translation is being deleted or whether the entire entity is being deleted.

plach’s picture

I did not dive as deep into this issue as @Wim Leers did (wim++), but off the top of my head I totally agree with everything said in #127.

Currently, my way forward would be: implement solutions #127.1.3 and #127.2. A ::deleteTranslation() method should also allow us to properly address the follow-up mentioned in #2787187-47: Data loss: Deleting a translation of an entity deletes all file_usage entries for files referenced by the entity .

I'd like to explore this solution and provide a new patch if there's consensus this should be the way forward.

catch’s picture

Priority: Critical » Major
Issue tags: -Triaged D8 critical

Now that #2801777: Give users the option to prevent drupal from automatically marking unused files as temporary is in, this no longer results in unexpected file deletion. #2821423: Dealing with unexpected file deletion due to incorrect file usage is open as a critical meta-issue to discuss the broader approach on this. So I’m going to downgrade this to major after discussion with @xjm and @cilefen, since we feel effort would be better spent on replacing the file_usage system at this point. We can still fix this bug of course as a major.

plach’s picture

Issue tags: -WI critical

Spoke with @catch and we agreed this no longer needs to be a MUST-HAVE for the Workflow Initiative, as lingering files are now accepted "by design".

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

suranga.gamage’s picture

Having looked/proposing a patch at https://www.drupal.org/node/2810355 (Before noticing this thread, is now linked there aswel).

"Deleting an entity with a file/image field does not release all usages of those files by the entity that is being deleted."
In an ideal conceptual code world I guess this could be by handling every one of revisions/translation instances separately however in reality this seems incredibly heavy handed (200 revisions x 4 languages would be incredibly painful at best).

The file usage however does aggregate to the "root" entity id so in theory it can be flushed out in a single go when deleting the main entity.

Assumptions in the "delete" callback.
There appear to be a few ways that the "delete" callback on the fieldList is called. But as far as I can see from the previous comments these don't always seem to match to the idea that it"s only when the entire node is deleted.
e.g:
- The default language of a node is changed (causing a "new" field instance to be added in the new language and the old one removed (invoking the delete callback) as seen in the related https://www.drupal.org/node/2787187
- Deleting a revision (basically a node, which assumes only this revision)
- Deleting a "current" revision (assuming all the connected revisions)
- Deleting a translation.
- Deleting the default translation (which assumes all the other translations)

Now in this sense I disagree with the assumption of the delete callback automatically originating from a "full entity delete". It might be the most common usecase but it's not an inherent given and trying to start guessing the case in this context feels odd. Conceptually speaking the we're deleting the "field instance" and we shouldn't be concerned about the larger context of why it's being deleted. The larger context (e.g the deleting of a node) should be concerned about calling the correct callbacks.

Roughly the current flow:
Something is changed / deleted --> Invokes the invokeFieldMethode [ which then tries to guess it's context from the passed entity ] --> edits the file usage.

But in my mind the flow should be
Something is changed / deleted --> This something decides the context (or asks some form of central point to do it) --> invokes the associated method --> edits the file usage.

So (as mentioned by Wim)
1. Fix \Drupal\Core\Entity\ContentEntityStorageBase::invokeFieldMethod()'s special casing: stop calling delete() on all fields when removing a translation. We have \Drupal\Core\Field\FieldItemListInterface::deleteRevision() and we have hook_entity_translation_delete(). It seems we're missing \Drupal\Core\Field\FieldItemListInterface::deleteTranslation()?
Seems the more logical case to implement without overhauling toomuch.

2. The other way would is to "expand" / solidify the internal checks in the invoker by using
$entity::TRANSLATION_REMOVED type checks more rigorously. But that would mean that detecting revisions is done the other way that detecting translations, even if the use case is pretty much the same. And the hook / invoke logic feels more general in how drupal does things.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

AaronBauman’s picture

bump

webchick’s picture

Hm. This might need to be re-escalated. The lingering files, while no longer resulting in data loss (which, hooray!), now results in non-compliance with GDPR or other laws that require the deletion of files.

Berdir’s picture

@webchick: We already have #2821423: Dealing with unexpected file deletion due to incorrect file usage as a meta issue for file usage problems and it's still critical. It's just that nobody cared about it anymore :) And I think there is also an issue about being able to delete manually, which I think would already mostly adress the legal concerns. We have generic entity actions now, so that shouldn't be too hard. If we can combine it with a view that shows files without usage and bulk-delete, then that should be pretty OK.

webchick’s picture

Right, I realize about the existence of #2821423: Dealing with unexpected file deletion due to incorrect file usage, but that feels a lot more like "in order to fix this bug, you must first create the universe" issue. :P Whereas this one seems, at least on the surface, to have more targeted fixing possibilities.

Berdir’s picture

Hm, adding a way to manually delete files is something we can do any time, that's fairly easy. There are no blockers for that.

This issue is open for 8 years now, there is a reason for that. And it's not just about fixing this single issue, we have to fix every file usage problem to enable *automatic* deletion again. There are issues that remove too much usage or don't add enough, and others, like this one don't *remove* enough usages, so an issue like this might be hiding other issues that we haven't even found yet.

Pancho’s picture

Issue tags: +GDPR

Re #136:

This might need to be re-escalated. The lingering files, while no longer resulting in data loss (which, hooray!), now results in non-compliance with GDPR or other laws that require the deletion of files.

Indeed.

And I think there is also an issue about being able to delete manually, which I think would already mostly adress the legal concerns.

I don’t think so. While being a stop-gap, the mere fact that it’s possible to delete lingering files manually does not qualify as data protection by design (Art. 25 GDPR) in regard to the right to erasure (Art. 17 GDPR), see also the recitals 39, 46, and 65.

Now we’re no magicians, so as Berdir said, have to fix every file usage bug erring on one side or the other, before we can reenable automatic clean-up. However, I suggest we don’t accept lingering files as “by design” and pick up the proposal laid out in #127 and 128 by Wim and plach.

dqd’s picture

Maybe related issue in contrib: #2910053: Used by another application

John Pitcairn’s picture

Version: 8.6.x-dev » 8.8.x-dev

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

ankitr’s picture

Hello everyone,

i am also facing this issue but i didn't get any solution. any solution for this problem?

Thanks

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

LpSolit’s picture

I think comment #28 should be fixed, and it shouldn't be that hard to do. In my specific case, I reached 100% of my quota on the web server I use because there are tons of orphaned PDF, Word, etc... files which have not been deleted from the server when the corresponding nodes have been deleted from Drupal. The file_usage table contains many entries which point to a non-existent node, and these entries should be easy to catch.

KlemenDEV’s picture

I agree this issue should have more attention, this can become a serious problem for users with smaller disk spaces.

pameeela credited maseyuk.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

michal.k’s picture

Hi,

Can anyone confirm that #119 patch is safe to use?

Kind regards,
Michal. K.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

abelass’s picture

Patch #119 with drupal 9.4.7 causes Fatal error: Cannot redeclare Drupal\Core\Entity\ContentEntityStorageBase::loadMultipleRevisions() in /var/www/html/web/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php on line 836

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

andrew.wang’s picture

Issue tags: +Bug Smash Initiative
larowlan’s picture

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

kevin.pfeifer’s picture

Patch #119 can't be applied to Drupal 9.5.10

steinmb’s picture

Status: Needs work » Postponed
bvoynick’s picture

Status: Postponed » Needs work

@steinmb This is one of the issues noted in the table in #2821423: Dealing with unexpected file deletion due to incorrect file usage. And I don't know of a reason work on this cannot proceed in parallel to the others there?

As the notes column there says regarding this issue: "No data loss, but irrepairable, therefore still critical."

4kant’s picture

file_usage does not seem to get touched when a filefield gets deleted.
So we can not even find those files anymore that had been referenced (= used) by the deleted field. At least not in bulk.
All files still keep value "1" in column "status" .
At least this value should have changed after deletion of the field.
Am I wrong?

catch’s picture

Status: Needs work » Postponed

This should really be postponed on #3364744: Add a reliable entity-usage system to core, the file_usage system is broken beyond repair, there is no way to rebuild file usage, so as soon as it's off by one or more, it's like that forever.

solideogloria’s picture

solideogloria’s picture

Status: Postponed » Active

I also disagree that this should be postponed. Leaving it as-is will continue to create incorrect file usage data any time someone deletes content, which will only make their problems worse. The solution I shared in the previous comment isn't that difficult or complicated.

Please review and/or try the code posted in the linked issue summary. Note that I didn't test it with media, so the code might need to be further generalized to handle that case.