Problem/Motivation
Images/files uploaded to a file/image fields on entities with multiple revisions are never deleted, even when the entity is deleted. Because the file usage counts for the revisions is never released.
Steps to reproduce:
- create an
article
node with an image A uploaded to itsimage
field - verify that the uploaded file A has its
status
column in thefile_managed
table set to1
,
and it has 1 usage in thefile_usage
table - edit the node, check the "Create new revision" checkbox, remove the original image A and upload a new image B, save
- verify that the uploaded file B has its
status
column in thefile_managed
table set to1
,
and it has 1 usage in thefile_usage
table. Verify that the rows for file A are unchanged in both tables. - delete the article
- verify that the uploaded file B has its
status
column in thefile_managed
table set to0
, and it has no row anymore in thefile_usage
table -
- Expected result: file A has its
status
column in thefile_managed
table set to0
, and it has no row anymore in thefile_usage
table - Actual result: file A has its
status
column in thefile_managed
table still set to1
, and it still has n1 usage in thefile_usage
table
- Expected result: file A has its
- run cron 6 hours later (this is the default value of the
temporary_maximum_age
setting insystem.file
) - verify that uploaded file B is now removed (because usage = 0)
-
- expected result: the uploaded file A is still present (because usage > 0)
- actual result: the uploaded file A is absent (because usage = 0)
Proposed resolution
TBD, see #126 for an summary of comments #1–#125.
Remaining tasks
TBD
User interface changes
None.
API changes
TBD
Data model changes
TBD
Comments
Comment #1
aaron CreditAttribution: aaron commentedI've tested, and this appears to be a core issue.
Comment #2
Dave ReidAre you able to delete the file using media/[file id]/delete?
Comment #3
aaron CreditAttribution: aaron commentedmost likely part of the file system, rather.
Comment #4
aaron CreditAttribution: aaron commentedsteps to reproduce w/o media module, from a new installation of drupal:
1) create an article post with an image file
2) edit the post, check to create a new revision, remove the original image and upload a new image, and submit
3) edit and delete the post
4) you'll see the new image removed from the files/field/image folder, yet the original remains
5) examining the file_managed table in mysql, you'll see an entry for the original.
Comment #5
aaron CreditAttribution: aaron commentedyou'll also see an entry in the file_usage table.
Comment #6
aaron CreditAttribution: aaron commentedthinking about it, it's most likely in the node system, to do with revisions not properly decrementing the count in file_usage when deleting.
Comment #7
aaron CreditAttribution: aaron commentedoddly, deleting a single revision does indeed properly release the file. it's only when you delete a node with revisions that this occurs.
Comment #8
pwolanin CreditAttribution: pwolanin commentedDoes this only happen with media module, or other files too?
Comment #9
idiotzoo CreditAttribution: idiotzoo commentedHappens with other files too. Behaviour is exactly the same with a CCK file field.
Comment #10
chaby CreditAttribution: chaby commentedHi,
Agree with #6 and i faced with this trouble too in 7.x (having orphans files with revision enable and a image field which heavy used file element and process).
I have understand that when a node is delete, it did't delete revisions associated by invoking appropriate hooks (node_revision_delete and field_attach_delete_revision) for example by using node_revision_delete().
As a result, hook_field_revision_delete aren't called and file module did'nt decrement file usage.
Indeed, you need to delete revision one by one and then delete node itself !
Or implements a hook node delete and do what the node module should do (?) : load revisions and for each of them, execute node_revision_delete() (or do it manually whatever).
E.g :
Here is a quick unstable 7.x patch for example.
But the real problem is : modules invoke in node delete maybe need to load revisions data about the current node beeing deleted (for e.g field revision value). But mostly do it already rather than using revision hook ?
By deleting revisions first, it will just breaks these invoked modules behaviors...
And i'm sure that i miss a lot of things...!
Comment #11
chaby CreditAttribution: chaby commentedforgot to say that in an extrem use case, if we have 10000 revisions to delete in one shot per node to delete in a huge transaction, sure some bad behavior will happen on the server! And this is maybe one reason why it haven't been done...
maybe using a cron to create/update a limited queue to delete orphans file (forced) is a better solution (from file_usage table, make a left join on node table and select nid where nid is null)...
Comment #12
Dave ReidHave a patch but it seems to need work and tests.
Comment #13
gilsbert CreditAttribution: gilsbert commentedHi.
We need this solution urgently for D7.
Is there a patch that we may test?
Thank you very much.
Regards,
Gilsberty
Comment #14
capellicIs there a module out there that will clean-up orphaned files? Or a recipe for using VBO to view a list of orphaned files with the checkboxes to delete them?
I did some research and found these resources -- thought I have not yet tried them:
https://www.drupal.org/project/auditfiles
https://www.drupal.org/node/733258#comment-5582764
Comment #15
joshua.albert CreditAttribution: joshua.albert commentedI submitted a patch for the same issue (https://www.drupal.org/node/2393641), but quickly marked it as a duplicate when I found this (the patch here makes more sense as it allows for node_revision_delete() hooks).
@capellic - I've used the following SQL to remove the orphans manually:
Comment #16
joshua.albert CreditAttribution: joshua.albert commentedComment #17
joshua.albert CreditAttribution: joshua.albert commentedIs there any reason why we couldn't also add this? After all, if the node is deleted, so should all revisions and all file use references to the file, correct?
I supposed using only the hooks allows for modules more control, but as I see it, if a node is deleted, there shouldn't be any references in the file_usage table which, of course, references the non-existence node id.
Comment #18
joshua.albert CreditAttribution: joshua.albert commentedAfter some testing with the core patch that invokes revision deletion, I've verified that, at least in my case, the file_usage table record is still not being released. The code above tests out to work correctly. It removes the records from the file_usage table and allows the file to be deleted. This includes deleting files when deleting nodes, due to the fact that node deletion invokes file deletion.
Comment #19
FreekyMage CreditAttribution: FreekyMage commentedJust confirming that #10 works (although line numbers are different now and it might not be the best solution as the author says) and is still needed in D7.38
Comment #20
catchBumping to major since this is a data-integrity issue.
Comment #21
alexpottDiscussed with @catch, @effulgentsia, @xjm, @amateescu, @plach, @berdir, @swentel, and @fago. We agreed this issue is major.
Comment #22
Wim LeersHoly cow, we've had this bug since 2011, and it also exists in Drupal 7?!? This is impressive.
Comment #23
xjmI also confirmed this issue is still a problem while working on #2725415: Text Editor module fails to track usage of images uploaded in text_with_summary fields, allows uploaded images to be deleted.
Also setting to active since the existing patch on the issue has no bearing on D8. For D7, we can create a separate followup issue postponed on this one, per the updated backport policy: #2462101: [policy, no patch] Update backport policy to account for 8.x minor versions and 7.x divergence.
Comment #24
tame4tex CreditAttribution: tame4tex commentedI would like to propose that this is the responsibility of the file module not the node module.
Comment #25
catchThis patch will fix the bug for later deletions, but the existing file usage records are going to remain corrupted - we don't have any way to rebuild file usage from scratch, the API doesn't allow for this.
So while the files themselves don't get lost in this case (in fact it makes them almost impossible to lose), we are losing data in file usage that can never be repaired. So I think this should be promoted to critical.
Comment #26
xjm#25 seems worth discussing to me. Promoting to critical for now for visibility since it is a data integrity problem, although it is kind of borderline since it's theoretically possible to manually check the uses, then confirm that the referenced thing does not exist. I'm leaving the major triage tag on purpose in case we re-demote it.
Edit: I should say that with revisions on by default in 8.2.x, this feels more critical to me there since it becomes a whole lot worse for new sites, but I might consider it major for 8.1.x.
Comment #27
xjm#2732393: Decide if and how to try to save incorrectly orphaned images uploaded in text_with_summary fields from #2725415 also has a need to rebuild usage data (but for the opposite reason), so adding that as a related issue here. It's possible that both of these issues could have data updates resolved by the same functionality.
Comment #28
xjmOTOH is this actually so hard to clean up? All you have to do is an inner join on the usage table and relevant entity tables. If no entities listed in the usage exist, then that's a list of files to mark for purge.
Comment #29
Wim Leers@catch @xjm See #2708411-29: [PP-1] editor.module's editor_file_reference filter not tracking file usage correctly for translations for an example of how you can recalculate file usage data.
Comment #30
catchThat works for editor module, but afaik this affects every file_usage implementation that could possibly be applied to any revisionable entity - due to the way that revisions are deleted. So I don't think it's an equivalent case for which generic code can be written - we'd need to know what editor, file/image and any other modules are doing.
Comment #31
catchWell I guess we could write an update for each type we know about in core to clear those out, then a change record for contrib. So not impossible to do something.
Comment #32
David_Rothstein CreditAttribution: David_Rothstein as a volunteer commentedI marked #2393641: File usage records for files referenced in older node revisions are not removed on node deletion (the Drupal 7 issue) postponed on this one.
#10 and #11 look like an accurate description of the problem for both Drupal 7 and Drupal 8 (revisions don't get formally deleted when an entity is deleted) but I agree with some of the concerns mentioned there.
So another way to do it could just be to have File module, when it reacts to an entity being deleted, try to find all files referenced by all revisions of the entity (rather than just the current version of the entity) and try to delete (D7) or delete usage records for (D8) those.
Comment #33
jelo CreditAttribution: jelo commentedI added a number of related issues. Files in revisions have lots of issues and maybe this should be considered as a whole to see how this area can be improved. The related issues mention potential access/security issues as well.
Comment #34
catchDeleting individual revisions when we delete an entity is probably the most robust way to deal with this and avoid similar issues in other places. That really needs #89181: Use queue API for node and comment, user, node multiple deletes though since revisions are unbounded.
Comment #35
catchOpened an issue to discuss that.
Comment #36
jonathanshawLooks like you forgot to reference the issue @catch?
Comment #37
catchWhoops, was trying to add to related issues mostly likely. Here it is: #2743345: Revisions should be individually deleted when deleting entities
Comment #38
mtodor CreditAttribution: mtodor commentedI'm currently working on this issue. There is at least test - I'll attach it, before fix.
Comment #39
mtodor CreditAttribution: mtodor commentedHere is patch with test for issue.
Comment #40
xjmThanks @mtodor! Setting NR to trigger testbot and demonstrate the test.
Comment #43
mtodor CreditAttribution: mtodor commentedHere is patch with possible solution for problem. It's full patch with Test included.
To describe solution in short:
- new method is added to 'file.usage' service (deleteForEntity)
- that method is executed after all fields are deleted for entity, to cleanup file relations for that entity
In future some better generic solution should be introduced, for example: to chain deletion and propagate to entity revision, where revision would take care of proper deletion of it's fields and so on. In short to delegate responsibility to elements that should be aware of made action and handle it accordingly. I don't know is that part of #2743345: Revisions should be individually deleted when deleting entities.
Comment #44
mtodor CreditAttribution: mtodor commentedComment #46
mtodor CreditAttribution: mtodor commentedHere is new solution of problem. This option actually replaces code that is not doing proper deletion of entity related files.
In process of changing I have found other issue: when field is deleted for that entity type, file usage still stays unchanged for not active revision of entity. Test for that problem is added too.
Comment #48
xjm@alexpott, @catch, @effulgentsia, @Cottser, and I discussed this issue today and agreed to handle it as a critical issue. While it is not data loss per se, orphaned files are a data integrity problem and could also result in problems like unintended information disclosure (since site users have no reason to expect that files attached to something they deleted would still exist.) This is also another data integrity problem that becomes worse the longer it exists, so it does make sense to explore a hotfix as @alexpott and I discussed in person with @mtodor at the sprint.
Thanks all! Looks like the next step is to work more on @mtodor's patch from #46 and then review it.
Comment #49
Gábor HojtsyComment #50
Gábor HojtsyFix to use the right media tag. Sorry.
Comment #51
xjmThis is also critical for phase A of the workflow initiative.
Comment #53
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedThe approach from the latest patches can not work because we have to take translations into account and they are not stored in the
file_usage
table.Instead, I managed to write a tricky aggregated entity query which gives us the file IDs used by all revisions of an entity, with optional filtering on a specific translation. This works fine for all uses cases but one: when we delete the field, the query can not find the renamed field tables anymore.
Not sure how to fix that yet but I'm posting what I have so far to get some feedback on the approach :)
Comment #55
jibranWe should have an empty $fids check else we'll load all the files.
Comment #56
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedNope, passing an empty array to
loadMultiple()
makes it return an empty array, so we're good.Comment #58
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedDiscussed this issue with @catch today and we concluded that we should postpone it on the only viable solution that we have so far: deleting individual revisions.
In order for that to happen, we need at least two things: use the Queue API for deletion (#89181: Use queue API for node and comment, user, node multiple deletes) and have some kind of archiving capabilities in order make any entity 'unpublished' until the deletion process is finished (#2725463: WI: Add StatusItem field type to store archived state).
Comment #59
xjmIf this is postponed on that, then those would need to be critical too I think.
Postponing this on those issues worries me though; the first has been stuck for years.
Comment #60
catchI don't think there's a viable solution that can be done without piggybacking off those issues though - anything we come up with will be pretty fragile, which is not what we want. amateescu spent two days trying to fix this without deleting individual revisions, spoke to me about it in irc, and we ended up with this plan.
This is almost a duplicate of #89181: Use queue API for node and comment, user, node multiple deletes really, and the specific bug here has been around since 2010/11 when file_usage was added in the first place.
From the issue summary there:
i.e. fixing that issue, then applying it to revisions, should mean this issue only needs test coverage.
I've bumped #89181: Use queue API for node and comment, user, node multiple deletes to critical for now.
Comment #61
alexpottDiscussed with @catch, @xjm, @effulgentsia, @Cottser and @webchick. Whilst #89181: Use queue API for node and comment, user, node multiple deletes is a good solution to this problem maybe not doing revision deletes before an entity delete causes problems for contrib that are worse than the potential timeout issues if we do this.
Re-opening the issue to explore doing entity revision deletes to fix it.
Comment #62
alexpottSomething like this.
Comment #63
alexpottWe could improve the query to load revision IDs if we could exclude default revision ID if the entity is the default revision. However we can;t do that because #2766135: EntityQuery with condition on the revision field leads to wrong results
Comment #64
catchHad to double check, but this is called by SqlContentEntityStorage which wraps its parent::delete() call in a transaction, so individually deleting the revisions still happens within a single transaction.
s/delete/default/
But if we have the entity, don't we also have the default revision ID available, so could do an if check instead?
Otherwise this looks great. It will fall over when entities have hundreds or thousands of revisions (and I've seen one of those recently, albeit via a bug), but that's an issue we can deal with separately - at least in that case with luck the transaction gets rolled back, which is better than just bits of an entity getting deleted silently.
Comment #65
alexpottRe #64.1 well we can't be sure that the entities passed are the default revision - so how would we know every entities default revision. Ideally what we could do is exclude the revision IDs for entities that are default revisions that are passed in. However we can't do that because of #2766135: EntityQuery with condition on the revision field leads to wrong results. And even if we could we'd still need the exception handling.
The other option is to load all the revisions here and check but that is exactly what happens in \Drupal\Core\Entity\ContentEntityStorageBase::deleteRevision so that seems meh.
One thing to explore might be a loadMultipleRevision.
Comment #66
catchGood point.
Comment #67
claudiu.cristeaIsn't this faster?
EDIT: I wonder how the 'OR' operator works. Probably is not documented.
Comment #68
catchIN is a tidier query whether it's faster or not.
Comment #69
dawehnerOne thing I am wondering: Could we add an update path, which cleans up the
file_usage
table? That sounds though like a major follow up or so, to be honest.Comment #70
claudiu.cristeaCheck if there are entities earlier. In this way we avoid the call to parent::delete() that is checking again the same thing.
@dawehner, added #2788571: Cleanup file_usage table after #1239558.
Comment #71
alexpottWell we shouldn't be loading revisions one by one - not doing that is the biggest performance optimisation we can be making here.
Comment #72
alexpottI'm also pretty sure that we should have some more explicit testing of this.
Comment #74
alexpottPretty sure we just exposed a nasty bug in \Drupal\Core\Entity\Sql\SqlContentEntityStorage::loadFromDedicatedTables()
Comment #75
alexpottHmmm \Drupal\Core\Entity\Sql\SqlContentEntityStorage::loadFromSharedTables is going to have the same problem but it looks like we don't have a test to catch that. And now were doing something different from loadRevision... mapFromStorageRecords is just meant to work on records keyed by entity ID and not revision ID... ho hum.
Comment #76
alexpottBut that is fixable :)
Comment #78
claudiu.cristeaSome cleanup in tests.
@alexpott, what exactly do you think it need some additional testing?
Comment #79
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedWhere does this inherit the doc from? And.. why not make it public?
\Drupal\Core\Entity\ContentEntityStorageBase::cleanIds()
relies on the entity 'id' key but we're acting on revision IDs.This feels a bit icky :) Since it is a protected method, why don't we change the existing $revision_id param to also accept an array? Or only an array?
Could this change have any performance impact on the query?
Comment #80
alexpottRe #79
1. Because making it public can happen in another issue that uses it where appropriate. Maybe we should mark it internal till that followup issue. It inherits from the abstract base class
2. Yerp - needs fixing
3. Because BC but it is all protected
4. If it does our indexes are totally out of whack... I'll run an explain.
Comment #81
claudiu.cristeaA new one which seems related #2787187: Data loss: Deleting a translation of an entity deletes all file_usage entries for files referenced by the entity .
Comment #82
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedNeeds work for more tests, alexpott will need to describe which ones.
Comment #83
catchJust a note #1239558: Deleting an entity with revisions and file/image field does not release file usage of non-default revisions, causing files to linger went in yesterday with a @todo for this issue. If revisions are enabled in the test for removing a translation, this issue prevents the file usage from being removed so the tests fail. We should be able to enable revisions in that test as part of fixing this bug (although it would be good to keep a non-revision variant of the test so both cases are covered).
Comment #85
phillamb168 CreditAttribution: phillamb168 commentedJust a note to anyone who happens to come across this issue while trying to figure out why deleting node revisions doesn't also delete the files associated with those revisions, if you're using workbench moderation you'll need this patch plus the one found here: https://www.drupal.org/node/2353491
I won't link the issue unless somebody thinks it's worthwhile, they're not directly related but it does seem to be a common issue.
Comment #86
TechnoTim2010 CreditAttribution: TechnoTim2010 as a volunteer commentedIt is an issue, not sure how common though, it is the first time I have encountered it in five years, but we have a suite of 93 sites, with issue raised on 2 of them but not ironically the sites using Workbench Moderation.
As they increase usage and complexity of use, I expect this to be raised as more and more of an issue. Currently the solution is to adopt an image naming format to differentiate older images from current ones.
Regards
Tim
Comment #87
alexpottSo we need more tests to ensure the hooks are being fired in the order we expect. Looking at the previous comments the things that needs addressing is #79.2 and #79.4.
Comment #88
alexpottAddressing #79.2
Comment #90
alexpottHere's some additional tests around hook firing during entity revision delete. Also fixed the DbLogTest. That one is super interesting. Our hook_*_load hooks take lists of entities keyed by entity ID - not so good for multi-loading revisions. We should probably change that is D9. Or introduce *_load_revision() and deprecate *_load() in D8.
Comment #91
catch#1730874: Add support for loading multiple revisions at once is still open, although that won't help with the hook signature.
Comment #92
claudiu.cristea#83 is not addressed. Note that the link from #83 is wrong, should be #2787187: Data loss: Deleting a translation of an entity deletes all file_usage entries for files referenced by the entity .
Comment #93
claudiu.cristeaComment #94
claudiu.cristeaComment #96
BerdirloadMultipleRevision() kind of sounds better to me, but not sure. or loadRevisionMultiple() ? the issue below used loadRevisions()...
the problem with adding this as a protected method is that adding it as a public method later on in #1730874: Add support for loading multiple revisions at once isn't possible anymore as that would be an API change for subclasses.
why are we doing this by reference? the only reason for that would be able to replace the object, that doesn't happen?
I think null can only be paassed in if you default to NULL when you have a type hint. definitely for objects, haven't tested array.
possibly just copied code, I don't think loading all revisions of all entities is something we need to support?
missing type (yes, the exising params too)
unrelated?
Comment #97
alexpottThanks for the review @Berdir
@internal
. We can remove the@internal
in #1730874: Add support for loading multiple revisions at onceThis will still because I'm not sure that delete translations fires the entity revision delete hooks :(
Comment #99
alexpottHere's an improvement of the test coverage that is failing because #93. Once revisions are enabled on a content type removing a translation will not cause any deletes or changes in file usage because removing a translation will only remove the translation from the revision that is being saved. Therefore the file is still being used by the old revision. I've discussed this with @hchonov on IRC and he agreed that calling removeTranslation on an entity should not result in an entity revision delete firing.
So the assumption that the just changing the flag on the content type made in #93 is what we need to do is incorrect. The patch attached adds test cases where revisions are enabled and where they are not.
Comment #100
hchonovI've done a bit of digging and come across ContentEntityStorageBase::invokeFieldMethod, which when called for the "postSave" method will invoke the FieldItemList::delete method on each removed translation, which is a bit confusing as the doc of the method FieldItemList::delete says :
"This method is called during the process of deleting an entity, just before values are deleted from storage."
So currently deleting an entity translation still results in calling the field delete method on each removed translation, which is independent of if we are saving the default or a non-default revision with a removed translation.
P.S. This means that FileFieldItemList::delete will be called and should decrement the usage count of the referenced file.
Comment #101
hchonovHere is the path how it comes to invoke the delete method on each field of a removed translation:
1. EntityStorageBase::save ->
2. ContentEntityStorageBase::doPostSave ->
3. EntityStorageBase::doPostSave ->
4. ContentEntityStorageBase::invokeHook(update/insert) ->
5. ContentEntityStorageBase::invokeFieldPostSave ->
6. ContentEntityStorageBase::invokeFieldMethod('postSave')
and the last method has the following in its body :
Comment #102
alexpottOkay after discussion translation deletion a bit more and creating #2815779: Deleting a translation leaves behind orphaned revisions. I've added further testing here to show what's happens with revisions, translations and file counting.
Comment #103
timmillwoodI've not followed the issue too closely, but the patch in #102 looks pretty good and has nice test coverage. So going to RTBC to see if we can get things moving again.
Comment #104
claudiu.cristeaI'm not sure that we can fix #2788571: Cleanup file_usage table after #1239558 afterwards. I think we should fix it here.
Comment #105
timmillwoodok, back to needs work then to add a cleanup for the file_usage table.
Comment #106
anairamzapHi all,
I have tested this and:
Will assign the issue to myself so everyone is aware I'm working on this. I hope that's ok.
Thanks,
m
Comment #108
kattekrab CreditAttribution: kattekrab as a volunteer and at Catalyst IT commented@anairamzap - you've had this assigned to yourself for 2 months. Any progress? Or should we un-assign it at this point?
Comment #109
anairamzapHi @kattekrab,
I was waiting for an answer, but given that no movement has been done in this issue I have unassigned it.
Cheers,
m
Comment #110
chipway CreditAttribution: chipway at Chipway commentedWorking on this at DrupalDevDays Seville.
Comment #111
chipway CreditAttribution: chipway at Chipway commentedFirst, I had to re-roll the patch.
I had an issue when creating the interdiff (because of lines number been different in 2 places).
Here is the re-rolled patch.
Then I will work on adding the file_usage table cleanup.
Comment #112
paranojik CreditAttribution: paranojik commentedComment #114
chipway CreditAttribution: chipway at Chipway commentedSame patch as 111 + fix of failed test.
Comment #115
chipway CreditAttribution: chipway at Chipway commentedSo patch 1239558-114.patch passes tests.
We still have to work on adding the file_usage table cleanup.
I hadn't time in Seville and hope to do this soon.
Comment #116
dagmarIn #2753971: ContentEntityStorageBase::deleteRevision() function does not remove node_field_revision entries we have a similar situation as was ok to create a follow up to clean the tables later. I think we can do the same here.
Comment #117
Manuel Garcia CreditAttribution: Manuel Garcia as a volunteer and at Appnovation commentedPatch looks good, just a small nitpick:
We should remove these lines if they're not necessary.
Comment #118
vijaycs85Overall, looks good. +1 to RTBC.
nice.
Minor: Ideally default of $revision_id should be NULL instead of FALSE as it holds integer value. but not changed by this issue. so we can ignore it.
Comment #119
vijaycs851. #116: Created follow up to clean up file_usage table - #2885434: Deleting a node with file usage doesn't delete file
2. #117: Removed.
Do we still need to work on this issue in 8.3.x?
Comment #120
matthieuscarset CreditAttribution: matthieuscarset as a volunteer and at Appnovation for Appnovation commentedI've follow the steps described in the issue summary.
Without the patch, files are not deleted.
After applying the patch, files are correctly deleted physically and marked as to be deleted (status=0 in database).
+1 for RTBC
Comment #121
timmillwoodComment #122
tstoecklerSince this affects the storage I queued a bunch of tests on non-primary environments.
Comment #123
Manuel Garcia CreditAttribution: Manuel Garcia as a volunteer and at Appnovation commented@tstoeckler++
Comment #124
catchhttps://www.drupal.org/pift-ci-job/696689 looks like a real postgres failure.
Comment #125
plachComment #126
Wim LeersI went through the entire issue.
The following discussions/comments were especially important:
file.module
:file_file_predelete()
. But its implementation is a@todo
. Turns out this TODO was added in #391330: File Field for Core, almost 8 years ago. @catch approved it there, assuming there'd be a follow-up. That follow-up never happened.\Drupal\file\FileUsage\FileUsageInterface
to make this possible.ContentEntityStorageBase::delete()
iterate over all revisions, timeout problems be damned, because many sites won't run in timeout issues, only sites with many revisions for an individual entity. @catch reviewed and +1'd.FieldItemList::delete()
, even though that is meant to only be called when deleting the entity (i.e. deleting all translations, not just a single translation).file_usage
table first (in #104 + #105), and then shifted back to "nah let's do it in a follow-up" (in #119 + #121).Why is it suddenly okay to do this in a follow-up, and why wasn't it before?
Updated issue summary, updated issue tags, updated issue title, updated related issues.
Comment #127
Wim LeersI also stepped through the code with a debugger. Based on the >100 comments above, many of which show very smart people attempting solutions but then running into dead ends … that shows that the semantics & architecture of deleting entities vs deleting translations vs deleting revisions is still a weak area of the Entity/Field API.
$entity->delete()
callsSqlContentEntityStorage::delete()
which callsEntityStorageBase::delete()
which callsContentEntityStorageBase::doDelete()
which callsContentEntityStorageBase::invokeFieldMethod()
, which iterates over all translations and invokes thedelete()
method on allFieldItemList
objects. AfterContentEntityStorageBase::doDelete()
calledinvokeFieldMethod()
, it also callsdoDeleteFieldItems()
, which is what e.g. deletes all revisions for this entity.FieldItemList::delete()
should iterate over all revisions, (so it's notContentEntityStorageBase::invokeFieldMethod()
that should iterate over all revisions), because the docs of\Drupal\Core\Field\FieldItemListInterface::delete()
say:That's only removing the usage of the default revision, not of all files across all revisions. It's not reasonable to expect some hook to be invoked for every revision, because there might be hundreds, and most things don't need to do per-revision clean-up. As the documentation indicates, this is the method that is called when deleting an entity. Therefore if this field type does revision-specific stuff, it also is itself responsible for doing revision-specific clean-up.
FieldItemList::delete()
, because it's designed to be used for "entire entity deletion, which also means all translations".ContentEntityDeleteForm()
calls$entity->removeTranslation($langcode)
and then calls$entity->save()
. Via the call chain shown in #101, this eventually results inFieldItemList::delete()
in being called. Which is wrong per the\Drupal\Core\Field\FieldItemListInterface::delete()
docs.Therefore I think that the current patch is wrong.
I think we need to fix two bugs:
The current patch only fixes the first bug, and does it in a way that makes it harder to fix the second bug.
Multiple solutions can be thought of for both of those:
FileFieldItemList::delete()
by making it iterate over all revisions, and deleting those individual usages.FileFieldItemList::delete()
by deleting all usages of this file by this entity in one go (implemented by @mtodor in #43, based on #17).FileFieldItemList::delete()
's logic to deal with file usage altogether, and instead move that intohook_entity_delete
, which means it will also work for e.g.editor.module
file usage and #2826127: Usage of Files Referenced in Entity Reference Fields Not Tracked. Note that this was always the plan in #391330: File Field for Core, 8 years ago, it just never happened…
\Drupal\Core\Entity\ContentEntityStorageBase::invokeFieldMethod()
's special casing: stop callingdelete()
on all fields when removing a translation. We have\Drupal\Core\Field\FieldItemListInterface::deleteRevision()
and we havehook_entity_translation_delete()
. It seems we're missing\Drupal\Core\Field\FieldItemListInterface::deleteTranslation()
?Related: #2787187: Data loss: Deleting a translation of an entity deletes all file_usage entries for files referenced by the entity fixed the case of "deleting a translation causes all file usage to be fixed", but this too was due to translation deletion calling
FieldItemList::delete()
, so that attempted to change the logic inFieldItemList::delete()
to only delete usage in case the default translation is being deleted, but … that too is wrong! When file A is used in translation X and file B is used in translation Y, then deleting the non-default translation means that file B's usage will never reach zero. Reopened that issue because of that: #2787187-47: Data loss: Deleting a translation of an entity deletes all file_usage entries for files referenced by the entity .All in all, I think it's important that Entity/Field API subsystem maintainers make a decision here that considers all ramifications. Either we go with a different patch than the current one that fixes both bugs, or we go with the current patch, but then we must fix the documentation of
\Drupal\Core\Field\FieldItemListInterface::delete()
and document how to detect whether a single translation is being deleted or whether the entire entity is being deleted.Comment #128
plachI did not dive as deep into this issue as @Wim Leers did (wim++), but off the top of my head I totally agree with everything said in #127.
Currently, my way forward would be: implement solutions #127.1.3 and #127.2. A
::deleteTranslation()
method should also allow us to properly address the follow-up mentioned in #2787187-47: Data loss: Deleting a translation of an entity deletes all file_usage entries for files referenced by the entity .I'd like to explore this solution and provide a new patch if there's consensus this should be the way forward.
Comment #129
catchNow that #2801777: Give users the option to prevent drupal from automatically marking unused files as temporary is in, this no longer results in unexpected file deletion. #2821423: Dealing with unexpected file deletion due to incorrect file usage is open as a critical meta-issue to discuss the broader approach on this. So I’m going to downgrade this to major after discussion with @xjm and @cilefen, since we feel effort would be better spent on replacing the file_usage system at this point. We can still fix this bug of course as a major.
Comment #130
plachSpoke with @catch and we agreed this no longer needs to be a MUST-HAVE for the Workflow Initiative, as lingering files are now accepted "by design".
Comment #132
suranga.gamage CreditAttribution: suranga.gamage as a volunteer commentedHaving looked/proposing a patch at https://www.drupal.org/node/2810355 (Before noticing this thread, is now linked there aswel).
"Deleting an entity with a file/image field does not release all usages of those files by the entity that is being deleted."
In an ideal conceptual code world I guess this could be by handling every one of revisions/translation instances separately however in reality this seems incredibly heavy handed (200 revisions x 4 languages would be incredibly painful at best).
The file usage however does aggregate to the "root" entity id so in theory it can be flushed out in a single go when deleting the main entity.
Assumptions in the "delete" callback.
There appear to be a few ways that the "delete" callback on the fieldList is called. But as far as I can see from the previous comments these don't always seem to match to the idea that it"s only when the entire node is deleted.
e.g:
- The default language of a node is changed (causing a "new" field instance to be added in the new language and the old one removed (invoking the delete callback) as seen in the related https://www.drupal.org/node/2787187
- Deleting a revision (basically a node, which assumes only this revision)
- Deleting a "current" revision (assuming all the connected revisions)
- Deleting a translation.
- Deleting the default translation (which assumes all the other translations)
Now in this sense I disagree with the assumption of the delete callback automatically originating from a "full entity delete". It might be the most common usecase but it's not an inherent given and trying to start guessing the case in this context feels odd. Conceptually speaking the we're deleting the "field instance" and we shouldn't be concerned about the larger context of why it's being deleted. The larger context (e.g the deleting of a node) should be concerned about calling the correct callbacks.
Roughly the current flow:
Something is changed / deleted --> Invokes the invokeFieldMethode [ which then tries to guess it's context from the passed entity ] --> edits the file usage.
But in my mind the flow should be
Something is changed / deleted --> This something decides the context (or asks some form of central point to do it) --> invokes the associated method --> edits the file usage.
So (as mentioned by Wim)
1. Fix \Drupal\Core\Entity\ContentEntityStorageBase::invokeFieldMethod()'s special casing: stop calling delete() on all fields when removing a translation. We have \Drupal\Core\Field\FieldItemListInterface::deleteRevision() and we have hook_entity_translation_delete(). It seems we're missing \Drupal\Core\Field\FieldItemListInterface::deleteTranslation()?
Seems the more logical case to implement without overhauling toomuch.
2. The other way would is to "expand" / solidify the internal checks in the invoker by using
$entity::TRANSLATION_REMOVED type checks more rigorously. But that would mean that detecting revisions is done the other way that detecting translations, even if the use case is pretty much the same. And the hook / invoke logic feels more general in how drupal does things.
Comment #135
AaronBaumanbump
Comment #136
webchickHm. This might need to be re-escalated. The lingering files, while no longer resulting in data loss (which, hooray!), now results in non-compliance with GDPR or other laws that require the deletion of files.
Comment #137
Berdir@webchick: We already have #2821423: Dealing with unexpected file deletion due to incorrect file usage as a meta issue for file usage problems and it's still critical. It's just that nobody cared about it anymore :) And I think there is also an issue about being able to delete manually, which I think would already mostly adress the legal concerns. We have generic entity actions now, so that shouldn't be too hard. If we can combine it with a view that shows files without usage and bulk-delete, then that should be pretty OK.
Comment #138
webchickRight, I realize about the existence of #2821423: Dealing with unexpected file deletion due to incorrect file usage, but that feels a lot more like "in order to fix this bug, you must first create the universe" issue. :P Whereas this one seems, at least on the surface, to have more targeted fixing possibilities.
Comment #139
BerdirHm, adding a way to manually delete files is something we can do any time, that's fairly easy. There are no blockers for that.
This issue is open for 8 years now, there is a reason for that. And it's not just about fixing this single issue, we have to fix every file usage problem to enable *automatic* deletion again. There are issues that remove too much usage or don't add enough, and others, like this one don't *remove* enough usages, so an issue like this might be hiding other issues that we haven't even found yet.
Comment #140
PanchoRe #136:
Indeed.
I don’t think so. While being a stop-gap, the mere fact that it’s possible to delete lingering files manually does not qualify as data protection by design (Art. 25 GDPR) in regard to the right to erasure (Art. 17 GDPR), see also the recitals 39, 46, and 65.
Now we’re no magicians, so as Berdir said, have to fix every file usage bug erring on one side or the other, before we can reenable automatic clean-up. However, I suggest we don’t accept lingering files as “by design” and pick up the proposal laid out in #127 and 128 by Wim and plach.
Comment #141
dqdMaybe related issue in contrib: #2910053: Used by another application
Comment #142
John Pitcairn CreditAttribution: John Pitcairn commentedComment #144
ankitr CreditAttribution: ankitr commentedHello everyone,
i am also facing this issue but i didn't get any solution. any solution for this problem?
Thanks
Comment #146
LpSolit CreditAttribution: LpSolit as a volunteer commentedI think comment #28 should be fixed, and it shouldn't be that hard to do. In my specific case, I reached 100% of my quota on the web server I use because there are tons of orphaned PDF, Word, etc... files which have not been deleted from the server when the corresponding nodes have been deleted from Drupal. The file_usage table contains many entries which point to a non-existent node, and these entries should be easy to catch.
Comment #147
KlemenDEV CreditAttribution: KlemenDEV as a volunteer and at Pylo commentedI agree this issue should have more attention, this can become a serious problem for users with smaller disk spaces.
Comment #149
pameeela CreditAttribution: pameeela commentedAdded credit for maseyuk from duplicate #3051952: Editor file usage count not decreasing correctly if someone ever saves a node without editing
Comment #152
michal.k CreditAttribution: michal.k commentedHi,
Can anyone confirm that #119 patch is safe to use?
Kind regards,
Michal. K.
Comment #155
abelass CreditAttribution: abelass commentedPatch #119 with drupal 9.4.7 causes
Fatal error: Cannot redeclare Drupal\Core\Entity\ContentEntityStorageBase::loadMultipleRevisions() in /var/www/html/web/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php on line 836
Comment #157
andrew.wang CreditAttribution: andrew.wang at Ontario Digital Service commentedComment #158
larowlanComment #160
kevin.pfeifer CreditAttribution: kevin.pfeifer commentedPatch #119 can't be applied to Drupal 9.5.10
Comment #161
steinmb CreditAttribution: steinmb as a volunteer commentedI think we should postponed this on #2821423: Dealing with unexpected file deletion due to incorrect file usage
Comment #162
bvoynick@steinmb This is one of the issues noted in the table in #2821423: Dealing with unexpected file deletion due to incorrect file usage. And I don't know of a reason work on this cannot proceed in parallel to the others there?
As the notes column there says regarding this issue: "No data loss, but irrepairable, therefore still critical."
Comment #163
4kant CreditAttribution: 4kant commentedfile_usage does not seem to get touched when a filefield gets deleted.
So we can not even find those files anymore that had been referenced (= used) by the deleted field. At least not in bulk.
All files still keep value "1" in column "status" .
At least this value should have changed after deletion of the field.
Am I wrong?
Comment #164
catchThis should really be postponed on #3364744: Add a reliable entity-usage system to core, the file_usage system is broken beyond repair, there is no way to rebuild file usage, so as soon as it's off by one or more, it's like that forever.
Comment #165
solideogloria CreditAttribution: solideogloria commentedI posted a potential solution/workaround in #3427640: File not marked temporary and usage not updated if only used in past revisions when node is deleted
Comment #166
solideogloria CreditAttribution: solideogloria commentedI also disagree that this should be postponed. Leaving it as-is will continue to create incorrect file usage data any time someone deletes content, which will only make their problems worse. The solution I shared in the previous comment isn't that difficult or complicated.
Please review and/or try the code posted in the linked issue summary. Note that I didn't test it with media, so the code might need to be further generalized to handle that case.