Problem/Motivation

Recently @catch, @webchick, @xjm, @cottser, @cilefen and I discussed #2801777: Give users the option to prevent drupal from automatically marking unused files as temporary , #2810355: $entity->isDefaultTranslation() behaves incorrectly when changing default translation, causing file/image field usage to be set to zero, causing files to be deleted and #2708411: [PP-1] editor.module's editor_file_reference filter not tracking file usage correctly for translations. We all agreed that the current file usage behaviour that is resulting in files being deleted when they are being used by a site is unacceptable and a critical issue.

This was also discussed at Drupal Dev Days Seville — see the meeting notes.

This issue is exist to propose a series of steps to mitigate and fix these related issues.

The current behaviour

As files are "used" in different situations we add to the counter in the file_usage table. As the usages are removed we subtract from the counter. Once the counter gets to 0 the file is made temporary in \Drupal\file\FileUsage\FileUsageBase::delete(). Then a cron process (file_cron()) comes along and deletes temporary files after system.file:temporary_maximum_age seconds.

There are several cases where that counter is not incremented or decremented correctly.

Regardless of all the sub-issues we have the known issue that this does not track usages added by content editors if they are able to add links to file but editing the source HTML. Also the system of maintaining a counter seems very fragile and evidence supports that :).

Which cases/issues are there?

These fall into clear buckets, along three axes:

  1. Field (image/file/entity reference) versus editor.module's editor_file_reference filter
  2. Entity revisions versus entity translations
  3. Data loss: yes or no
Short description Data loss? Field or filter? Revision or translation? Issue Notes
Deleting entity with revisions + image/file field no Field Revision #1239558: Deleting an entity with revisions and file/image field does not release file usage of non-default revisions, causing files to linger No data loss, but irrepairable, therefore still critical.
Changing default translation + image/file field YES Field Translation #2810355: $entity->isDefaultTranslation() behaves incorrectly when changing default translation, causing file/image field usage to be set to zero, causing files to be deleted This was caused by the "fix" for another critical: #2787187: Data loss: Deleting a translation of an entity deletes all file_usage entries for files referenced by the entity
The critical #2666700: User profile images unexpectedly deleted was merged into this — it has the same root cause, but different steps to reproduce.
Generic entity reference fields YES Field N/A #2826127: Usage of Files Referenced in Entity Reference Fields Not Tracked A missing feature. File usage support is only present on file and image fields, which subclass the generic entity reference field code.
Filter + translations YES Filter Translation #2708411: [PP-1] editor.module's editor_file_reference filter not tracking file usage correctly for translations Was implemented using entity hooks before Translation API matured.
Filter + revisions + adding/removing images no Filter Revision #2892368: editor.module's hook_entity_update() does not count file usages correctly when adding/removing images without creating new revisions The only "normal" priority issue here, because it cannot lead to data loss, and can be repaired
The mitigation for all the rows with YES in the Data loss? column N/A #2801777: Give users the option to prevent drupal from automatically marking unused files as temporary This is a mitigation for:
  • existing sites that desperately need a solution NOW
  • sites where it doesn't matter if files end up getting deleted or not (running out of disk space is only an issue on sites where there is a lot of editing that includes the need for replacing one file with another, without having an old revision still pointing to it)

The downside: it becomes difficult to delete "unused" files, which can be a security problem (e.g. accidentally uploaded PDF with sensitive data, now how to delete it?). That's less of a problem than files just disappearing right under you though. Worst case, the site administrator could be asked to remove the file from disk manually.

Proposed resolution

TBD

Remaining tasks

TBD

User interface changes

TBD

API changes

TBD

Data model changes

TBD

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

alexpott created an issue. See original summary.

catch’s picture

catch’s picture

Issue tags: +Triaged D8 critical

This was opened as a result of a triage discussion, adding the tag to reflect that.

alexpott’s picture

Issue summary: View changes
Leksat’s picture

My personal thoughts on this.

1. #2801777: Give users the option to prevent drupal from automatically marking unused files as temporary would be cool only for some projects. Disk space will run out pretty fast on a project having tens of editors, thousands of files, and frequent content updates.

2. > does not track usages added by content editors if they are able to add links to file but editting the source HTML
But we can add a support for this, right? The editor module already parses formatted text fields. It could search for img tags too.

3. > the system of maintaining a counter seems very fragile
We just need to fix few bugs and extend tests to make it strong again.

In general, is this issue mean that the file usage system can be removed completely? I'm asking because I don't see alternatives.
Also file usage is really helpful sometimes when checking issues reported by clients.

Leksat’s picture

Also, IMHO it would be nice if "Delete orphaned files after" would be "1 day" by default (it's "6 hours" currently) because most of hosting providers make backups on daily basis.

Berdir’s picture

1:
> on a project having tens of editors, thousands of files, and frequent content updates.

Frequent content updates that replace images? I don't think that is a very common use case, usually sites continuously create new content and don't delete/update existing one? At least not on a massive scale?

2:
Editor module parses image tages for a UUID, trying to do that for arbitrary image links is going to be a lot harder, imagine direct linkes to image styles, external images (both actually external images and those that point to your own domain), what about embedded videos, links to PDf's and so on.. that's *a lot* of markup/reference variants that we'd have to cover.

Also keep in mind that we are moving towards having media entity in core. That means each file is actually part of a media entity, which means the file itself is always "used". We could add a similar tracking for any entity usage or specifically for media entities but that currently doesn't exist yet in the existing module. So those media entities will be part of the re-usable media library, and just because you stop using one for a while, that doesn't mean you want to have it removed from the library. So even if we would add usage tracking, we wouldn't automatically delete I think.

One thing that would be possible is to let users view files/medias without usage and delete them explicitly if they are no longer needed. Of course we also need to make the usage tracking system as reliable as possible. And sites that do use this can then still ensure that they only use supported embedding mechanisms, while others can still do whatever they want with their content/markup.

catch’s picture

3. > the system of maintaining a counter seems very fragile
We just need to fix few bugs and extend tests to make it strong again.

This misses that as soon as anything goes wrong, we have no way currently to rebuild the counts.

Leksat’s picture

@catch,
> we have no way currently to rebuild the counts.
Can't understand this. In #2708411: [PP-1] editor.module's editor_file_reference filter not tracking file usage correctly for translations we have a patch with an update hook that updates the file usage values to correct numbers.
Or do you mean something else?

catch’s picture

That fixes it for one case, it doesn't rebuild it for all cases, and there's no API for this it's a one-off update.

saurabh.tripathi.cs’s picture

Hi Everyone,

####Update after some debugging####

Drupal Version used: 8.2.1
Lingotek module used:
Machine name: lingotek
Version: 8.x-1.12


1) Uploaded one image in existing content in a node's slice (read more about slices here ) via WYSIWYG editor, saved the node and than checked below tables:
1.1) file usage
1.2) file managed
Result: The usage count in file_usage table is set as 2, but it should be 1, as i am sure that we are using that image only in one node object.

2) Created a new content, added a new slice
added a new image to it, and saved the node.
Result: file usage count shows 8 instead of 1.

3) Applied patch 2708411-44.patch provided here and ran hook update.
now on adding new image in existing or new content
Result: file usage entry is not created for image.

Drupal Version used: 8.2.4 and 8.2.1
Lingotek module not used:

1)The upload behavior works fine, on uploading 1 image, i entry created in file_managed and usage count shows 1 in file_usage table,
2) On uploading same image in same node, 2nd entry created in file managed, usage count shows 2 but an extra record is created in file usage of same file with usage count 1. (is that normal?)
explanation: If we are not removing previous image and uploading same image again, than drupal should consider it as a new image, and it does also by creating a new fid in file managed table, but,
the usage count for the removed file should not be affected, instead it gets incremented by 1.
A new record for the new image is created in file_usage table with count 1, however see point 3 below.

3) There is one bug: If we edit the content and change any field or not and click on save, the file usage count in file_usage table increases by one against the file which is uploaded in that node.
Also the second record which is of previous image (because we uploaded the same image again in the node and not removed previous one.) the usage count increments by one.


  • Summary: Usage count increments by one on editing and saving the content only. This is because a new revision is created every time we edit and save the node, how ever if we remove a file from node (via WYSIWIG editor) than the usage count is not decremented.
  • With lingotek module also enabled, the usage count jumps directly to 8 even if we upload a single image in a new node.

#####How we checked ########
Copy the uuid of image from node edit page and search it in file_managed table.
Copy the fid of record and search it in file_usage table, count field displays the no of times the file is being used. (in my case value comes as 2 for existing content and 8 for new content)
Copy the id field value: This is the id of entity object which is using the file. This object can be a node, slice etc.

######Questions######

1) Will running the db_update provided here fix this?
2) How does garbage collection of drupal 8 works?
2.1) When a file has usage count as 0, does that file gets deleted along with their entry in file managed and usage tables?
2.2) Which condition is checked first: file_managed has entry or file_usage count is 0 before deleting a file.
2.3) What triggers the increment of file usage count, what code/file?

####Work-Around??###
If we don't get a fix on this soon we are planning to set up a separate cron which will run a script checking all files which are not in file managed table and are in file system and remove them, also the one which have file usage count 0 will be removed.

There i can add extra condition to crosscheck if a file in file managed table and recount object using this file.

We will un-check the check-box of default garbage collection of drupal in that case, i am not sure if that workaround is a good idea.

Please suggest how to proceed,
Thanks.

hitfactory’s picture

This bug is causing a lot of frustration for content editors on one of our multilingual sites.

I think the fundamental problem is that core makes the assumption that files not associated with entities are no longer required and should be deleted.

In our case, only administrators and content editors can upload files to the server to begin with. And our client expects that when they upload a file to the website (either via a field or via CKEditor etc), that the file will remain on the server regardless of any changes to the host entity. This is because they often link to a file or image on the website from email campaigns.

Can someone explain what the security issue is in regards to allowing unused files to remain on the server?

Berdir’s picture

There are two different versions of unused files. Those that were never used in the first place, e.g. because they were uploaded in (possibly public) forms and then never submitted or replaced before they were ever marked as used. You don't want to keep those, which is why completely disabling deleting temporary files is IMHO a bad idea.

The other case of files that were once used and no longer are is a very different scenario. The only problem there is that your disk could eventually get full if you have a site that replaces a lot of files and no longer uses old ones. That's what #2801777: Give users the option to prevent drupal from automatically marking unused files as temporary is about, it got stuck on the admin UI and texts there. I'd love some help there to move that forward.

And as also mentioned, media entity will completely change this as well, because a file in a media entity will always be used by itself, so media entities won't be deleted automatically, ever.

Leksat’s picture

Berdir’s picture

There is also #2708411: [PP-1] editor.module's editor_file_reference filter not tracking file usage correctly for translations for embedded images and then there's the whole problem of manually added links (for example to uploaded PDF files) and HTML in content that we are not tracking at all :)

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

catch’s picture

So I still think we need to stop using file usage to programmatically delete files at all. We could potentially retain it (at least in 8.x) as a usage estimate on the file admin pages. Theoretical cases like sites that regularly change files or delete lots of content are nothing compared to the actual cases of files getting deleted on real sites.

Leksat’s picture

There is a blog post from mail.ru - one of the most popular email services in Russia.
They use a file usage system similar to Drupal's one, but with additional protection for dealing with unexpected file deletions.
https://www.smashingmagazine.com/2017/01/reducing-email-storage-mailru/#...

Yet, they never delete all file usages at once. Each usage added or removed individually.
(I'm talking about the ability to call \Drupal\file\FileUsage\FileUsageInterface::delete() with $count == 0 which deletes all file usages.)

TrevorBradley’s picture

I just hit this independently and posted about block inline images going missing over in #2863829: Block inline images considered "orphaned", deleted?.

Is there any way to see which files are flagged for deletion, or put the system into a debug mode that says "I would have deleted this file if file deletion were turned on for reason X"?

Wim Leers’s picture

Issue summary: View changes

This was also discussed at Drupal Dev Days Seville — see the meeting notes.

@xjm asked me to get all of the files criticals going again. So I did. I did deep dives on every issue. I posted summaries of all discussion on each of the issues, updated their title, issue summaries, and so on.

This enabled me to get the overview across this cluster of about half a dozen critical issues. Turns out the entire space of issues… falls into clear buckets, along three axes:

  1. Field (image/file/entity reference) versus editor.module's editor_file_reference filter
  2. Entity revisions versus entity translations
  3. Data loss: yes or no

From this, I distilled a table that hopefully makes it very easy to find your way through these issues:

Short description Data loss? Priority Field or filter? Revision or translation? Issue Notes
Deleting entity with revisions + image/file field no Critical Field Revision #1239558: Deleting an entity with revisions and file/image field does not release file usage of non-default revisions, causing files to linger No data loss, but irrepairable, therefore still critical.
Changing default translation + image/file field YES Critical Field Translation #2810355: $entity->isDefaultTranslation() behaves incorrectly when changing default translation, causing file/image field usage to be set to zero, causing files to be deleted This was caused by the "fix" for another critical: #2787187: Data loss: Deleting a translation of an entity deletes all file_usage entries for files referenced by the entity
The critical #2666700: User profile images unexpectedly deleted was merged into this — it has the same root cause, but different steps to reproduce.
Generic entity reference fields YES Critical Field N/A #2826127: Usage of Files Referenced in Entity Reference Fields Not Tracked A missing feature. File usage support is only present on file and image fields, which subclass the generic entity reference field code.
Filter + translations YES Critical Filter Translation #2708411: [PP-1] editor.module's editor_file_reference filter not tracking file usage correctly for translations Was implemented using entity hooks before Translation API matured.
Filter + revisions + adding/removing images no Normal Filter Revision #2892368: editor.module's hook_entity_update() does not count file usages correctly when adding/removing images without creating new revisions The only "normal" priority issue here, because it cannot lead to data loss, and can be repaired
The mitigation for all the rows with YES in the Data loss? column N/A #2801777: Give users the option to prevent drupal from automatically marking unused files as temporary This is a mitigation for:
  • existing sites that desperately need a solution NOW
  • sites where it doesn't matter if files end up getting deleted or not (running out of disk space is only an issue on sites where there is a lot of editing that includes the need for replacing one file with another, without having an old revision still pointing to it)

The downside: it becomes difficult to delete "unused" files, which can be a security problem (e.g. accidentally uploaded PDF with sensitive data, now how to delete it?). That's less of a problem than files just disappearing right under you though. Worst case, the site administrator could be asked to remove the file from disk manually.

(This table is now also in the IS, but I'm also posting it in this comment so it can be easily compared later on, to track the evolution.)

Wim Leers’s picture

Regardless of all the sub-issues we have the known issue that this does not track usages added by content editors if they are able to add links to file but editing the source HTML.

We could easily make \Drupal\editor\Plugin\Filter\EditorFileReference reject any <img> tag that doesn't have the data-entity-type and data-entity-uuid attributes set, or if their values don't result in a successful entity_load(). That would prevent this. It'd also disallow external linking, but this could be a setting.

That fixes it for one case, it doesn't rebuild it for all cases, and there's no API for this it's a one-off update.

Can you give an example of which use case is absolutely impossible to recalculate?

Related: https://www.drupal.org/project/auditfiles might be helpful for some of the sites running into this problem space.

Wim Leers’s picture

Issue summary: View changes

I forgot that d.o's node preview was so terrible. Removed the Priority column so that more of the table is legible.

plach’s picture

Awesome work, Wim, that helps a lot! :)

Wim Leers’s picture

d.o's CSS still gets in the way, here's a screenshot with some HTML + CSS hackery to make the table more useful:

@plach: <3

Eli-T’s picture

@wim-leers the auditfiles module you linked to in #21 has no D8 version, at least findable from its d.o project page.

catch’s picture

Can you give an example of which use case is absolutely impossible to recalculate?

I mean it's impossible to recalculate the file_usage table as a whole, it relies on the owner of a specific implementation identifying and fixing a bug, then writing the code to do that in a one-off update. There's no way as a site-owner to run that code/debug it etc except possibly extracting it from an update (if that update even exists).

So I still feel like there are fundamental issues with the file_usage system that means we should focus efforts on a replacement rather than trying to fix it:

- the increment/decrement doesn't give you any idea where files are used, if you wanted to use that information to audit.
- any bug in file usage tracking can result in data-loss at any arbitrary later date (unless it's fixed and a correct upgrade path is written). Since the same file can be referenced in multiple different ways, a single module with a bad implementation can put any file on the site at risk.
- an upgrade path that attempts to rebuild file usage data will have to collect the data at the time it runs - that could could potentially be used by a different API that pulls the usages rather than using increment/decrement

That leads towards doing something like this:

1. Remove the removal of zero-usage files altogether
2. Deprecate the file_usage API
3. Allow manual deletion of files from the UI, show file_usage data as an interim indication.
4. Build a new API that allows you to obtain counts and individual usages of individual files on-demand (via references, text fields etc.) - obviously some of this information is more expensive to get than others. This could then be used by sites that want to audit for unused files.

So if we're able to rebuild the information for every instance of file usage, then we should be able to use that for an on-demand usage audit API that's more useful to people. If we're not, then we can't fix what we have anyway, and have to revert to something informational anyway.

ckaotik’s picture

4. Build a new API that allows you to obtain counts and individual usages of individual files on-demand (via references, text fields etc.) - obviously some of this information is more expensive to get than others. This could then be used by sites that want to audit for unused files.

We've recently built a (currently private) module for a client that collects information of entity references via various means (CKEditor entity_embed tokens, entity_reference fields, blocks in panels, ...). This might be of interest?
What we have noticed there though, is that it's really hard to get information for all kinds of references. For example, how to deal with custom forms, config entities etc. that don't use the Field API for storage? I think some kind of generic entity usage tracking API would be wonderful for this, so both core and contrib could integrate with it and we wouldn't need to keep different tracking solutions for different types/sources of references.

Wim Leers’s picture

I mean it's impossible to recalculate the file_usage table as a whole, it relies on the owner of a specific implementation identifying and fixing a bug, then writing the code to do that in a one-off update.

True. But that's achievable, at least for core. There aren't boatloads of things outside core dealing with files.

the increment/decrement doesn't give you any idea where files are used, if you wanted to use that information to audit.

Not true — /admin/content/files.

  1. Allow manual deletion of files from the UI, show file_usage data as an interim indication.

I'm curious to hear what Berdir thinks about this.

  1. Build a new API that allows you to obtain counts and individual usages of individual files on-demand (via references, text fields etc.) - obviously some of this information is more expensive to get than others. This could then be used by sites that want to audit for unused files.

I fear that on most sites this would be prohibitively expensive. Let alone on big sites. Let alone for files referenced via text fields.

catch’s picture

the increment/decrement doesn't give you any idea where files are used, if you wanted to use that information to audit.

Not true — /admin/content/files.

If I upload a file to a node, then I unpublish the node, how do I find out that the file is only referenced by one, unpublished, node at the moment?

I fear that on most sites this would be prohibitively expensive. Let alone on big sites. Let alone for files referenced via text fields.

Yes, so it might not be an option, but it'd still be better than deleting random files.

Also, we could look at 'files referenced by text fields' being paired to an administrative field on the entity, or tracking table (k/v or similar), so that it only has to be calculated on entity save.

dawehner’s picture

I would totally argue in favour of removing the file usage API, its complex and doesn't even the kind of things people do.
Not only that, but I also fear that the temporary feature itself is sort of broken:

Random example I debugged the other week. The client used the ckeditor file upload for pdfs, copied the link and then pasted it into a link field, which in their usecase seemed reasonable, as they also had external PDFs. Given that the file have never been tracked in the first place, as it was never part of a normal file upload field NOR the wysiwyg editor, still, the file was totally usable by the user, but then disappeared after a while. Maybe this is an edge case, but its something the client came up with for themselves.

jonathanshaw’s picture

Re #21/#25 The auditfiles module has no D8 version but file_checker does, and does some of the same things: it ensures that all file entities in Drupal have an actual file on server.

3. Allow manual deletion of files from the UI, show file_usage data as an interim indication.
4. Build a new API that allows you to obtain counts and individual usages of individual files on-demand (via references, text fields etc.) - obviously some of this information is more expensive to get than others. This could then be used by sites that want to audit for unused files.

how about this:
A: extend the current file usage API to tracks instances of actual use, at the time of use. If we record and delete usages individually, its less fragile than a simple integer increment/decrement. For that matter, the media library examples discussed here show there's a serious use case for a more generic entity use API - so that we can explore and consider deleting unused media.

B: create a second file usage API that interrogates all the places files (or entities?) are used and rebuilds the usages records.

C: allow users to run B from the file UI as an action for selected files.

D: run B automatically when a user wishes to delete a file from the UI. If a new previously unrecorded usage is discovered, warn the user on the deletion confirmation page. Have something similar for programmatic deletion of unused files (a "safeDelete" method that checks usage before deletion).

This way we can BOTH
1. Show people a view of unused files that has a greater chance of being right, and show it instantly without a hugely expensive calculation
2. Prevent people from inadvertently deleting files that have usage

Wim Leers’s picture

#2801777: Give users the option to prevent drupal from automatically marking unused files as temporary landed. Does that mean we downgrade all other "file usage" criticals listed in the IS from critical to major?


#29:

If I upload a file to a node, then I unpublish the node, how do I find out that the file is only referenced by one, unpublished, node at the moment?

Touché! Thanks for that insight, I hadn't realized that yet!
Well, actually … none of the file usage tracking is affected by whether a node is published or not? Published or not, file usage is tracked, and it shows up correctly at /admin/content/files.

#30: You can't upload PDFs in CKEditor unless with just core. You can if you install https://www.drupal.org/project/editor_file. It's then up to editor_entity_insert() & friends to detect the inserted file and mark it as permanent. If the file was uploaded, the URL copied, and then the HTML removed, then this hook will never be able to detect the inserted file and mark it as permanent. Your client was definitely using the system in a way it was not meant to be used. If they didn't remove the HTML, they must've run into one of the two "filter" rows in the IS.

catch’s picture

Well, actually … none of the file usage tracking is affected by whether a node is published or not? Published or not, file usage is tracked, and it shows up correctly at /admin/content/files.

This is exactly what I mean though.

The argument that when deleting a node, you expect the file to be deleted applies equally to unpublishing a node and expecting the file to be unpublished. In both cases, if the file stays on disk and accessible via the public file directory, it could be available when you don't expect it.

If we don't automatically delete files, but the file usage API is functioning perfectly otherwise, then you could look at a view of zero usage files, and delete the file manually when doing a file audit.

Where the file is attached to an unpublished node, it will show up with a usage of '1', but there's no way from the managed files page to know which node it's attached to and whether or not that node is published or not. We can change that situation though if we track usages one by one.

Wim Leers’s picture

I see, thanks for clarifying!

SKAUGHT’s picture

to add a thought (i can't find a more specific issue to address this toward)

assuming revision & workflow, referenced entities, and all the toys are counted correctly: perhaps at the point when a file is unused it is displayed under a new UI:
/admin/content/files/unused
to help, at least, visually filter this type of garbage collection.

as well toward other possible future forward tools (like a Trash mode) then files will still exist and can be reclaimed when un-trashed.

ckaotik’s picture

@skaught There is already a listing of all temporary files at /admin/content/files?status=0. Those are the files that will be deleted, if pruning is enabled.
The problem I see with reclaiming is that once the last file usage is removed, you cannot know where the file was referenced before. Keeping a "These are the files you recently orphaned" table would get huge very quickly. And even if you knew where the file was attached, the default upload widget does not allow to re-use an existing file.

For the suggestion of a way to delete unused files via UI, there is already a separate issue: #2648816: Uploaded files are impossible to replace.

Slightly off-topic: I don't know if this use case is already represented in #2821423-24: Dealing with unexpected file deletion due to incorrect file usage's table or if there is a separate issue already: When editing an existing entity with a file field and selecting "Remove", the file is immediately deleted from disk, even if the user later decides to abort the edit of the entity. I don't think this is related to file usage though, just wanted to mention it, as it seems like an unexpected file deletion, too.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Anonymous’s picture

Sorry, if this scenario has already been discussed (issues by unexpected file deletion a lot):

1. Install 8.5.x
2. Add to theme-settings.php additional field settings with image

function bartik_form_system_theme_settings_alter(&$form, &$form_state) {

  $form['super_image'] = array(
    '#type' => 'managed_file',
    '#title' => t('My super image.'),
    '#upload_location' => 'public://images/',
    '#default_value' => theme_get_setting('super_image'),
  );

}

3. upload image
4. check database tables:

  • file_usage -> none
  • file_managed -> status = 0

5. After 6 hours (by default), the file will be deleted.

All known to me сontrib themes use this way to storing files too. So the problem has quite a wide impact. If this has not been discussed yet, I can write tests for it. Also, as a solution, we can check the status of files inside calling theme_get_setting, and change it, if necessary.

Berdir’s picture

Whoever uploads a file *is* responsible for making a file permanent (and registering usages). This isn't a bug, this is how the API works. I'm sure it could be improved, e.g. a flag to autotomatically mark files as permanent but that would be a task/feature.

This is required to clean up files that are uploaded, e.g. on new nodes and then the node is never actually saved.

Anonymous’s picture

@Berdir, thank you for help with it! You right, I overlooked the code with set permanent and add to file.usage via hook_submit(). And yes, simplify work with api is task/feature. I will create an issue for this, if I do not find an existing one (maybe #618654: File and image fields are treated as temporary files and automatically deleted after six hours).

But I also noticed, that even if the user uses the Bartik, and changes the default logo to custom logo (via UI, not api), then it will be stored only 6 hours too. It looks questionable behavior. I think I already saw an issue like "Why drupal removes my custom logo," but I just can not find it :(

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

dsnopek’s picture

So I still feel like there are fundamental issues with the file_usage system that means we should focus efforts on a replacement rather than trying to fix it

There's references here and on other related issues to completely replacing the file_usage system. Is there an issue for this? I can't seem to find one linked on any of these issues or by searching. Thanks!

SKAUGHT’s picture

catch’s picture

@dsnopek #2835840: Track media usage and present it to the site builder (in the media library, media view, on media deletion confirmation, etc.) is probably the closest.

I've just marked #2244513: Move the unmanaged file APIs to the file_system service (file.inc) needs more info, since it's an OOP conversion rather than a replacement. edit: that was pointless because the issue isn't to do with file usage.

dsnopek’s picture

@catch Thanks!

mpp’s picture

Adding IMCE to the long list of projects that need to handle managed files.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

handkerchief’s picture

Any news on this?

Rewted’s picture

I've been wondering why deleted images were accumulating on the file system, only to discover this is the intended behavior in Drupal 8 now.

It appears all images have STATUS Permanent, but those USED IN 0 places are not removed. Is the solution to modify file.settings.yml, setting make_unused_managed_files_temporary to true?

Also, moving forward, how is the novice Drupal user expected to remove image from their site. This needs some TLC, no?

handkerchief’s picture

In my case:

Drupal 8.6.7

On /admin/content/files:

  • Column USED IN: 1 Places
  • Click on the link: This file is not currently used.

So the counter of used files is wrong in my case. Then the file status doesn't change, even tough I changed the make_unused_managed_files_temporary config to true.

cilefen’s picture

@Rewted We’ve published this in noticeable places such as https://www.drupal.org/project/drupal/releases/8.4.0-rc1 and this is an open, critical issue. Please help any way you can.

Rewted’s picture

@handkerchief I've experienced this too, clear your caches and it should report 0 places correctly.

@cilefen Yes, I am available for testing. Currently running Drupal 8.6.9, I've modified file.settings.yml, setting make_unused_managed_files_temporary to true, after 6 hours the image remains on the file system and has STATUS Permanent, USED IN 0 (no change). Expected result would be STATUS Temporary and then automatically deleted on cron run. Am I misunderstanding?

Berdir’s picture

> Am I misunderstanding?

Yes, a bit. The flag controls what what happens *when* a file reaches 0 usage, if enabled, it will switch the file to temporary and then it will get deleted. It will not affect any existing files that already are on 0 usages.

Rewted’s picture

@Berdir Okay, but prior to changing the flag to true, all existing images that were added with an Image Field on my Content Type were listed as: STATUS Permanent, USED IN 1 place. (No 0 places or Temporary images present)

I then tested by adding a new image, this results in: STATUS Permanent, USED IN 1 place. Minutes later I deleted this newly added image, this results in: STATUS Permanent, USED IN 0 places. Both of these add/delete actions occurred between cron runs.

Leaving the system overnight, 10+ hours later the image remains on the file system and has no change to STATUS Permanent, USED IN 0 (no change).

iancawthorne’s picture

@Rewted
It sounds from your test at #54 that make_unused_managed_files_temporary is not set to true.

If you've only set it in the yaml file and your site is running off the database, you would need to import that change for it to take effect.

Or maybe it is set to false in settings.php or setting.local.php, which would override it being set in the database.

If it's not in settings.php, try adding it:

$config['file.settings']['make_unused_managed_files_temporary'] = TRUE;

With make_unused_managed_files_temporary set to true, what should happen is:
1) 1 piece of content with a file associated and is the only occurence of it exists
2) Delete this 1 piece of content
3) Immediately on delete action, the file should be used on zero places and marked as temporary (there is no delay)
4) When cron next runs, assuming the temp max age has been hit and there aren't a backlog of other files to delete, the file in question should be deleted and no longer listed in file_managed table and file administration page.

You can override the time before a temporary file can be removed in settings.php too.
30 seconds for example to test with:

$config['system.file']['temporary_maximum_age'] = 30;

I can confirm this process does work when make_unused_managed_files_temporary is set to true.

Rewted’s picture

@iancawthorne Thank you for that explanation (#55). Makes sense now!

handkerchief’s picture

@Rewted The cache is not the problem at #50. But maybe this is a problem with the File Entity modules?
https://www.drupal.org/project/file_entity

The counter is always increased by one. So if the file is used in 1 place, the counter says 2 places. If the file is used in 0 place, the counter says 1 place.

iancawthorne’s picture

@handkerchief admin/content/files (and the counter column) is provided by core, not file_entity as far as I'm aware.

handkerchief’s picture

@iancawthorne yes i think that too. But maybe the module somehow interferes with this process. It's just a note value that I'm using this module in this project.

Deas.h’s picture

@iancawthorne - I changed it in file.settings.yml from false to true and added the line to settings.php. I am still not able to change my files from permanent to temporary.

But why must this be so complicated? I have no problem with NOT deleting files automatically. But why is there no checkbox next to each file and a "Delete" button? That would solve the problem I think...

iancawthorne’s picture

@Deas.h Updating the yml file wouldn't have any effect unless you then imported it into the database (drush cim or syncronize in the config admin). Updating settings.php with the line in #55 should do the trick on its own though.

Are you sure the file is used in zero places? Going from one to zero places used is the trigger to mark a file as temporary. The line at post #55 has to be in place at the point the trigger happens. If it is added retrospectively, it will only work on future files that go from one to zero used places.

I agree though, a delete option like you can other entities (like nodes) would be ideal for any files used in zero places.

Rewted’s picture

@iancawthorne As suggested in #60, Would it be a lot of work to implement a "delete button" (manual override) into the next point release as an interim solution while the DEVs work out the long term logistics?

queenielow’s picture

Hi @iancawthorne

I am following your steps in comment #55, it works for the existed media(file in use) being deleted and the file will be marked as Temporary.
What happens to those existed files that are already 0 files in use, they are not marked as Temporary when the cron is run?

Just like what has been suggested in #60, is there a chance you can create a manually delete button for those files?

iancawthorne’s picture

@queenielow That's correct. Files that are already zero use and not marked as temporary will be forever that way. This is where a manual delete option would be useful. For now in that scenario you can identify the fid in the file_managed table of the database and remove the row. Not ideal.

queenielow’s picture

Hi @iancawthorne,

I've tried that method(remove it from file_managed) but it doesn't remove the physical file tho'.
Is there something else needs happening?

Thanks,
Queenie

iancawthorne’s picture

@queenielow Sorry, my mistake, I'd assumed the file was already gone, and you just had the record in the database to remove.

If you want to remove the file too, instead of deleting the row in the database, you would change the status column in the file_managed table from 1 to 0. That changes it from permanent to temporary. Then run cron (after the 'temporary_maximum_age' has elapsed) and both the file and database record will be deleted.

Rewted’s picture

@iancawthorne Your suggestion in #55 worked perfectly. Cleaned-up all old images, though I have a few straggler (three) images marked as Status: Permanent, Used In: 0 places, is it safe to simply delete them from the file system, or are they referenced in the database too?

iancawthorne’s picture

@Rewted If they're listed as status permanent that means they're in the database in the "file_managed" table. Identify the fid and change the status from 1 to 0 in the database table, run cron and you should be all good = the file and db row should be removed.

astringer’s picture

I have a site in development for a client that may sometimes receive a DMCA takedown notice. If legitimate, the client would need to completely delete the file off of their server and out of the database.

So while we understand the current fix is temporary, we need to find a solution for removing files that is easy and do-able by an editor or sitebuilder.

Setting make_unused_managed_files_temporary config to true seems risky to me because of the potential data loss issues.

So I simply added the ability to delete files on /admin/content/files via Views Bulk Operations. Assuming editors are properly trained about when and how to use this feature -- DOES ANYONE SEE POTENTIAL PROBLEMS WITH THIS SOLUTION?

Thanks.

oriol_e9g’s picture

@astringer IMO this is a needed workarround at this moment, but needs to be done in a separate issue.

1. First add a new permission for force deleting files manually.
2. Add a new bulk operation to force deletion (file (if exists) and database (is file is deleted or not exists)).

But I'm not sure if this is core material or contrib.

waverate’s picture

I think the settings at comment #55 only apply to new files that are uploaded and will make the site work as you expect in the future. However I don’t believe it will change the status of the current unused files to temporary.

If you are certain that the file listing with 0 places is accurate, you could run the following SQL query to set them to to temporary:

UPDATE
  file_managed AS b1,
  (
    SELECT
      DISTINCT file_managed.fid AS file_managed_fid,
      file_managed.filename AS file_managed_filename,
      file_managed.status AS file_managed_status,
      SUM(file_usage_file_managed.count) AS file_usage_file_managed_count
    FROM
      file_managed
      LEFT JOIN file_usage file_usage_file_managed ON file_managed.fid = file_usage_file_managed.fid
    GROUP BY
      file_managed.fid
    HAVING
      (COUNT(file_usage_file_managed.count) = '0')
  ) AS b2
SET
  b1.status = 0
WHERE
  b1.fid = b2.file_managed_fid;

The will then be removed during subsequent cron.

You can see the query in action in SQL Fiddle: http://sqlfiddle.com/#!9/d78ba0/1/0

lukasss’s picture

Version: 8.6.x-dev » 8.8.x-dev
acbramley’s picture

Joachim Namyslo’s picture

Oh Boy.

Many many community members are evolved in this. But the bug and the corresponding ones moved from 8.2 to 8.7 Sorry for the question but may it be possible that I could delete files that duplicated and used nowhere on the site. Without changing the config of the file module and mark all these files as temporary in DB manually in drupal 8.8, again by using the UI

We will get improved media Management again. Wich is totaly fine for me. What we do not have is a simple method to delete database entries and files used nowhere in drupal anymore. A permanent temporary mechanism is a great tool if it works. But wouldn't it be easier to provide a tool to delete duplicated files manually within the files table or something like that

instead of announcing great media updates again and again and leave less advanced users totaly alone, when they like to clean up their servers disk space.

Don't take me wrong i am not bothering. I am just asking if there is any quick fix for this issue until it is fixed completely.

marcoscano’s picture

#75 you might be interested in #2936175: Add delete button to admin/content/file (and related)

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

jnicola’s picture

Here's code for an update hook implementing the SQL query above, with some modifications to make it work in Drupal:

/**
 * Move all managed unused files to temporary status.
 *
 * NOTE: Add your module name below, and change "N" to 8001 or above as needed.
 *
 * Implements hook_update_n().
 */
function YOUR_MODULE_NAME_update_N() {
  $query_force_temp_on_unused_managed_files = "
  UPDATE
    {file_managed} AS b1,
    (
      SELECT
        DISTINCT {file_managed}.fid AS file_managed_fid
      FROM
        {file_managed}
      LEFT JOIN
        {file_usage} file_usage_file_managed
        ON {file_managed}.fid = {file_usage_file_managed}.fid
      GROUP BY
        {file_managed}.fid
      HAVING
        (COUNT(file_usage_file_managed.count) = 0)
    ) AS b2
  SET
    b1.status = 0
  WHERE
    b1.fid = b2.file_managed_fid;";

  $database = \Drupal::database();
  $query = $database->query($query_force_temp_on_unused_managed_files);

  // Attempt to execute query, throw error if it failed.
  if ($query->execute()) {
    return t('Successfully flagged all file entities with no value in table file_usage for removal after 6 hours on cron run.');
  }
  else {
    throw new UpdateException('FUNCTION_NAME_HERE Query failed to execute.');
  }
}

joseph.olstad’s picture

I encountered unexpected file deletion due to incorrect file usage when relying on the file_usage data in a cleanup job.

There is a patch that has tests and is passing tests.
see #2810355-61: $entity->isDefaultTranslation() behaves incorrectly when changing default translation, causing file/image field usage to be set to zero, causing files to be deleted

Appreciate community and core maintainers assistance to get this patch to where it needs to be in order for it to be accepted.
Thanks,

jungle’s picture

FYI, Just proposed an approach #3123244: Instead of relying on file usage, tracking dependency to determine the deletion of files.

Maybe they are two different concerns, this one looks like going to stick with file usage for a while, 3123244 is trending to let file usage does its primary duty -- tracking file usages only, and let another approach in to deal with file deletion. So I filed a new issue.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Andrew Answer’s picture

Hello, I found another case for file_usage: when I upload custom logo to theme, it appears in files list, but it's usage is 0 and it marked as "temporary". Why?

Chris Matthews’s picture

The File Delete UI is an elegant contrib solution for this issue.

joseph.olstad’s picture

Thanks for the heads up @Chris Matthews!

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Paul Dudink’s picture

I have been looking at the core code that handles the file usage count and IMO it is prone to fail. It seems to rely on the file_usage.count field as a single source of truth, which means that if at any point a usage has not been counted, or a usage has been counted twice, from that moment the count will be incorrect forever.

Currently I am implementing the entity_usage contrib module in a website. In my opinion this module implements tracking in the correct way. It is also highly configurable and if - at any point in time - you believe that the counts are off you can let the module re-index.

In my opinion the core code should be ditched and this module should be implemented in the core instead.

(BTW I must admit that this issue was too long to read completely for me, but I couldn't find any references to the entity_usage module yet. So I believe this may be an important addition to consider)

**Edit:**
I am currently investigating whether it would work to:
1. Patch core to disable the core tracking functionality
2. Add a submodule containing an event-listener for entity_usage's "Events::USAGE_REGISTER" event. The event-listener will check for events triggered for "files" only and then update the file_usage table.

This investigation may take a while, because I think it should be tested on several websites and should be tested in a safe way, because it can lead to potential file loss (ofcourse).

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

xjm’s picture

wadley0’s picture

Additional problem case for not deleting files (similar to #69):

1 - Content editor adds "File A"
2 - Google indexes "File A"
3 - Content editor removes "File A", adds updated "File B"
4 - Public still finds outdated "File A" (via Google... it is still available on the server)
5 - Content editor has no understanding why the file they removed is still being found... nor should they be expected to go through a complex file management process to delete it, they already removed the file.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

maskedjellybean’s picture

I think @Paul Dudink had a good suggestion in #87 to look into the entity_usage module. If all this is caused by file_usage counts being wrong, shouldn't we look at how a contrib module is doing the same thing and see if it's better?

I'm also wondering has anyone has tried media_files_handler?

For those of us who need deleted files to actually be deleted, what is the best option? Just enable $config['file.settings']['make_unused_managed_files_temporary'] = TRUE; and cross our fingers?

Berdir’s picture

For those of us who need deleted files to actually be deleted, what is the best option? Just enable $config['file.settings']['make_unused_managed_files_temporary'] = TRUE; and cross our fingers?

Not cross your fingers, but actively verify that it works for your site and your use cases.

That this setting even exists is a stopgap workaround, to remove it again, we need to fix the bugs around file usage and we need people to test it and report issues they find, if any. FWIW, #2810355: $entity->isDefaultTranslation() behaves incorrectly when changing default translation, causing file/image field usage to be set to zero, causing files to be deleted is the only one that I'm aware of, we have that setting enabled on quite a lot of production sites and it's working well for us.

If you have a multilingual site, you definitely want to use that.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

hanoii’s picture

I started researching what's going on around this issue. I will likely use 'make_unused_managed_files_temporary' for now, but I was wondering if a module that (as opposed to most of the modules out there that tries to add some smarter capability) that adds a setting to each file field so that you can manually set which file file flags a file for deletion when the entity is removed.

catch’s picture