Problem/Motivation

Managed files entities are not deleted when flush external images, which creates 404s for the images files used elsewhere.

Because imagecache_external_flush_cache() uses file_unmanaged_delete_recursive() to delete files when flushing external images, it removes the files of managed file entities.

Steps To Reproduce

  • Configure Imagecache External to use Managed Files option at path /admin/config/media/imagecache_external
  • Use Imagecache External as normal to fetch and save external images locally
  • Go to path /admin/content/files to view managed files
  • Confirm external images are now managed files
  • Flush external images at path /admin/config/media/imagecache_external/flush
  • Review managed files at path /admin/content/files
  • Confirm external images are still listed as managed files
  • Try to view managed file, observe 404; file has been deleted by flush process

Proposed resolution

Add flushing strategy:

  • Only allow external image flushing for managed files with no usage. In this case, directories will not be deleted either
  • Always delete, with a big warning that broken images might be seen on the site.

User interface changes

Flush strategy option, visible when seletecting the 'Managed' strategy.

API changes

None.

Data model changes

None

CommentFileSizeAuthor
#9 2990703-9.patch16.68 KBswentel

Comments

jasonawant created an issue. See original summary.

jasonawant’s picture

This is also a problem for the 7.x version.

I'm not quite sure what the solution should be as the managed vs. nonmanaged option can be switched. Meaning, I could use the managed file option, which would create file entities; then switch to non-managed option and flush external images causing data loss.

I think the only way to safely flush managed files would be to track managed files created by Imagecache External via FileUsageBase. For example, when saving the file data with file_save_data() in imagecache_external_fetch(), add file usage of imagecache_external via FileUsageBase::add.

Then, when flushing external images, we would need to use a batch process to try to remove usage via FileUsageBase::delete, and let Drupal remove the file if no longer used.

Because a user can switch between managed and nonmanaged file options, this process would need to be run every time flushing external caches regardless of the current set option.

Because of the work involved, we may want to consider a quicker solution to prevent data loss: either removing this feature or adding warnings to the flush external images page.

BarisW’s picture

Hi Jason,

you are completely right, thanks for creating such a good issue report and add the steps to reproduce. Very helpful.

Would it be an idea to add 'managed' or 'unmanaged' in the path when storing the image? So instead of saving images in externals/filename.jpg it would be externals/managed/filename.jpg.

In that case, we can check for the path when deleting images and only delete unmanaged files.

jasonawant’s picture

Hi,

Regarding,

Would it be an idea to add 'managed' or 'unmanaged' in the path when storing the image? So instead of saving images in externals/filename.jpg it would be externals/managed/filename.jpg.

In that case, we can check for the path when deleting images and only delete unmanaged files.

Is this a solution to manage when a user switched from managed to non-managed option? That may work. I think we still want to avoid orphaned image files in managed directory. We would still want to be able to safely flush/delete both managed and non-managed files.

I was proposing to inform Drupal that imagecache_external is using the managed file by using FileUsageBase::add to add file usage of imagecache_external. Then, when flushing external images, we would need to use a batch process to try to remove usage via FileUsageBase::delete, and let Drupal remove the file if no longer used. We would let Drupal delete un-used files as it normally does with managed files that no longer have a usage.

Because a user can switch from managed to non-managed and vice versa, flushing external images would need to try to flush both managed and non-managed every time.

BarisW’s picture

So, added usage info makes total sense for managed files. But how can we delete the unmanaged files without accidentally deleting the managed files?

jasonawant’s picture

Not sure, I'd have to review the code to know the current logic. Why does this module provide both options, managed and non-managed?

If both types are treated differently, the option to switch types becomes the challenge. For example, when switching form non-managed to managed, the non-managed don't become managed do they?

BarisW’s picture

There are valid use cases for both managed and unmanaged. On one site, I have thousands and thousands of external images - performance wise I use the unmanaged filesystem. On another site, external images are using the managed file system so editors can re-use the images on several places.

So, I think we can fix this by:

  1. Adding the filesystem as path so that we can check the uri to determine if we're dealing with managed or unmanaged files
  2. Implement FileUsageBase to track and delete managed files

Agreed?

swentel’s picture

Version: 8.x-1.x-dev » 3.0.x-dev

Moving to 3.0.x branch.

I agree this is a critical bug when using managed files.

Few things that come to mind:

  • prevent switching to unmanged when there are managed files
  • prevent changing the externals directory (related to unmanaged as well) when there are (un)managed files
  • allow flushing, but only if the managed files has no usages (normally core will tell us if the file is used somewhere else)
  • redownload the file when called - seems less hard then maintaining usage ourselves, but I could be totally wrong of course. Also, since the 'call' is not going through the api of the module, we'd have to hook in somewhere else, that might trigger overhead again of course, but it's worth checking out I think
  • add tests - I've added a dedicated issue already for that for the existing ones, so we can add new ones for this and other issues.
swentel’s picture

Status: Active » Needs review
StatusFileSize
new16.68 KB

Ok, here's a patch that introduces a flushing strategy when 'managed' storage is used:

  • Only delete if file has no usage
  • Always delete

The default is the first option, which means that files will be deleted from the server and db when the file has no usages.

The second option always deletes everything, including the directories (which does not happy for the first strategy) in which case broken images may occur on the site, but the user knows that may happen (there's a warning in the description of the option).

These options are also hidden in case the unmanaged strategy is used.

Even though it comes with tests, this patch can use some serious reviews and double testing from people :)

swentel’s picture

Issue summary: View changes
swentel’s picture

Issue summary: View changes
swentel’s picture

Priority: Critical » Major

Moving to major as most people use the unmanaged strategy.