Problem/Motivation
Managed files entities are not deleted when flush external images, which creates 404s for the images files used elsewhere.
Because imagecache_external_flush_cache() uses file_unmanaged_delete_recursive() to delete files when flushing external images, it removes the files of managed file entities.
Steps To Reproduce
- Configure Imagecache External to use Managed Files option at path /admin/config/media/imagecache_external
- Use Imagecache External as normal to fetch and save external images locally
- Go to path /admin/content/files to view managed files
- Confirm external images are now managed files
- Flush external images at path /admin/config/media/imagecache_external/flush
- Review managed files at path /admin/content/files
- Confirm external images are still listed as managed files
- Try to view managed file, observe 404; file has been deleted by flush process
Proposed resolution
Add flushing strategy:
- Only allow external image flushing for managed files with no usage. In this case, directories will not be deleted either
- Always delete, with a big warning that broken images might be seen on the site.
User interface changes
Flush strategy option, visible when seletecting the 'Managed' strategy.
API changes
None.
Data model changes
None
| Comment | File | Size | Author |
|---|---|---|---|
| #9 | 2990703-9.patch | 16.68 KB | swentel |
Comments
Comment #2
jasonawantThis is also a problem for the 7.x version.
I'm not quite sure what the solution should be as the managed vs. nonmanaged option can be switched. Meaning, I could use the managed file option, which would create file entities; then switch to non-managed option and flush external images causing data loss.
I think the only way to safely flush managed files would be to track managed files created by Imagecache External via FileUsageBase. For example, when saving the file data with file_save_data() in imagecache_external_fetch(), add file usage of imagecache_external via FileUsageBase::add.
Then, when flushing external images, we would need to use a batch process to try to remove usage via FileUsageBase::delete, and let Drupal remove the file if no longer used.
Because a user can switch between managed and nonmanaged file options, this process would need to be run every time flushing external caches regardless of the current set option.
Because of the work involved, we may want to consider a quicker solution to prevent data loss: either removing this feature or adding warnings to the flush external images page.
Comment #3
BarisW commentedHi Jason,
you are completely right, thanks for creating such a good issue report and add the steps to reproduce. Very helpful.
Would it be an idea to add 'managed' or 'unmanaged' in the path when storing the image? So instead of saving images in externals/filename.jpg it would be externals/managed/filename.jpg.
In that case, we can check for the path when deleting images and only delete unmanaged files.
Comment #4
jasonawantHi,
Regarding,
Is this a solution to manage when a user switched from managed to non-managed option? That may work. I think we still want to avoid orphaned image files in managed directory. We would still want to be able to safely flush/delete both managed and non-managed files.
I was proposing to inform Drupal that imagecache_external is using the managed file by using FileUsageBase::add to add file usage of imagecache_external. Then, when flushing external images, we would need to use a batch process to try to remove usage via FileUsageBase::delete, and let Drupal remove the file if no longer used. We would let Drupal delete un-used files as it normally does with managed files that no longer have a usage.
Because a user can switch from managed to non-managed and vice versa, flushing external images would need to try to flush both managed and non-managed every time.
Comment #5
BarisW commentedSo, added usage info makes total sense for managed files. But how can we delete the unmanaged files without accidentally deleting the managed files?
Comment #6
jasonawantNot sure, I'd have to review the code to know the current logic. Why does this module provide both options, managed and non-managed?
If both types are treated differently, the option to switch types becomes the challenge. For example, when switching form non-managed to managed, the non-managed don't become managed do they?
Comment #7
BarisW commentedThere are valid use cases for both managed and unmanaged. On one site, I have thousands and thousands of external images - performance wise I use the unmanaged filesystem. On another site, external images are using the managed file system so editors can re-use the images on several places.
So, I think we can fix this by:
Agreed?
Comment #8
swentel commentedMoving to 3.0.x branch.
I agree this is a critical bug when using managed files.
Few things that come to mind:
Comment #9
swentel commentedOk, here's a patch that introduces a flushing strategy when 'managed' storage is used:
The default is the first option, which means that files will be deleted from the server and db when the file has no usages.
The second option always deletes everything, including the directories (which does not happy for the first strategy) in which case broken images may occur on the site, but the user knows that may happen (there's a warning in the description of the option).
These options are also hidden in case the unmanaged strategy is used.
Even though it comes with tests, this patch can use some serious reviews and double testing from people :)
Comment #10
swentel commentedComment #11
swentel commentedComment #12
swentel commentedMoving to major as most people use the unmanaged strategy.