Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
HttpFetcher stores downloaded feeds in temporary://. file_cron() deletes old files from temporary:// but only managed files (with status != permanent). The Feeds files will stay forever unless they are deleted externally (e.g. server reboot?)
Proposed resolution
Delete feeds files during a cleanup phase in cron?
Remaining tasks
User interface changes
API changes
Data model changes
Comment | File | Size | Author |
---|---|---|---|
#15 | 2778373-15.patch | 6.69 KB | Qusai Taha |
#13 | 2778373-12.patch | 6.72 KB | epapaniko |
#11 | 2778373-11.patch | 6.81 KB | Odai Atieh |
Issue fork feeds-2778373
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
BerdirNon-permament files and (unmanaged) files in the temporary:// stream wrapper have little to nothing in common. Unlike the first, you can't really purge unmanaged files later on, as you don't know their file names, they are randomly generated.
Instead, the module should attempt to automatically delete them when the import finished. The easiest method seems to be to implement a destruct method that would delete the file. Possibly combined with a flag that can be set in the constructor to make it clear that is indeed a temporary file. Not sure if that's always the case.
I also think this is a task or maybe even a bug. We had hundreds of those files in our temp folder.
Comment #3
bkosborneWonder if this is still relevant after #2912130: Missing temporary files in load balanced environments gets resolved. We won't use temp file storage anymore after that, instead will use public/private file system. Though I imagine these files still need to be cleaned up after that. I think this should still be blocked on that issue.
Comment #4
MegaChriz CreditAttribution: MegaChriz as a volunteer commentedIt depends on how exactly #2912130: Missing temporary files in load balanced environments is resolved, but if it is resolved using the proposed resolution there, then this issue will still be relevant after that. When an import is abruptly aborted for example, then files would remain on the public/private file system.
Comment #5
scottop CreditAttribution: scottop as a volunteer commentedSo is this still not fixed? I am having this problem as well, using version: '8.x-3.0-alpha9'. My tmp directory is full of feeds files. But they are all called feeds*, so could you just delete those files when you finish?
Comment #6
Bohus UlrychHi all,
today I realized that my /tmp folder is as well full of feeds_hhtp_fetcher* files. On the small VM (with small disk space) it could become problem very quickly.
Would it be possible to implement such auto-clean up? I believe this is not big deal implement this.
This ticket was reported in Aug 2016, related in #2912130 in Sept 2017 - both still not solved. But to have this one done would be big help at least for the 9 users following this issue (and probably some others too).
Workarounds like files deletion using e.g. crontab works, but ...
Thanks
Comment #7
MegaChriz CreditAttribution: MegaChriz at WebCoo commented@Bohus Ulrych
Yes, this is still planned to be fixed. Since Feeds is mostly volunteer work I can only fix a few issues a month. Last year the focus had been on multilingual support and on D9 support. Now the focus is on fixing the beta blockers, from which there is only one left: #2938505: Add UI for editing and removing custom (CSV) sources (though it may be broken down into a few smaller issues).
This issue is meant to get addressed in the next series: the "Stable release blockers".
Comment #10
cosolom CreditAttribution: cosolom commentedI added code for move downloaded file as temporary to a files managed table. Now all downloded files must be purged after system.file.temporary_maximum_age (default 6h)
Comment #11
Odai AtiehThe saved URI was wrong, so I fixed the file URI.
Comment #13
epapaniko CreditAttribution: epapaniko at EWORX S.A. commentedModified patch based on #10 and #11 that takes into account that the temp directory might be relative to the Drupal root.
Comment #14
cosolom CreditAttribution: cosolom commentedComment #15
Qusai Taha CreditAttribution: Qusai Taha at Vardot commentedRe-roll the patch to be compatible with feeds version 8.x-3.0-alpha 11
Comment #17
bkosborneThe proposed patch creates a proper File entity for the file (making it a managed file) and marks it temporary. If a site is configured to auto delete temporary files, it will delete them after some defined period of time via cron. This is probably not the best solution. What guarantee do we have that Feeds is still not using the file? Isn't the purpose of the file so that Feeds can reference it during long running imports spread across multiple cron runs?
Comment #18
darkodev CreditAttribution: darkodev commentedI agree with @bkosborne that making these managed tmp files risks clobbering files that are not fully processed by Feeds queue.
As @berdir said (6 years ago),
"Instead, the module should attempt to automatically delete them when the import finished. The easiest method seems to be to implement a destruct method that would delete the file. Possibly combined with a flag that can be set in the constructor to make it clear that is indeed a temporary file. Not sure if that's always the case."
Comment #19
MegaChriz CreditAttribution: MegaChriz at WebCoo commentedI've implemented a possible solution for this as part of an other issue in #2912130: Missing temporary files in load balanced environments. In there a
cleanUp()
method is added to the fetcher result which is called after finishing an import. It will clean up all files in "private://feeds/in_progress/[Feed ID]" as well when unlocking a feed.Comment #20
MegaChriz CreditAttribution: MegaChriz at WebCoo commentedThis should be fixed as part of #2912130: Missing temporary files in load balanced environments.