Problem/Motivation

HttpFetcher stores downloaded feeds in temporary://. file_cron() deletes old files from temporary:// but only managed files (with status != permanent). The Feeds files will stay forever unless they are deleted externally (e.g. server reboot?)

Proposed resolution

Delete feeds files during a cleanup phase in cron?

Remaining tasks

User interface changes

API changes

Data model changes

Issue fork feeds-2778373

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Arla created an issue. See original summary.

Berdir’s picture

Category: Feature request » Task

Non-permament files and (unmanaged) files in the temporary:// stream wrapper have little to nothing in common. Unlike the first, you can't really purge unmanaged files later on, as you don't know their file names, they are randomly generated.

Instead, the module should attempt to automatically delete them when the import finished. The easiest method seems to be to implement a destruct method that would delete the file. Possibly combined with a flag that can be set in the constructor to make it clear that is indeed a temporary file. Not sure if that's always the case.

I also think this is a task or maybe even a bug. We had hundreds of those files in our temp folder.

bkosborne’s picture

Wonder if this is still relevant after #2912130: Missing temporary files in load balanced environments gets resolved. We won't use temp file storage anymore after that, instead will use public/private file system. Though I imagine these files still need to be cleaned up after that. I think this should still be blocked on that issue.

MegaChriz’s picture

It depends on how exactly #2912130: Missing temporary files in load balanced environments is resolved, but if it is resolved using the proposed resolution there, then this issue will still be relevant after that. When an import is abruptly aborted for example, then files would remain on the public/private file system.

scottop’s picture

So is this still not fixed? I am having this problem as well, using version: '8.x-3.0-alpha9'. My tmp directory is full of feeds files. But they are all called feeds*, so could you just delete those files when you finish?

Bohus Ulrych’s picture

Hi all,
today I realized that my /tmp folder is as well full of feeds_hhtp_fetcher* files. On the small VM (with small disk space) it could become problem very quickly.
Would it be possible to implement such auto-clean up? I believe this is not big deal implement this.
This ticket was reported in Aug 2016, related in #2912130 in Sept 2017 - both still not solved. But to have this one done would be big help at least for the 9 users following this issue (and probably some others too).
Workarounds like files deletion using e.g. crontab works, but ...
Thanks

MegaChriz’s picture

@Bohus Ulrych
Yes, this is still planned to be fixed. Since Feeds is mostly volunteer work I can only fix a few issues a month. Last year the focus had been on multilingual support and on D9 support. Now the focus is on fixing the beta blockers, from which there is only one left: #2938505: Add UI for editing and removing custom (CSV) sources (though it may be broken down into a few smaller issues).

This issue is meant to get addressed in the next series: the "Stable release blockers".

cosolom made their first commit to this issue’s fork.

cosolom’s picture

Status: Active » Needs review

I added code for move downloaded file as temporary to a files managed table. Now all downloded files must be purged after system.file.temporary_maximum_age (default 6h)

Odai Atieh’s picture

The saved URI was wrong, so I fixed the file URI.

Status: Needs review » Needs work

The last submitted patch, 11: 2778373-11.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

epapaniko’s picture

Modified patch based on #10 and #11 that takes into account that the temp directory might be relative to the Drupal root.

cosolom’s picture

Status: Needs work » Needs review
Qusai Taha’s picture

Re-roll the patch to be compatible with feeds version 8.x-3.0-alpha 11

Status: Needs review » Needs work

The last submitted patch, 15: 2778373-15.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

bkosborne’s picture

The proposed patch creates a proper File entity for the file (making it a managed file) and marks it temporary. If a site is configured to auto delete temporary files, it will delete them after some defined period of time via cron. This is probably not the best solution. What guarantee do we have that Feeds is still not using the file? Isn't the purpose of the file so that Feeds can reference it during long running imports spread across multiple cron runs?

darkodev’s picture

I agree with @bkosborne that making these managed tmp files risks clobbering files that are not fully processed by Feeds queue.

As @berdir said (6 years ago),

"Instead, the module should attempt to automatically delete them when the import finished. The easiest method seems to be to implement a destruct method that would delete the file. Possibly combined with a flag that can be set in the constructor to make it clear that is indeed a temporary file. Not sure if that's always the case."

MegaChriz’s picture

I've implemented a possible solution for this as part of an other issue in #2912130: Missing temporary files in load balanced environments. In there a cleanUp() method is added to the fetcher result which is called after finishing an import. It will clean up all files in "private://feeds/in_progress/[Feed ID]" as well when unlocking a feed.

MegaChriz’s picture

Status: Needs work » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.