Problem/Motivation

When an import needs multiple cron runs to complete and the HTTP Fetcher is used, Feeds uses the temporary directory to temporarily store the file that is being imported.

On some servers, the default configured temporary directory is not static: it can change per request. This can, for example, be the case on Acquia and Pantheon. This is problematic because if the temporary directory changes between two import cycles, Feeds loses track of the file being imported and therefore cannot finish the import.

Proposed resolution

Do not use the temporary directory to temporarily store a file to import; use private://feeds/in_progress instead. The D7 version of Feeds already does this.

The first step, however, is adding test coverage for importing data using multiple calls/requests. Right now, there is only test coverage for importing data in one go. There are already tests written for this case for the D7 version of Feeds, so it is probably a good idea to port these over. These tests can be found in the class 'FeedsFileHTTPTestCase'. The following methods from that class would be the most interesting to port over:

  1. ::setUpMultipleCronRuns()
  2. ::getInProgressFile()
  3. ::testImportSourceWithMultipleCronRuns() - ported in #3044983: Add test coverage for importing using multiple cron runs
  4. ::testAbortImportWhenTemporaryFileIsDeleted()

Additionally, no warning seems to be logged when a file that is in the process of being imported suddenly goes missing.

Workaround

A workaround is to set a different path for the temporary directory, one that is static. In settings.php:
$config['system.file']['path']['temporary'] = '/your/temporary/directory';
Be sure to change '/your/temporary/directory' to the directory of your choice.
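
On Drupal 8.8 and later the temporary directory is a setting instead of configuration (see the comments about $settings['file_temp_path'] further down), so there the equivalent would be, for example:

// Drupal 8.8+: the temporary directory is a setting, not config.
// '/your/temporary/directory' is a placeholder; point it at a static path that
// is shared between all web containers and not accessible over the web.
$settings['file_temp_path'] = '/your/temporary/directory';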

Remaining tasks

  1. Write test coverage for importing data that needs multiple cron runs to complete.
  2. Write test coverage for aborting an import because of a missing temporary file.
  3. Log a warning when the file that is in the process of being imported no longer exists.
  4. Move the temporary file to private://feeds/in_progress.
  5. Write test coverage for removing the temporary file after import (this task could also be handled in #2778373: Auto-purge old temporary feed downloads).
  6. Add a setting for the in progress directory called "in_progress_dir". Make it part of a config object called "feeds.settings" (see the sketch after this list).
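
For illustration only, code reading that setting could fall back to the proposed default roughly like this (a sketch, not the final implementation):

// Sketch: read the configurable "in progress" directory and fall back to the
// proposed default when the setting is empty.
$directory = \Drupal::config('feeds.settings')->get('in_progress_dir')
  ?: 'private://feeds/in_progress';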

Original report by bburg

I'm running several sites using the Pantheon platform on an elite plan. These sites use feeds to import content from a central site. They also run in a load-balanced set up, so multiple instances of Drupal running on separate servers with a shared database.

The problem is that on cron, the Feeds module will download the feed source (an XML document from the central site) and store it in the temporary directory. It appears that the parser isn't triggered immediately, but is instead added to the queue. The queue entry references the temporary file by its full path. On Pantheon, the temporary directory may have an inconsistent path, e.g. /srv/bindings/580b80cb8cb344dfa6f2c1059d896835/tmp. The hashed number is, I believe, something called the "Pantheon Binding", which I assume is a reference to the current container, but I'm not certain. So, on subsequent cron runs, the queue processor may end up trying to access a temporary file that no longer exists because the Pantheon Binding has changed since the queue job was saved to the queue table. This results in an accumulation of queue items that point to non-existent temporary files and will never be removed, since the queue job doesn't remove them if the temporary file is missing. This leads to a lot of errors like this one, which also seem to cause performance issues during cron:

RuntimeException: File /srv/bindings/c1e9936fd5b347d7ae98b1226d7724f0/tmp/feeds_http_fetcherELdIb4 does not exist. in Drupal\feeds\Result\FetcherResult->checkFile() (line 53 of /srv/bindings/580b80cb8cb344dfa6f2c1059d896835/code/web/modules/contrib/feeds/src/Result/FetcherResult.php).

I'm using a snapshot of the dev branch from August 2016. So if this problem has already been addressed in a previous issue, my apologies. Upgrading to a newer snapshot seems to lead to several errors...

Pantheon support has suggested that we change the temporary files directory to point to a path on the shared, networked file system, under sites/default/files/tmp, which they have documented under "Temporary File Management with Multiple Application Containers: Using Valhalla, Pantheon's Networked Filesystem". I'll link their support team to this ticket as well so they can comment and clarify the issue. Our next scheduled deployment window isn't until early October, so I won't be able to test that suggestion in production until then.

Also, to address the stuck queue items, I think the only option is to delete all queue items whose name matches "feeds_feed_parse%". So adding an update hook with the following:

  db_delete('queue')
    ->condition('name', 'feeds_feed_parse%', 'like')
    ->execute();

(db_delete() is deprecated, but I don't know the new method yet). This might be something to address in this module. Also, for the issue itself: might it be better to store the file reference in the queue item as a stream wrapper URI instead? e.g. temporary://feeds_http_fetcherELdIb4.
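
For reference, a roughly equivalent cleanup using the Drupal 8 database API would look something like this (a sketch to adapt to your own update hook):

// Sketch: the same cleanup with the D8 database API instead of db_delete().
\Drupal::database()->delete('queue')
  ->condition('name', 'feeds_feed_parse%', 'LIKE')
  ->execute();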

I'm not sure if supporting such environments is really in the scope of this project (which is why I've filed this as a support request). I'm really filing this issue here to improve the searchability of the problem in general, in case other people are running into it.

Comment | File | Size | Author
#136 | feeds_s3fs.patch | 1.25 KB | rdivince
#114 | feeds--temporary-files-handling--2912130-114.patch | 9.69 KB | someshver
#110 | feeds--temporary-files-handling--2912130-110.patch | 8.37 KB | jmcerda
#107 | feeds--temporary-files-handling--2912130-107-D8.patch | 8.37 KB | meladawy
#82 | interdiff-2912130-81-82.txt | 1.51 KB | MegaChriz
#82 | feeds-in-progress-test-2912130-82.patch | 4.31 KB | MegaChriz
#81 | interdiff-2912130-80-81.txt | 1.18 KB | MegaChriz
#81 | feeds-in-progress-test-2912130-81.patch | 3.97 KB | MegaChriz
#80 | feeds-2912130-80-test-cases.patch | 3.57 KB | mstrelan
#79 | feeds-2912130-79-test-cases.patch | 3.57 KB | mstrelan
#73 | interdiff_62-73.txt | 1021 bytes | mstrelan
#73 | feeds--temporary-files-handling--2912130-73-D8.patch | 8.45 KB | mstrelan
#62 | feeds--temporary-files-handling--2912130-62-D8.patch | 8.42 KB | bburg
#51 | feeds--temporary-files-handling--2912130-51-D8.patch | 8.44 KB | n4r3n
#49 | feeds--temporary-files-handling--2912130-49-D8.patch | 5.95 KB | n4r3n
#48 | feeds--temporary-files-handling--2912130-48-D8.patch | 4.4 KB | n4r3n
#39 | feeds--temporary-files-handling--2912130-39-D8.patch | 11.24 KB | merauluka
#37 | feeds--temporary-files-handling--2912130-37-D8.patch | 11.22 KB | merauluka
#28 | feeds-temporary-files-path-2912130-23-D8.patch | 9.82 KB | Fant0m
#25 | feeds-temporary-files-path-2912130-22-D8.patch | 9.83 KB | Fant0m
#23 | feeds-temporary-files-path-2912130-21-D8.patch | 9.42 KB | Fant0m
#22 | feeds-temporary-files-path-2912130-20-D8.patch | 6.48 KB | Fant0m
#21 | feeds-temporary-files-path-2912130-19-D8.patch | 10.07 KB | Fant0m
#20 | feeds-temporary-files-path-2912130-19-D8.patch | 10 KB | Fant0m
#19 | feeds-temporary-files-path-2912130-18-D8.patch | 9.21 KB | Fant0m
#18 | feeds-temporary-files-path-2912130-17-D8.patch | 8.16 KB | Fant0m
#17 | feeds-temporary-files-path-2912130-D8.patch | 7.01 KB | Fant0m
#16 | feeds_temporary-files-path_2912130-15-D8.patch | 7.46 KB | Fant0m
#14 | feeds_temporary-files-path_2912130-12-D8.patch | 7.88 KB | Fant0m
#12 | feeds_temporary-files-path_2912130-12-D8.patch | 4.84 KB | artematem
#11 | feeds_temporary-files-path_2912130-11-D8.patch | 4.83 KB | artematem
#10 | feeds_temporary-files-path_2912130-10-D8.patch | 4.82 KB | artematem
#9 | feeds_temporary-files-path_2912130-9-D8.patch | 4.35 KB | artematem
#7 | 2912130.patch | 4.14 KB | Fant0m

Issue fork feeds-2912130

Comments

bburg created an issue. See original summary.

bburg’s picture

Issue summary: View changes

MegaChriz’s picture

@bburg
This has not yet been addressed. In the latest dev, temporary files are still saved to temporary://. I think this is wrong and it would be better if Feeds saved them in an 'in progress' directory. In the D7 version of Feeds a similar issue exists: in that version, Feeds saves temporary files in 'public://feeds'. The file is then removed after six hours, regardless of whether the import has been completed or not.

In the D7 version I have been experimenting with saving a temporary file in 'private://feeds/in_progress' instead. See #1029102: Importing Large CSV document (downloaded and processed in the background). The file gets removed after import has been completed. Maybe it is possible to port that idea to D8?

A similar issue for D8 Feeds (maybe even the same problem): #2778373: Auto-purge old temporary feed downloads

bburg’s picture

Category: Support request » Task

I've been thinking more about this issue lately. In October we deployed the update to the temporary files directory location. Some other issues have been alleviated, but we are still seeing the Feeds errors. I'm not sure saving to temporary:// is an issue per se; it seems that even with the new location, these are still added as queue items that reference the absolute path of the temporary file, which can vary between application containers because the path contains the container ID.

I suspect the only reason this works at all in this setup is that the fetch and the queue processing occasionally happen to run in the same container. But purging the broken queue items would likely be helpful.

If changing the directory from temporary:// is something you were already considering, I'll change the category of this issue to be a task. I'd offer to assist, but I'm still pretty unfamiliar with this code and D8 module architecture.

MegaChriz’s picture

Yes, moving from temporary:// to private://feeds/in_progress is what I'm considering, with the reason being that files may not exist long enough in temporary:// to complete an import. On my local machine for example, temporary:// is set to '/tmp', which is cleared after every machine restart.

For this issue to move forward, we will likely need automated tests. Feeds only has unit tests right now, but I think this issue could use a functional test (or maybe a kernel test - which I'm not familiar with yet).

If you'd like to help on this issue or with development in general, maybe you want to help with writing a base class for tests? See #2917925: Port remaining methods from test base class in D7. That's probably easier to do and helps build the foundation for this issue. Or other things mentioned in #2917928: Plan for 8.x-3.0-alpha1 release? Anyway, it's unlikely I will put time into this issue before 8.x-3.0-alpha1 is done.

MegaChriz’s picture

Issue summary: View changes

Closed #3010908: Articles are not being imported to production Acquia site, possibly because /tmp folder is not on the same container between requests as a duplicate. Some relevant info from there:

From @alberto56:

I think this may be due to the fact that on certain servers, Acquia among them, the production server is in fact a cluster of containers which share /var/www/html/sites/default/files, but do not share /tmp. A quick look at your code suggests you are using the /tmp folder in a batch process between calls, to store the raw feed.

The problem might be that in one call we are storing the raw feed in the /tmp folder of one container in the cluster, and on the second call, which might or might not hit the same container, the /tmp folder might no longer contain that raw data.

A possibly related issue in the Webform module was fixed in #2980276: Webform assumes the /tmp directory is always the same, but if there are multiple servers, each may have its own /tmp directory

From @MegaChriz:

My idea for fixing this is to not use the temporary directory for temporarily storing the file to import, but to use private://feeds/in_progress instead.

The first step, however, is adding test coverage for importing data using multiple calls/requests. Right now, there is only test coverage for importing data in one go. There are already tests written for this case for the D7 version of Feeds, so it is probably a good idea to port these over. These tests can be found in the class 'FeedsFileHTTPTestCase'. The following methods from that class would be the most interesting to port over:

  1. ::setUpMultipleCronRuns()
  2. ::getInProgressFile()
  3. ::testImportSourceWithMultipleCronRuns()
  4. ::testAbortImportWhenTemporaryFileIsDeleted()

A workaround for your issue would be to set a different path for the temporary directory as also suggested in #2980276-18: Webform assumes the /tmp directory is always the same, but if there are multiple servers, each may have its own /tmp directory:
$config['system.file']['path']['temporary'] = '/your/temporary/directory';

The watchdog has no entries. Is there a debug mode? How would you suggest that I try to figure out what is wrong?

Unfortunately, there is no debug mode in Feeds D8. PHP errors during an import would be logged, but I think there are no log entries for when a file to import has gone missing. It would be a good idea to find out if we can make sure that a warning is logged in this case.

Fant0m’s picture

Added a quick patch that adds the possibility of using the public directory for temporary feed files. It adds a settings form at the admin config path '/admin/config/feeds_settings' where you can check "Use public directory for temporary feed files."

MegaChriz’s picture

@Fant0m
Thanks for giving this a go!

  1. +++ b/src/Feeds/Fetcher/HttpFetcher.php
    @@ -81,7 +81,15 @@ class HttpFetcher extends PluginBase implements ClearableInterface, FetcherInter
    +      // Remove all files by mask before imported files as not needed any more.
    +      array_map('unlink', glob($this->fileSystem->realpath($destination) . "/feeds_http_fetcher*"));
    

    Note that this may disrupt other unfinished imports, since you are deleting all files matching the feeds_http_fetcher* mask here.

  2. Using a checkbox for the "feeds_temporary_dir" is fine as a temporary solution, but ideally it would be a textfield so people can choose the private directory as well.
artematem’s picture

OK, here's a first version of the patch. Added a default feeds.system.yml file to see if it helps to pass the tests.

artematem’s picture

artematem’s picture

artematem’s picture

MegaChriz’s picture

+++ b/src/Feeds/Fetcher/HttpFetcher.php
@@ -81,7 +81,15 @@ class HttpFetcher extends PluginBase implements ClearableInterface, FetcherInter
+    if (\Drupal::config('feeds.settings')->get('feeds_temporary_dir')) {

The reason for the test failure is that the feeds.settings config object is not injected.

Fant0m’s picture

Status: Needs review » Needs work

The last submitted patch, 14: feeds_temporary-files-path_2912130-12-D8.patch, failed testing.

Fant0m’s picture

Fant0m’s picture

Fant0m’s picture

Fant0m’s picture

Fant0m’s picture

Fant0m’s picture

Fant0m’s picture

Fant0m’s picture

MegaChriz’s picture

@Fant0m
Thanks for working on this issue! It would be great if you could document what you are working on in each patch and what your motivations are for some of the changes. That helps with reviewing the patches.

Fant0m’s picture

MegaChriz’s picture

+++ b/src/Feeds/Fetcher/HttpFetcher.php
@@ -69,19 +79,42 @@ class HttpFetcher extends PluginBase implements ClearableInterface, FetcherInter
+  /**
+   * {@inheritdoc}
+   */
+  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
+    return new static(
+      $configuration,
+      $plugin_id,
+      $plugin_definition,
+      $container->get('config.factory')
+    );
+  }

Feeds currently uses annotation to specify dependent services. So adding (or overriding) a create() method is not going to work here.

We could open a new issue for Feeds to use ContainerFactoryPluginInterface instead of defining dependent services in the annotation.

Fant0m’s picture

Thank you @MegaChriz.

Fant0m’s picture

Added emulation of a container with 'config.factory' to HttpFetcherTest.

Fant0m’s picture

Status: Needs work » Needs review

maxdev’s picture

Latest patch doesn't work. If you go to the Feeds page it throws a fatal error.

ArgumentCountError: Too few arguments to function Drupal\feeds\Feeds\Fetcher\HttpFetcher::__construct(), 6 passed and exactly 7 expected in Drupal\feeds\Feeds\Fetcher\HttpFetcher->__construct() (line 85 of modules/contrib/feeds/src/Feeds/Fetcher/HttpFetcher.php).

maxdev’s picture

Status: Needs review » Needs work

artematem’s picture

@maxdev,
Did you try with https://simplytest.me/ ?
I ran Feeds 8.x-3.0-alpha4 + patch #28 and didn't see the error.

bburg’s picture

Just jumping in to make a side comment. Since I changed the temporary directory location to address the issue, we are now experiencing a new problem: we have accumulated a large number of temporary Feeds files (200K+) that aren't being cleaned out regularly. While the files are very small and not filling up much space, their sheer quantity is causing performance issues when we run backups on Pantheon, taking over 2 hours on some of the worst-affected sites.

MegaChriz’s picture

Issue summary: View changes

I've opened #3044983: Add test coverage for importing using multiple cron runs for porting ::testImportSourceWithMultipleCronRuns(). A port of this test is also needed for #2978490: Optimize Feeds queue.

MegaChriz’s picture

@Fant0m

+++ b/src/Feeds/Fetcher/HttpFetcher.php
@@ -69,19 +79,42 @@ class HttpFetcher extends PluginBase implements ClearableInterface, FetcherInter
+  public function __construct(array $configuration, $plugin_id, array $plugin_definition, ClientInterface $client, CacheBackendInterface $cache, FileSystemInterface $file_system, ConfigFactoryInterface $config_factory) {
...
+  /**
+   * {@inheritdoc}
+   */
+  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
+    return new static(
+      $configuration,
+      $plugin_id,
+      $plugin_definition,
+      $container->get('config.factory')
+    );
+  }
+

The number of parameters passed to new static in ::create() does not equal the number of parameters in the constructor.

So this function may be removed, but even better would be to start changing Feeds so that it uses the standard pattern instead of defining its dependencies in the annotation.

This is likely also needed for #3001376: entity.query deprecated warnings and as such to make Feeds compatible with Drupal 9 in the future.

So I fear we need to take a step back here and fix other things properly first.

MegaChriz’s picture

merauluka’s picture

We have been seeing this issue on Acquia now that we've switched over to using the production environment. It has created serious issues for us, with feeds constantly jamming up the queue system since they cannot find their temporary files.

I have reworked the patch from #28 and allowed the user to select either the public or private file system for storage. I'm sure there's probably a more comprehensive approach to allowing those stream wrappers, but I wanted to get support for the core options first.

Because my version completely moves away from the use of temporary files, I have updated the logic for file name generation as well. I thought it would be helpful to identify the files using the bundle so, at a glance, they can be associated with their appropriate feed. I also added a new method to the FetcherResult that allows for the removal of a file once it's been processed, and a call to remove that file once the feed has finished processing.

oknate’s picture

The schema for feeds.settings needs to be updated for the changes in #37:

diff --git a/config/install/feeds.settings.yml b/config/install/feeds.settings.yml
new file mode 100644
index 00000000..7fa05cbe
--- /dev/null
+++ b/config/install/feeds.settings.yml
@@ -0,0 +1 @@
+feeds_storage_method: public
diff --git a/config/schema/feeds.schema.yml b/config/schema/feeds.schema.yml
index 4639abdf..65a6374b 100644
--- a/config/schema/feeds.schema.yml
+++ b/config/schema/feeds.schema.yml
@@ -163,3 +163,11 @@ feeds.processor.entity:
 
 feeds.processor.entity:*:
   type: feeds.processor.entity
+
+feeds.settings:
+  type: config_object
+  label: 'Feeds settings'
+  mapping:
+    feeds_temporary_dir:
+      type: boolean
+      label: 'Use public directory for temporary feed files.'
merauluka’s picture

Updated my patch with schema changes. This is still going to break tests. But it will break them differently now?

MegaChriz’s picture

@merauluka
Thanks for working on this! This is definitely an issue I'd like to see fixed at some point.

I did make a rough start on this a few weeks ago: I started by copying code from the D7 version. So far I have not come to anything even somewhat workable. The issue I ran into was that tempnam() seems to ignore subdirectories: I wanted the generated file to go to private://feeds/in_progress, but the file kept going to just private://.

So I think that we need to replace tempnam() with something that supports subdirectories.
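
For illustration, one way around that limitation could be to prepare the subdirectory first and then generate a unique destination inside it, roughly like this (a sketch using core's file_system service; $data stands in for the downloaded feed content):

use Drupal\Core\File\FileSystemInterface;

// Sketch: store the fetched data in a uniquely named file inside a
// subdirectory, instead of relying on tempnam(), which ignores the
// subdirectory part of the destination.
$file_system = \Drupal::service('file_system');
$directory = 'private://feeds/in_progress';

// Make sure the directory exists and is writable.
$file_system->prepareDirectory($directory, FileSystemInterface::CREATE_DIRECTORY | FileSystemInterface::MODIFY_PERMISSIONS);

// Generate a destination that does not collide with an existing file.
$destination = $file_system->createFilename('feeds_http_fetcher_' . md5(microtime(TRUE)), $directory);

// $data is assumed to hold the downloaded feed content.
$file_system->saveData($data, $destination, FileSystemInterface::EXISTS_REPLACE);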

merauluka’s picture

@MegaChriz Yeah. When I reviewed the code, it's actually a part of the tempnam process to strip out any directories and just take the file name, append a random string, and save it in temporary storage.

What I did was update the first patch to use either public or private storage (user's choice, defaulting to public), using the same feed file name prefix but with the machine name of the feed type inserted into the name as well, then appending the random string. This allows a user to see where the stored files came from.

I also had the logic create the files under a subdirectory that is controlled by the module. This was mostly for convenience since allowing the user to create the directory leads to the need for additional discussion/logic around what we would do with files that were in the old directory.

But it's a good start. And I can report that with my patch in place, the issues we were seeing in our load balanced environment, with temporary files being unavailable, are no longer present now that the files are in the private/public directories instead.

merauluka’s picture

As a follow-up comment, this change recently saved a client when some of their feed content was erroring due to changes made by a feed provider. We were able to address the changes, and the cached copies of the feeds in the private directory in Drupal allowed us to retrieve almost 2 months' worth of missing data.

If the files were still stored in the temporary directory, they would more than likely have been removed before we had a resolution to the issue in place.

bburg’s picture

Patch #3 works for me, but one observation: without a default value getting set in an existing installation, the directory just defaulted to one named with just a colon ":" directly in the Drupal root.

jorgediazhav’s picture

@bburg, you mean it only works when you do not set a value? By just applying the patch and clearing the cache?

merauluka’s picture

I have run into an issue with my patch. Before my patch, the Feeds module relied on the system periodically emptying the temporary folder to handle leftover files that were created during an HTTP fetch operation.

I added the option to move these files to public/private storage instead and added a clean up action to remove the file once the queued articles have been processed.

There are two issues:

  1. If there is an error, the files are still created and never removed. #3080098: Delete Empty Files Created If Empty/Error Exception is Returned
  2. If the user uses the UI instead of drush/cron to import feeds, the file is never deleted because the UI uses the batch system. Because these batches can be interrupted at any time, even if a mechanism were in place to remove the file once the import was complete, the file might never be cleared.

I am withdrawing my recommendation to use my patches until a unified file clean up option is available.

merauluka’s picture

bburg’s picture

I noticed that too. The default /tmp directory is usually cleared out by the OS, so using a custom one will require a separate process.

We implemented a cron hook to kick off a process to periodically clear out the files:

function my_module_cron() {
  // Confirm the presence of the temporary directory and create it if needed;
  // it may not exist in newly instantiated dev environments.
  my_module_verify_temp_directory();

  $tmp = \Drupal::config('system.file')->get('path.temporary');

  // Clear out old temporary files.
  my_module_tmp_file_delete(500, $tmp);
  my_module_tmp_file_delete(500, $tmp . '/feeds-ingestion');
}

/**
 * Deletes temporary files that are older than a week.
 */
function my_module_tmp_file_delete($limit = 500, $path = FALSE) {
  if (!$path || !is_dir($path)) {
    return FALSE;
  }

  $handle = opendir($path);
  if ($handle === FALSE) {
    return FALSE;
  }

  // Collect up to $limit file paths, skipping directories.
  $fileList = [];
  $count = 0;
  while (($entry = readdir($handle)) !== FALSE && $count < $limit) {
    $full_path = $path . '/' . $entry;
    if (!is_dir($full_path)) {
      $fileList[] = $full_path;
      $count++;
    }
  }
  closedir($handle);

  if (empty($fileList)) {
    return FALSE;
  }

  // Sort the collected files by modification time, oldest first.
  array_multisort(array_map('filemtime', $fileList), SORT_NUMERIC, SORT_ASC, $fileList);

  $now = time();
  $count = 0;
  $deleted_files = [];

  // Delete files that are older than a week.
  foreach ($fileList as $file) {
    if (file_exists($file) && $now - filemtime($file) >= 60 * 60 * 24 * 7) {
      $deleted_files[] = $file;
      if (unlink($file)) {
        $count++;
      }
    }
  }

  \Drupal::logger('my_module_cron')->notice("Deleted $count files." . '<ul><li>' . implode('</li><li>', $deleted_files) . '</li></ul>');
}

n4r3n’s picture

n4r3n’s picture

Updating HttpFetcherTest case.

MegaChriz’s picture

+++ b/src/Feeds/Fetcher/HttpFetcher.php
@@ -96,7 +109,15 @@ class HttpFetcher extends PluginBase implements ClearableInterface, FetcherInter
+      $destination = 'public://';

Note that by using the public:// destination, you are potentially opening a security hole. Files stored in this directory could in theory be read by anonymous users. They must guess the filename, however.

n4r3n’s picture

n4r3n’s picture

Chris Burge’s picture

Is this issue relevant anymore? Beginning in D8.8, Drupal assumes the path to the temp directory is '/tmp' unless you set $settings['file_temp_path'] in settings.php. The "workaround" described in the issue summary is required configuration for the application to function in a multi-web server environment.

MegaChriz’s picture

@Chris Burge
I think it is. In theory, files in /tmp could be cleaned up before the Feeds import is complete. Feeds divides an import task into multiple chunks. For large imports (hundreds of items), this can take multiple cron runs to complete. If the webserver cleans up temporary files in between, then the import cannot finish. The more items to import and/or the fewer times cron runs, the higher the chance that this can happen.

This issue is more or less fixed in the D7 version. I would like to see a port of that version along with tests, rather than what the current patch does. The D7 version makes it configurable where the files to import are stored, and it defaults to private://feeds/in_progress (or public://feeds/in_progress when the private file system is not configured). The current patch forces these files to be in the public file system and I think that's a potential security risk, because files could come from a protected source. OK, the generated file names may be hard to guess, but if the data to import is sensitive, you'd better be safe than sorry and ensure that these files are on the private filesystem.

mikalaki’s picture

I can confirm the problem continues to exist on Drupal 8.8. I am currently working on a site running Drupal 8.8.1 and I am facing the issue.

j9’s picture

Hi and thanks!

I am also getting errors when attempting to import feeds.

RuntimeException: File /tmp/feeds_http_fetchertuzw4F does not exist. in Drupal\feeds\Result\FetcherResult->checkFile() (line 53 of /home1/netup/public_html/cryptos.cool/modules/feeds/src/Result/FetcherResult.php).

I tried the patch, and I tried manually setting a temp directory in settings.php, but haven't gotten through yet.

When I run the import manually from the /admin/content/feed page, I get the feed items imported!

But cron does not seem to import on its own.

In the recent log messages, it looks like it is because cron is being run as an anonymous user, instead of as an admin as when running manually.

Any thought on what I might try next?

Drupal core 8.8.4
Feeds 8.x-3.0-alpha6

On a shared Hostgator server. Feeds was working just great, but somehow stopped.

Much appreciated and thank you again!

MegaChriz’s picture

Marking as a stable release blocker.

jjwfcd’s picture

Thanks for your effort, guys; hoping for a release soon.

bkosborne’s picture

+1 to getting this resolved. Agreed the current patch is not the way to go, since it puts the feeds files in public storage. @MegaChriz, so it seems like we just need to update this patch to use the private file system if available, or public otherwise, just like D7?

MegaChriz’s picture

@bkosborne
Yes, I propose to add temporary files to private://feeds/in_progress if the private file system is available, otherwise it would be public://feeds/in_progress. Do you want to help with resolving this issue? If yes, let me know if you understand the proposed resolution + the remaining tasks from the issue summary.

jwilson3’s picture

Just a note that #51 applied cleanly to 3.0-alpha7 but no longer applies to 3.0-alpha8, so in addition to the fixes requested in #60, this needs a reroll.

bburg’s picture

Conflict is in tests/src/Unit/Feeds/Fetcher/HttpFetcherTest.php with the addition of two "use" statements:

@@@ -17,7 -17,7 +18,11 @@@ use GuzzleHttp\Handler\MockHandler
  use GuzzleHttp\Psr7\Request;
  use GuzzleHttp\Psr7\Response;
  use Prophecy\Argument;
++<<<<<<< HEAD
 +use RuntimeException;
++=======
+ use Symfony\Component\DependencyInjection\ContainerBuilder;
++>>>>>>>

Fixed to allow both.

jwilson3’s picture

We had upgraded to alpha8 and applied #62 but we're still seeing the error on Pantheon.

File /srv/bindings/<uniqueid>/tmp/feeds_http_fetcherz6rDEi does not exist.

We're going to dive in here and see what is going on.

jwilson3’s picture

Facepalm.

The reason the patch wasn't working was a configuration error on my part. I didn't realize the patch exposed a new config option at /admin/config/feeds_settings that has to be checked/enabled for things to work.

Checkbox:
[ ] Use public directory for temporary feed files.
Site public file system path settings.

Sadly, checking the option doesn’t solve the problem. The messages on screen after a manual feed import are now slightly different:

Created 100 [Nodetype] items.
File /srv/bindings/<uniqueid>/files/feeds_http_fetcheroJyPaV does not exist.

There should have been 325 items created in total. Note that the path is now pointing to the files folder on Pantheon, which is supposedly synced across containers. Something is making the file go away after the first batch of 100 items is imported.

I don’t know if this is a Feeds problem yet or a Pantheon issue.

jwilson3’s picture

My facepalm moment above clearly demonstrates that this issue needs work: A) it should be using the private files folder not the public folder, and B) no one should have to manually specify this option.

MegaChriz’s picture

@jwilson3
Yes, I would like to see a similar solution as in the D7 version. The D7 version saves the file to import in private://feeds/in_progress (or public://feeds/in_progress if the private filesystem is not configured). See also the issue summary.

Would you like to work on this? It is on my list to pick up eventually, but it could still take a while before I would get to it myself. There are enough things on the roadmap: #1960800: [meta] Feeds 8.x roadmap. If you do want to pick it up, I hope I can assist you as much as possible. :)

jwilson3’s picture

@MegaChriz, thanks for your vote of support. I've asked my colleague @marthinal to have a look at making the recommended changes to the patch.

I think I also hit upon a related issue in #64, whereby if the line_limit is too low to process all items in a feed, the file gets deleted prematurely, but I think that would have to be a follow-up ticket. Maybe there already is one, I haven't looked. After I bumped the line limit to 1000 AND exported the config yml change for the public folder, I was able to use the manual "import" button to get all of the items ingested.

MegaChriz’s picture

@jwilson3
Thanks for your support! :)

Line Limit setting

The line limit setting dictates how many lines of a file should be parsed at a time. Feeds goes through a few import stages, and these stages are looped until they are completed:

  1. When the import starts, Feeds creates a single "fetch" task.
  2. Fetch stage: the fetcher gets called a single time and returns a fetcher result.
  3. Parse stage: the fetcher result is passed to the parser and the parser gets called a single time. The parser returns a parser result.
  4. Process stage: for each item in the parser result a "process" task is created.
  5. When all process tasks are completed, Feeds asks the parser if there's more to parse. If so, the import process continues with step 3.
  6. If there's nothing more to parse, Feeds asks the fetcher if there's more to fetch. If so, the import process continues with step 2.
  7. If there's nothing more to fetch, the clean stage starts.
  8. Clean stage: during this stage, items that are no longer available in the source can get unpublished/deleted (depending on the 'Previously imported items' setting).
  9. If the Clean stage is done or there's nothing to clean, the import finishes.

So as you see, the line_limit setting shouldn't affect how many items get imported during the whole import process. It shouldn't matter if not everything is parsed on the first call. Feeds gets back to the parser after all items from the first parser result are processed.

The reason you won't get everything imported is probably that the fetcher result - which lives in a temporary file when using the HTTP fetcher - is no longer available at the time Feeds wants to get back to the parser for the second iteration.
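
Purely as an illustration of that loop (the method names below are made up and do not match the real Feeds API):

// Illustrative pseudocode only; the method names are invented.
do {
  // Fetch stage: the HTTP fetcher downloads the source to a local file.
  $fetcher_result = $fetcher->fetch($feed);
  do {
    // Parse stage: parses up to line_limit items from the downloaded file.
    $parser_result = $parser->parse($feed, $fetcher_result);
    foreach ($parser_result as $item) {
      // Process stage: one queue task per item; may span several cron runs.
      $processor->process($feed, $item);
    }
    // Feeds only returns to the parser after all items have been processed,
    // possibly one or more cron runs later -- the downloaded file must still
    // exist at that point.
  } while ($parser->hasMoreToParse($feed, $fetcher_result));
} while ($fetcher->hasMoreToFetch($feed));
// Clean stage: unpublish/delete items that are no longer in the source.
$processor->clean($feed);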

bburg’s picture

For the Pantheon users: Pantheon is in the process of moving to their "compute optimized environment", which moves away from the container IDs in paths. While the issue still exists, I think the bulk of the users here were probably experiencing it on Pantheon.

dpagini’s picture

Just ran into the issue in #3163897 on our team at Acquia, so I found my way over here. I'm a little hesitant to pull in the patches here with all the discussion about a different approach... so the latest patch seems to just give a setting to either a) use the temp directory (existing functionality), or b) change to use the public directory? And from what I'm reading, if I use the public directory for my files, I won't run into the problem outlined in #3163897, but then my files are available to the public?

So just to try and summarize what I'm reading, it sounds like this PR needs 3 things still...
1. Test coverage for running imports.
2. Use the private directory instead of public, if it exists, and if not, fall back to public.
3. A way to clean up the public/private files after the feeds are processed...?

Does that sound right?

Quick review of latest patch...

  1. +++ b/config/install/feeds.settings.yml
    @@ -0,0 +1 @@
    +feeds_temporary_dir: false
    

    I think I saw this in some comments already, but I think we need an update hook to set this for existing sites.

  2. +++ b/src/Feeds/Fetcher/HttpFetcher.php
    @@ -96,7 +109,15 @@ class HttpFetcher extends PluginBase implements ClearableInterface, FetcherInter
    +    if ($this->configFactory->get('feeds.settings')->get('feeds_temporary_dir')) {
    

    Am I missing something? If the name of the setting is feeds_temporary_dir, would this need to be FALSE to use the public filesystem?

MegaChriz’s picture

@dpagini
Yes, that sounds about right. I tried to summarize what needs to be done in the section “Remaining tasks” in the issue summary.

When fetching a file from an HTTP source it needs to be stored locally somewhere, because the file can be too large to process in one go. If you use cron to run imports, then it can happen that multiple cron runs are needed to complete the import. This is because Feeds is given a limited time per cron run to do its job, and that time is about one minute. So the file to import must be kept over multiple cron runs.

Right now, the file is stored in a temporary directory indeed and that can be problematic if that directory is getting cleaned between two cron runs and the import is not done yet. So my proposal is to store the downloaded file in the private directory instead (with a fallback to the public directory if there is no private directory).

When the import is done, the downloaded file is no longer useful, so it should be cleaned up after import. Otherwise, the private directory would grow with files that have been imported over time. Files in the temporary directory are not cleaned up by Feeds right now, hence the issue #2778373: Auto-purge old temporary feed downloads.

Tests are needed to cover exceptions during this process. For example: when a file is manually removed from the private directory, Feeds shouldn't try to continue the import over and over again. So for that situation we need test coverage.

The approach I proposed is already implemented in the D7 version of Feeds, so some code from the D7 version can perhaps be reused. Or at least the D7 implementation can be used as inspiration here.

Lastly, this issue is on my radar to be picked up soon. The plan of the Feeds team is to first resolve all beta blockers and perhaps also the beta targets. My hope is to release the beta before the end of this year. Some of the beta targeted issues need review/testing, so help on them is welcome. :) See the roadmap: https://www.drupal.org/project/feeds/issues/1960800#feeds-beta-blockers

dpagini’s picture

Thanks @MegaChriz - I'm happy to help a little bit as allowed by my company. =)
I pulled in the last patch here and started using it. I think it will work for me for the time being. I'm pulling in a public feed, so there's no concern storing it on the public directory, and I'm only using 1 feed, so the below concern is not a problem for me.
Also, just worth mentioning that the setting as defined in this patch, as mentioned above, is confusing to me b/c I have to set "temp = true" in order to use the public directory.

+++ b/src/Feeds/Fetcher/HttpFetcher.php
@@ -96,7 +109,15 @@ class HttpFetcher extends PluginBase implements ClearableInterface, FetcherInter
+      array_map('unlink', glob($this->fileSystem->realpath($destination) . "/feeds_http_fetcher*"));

Just a question here... if there is more than one feed in a site, is it possible that this line could delete a temp file from another feed?

mstrelan’s picture

Just a question here... if there is more than one feed in a site, is it possible that this line could delete a temp file from another feed?

Yes. Attached patch addresses this.

jerrac’s picture

So, I've been watching this issue for a while, ever since I submitted https://www.drupal.org/project/feeds/issues/3163897.

Is there a reason the patch doesn't just add a "Use private" checkbox the same way it adds the "Use public" checkbox?

jwilson3’s picture

Is there a reason the patch doesn't just add a "Use private" checkbox the same way it adds the "Use public" checkbox?

Checkboxes wouldn't make sense with two options, since they're mutually exclusive. It could be made into a single field with two radio options, default to "Use private" but let the user switch to "Use public". This would clarify what is really going on here.

Chris Burge’s picture

Why not make the setting a string and accept a stream-wrapper path? (e.g. 'private://feeds-tmp')? That would provide a greater degree of flexibility and wouldn't encourage people to use public:// like it's temporary://. One of the requirements for the Drupal temp directory path is that it cannot be accessible over the web: file_directory_temp() is deprecated and moved to the FileSystem service. This is a security-based requirement.

MegaChriz’s picture

Issue summary: View changes

@Chris Burge

Why not make the setting a string and accept a stream-wrapper path? (e.g. 'private://feeds-tmp')?

Good suggestion. The D7 version of Feeds already does this. In the proposed resolution in the issue summary I proposed to have it default to 'private://feeds/in_progress' (or 'public://feeds/in_progress' if no private filesystem is available). In the D7 version this setting is called "feeds_in_progress_dir". I would call it "in_progress_dir" here and make it part of a config object called "feeds.settings".

mstrelan’s picture

Just looking at the remaining tasks, it seems that "Log a warning when the file that is in the process of being imported no longer exists" could be expanded to also remove the queue entry. Currently every cron run just updates the "expire" timestamp.
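
Something along these lines could cover both the warning and the cleanup (a sketch only, placed inside the queue worker; it assumes the fetcher result throws the RuntimeException seen in the logs when the file is gone, and that $feed is the feed being processed):

// Sketch: when the downloaded file has gone missing, log a warning and finish
// the import so the stale queue task is not retried forever.
try {
  $file_path = $fetcher_result->getFilePath();
}
catch (\RuntimeException $e) {
  \Drupal::logger('feeds')->warning('Import of feed %feed was aborted: the file being imported no longer exists. @message', [
    '%feed' => $feed->label(),
    '@message' => $e->getMessage(),
  ]);
  // Finish the import so the queue item does not keep getting retried.
  $feed->finishImport();
  return;
}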

mstrelan’s picture

Attached patch adds the missing test cases as mentioned in the proposed resolution and completes the task "Write test coverage for aborting an import because of a missing temporary file."

This patch does not attempt to change the existing functionality and therefore I believe it makes sense to commit it as-is before continuing with the remaining tasks.

mstrelan’s picture

Not sure why that worked locally. This patch tries to delete the file by its URI instead of the object.

MegaChriz’s picture

@mstrelan
Thanks for working on the tests! And good idea to commit that bit first when that passes.

The first thing to change in the parser configuration is to limit how many items the CSV parser parses per batch. By default the line limit is 100, which means all items from the source file are already parsed on the first cron run if it only contains 9 items.

However, even after I set the line limit, the test still fails.

  • I added the following debug code to testAbortImportWhenTemporaryFileIsDeleted():
        // Remove file.
        $file = $this->getInProgressFile();
        print_r($file);
        \Drupal::service('file_system')->unlink($file->uri);
        $files = scandir('temporary://');
        print_r($files);
        print "\n";
        print time();
        sleep(2);
    
  • And the following at the end of Drupal\feeds\Feeds\Parser\CsvParser::parse():
    print '<pre>';
    $files = scandir('temporary://');
    print_r($files);
    print "\n";
    print "\n";
    print_r($fetcher_result->getFilePath());
    print "\n";
    print_r($result);
    print "\n";
    print time();
    

With a call to time() I could verify that one more parser task ran after deleting the file. But strangely enough, scanning the directory in CronTest.php results in:

Array
(
    [0] => .
    [1] => ..
    [2] => .htaccess
)

But the same code scanning the directory in CsvParser.php (happening later) results in:

Array
(
    [0] => .
    [1] => ..
    [2] => .htaccess
    [3] => feeds_http_fetcherxfDF7P
)

Any ideas?

MegaChriz’s picture

Coding standard fixes and additional code comments.

Apparently, the test does pass on PHP 7 :o. Let's see if that happens on other PHP versions too. I tested locally on PHP 7.3 and there it failed.

MegaChriz’s picture

Okay, tests are passing on drupal.org. I have no idea then why they are failing locally.

@mstrelan
If you think this test is good too, I'll commit it.

mstrelan’s picture

The first thing to change in the parser configuration is to limit how many items the CSV parser parses per batch. By default the line limit is 100, which means all items from the source file are already parsed on the first cron run if it only contains 9 items.

Ah, that explains it, I ran out of time to debug it that far so thanks for explaining it.

Okay, tests are passing on drupal.org. I have no idea then why they are failing locally.

Your patch in #82 passes for me in my local PHP 7.4 env. Not sure what happened in your env; I'm not overly familiar with the handling of temporary:// in tests. What is responsible for deleting the downloaded file? Maybe there was a temp file hanging around from an earlier run. I think getInProgressFile() is a bit flaky because we just assume that a file matching the pattern is the right file rather than checking what's saved in the cron queue.

mventure2021’s picture

Can we get some reasonable solution committed to the 8.x dev branch please?

mstrelan’s picture

@mventure2021 set your feeds to not import automatically and configure a cron job to run drush feeds:import <ID> periodically.

smustgrave’s picture

Has anyone used the patch in #82 on their production site yet? Is it safe?

achap’s picture

This bug caught us out just before we went live :P Changing the temporary directory caused more issues for us than it solved, so we ended up doing as @mstrelan suggested and running from drush directly while turning the other stuff off. I guess it is not subject to the same memory limits so it will always complete? Either way, it worked for us. Thanks!

smustgrave’s picture

What's the current status of this by chance? Do we know when to expect a reasonable solution like #85 mentioned?

jwilson3’s picture

Our workaround approach is similar to what @mstrelan and @achap have said above, but without Drush.

We've installed the Ultimate Cron contrib module, then use Uptime Robot to ping the site hosted on Pantheon (which doesn't provide any way to set up cron jobs through its admin panel) on a more frequent basis than Pantheon's default of once per hour. Ultimate Cron suggests running cron every minute but we do every 5 and now seem to avoid the issues.

Ultimate Cron isolates each module's hook_cron implementation into a separate thread or request, and ensures that there is no overlap when running different jobs in one thread, which could lead to timeouts and/or hitting the batch limit. Caveat emptor, YMMV. I'm not 100% certain this is a sure-fire solution for everyone, but it works for our needs for the time being. Be sure to do your own testing with real feeds data sets in real server scenarios.

loopy1492’s picture

I pegged our site at 3.x-dev and applied the #82 patch, and drush cron still comes up with:

[notice] Starting execution of update_cron(), execution of system_cron() took 14.27ms.
 [notice] Starting execution of webform_cron(), execution of update_cron() took 772.51ms.
 [notice] Starting execution of webform_scheduled_email_cron(), execution of webform_cron() took 2.03ms.
 [notice] Cron task executed. (scheduled = 0; rescheduled = 0; unscheduled = 0; already scheduled = 0; ignored = 0; sent = 0; not sent = 0; skipped = 0)
 [notice] Execution of webform_scheduled_email_cron() took 1.59ms.
 [notice] Cron run completed.
 [error]  RuntimeException: File /mnt/tmp/OUR-ACQUIA-SITE-NAME/feeds_http_fetcher06g66O does not exist. in Drupal\feeds\Result\FetcherResult->checkFile() (line 53 of /mnt/www/html/OUR-SITE-NAME/docroot/modules/contrib/feeds/src/Result/FetcherResult.php).

loopy1492’s picture

We have sites on Acquia Cloud and this problem has cropped up after upgrading drupal core. We've tried setting the Acquia environmental variables in settings.php as indicated in their documentation.

if (isset($_ENV['AH_SITE_ENVIRONMENT']) && isset($_ENV['AH_SITE_GROUP'])) {
  $settings['file_temp_path'] = '/mnt/tmp/' . $_ENV['AH_SITE_GROUP'] . '.' . $_ENV['AH_SITE_ENVIRONMENT'];
}

This does not seem to have worked. We also tried changing the temp files folder permissions to 770 and that also didn't work. We also tried installing Ultimate Cron and that hasn't worked either.

Last update of the day.

It seems that running the import **manually** from the feed page DOES create the file in the temp directory, but produces the SSL error:

The feed from https://SITE_URL.COM/apps/cfa/public/api.cfc?method=getTrackingForSite&roundKeyList=all seems to be broken because of error "cURL error 60: SSL certificate problem: unable to get local issuer certificate (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)".

And running **cron** produces the "does not exist" error because, apparently, cron has no idea that it's not running from the Stage environment.

[error] RuntimeException: File /mnt/tmp/SITE/feeds_http_fetcher06g66O does not exist. in Drupal\feeds\Result\FetcherResult->checkFile() (line 53 of /var/www/docroot/modules/contrib/feeds/src/Result/FetcherResult.php).

We have a ticket with Acquia now and if they can give us a resolution, I'll be sure to post it here.

vinmassaro’s picture

Anyone have luck with the workarounds getting this to work on Pantheon?

I've got $settings["file_temp_path"] = sys_get_temp_dir(); set and $config['system.file']['path']['temporary'] = 'sites/default/files/tmp'; with the tmp directory created. When running cron, it always fails:

[error] RuntimeException: File /srv/bindings/1e1d9be9ee8c41ab8e72730e7fc637f8/files/feeds_http_fetcherIpFByl does not exist. in Drupal\feeds\Result\FetcherResult->checkFile() (line 53 of /code/modules/contrib/feeds/src/Result/FetcherResult.php).

bburg’s picture

@vinmassaro

I was under the impression that Pantheon had moved all sites to their "Compute Optimized Environments," and the paths with their server bindings were no longer necessary.

https://pantheon.io/docs/platform-considerations#compute-optimized-envir...

Also, I think that part of the problem in this issue is that the queue item is saved with the full path to the temporary directory, which includes the non-existent binding ID. I think you can remove those queue items (either by direct SQL or with a module like https://www.drupal.org/project/queue_ui -- I am only guessing you can delete queue items with this), and then attempt to re-import once you have the new settings in place, to finish your feed import process.

Oligerd’s picture

@vinmassaro
I had the same issue on a Drupal 8 site on Pantheon.

if (isset($_ENV['PANTHEON_ENVIRONMENT'])) {

  $settings["file_temp_path"]= 'sites/default/files/tmp';

}

fixed this issue. No more errors after cron runs.

BeauTownsend’s picture

#73 worked for me on a Jenkins/GCP/k8s setup.

For anyone deploying this patch, know that, upon login, I experienced a long load time and, when I executed `drush ws --tail`, I discovered that my site was rolling through what appears to be all of the busted queues and "expiring" each missing temp file. In my case, there were thousands of these, so the browser just sat and spun for a good while before finally logging me in.

vinmassaro’s picture

@Oligerd: for your code in #95, are you using it with any of the patches in this thread?

Oligerd’s picture

@vinmassaro
I am not using any patch. I found that you need to delete the feed from content (at /admin/content/feed) without deleting imported items, and then create the same feed again. In this case, the old path will be deleted. You need to manually create this tmp directory inside the files directory. I am using FileZilla to create tmp on every Pantheon environment.

maskedjellybean’s picture

I think the suggestion to move the tmp directory into sites/default/files as a workaround for Pantheon is incorrect. The field description for Temporary directory at /admin/config/media/file-system says "This directory should not be accessible over the web", which to me implies it should be outside of /web (or /docroot). I assume there are security reasons for this. It may not be a security problem for Feeds, but Feeds is not the only thing using the temp directory.

That said, I really need a solution that works with Pantheon (but doesn't also open up potential security holes).

maskedjellybean’s picture

Ok! I realized that the patch in #73 is not supposed to work immediately after patching. It adds a config item that needs to be set. So after patching, go to /admin/config/feeds_settings and check "Use public directory for temporary feed files". I'll give this a try on Pantheon and report back.

Another thought based on the explanation of the cause of this problem: if this happens when an import takes more than one cron run to complete, perhaps within our feed type settings under Processor settings > Advanced settings, we should at least temporarily make sure that "Force update" is unchecked. I assume having it checked would make the import more likely to take more than one cron run to complete, but this is a complete guess.

MegaChriz’s picture

@maskedjellybean
"Force update" can make the import process longer yes, but even without that option enabled, an import with hundreds of items can easily take more than one cron run to complete. For every item in the feed, Feeds still has to check if there's an existing item for it and then if it also has changed. Unchecking "Force update" only potentially skips a lot of entity save calls.

In the issue summary I proposed to move the file to import to private://feeds/in_progress by default. Only use the public filesystem if there's no private filesystem configured.

maskedjellybean’s picture

@MegaChriz
Thank you for clarifying what force update does.

I'm going to use whatever patch works. I don't believe there is a patch to move the files to private, but there is one to move to public, so that is what I'll use unless I find it doesn't work. For what it's worth, I agree with you that private would be preferable to public, but I wouldn't feel comfortable writing that patch. Unless you're saying the private://feeds/in_progress functionality already exists in some version of feeds or in an existing patch?

andileco’s picture

Tried to post this yesterday, didn't realize my post didn't go through:

@maskedjellybean, just so you know, Pantheon has added "protected_web_paths" in pantheon.yml. I think you could do something like this:

protected_web_paths:
  - /sites/default/files/private/
  - /sites/default/files/config/
  - /sites/default/files/tmp/
maskedjellybean’s picture

Thanks @andileco, I'll have to look into that.

I've found that the patch in #73 is working well for me on Pantheon now that I realized I have to check the checkbox at /admin/config/feeds_settings. Thanks for the patch!

maskedjellybean’s picture

After inspecting the queue table it seems I may either still be having this issue with the patch applied, or I'm seeing queue items that have been stuck for months. I'll keep an eye on it.

I wanted to share some more info:

In the queue table I found items with a name like "feeds_feed_refresh:FEED_NAME". When I ran drush cron I saw a "RuntimeException: File /tmp/feeds_http_fetcher_1_kqRgck does not exist." error for each item. To me this implied these queue items were stuck, would never be processed, and I should delete them manually.

To delete items from the queue you can run:

drush queue:delete feeds_feed_refresh:FEED_NAME

To delete items from the queue on a Pantheon environment, you can run:

terminus remote:drush SITE-NAME.ENV queue:delete feeds_feed_refresh:FEED_NAME

tlo405’s picture

@maskedjellybean I've been seeing the same issues that you are seeing (with the patch applied). If I remove those stuck items from the queue, then at the next cron run everything appears to work again.

If the queue is never cleared out, updates to the feed I am pulling in are never imported into Drupal. Cron runs every 10 minutes in my site, however we import the feed every 15 minutes. The problem is, each cron run Drupal tries to import those items from the queue (if any exist). It fails due to that "RuntimeException: File /tmp/feeds_http_fetcher_1_kqRgck does not exist." error.

When that exception is thrown, I notice $feed->finishImport(); gets called. That method ends up pushing the 'next import' into the future. Then 10 minutes later when cron runs again, the same thing happens. Feeds attempts to run those queued up items...it fails again..then pushes the 'next import' date into the future. So updates to the external feed we pull in never make it into Drupal.

So for instance, say it's 12 noon, cron is set to run at 12:10, but my 'next import' for my feed is set to run at 12:15. If there are no items in the queue, cron runs at 12:10, and the 'next import' stays at 12:15 (which is correct). Then when cron runs at 12:20, the feed runs correctly. The 'next import' gets set to 12:35 (15 minutes into the future) which is correct.

However, if there is an item in the queue, the cron run at 12:10 attempts to run the queued items and fails. Due to the code in $feed->finishImport();, the 'next import' gets pushed from 12:15 to 12:25. However, when cron runs again at 12:20, the queued up items fail once again, then the 'next import' gets pushed to 12:35...and so on. So the 'next import' never runs because it keeps getting pushed into the future every time...therefore Drupal never imports the new data from the feed.
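
To make that feedback loop concrete, here is a minimal sketch with hypothetical variable names, assuming finishImport() always pushes the schedule one full import period past the current cron run:

// Hypothetical illustration of the rescheduling loop described above; not the Feeds API.
$import_period = 15 * 60;                 // feed imports every 15 minutes
$next_import   = strtotime('12:15');

foreach ([strtotime('12:10'), strtotime('12:20'), strtotime('12:30')] as $cron_run) {
  $has_stale_queue_item = TRUE;           // the stuck /tmp/feeds_http_fetcher_* task
  if ($has_stale_queue_item) {
    // The task throws, finishImport() still runs, and the schedule moves one
    // full period past this cron run, so the feed is never "due" at cron time.
    $next_import = $cron_run + $import_period;
  }
}
// $next_import ends up at 12:25, 12:35, 12:45, ... always ahead of the next cron run.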

So it seems that the patches here do not work. They also now fail to apply against the new beta1 release that came out yesterday.

meladawy’s picture

tsurvuli’s picture

Thank you very much for this patch, it's been very helpful so far. However, I upgraded to core 9.3.14 this morning and now the patch will not apply. Any chance of an updated version?

xeiro’s picture

Thank you for the patch @meladawy! I can confirm the patch was working with 9.4.1; however, it fails to apply on 9.4.3.

xeiro’s picture

Thanks for posting 2912130-82, @jmcerda. The following cron error still appears on this version:
RuntimeException: File /tmp/feeds_http_fetcherrsGHI4 does not exist. in Drupal\feeds\Result\FetcherResult->checkFile() (line 53 of /code/web/modules/contrib/feeds/src/Result/FetcherResult.php).
- Drupal 9.4.3 / PHP 8.0

Topplestack’s picture

#110 fails for me as well.

MegaChriz’s picture

Assigned: Unassigned » MegaChriz

I'm working on this issue. I've made a draft, but it is currently tangled with possible fixes for other, related issues. The most important of these is #3193610: Items going missing, which only needs some more testing. The other issues I'm working on are #3132198: Clean queue tasks when unlocking a feed to prevent potential data loss and #2778373: Auto-purge old temporary feed downloads.

someshver’s picture

Fixed the patch apply failure. Feeds module version 3.0.0.0-beta2.

someshver’s picture

MegaChriz’s picture

I implemented a possible solution for this issue! The code in the branch 2912130-in-progress-directory does the following:

  1. It puts HTTP downloaded files in a configurable directory that defaults to "private://feeds/in_progress/[Feed ID]" or "public://feeds/in_progress/[Feed ID]" in case no private file system is configured.
  2. It adds a service called "feeds.file_system.in_progress" with which you can interact with the directory for Feeds temporary files.
  3. The config key for the directory is feeds.settings:in_progress_dir. I have not added a UI for configuring it, as I think there is usually no need to change the default. Those who do want to change it can do so via the YAML file for now (see the settings.php sketch after this list for one way to override it).
  4. It includes a possible fix for #2778373: Auto-purge old temporary feed downloads as well, since that issue is so closely related that I worked on it at the same time. I could try to remove the fix for that issue from here upon request; the benefit would be less code to review here, which could help get this finalized quicker.
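
For those who want to override the default without editing exported config, here is a minimal settings.php sketch, assuming only the feeds.settings:in_progress_dir key named in point 3 (the directory value is just an example):

// settings.php: override the Feeds in-progress directory.
$config['feeds.settings']['in_progress_dir'] = 'private://feeds/in_progress_custom';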

I haven't tested this in the "wild" yet. The only testing I did was with the automated tests that I added.
And let's see if it breaks any other tests.

andileco’s picture

@MegaChriz I tested "2912130-in-progress-directory" and the feed import completed in one run (which I didn't expect, but there were no new items, so I guess that makes sense); however, I saw this notice in the dblog: The file private://feeds/in_progress/11 was not deleted because it does not exist.

MegaChriz’s picture

I've fixed the issue reported in #119 by @andileco. The issue was that Feeds tried to clean up files for feed imports that do not use the HTTP Fetcher. This is now fixed by checking, upon finishing (or unlocking) an import, whether there are files to clean up.

I also did some small refactoring in the Feeds Log module: LogFileManager now extends FeedsFileSystemBase, because there was a lot of overlapping behavior.

smustgrave’s picture

Status: Needs review » Needs work

Tested the MR on the environment where we are seeing the issue, and we are still getting the error hundreds of times on cron.

RuntimeException: File /tmp/feeds_http_fetchergiFocL does not exist. in Drupal\feeds\Result\FetcherResult->checkFile() (line 53 of /home/site/wwwroot/modules/contrib/feeds/src/Result/FetcherResult.php).

MegaChriz’s picture

@smustgrave
Okay, but does it happen with completely new imports too? Because that path is supposed to be the old location where things are saved. For new imports, the temporary file is expected to be stored in private://feeds/in_progress/[feed id].
But thanks for helping with testing!

smustgrave’s picture

Sorry, I don't fully follow. We have a number of feeds that we can't recreate, as I believe that would cause duplicate issues.

Didn't test with a brand new import though. For that test, I think I would have to create one across environments and sync the databases to get a valid result.

MegaChriz’s picture

@smustgrave
For clarity, with "new import" I don't mean a new feed. Just make sure that for a particular feed no existing import is still running. If you only have feeds that are in an unfinished import state, try to unlock these. Maybe also clear all existing Feeds tasks from the queue table; it is possible that some old ones got stuck there. In the latest release of Feeds I found a solution for cleaning up stale queue tasks, but that only applies to newly created tasks - not to tasks that were already there from older Feeds releases.

Hm, maybe I should think about whether there is a way to clean up stale Feeds queue tasks that were created in the older format. The challenge would be to detect whether they are really stale. If cron doesn't run often on a site, they could potentially still belong to an active import.

smustgrave’s picture

It may be overkill, but maybe a UI to view the items in the queue and their created dates, and let the user choose which to delete? That may be out of scope for this ticket, but old values in the queue do make it hard to test.

MegaChriz’s picture

@smustgrave
For monitoring what's in the queue, there is already the Queue UI module. Finding the original created date of a queue item is hard, because when an exception occurs while executing a queue item, the item gets requeued (and then gets a new creation date); at least, that's the case for Feeds. Sometimes that is a good thing: for example, if an external source is temporarily unavailable, you would want Feeds to retry. But refining that process (for example, implementing a maximum number of retries) is indeed something for another issue. See for example #3277999: Add DelayedRequeueException feature to feeds queue - D9.1.

By the way, there is another issue closely related to this one: #3158678: FetcherResult::checkFile() throws \RuntimeException, which is never caught. That one could help you out of the loop, but it also stops the retry attempts that I talked about above.

MegaChriz’s picture

Setting this back to "Needs review" because I think that the issue reported here needs to be fixed in another issue, possibly #3158678: FetcherResult::checkFile() throws \RuntimeException, which is never caught.

smustgrave’s picture

We are using Redis for our queue, so maybe the bad values are in there. Clearing the queue table doesn't do anything for us.

MegaChriz’s picture

@smustgrave
Hm, that's unfortunate. Since 8.x-3.0-beta3, Feeds automatically cleans up tasks for a specific feed from the queue table whenever that feed finishes an import or gets unlocked. But if the queue is external, Feeds cannot clean up tasks from it: the queue API has no method to clear a specific task, only a method to clear a whole queue. The Feeds queues are currently defined per feed type. Perhaps each feed should get its own queue? That could potentially lead to a large number of queues; I'm not sure if that's a problem.
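
To illustrate the limitation: the only cleanup the core queue API offers is dropping an entire queue, which for Feeds means every task of a feed type (the queue name below is a guess based on the names mentioned earlier in this thread):

// Sketch of the whole-queue cleanup the core API offers; this removes the
// tasks for all feeds of the feed type, not just the one stuck feed.
$queue = \Drupal::queue('feeds_feed_refresh:my_feed_type');
$queue->deleteQueue();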

smustgrave’s picture

I have to dig more, but if the Feeds module is querying the queue table directly, that seems wrong to me. If it used the queue service, then clearing a queue would also be reflected in whatever backend is actually in use.

We have experienced many times that the default Drupal queue fails and causes performance issues.

MegaChriz’s picture

@smustgrave
We should probably discuss the queue topic further in another issue, but see the methods hasQueueTasks() and clearQueueTasks() in FeedImportHandler. hasQueueTasks() is used by FeedsLockBackend to determine whether an import still appears to be active. clearQueueTasks() is called when finishing an import or when unlocking a feed. When unlocking a feed, it is important that any active imports for that feed get killed, and that is now done by removing specific tasks from the queue table.

The code:

  /**
   * Checks if there are still tasks on the feeds queue.
   *
   * @param \Drupal\feeds\FeedInterface $feed
   *   The feed to look for in the queue.
   *
   * @return bool
   *   True if there are still tasks on the queue. False otherwise.
   */
  public function hasQueueTasks(FeedInterface $feed): bool {
    // First check if the queue table exists.
    if (!$this->database->schema()->tableExists('queue')) {
      // No queue table exists, so no tasks can exist on it.
      return FALSE;
    }

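    // Queue item data for Feeds tasks is a serialized array whose first
    // element is the feed ID, so matching on that serialized prefix finds
    // every task belonging to this feed.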
    $result = $this->database->select('queue')
      ->fields('queue', [])
      ->condition('data', 'a:3:{i:0;i:' . $feed->id() . ';%', 'LIKE')
      ->countQuery()
      ->execute()
      ->fetchField();

    return $result > 0;
  }

  /**
   * Removes all queue tasks for the given feed.
   *
   * @param \Drupal\feeds\FeedInterface $feed
   *   The feed for which to remove queue tasks.
   */
  public function clearQueueTasks(FeedInterface $feed): void {
    if (!$this->database->schema()->tableExists('queue')) {
      return;
    }
    $this->database->delete('queue')
      ->condition('data', 'a:3:{i:0;i:' . $feed->id() . ';%', 'LIKE')
      ->execute();
  }

I didn't see another way to clean up stale queue tasks. The only other option is to remove the whole queue, but that would kill imports of other feeds of the same feed type as well. Queues are currently defined per feed type, not per feed.

rdivince’s picture

Tested the MR and also experienced the RuntimeException error with /tmp/feeds_http_fetcher not existing in our load-balanced environment.

Still testing, but I wonder if it has something to do with our site also using s3fs for a remote file system, which is quite useful for load-balanced environments. I noticed the tempnam() method in the FeedsFileSystemBase class was using $fileSystem->realpath() to get the path. The Drupal API docs say regarding realpath:

"Resolves the absolute filepath of a local URI or filepath. The use of this method is discouraged, because it does not work for remote URIs. Except in rare cases, URIs should not be manually resolved. Only use this function if you know that the stream wrapper in the URI uses the local file system, and you need to pass an absolute path to a function that is incompatible with stream URIs.

https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21File%21Fi...

MegaChriz’s picture

@rdivince
Thanks for helping with testing! I have no experience with s3fs. Can you check what happens on that filesystem if you try to write a file the way FeedsFileSystemBase::tempnam() does? So test that separately, outside of the Feeds context.
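
Something like the following standalone probe (a sketch using core file APIs, with an assumed destination URI - replace it with the scheme that s3fs exposes on your site) should show whether plain writes to that filesystem work at all:

use Drupal\Core\File\FileSystemInterface;

// Standalone probe, outside the Feeds context: create a directory and write
// a small file on the configured filesystem.
$directory = 'private://feeds/in_progress/probe';
\Drupal::service('file_system')->prepareDirectory($directory, FileSystemInterface::CREATE_DIRECTORY | FileSystemInterface::MODIFY_PERMISSIONS);
$uri = \Drupal::service('file_system')->saveData('probe', $directory . '/probe.txt', FileSystemInterface::EXISTS_REPLACE);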

also experienced the RuntimeException error with /tmp/feeds_http_fetcher not existing in our load-balanced environment

That's probably because there are still queue items left in the queue. So in order to check whether the solution here works for future imports, try to empty these Feeds queues first. One way of doing that is by installing the Queue UI module. It can also be done with the drush command drush queue:delete, but then you need to know the exact queue name, which you can see with drush queue:list.

MegaChriz’s picture

@rdivince
I tried to find a way to test the patch with an S3 filesystem. It looks like I would need an account for Amazon’s Simple Storage Service (S3) for that. I tried to create an account there, but I got stuck at the step asking for credit card details; in the Netherlands it is not a given that you have one. So I’m tempted to commit this without the S3 check.

smustgrave’s picture

FYI: after I cleared our Redis queues and applied this patch across all our environments, we are no longer seeing the problem.

We are using Azure with a shared file mount. Our prod and stage environments are scaled up to multiple instances as well.

rdivince’s picture

@MegaChriz thanks for the info! I was able to clear the queue with the drush commands provided, but that did not fix the import errors. I tested file writing with tempnam outside of the Feeds context but did not have any luck. I did, however, get the MR to work as follows:

In FeedsFileSystemBase::tempnam(), instead of writing a temporary file with tempnam, I wrote a file with \Drupal::service('file.repository')->writeData(). This method should support both local and remote stream wrappers. That gave me additional error information: the getAsync call in HttpFetcher::get() appears to open a file in "w+" mode by default, which s3 (and perhaps other remote stream wrappers) does not seem to allow: https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/s3-stream-wra...

You can upload larger files by streaming data using fopen() and a “w”, “x”, or “a” stream access mode. The Amazon S3 stream wrapper does not support simultaneous read and write streams (e.g. “r+”, “w+”, etc). This is because the HTTP protocol doesn’t allow simultaneous reading and writing.

First opening the stream through GuzzleHttp\Psr7\Utils::tryFopen() with "w" mode and passing that file stream as the sink to getAsync worked perfectly. Not sure if other use cases require "w+".
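
For reference, the relevant change is roughly the following (a sketch; $client, $url and $destination_uri are assumed variables, and the exact HttpFetcher code around it may differ):

use GuzzleHttp\Psr7\Utils;
use GuzzleHttp\RequestOptions;

// Open the destination with "w" instead of "w+" so remote stream wrappers
// such as s3:// accept it, then let Guzzle stream the response body into it.
$sink = Utils::tryFopen($destination_uri, 'w');
$response = $client->getAsync($url, [RequestOptions::SINK => $sink])->wait();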

While these tweaks to the MR successfully resolved my errors with s3fs, I know mine is just one of many use cases and you'll know those better than me. So I've attached my tweaks as a patch and would like to hear your thoughts.

rdivince’s picture

Marking as hidden

pingwin4eg’s picture

Status: Needs review » Reviewed & tested by the community

The fix from the merge request works - the default private directory (private://feeds/in_progress/#) is used instead of the temporary one.

For those who still see the error with the tmp file, also check other fetcher classes. For example, in our client's project there is also the feeds_http_auth_fetcher module with its own fetcher, plus a custom one, and existing feed entities use all three. So I had to change those too, in a similar way to the HTTP fetcher.

  • MegaChriz committed 4062e01b on 8.x-3.x
    Issue #2912130 by Fant0m, MegaChriz, artematem, n4r3n, merauluka, bburg...
MegaChriz’s picture

Status: Reviewed & tested by the community » Fixed

@rdivince
I had been giving Feeds a low priority for the past month, so I guess that for your patch I'll need to dive deeper to find out how remote file systems work. Because of that, I'm merging the changes of this issue now and moving your patch to a follow-up issue. Perhaps we can find a way to add test coverage for remote stream wrappers too?

MegaChriz’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

bkosborne’s picture

Glad to see this landed! thanks everyone who contributed.

@MegaChriz any idea when you might cut a new beta that includes this?

MegaChriz’s picture

@bkosborne
I hope within a month from now. There is a follow-up issue that I would like to include in the next release as well: #3158678: FetcherResult::checkFile() throws \RuntimeException, which is never caught. While the fix for this issue only aims to prevent imports ending abruptly for new imports, that other issue should help clean up existing imports that are (still) stuck in a loop as a result of the issue that was reported here.

C-Logemann’s picture

I understand this new feature for fixing load-balancing issues. But these new tmp files inside the private folder can also break some backup strategies. So where is the "configurable" part?

Put http downloaded files in a configurable directory and added service to interact with this directory - to fix issue with missing temporary files in load balanced environments.

jwilson3’s picture

Use the in_progress_dir config element in the Feeds module's feeds.settings.yml config file.

The default location is now private://feeds/in_progress, assuming you've configured a private files directory; otherwise, it is public://feeds/in_progress.
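
For example, the stored value can be changed without a UI; a sketch assuming the in_progress_dir key described above, with an example directory:

// One-off change of the stored Feeds setting, e.g. via drush php:eval or an
// update hook.
\Drupal::configFactory()
  ->getEditable('feeds.settings')
  ->set('in_progress_dir', 'private://feeds/tmp')
  ->save();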

Review the commit from above for additional context:

https://git.drupalcode.org/project/feeds/-/commit/4062e01bb8471254d9daa5...

C-Logemann’s picture

@jwilson3 Thanks for the description. So it's a good candidate to set in settings.php.

In many situations I think configuration via settings.php is enough, especially for very specialized functionality. And this feature is such a special thing to solve load-balancing problems, but it can raise problems for non-advanced admins. So maybe it would be fine to make this feature and a setting for it more visible. But it's just an idea.

For our customers we will just exclude this folder via our backup scripts. We already do this with other temporary folders like image styles, rendered CSS, etc.