Problem/Motivation
When an import needs multiple cron runs to complete and the HTTP Fetcher is used, Feeds uses the temporary directory to temporarily store the file that is being imported.
On some servers, the default configured temporary directory is not static: it can change per request. This can be the case on Acquia and Pantheon, for example. This is problematic because if the temporary directory changes between two import cycles, Feeds loses track of the file being imported and therefore cannot finish the import.
Proposed resolution
Do not use the temporary directory to temporarily store a file to import; use private://feeds/in_progress instead. The D7 version of Feeds already does this.
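The proposed fallback can be sketched in a few lines. This is illustrative only, not Feeds API: the helper name and its boolean parameter are made up, and a real implementation would check whether Drupal's private stream wrapper is configured.

```php
<?php

// Illustrative helper (not part of Feeds): pick the in-progress
// directory, preferring the private scheme when it is available.
function feeds_in_progress_dir(bool $private_available): string {
  return $private_available
    ? 'private://feeds/in_progress'
    : 'public://feeds/in_progress';
}

echo feeds_in_progress_dir(true), "\n";   // private://feeds/in_progress
echo feeds_in_progress_dir(false), "\n";  // public://feeds/in_progress
```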
The first step, however, is to add test coverage for importing data using multiple calls/requests. Right now, there is only test coverage for importing data in one go. Tests have already been written for this case for the D7 version of Feeds, and it is probably a good idea to port them over. These tests can be found in the class 'FeedsFileHTTPTestCase'. The following methods from that class would be the most interesting to port:
- ::setUpMultipleCronRuns()
- ::getInProgressFile()
- ::testImportSourceWithMultipleCronRuns() (ported in #3044983: Add test coverage for importing using multiple cron runs)
- ::testAbortImportWhenTemporaryFileIsDeleted()
Additionally, no warning seems to be logged when a file that is in the process of being imported suddenly goes missing.
Workaround
A workaround is to set a different path for the temporary directory, one that is static. In settings.php:
$config['system.file']['path']['temporary'] = '/your/temporary/directory';
Be sure to change '/your/temporary/directory' to the directory of your choice.
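Note that on Drupal 8.8 and later the temporary path moved from configuration to settings (see comment #53 below), so there the equivalent override would be the following; the path shown is a placeholder to adjust.

```php
// settings.php on Drupal 8.8+: the temporary path is a setting, not
// config. Replace '/your/temporary/directory' with a static path.
$settings['file_temp_path'] = '/your/temporary/directory';
```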
Remaining tasks
- Write test coverage for importing data that needs multiple cron runs to complete.
- Write test coverage for aborting an import because of a missing temporary file.
- Log a warning when a file that is in the process of being imported no longer exists.
- Move the temporary file to private://feeds/in_progress.
- Write test coverage for removing the temporary file after import (this task could also be handled in #2778373: Auto-purge old temporary feed downloads).
- Add a setting for the in-progress directory called "in_progress_dir". Make it part of a config object called "feeds.settings".
Original report by bburg
I'm running several sites using the Pantheon platform on an elite plan. These sites use feeds to import content from a central site. They also run in a load-balanced set up, so multiple instances of Drupal running on separate servers with a shared database.
The problem is that on cron, the Feeds module will download the feed source (an XML document from the central site), and store it in the temporary directory. It appears that the parser isn't triggered immediately, but instead added to the queue. The queue entry references the temporary file in the full path. On pantheon, the temporary directory may have an inconsistent path. e.g. /srv/bindings/580b80cb8cb344dfa6f2c1059d896835/tmp
. The hashed number I believe is something called the "Pantheon Binding", which I assume is a reference to the current container, but I'm not certain. So, on subsequent cron runs, the queue processor may end up trying to access a temporary file that no longer exists because the Pantheon Binding has changed since the queue job was saved to the queue table. This results in an accumulation of queue items that point to non-existent temporary files and will never be removed, since the queue job doesn't remove them if the temporary file is missing. This leads to a lot of errors like this one, which also seem to cause performance issues during cron:
RuntimeException: File /srv/bindings/c1e9936fd5b347d7ae98b1226d7724f0/tmp/feeds_http_fetcherELdIb4 does not exist. in Drupal\feeds\Result\FetcherResult->checkFile() (line 53 of /srv/bindings/580b80cb8cb344dfa6f2c1059d896835/code/web/modules/contrib/feeds/src/Result/FetcherResult.php).
I'm using a snapshot of the dev branch from August 2016. So if this problem has already been addressed in a previous issue, my apologies. Upgrading to a newer snapshot seems to lead to several errors...
Pantheon support has suggested that we change the temporary files directory to point to a path on the shared, networked file system, under sites/default/files/tmp, which they have documented under "Temporary File Management with Multiple Application Containers: Using Valhalla, Pantheon's Networked Filesystem". I'll link their support team to this ticket as well so they can make any comments and clarify the issue. Our next scheduled deployment window isn't until early October, so I won't be able to test that suggestion in production until then.
Also, to address the stuck queue items, I think the only option is to delete all queue items whose name matches "feeds_feed_parse%". So adding an update hook with the following:
db_delete('queue')
->condition('name', 'feeds_feed_parse%', 'like')
->execute();
(db_delete() is deprecated, but I don't know the new method yet.) This might be something to address in this module. Also, for the issue itself: might it be better to store the file reference in the queue item as a stream wrapper URI instead? e.g. temporary://feeds_http_fetcherELdIb4.
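For reference, a sketch of the non-deprecated equivalent on Drupal 8 might look like the following; it assumes a bootstrapped Drupal environment, and the table name and condition are taken from the snippet above.

```php
// Drupal 8 replacement for db_delete(): use the database connection
// service. Only runs inside a bootstrapped Drupal site.
\Drupal::database()->delete('queue')
  ->condition('name', 'feeds_feed_parse%', 'LIKE')
  ->execute();
```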
I'm not sure if supporting such environments is really in the scope of this project (which is why I've filed this as a support request). I'm really filing this issue here in order to improve the searchability of the problem in general, in case other people are running into it.
Issue fork feeds-2912130
Comments
Comment #2
bburg

Comment #3
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@bburg
This has not yet been addressed. In the latest dev, temporary files are still saved to temporary://. I think this is wrong and it would be better if Feeds would save them in an 'in progress' directory. In the D7 version of Feeds a similar issue exists: in that version, Feeds saves temporary files in 'public://feeds'. The file is then removed after six hours, regardless of whether the import has been completed or not.
In the D7 version I have been experimenting with saving the temporary file in 'private://feeds/in_progress' instead. See #1029102: Importing Large CSV document (downloaded and processed in the background). The file gets removed after the import has been completed. Maybe it is possible to port that idea to D8?
A similar issue for D8 Feeds (maybe even the same problem): #2778373: Auto-purge old temporary feed downloads
Comment #4
bburg
I've been thinking more about this issue lately. In October we deployed the update to the temporary files directory location. Some other issues have been alleviated, but we are still seeing the Feeds errors. I'm not sure saving to temporary:// is an issue per se; it seems that even with the new location, these are still added as queue items that reference the absolute path of the temporary file, which can vary between application containers as the path contains the container id.
I suspect that the only reason this is working now in this setup is that it occasionally runs the fetch and the queue processing in the same container. But purging the broken queue items would likely be helpful.
If changing the directory from temporary:// is something you were already considering, I'll change the category of this issue to be a task. I'd offer to assist, but I'm still pretty unfamiliar with this code and D8 module architecture.
Comment #5
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
Yes, moving from temporary:// to private://feeds/in_progress is what I'm considering, the reason being that files may not exist long enough in temporary:// to complete an import. On my local machine, for example, temporary:// is set to '/tmp', which is cleared after every machine restart.
For this issue to move forward, we will likely need automated tests. Feeds only has unit tests right now, but I think this issue could use a functional test (or maybe a kernel test - which I'm not familiar with yet).
If you'd like to help on this issue or with development in general, maybe you want to help with writing a base class for tests? See #2917925: Port remaining methods from test base class in D7. That's probably easier to do and helps build the foundation for this issue. Or other things mentioned in #2917928: Plan for 8.x-3.0-alpha1 release? Anyway, it's unlikely I will put time into this issue before 8.x-3.0-alpha1 is done.
Comment #6
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
Closed #3010908: Articles are not being imported to production Acquia site, possibly because /tmp folder is not on the same container between requests as a duplicate. Some relevant info from there:
From @alberto56:
From @MegaChriz:
Comment #7
Fant0m CreditAttribution: Fant0m at FFW commented
Added a quick patch to make it possible to use the public directory for temporary feeds files. At the admin config path '/admin/config/feeds_settings' it adds a settings form where you can check "Use public directory for temporary feed files."
Comment #8
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@Fant0m
Thanks for giving this a go!
Note that this may disrupt other unfinished imports, since you are deleting all files in the feeds_http_fetcher directory here.
Comment #9
artematem CreditAttribution: artematem at FFW commented
OK, here is a first version of the patch. Added a default feeds.system.yml file to see if it helps to pass the tests.
Comment #10
artematem CreditAttribution: artematem at FFW commented
Added schema for the new settings.
Comment #11
artematem CreditAttribution: artematem at FFW commented
Fix default value type.
Comment #12
artematem CreditAttribution: artematem at FFW commented
Save the config value as a boolean.
Comment #13
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
The reason for the test failure is that the feeds.settings config object is not injected.
Comment #14
Fant0m CreditAttribution: Fant0m at FFW commented

Comment #16
Fant0m CreditAttribution: Fant0m at FFW commented

Comment #17
Fant0m CreditAttribution: Fant0m at FFW commented

Comment #18
Fant0m CreditAttribution: Fant0m at FFW commented

Comment #19
Fant0m CreditAttribution: Fant0m at FFW commented

Comment #20
Fant0m CreditAttribution: Fant0m at FFW commented

Comment #21
Fant0m CreditAttribution: Fant0m at FFW commented

Comment #22
Fant0m CreditAttribution: Fant0m at FFW commented

Comment #23
Fant0m CreditAttribution: Fant0m at FFW commented

Comment #24
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@Fant0m
Thanks for working on this issue! It would be great if you could document what you are working on in each patch and what your motivations are for some of the changes. That helps with reviewing the patches.
Comment #25
Fant0m CreditAttribution: Fant0m at FFW commented
Trying to fix the unit test.
Comment #26
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
Feeds currently uses annotations to specify dependent services, so adding (or overriding) a create() method is not going to work here. We could open a new issue for Feeds to use ContainerFactoryPluginInterface instead of defining dependent services in the annotation.
Comment #27
Fant0m CreditAttribution: Fant0m at FFW commented
Thank you @MegaChriz.
Comment #28
Fant0m CreditAttribution: Fant0m at FFW commented
Added an emulation of the container with 'config.factory' to HttpFetcherTest.
Comment #29
Fant0m CreditAttribution: Fant0m at FFW commented

Comment #30
maxdev CreditAttribution: maxdev commented
The latest patch doesn't work. If you go to the Feeds page it throws a fatal error:
ArgumentCountError: Too few arguments to function Drupal\feeds\Feeds\Fetcher\HttpFetcher::__construct(), 6 passed and exactly 7 expected in Drupal\feeds\Feeds\Fetcher\HttpFetcher->__construct() (line 85 of modules/contrib/feeds/src/Feeds/Fetcher/HttpFetcher.php).
Comment #31
maxdev CreditAttribution: maxdev commented

Comment #32
artematem CreditAttribution: artematem at FFW commented
@maxdev,
Did you try with https://simplytest.me/ ?
I ran with Feeds 8.x-3.0-alpha4 + patch #28 and didn't see the error.
Comment #33
bburg
Just jumping in to make a side comment. Since I changed the temporary directory location to address the issue, we are now experiencing an issue where we have accumulated a large number of temporary feeds files that aren't being cleaned out regularly (200K+). While the files are very small and not filling up too much space, the quantity of them is causing performance issues when we run backups on Pantheon, taking over 2 hours on some of the worst affected sites.
Comment #34
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
I've opened #3044983: Add test coverage for importing using multiple cron runs for porting ::testImportSourceWithMultipleCronRuns(). A port of this test is also needed for #2978490: Optimize Feeds queue.

Comment #35
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@Fant0m
The number of parameters passed to new static in ::create() does not equal the number of parameters in the constructor. So this function may be removed, but even better would be to start changing Feeds so that it uses the standard approach instead of defining parameters in the annotation.
This is likely also needed for #3001376: entity.query deprecated warnings and as such to make Feeds compatible with Drupal 9 in the future.
So I fear we need to take a step back here and fix other things properly first.
Comment #36
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
#3044983: Add test coverage for importing using multiple cron runs is committed!
Comment #37
merauluka CreditAttribution: merauluka at Mediacurrent commented
We have been seeing this issue on Acquia now that we've switched over to using the production environment. It's created serious issues for us with feeds constantly jamming up the queue system since they cannot find their temporary files.
I have reworked the patch from #28 and allowed the user to select either the public or private file system for storage. I'm sure there's probably a more comprehensive approach to allowing those stream wrappers, but I wanted to get support for the core options first.
Because my version completely moves away from the use of temporary files, I have updated the logic for file name generation as well. I thought it would be helpful to identify the feeds using the bundle so, at a glance, they can be associated with their appropriate feed. I also added a new method to the FetcherResult that allows for the removal of a file once it's been processed, and a call to remove that file once the feed has finished processing.
Comment #38
oknate
The schema for feeds.settings needs to be updated for the changes in #37:
Comment #39
merauluka CreditAttribution: merauluka at Mediacurrent commented
Updated my patch with schema changes. This is still going to break tests. But it will break them differently now?
Comment #40
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@merauluka
Thanks for working on this! This is definitely an issue I'd like to see fixed at some point.
I did make a rough start on this a few weeks ago: I started with copying code from the D7 version. So far I did not come to anything even somewhat workable. The issue I ran into was that tempnam() seems to ignore subdirectories. I wanted the generated file to go to private://feeds/in_progress, but the file kept going to just private://. So I think that we need to replace tempnam() with something that supports subdirectories.

Comment #41
merauluka CreditAttribution: merauluka at Mediacurrent commented
@MegaChriz Yeah. When I reviewed the code, it's actually a part of the tempnam process to strip out any directories and just take the file name, append a random string, and save it in temporary storage.
What I did was update the first patch to use either public or private storage (user's choice, defaulting to public). It keeps the same base feed file name, but inserts the machine name of the feed type into the name as well, then appends the random string. This allows a user to see where the stored files came from.
I also had the logic create the files under a subdirectory that is controlled by the module. This was mostly for convenience since allowing the user to create the directory leads to the need for additional discussion/logic around what we would do with files that were in the old directory.
But it's a good start. And I can report that with my patch in place, the issues we were seeing in our load balanced environment, with temporary files being unavailable, are no longer present now that the files are in the private/public directories instead.
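The fallback behavior described in #40 and #41 can be reproduced in plain PHP. This is a minimal sketch, not Feeds code, and the directory names are made up: when the requested directory is unusable, tempnam() silently creates the file in the system temp directory instead, which is analogous to files intended for private://feeds/in_progress landing in the base directory.

```php
<?php

// Ask tempnam() for a file in a subdirectory that was never created,
// mimicking an unavailable in-progress directory.
$wanted = sys_get_temp_dir() . '/feeds_demo_' . getmypid() . '/in_progress';

// '@' suppresses the PHP notice emitted on fallback (PHP 7.1+).
$file = @tempnam($wanted, 'feeds_http_fetcher');

// The file was created, but not where we asked for it.
$fell_back = dirname($file) !== $wanted;
echo $fell_back ? "fell back to system temp dir\n" : "created in subdirectory\n";

unlink($file);
```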
Comment #42
merauluka CreditAttribution: merauluka at Mediacurrent commented
As a follow-up comment, this change recently saved a client when some of their feed content was erroring due to changes made by a feed provider. We were able to address the changes, and the cached copies of the feeds in the private directory in Drupal allowed us to retrieve almost 2 months worth of missing data.
If the files were still stored in the temporary directory, they would more than likely have been removed before we had a resolution to the issue in place.
Comment #43
bburg
Patch #3 works for me, but one observation: without a default value set in an existing installation, the directory defaulted to one named just a colon ":" directly in the Drupal root.
Comment #44
jorgediazhav CreditAttribution: jorgediazhav as a volunteer commented
@bburg, you mean it only works when you do not set a value? By just applying the patch and clearing the cache?
Comment #45
merauluka CreditAttribution: merauluka at Mediacurrent commented
I have run into an issue with my patch. Before my patch, the Feeds module relied on the fact that the system periodically empties the temporary folder to handle leftover files that were created during an HTTP fetch operation.
I added the option to move these files to public/private storage instead and added a clean up action to remove the file once the queued articles have been processed.
There are two issues:
I am withdrawing my recommendation to use my patches until a unified file clean up option is available.
Comment #46
merauluka CreditAttribution: merauluka at Mediacurrent commented

Comment #47
bburg
I noticed that too. The default /tmp directory is usually cleared out by the OS, so using a custom one will require a separate process.
We implemented a cron hook to kick off a process to periodically clear out the files:
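The hook itself was not included in the comment. A cleanup along those lines might look like the following sketch, where all names are hypothetical; a real module would do this from hook_cron() and use Drupal's file_system service rather than raw PHP functions.

```php
<?php

// Hypothetical helper: delete files in $dir older than $max_age
// seconds, returning the paths that were deleted. In a real module
// this logic would be called from hook_cron().
function feeds_demo_cleanup(string $dir, int $max_age, int $now): array {
  $deleted = [];
  foreach (glob($dir . '/*') ?: [] as $path) {
    if (is_file($path) && ($now - filemtime($path)) > $max_age) {
      unlink($path);
      $deleted[] = $path;
    }
  }
  return $deleted;
}
```

The age threshold would need to be long enough not to disturb imports that are still in progress across multiple cron runs.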
Comment #48
n4r3n
This patch worked for me.
Comment #49
n4r3n
Updating HttpFetcherTest case.
Comment #50
MegaChriz CreditAttribution: MegaChriz as a volunteer commentedNote that by using the
public://
destination, you are potentially opening a security hole. Files stored in this directory could in theory be read by anonymous users. They must guess the filename, however.Comment #51
n4r3n
This patch worked for me.
Comment #52
n4r3n

Comment #53
Chris Burge CreditAttribution: Chris Burge at University of Nebraska commented
Is this issue relevant anymore? Beginning in D8.8, Drupal assumes the path to the temp directory is '/tmp' unless you set $settings['file_temp_path'] in settings.php. The "workaround" described in the issue summary is required configuration for the application to function in a multi-web-server environment.

Comment #54
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@Chris Burge
I think it is. In theory, files in /tmp could be cleaned up before the Feeds import is complete. Feeds divides an import task into multiple chunks. For large imports (hundreds of items), this can take multiple cron runs to complete. If the webserver cleans up temporary files in between, then the import cannot finish. The more items to import and/or the less times cron runs, the higher the chance that this can happen.
This issue is more or less fixed in the D7 version. I would like to see a port of that version along with tests, rather than what the current patch does. The D7 version makes it configurable where the files to import are stored, and it defaults to private://feeds/in_progress (or public://feeds/in_progress when the private file system is not configured). The current patch forces these files to be in the public file system, and I think that's a potential security risk, because files could come from a protected source. OK, the generated file names could be hard to guess, but if the data to import is sensitive, you'd better be safe than sorry and ensure that these files are on the private filesystem.

Comment #55
mikalaki CreditAttribution: mikalaki commented
I can confirm the problem continues to exist on Drupal 8.8: I am currently working on a site running Drupal 8.8.1 and I am facing the issue.
Comment #56
j9 CreditAttribution: j9 commented
Hi and thanks!
I am also getting errors upon attempt to import feeds.
I tried the patch, and I tried manually setting a temp directory in settings.php, but haven't gotten through yet.
When I run the import manually from the /admin/content/feed page, I get the feed items imported!
But cron does not seem to import on its own.
In the Recent log messages, it looks like this is because cron runs as an anonymous user, instead of as an admin as when running manually.
Any thought on what I might try next?
Drupal core 8.8.4
Feeds 8.x-3.0-alpha6
On a shared Hostgator server. Feeds was working just great, but somehow stopped.
Much appreciated and thank you again!
Comment #57
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
Marking as a stable release blocker.
Comment #58
jjwfcd CreditAttribution: jjwfcd commented
Thanks, guys, for your effort; hoping for a release soon.
Comment #59
bkosborne
+1 to getting this resolved. Agreed the current patch is not the way to go, since it puts the feeds files in public storage. @MegaChriz, so it seems like we just need to update this patch to use the private file system if available or public otherwise, just like D7?
Comment #60
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@bkosborne
Yes, I propose to add temporary files to private://feeds/in_progress if the private file system is available; otherwise it would be public://feeds/in_progress. Do you want to help with resolving this issue? If yes, let me know if you understand the proposed resolution + the remaining tasks from the issue summary.

Comment #61
jwilson3
Just a note that #51 applied cleanly to 3.0-alpha7 but no longer applies to 3.0-alpha8, so in addition to the fixes requested in #60, this needs a reroll.
Comment #62
bburg
The conflict is in tests/src/Unit/Feeds/Fetcher/HttpFetcherTest.php, with the addition of two "use" statements:
Fixed to allow both.
Comment #63
jwilson3
We had upgraded to alpha8 and applied #62 but we're still seeing the error on Pantheon.
We're going to dive in here and see what is going on.
Comment #64
jwilson3
Facepalm.
The patch wasn't working due to a configuration error on my part. I didn't realize the patch exposed a new config option, at /admin/config/feeds_settings, that has to be checked/enabled for things to work:
Checkbox:
[ ] Use public directory for temporary feed files.
Site public file system path settings.
Sadly, checking the option doesn’t solve the problem. The messages on screen after a manual feed import are now slightly different:
There should have been 325 items created total. Note that the Path is now pointing to the files folder on Pantheon which is supposedly synced across containers. Something is making the file go away after the first batch size of 100 items is imported.
I don’t know if this is a Feeds problem yet or a Pantheon issue.
Comment #65
jwilson3
My facepalm moment above clearly demonstrates that this issue needs work: A) it should be using the private files folder, not the public folder, and B) no one should have to manually specify this option.
Comment #66
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@jwilson3
Yes, I would like to see a similar solution as in the D7 version. The D7 version saves the file to import in private://feeds/in_progress (or public://feeds/in_progress if the private filesystem is not configured). See also the issue summary.
Would you like to work on this? It is on my list to pick up eventually, but it could still take a while before I would get to it myself. There are enough things on the roadmap: #1960800: [meta] Feeds 8.x roadmap. If you do want to pick it up, I hope I can assist you as much as possible. :)
Comment #67
jwilson3
@MegaChriz, thanks for your vote of support. I've asked my colleague @marthinal to have a look at making the recommended changes to the patch.
I think I also hit upon a related issue in #64, whereby if the line_limit is too low to process all items in a feed, the file gets deleted prematurely, but I think that would have to be a follow-up ticket. Maybe there already is one; I haven't looked. After I bumped the line limit to 1000 AND exported the config yml change for the public folder, I was able to use the manual "import" button to get all of the items ingested.
Comment #68
MegaChriz CreditAttribution: MegaChriz at WebCoo commented
@jwilson3
Thanks for your support! :)
Line Limit setting
The line limit setting dictates how many lines of a file should be parsed at a time. Feeds goes through a few import stages and these stages get looped when they are not completed yet:
So as you see, the line_limit setting shouldn't affect how many items get imported during the whole import process. It shouldn't matter if not everything is parsed on the first call. Feeds gets back to the parser after all items from the first parser result are processed.
The reason you won't get everything imported is probably that the fetcher result - which lives in a temporary file when using the HTTP fetcher - is no longer available at the time Feeds wants to get back to the parser for the second iteration.
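The looping behavior described above can be modeled in a few lines. This is illustrative only, not Feeds' API: each while-iteration stands in for one cron run, and as long as the fetched file survives between runs, the line limit caps only how much one run parses, not the total imported.

```php
<?php

// Toy model of the parse/process loop: $line_limit caps how many
// items a single "cron run" parses; the loop resumes on the next
// run until the source is exhausted.
function feeds_demo_import(array $items, int $line_limit): array {
  $processed = [];
  $offset = 0;
  while ($offset < count($items)) {
    $batch = array_slice($items, $offset, $line_limit);
    foreach ($batch as $item) {
      $processed[] = $item;   // "process" stage of this run
    }
    $offset += $line_limit;   // state carried over to the next run
  }
  return $processed;
}

echo count(feeds_demo_import(range(1, 325), 100)), "\n"; // 325
```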
Comment #69
bburg
For the Pantheon users: Pantheon is in the process of moving to their "compute optimized environment", which moves away from the container ids in paths. While the issue still exists, I think the bulk of the users here were probably experiencing this on Pantheon.
Comment #70
dpagini CreditAttribution: dpagini commented
Just ran into the issue in #3163897 on our team at Acquia, so I found my way over here. I'm a little hesitant to pull in the patches here with all the discussion about a different approach... so the latest patch seems to just give a setting to either a) use the temp directory (existing functionality), or b) use the public directory? And from what I'm reading, if I use the public directory for my files, I won't run into the problem outlined in #3163897, but then my files are available to the public?
So just to try and summarize what I'm reading, it sounds like this PR needs 3 things still...
1. Test coverage for running imports.
2. Use the private directory instead of public, if it exists, and if not, fall back to public.
3. A way to clean up the public/private files after the feeds are processed...?
Does that sound right?
Quick review of latest patch...
I think I saw this in some comments already, but I think we need an update hook to set this for existing sites.
Am I missing something? If the name of the settings is feeds_temporary_directory, would this need to be FALSE to use the public filesystem?
Comment #71
MegaChriz CreditAttribution: MegaChriz at WebCoo commented
@dpagini
Yes, that sounds about right. I tried to summarize what needs to be done in the section “Remaining tasks” in the issue summary.
When fetching a file from an HTTP source, it needs to be stored locally somewhere, because the file can be too large to process in one go. If you use cron to run imports, then it can happen that multiple cron runs are needed to complete the import. This is because Feeds is given a limited time per cron run to do its job, and that time is about one minute. So the file to import must be kept over multiple cron runs.
Right now, the file is stored in a temporary directory indeed and that can be problematic if that directory is getting cleaned between two cron runs and the import is not done yet. So my proposal is to store the downloaded file in the private directory instead (with a fallback to the public directory if there is no private directory).
When the import is done, the downloaded file is no longer useful, so it should be cleaned up after import. Otherwise, the private directory would grow with files that have been imported over time. Files in the temporary directory are not cleaned up by Feeds right now, hence the issue #2778373: Auto-purge old temporary feed downloads.
Tests are needed to overcome exceptions during this process. For example: when a file is manually removed from the private directory, Feeds shouldn’t try to continue the import over and over again. So for that situation we need test coverage.
The approach I proposed is already implemented in the D7 version of Feeds, so some code from the D7 version can perhaps be reused. Or at least the D7 implementation can be used as inspiration here.
Lastly, this issue is on my radar to be picked up soon. The plan of the Feeds team is to first resolve all beta blockers and perhaps also the beta targets. My hope is to release the beta before the end of this year. Some of the beta targeted issues need review/testing, so help on them is welcome. :) See the roadmap: https://www.drupal.org/project/feeds/issues/1960800#feeds-beta-blockers
Comment #72
dpagini CreditAttribution: dpagini commented
Thanks @MegaChriz - I'm happy to help a little bit as allowed by my company. =)
I pulled in the last patch here and started using it. I think it will work for me for the time being. I'm pulling in a public feed, so there's no concern storing it on the public directory, and I'm only using 1 feed, so the below concern is not a problem for me.
Also, just worth mentioning that the setting as defined in this patch, as mentioned above, is confusing to me b/c I have to set "temp = true" in order to use the public directory.
Just a question here... if there is more than one feed in a site, is it possible that this line could delete a temp file from another feed?
Comment #73
mstrelan CreditAttribution: mstrelan commented
Yes. The attached patch addresses this.
Comment #74
jerrac CreditAttribution: jerrac commented
So, I've been watching this issue for a while, ever since I submitted https://www.drupal.org/project/feeds/issues/3163897.
Is there a reason the patch doesn't just add a "Use private" checkbox the same way it adds the "Use public" checkbox?
Comment #75
jwilson3
Checkboxes wouldn't make sense with two options, since they're mutually exclusive. It could be made into a single field with two radio options, default to "Use private" but let the user switch to "Use public". This would clarify what is really going on here.
Comment #76
Chris Burge CreditAttribution: Chris Burge at University of Nebraska commented
Why not make the setting a string and accept a stream-wrapper path (e.g. 'private://feeds-tmp')? That would provide a greater degree of flexibility and wouldn't encourage people to use public:// like it's temporary://. One of the requirements for the Drupal temp directory path is that it cannot be accessible over the web: file_directory_temp() is deprecated and moved to the FileSystem service. This is a security-based requirement.
MegaChriz CreditAttribution: MegaChriz at WebCoo commented
@Chris Burge
Good suggestion. The D7 version of Feeds already does this. In the proposed resolution in the issue summary I proposed to have it default to 'private://feeds/in_progress' (or 'public://feeds/in_progress' if no private filesystem is available). In the D7 version this setting is called "feeds_in_progress_dir". I would call it "in_progress_dir" here and make it part of a config object called "feeds.settings".
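Under that naming, reading the setting might look like the following sketch. Hedged: the config object and key are the proposed names from this comment, not yet part of Feeds, and the snippet assumes a bootstrapped Drupal site.

```php
// Proposed (not yet existing) setting: "in_progress_dir" in the
// "feeds.settings" config object, with the fallback discussed above.
$dir = \Drupal::config('feeds.settings')->get('in_progress_dir')
  ?: 'private://feeds/in_progress';
```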
Comment #78
mstrelan CreditAttribution: mstrelan commented
Just looking at the remaining tasks, it seems that "Log a warning when the file that is in progress of being imported no longer exists." could be expanded to also remove the queue entry. Currently every cron run just updates the "expire" timestamp.
Comment #79
mstrelan CreditAttribution: mstrelan commented
Attached patch adds the missing test cases as mentioned in the proposed resolution and completes the task "Write test coverage for aborting an import because of a missing temporary file."
This patch does not attempt to change the existing functionality and therefore I believe it makes sense to commit it as-is before continuing with the remaining tasks.
Comment #80
mstrelan CreditAttribution: mstrelan commented
Not sure why that worked locally. This patch tries to delete the file by its URI instead of the object.
Comment #81
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@mstrelan
Thanks for working on the tests! And good idea to commit that bit first when that passes.
The first thing to change in the parser configuration is to limit how many items the CSV parser parses per batch. By default the line limit is 100, which means all items from the source file are already parsed on the first cron run if the file only contains 9 items.
However, even after I set the line limit, the test testAbortImportWhenTemporaryFileIsDeleted() still fails, in Drupal\feeds\Feeds\Parser\CsvParser::parse().
With a call to time() I could verify that one more parser task ran after deleting the file. But strangely enough, scanning the directory in CronTest.php gives one result, while the same code scanning the directory in CsvParser.php (happening later) gives a different one.
Any ideas?
Comment #82
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
Coding standard fixes and additional code comments.
Apparently, the test does pass on PHP 7 :o. Let's see if that happens on other PHP versions too. I tested locally on PHP 7.3 and there it failed.
Comment #83
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
Okay, tests are passing on drupal.org. I have no idea, then, why they fail locally.
@mstrelan
If you think this test is good too, I'll commit it.
Comment #84
mstrelan CreditAttribution: mstrelan commented
Ah, that explains it. I ran out of time to debug it that far, so thanks for explaining.
Your patch in #82 passes for me in my local PHP 7.4 environment. Not sure what happened in your env; I'm not overly familiar with the handling of temporary:// in tests. What is responsible for deleting the downloaded file? Maybe there was a temp file hanging around from an earlier run. I think getInProgressFile() is a bit flaky, because we just assume that a file matching the pattern is the right file rather than checking what's saved in the cron queue.
Comment #85
mventure2021 CreditAttribution: mventure2021 commented
Can we get some reasonable solution committed to the 8.x dev branch, please?
Comment #86
mstrelan CreditAttribution: mstrelan commented
@mventure2021 set your feeds to not import automatically and configure a cron job to run drush feeds:import <ID> periodically.
Comment #87
smustgrave CreditAttribution: smustgrave at Mobomo commented
Has anyone used the patch in #82 on their production yet? Is it safe?
Comment #88
achap
This bug caught us out just before we went live :P Changing the temporary directory caused more issues for us than it solved, so we ended up doing as @mstrelan suggested and running from drush directly while turning the other stuff off. I guess it is not subject to the same memory limits, so it will always complete? Either way, it worked for us. Thanks!
Comment #89
smustgrave CreditAttribution: smustgrave at Mobomo commented
What's the current status of this, by chance? Do we know when to expect a reasonable solution, as #85 asked?
Comment #90
jwilson3
Our workaround approach is similar to what @mstrelan and @achap have said above, but without Drush.
We've installed the Ultimate Cron contrib module, then use Uptime Robot to ping the site hosted on Pantheon (which doesn't provide any way to set up cron jobs through their admin panel) on a more frequent basis than Pantheon's default of once per hour. Ultimate Cron suggests running cron every minute, but we run it every 5 minutes and now seem to avoid the issues.
Ultimate Cron isolates each module's hook_cron implementation into a separate thread or request, and ensures that there is no overlap when running different jobs in one thread, which could lead to timeouts and/or hitting the batch limit. Caveat emptor, YMMV. I'm not 100% certain this is a sure-fire solution for everyone, but it works for our needs for the time being. Be sure to do your own testing with real feeds data sets in real server scenarios.
Comment #91
loopy1492 CreditAttribution: loopy1492 commented
I pegged our site at 3.x-dev and applied the #82 patch, and drush cron still comes up with:
Comment #92
loopy1492 CreditAttribution: loopy1492 commented
We have sites on Acquia Cloud and this problem has cropped up after upgrading Drupal core. We've tried setting the Acquia environment variables in settings.php as indicated in their documentation.
This does not seem to have worked. We also tried changing the temp files folder permissions to 770 and that also didn't work. We also tried installing Ultimate Cron and that hasn't worked either.
Last update of the day.
It seems that running the import **manually** from the feed page DOES create the file in the temp directory, but produces the SSL error:
The feed from https://SITE_URL.COM/apps/cfa/public/api.cfc?method=getTrackingForSite&roundKeyList=all seems to be broken because of error "cURL error 60: SSL certificate problem: unable to get local issuer certificate (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)".
And running **cron** produces the "does not exist" error because, apparently, cron has no idea that it's not running from the Stage environment.
[error] RuntimeException: File /mnt/tmp/SITE/feeds_http_fetcher06g66O does not exist. in Drupal\feeds\Result\FetcherResult->checkFile() (line 53 of /var/www/docroot/modules/contrib/feeds/src/Result/FetcherResult.php).
We have a ticket with Acquia now and if they can give us a resolution, I'll be sure to post it here.
Comment #93
vinmassaro CreditAttribution: vinmassaro commented
Anyone have luck with the workarounds getting this to work on Pantheon?
I've got $settings["file_temp_path"] = sys_get_temp_dir(); set, and $config['system.file']['path']['temporary'] = 'sites/default/files/tmp'; with the tmp directory created. When running cron, it always fails:
Comment #94
bburg
@vinmassaro
I was under the impression that Pantheon had moved all sites to their "Compute Optimized Environments," and that the paths with their server bindings were no longer necessary:
https://pantheon.io/docs/platform-considerations#compute-optimized-envir...
Also, I think that part of the problem in this issue is that the queue item is saved with the full path to the temporary directory, which includes the non-existent bind ID. I think you could remove those queue items (either by direct SQL or with a module like https://www.drupal.org/project/queue_ui; I am only guessing that it can delete queue items), then attempt to re-import once you have the new settings in place and finish your feed import process.
Comment #95
Oligerd CreditAttribution: Oligerd at Drupal Ukraine Community commented
@vinmassaro
I had the same issue on a Drupal 8 site on Pantheon.
This fixed the issue. No more errors after cron runs.
Comment #96
BeauTownsend CreditAttribution: BeauTownsend commented
#73 worked for me on a Jenkins/GCP/k8s setup.
For anyone deploying this patch, know that, upon login, I experienced a long load time and, when I executed `drush ws --tail`, I discovered that my site was rolling through what appears to be all of the busted queues and "expiring" each missing temp file. In my case, there were thousands of these, so the browser just sat and spun for a good while before finally logging me in.
Comment #97
vinmassaro CreditAttribution: vinmassaro commented
@Oligerd: for your code in #95, are you using it with any of the patches in this thread?
Comment #98
Oligerd CreditAttribution: Oligerd at Drupal Ukraine Community commented
@vinmassaro
I am not using any patch. I found that you need to delete the feed from content (at /admin/content/feed) without deleting the imported items, and then create the same feed again. In this case, the old path will be deleted. You need to manually create the tmp directory inside the files directory; I am using FileZilla to create tmp on every Pantheon environment.
Comment #99
maskedjellybean
I think the suggestion to move the tmp directory into sites/default/files as a workaround for Pantheon is incorrect. The field description for "Temporary directory" at /admin/config/media/file-system says "This directory should not be accessible over the web", which to me implies it should be outside of /web (or /docroot). I assume there are security reasons for this. It may not be a security problem for Feeds, but Feeds is not the only thing using the temp directory.
That said, I really need a solution that works with Pantheon (but doesn't also open up potential security holes).
Comment #100
maskedjellybean
OK! I realized that the patch in #73 is not supposed to work immediately after patching. It adds a config item that needs to be set. So after patching, go to /admin/config/feeds_settings and check "Use public directory for temporary feed files". I'll give this a try on Pantheon and report back.
Another thought based on the explanation of the cause of this problem: if this happens when an import takes more than one cron run to complete, then within our feed type settings under Processor settings > Advanced settings, we should at least temporarily make sure that "Force update" is unchecked. I assume having it checked would make the import more likely to take more than one cron run to complete, but this is a complete guess.
Comment #101
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@maskedjellybean
"Force update" can make the import process longer, yes, but even without that option enabled, an import with hundreds of items can easily take more than one cron run to complete. For every item in the feed, Feeds still has to check whether there's an existing item for it and then whether it has changed. Unchecking "Force update" only potentially skips a lot of entity save calls.
In the issue summary I proposed to move the file to import to private://feeds/in_progress by default, and to only use the public filesystem if there's no private filesystem configured.
Comment #102
maskedjellybean
@MegaChriz
Thank you for clarifying what force update does.
I'm going to use whatever patch works. I don't believe there is a patch to move the files to private, but there is one to move them to public, so that is what I'll use unless I find it doesn't work. For what it's worth, I agree with you that private would be preferable to public, but I wouldn't feel comfortable writing that patch. Unless you're saying the private://feeds/in_progress functionality already exists in some version of Feeds or in an existing patch?
Comment #103
andileco CreditAttribution: andileco at JSI Research & Training Institute, Inc. (JSI) commented
Tried to post this yesterday; didn't realize my post didn't go through.
@maskedjellybean, just so you know, Pantheon has added "protected_web_paths" in pantheon.yml. I think you could do something like this:
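The example that followed this comment didn't survive in this excerpt. As a sketch only, a pantheon.yml entry protecting a tmp directory under the public files path might look like the following; the exact path is an assumption, taken from the workaround mentioned earlier in the thread:

```yaml
# pantheon.yml (illustrative): block web access to a tmp dir inside files.
protected_web_paths:
  - /sites/default/files/tmp
```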
Comment #104
maskedjellybean
Thanks @andileco, I'll have to look into that.
I've found that the patch in #73 is working well for me on Pantheon now that I realized I have to check the checkbox at /admin/config/feeds_settings. Thanks for the patch!
Comment #105
maskedjellybean
After inspecting the queue table, it seems I may either still be having this issue with the patch applied, or I'm seeing queue items that have been stuck for months. I'll keep an eye on it.
I wanted to share some more info:
In the queue table I found items with a name like "feeds_feed_refresh:FEED_NAME". When I ran drush cron I saw a "RuntimeException: File /tmp/feeds_http_fetcher_1_kqRgck does not exist." error for each item. To me this implied these queue items were stuck and would never be processed, and that I should delete them manually.
To delete items from the queue you can run:
drush queue:delete feeds_feed_refresh:FEED_NAME
To delete items from the queue on a Pantheon environment, you can run:
terminus remote:drush SITE-NAME.ENV queue:delete feeds_feed_refresh:FEED_NAME
Comment #106
tlo405 CreditAttribution: tlo405 commented
@maskedjellybean I've been seeing the same issues that you are seeing (with the patch applied). If I remove those stuck items from the queue, then at the next cron run everything appears to work again.
If the queue is never cleared out, updates to the feed I am pulling in are never imported into Drupal. Cron runs every 10 minutes in my site, however we import the feed every 15 minutes. The problem is, each cron run Drupal tries to import those items from the queue (if any exist). It fails due to that "RuntimeException: File /tmp/feeds_http_fetcher_1_kqRgck does not exist." error.
When that exception is thrown, I notice $feed->finishImport() gets called. That method ends up pushing the 'next import' into the future. Then 10 minutes later when cron runs again, the same thing happens: Feeds attempts to run those queued-up items, fails again, then pushes the 'next import' date into the future. So updates to the external feed we pull in never make it into Drupal.
So for instance, say it's 12 noon, cron is set to run at 12:10, but the 'next import' for my feed is set to run at 12:15. If there are no items in the queue, cron runs at 12:10 and the 'next import' stays at 12:15 (which is correct). Then when cron runs at 12:20, the feed runs correctly. The 'next import' gets set to 12:35 (15 minutes into the future), which is correct.
However, if there is an item in the queue, the cron run at 12:10 attempts to run the queued items and fails. Due to the code in $feed->finishImport(), the 'next import' gets pushed from 12:15 to 12:25. When cron runs again at 12:20, the queued-up items fail once again, the 'next import' gets pushed to 12:35, and so on. The 'next import' never runs because it keeps getting pushed into the future every time; therefore Drupal never imports the new data from the feed.
So it seems that the patches here do not work. Also, it seems that the patches here now fail against the new beta1 release that came out yesterday.
Comment #107
meladawy CreditAttribution: meladawy as a volunteer commented
This should be compatible with beta1 or the latest dev updates.
Comment #108
tsurvuli CreditAttribution: tsurvuli commented
Thank you very much for this patch, it's been very helpful so far. However, I upgraded to core 9.3.14 this morning and now the patch will not apply. Any chance of an updated version?
Comment #109
xeiro CreditAttribution: xeiro as a volunteer commented
Thank you for the patch @meladawy! I can confirm the patch was working with 9.4.1; however, it has failed to apply on 9.4.3.
Comment #110
jmcerda CreditAttribution: jmcerda as a volunteer commented
Re-rolled for latest: 3.x-dev.
Comment #111
xeiro CreditAttribution: xeiro as a volunteer commented
Thanks for posting 2912130-82, @jmcerda. The following cron error returns on this version:
RuntimeException: File /tmp/feeds_http_fetcherrsGHI4 does not exist. in Drupal\feeds\Result\FetcherResult->checkFile() (line 53 of /code/web/modules/contrib/feeds/src/Result/FetcherResult.php).
- D 9.4.3 / PHP 8.0
Comment #112
Topplestack CreditAttribution: Topplestack commented
#110 fails for me as well.
Comment #113
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
I'm working on this issue. I've made a draft, but it is currently tangled with possible fixes for other, related issues. The most important of these is #3193610: Items going missing, which only needs some more testing. The other issues I'm working on are #3132198: Clean queue tasks when unlocking a feed to prevent potential data loss and #2778373: Auto-purge old temporary feed downloads.
Comment #114
someshver CreditAttribution: someshver as a volunteer commented
Fixed the patch apply failure. Feeds module version 3.0.0.0-beta2.
Comment #115
someshver CreditAttribution: someshver as a volunteer commented
Comment #117
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
I implemented a possible solution for this issue! The code in the branch 2912130-in-progress-directory does the following:
I haven't tested this in the wild yet. The only testing I did was with the automated tests that I added.
And let's see if it breaks any other tests.
Comment #118
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
Hiding all the patches. The one to test is the branch "2912130-in-progress-directory". Tests are failing, so there is still work to be done.
Comment #119
andileco CreditAttribution: andileco at JSI Research & Training Institute, Inc. (JSI) commented
@MegaChriz I tested "2912130-in-progress-directory" and the feed run completed in one run (which I didn't expect, but there were no new items, so I guess that makes sense); however, I saw this notice in the dblog:
The file private://feeds/in_progress/11 was not deleted because it does not exist.
Comment #120
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
I've fixed the issue reported in #119 by @andileco. The issue was that Feeds tried to clean up files for feed imports that do not use the HTTP Fetcher. This is now fixed by checking, upon finishing (or unlocking) an import, whether there are files to clean up.
I also did some small refactoring in the Feeds Log module: LogFileManager now extends FeedsFileSystemBase, because there was a lot of overlap in behavior.
Comment #121
smustgrave CreditAttribution: smustgrave at Mobomo commented
Tested the MR on the environment we are seeing it on and still getting the error hundreds of times on cron.
RuntimeException: File /tmp/feeds_http_fetchergiFocL does not exist. in Drupal\feeds\Result\FetcherResult->checkFile() (line 53 of /home/site/wwwroot/modules/contrib/feeds/src/Result/FetcherResult.php).
Comment #122
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@smustgrave
Okay, but does it happen with completely new imports too? Because that path is supposed to be the old location where things are saved. For new imports, the temporary file is expected to be stored in private://feeds/in_progress/[feed id].
But thanks for helping with testing!
Comment #123
smustgrave CreditAttribution: smustgrave at Mobomo commented
Sorry, I don't fully follow. We have a number of feeds that we can't recreate, as I believe that would cause duplicate issues.
I didn't test with a brand new import though. I think for that test I would have to create it across environments and sync the databases for a valid test.
Comment #124
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@smustgrave
For clarity, with "new import" I don't mean a new feed. Just make sure that for a particular feed no existing import is still running. If you only have feeds that are in an unfinished import state, try to unlock these. Maybe also clear all existing Feeds tasks from the queue table; it is possible that some old ones got stuck there. In the latest release of Feeds I found a solution for cleaning up stale queue tasks, but that only applies to newly created tasks, not tasks that were already there when using older Feeds releases.
Hm, maybe I should think about whether there is a way to clean up stale Feeds queue tasks that had been created in the older format. But the challenge would be to detect if they are really stale. If cron doesn't run often on a site, they could potentially still belong to an active import.
Comment #125
smustgrave CreditAttribution: smustgrave at Mobomo commented
May be overkill, but maybe add a UI to view the items in the queue and their created date, and let the user choose what to delete. That may be out of scope for this ticket, but old values in the queue do make this hard to test.
Comment #126
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@smustgrave
For monitoring what's in the queue, there is already the Queue UI module. Finding the original creation date of a queue item is hard, because when an exception occurs while executing a queue item, the item gets requeued (and then it gets a new creation date), at least in the case of Feeds. Sometimes that is a good thing; for example, if an external source is temporarily unavailable, you would want Feeds to retry. But refining that process (for example, implementing a maximum number of retries) is indeed something for another issue. See for example #3277999: Add DelayedRequeueException feature to feeds queue - D9.1.
There is, by the way, another issue closely related to this one: #3158678: FetcherResult::checkFile() throws \RuntimeException, which is never caught. That one could help you out of the loop, but it also stops the retry attempts that I talked about above.
Comment #127
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
Setting this back to "Needs review" because I think that the issue reported here needs to be fixed in another issue, possibly #3158678: FetcherResult::checkFile() throws \RuntimeException, which is never caught.
Comment #128
smustgrave CreditAttribution: smustgrave at Mobomo commented
We are using Redis for our queue, so maybe the bad values are in there. Clearing the queue table doesn't do anything for us.
Comment #129
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@smustgrave
Hm, that's unfortunate. Since 8.x-3.0-beta3, Feeds automatically cleans up tasks for a specific feed from the queue table whenever that feed finishes an import or when the feed gets unlocked. But if the queue is external, Feeds cannot clean up tasks from it: the Queue API has no method to remove a specific task from a queue, only a method to clear a whole queue. The Feeds queues are currently defined per feed type. Perhaps each feed should get its own queue? That could potentially lead to a large number of queues; not sure if that's a problem.
Comment #130
smustgrave CreditAttribution: smustgrave at Mobomo commented
I have to dig more, but if the Feeds module is querying the queue table directly, that seems wrong to me. If it used the queue service, then clearing a queue would clear it from whatever backend is in use.
We have experienced many times that Drupal's default queue fails and causes performance issues.
Comment #131
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@smustgrave
We should probably discuss the queue thing further in another issue, but see the methods hasQueueTasks() and clearQueueTasks() in FeedImportHandler. hasQueueTasks() is used by FeedsLockBackend to determine if an import still appears to be active. clearQueueTasks() is called when finishing an import or when unlocking a feed. When unlocking a feed, it is important that any active imports for that feed get killed, and that is done now by removing specific tasks from the queue table.
The code:
I didn't see another way to clean up stale queue tasks. The only other option is to remove the whole queue, but that would kill imports of other feeds of the same feed type as well. Queues are currently defined per feed type, not per feed.
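The code sample referenced above ("The code:") didn't make it into this excerpt. As a hedged sketch only, a direct queue-table cleanup along the lines described might look like the following; the queue name format is taken from earlier comments, while the LIKE match on serialized data is an assumption for illustration:

```php
<?php

// Hypothetical sketch of a clearQueueTasks()-style cleanup: delete queue-table
// rows for a single feed. Queues are per feed type ('feeds_feed_refresh:TYPE'),
// and the feed ID only exists inside the serialized item data, hence the
// LIKE condition (fragile; shown for illustration only).
$connection = \Drupal::database();
$connection->delete('queue')
  ->condition('name', 'feeds_feed_refresh:' . $feed->bundle())
  ->condition('data', '%' . $connection->escapeLike('i:' . $feed->id()) . '%', 'LIKE')
  ->execute();
```

This bypasses the Queue API precisely because, as noted above, that API can only drop a whole queue, not individual tasks.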
Comment #132
rdivince CreditAttribution: rdivince commented
Tested the MR and also experienced the RuntimeException error with /tmp/feeds_http_fetcher not existing in our load-balanced environment.
Still testing, but I wonder if it has something to do with our site also using s3fs for a remote file system, which is quite useful for load-balanced environments. I noticed the tempnam() method in the FeedsFileSystemBase class was using fileSystem->realpath() to get the path. The Drupal API docs say regarding realpath:
https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21File%21Fi...
Comment #133
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@rdivince
Thanks for helping with testing! I've no experience with s3fs. Can you check what happens with that filesystem if you try to write a file the way FeedsFileSystemBase::tempnam() does? So test that separately, outside of the Feeds context.
That's probably because there are still queue items left in the queue. So in order to check if the solution here works for future imports, try to empty these Feeds queues first. One way of doing that is by installing the Queue UI module. It can also be done with the drush command drush queue:delete, but then you need to know the exact queue name, which you can see with drush queue:list.
Comment #134
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@rdivince
I tried to find a way to test the patch with an S3 filesystem. It looks like I would need an account for Amazon's Simple Storage Service (S3) for that. I tried to create an account but got stuck at the step asking for credit card details; in the Netherlands having one is not a given. So I'm tempted to commit this without the S3 check.
Comment #135
smustgrave CreditAttribution: smustgrave at Mobomo commented
FYI, after I cleared my Redis queues and applied this patch across all our environments, we are no longer seeing the problem.
We are using Azure with a shared file mount. Our prod and stage are scaled up to multiple instances as well.
Comment #136
rdivince CreditAttribution: rdivince commented
@MegaChriz thanks for the info! I was able to clear the queue with the drush commands provided, but it did not fix the import errors. I tested file writing with tempnam outside of the Feeds context but did not have any luck. However, I did get the MR to work as follows:
In FeedsFileSystemBase::tempnam(), instead of writing a temporary file with tempnam, I wrote a file with \Drupal::service('file.repository')->writeData(). This method should support local and remote stream wrappers. Then I got additional error information: the getAsync() call in HttpFetcher::get() appears to open a file with "w+" mode by default, which s3 (and perhaps other remote stream wrappers) do not seem to allow: https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/s3-stream-wra...
First opening the stream through GuzzleHttp\Psr7\Utils::tryFopen() with "w" mode and passing that file stream as the sink to getAsync() worked perfectly. Not sure if other use cases require "w+".
While these tweaks to the MR successfully resolved my errors with s3fs, I know mine is just one of many use cases and you'll know those better than me. So I've attached my tweaks as a patch and would like to hear your thoughts.
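The approach described above can be sketched as follows. This is an illustration only, not the attached patch: the URIs are made up, and the surrounding Feeds method signatures may differ; Utils::tryFopen() and the 'sink' request option are standard Guzzle API:

```php
<?php

use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Utils;

// Open the target stream ourselves in "w" mode, which remote wrappers such
// as s3:// accept, instead of letting Guzzle fopen() the sink with "w+".
$client = new Client();
$sink = Utils::tryFopen('s3://bucket/feeds/in_progress/example.csv', 'w');

// Pass the already-open stream as the sink for the download.
$client->getAsync('https://example.com/feed.csv', [
  'sink' => $sink,
])->wait();
```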
Comment #137
rdivince CreditAttribution: rdivince commented
Marking as hidden
Comment #138
pingwin4eg
The fix from the merge request works: the default private directory (private://feeds/in_progress/#) is used instead of the temporary one.
For those who still see the error with the tmp file, check other fetcher classes too. For example, in our client's project there is also the feeds_http_auth_fetcher module with its own fetcher, plus a custom one, and existing feed entities use all 3. So I had to change them too, similar to the HTTP fetcher.
Comment #140
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@rdivince
I had been giving Feeds a low priority for the past month, and for your patch I'd need to dive deeper to find out how remote file systems work. So I'm merging the changes of this issue now and moving your patch to a follow-up issue. Perhaps we can find a way to add test coverage for remote stream wrappers too?
Comment #141
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@rdivince
I've opened #3351330: Add support for remote file systems for http downloaded data.
Comment #143
bkosborne
Glad to see this landed! Thanks, everyone who contributed.
@MegaChriz any idea when you might cut a new beta that includes this?
Comment #144
MegaChriz CreditAttribution: MegaChriz as a volunteer commented
@bkosborne
I hope within a month from now. There is a follow-up issue that I would like to include in the next release as well: #3158678: FetcherResult::checkFile() throws \RuntimeException, which is never caught. While the fix for this issue only aims to prevent imports ending abruptly for new imports, that other issue should help clean up existing imports that are (still) stuck in a loop as a result of the issue that was reported here.
Comment #145
C-Logemann
I understand this new feature for fixing load-balancing issues. But these new tmp files inside the private folder can also break some backup strategies. So where is the "configurable" part?
Comment #146
jwilson3Use the
in_progress_dir
config element in the Feed module's feeds.settings.yml config file.The default location is now
private://feeds/in_progress
assuming you've configured a private files directory, otherwise, it ispublic://feeds/in_progress
.Review the commit from above for additional context:
https://git.drupalcode.org/project/feeds/-/commit/4062e01bb8471254d9daa5...
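For example, the setting can be overridden per environment in settings.php. The config object and key names follow feeds.settings.yml as described above; the path itself is illustrative:

```php
<?php

// settings.php: point Feeds' in-progress storage at a static, non-public
// location so it survives across requests on load-balanced hosts.
$config['feeds.settings']['in_progress_dir'] = 'private://feeds/in_progress';
```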
Comment #147
C-Logemann
@jwilson3 Thanks for the description. So it's a good candidate to set in settings.php.
In many situations I think configuration via settings.php is enough, especially for very special functionality. And this feature is such a special thing: it solves load-balancing problems but can raise problems for non-advanced admins. So maybe it would be fine to make this feature and its setting more visible. But it's just an idea.
For our customers we will just exclude this folder via backup scripts. We already do this with other temporary folders, like image styles, rendered CSS, etc.