Problem/Motivation

Found out that when Feed Import is executed either manually imported or queued thru cron / drush command, it always create a file in local folder that remains forever unless deleted manually.

This Patch is to delete the file created if error exception is returned or an empty response body is returned.

Steps to reproduce

Proposed resolution

Delete the temporary file when fetching fails or when the fetched document is empty. Combine work with what's done in #3372368: in_progress filesystem grows indefinitely.

Remaining tasks

  • Add test coverage for:
    • When a feed is not modified.
      I know that there are existing tests that work with the 304 response status. \Drupal\feeds_test_files\Controller\CsvController::nodes() optionally returns that status. But perhaps this could be covered in a simpler way as well, with a Kernel test.
    • When the fetched document is empty.

User interface changes

API changes

Data model changes

Issue fork feeds-3080098

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

dharnell.a007 created an issue. See original summary.

megachriz’s picture

Status: Active » Needs work
Issue tags: -empty file, -Error Exception +Needs tests

Thanks for the patch!

Do you also want to write a kernel or functional test for this case?

jcnventura’s picture

This seems to be a better version of the patch in #3157339: HTTP Fetcher does not remove temp files if it throws an exception. Maybe mark that other issue as a duplicate?

jcnventura’s picture

hugronaphor’s picture

StatusFileSize
new1.68 KB

Re-roll for alpha10

jcnventura’s picture

Status: Needs work » Needs review
StatusFileSize
new1.28 KB
new2.46 KB

Seems that the reason for the failed tests is that unlink is not expected by the file system mock.

jcnventura’s picture

Tests passed. The problem I see now, @MegaChriz is that the exception should not be thrown, and instead of an empty result should be returned. The rest of feeds should then handle that empty result gracefully. Should a new EmptyResult class be added and returned? Or instead return NULL?

heikkiy’s picture

We tested this patch and it seems to fix the error mentioned in #3390530: Error: Call to a member function unlink() on null in Drupal\feeds\Result\HttpFetcherResult->cleanUp(). One thing that this seems to cause is that empty temporary files are never removed and this might fill up the temporary files folder

We have setup a custom cronjob to delete temporary files created by Feeds after 3 days but I think this should be also handled in the module that old temporary files are periodically removed?

megachriz’s picture

@HeikkiY

I think this should be also handled in the module that old temporary files are periodically removed?

The issue is that based on the age of a file, Feeds cannot determine if a file can be removed or not. A file could still be used in an active import. If for example cron runs only once a day and there is an import going on that takes more than 10 cron runs to complete, files that are older than a week could still be in use.

heikkiy’s picture

@megachriz

I can understand the problem. One thing comes to my mind is that you can create different feed types in admin/structure/feeds where you can for example set how often the feed type is imported. Maybe there could be a setting also which would define how long temporary files are kept and based on that clean out old temporary files? Of course that is most likely outside of this issues scope so it would need a new ticket for that.

heikkiy’s picture

I reviewed our logs with the current patch from this issue and it seems like it didn't solve my problem in #3390530: Error: Call to a member function unlink() on null in Drupal\feeds\Result\HttpFetcherResult->cleanUp(). Even with this patch I am getting the same error about unlinking NULL values:

Error: Call to a member function unlink() on null in Drupal\feeds\Result\HttpFetcherResult->cleanUp() (line 60 of /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/web/modules/contrib/feeds/src/Result/HttpFetcherResult.php)

#0 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/web/modules/contrib/feeds/src/FeedsExecutable.php(335): Drupal\feeds\Result\HttpFetcherResult->cleanUp()
#1 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/web/modules/contrib/feeds/src/FeedsExecutable.php(119): Drupal\feeds\FeedsExecutable->finish()
#2 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/web/modules/contrib/feeds/src/Plugin/QueueWorker/FeedRefresh.php(42): Drupal\feeds\FeedsExecutable->processItem()
#3 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/web/core/lib/Drupal/Core/Cron.php(268): Drupal\feeds\Plugin\QueueWorker\FeedRefresh->processItem()
#4 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/web/core/lib/Drupal/Core/Cron.php(233): Drupal\Core\Cron->processQueue()
#5 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/web/modules/contrib/ultimate_cron/src/UltimateCron.php(69): Drupal\Core\Cron->processQueues()
#6 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/web/modules/contrib/ultimate_cron/src/ProxyClass/UltimateCron.php(70): Drupal\ultimate_cron\UltimateCron->run()
#7 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/drush/drush/src/Drupal/Commands/core/DrupalCommands.php(69): Drupal\ultimate_cron\ProxyClass\UltimateCron->run()
#8 [internal function]: Drush\Drupal\Commands\core\DrupalCommands->cron()
#9 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/consolidation/annotated-command/src/CommandProcessor.php(276): call_user_func_array()
#10 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/consolidation/annotated-command/src/CommandProcessor.php(212): Consolidation\AnnotatedCommand\CommandProcessor->runCommandCallback()
#11 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/consolidation/annotated-command/src/CommandProcessor.php(176): Consolidation\AnnotatedCommand\CommandProcessor->validateRunAndAlter()
#12 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/consolidation/annotated-command/src/AnnotatedCommand.php(391): Consolidation\AnnotatedCommand\CommandProcessor->process()
#13 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/symfony/console/Command/Command.php(326): Consolidation\AnnotatedCommand\AnnotatedCommand->execute()
#14 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/symfony/console/Application.php(1081): Symfony\Component\Console\Command\Command->run()
#15 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/symfony/console/Application.php(320): Symfony\Component\Console\Application->doRunCommand()
#16 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/symfony/console/Application.php(174): Symfony\Component\Console\Application->doRun()
#17 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/drush/drush/src/Runtime/Runtime.php(124): Symfony\Component\Console\Application->run()
#18 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/drush/drush/src/Runtime/Runtime.php(51): Drush\Runtime\Runtime->doRun()
#19 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/drush/drush/drush.php(79): Drush\Runtime\Runtime->run()
#20 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/drush/drush/drush(4): require('/var/www/html/b...')
#21 /var/www/html/builds/2023-11-13.18-24-53.commit--5061a65db/vendor/bin/drush(117): include('/var/www/html/b...')
#22 {main}

This seems to be the only PHP related error currently in the environment.

megachriz’s picture

I see that this issue overlaps with #3372368: in_progress filesystem grows indefinitely. For the other one there is a fix that specifically handles 404 situations. The patch in #6 shows that it would be good to have automated tests for other cases as well:

  • When a feed is not modified.
    I know that there are existing tests that work with the 304 response status. \Drupal\feeds_test_files\Controller\CsvController::nodes() optionally returns that status. But perhaps this could be covered in a simpler way as well, with a Kernel test.
  • When the fetched document is empty.

megachriz’s picture

Issue summary: View changes

I've added test coverage for this issue. Before also adding the fix, I think it would be good if #3372368: in_progress filesystem grows indefinitely gets in first. Please review that one, if you have time. :)

  • MegaChriz committed c45fd241 on 8.x-3.x
    Issue #3080098 by MegaChriz, jcnventura, dharnell.a007, hugronaphor:...
megachriz’s picture

Status: Needs review » Fixed
Issue tags: -Needs tests

All tests are passing! Merged the code!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

jcnventura’s picture

mitechworld’s picture