When I import a large CSV document, I get the following errors. Does anybody know why? This also happens when cron runs. Is there a fix out there?

* Warning: fopen(public://feeds/FeedsHTTPFetcherResult1295145004_1) [function.fopen]: failed to open stream: "DrupalPublicStreamWrapper::stream_open" call failed in FeedsFetcherResult->sanitizeFile() (line 88 of /home2/santhos1/public_html/deals/sites/all/modules/feeds/plugins/FeedsFetcher.inc).
* Warning: fgets(): supplied argument is not a valid stream resource in FeedsFetcherResult->sanitizeFile() (line 89 of /home2/santhos1/public_html/deals/sites/all/modules/feeds/plugins/FeedsFetcher.inc).
* Warning: fclose(): supplied argument is not a valid stream resource in FeedsFetcherResult->sanitizeFile() (line 90 of /home2/santhos1/public_html/deals/sites/all/modules/feeds/plugins/FeedsFetcher.inc).
* Warning: fopen(public://feeds/FeedsHTTPFetcherResult1295145004_1) [function.fopen]: failed to open stream: "DrupalPublicStreamWrapper::stream_open" call failed in ParserCSVIterator->__construct() (line 20 of /home2/santhos1/public_html/deals/sites/all/modules/feeds/libraries/ParserCSV.inc).
* Recoverable fatal error: Argument 2 passed to FeedsProcessor::process() must be an instance of FeedsParserResult, null given, called in /home2/santhos1/public_html/deals/sites/all/modules/feeds/includes/FeedsSource.inc on line 350 and defined in FeedsProcessor->process() (line 102 of /home2/santhos1/public_html/deals/sites/all/modules/feeds/plugins/FeedsProcessor.inc).

Comment | File | Size | Author
#97 | interdiff-1029102-96-97.txt | 486 bytes | greenSkin
#97 | feeds-remove-permanent-file-after-import-1029102-97.patch | 13.69 KB | greenSkin
#96 | interdiff-1029102-94-96.txt | 1.5 KB | MegaChriz
#96 | feeds-remove-permanent-file-after-import-1029102-96.patch | 13.65 KB | MegaChriz
#94 | interdiff-1029102-92-94.txt | 598 bytes | MegaChriz
#94 | feeds-remove-permanent-file-after-import-1029102-94.patch | 13.63 KB | MegaChriz
#92 | interdiff-1029102-90-92.txt | 8.05 KB | MegaChriz
#92 | feeds-remove-permanent-file-after-import-1029102-92.patch | 13.65 KB | MegaChriz
#90 | interdiff-1029102-88-90.txt | 2.94 KB | MegaChriz
#90 | feeds-remove-permanent-file-after-import-1029102-90.patch | 7.92 KB | MegaChriz
#88 | feeds-remove-permanent-file-after-import-1029102-88.patch | 6.55 KB | MegaChriz
#86 | feeds-remove-permanent-file-after-import-1029102-86.patch | 9.99 KB | MegaChriz
#84 | feeds-remove_permanent_file_after_import-1029102-84.patch | 4.19 KB | MegaChriz
#73 | feeds-remove_permanent_file_after_import-73.patch | 3.86 KB | stefan.r
#68 | feeds-remove_permanent_file_after_import-67.patch | 3.9 KB | stefan.r
#68 | feeds-remove_permanent_file_after_import-66.patch | 3.86 KB | stefan.r
#65 | feeds-remove_permanent_file_after_import-63.patch | 3.84 KB | stefan.r
#63 | feeds-remove_permanent_file_after_import-63.patch | 3.84 KB | stefan.r
#46 | feeds-remove_permanent_file_after_import-36.patch | 1.83 KB | stefan.r
#29 | feeds-redownload_remote_file-1801680.patch | 1009 bytes | MTecknology
#25 | feeds-redownload_remote_file-1801680.patch | 1009 bytes | pavel.karoukin
#4 | blob.patch | 871 bytes | dema502
 | Screenshot.png | 103.18 KB | santhoshabraham

Comments

hwasem’s picture

I know it has been a while, but I'm curious to know if you ever figured this problem out. I'm getting similar errors on my Drupal 7 feeds import. I'm only importing around 100 items, but still getting the errors. I'm using Feeds 7.x-2.0-alpha4. It only happens during Cron for me, as well.

SeriousMatters’s picture

Version: 7.x-2.0-alpha3 » 7.x-2.0-alpha4

I am encountering the same error when running cron to import a large CSV file, using 7.x-2.0-alpha4.

* Warning: fopen(public://feeds/FeedsHTTPFetcherResult1322621526_0) [function.fopen]: failed to open stream: "DrupalPublicStreamWrapper::stream_open" call failed in FeedsFetcherResult->sanitizeFile() (line 87 of /public_html/sites/all/modules/feeds/plugins/FeedsFetcher.inc).

* and more errors as a consequence, the same as the OP above.

The /sites/default/files/feeds directory is empty at the time of this error. I think this is what triggers the "failed to open stream" warning.

On /import/my_importer, the status is stuck at "importing - 57% complete", which indicates cron is trying to continue an import that was started earlier.

If I go into the feeds_source table in the database and change both the state and fetcher_result fields to 'b:0;' (a serialized boolean FALSE), I can restart the import from 0%. And /sites/default/files/feeds then contains a fetcher result file.

So the problem is that when cron tries to continue a previous Feeds import using the saved /sites/default/files/feeds/FeedsHTTPFetcherResult1322621526_0, the file has been deleted for some reason.

I will try to increase the frequency of cron and/or decrease the frequency of import to see if it helps.

marcus_w’s picture

It's been almost a year now, and still no fix? Does anyone have the slightest idea what might cause this? I've got the same issue.
My CSV has about 1500 lines...

Thanks,

Mark

dema502’s picture

File size: 871 bytes

This patch helped me solve a similar problem with PostgreSQL.

emackn’s picture

Status: Active » Postponed (maintainer needs more info)

Have you tried the dev version?

mrfelton’s picture

I had a batch running, and my browser crashed. After that, the feed importer was stuck at 75% and the import button was greyed out, making it impossible for me to stop or restart the import. After running the following I was able to run the import again.

update feeds_source set state = 'b:0;';
update feeds_source set fetcher_result='b:0;';
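
For reference, a more targeted variant of the same reset: the feeds_source table keys its rows by the importer machine name (id) and the attached node (feed_nid, 0 for standalone importers), so you can scope the update to the stuck source only. The importer name below is a placeholder:

update feeds_source set state = 'b:0;', fetcher_result = 'b:0;' where id = 'my_importer' and feed_nid = 0;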
strr’s picture

CSV import of a 10K file is stopping at about 9-12%; I have tried more than 5 times.
I also tried the latest dev version. Is there any fix for this?

These are some of my errors:

Warning: fopen(public://feeds/FeedsHTTPFetcherResult1337690085) [function.fopen]: failed to open stream: "DrupalPublicStreamWrapper::stream_open" call failed in FeedsFetcherResult->sanitizeFile() (line 87 of /is/htdocs/wp1021/www/website/modules/feeds/plugins/FeedsFetcher.inc).

Warning: fgets(): supplied argument is not a valid stream resource in FeedsFetcherResult->sanitizeFile() (line 88 of /is/htdocs/wp1021/www/website/modules/feeds/plugins/FeedsFetcher.inc).

Warning: fclose(): supplied argument is not a valid stream resource in FeedsFetcherResult->sanitizeFile() (line 89 of /is/htdocs/wp1021/www/website/modules/feeds/plugins/FeedsFetcher.inc).

Warning: fopen(public://feeds/FeedsHTTPFetcherResult1337690085) [function.fopen]: failed to open stream: "DrupalPublicStreamWrapper::stream_open" call failed in ParserCSVIterator->__construct() (line 19 of /is/htdocs/wp1021/www/website/modules/feeds/libraries/ParserCSV.inc).

Recoverable fatal error: Argument 2 passed to FeedsProcessor::process() must be an instance of FeedsParserResult, null given, called in /is/htdocs/wp1021/www/website/modules/feeds/includes/FeedsSource.inc on line 356 and defined in FeedsProcessor->process() (line 101 of /is/htdocs/wp1021/www/website/modules/feeds/plugins/FeedsProcessor.inc).

cimo75’s picture

The very same error as #7.

strr’s picture

Please, can someone help to get started on a fix...

Rob_Feature’s picture

Version: 7.x-2.0-alpha4 » 7.x-2.x-dev
Priority: Normal » Major
Status: Postponed (maintainer needs more info) » Active

Still seeing this in the latest dev release as of May 29, 2012. Unfortunately, I don't know where to start on this... Marking as active and major since it seems to kill feed importing.

that0n3guy’s picture

I'm seeing this issue for the first time today as well... I just updated to dev yesterday... so maybe it's a bug in dev.

tomcatuk’s picture

Same problem here. I'm looking at http://drupal.org/node/1454666, but presumably the problem is with a record in the feed. Is there a way to track down which record the feed failed on?

that0n3guy’s picture

@tomcatuk, it's not an issue with a stuck feed. My feed isn't locked, it's just stuck, so that patch doesn't help me (I tested it).

that0n3guy’s picture

Using the latest dev, I get this error, which is only slightly different from the one above:

Warning: fopen(public://feeds/FeedsHTTPFetcherResult1339699961) [function.fopen]: failed to open stream: "DrupalPublicStreamWrapper::stream_open" call failed in FeedsFetcherResult->sanitizeFile() (line 87 of D:\zendserver\Apache2\htdocs\drupal7\sites\all\modules\feeds\plugins\FeedsFetcher.inc).
Warning: fgets() expects parameter 1 to be resource, boolean given in FeedsFetcherResult->sanitizeFile() (line 88 of D:\zendserver\Apache2\htdocs\drupal7\sites\all\modules\feeds\plugins\FeedsFetcher.inc).
Warning: fclose() expects parameter 1 to be resource, boolean given in FeedsFetcherResult->sanitizeFile() (line 89 of D:\zendserver\Apache2\htdocs\drupal7\sites\all\modules\feeds\plugins\FeedsFetcher.inc).
Warning: fopen(public://feeds/FeedsHTTPFetcherResult1339699961) [function.fopen]: failed to open stream: "DrupalPublicStreamWrapper::stream_open" call failed in ParserCSVIterator->__construct() (line 19 of D:\zendserver\Apache2\htdocs\drupal7\sites\all\modules\feeds\libraries\ParserCSV.inc).
Recoverable fatal error: Argument 2 passed to feeds_tamper_feeds_after_parse() must be an instance of FeedsParserResult, null given in feeds_tamper_feeds_after_parse() (line 19 of D:\zendserver\Apache2\htdocs\drupal7\sites\all\modules\feeds_tamper\feeds_tamper.module)
that0n3guy’s picture

I got mine running again.

I took the CSV file that Feeds was trying to import and copied it to public://feeds/ (in my case: sites/default/files/feeds). Then I renamed it from "jobinfo.csv" to "FeedsHTTPFetcherResult1339699961". Now my cron works. I chose the name "FeedsHTTPFetcherResult1339699961" because that is the file the error says it's looking for.

This is kind of what SeriousMatters was saying in #2, I think.

So the real issue isn't really this error (although it should produce a better notification than this and reset itself), but the question "where has the file gone?".
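
If anyone wants to replay that workaround, it boils down to something like this (the path and the numeric suffix are just the examples from this comment; use whatever filename your own error message references):

cp /path/to/jobinfo.csv sites/default/files/feeds/FeedsHTTPFetcherResult1339699961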

gordns’s picture

I had this error and it seems to have been a permissions issue on the "files/feeds" folder (wrong owner). It was fixed by deleting the folder and then letting the server recreate it on the next manual import.

tomcatuk’s picture

@that0n3guy you're a gem. Anyone else reading this: that0n3guy is on the money. I followed his suggestion in #15 and can now run cron again. I do think this is something that requires a more permanent fix, though.

ErnestoJaboneta’s picture

#16 seems to have been my problem as well. The owner was right but I changed the permissions anyway since I couldn't delete the directory. I changed the permissions to 775 and the message seems to have gone away. Not sure what security issues this causes though...

Nevermind......

resullivan’s picture

I ran into this because I ran out of database requests. My server limits them to 75,000 per user per some amount of time. Right now mine is stuck at 69% with the button grayed out.

ErnestoJaboneta’s picture

If you have a GUID set up correctly, you can always start the import over and it will skip the already-imported items. You would have to go into the "feeds_source" database table and delete the row for the feed, though. This gets rid of the grayed-out button and the source of the feed.
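
For illustration only (the importer id and node id are placeholders; the table and column names come from the Feeds schema), deleting the stuck row could look like:

delete from feeds_source where id = 'my_importer' and feed_nid = 123;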

spuky’s picture

#15 nails the problem and I guess it is happening only if your import takes longer than 6 hours...

I discovered that the import file has a status of 0 in the file_managed table, which means it gets deleted after 6 hours. So if your import is not done by the time the file expires, Drupal core just kills it.

The reason is: http://api.drupal.org/api/drupal/modules%21system%21system.module/consta...

A constant in Drupal that defines how long temporary files should live.

I changed that value in modules/system/system.module on line 11.

I will report back tomorrow on whether it helped with my large import, but a final fix should be that Feeds updates the timestamp of the temporary file as long as it's importing from it.
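
Something along these lines could keep the managed file "fresh" while an import is still running. This is an illustrative sketch only, not actual Feeds code; $fetcher_file_uri is a placeholder for the public://feeds/... URI of the fetcher result:

// Bump the file's timestamp so system_cron() does not treat it as an expired
// temporary file while an import is still reading from it.
db_update('file_managed')
  ->fields(array('timestamp' => REQUEST_TIME))
  ->condition('uri', $fetcher_file_uri)
  ->execute();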

node9’s picture

+1

I have a CSV feed that jams up at 3% each time cron fires off.

Edit: I kept running cron and noticed that, each time, it stops at 6%, then 10%, 13%, and so on. I expected the whole feed to run, but it runs in small increments. In my case, feeds is not jammed. I just didn't realize cron will only run part of the feed each time.

pavel.karoukin’s picture

I had a similar problem. The file is too large and is set to import overnight only, over 3 days, so the file is automatically removed because the Feeds module marks it as a temporary file. A simple patch solved this problem for me - http://drupal.org/node/1801680#comment-6551192

jos_s’s picture

Thanks, that helped me a lot. I had the same problems, but was able to recover this way.

pavel.karoukin’s picture

Moving this over from a duplicate issue:

I've found an easy solution: re-download the remote file if the local file was removed. Patch attached. This could potentially cause problems if the feed file changes between downloads. Not sure, though. Discussion is welcome.
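
Conceptually, the idea looks something like the sketch below. This is my own simplified illustration, not the attached patch (which presumably goes through Feeds' own HTTP layer); $this->file_path and $this->url stand for the fetcher result's saved file and source URL:

// If the sanitized temporary file has disappeared, fetch the source again.
if (!file_exists($this->file_path)) {
  $response = drupal_http_request($this->url);
  if ($response->code == 200) {
    file_unmanaged_save_data($response->data, $this->file_path, FILE_EXISTS_REPLACE);
  }
}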

twistor’s picture

Tagging this.

This is a tricky issue. We could have Feeds manage the files itself. The simple way would be to delete the file on completion, but we cannot guarantee that a feed will finish without failing, and files could potentially add up. See #955236: Handle fatal errors for entity create with better error handler function.

We could also name the files in a more predictable manner, like importer_source_timestamp, and delete old files that match a pattern when starting a new import.
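
A rough sketch of that second idea, assuming a hypothetical importer_source naming scheme under public://feeds (file_scan_directory() and file_unmanaged_delete() are core D7 APIs):

// Before starting a new import, delete any earlier fetcher files for this
// importer/source. $importer_id and $source_id are hypothetical variables.
$pattern = '/^' . preg_quote($importer_id . '_' . $source_id . '_', '/') . '/';
foreach (file_scan_directory('public://feeds', $pattern) as $old_file) {
  file_unmanaged_delete($old_file->uri);
}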

ari-meetai’s picture

Version: 7.x-2.x-dev » 7.x-2.0-alpha6

I can see this is a mix of issues, where the main one seems to be better titled after this one.

I've created a new issue, seemingly more related to #6 here, as my case is with a very big CSV file already on the server.

MTecknology’s picture

Version: 7.x-2.0-alpha6 » 7.x-2.x-dev

This is still an issue. This seems to happen when:
1) Large file gets downloaded and processed in the background
2) Incremental cron jobs import pieces of the file (not all at once)
3) The next document import happens before enough cron jobs have run to import the document

It seems that this is directly related to the "Process in background" option. "Periodic import" with "Process in background" on large documents results in issues.

The web UI fix seems to be to make sure "Periodic import" is set high enough that enough cron executions happen for the document to be fully imported before the next run. Alternatively, you could just uncheck the "Process in background" option.

I opted to just disable the process in background option, set periodic import to 6hr, and run cron every 15min.

The fix is likely to first add a warning about background processing, followed by one of:
On cron, check if a periodic import is running; if it is, properly close that execution, throw an error/warning that the partial import was truncated, and start a new import.
OR: On cron, check if a periodic import is running; if it is, process the whole rest of the file, throw a warning that the partial import didn't finish before the next import, and continue as usual with the new import.
OR: On cron, check if a periodic import is running; if it is, process the next segment as it would without the new import, and throw an error/warning that the new import hasn't started yet because the old one is still running.

Three options that would resolve this bug... I don't know which would be best.

MTecknology’s picture

This patch from a duplicate bug report seems to do the trick.

Edit: I just noticed that someone linked to this same patch earlier. Sorry about that. It does work, though. :)

rudiedirkx’s picture

Status: Active » Needs review
rudiedirkx’s picture

Status: Needs review » Reviewed & tested by the community

Confirmed with import of 51 items (2 batches) and manual cron runs (and manual file deletion). Perfect. I've also added a watchdog inside the !file_exists(), but that was purely debugging and is absolutely not necessary in the patch.
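
For context, the debugging addition was roughly of this shape (my own reconstruction, not the exact lines from the review):

if (!file_exists($this->file_path)) {
  // Purely for debugging: log that the fetcher result file has gone missing.
  watchdog('feeds', 'Fetcher result file %file is missing.', array('%file' => $this->file_path), WATCHDOG_WARNING);
}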

twistor’s picture

Status: Reviewed & tested by the community » Needs work

That patch doesn't quite fix the issue. It will just cause the file to be re-downloaded. This could lead to missing data: if there was data in the previous file that doesn't exist in the current file, or if new data was added, the pointer that the CSV parser uses can be off.

That issue needs work as well, since the import should abort/restart if the file goes missing.

Ganginator’s picture

I'm also experiencing this issue on a 537 KB .csv; it halts at 6% max.
I even have it set to not update existing nodes, which I assume means it should skip over the ones it has already processed.

henkit’s picture

FYI, patch #29 also fixed the problem I had, like the one below when running cron:

Recoverable fatal error: Argument 2 passed to feeds_tamper_feeds_after_parse() must be an instance of FeedsParserResult, null given in feeds_tamper_feeds_after_parse() (line 19 of ******/public_html/sites/all/modules/feeds_tamper/feeds_tamper.module).

Thanx!

checker’s picture

I think twistor is right. Re-download is not the right way. You should only use this patch (#29) if you import from a static file.

rudiedirkx’s picture

Why isn't the fix to make it a permanent file (easy) and remove it after import (easy?).

I don't use Feeds a lot, so I probably won't be making a patch.
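
For anyone who wants to picture that idea, here is a minimal sketch (not the actual patch in this issue) using core D7 file APIs; $raw and $feed_nid are placeholders:

// Save the fetched data as a managed file and ensure it is flagged permanent,
// so system_cron() will not garbage-collect it after DRUPAL_MAXIMUM_TEMP_FILE_AGE.
$file = file_save_data($raw, 'public://feeds/FeedsHTTPFetcherResult_example', FILE_EXISTS_REPLACE);
$file->status = FILE_STATUS_PERMANENT;
file_save($file);
file_usage_add($file, 'feeds', 'feeds_source', $feed_nid);

// Later, once the import has completed (or the source is unlocked):
file_usage_delete($file, 'feeds', 'feeds_source', $feed_nid);
file_delete($file);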

mhenning’s picture

Two ways I found to get around this issue:

1) Increase the number of items that are processed each run by editing the variable feeds_process_limit. This is set at 50 by default. I was unable to find any place in the admin user interface to set it, so I upped it to 300 in code:

variable_set('feeds_process_limit', 300);

This impacts all feeds and I don't see an easy way to apply it to one particular feed.
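
If you use Drush, the same variable can be set without a code change (same caveat: it applies to all importers):

drush vset feeds_process_limit 300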

2) Update the time that temporary files stick around. I was able to change this to 24 hrs (up from the default of 6 hrs) without editing core files by adding this to sites/default/settings.php:

define('DRUPAL_MAXIMUM_TEMP_FILE_AGE', 86400);

rudiedirkx’s picture

@mhenning You should turn notices on. Defining a constant twice results in notices, if you have good error reporting. Probably not the best idea. The variable is good, but it's not a solution for BIIIG files.

mgifford’s picture

Status: Needs work » Needs review
rudiedirkx’s picture

@mgifford What about #32?

mgifford’s picture

Sorry, missed that.

crutch’s picture

#37 - 2) - Drupal 7

When I add that to settings.php, it says the constant is declared twice. Would this be the correct way instead?

$conf ['DRUPAL_MAXIMUM_TEMP_FILE_AGE'] = '86400';

rudiedirkx’s picture

@crutch No. It's a constant, not a variable. It's not a good method.

I'll try to make #36 tomorrow.

rudiedirkx’s picture

Issue summary: View changes
Status: Needs review » Needs work
sinn’s picture

any progress?

stefan.r’s picture

Rough implementation of #36

stefan.r’s picture

Status: Needs work » Needs review
rudiedirkx’s picture

@stefan.r How did you test that? Set DRUPAL_MAXIMUM_TEMP_FILE_AGE to something low and run cron manually?

stefan.r’s picture

@rudiedirkx no, I have not tested this with cron yet, just on a single manual import. DRUPAL_MAXIMUM_TEMP_FILE_AGE should not affect this patch since it's a permanent file now.

rudiedirkx’s picture

"should" is the reason we test =) You have to recreate the exact context where this was a problem and then try the fix. I'll try that later this week.

AlfTheCat’s picture

Tried this on the latest dev but the patch from #46 fails:

patching file FeedsSource.inc
Hunk #1 succeeded at 368 (offset -37 lines).
patching file FeedsFetcher.inc
Hunk #1 FAILED at 11.
Hunk #2 FAILED at 50.
Hunk #3 FAILED at 62.
3 out of 3 hunks FAILED -- saving rejects to file FeedsFetcher.inc.rej
stefan.r’s picture

@AlfTheCat: this patch is against 7.x-2.x-dev; you can get it at http://ftp.drupal.org/files/projects/feeds-7.x-2.x-dev.tar.gz

I tested and it still applies fine...

MTeck’s picture

I updated to the dev version and updated the database since there's a schema change. Then I triggered an update (foreground) from the web UI. It worked fine. Then I triggered cron using Drush and saw that the import never finished. It would reach the end, then continue updating from the beginning. I saw it run through five times with drush cron and then used ^C to kill it. Trying to remove the patch or downgrade didn't help.

The patch in #46 seems solid and I imagine it should work perfectly. This is probably a new/separate bug.

robertwb’s picture

I applied the patch in #46 successfully. I will note a couple of things that I think may be relevant:

  1. Before applying the patch, or at least before testing it, "Unlock" any stuck feeds from the import screen
  2. After applying the patch, feeds can still become stuck, but they seem to resume the next time cron runs and complete successfully
stefan.r’s picture

To those of you that applied the patch, did it solve your issue or did it merely "apply successfully"? :)

AlfTheCat’s picture

On my end the patch still fails.
I'm using the latest dev, but I also tried manually downloading the mentioned tarball (which is the same version I downloaded via Drush), and the patch keeps failing:

Hunk #1 FAILED at 405.
1 out of 1 hunk FAILED -- saving rejects to file b/includes/FeedsSource.inc.rej
patching file b/plugins/FeedsFetcher.inc
Hunk #1 FAILED at 11.
Hunk #2 FAILED at 50.
Hunk #3 FAILED at 62.
3 out of 3 hunks FAILED -- saving rejects to file b/plugins/FeedsFetcher.inc.rej

This is in the module's .info file:

version = "7.x-2.0-alpha8+33-dev"
core = "7.x"
project = "feeds"
datestamp = "1392084807"
stefan.r’s picture

@AlfTheCat try the following...

$  git clone --branch 7.x-2.x http://git.drupal.org/project/feeds.git
$ cd feeds
$ wget https://drupal.org/files/issues/feeds-remove_permanent_file_after_import-36_0.patch
$ patch -p1 < feeds-remove_permanent_file_after_import-36_0.patch 
patching file includes/FeedsSource.inc
patching file plugins/FeedsFetcher.inc

stefan.r’s picture

@rudiedirkx or anyone else, as per #50, has anyone managed to check whether this fixed their issue?

rudiedirkx’s picture

Status: Needs review » Needs work

I tried it and found that files aren't deleted. It seems that if you 'cancel' the batch, the file is orphaned. It also didn't delete the file when I ran cron 10 times to make it finish. It finished the file and moved on to the next one (I set up EVERY cron run to import), but didn't delete the previous one.

Maybe it needs a cron to delete orphaned files? Or maybe I'm doing it wrong.

rudiedirkx’s picture

Unlocking the import also orphans the file (bad) and downloads a new one next cron run (good, in my test case).

stefan.r’s picture

stefan.r’s picture

Status: Needs work » Needs review
stefan.r’s picture

Status: Needs review » Needs work

The last submitted patch, 65: feeds-remove_permanent_file_after_import-63.patch, failed testing.

The last submitted patch, 63: feeds-remove_permanent_file_after_import-63.patch, failed testing.

stefan.r’s picture

stefan.r’s picture

stefan.r’s picture

Status: Needs work » Needs review

@rudiedirkx: I am trying to reproduce the "files not getting deleted" issue but I can't. For me the file is always deleted once it finishes importing, whether batched, in the background, or in multiple iterations.

Do you see any "Imported in n s" messages in the feeds log to confirm the feed has finished importing? And this is using the CSV importer? Are there any error messages in your error log?

The patch feeds-remove_permanent_file_after_import-67.patch should now delete the file after unlocking as well, and contains a rough implementation of the orphaned-file deletion on cron (the file needs to be older than a minute, though).

Untested yet but hoping it will help the issue move forward. If this is the right approach and it fixes people's problems I can write automated tests.

Status: Needs review » Needs work

The last submitted patch, 68: feeds-remove_permanent_file_after_import-67.patch, failed testing.

rudiedirkx’s picture

The one where it finished successfully and didn't delete the file was probably a fluke.

But there's still the possibility that the batch doesn't finish (not relevant in cron) and the import is interrupted, so the file is left there with no way to finish the import. (Or will cron take over then?)

I have a local testcase now, so I'll try out more tomorrow.

stefan.r’s picture

New patch attached, let's see if we can get this to turn green :)

@rudiedirkx if the file is just left there, cron should take care of it. On every cron run it will collect the temporary files associated with all current feed sources into a list. Any files still in the filesystem under public://feeds/Feeds*Fetcher* that are not on this list will be deleted.
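
In rough terms, the cron cleanup described above amounts to something like this. This is my own sketch of the approach, not the patch code; $uris_in_use stands for the list built from the current feed sources:

// Delete fetcher files under public://feeds that no current source references.
foreach (file_scan_directory('public://feeds', '/^Feeds.*Fetcher/') as $file) {
  if (!in_array($file->uri, $uris_in_use)) {
    file_unmanaged_delete($file->uri);
  }
}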

stefan.r’s picture

Status: Needs work » Needs review
ressa’s picture

I am using Elysian cron, and couldn't get the import started again after getting "Recoverable fatal error: Argument 2 passed to feeds_tamper_feeds_after_parse() must be an instance of FeedsParserResult" in my logs.

There were a few stuck jobs in the queue, so after unchecking "Process in background" for my feed, I deleted the queues, ran cron a few times, and the feed import seems to be working again now.

alexander.sibert’s picture

I have an important issue and maybe this patch can solve it. I am trying to import the GeoNames database as nodes.

http://download.geonames.org/export/dump/allCountries.zip

I use Feeds Tamper and Feeds, and if I import only the GeoNames ID as the unique ID plus the title, and use a Keyword Filter to match the GeoNames ID 2838632 exactly, it doesn't import the line. The extracted file is 1.2 GB.

I parse the CSV directly from the private Drupal directory. If I import without Feeds Tamper (I mean all lines), the import works, but it doesn't import everything.

I have now applied the last tested patch, #73, and enabled background processing via cron, and now it imports to 100% and then starts again automatically?

Khalor’s picture

The patch applies cleanly, but it's hard to tell what effect it's having.

twistor’s picture

This looks promising.

How does $valid_fids[] = $source_object->fetcher_result->fid; work if fid is protected?

What if we didn't use Drupal's file handling at all?

jomarocas’s picture

My issue was duplicate nodes when the cron process ran. This worked for me: disable "Process in background" and "Periodic import", as in #28. Thanks @MTecknology.

gaele’s picture

Title: Importing Large CSV document » Importing Large CSV document (downloaded and processed in the background)
MegaChriz’s picture

I hope that this will make it into the Feeds 7.x-2.0-beta4 release.

demonde’s picture

I can confirm that #73 fixes this error on a site with multiple complex CSV imports with 25 fields and up to 1000 entries.

sphism’s picture

#73 doesn't apply cleanly for me:

patch -p1 < feeds-remove_permanent_file_after_import-73.patch
patching file feeds.module
Hunk #1 succeeded at 61 (offset 1 line).
patching file feeds.pages.inc
Hunk #1 succeeded at 257 (offset -1 lines).
patching file includes/FeedsSource.inc
Hunk #1 FAILED at 405.
1 out of 1 hunk FAILED -- saving rejects to file includes/FeedsSource.inc.rej
patching file plugins/FeedsFetcher.inc

and the FeedsSource.inc.rej is:

***************
*** 405,410 ****
        $this->imported = time();
        $this->log('import', 'Imported in !s s', array('!s' => $this->imported - $this->state[FEEDS_START]), WATCHDOG_INFO);
        module_invoke_all('feeds_after_import', $this);
        unset($this->fetcher_result, $this->state);
      }
      $this->save();
--- 405,411 ----
        $this->imported = time();
        $this->log('import', 'Imported in !s s', array('!s' => $this->imported - $this->state[FEEDS_START]), WATCHDOG_INFO);
        module_invoke_all('feeds_after_import', $this);
+       $this->fetcher_result->deleteFile();
        unset($this->fetcher_result, $this->state);
      }
      $this->save();

Ah ha... there's some new code in there about Rules:

if ($result == FEEDS_BATCH_COMPLETE || isset($e)) {
      $this->imported = time();
      $this->log('import', 'Imported in @s seconds.', array('@s' => $this->imported - $this->state[FEEDS_START]), WATCHDOG_INFO);
      $this->importer->fetcher->afterImport($this);
      module_invoke_all('feeds_after_import', $this);
      if (module_exists('rules')) {
        rules_invoke_event('feeds_after_import', $this);
      }
      unset($this->fetcher_result, $this->state);
    }
    $this->save();
MegaChriz’s picture

I'm a bit worried about the following lines in the patch:

+  // Get current temporary feed file File IDs.
+  $result = db_select('file_managed')
+    ->fields('file_managed', array('fid'))
+    ->condition('uri', db_like('public://feeds/Feeds') . '%' . db_like('Fetcher') . '%', 'LIKE')
+    // We don't want to accidently delete a recently created temporary file,
+    // hence limit to files older than 1 minute.
+    ->condition('timestamp', REQUEST_TIME - 60, '<')
+    ->execute()
+    ->fetchAll();

Well, first on to the tests. The tests in the attached patch cover the following cases:

  • The temporary file should not be removed when the import has not been completed yet, even when six hours have passed.
  • The temporary file should be removed after the import is completed.
  • The temporary file should be removed when the import gets aborted (for example: when unlocking the source).

Since the patch is tests only, it should fail.

Status: Needs review » Needs work

The last submitted patch, 84: feeds-remove_permanent_file_after_import-1029102-84.patch, failed testing.

MegaChriz’s picture

Here is a first step at fixing this issue. The approach is similar to the patch in #73, though excluding the part that cleans up orphaned files from the feeds 'in progress' directory. I'm still thinking about that part. A one-minute gap before considering files orphaned sounds too short to me. I'm considering solutions where a FeedsSource object is passed to the fetcher result, so that the file can be associated with the Feeds source in question. The problem is that this isn't reliable: to keep backwards compatibility, we cannot enforce that a fetcher result object knows which FeedsSource it belongs to.

I expect some failures; otherwise I would still need to add a test for cleaning up orphaned files from the feeds 'in progress' directory.

Status: Needs review » Needs work

The last submitted patch, 86: feeds-remove-permanent-file-after-import-1029102-86.patch, failed testing.

MegaChriz’s picture

Status: Needs review » Needs work

The last submitted patch, 88: feeds-remove-permanent-file-after-import-1029102-88.patch, failed testing.

MegaChriz’s picture

  • Fixed a mistake in FeedsSource::clearStates().
  • Changed FeedsFileHTTPTestCase::setUpMultipleCronRuns() to use 'process in background' instead of periodic import. This is now more reliable since, with the changes from #2630694: Run background jobs directly in queue, initiating a background import no longer imports the first chunk.
  • Added a test for unlocking a feed. When a feed gets unlocked, the temporary file should be removed as well.

Probably still a few test failures.

Status: Needs review » Needs work

The last submitted patch, 90: feeds-remove-permanent-file-after-import-1029102-90.patch, failed testing.

MegaChriz’s picture

The test failure in FeedsFileHTTPTestCase::testTemporaryFileCleanUpAfterImport() is fixed by clearing PHP's file status cache with clearstatcache().

To clean up orphaned files, I've done the following:

  • A column called 'fid' is added to the 'feeds_source' table. In this column the ID of the temporary file will be stored.
  • To the FeedsFetcherResult class a method called setContext() is added. This is used by the FeedsSource class to tell FeedsFetcherResult which Feeds source is using the fetcher result.
  • When the raw data is saved to a file (which happens when serializing the raw data) in FeedsFetcherResult::saveRawToFile(), file usage is added in case it is known to which importer and feed node the fetcher result belongs. If no context is known, the file is saved as a temporary file, just as before. This makes sense when using the Feeds Import Preview module, for example.
  • When saving a feeds source into the database, the file ID that is stored on the fetcher result (if there is one) is saved in the 'feeds_source' table.
  • On cron runs, files owned by Feeds that are older than 6 hours are checked to see whether they are still used by a feeds source. The check is based on the 'fid' column in the 'feeds_source' table. For any of these files that are no longer referenced in the 'feeds_source' table, the Feeds file usage is removed, and the files are then deleted (a standalone sketch of this step follows the list).
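
Restated as a standalone sketch (this is not the patch code; the 6-hour cutoff and the 'feeds' usage entry are taken from the description above, and the queries are simplified):

// File IDs still referenced by a feeds source.
$in_use = db_query('SELECT fid FROM {feeds_source} WHERE fid > 0')->fetchCol();
// Files that Feeds has registered usage for.
$feeds_fids = db_query('SELECT fid FROM {file_usage} WHERE module = :module', array(':module' => 'feeds'))->fetchCol();
foreach (array_diff($feeds_fids, $in_use) as $fid) {
  $file = file_load($fid);
  // Only touch files older than six hours; delete the usage, then the file.
  if ($file && $file->timestamp < REQUEST_TIME - 21600) {
    file_usage_delete($file, 'feeds');
    file_delete($file);
  }
}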

Status: Needs review » Needs work

The last submitted patch, 92: feeds-remove-permanent-file-after-import-1029102-92.patch, failed testing.

MegaChriz’s picture

Oops. The constant 'FILE_STATUS_TEMPORARY' does not exist in D7.

Status: Needs review » Needs work

The last submitted patch, 94: feeds-remove-permanent-file-after-import-1029102-94.patch, failed testing.

MegaChriz’s picture

Mistake in FeedsSource::save(). When saving the 'fid' value, the check should be if $this->fetcher_result is an instance of FeedsFetcherResult, not FeedsFetcher.

Also spread the query in feeds_cron() across multiple lines.

greenSkin’s picture

I was getting "SQLSTATE[22004]: Null value not allowed: 1138 Invalid use of NULL value" when running the 7214 update (adding the "fid" field to the "feeds_source" table). Here's an updated patch to #96 that adds 0 as the field's default value.

MegaChriz’s picture

@greenSkin
Thanks for testing! I'm busy testing/reviewing the patch myself as well and noticed this one too. The 'fid' database column should actually have 'not null' set to FALSE, because in FeedsSource::save() a NULL value can be set on this column. I often get confused about that 'not null' setting on a database column.