I am experiencing problems importing through cron. It seems the import never unlocks and never finishes if the parser (or, apparently, the fetcher) uses progress. I have attached a patch with a proposed solution.

While debugging this, I also noticed a problem when the fetcher uses progress: if the fetcher did not complete on the first run, a new queue item for feeds_feed_import is created, but by the time that item is executed the feed is already locked. As a result the import never finishes and the feed never unlocks.
Code in question (after the above patch), src/Plugin/QueueWorker/FeedProcess.php L48:

```php
if ($feed->progressFetching() !== StateInterface::BATCH_COMPLETE) {
  $this->queueFactory->get('feeds_feed_import:' . $feed->bundle())->createItem($feed);
}
```
This results in the following code running, src/Plugin/QueueWorker/FeedRefresh.php L30:

```php
try {
  $feed->lock();
}
catch (LockException $e) {
  return;
}
$feed->clearStates();
```
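To make the deadlock concrete, here is a condensed sketch of the interaction between the two workers. It is condensed from the two snippets quoted above, with the surrounding class and dependency-injection boilerplate omitted; treat it as an illustration of the failure sequence, not the module's actual source.

```php
<?php
// Illustrative sketch only: condensed from the Feeds queue worker code
// quoted above. Class scaffolding and DI boilerplate are omitted.

// FeedProcess (simplified): this runs while the feed is still LOCKED
// by the import that is currently in progress.
if ($feed->progressFetching() !== StateInterface::BATCH_COMPLETE) {
  // Fetching is unfinished, so a full new import is re-queued.
  // Note that the feed's lock is never released here.
  $this->queueFactory->get('feeds_feed_import:' . $feed->bundle())
    ->createItem($feed);
}

// FeedRefresh (simplified): later picks up that re-queued item.
try {
  // This fails: the lock from the original import is still held.
  $feed->lock();
}
catch (LockException $e) {
  // The queue item is dropped without doing any work, so the fetch
  // never completes and the lock is never released: a permanent
  // deadlock, which is the behaviour described in this issue.
  return;
}
$feed->clearStates();
```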
| Comment | File | Size | Author |
|---|---|---|---|
| #10 | feeds-2803795-10.patch | 3.51 KB | Uhkis |
| #8 | feeds-2803795-8.patch | 3.23 KB | mikran |
| #7 | feeds-2803795-7.patch | 3.3 KB | twistor |
| #5 | feeds-2803795-5.patch | 3.15 KB | twistor |
| #3 | cron_import_does_not-2803795-3.patch | 934 bytes | mian3010 |
Comments
Comment #2
mian3010 (CreditAttribution: mian3010) commented:
Attached proposed solution.
Comment #3
mian3010 (CreditAttribution: mian3010) commented:
And another one, which assumes that the patch from https://www.drupal.org/node/2781279 is applied first.
Comment #4
mian3010 (CreditAttribution: mian3010) commented.

Comment #5
twistor (CreditAttribution: twistor as a volunteer) commented:
I understand the issue, but I don't understand how this fixes it. We absolutely should finish parsing before starting a new fetch. It seems what we need is a new parameter to pass.
Comment #7
twistor (CreditAttribution: twistor as a volunteer) commented:
Oops.
Comment #8
mikran (CreditAttribution: mikran at Mediamaisteri Oy) commented:
The patch didn't apply anymore; here's a reroll.
Comment #9
mikran (CreditAttribution: mikran at Mediamaisteri Oy) commented:
This did not work. Here's my test setup: a fresh install, the latest Feeds plus this patch. I created a feed type using the directory fetcher (as that's the only one that uses batch) and then tried to import nodes from sitemap XML. I created a directory with two sitemap files to get the batch going. The batch starts with cron as it should, but further cron or queue-run commands now process the first file over and over. I set my importer to create new nodes, and each time I run the queue I get the node from one of the files; the second file never gets processed, and the task never finishes.
Comment #10
Uhkis (CreditAttribution: Uhkis at Mediamaisteri Oy) commented:
The attached patch fixes the buggy behaviour described by @mikran. Manually tested with a custom fetcher; it worked just like importing by batch from the UI.
Comment #11
Uhkis (CreditAttribution: Uhkis at Mediamaisteri Oy) commented.

Comment #14
Erik Frèrejean commented:
I've applied #10 and tested with both a custom fetcher and a custom processor (both batched), and at first glance this patch seems to resolve the issue.
Comment #15
MegaChriz (CreditAttribution: MegaChriz as a volunteer) commented:
@Erik Frèrejean: I have also been testing #10. Did you notice too that only a limited number of items gets imported per cron run? Say you have 1000 items to import and have set the limit to 100; does it also take you at least 10 cron runs to import them all?
Comment #16
Erik Frèrejean commented:
@MegaChriz, I've just checked the cron logs and indeed I'm seeing that as well, although quite a bit more extreme: I get one item parsed per cron run.

I'm also seeing something else (I'm not sure whether this happens due to #10, but it doesn't happen when importing from the GUI): it seems the processor is called three times. I've added a log in the `process` callback of my processor to check the contents of the item after all processing logic has run. For every parsed item, the logs first dump the value of `$item` three times, where the only difference is the value of `documentFile`, which contains the UUID of a file generated by the parser (so the file gets overridden twice). The values obviously differ between items, but the structure is the same. After that it dumps the first two UUIDs, because the file target couldn't find the created file with that UUID, which makes sense since only the last one ended up in the database.

[EDIT] Looking at things a bit further, I seem to have found the root of the multiple runs: additional queue items are created for the same process. Going back into the logs, it starts with a single dump as expected; after a certain time (unfortunately I do not have timestamps) a second dump gets added, and some time later a third one arises. This is confirmed in the database by the fact that the queue table now contains three records. All records are exactly the same size, and they contain the same state data.
Comment #17
Erik Frèrejean

Comment #18
Erik Frèrejean

Comment #19
MegaChriz (CreditAttribution: MegaChriz as a volunteer) commented:
@Erik Frèrejean: Okay, then I know I'm probably on the right track. I'm working on a patch to merge the queues 'feeds_feed_import', 'feeds_feed_parse' and 'feeds_feed_process' into one. I have a working version of that locally; I just need to throw out the experimental stuff (which I needed for other features). More details about that later (including why I want to merge these queues), hopefully this Thursday or earlier.
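In outline, merging the three queues means carrying the import stage inside the queue item itself, so one worker owns the lock for the whole fetch/parse/process pipeline. The sketch below is only a guess at what such a worker might look like; every class and method name in it is hypothetical, and the actual design was worked out later in #2869350, not here.

```php
<?php
// Hypothetical sketch of a single merged import queue worker.
// All names (FeedImportWorkerSketch, fetchStep, parseStep, processStep)
// are illustrative inventions, not Feeds module API.

class FeedImportWorkerSketch {

  public function processItem($item) {
    // The stage travels with the queue item, so one queue replaces
    // feeds_feed_import, feeds_feed_parse and feeds_feed_process.
    [$feed, $stage] = $item;

    switch ($stage) {
      case 'fetch':
        // Run one fetch step; stay on 'fetch' until it reports
        // completion, then advance to parsing. The feed stays locked
        // the whole time, because the same worker owns the pipeline.
        $next = $this->fetchStep($feed) ? 'parse' : 'fetch';
        break;

      case 'parse':
        $next = $this->parseStep($feed) ? 'process' : 'parse';
        break;

      case 'process':
        if ($this->processStep($feed)) {
          // Only the final stage releases the lock.
          $feed->unlock();
          return;
        }
        $next = 'process';
        break;
    }

    // Re-queue the next step of the same import.
    $this->queue->createItem([$feed, $next]);
  }

}
```

A design along these lines would avoid the deadlock described earlier in this issue, since no stage ever re-queues a competing item that has to acquire the lock from scratch.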
Comment #20
Erik Frèrejean commented:
That sounds great. If you want me to test something, just let me know; I've got a decent-sized data set of ~35,000 records to handle.
Comment #22
MegaChriz (CreditAttribution: MegaChriz as a volunteer) commented:
#10 seems to work okay for now, so I committed it. I'll add a follow-up for further improvements.
Comment #23
MegaChriz (CreditAttribution: MegaChriz as a volunteer) commented:
Added a follow-up: #2869350: Provide one queue for all import related tasks.