I am experiencing problems when importing through cron: the import never finishes and the feed never unlocks if the parser (or, it seems, the fetcher) uses progress.

I have attached a patch with a proposed solution.

While debugging this, I also noticed a problem when the fetcher uses progress:
if the fetcher does not complete on the first run, a new queue item for feeds_feed_import is created, but by the time that item is executed the feed is already locked. This in turn means the import never finishes and the feed never unlocks.

Code in question (with the above patch applied):
src/Plugin/QueueWorker/FeedProcess.php L48

    // Fetching has not completed yet, so the feed is queued again for
    // another import pass.
    if ($feed->progressFetching() !== StateInterface::BATCH_COMPLETE) {
      $this->queueFactory->get('feeds_feed_import:' . $feed->bundle())->createItem($feed);
    }

Which results in the following code running:
src/Plugin/QueueWorker/FeedRefresh.php L30

    try {
      $feed->lock();
    }
    catch (LockException $e) {
      // The feed is still locked from the first run, so the re-queued item
      // is silently dropped here and the import never continues (nor does
      // the feed ever get unlocked).
      return;
    }

    $feed->clearStates();
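
For illustration, here is a minimal sketch of the kind of guard that would avoid this deadlock. This is not necessarily what the attached patch does, and the $is_continuation parameter is hypothetical (it would have to travel with the queue item somehow):

    use Drupal\feeds\Exception\LockException;
    use Drupal\feeds\FeedInterface;

    /**
     * Sketch: only lock on the first pass; treat a locked feed as a
     * continuation instead of bailing out.
     */
    function refresh_feed(FeedInterface $feed, $is_continuation = FALSE) {
      if (!$is_continuation) {
        try {
          $feed->lock();
        }
        catch (LockException $e) {
          // Another import really is in progress; skip this item.
          return;
        }
        // Only reset state on a fresh import, never on a continuation,
        // otherwise the progress made so far is thrown away.
        $feed->clearStates();
      }
      // ... fetch, parse and process as usual ...
    }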


Comments

mian3010 created an issue. See original summary.

mian3010’s picture

Attached a proposed solution.

mian3010’s picture

And another one that assumes the patch from https://www.drupal.org/node/2781279 is applied first.

mian3010’s picture

twistor’s picture

Status: Active » Needs review
FileSize: 3.15 KB

I understand the issue, but I don't understand how this fixes it. We absolutely should finish parsing before starting a new fetch.

It seems that what we need is a new parameter to pass.
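
For example, something along these lines (sketch only; the array payload and the 'continue' key are assumptions, not the module's actual queue item format):

    // Pass a flag with the queue item so the worker knows this is a
    // continuation of a fetch already in progress and must not re-lock
    // the feed or clear its states.
    $this->queueFactory->get('feeds_feed_import:' . $feed->bundle())->createItem([
      'feed' => $feed,
      'continue' => TRUE,
    ]);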

Status: Needs review » Needs work

The last submitted patch, 5: feeds-2803795-5.patch, failed testing.

twistor’s picture

Status: Needs work » Needs review
FileSize: 3.3 KB

oops

mikran’s picture

Patch didn't apply anymore. Here's a reroll.

mikran’s picture

Status: Needs review » Needs work

This did not work. Here's my test setup: fresh install, latest Feeds + this patch.

I created a feed type using the directory fetcher (as that's the only one that uses batch) and then tried to import nodes from sitemap XML.

I created a directory with 2 sitemap files to get the batch going. The batch starts with cron as it should, but further cron or queue-run commands now process the first file over and over. I set my importer to create new nodes, and each time I run the queue I get the node from one of the files; the second file never gets processed, and the task never finishes.

Uhkis’s picture

The attached patch fixes the buggy behaviour described by @mikran. Manually tested with a custom fetcher; it worked just like importing by batch from the UI.

Uhkis’s picture

Status: Needs work » Needs review

The last submitted patch, 2: cron_import_does_not-2803795-2.patch, failed testing.

The last submitted patch, 3: cron_import_does_not-2803795-3.patch, failed testing.

Erik Frèrejean’s picture

I've applied #10 and tested with both a custom fetcher and a custom processor (both batched), and at first glance this patch seems to resolve the issue.

MegaChriz’s picture

@Erik Frèrejean
I have also been testing #10. Did you also notice that only a limited number of items get imported per cron run? Say you have 1000 items to import and have set the limit to 100: does it also take you at least 10 cron runs to import them all?

Erik Frèrejean’s picture

@MegaChriz, I've just checked the cron logs and indeed I'm seeing that as well, although quite a bit more extreme: I get 1 item parsed per cron run.

I'm also seeing something else (I'm not sure whether this happens due to #10, but it doesn't happen when importing from the GUI): it seems the processor gets called three times. I've added a log statement in the process callback of my processor to check the contents of the item after all processing logic has run. The logs contain (for every parsed item):

object(Drupal\feeds\Result\ParserResult)#8518 (2) {
  ["flags":"SplDoublyLinkedList":private]=>
  int(0)
  ["dllist":"SplDoublyLinkedList":private]=>
  array(1) {
    [0]=>
    object(Drupal\rechtspraak_nl_base\Feeds\Item\RechtspraakNLItem)#8512 (8) {
      ["ecli":protected]=>
      string(23) "ECLI:NL:RBMNE:2017:1635"
      ["ecliPermaLink":protected]=>
      string(67) "http://deeplink.rechtspraak.nl/uitspraak?id=ECLI:NL:RBMNE:2017:1635"
      ["documentAuthor":protected]=>
      string(26) "Rechtbank Midden-Nederland"
      ["documentFile":protected]=>
      string(36) "01565d8f-770d-4c2c-b854-c3b9e09dffd5"
      ["modified":protected]=>
      NULL
      ["instantie":protected]=>
      string(26) "Rechtbank Midden-Nederland"
      ["issued":protected]=>
      int(1491208091)
      ["documentTitle":protected]=>
      string(23) "ECLI:NL:RBMNE:2017:1635"
    }
  }
}
object(Drupal\feeds\Result\ParserResult)#3825 (2) {
  ["flags":"SplDoublyLinkedList":private]=>
  int(0)
  ["dllist":"SplDoublyLinkedList":private]=>
  array(1) {
    [0]=>
    object(Drupal\rechtspraak_nl_base\Feeds\Item\RechtspraakNLItem)#3828 (8) {
      ["ecli":protected]=>
      string(23) "ECLI:NL:RBMNE:2017:1635"
      ["ecliPermaLink":protected]=>
      string(67) "http://deeplink.rechtspraak.nl/uitspraak?id=ECLI:NL:RBMNE:2017:1635"
      ["documentAuthor":protected]=>
      string(26) "Rechtbank Midden-Nederland"
      ["documentFile":protected]=>
      string(36) "488cc671-b62f-49e4-b477-0e5c2c6b4745"
      ["modified":protected]=>
      NULL
      ["instantie":protected]=>
      string(26) "Rechtbank Midden-Nederland"
      ["issued":protected]=>
      int(1491208098)
      ["documentTitle":protected]=>
      string(23) "ECLI:NL:RBMNE:2017:1635"
    }
  }
}
object(Drupal\feeds\Result\ParserResult)#4247 (2) {
  ["flags":"SplDoublyLinkedList":private]=>
  int(0)
  ["dllist":"SplDoublyLinkedList":private]=>
  array(1) {
    [0]=>
    object(Drupal\rechtspraak_nl_base\Feeds\Item\RechtspraakNLItem)#4243 (8) {
      ["ecli":protected]=>
      string(23) "ECLI:NL:RBMNE:2017:1635"
      ["ecliPermaLink":protected]=>
      string(67) "http://deeplink.rechtspraak.nl/uitspraak?id=ECLI:NL:RBMNE:2017:1635"
      ["documentAuthor":protected]=>
      string(26) "Rechtbank Midden-Nederland"
      ["documentFile":protected]=>
      string(36) "e81d830f-fd38-47df-9908-5cbdbd345100"
      ["modified":protected]=>
      NULL
      ["instantie":protected]=>
      string(26) "Rechtbank Midden-Nederland"
      ["issued":protected]=>
      int(1491208099)
      ["documentTitle":protected]=>
      string(23) "ECLI:NL:RBMNE:2017:1635"
    }
  }
}
 [notice] Cron run completed.
 [notice] Cron run successful.
 [error]  The file, 01565d8f-770d-4c2c-b854-c3b9e09dffd5, failed to save because the extension, 1565d8f-770d-4c2c-b854-c3b9e09dffd5, is invalid. 
 [error]  The file, 488cc671-b62f-49e4-b477-0e5c2c6b4745, failed to save because the extension, 88cc671-b62f-49e4-b477-0e5c2c6b4745, is invalid. 

The values obviously differ, but the structure is the same. First it dumps the value of $item three times, where the only difference is the value of "documentFile", which contains the UUID of a file generated by the parser (so the file gets overwritten twice). After that it logs the first two UUIDs as errors because the file target couldn't find the created file with that UUID, which makes sense because only the last one ended up in the database.

[EDIT]
Looking a bit further, I seem to have found the root of the multiple runs: additional queue items are being created for the same process. If I go back through the logs, it starts with a single dump as expected; after a certain time (unfortunately I do not have timestamps) a second dump gets added, and some time later a third one arises. This is confirmed in the database by the fact that the queue table now contains three records.

All records are exactly the same size and contain the same state data.
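
For anyone wanting to verify this on their own install, counting the queued items per queue name confirms it (sketch only; the {queue} table is Drupal core's default queue backend):

    // Count queued items per queue name; duplicates show up as a count > 1.
    $counts = \Drupal::database()
      ->query('SELECT name, COUNT(*) AS cnt FROM {queue} GROUP BY name')
      ->fetchAllKeyed();
    print_r($counts);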

Erik Frèrejean’s picture

Issue summary: View changes
FileSize: 33.83 KB
Erik Frèrejean’s picture

MegaChriz’s picture

Assigned: Unassigned » MegaChriz

@Erik Frèrejean
Okay, then I know I'm probably on the right track: I'm working on a patch to merge the queues 'feeds_feed_import', 'feeds_feed_parse' and 'feeds_feed_process' into one. I have a working version of that locally; I just need to strip out the experimental stuff (which I needed for other features). More details about that later (including why I want to merge these queues), hopefully this Thursday or earlier.
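
To sketch the idea (purely illustrative; the queue id and the stage marker are assumptions, and the actual implementation may end up looking different): a single queue item would carry the current stage along with the feed, so a continuation can resume at the right place instead of restarting the import:

    // Illustrative only: one queue for the whole import, with the current
    // stage ('fetch', 'parse' or 'process') stored in the item payload so
    // a requeued item resumes where it left off instead of restarting.
    $queue = \Drupal::queue('feeds_feed_refresh:' . $feed->bundle());
    $queue->createItem(['feed' => $feed, 'stage' => 'parse']);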

Erik Frèrejean’s picture

That sounds great; if you want me to test something, just let me know. I've got a decent-sized data set of ~35,000 records to handle.

  • MegaChriz committed 2e4908d on 8.x-3.x authored by Uhkis
    Issue #2803795 by mian3010, twistor, mikran, Uhkis: Fixed periodic...
MegaChriz’s picture

Status: Needs review » Fixed

#10 seems to work okay for now, so I committed that one. I'll add a follow-up for more improvements.

MegaChriz’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.