I am experiencing problems importing through cron. It seems the import never unlocks and never finishes if the parser (or, apparently, the fetcher) uses progress. I have attached a patch with a proposed solution.

While debugging this, I also noticed a problem when the fetcher uses progress: if the fetcher did not complete on the first run, a new queue item for feeds_feed_import is created, but by the time that item is executed the feed is already locked. As a result the import never finishes and the feed never unlocks.
Code in question (after the above patch), src/Plugin/QueueWorker/FeedProcess.php L48:

```php
if ($feed->progressFetching() !== StateInterface::BATCH_COMPLETE) {
  $this->queueFactory->get('feeds_feed_import:' . $feed->bundle())->createItem($feed);
}
```
This results in the following code running, src/Plugin/QueueWorker/FeedRefresh.php L30:

```php
try {
  $feed->lock();
}
catch (LockException $e) {
  return;
}
$feed->clearStates();
```
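To make the deadlock concrete, here is a condensed sketch of the interaction between the two workers. It is condensed from the two snippets quoted above, with the surrounding class and dependency-injection boilerplate omitted; treat it as an illustration of the failure sequence, not the module's actual source.

```php
<?php
// Illustrative sketch only: condensed from the Feeds queue worker code
// quoted above. Class scaffolding and DI boilerplate are omitted.

// FeedProcess (simplified): this runs while the feed is still LOCKED
// by the import that is currently in progress.
if ($feed->progressFetching() !== StateInterface::BATCH_COMPLETE) {
  // Fetching is unfinished, so a full new import is re-queued.
  // Note that the feed's lock is never released here.
  $this->queueFactory->get('feeds_feed_import:' . $feed->bundle())
    ->createItem($feed);
}

// FeedRefresh (simplified): later picks up that re-queued item.
try {
  // This fails: the lock from the original import is still held.
  $feed->lock();
}
catch (LockException $e) {
  // The queue item is dropped without doing any work, so the fetch
  // never completes and the lock is never released: a permanent
  // deadlock, which is the behaviour described in this issue.
  return;
}
$feed->clearStates();
```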
| Comment | File | Size | Author |
|---|---|---|---|
| #10 | feeds-2803795-10.patch | 3.51 KB | Uhkis |
| #8 | feeds-2803795-8.patch | 3.23 KB | mikran |
| #7 | feeds-2803795-7.patch | 3.3 KB | twistor |
| #5 | feeds-2803795-5.patch | 3.15 KB | twistor |
| #3 | cron_import_does_not-2803795-3.patch | 934 bytes | mian3010 |
Comments
Comment #2
mian3010 (CreditAttribution: mian3010) commented:
Attached proposed solution.
Comment #3
mian3010 (CreditAttribution: mian3010) commented:
And another one, which assumes that the patch from https://www.drupal.org/node/2781279 is applied first.
Comment #4
mian3010 (CreditAttribution: mian3010) commented.

Comment #5
twistor (CreditAttribution: twistor as a volunteer) commented:
I understand the issue, but I don't understand how this fixes it. We absolutely should finish parsing before starting a new fetch. It seems what we need is a new parameter to pass.
Comment #7
twistor (CreditAttribution: twistor as a volunteer) commented:
Oops.
Comment #8
mikran (CreditAttribution: mikran at Mediamaisteri Oy) commented:
The patch didn't apply anymore; here's a reroll.
Comment #9
mikran (CreditAttribution: mikran at Mediamaisteri Oy) commented:
This did not work. Here's my test setup: a fresh install, the latest Feeds plus this patch. I created a feed type using the directory fetcher (as that's the only one that uses batch) and then tried to import nodes from sitemap XML. I created a directory with two sitemap files to get the batch going. The batch starts with cron as it should, but further cron or queue-run commands now process the first file over and over. I set my importer to create new nodes, and each time I run the queue I get the node from one of the files; the second file never gets processed, and the task never finishes.
Comment #10
Uhkis (CreditAttribution: Uhkis at Mediamaisteri Oy) commented:
The attached patch fixes the buggy behaviour described by @mikran. Manually tested with a custom fetcher; it worked just like importing by batch from the UI.
Comment #11
Uhkis (CreditAttribution: Uhkis at Mediamaisteri Oy) commented.

Comment #14
Erik Frèrejean commented:
I've applied #10 and tested with both a custom fetcher and a custom processor (both batched), and at first glance this patch seems to resolve the issue.
Comment #15
MegaChriz (CreditAttribution: MegaChriz as a volunteer) commented:
@Erik Frèrejean: I have also been testing #10. Did you notice too that only a limited number of items gets imported per cron run? Say you have 1000 items to import and have set the limit to 100; does it also take you at least 10 cron runs to import them all?
Comment #16
Erik Frèrejean commented:
@MegaChriz, I've just checked the cron logs and indeed I'm seeing that as well, although quite a bit more extreme: I get one item parsed per cron run.

I'm also seeing something else (I'm not sure whether this happens due to #10, but it doesn't happen when importing from the GUI): it seems the processor is called three times. I've added a log in the `process` callback of my processor to check the contents of the item after all processing logic has run. For every parsed item, the logs first dump the value of `$item` three times, where the only difference is the value of `documentFile`, which contains the UUID of a file generated by the parser (so the file gets overridden twice). The values obviously differ between items, but the structure is the same. After that it dumps the first two UUIDs, because the file target couldn't find the created file with that UUID, which makes sense since only the last one ended up in the database.

[EDIT] Looking at things a bit further, I seem to have found the root of the multiple runs: additional queue items are created for the same process. Going back into the logs, it starts with a single dump as expected; after a certain time (unfortunately I do not have timestamps) a second dump gets added, and some time later a third one arises. This is confirmed in the database by the fact that the queue table now contains three records. All records are exactly the same size, and they contain the same state data.
Comment #17
Erik Frèrejean

Comment #18
Erik Frèrejean

Comment #19
MegaChriz (CreditAttribution: MegaChriz as a volunteer) commented:
@Erik Frèrejean: Okay, then I know I'm probably on the right track. I'm working on a patch to merge the queues 'feeds_feed_import', 'feeds_feed_parse' and 'feeds_feed_process' into one. I have a working version of that locally; I just need to throw out the experimental stuff (which I needed for other features). More details about that later (including why I want to merge these queues), hopefully this Thursday or earlier.
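In outline, merging the three queues means carrying the import stage inside the queue item itself, so one worker owns the lock for the whole fetch/parse/process pipeline. The sketch below is only a guess at what such a worker might look like; every class and method name in it is hypothetical, and the actual design was worked out later in #2869350, not here.

```php
<?php
// Hypothetical sketch of a single merged import queue worker.
// All names (FeedImportWorkerSketch, fetchStep, parseStep, processStep)
// are illustrative inventions, not Feeds module API.

class FeedImportWorkerSketch {

  public function processItem($item) {
    // The stage travels with the queue item, so one queue replaces
    // feeds_feed_import, feeds_feed_parse and feeds_feed_process.
    [$feed, $stage] = $item;

    switch ($stage) {
      case 'fetch':
        // Run one fetch step; stay on 'fetch' until it reports
        // completion, then advance to parsing. The feed stays locked
        // the whole time, because the same worker owns the pipeline.
        $next = $this->fetchStep($feed) ? 'parse' : 'fetch';
        break;

      case 'parse':
        $next = $this->parseStep($feed) ? 'process' : 'parse';
        break;

      case 'process':
        if ($this->processStep($feed)) {
          // Only the final stage releases the lock.
          $feed->unlock();
          return;
        }
        $next = 'process';
        break;
    }

    // Re-queue the next step of the same import.
    $this->queue->createItem([$feed, $next]);
  }

}
```

A design along these lines would avoid the deadlock described earlier in this issue, since no stage ever re-queues a competing item that has to acquire the lock from scratch.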
Comment #20
Erik Frèrejean commented:
That sounds great. If you want me to test something, just let me know; I've got a decent-sized data set of ~35,000 records to handle.
Comment #22
MegaChriz (CreditAttribution: MegaChriz as a volunteer) commented:
#10 seems to work okay for now, so I committed it. I'll add a follow-up for further improvements.
Comment #23
MegaChriz (CreditAttribution: MegaChriz as a volunteer) commented:
Added a follow-up: #2869350: Provide one queue for all import related tasks.