When doing large imports, the progress bar bounces back and forth many times before reaching 100%. I guess this is to do with the batch size. Is it possible to make the progress bar more representative of the actual progress of the import?

Many thanks,

Comments

qdoscc created an issue.

megachriz’s picture

We talked about this in Slack, here is an export of our conversation there.

From Drupal Slack #support channel, 2019-12-03

David.qdoscc Good morning. Has anyone else noticed when doing a large import using Feeds that the progress bar goes backwards and forwards multiple times? I have seen this behaviour on several D8 sites. Any ideas how to make it progress smoothly from 0 to 100%?
megachriz @David.qdoscc I've noticed it, the reason is that Feeds is processing multiple separate tasks and each task gets its own progress bar. At the start of an import, the number of tasks is unknown.
David.qdoscc I figured it was something like that. So no quick fix at the moment?
megachriz Workflow of Feeds:
  1. Fetch data.
  2. Parse the first 50 items of data.
  3. Process 50 items.
  4. Parse the next 50 items of data.
  5. Process 50 items.
  6. When there's nothing more to parse, Feeds checks if there's more to fetch. If so, the above steps are repeated. If not, the import finishes.
megachriz @David.qdoscc I'm not sure yet how to make this better.
David.qdoscc My users will be importing a quarterly list of 11,000+ items - some are new, some are updates to existing nodes. At the moment you get the bar going back and forth for several minutes which gives the impression of it being stuck in a loop
David.qdoscc Was it different in D7?
megachriz The alternative might be to parse everything first, but then you might get a lot of data to keep in memory. In this case, I expect PHP would complain sooner that memory is exhausted.
David.qdoscc hmm. can we do a count of rows in the CSV without fully parsing it?
megachriz For CSV, we could do a count, but Feeds supports more data types than CSV.
megachriz I think in D7 an import was defined as one task, while in D8 it is split into multiple tasks.
David.qdoscc ok good to know there's a possibility - most of my use cases are still CSV. Can we override the behaviour in a custom module or would we need to make a patch for feeds itself?
megachriz I think overriding it would be a good idea to do first, to look for ways it could be done. I've been working on refactoring the import task behavior, to streamline the different ways an import can run, which are:
  - In the UI (Batch API)
  - On cron runs
  - All at once
  - Pushed import (Pubsubhubbub)
[#2811429]
Since I plan to commit this in a few weeks (if no issues arise), it would be a good idea to base your overrides on this patch. I also haven't tested yet how easily it can be overridden, so that would be good for me to know as well before committing it.
megachriz
  /**
   * Report progress as float between 0 and 1. 1 = FEEDS_BATCH_COMPLETE.
   */
  public function progressImporting() {
    $fetcher = $this->state(FEEDS_FETCH);
    $parser = $this->state(FEEDS_PARSE);
    if ($fetcher->progress == FEEDS_BATCH_COMPLETE && $parser->progress == FEEDS_BATCH_COMPLETE) {
      return FEEDS_BATCH_COMPLETE;
    }
    // Fetching envelops parsing.
    // @todo: this assumes all fetchers neatly use total. May not be the case.
    $fetcher_fraction = $fetcher->total ? 1.0 / $fetcher->total : 1.0;
    $parser_progress = $parser->progress * $fetcher_fraction;
    $result = $fetcher->progress - $fetcher_fraction + $parser_progress;
    if ($result == FEEDS_BATCH_COMPLETE) {
      return 0.99;
    }
    return $result;
  }

This is what Feeds in D7 does to get only a single progress bar. As the @todo states, this may not give adequate results in some situations.
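
To make the arithmetic in the D7 method easier to follow, here is a standalone restatement of the same formula with plain numbers instead of Feeds state objects. The function name `combined_progress()`, the constant `BATCH_COMPLETE`, and the argument list are hypothetical and exist only for this illustration. It is a sketch of the calculation, not a replacement for the method above.

```php
<?php
// Hypothetical stand-in for FEEDS_BATCH_COMPLETE, for illustration only.
const BATCH_COMPLETE = 1.0;

/**
 * Restates the D7 progress formula using plain numbers.
 *
 * @param float $fetcher_progress Fraction of fetches done (0 to 1).
 * @param int $fetcher_total Number of fetches expected (0 if unknown).
 * @param float $parser_progress Fraction of the current fetch parsed (0 to 1).
 */
function combined_progress(float $fetcher_progress, int $fetcher_total, float $parser_progress): float {
  if ($fetcher_progress == BATCH_COMPLETE && $parser_progress == BATCH_COMPLETE) {
    return BATCH_COMPLETE;
  }
  // Each fetch owns an equal slice of the bar; parsing fills the current slice.
  $fetcher_fraction = $fetcher_total ? 1.0 / $fetcher_total : 1.0;
  $result = $fetcher_progress - $fetcher_fraction + $parser_progress * $fetcher_fraction;
  // Do not report "complete" unless both stages really are complete.
  return $result == BATCH_COMPLETE ? 0.99 : $result;
}

// Two of four fetches done, current fetch half parsed:
// 0.5 - 0.25 + 0.5 * 0.25 = 0.375.
$example = combined_progress(0.5, 4, 0.5);
```

With a single fetch (`$fetcher_total` of 1), the result reduces to the parser's own progress, which is the case where this formula behaves best.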

bkosborne’s picture

I noticed this too. Part of the problem is that Feeds is not updating the message when it starts processing a new set. If it just updated the message to indicate the processing status, people would not be as confused.

seanr’s picture

It still does this - it is disconcerting as it appears as though it's not working correctly. When we know the file imported is a CSV, we should get the total rows from that so the progress bar is accurate. I'd love to know how long I really have before this thing is finally done (isn't that the whole point of a progress bar?). 😉
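
As a rough illustration of the "count the CSV rows first" idea, the sketch below streams through a file once and counts records without keeping them in memory. `count_csv_rows()` is a hypothetical helper, not part of Feeds, and it assumes a comma-separated file with double-quote enclosures. It still reads the whole file, so the saving is in memory rather than I/O.

```php
<?php
/**
 * Counts data rows in a CSV file one record at a time.
 *
 * Hypothetical helper for illustration; not part of Feeds.
 */
function count_csv_rows(string $path, bool $has_header = TRUE): int {
  $handle = fopen($path, 'r');
  if ($handle === FALSE) {
    throw new RuntimeException("Cannot open $path");
  }
  $rows = 0;
  // fgetcsv() copes with quoted fields that contain line breaks, which a
  // plain newline count would get wrong.
  while (($record = fgetcsv($handle, NULL, ',', '"', '\\')) !== FALSE) {
    // A blank line is returned as an array holding a single NULL; skip it.
    if ($record === [NULL]) {
      continue;
    }
    $rows++;
  }
  fclose($handle);
  return $has_header ? max(0, $rows - 1) : $rows;
}
```

A total obtained this way would only apply to CSV sources. As noted earlier in the thread, Feeds supports other data types where no such cheap count exists.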

megachriz’s picture

Closed #3356144: Import Progress Bar - for All items as a duplicate.

In there, I commented:

The reason is that each stage of the import happens in its own batch. So you get a series of batches.

This is the import workflow, each step has its own batch:

  1. Fetch data
  2. Parse data
  3. Process a number of items
  4. If there's more to parse, repeat steps 2 and 3.
  5. If there's more to fetch, repeat steps 1 through 4.

Another issue is that Feeds cannot accurately predict how many items will be processed in the end. But I think this is mostly an issue when multiple fetches happen, because a second fetched file could contain 10 items, but it could also contain 2000 items. In Feeds itself, only the directory fetcher does multiple fetches.
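
To give a feel for how many separate stages the workflow above produces, here is a small model of it for a single fetched source. Everything in it is hypothetical (`import_stages()`, `overall_progress()`, and the chunk size of 50 taken from the Slack conversation), and it assumes the total item count is known up front, which is exactly what Feeds cannot guarantee.

```php
<?php
// Hypothetical model of the workflow; none of these names exist in Feeds.

/**
 * Yields the stage names for one fetched source, in the order they would run.
 */
function import_stages(int $items, int $chunk_size = 50): Generator {
  yield 'fetch';
  for ($offset = 0; $offset < $items; $offset += $chunk_size) {
    yield 'parse';
    yield 'process';
  }
}

/**
 * Overall fraction done, usable only when the total is known up front.
 */
function overall_progress(int $processed, int $total): float {
  return $total > 0 ? min(1.0, $processed / $total) : 1.0;
}

// 11,000 items in chunks of 50: one fetch plus 220 parse/process pairs.
$stage_count = iterator_count(import_stages(11000));
```

If each of those stages restarts the bar, a user importing 11,000 items would see it reset hundreds of times, whereas a single fraction of processed items over a known total would only ever move forward.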

I would like to see this issue fixed, but because I have enough other Feeds tasks that I consider to be a higher priority, I don't think I will pick this one up myself in the near future.

adstokoe’s picture

Any updates on this issue? I've set up a custom fetcher to run on a paged API. The batch process does complete; however, the bouncing bar makes the process appear broken in the UI.

megachriz’s picture

@adstokoe
It's bad UX, but it doesn't break functionality. And I think it's a hard one to fix. But because it doesn't break functionality, this issue is quite low on my priority list. My priority right now is to first review and commit things that others have worked on recently. Then finish #3372368: in_progress filesystem grows indefinitely, create a new release, and then get Feeds and related modules Drupal 11 compatible. After that, hopefully I'm able to resolve the remaining stable release blockers to finally create an official stable release. And after all that, there might be room for me to address this issue, though there are also enough other issues that I think would have higher priority than this one. These are listed on #1960800: [meta] Feeds 8.x roadmap.

So in other words, I don't see this issue getting addressed in the near future and probably not in the next two years. At least not by me.

bburg’s picture

I came across this issue while looking into a separate problem I'm having with the Feeds Batch API process, and just had a couple of thoughts. The Batch API provides the ability to send a message to the user. That message right now is pretty generic. Could we just use that to inform the user of what steps are being done? Looking at FeedsExecutable::processItem(), I don't see message used, but I do see 6 steps:

  1. Begin
  2. Fetch
  3. Parse
  4. Process
  5. Clean
  6. Finish

So just set a description of that as the message. e.g. "Preparing to start import", "Fetching data", "Parsing data", "Reticulating splines" etc.
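
A minimal sketch of that idea is a lookup from stage to message. The stage keys below mirror the six steps listed above, but the constant `STAGE_MESSAGES`, the function `stage_message()`, and the exact wording are hypothetical and not taken from Feeds. In a Batch API operation callback the returned string could be assigned to `$context['message']`; whether the batch context is available at that point in FeedsExecutable is an assumption that would need checking.

```php
<?php
// Hypothetical mapping from import stage to a user-facing batch message.
// Names and wording are illustrative only; they are not part of Feeds.
const STAGE_MESSAGES = [
  'begin' => 'Preparing to start import',
  'fetch' => 'Fetching data',
  'parse' => 'Parsing data',
  'process' => 'Processing items',
  'clean' => 'Cleaning up',
  'finish' => 'Finishing import',
];

/**
 * Returns a message for the given stage, with an optional running count.
 */
function stage_message(string $stage, ?int $processed = NULL): string {
  // Fall back to a generic message for any stage not listed above.
  $message = STAGE_MESSAGES[$stage] ?? 'Importing';
  if ($stage === 'process' && $processed !== NULL) {
    $message .= " ($processed items so far)";
  }
  return $message;
}
```

Including a running count in the "process" message would give users a sign of forward movement even while the bar itself keeps resetting.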

If we did want an accurate count of the full process, that in itself could be its own step, just to get the count, and perhaps that could be conditionally enabled via a config on the Feed type as well?