Recently, we noticed that our workflow (a simplified version is attached) sometimes gets stuck.

After some digging, here's what I found in maestro_engine_version1.class.php:

1. Most of the time, our workflow spawns one or more child processes with new process IDs.

2. Some pending tasks have their process IDs changed in the middle of an execution cycle invoked from cleanQueue().

3. After such an execution changes the process ID of a queue record, nextStep() still uses the old process ID to look up the next task (around line 424 of the class file). The same old process ID is also used when inserting a new task into the queue (around line 473).

```php
$query = db_select('maestro_queue', 'a');

if ($this->_lastTestStatus == MaestroTaskStatusCodes::STATUS_IF_CONDITION_FALSE) {
  $query->addField('b', 'template_data_to_false', 'taskid');
}
else {
  $query->addField('b', 'template_data_to', 'taskid');
}

$query->fields('c', array('task_class_name', 'is_interactive', 'show_in_detail', 'reminder_interval'));
$query->join('maestro_template_data_next_step', 'b', 'a.template_data_id = b.template_data_from');

if ($this->_lastTestStatus == MaestroTaskStatusCodes::STATUS_IF_CONDITION_FALSE) {
  $query->join('maestro_template_data', 'c', 'c.id = b.template_data_to_false');
}
else {
  $query->join('maestro_template_data', 'c', 'c.id = b.template_data_to');
}

// The old process ID ($this->_processId) is used here even if the queue record's
// process ID was just changed during this execution cycle. The lookup then fails,
// leaving the current task without a next task, so the workflow gets stuck.
$query->condition('a.process_id', $this->_processId, '=');
$query->condition('a.id', $this->_queueId, '=');

$nextTaskResult = $query->execute();
$nextTaskRows = $query->countQuery()->execute()->fetchField();
```
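To make the failure mode concrete, here is a minimal PHP sketch that applies the same two conditions as the query above (a.process_id and a.id) to an in-memory stand-in for a maestro_queue row. The array layout, the IDs, and the helper find_queue_rows() are hypothetical; this does not use Drupal's database API, it only mimics the filter.

```php
<?php
// Hypothetical in-memory stand-in for maestro_queue rows; the IDs are made up
// and the keys only mirror the columns used by the conditions above.
$queue = [
  ['id' => 7, 'process_id' => 101, 'template_data_id' => 3], // "B-1" after re-assignment
];

// Mirrors the two conditions in the excerpt:
// a.process_id = <process ID> AND a.id = <queue ID>.
function find_queue_rows(array $queue, int $processId, int $queueId): array {
  return array_values(array_filter(
    $queue,
    fn(array $row): bool => $row['process_id'] === $processId && $row['id'] === $queueId
  ));
}

$queueId = 7;
$cachedProcessId = 100;  // What the engine's _processId still holds.
$currentProcessId = 101; // What the queue record now holds.

$rows = find_queue_rows($queue, $cachedProcessId, $queueId);
echo count($rows), PHP_EOL;                                                 // 0
echo count(find_queue_rows($queue, $currentProcessId, $queueId)), PHP_EOL; // 1
```

With the stale process ID the row is filtered out, even though the queue ID alone would have matched it.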

It looks like a bug to me, but I'm not familiar enough with Maestro to tell whether it actually is one.

Please take a look.

Thanks!

Harry

Attachment: simplified-flow.png (228.03 KB), posted by hguo

Comments

hguo

Version: 7.x-1.x-dev » 7.x-1.5
_randy

Ahh, this could be because, for every task where a loop-back happens, you need to make sure that "Regenerate All In-production Tasks" is also checked.

When a process loops back over itself like yours does, especially when there are parallel branches, the engine needs to know that on a loop-back it must recreate the tasks that are still waiting to be completed. AND tasks are an especially "interesting" case: the engine should re-create not only the AND task itself, but also the completed feeder tasks that lead into it.

Anyhow, try setting the regeneration flags on the tasks marked with a red dot.

hguo

_randy,

I appreciate the quick response on this.

One of our former team members actually suggested that the "Regenerate This Task" and/or "Regenerate All In-production Tasks" options might have something to do with the workflow getting stuck. I noticed that you only mentioned "Regenerate All In-production Tasks". I don't know whether it's relevant, but I suspect our problem is related to having both options checked on some tasks in our workflow.

From reading the code: in nextStep(), when a task has "Regenerate This Task" checked, a new process is created, and when a new process is created, "Regenerate All In-production Tasks" kicks in. So the sequence could go like this (elaborating on what I described in my first post):

  • Say we have task "A-1" followed by task "A-2" in one branch, and "A-2" has both "Regenerate This Task" and "Regenerate All In-production Tasks" checked (our workflow actually has this).
  • When "A-1" is executed, there is usually a task in the other branch executed in the same cycle, say "B-1". It is followed by "B-2", which should be found as the next step of "B-1" and run in the next execution cycle.
  • The execution cycle starts with the old process ID, say 100.
  • Suppose "A-1" is executed first. nextStep() is then called, and it finds "A-2".
  • Since "A-2" has "Regenerate This Task" checked, it triggers a call to newProcess().
  • In newProcess(), the engine finds that "A-2" also has "Regenerate All In-production Tasks" checked. This triggers the code that updates all remaining (live) tasks with a new process ID, including "B-1". Say "B-1" now has process ID 101.
  • With "A-1" done, "B-1" is executed in the same cycle, and then its next step is looked up.
  • Because this is still the same execution cycle, the engine object's _processId property is still 100, while "B-1" now has process ID 101.
  • The query posted earlier filters on both the queue ID and the process ID. With process ID 100 in the condition, the lookup returns no rows, so no next step is found for "B-1". (Had the process ID not changed, "B-2" would have been found as the next step of "B-1".)
  • The result: "B-1" is executed, but "B-2" is never found as its next step, so this branch never proceeds.
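The sequence above can be condensed into a small PHP simulation. This is a sketch under stated assumptions: EngineSketch, findNext(), reassignLiveTasks() and runCycle() are hypothetical stand-ins, not Maestro's real classes or methods; "done" marks a task already executed in this cycle; and the new process ID is simply the old one plus one. It only models the order of events described: finding "A-2" reassigns the live tasks to a new process, while the engine keeps its cached process ID when it looks up the next step of "B-1".

```php
<?php
// Hypothetical model of the scenario above. Names and data layout are
// illustrative only; they are not Maestro's actual classes or schema.
final class EngineSketch {
  /** Queue rows keyed by queue ID: ['process_id' => int, 'task' => string, 'done' => bool]. */
  public array $queue = [];
  /** Next-step map (task => next task), standing in for maestro_template_data_next_step. */
  public array $nextStep = ['A-1' => 'A-2', 'B-1' => 'B-2'];
  /** Tasks with both regeneration options checked. */
  public array $regenerate = ['A-2' => true];
  /** Execution log entries: "task -> next task" (or "none"). */
  public array $log = [];
  /** Cached at the start of the cycle, like the engine's _processId. */
  public int $processId;

  public function __construct(int $processId) {
    $this->processId = $processId;
  }

  // Like the quoted query: match on the cached process ID AND the queue ID.
  private function findNext(int $queueId): ?string {
    $row = $this->queue[$queueId] ?? null;
    if ($row === null || $row['process_id'] !== $this->processId) {
      return null; // Stale process ID: the lookup yields no rows.
    }
    return $this->nextStep[$row['task']] ?? null;
  }

  // Stand-in for "Regenerate All In-production Tasks": every task not yet
  // executed in this cycle is moved to the new process ID.
  private function reassignLiveTasks(int $newProcessId): void {
    foreach ($this->queue as $id => $row) {
      if (!$row['done']) {
        $this->queue[$id]['process_id'] = $newProcessId;
      }
    }
  }

  public function runCycle(array $queueIds): void {
    foreach ($queueIds as $queueId) {
      $this->queue[$queueId]['done'] = true; // "Execute" the task.
      $next = $this->findNext($queueId);
      $this->log[] = $this->queue[$queueId]['task'] . ' -> ' . ($next ?? 'none');
      if ($next !== null && !empty($this->regenerate[$next])) {
        // A new process is created, but $this->processId is not refreshed.
        $this->reassignLiveTasks($this->processId + 1);
      }
    }
  }
}

$engine = new EngineSketch(100);
$engine->queue = [
  1 => ['process_id' => 100, 'task' => 'A-1', 'done' => false],
  2 => ['process_id' => 100, 'task' => 'B-1', 'done' => false],
];
$engine->runCycle([1, 2]);
print_r($engine->log); // A-1 -> A-2, then B-1 -> none
```

In this model, the lookup for "B-1" fails only because findNext() compares the row against the cached process ID; "B-2" is never reached, which matches the stuck branch described above.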

I hope I've described it clearly. This reflects my current understanding of Maestro, but again, I haven't spent much time with it yet, so I could be wrong. I'll keep reading the code and hopefully understand it better.

Thanks again for your help!

Harry

_randy

Thanks for the update, Harry. I'm not saying it isn't a bug, but I would have expected us to see this before, given all the Maestro processes out there, some of them far more complex than the one in your screenshot.

In essence, you need both checkboxes checked; for regeneration, you can't use one without the other anyway.

That being said, I know I don't have your custom functions and so on, but could you export your workflow and email it to me through the Drupal user contact form (click my username)?