Node revision status can easily become out of sync with moderation history status. [#2314907]

Our users have been reporting content that does not get published correctly.
Nodes would appear to Workbench Moderation as published, but to core as unpublished (a pink background on node pages in many themes).

I found a handful of problem nodes by running the following query, which selects all nodes where the Workbench Moderation history row is current and published but node_revision.status is 0 for that vid:

SELECT * FROM workbench_moderation_node_history 
  INNER JOIN node ON node.vid = workbench_moderation_node_history.vid 
  INNER JOIN node_revision ON node_revision.vid = workbench_moderation_node_history.vid 
  WHERE
  workbench_moderation_node_history.published = 1 AND 
  workbench_moderation_node_history.current = 1 AND 
  node_revision.status = 0;
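
For anyone who wants to run the same check from inside Drupal 7, here is a minimal sketch of the query wrapped in a helper. The "mymodule" prefix and both function names are placeholders of mine, not part of Workbench Moderation, and the column names are taken from the query above. It only reports the affected nodes; it does not try to repair them.

```php
<?php

/**
 * Returns out-of-sync nodes as an array of nid => vid.
 *
 * Hypothetical helper; "mymodule" is a placeholder module name.
 */
function mymodule_find_out_of_sync_nodes() {
  // Curly braces let Drupal apply any configured table prefix.
  $sql = 'SELECT n.nid, n.vid
    FROM {workbench_moderation_node_history} h
    INNER JOIN {node} n ON n.vid = h.vid
    INNER JOIN {node_revision} r ON r.vid = h.vid
    WHERE h.published = 1 AND h.current = 1 AND r.status = 0';
  return db_query($sql)->fetchAllKeyed();
}

/**
 * Logs each affected node to watchdog and returns how many were found.
 */
function mymodule_report_out_of_sync_nodes() {
  $rows = mymodule_find_out_of_sync_nodes();
  foreach ($rows as $nid => $vid) {
    watchdog('mymodule', 'Node @nid (vid @vid) is published in Workbench Moderation but unpublished in node_revision.', array('@nid' => $nid, '@vid' => $vid), WATCHDOG_WARNING);
  }
  return count($rows);
}
```

Something like this could be called from cron or from drush php-eval to keep an eye on how often the problem occurs while handling nodes case by case.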

Nodes can get into this state because (in our case) every time a node is published it gets indexed by Apache Solr. This succeeds 99% of the time, but occasionally the indexing request times out, leaving the node in a broken state where node_revision.status = 0 for all revisions while the current workbench_moderation_node_history row is still flagged as published (published = 1).

This can happen because of how Workbench Moderation performs its final node_save() to publish the node.
That save happens in workbench_moderation_store() (during request shutdown). If that function is never called, the node_revision table is left with status set to 0 for every revision of the node, because workbench_moderation_moderate() has already reset them. The reset is done with db_update queries:

// If this revision is to be published, the new moderation record should be
// the only one flagged 'published' in both
// {workbench_moderation_node_history} AND {node_revision}
if ($new_revision->published) {
  $query = db_update('workbench_moderation_node_history')
    ->condition('nid', $node->nid)
    ->fields(array('published' => 0))
    ->execute();
  $query = db_update('node_revision')
    ->condition('nid', $node->nid)
    ->fields(array('status' => 0))
    ->execute();
}

We'll fix the problem with our Apache Solr instance and make sure its requests time out before the PHP process can, but there are any number of other ways this could happen. (The problem is also touched on in #1966630, where a watchdog call was causing the request to end before the shutdown phase.)

Does anybody have any ideas about how to make this more robust against this type of failure?

We're currently dealing with the issue on a case-by-case basis and would rather not implement some kind of automatic bandage solution unless we really have no other options.

Comments

Comment #1

thtas commented 4 August 2014 at 07:23

What if, during workbench_moderation_moderate(), we save the node ID to an array stored with variable_set(), and then during shutdown we remove that node ID from the array?

Then, during node_load, if the ID of the node being loaded is still in that array, we know we have a problem, and we can run some code to sync things back up, or at least display a warning or write a watchdog entry.
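
A rough sketch of that idea, to make it concrete. Everything named mymodule_* and the 'mymodule_pending_publish' variable are made up for illustration, and the calls to the two marker functions would have to be patched into Workbench Moderation itself. The 300-second grace period is also an assumption, added so that a request which is still running is not reported as failed.

```php
<?php

/**
 * Records that a node's revisions were reset and a publish save is pending.
 *
 * Hypothetical: intended to be called from workbench_moderation_moderate().
 */
function mymodule_mark_pending($nid) {
  $pending = variable_get('mymodule_pending_publish', array());
  $pending[$nid] = REQUEST_TIME;
  variable_set('mymodule_pending_publish', $pending);
}

/**
 * Removes the marker once the shutdown-time node_save() has finished.
 *
 * Hypothetical: intended to be called at the end of the shutdown step.
 */
function mymodule_clear_pending($nid) {
  $pending = variable_get('mymodule_pending_publish', array());
  if (isset($pending[$nid])) {
    unset($pending[$nid]);
    variable_set('mymodule_pending_publish', $pending);
  }
}

/**
 * Implements hook_node_load().
 */
function mymodule_node_load($nodes, $types) {
  $pending = variable_get('mymodule_pending_publish', array());
  // Ignore fresh markers: they may belong to a request that is still running.
  $cutoff = REQUEST_TIME - 300;
  foreach (array_intersect_key($pending, $nodes) as $nid => $marked) {
    if ($marked < $cutoff) {
      watchdog('mymodule', 'Node @nid was marked for publishing but the shutdown save never completed.', array('@nid' => $nid), WATCHDOG_WARNING);
    }
  }
}
```

One caveat with this approach: variable_set() writes to the database and clears the variables cache on every call, and two simultaneous requests could overwrite each other's copy of the array, so a small dedicated table might be a better home for the markers on a busy site.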

Comment #2

shadysamir commented 2 December 2014 at 09:41

I am seeing similar behavior on two websites. The editor has to go to the content admin page and use the core publish action on nodes that Workbench Moderation has already published. Changing the state back to Needs Review and then forward to Published again makes no difference. Only the core publish action helps, and that requires node administration access.

Comment #3

rv0 commented 16 January 2015 at 10:38

I can confirm this issue; it has been a problem for as long as I can remember.
I think it is related to #1436260: Saving nodes outside Workbench Moderation leads to incorrect state transitions (e.g., "needs review" appearing as published).

Comment #4

neuquen commented 11 February 2016 at 19:43

This issue recently happened to us during a long migration of about 53,000 pieces of content. Around 4,000 of them had a status of 1 while the revision status was still set to 0.

Did any of you figure out a fix for the issue or is it still unresolved?

FYI - I am using version 7.x-1.4 of Workbench Moderation.