Problem/Motivation

When a node is deleted, or when a workflow_field is removed from an entity type, or when a workflow_node is removed from an entity type, the corresponding records in table 'workflow_node_history' are note deleted.

This leads to the following symptons:

OP: Nodes are deleted, new nodes are created whilst Drupal starts reusing node ids. This is generally not a problem, in this case some new nodes with workflow have a rich history after creation. In SQL Admin we see that history is here, looks like nothing removed.

#28: In D8, the workflow may not be disabled, since there is content in the tables that are defined by the module. (This is fixed as per #31)

Proposed resolution

Implement 2 hooks deleting records from tables workflow_node_history and workflow_node_schedule.
- hook_entity_delete when deleting a node;
- hook_field_delete or hook_field_storage_delete when removing a field;
(D8 has other hooks, see the Change notice)

Remaining tasks

- implement the hooks in D8.
- Implement the hooks in D7, but not activate them, to avoid unwanted behaviour after updates. But the code can be switchable/activated via a custom hook.

Original report by TheLion

A month ago I made content clean up on my demo site.
I deleted near all nodes, so I have 3 nodes total and 1 node with workflow.

Some time passed and I mentioned that Drupal start reusing node ids, generally not a problem, but after that I found that some new nodes that has workflow have a rich history after creation.
I opened sql admin and found that history is here, looks like nothing removed.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

TheLion’s picture

FileSize
26.22 KB
johnv’s picture

It seems this is a feature:

/**
 * Implements hook_node_delete().
 */
function workflow_node_delete($node) {
  $node->workflow_stamp = REQUEST_TIME;
  // Delete the association of node to state.
  workflow_delete_workflow_node_by_nid($node->nid);
  if (!empty($node->workflow)) {
    $data = array( ...,  'comment' => t('Node deleted') );
    workflow_insert_workflow_node_history($data);
  }
  // Delete any scheduled transitions for this node.
  WorkflowScheduledTransition::deleteById('node', $node->nid);
}
johnv’s picture

But your screenshot does not mention the 'Node deleted' state... Do you somehow avoid using hook_node_delete?

TheLion’s picture

Thank you for providing correct code.
As I see 1.2 was released just before typo in line "if (!empty($node->workflow)) {" was fixed.

This is why I have no "Node deleted" messages.

However, problem persist, because most of my nodes has workflow.
So restarting of nid numbering after deletions brings wrong entries to node history.

johnv’s picture

Then I guess, the correct way is to set this issue to 'fixed', and upgrade to the 2.0 version.

You might also add some custom code to your set-up code sto empty the table.

TheLion’s picture

I think current status is "work as designed" not "fixed"

Can anybody describe why we need to keep history for deleted nodes?

johnv’s picture

Status: Active » Closed (works as designed)

I'm not sure why it exists. should have a reason.
In the repository, I only see this commit, which copies the code from D6 to D7.

johnv’s picture

Title: Workflow Node History remains unchaged when removing node » Workflow Node History remains unchanged after deleting node
Bastlynn’s picture

The reason is because workflow node has historically been used to track node states and accountability in business environments where accountability for changes over time is *absolutely* required. Think banking and/or particularly strict retention policies for legal reasons.

Given that - if core reuses nid's it may be worthwhile to tweak the deletion code to change the associated nid in such a way that old history doesn't become associated with the new node.

johnv’s picture

Status: Closed (works as designed) » Active

Bastlynn, thanks for your answer.
Perhaps we can circumvent this by deleting 'workflow_node_history' when $node->is_new == TRUE?

TheLion, how do you get core to restart numbering? I'd like to think this a testing artefact.
I'll add bastlynn's remarks to the code comment.

Bastlynn’s picture

That might work or maybe adding a new field to mark the values as archived. If I recall one of the reasons given for this behavior was to assure auditors there was no possible way for administrative staff to muck with things. I've not seen the reusing behavior at all myself - reusing nid's would imply that Drupal reset the auto-increment value in MySQL...

Based on docs here: http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html
And schema here: https://api.drupal.org/api/drupal/modules!node!node.install/function/nod...

As nid is the primary key for {node} and a single field key, the docs would indicate that the primary key isn't reused. Maybe OP's table is InnoDB? MyISAM doesn't reset.... InnoDB will recalculate after a DB restart.

Bastlynn’s picture

Well *that's* interesting. My vanilla local D7 install has {node} as an InnoDB table. I'm gonna dive into core and see if they've implemented the recommended workaround to keep values from resetting. ( http://stackoverflow.com/questions/10184494/how-to-not-recycle-auto-incr... )

Bastlynn’s picture

And looks like... not? We might have a core bug here if this isn't expected behavior.

Found it:
https://api.drupal.org/api/drupal/includes!database!mysql!database.inc/f...
and
https://api.drupal.org/api/drupal/includes!database!mysql!database.inc/f...

Ok, I'm not sure how the OP got into this situation unless the Sequences table was also borked.

Bastlynn’s picture

Solution for the OP, based on this comment:

"$existing_id: After a database import, it might be that the sequences table is behind, so by passing in the maximum existing id, it can be assured that we never issue the same id."

Set your sequence table value in the DB above where your history values are to get Drupal to skip ahead.

Ah HA! Found it. Ok - the sequences table is *just* used for User IDs, batch IDs and action IDs. Node IDs are left hanging in the wind. So yes... I guess this IS expected behavior under InnoDB and DB restarted circumstances?

johnv’s picture

Version: 7.x-1.2 » 7.x-2.x-dev
Component: Workflow Node API » Code

Thanks Bastlynn, for your work.

In 7.x-2.x, I added your comments in this commit.
It also adds the logging in {workflow_node_history} when an Entity with a Workflow Field is deleted.

I guess OP's problem is not solved, but at least clarified. Some custom setup-code may help him.

Leaving this open for now, since you demonstrated it may happen in production environments, too.

johnv’s picture

Category: Bug report » Feature request

For statistics, let's make this a feature request.

Renee S’s picture

Oh my goodness, we've run into this. It's terrifying. We're on 2.5...

Renee S’s picture

(So... theoretically, do we get back to a good state? ;)

GuyPaddock’s picture

Aside from truncating the node table, how are people reusing NIDs? In the last 10 years I've worked with Drupal, I've never seen reuse of NIDs out in the wild unless people intentionally modify the auto-increment in the DB (which is not a good idea, IMO).

And, yes, the sequences table in 6.x used to be used to track all numbering but in 7.x they rely on the DB engine to do it. AFAIK even InnoDB only does it when explicitly asked to do so.

Renee S’s picture

Agreed, I wasn't even sure how to figure out why it was happening because I assumed it wasn't possible. I'll take a look at the DB setup, but we're on Pantheon so it certainly shouldn't be exotic.

Renee S’s picture

Ok, so:

node: nid AUTO_INCREMENT
node_revision: vid AUTO_INCREMENT

Which is how it should be...

Renee S’s picture

GuyPaddock, actually, realized that Bastlynn answers the question -- InnoDB restarts auto-increment numbering at the last one if it's restarted, so if we delete a bunch of nodes and then InnoDB restarts, it will be back at the previous number.

Renee S’s picture

GuyPaddock’s picture

TIL that this is a thing that InnoDB does.

Well, technically... it re-uses the ID only if the highest ID is deleted, per docs.

Renee S’s picture

And it looks like the sid-checking-logic fell out of the 2.5 refactor, also, in the constructor in /includes/Entity/WorkflowTransition.php where $sid=0 after entity delete...

NancyDru’s picture

@Bastlynn: If auditing is an issue, then deleting nodes should not be allowed, and, therefore, this issue is of minor import.

I'm personally of the opinion that deleting a node should also delete the history - actually all references to the node anywhere.

johnv’s picture

Version: 7.x-2.x-dev » 8.x-1.x-dev

Let's do this in D8. I agree with NancyDru in #26.

johnv’s picture

Title: Workflow Node History remains unchanged after deleting node » Workflow Node History remains unchanged after deleting node or field
Issue summary: View changes

  • johnv committed 9009de0 on 8.x-1.x
    Issue #2165349: delete Workflow Node History after deleting node
    
johnv’s picture

Adjacent D8-commit deletes workflow tables when deleting an entity.

  • johnv committed 087cb6d on 8.x-1.x
    Issue #2165349: delete Workflow Node History after deleting field
    
johnv’s picture

Version: 8.x-1.x-dev » 8.x-1.0-beta1

This is now fixed in D8. There are some issues with multiple workflows. See the follow-up D8-issue #2612968: [D8-port task] Errors when adding/deleting nodes with multiple workflows.

johnv’s picture

Version: 8.x-1.0-beta1 » 7.x-2.x-dev
Status: Active » Patch (to be ported)

#31 can be backported to D7, but only if it includes a setting to toggle on/off this new behaviour.

johnv’s picture

ibakayoko’s picture

The node is not deleted in workflow_node and workflow_node_history table.

ibakayoko’s picture

one attempt to resolve the issue. Patch attached.

johnv’s picture

Status: Patch (to be ported) » Needs review
johnv’s picture

Title: Workflow Node History remains unchanged after deleting node or field » Workflow Node History records not deleted after deleting node or field
Category: Feature request » Bug report
johnv’s picture

Issue summary: View changes