Problem/Motivation
When a node is deleted, or when a workflow_field is removed from an entity type, or when a workflow_node is removed from an entity type, the corresponding records in table 'workflow_node_history' are note deleted.
This leads to the following symptons:
OP: Nodes are deleted, new nodes are created whilst Drupal starts reusing node ids. This is generally not a problem, in this case some new nodes with workflow have a rich history after creation. In SQL Admin we see that history is here, looks like nothing removed.
#28: In D8, the workflow may not be disabled, since there is content in the tables that are defined by the module. (This is fixed as per #31)
Proposed resolution
Implement 2 hooks deleting records from tables workflow_node_history and workflow_node_schedule.
- hook_entity_delete when deleting a node;
- hook_field_delete or hook_field_storage_delete when removing a field;
(D8 has other hooks, see the Change notice)
Remaining tasks
- implement the hooks in D8.
- Implement the hooks in D7, but not activate them, to avoid unwanted behaviour after updates. But the code can be switchable/activated via a custom hook.
Original report by TheLion
A month ago I made content clean up on my demo site.
I deleted near all nodes, so I have 3 nodes total and 1 node with workflow.
Some time passed and I mentioned that Drupal start reusing node ids, generally not a problem, but after that I found that some new nodes that has workflow have a rich history after creation.
I opened sql admin and found that history is here, looks like nothing removed.
Comments
Comment #1
TheLion CreditAttribution: TheLion commentedComment #2
johnvIt seems this is a feature:
Comment #3
johnvBut your screenshot does not mention the 'Node deleted' state... Do you somehow avoid using hook_node_delete?
Comment #4
TheLion CreditAttribution: TheLion commentedThank you for providing correct code.
As I see 1.2 was released just before typo in line "if (!empty($node->workflow)) {" was fixed.
This is why I have no "Node deleted" messages.
However, problem persist, because most of my nodes has workflow.
So restarting of nid numbering after deletions brings wrong entries to node history.
Comment #5
johnvThen I guess, the correct way is to set this issue to 'fixed', and upgrade to the 2.0 version.
You might also add some custom code to your set-up code sto empty the table.
Comment #6
TheLion CreditAttribution: TheLion commentedI think current status is "work as designed" not "fixed"
Can anybody describe why we need to keep history for deleted nodes?
Comment #7
johnvI'm not sure why it exists. should have a reason.
In the repository, I only see this commit, which copies the code from D6 to D7.
Comment #8
johnvComment #9
Bastlynn CreditAttribution: Bastlynn commentedThe reason is because workflow node has historically been used to track node states and accountability in business environments where accountability for changes over time is *absolutely* required. Think banking and/or particularly strict retention policies for legal reasons.
Given that - if core reuses nid's it may be worthwhile to tweak the deletion code to change the associated nid in such a way that old history doesn't become associated with the new node.
Comment #10
johnvBastlynn, thanks for your answer.
Perhaps we can circumvent this by deleting 'workflow_node_history' when
$node->is_new == TRUE
?TheLion, how do you get core to restart numbering? I'd like to think this a testing artefact.
I'll add bastlynn's remarks to the code comment.
Comment #11
Bastlynn CreditAttribution: Bastlynn commentedThat might work or maybe adding a new field to mark the values as archived. If I recall one of the reasons given for this behavior was to assure auditors there was no possible way for administrative staff to muck with things. I've not seen the reusing behavior at all myself - reusing nid's would imply that Drupal reset the auto-increment value in MySQL...
Based on docs here: http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html
And schema here: https://api.drupal.org/api/drupal/modules!node!node.install/function/nod...
As nid is the primary key for {node} and a single field key, the docs would indicate that the primary key isn't reused. Maybe OP's table is InnoDB? MyISAM doesn't reset.... InnoDB will recalculate after a DB restart.
Comment #12
Bastlynn CreditAttribution: Bastlynn commentedWell *that's* interesting. My vanilla local D7 install has {node} as an InnoDB table. I'm gonna dive into core and see if they've implemented the recommended workaround to keep values from resetting. ( http://stackoverflow.com/questions/10184494/how-to-not-recycle-auto-incr... )
Comment #13
Bastlynn CreditAttribution: Bastlynn commentedAnd looks like... not? We might have a core bug here if this isn't expected behavior.Found it:
https://api.drupal.org/api/drupal/includes!database!mysql!database.inc/f...
and
https://api.drupal.org/api/drupal/includes!database!mysql!database.inc/f...
Ok, I'm not sure how the OP got into this situation unless the Sequences table was also borked.
Comment #14
Bastlynn CreditAttribution: Bastlynn commentedSolution for the OP, based on this comment:"$existing_id: After a database import, it might be that the sequences table is behind, so by passing in the maximum existing id, it can be assured that we never issue the same id."Set your sequence table value in the DB above where your history values are to get Drupal to skip ahead.Ah HA! Found it. Ok - the sequences table is *just* used for User IDs, batch IDs and action IDs. Node IDs are left hanging in the wind. So yes... I guess this IS expected behavior under InnoDB and DB restarted circumstances?
Comment #15
johnvThanks Bastlynn, for your work.
In 7.x-2.x, I added your comments in this commit.
It also adds the logging in {workflow_node_history} when an Entity with a Workflow Field is deleted.
I guess OP's problem is not solved, but at least clarified. Some custom setup-code may help him.
Leaving this open for now, since you demonstrated it may happen in production environments, too.
Comment #16
johnvFor statistics, let's make this a feature request.
Comment #17
Renee S CreditAttribution: Renee S as a volunteer commentedOh my goodness, we've run into this. It's terrifying. We're on 2.5...
Comment #18
Renee S CreditAttribution: Renee S as a volunteer commented(So... theoretically, do we get back to a good state? ;)
Comment #19
GuyPaddock CreditAttribution: GuyPaddock at RedBottle Design, LLC for House at Work commentedAside from truncating the node table, how are people reusing NIDs? In the last 10 years I've worked with Drupal, I've never seen reuse of NIDs out in the wild unless people intentionally modify the auto-increment in the DB (which is not a good idea, IMO).
And, yes, the sequences table in 6.x used to be used to track all numbering but in 7.x they rely on the DB engine to do it. AFAIK even InnoDB only does it when explicitly asked to do so.
Comment #20
Renee S CreditAttribution: Renee S as a volunteer commentedAgreed, I wasn't even sure how to figure out why it was happening because I assumed it wasn't possible. I'll take a look at the DB setup, but we're on Pantheon so it certainly shouldn't be exotic.
Comment #21
Renee S CreditAttribution: Renee S as a volunteer commentedOk, so:
node: nid AUTO_INCREMENT
node_revision: vid AUTO_INCREMENT
Which is how it should be...
Comment #22
Renee S CreditAttribution: Renee S as a volunteer commentedGuyPaddock, actually, realized that Bastlynn answers the question -- InnoDB restarts auto-increment numbering at the last one if it's restarted, so if we delete a bunch of nodes and then InnoDB restarts, it will be back at the previous number.
Comment #23
Renee S CreditAttribution: Renee S as a volunteer commentedhttp://ryanszrama.com/blog/05-12-2014/beware-innodbs-autoincrement-reset...
Comment #24
GuyPaddock CreditAttribution: GuyPaddock at RedBottle Design, LLC for House at Work commentedTIL that this is a thing that InnoDB does.
Well, technically... it re-uses the ID only if the highest ID is deleted, per docs.
Comment #25
Renee S CreditAttribution: Renee S as a volunteer commentedAnd it looks like the sid-checking-logic fell out of the 2.5 refactor, also, in the constructor in /includes/Entity/WorkflowTransition.php where $sid=0 after entity delete...
Comment #26
NancyDru@Bastlynn: If auditing is an issue, then deleting nodes should not be allowed, and, therefore, this issue is of minor import.
I'm personally of the opinion that deleting a node should also delete the history - actually all references to the node anywhere.
Comment #27
johnvLet's do this in D8. I agree with NancyDru in #26.
Comment #28
johnvComment #30
johnvAdjacent D8-commit deletes workflow tables when deleting an entity.
Comment #32
johnvThis is now fixed in D8. There are some issues with multiple workflows. See the follow-up D8-issue #2612968: [D8-port task] Errors when adding/deleting nodes with multiple workflows.
Comment #33
johnv#31 can be backported to D7, but only if it includes a setting to toggle on/off this new behaviour.
Comment #34
johnv#2689465: node count is incorrect, when nodes are deleted, since History is not removed also mentions that the 'node count' is not decreased.
Comment #35
ibakayoko CreditAttribution: ibakayoko as a volunteer commentedThe node is not deleted in workflow_node and workflow_node_history table.
Comment #36
ibakayoko CreditAttribution: ibakayoko as a volunteer commentedone attempt to resolve the issue. Patch attached.
Comment #37
johnvComment #38
johnvComment #39
johnv