Problem/Motivation
A small bug in node.module of the core Node module:
Current code:
```php
function _node_access_rebuild_batch_operation(&$context) {
  // ...
  while ($row = db_fetch_array($result)) {
    $loaded_node = node_load($row['nid'], NULL, TRUE);
    // To preserve database integrity, only acquire grants if the node
    // loads successfully.
    if (!empty($loaded_node)) {
      node_access_acquire_grants($loaded_node);
    }
    $context['sandbox']['progress']++;
    $context['sandbox']['current_node'] = $loaded_node->nid;
  }
  // ...
}
```
The last line in this while loop should read: $context['sandbox']['current_node'] = $row['nid'];
If the node fails to load, $loaded_node->nid will be empty too. This causes an infinite loop in the batch operation: an empty value for $context['sandbox']['current_node'] is taken to mean that the rebuild has not yet started, so it starts all over again.
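The failure mode can be reproduced without a database. The following is a minimal simulation, not Drupal code. It assumes a hypothetical `$node_table` array standing in for `{node}` and a `$broken` list of nids whose node_load() fails. `next_batch()` and `process_batch()` are illustrative helpers, not core functions. The simulation shows that when the last row of a batch fails to load, the buggy cursor ends up empty and the next range query starts again from the beginning.

```php
<?php
// Drupal-free sketch of the D6 loop. Assumptions (hypothetical, not Drupal
// API): $node_table lists the nids in {node}, and nids in $broken exist in
// the table but fail to load.

// Mimics "SELECT nid FROM {node} WHERE nid > %d ORDER BY nid ASC" with a
// range; an empty cursor casts to 0, i.e. "start from the beginning".
function next_batch(array $node_table, $current_node, int $limit): array {
  $nids = array_filter($node_table, fn($nid) => $nid > (int) $current_node);
  sort($nids);
  return array_slice($nids, 0, $limit);
}

// Processes one batch and returns the value left in current_node.
function process_batch(array $rows, array $broken, bool $use_row_nid) {
  $current_node = NULL;
  foreach ($rows as $nid) {
    // Stand-in for node_load(): FALSE when the node fails to load.
    $loaded_node = in_array($nid, $broken, TRUE) ? FALSE : (object) ['nid' => $nid];
    // Buggy: read the nid from the loaded node. Fixed: read it from the row.
    $current_node = $use_row_nid ? $nid : ($loaded_node->nid ?? NULL);
  }
  return $current_node;
}

$node_table = [1, 2, 3, 4, 5, 6];
$broken = [3]; // nid 3 is the last row of the first batch of 3

$buggy_cursor = process_batch(next_batch($node_table, 0, 3), $broken, FALSE);
$fixed_cursor = process_batch(next_batch($node_table, 0, 3), $broken, TRUE);

echo implode(', ', next_batch($node_table, $buggy_cursor, 3)), PHP_EOL; // 1, 2, 3 (restarts)
echo implode(', ', next_batch($node_table, $fixed_cursor, 3)), PHP_EOL; // 4, 5, 6 (moves on)
```

Only the last row of a batch matters here, because the cursor is overwritten on every row. A failure in the middle of a batch is masked by the next successful row.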
Steps to reproduce
Not sure
Proposed resolution
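Based on patch #42: loop over the $nids returned by the query instead of over the loaded $nodes, and check whether each $nid has a corresponding entry in the $nodes array before acquiring grants. Progress is then incremented for every nid, whether or not its node loaded.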
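The loop shape described above can be sketched with plain arrays in place of Drupal's entity API. This is a minimal sketch under stated assumptions, not the committed patch. It assumes `$nids` is what the range query returned and that `$nodes` is keyed by nid and simply omits nodes that failed to load. `acquire_grants()` and `rebuild_step()` are hypothetical helpers.

```php
<?php
// Hypothetical stand-in for acquiring and writing grants for one node.
function acquire_grants(object $node, array &$granted): void {
  $granted[] = $node->nid;
}

// One batch step: iterate the queried nids, not the loaded nodes.
function rebuild_step(array $nids, array $nodes, array &$sandbox, array &$granted): void {
  foreach ($nids as $nid) {
    // Only acquire grants if the node loaded successfully.
    if (!empty($nodes[$nid])) {
      acquire_grants($nodes[$nid], $granted);
    }
    // Progress and the cursor advance for every nid, loaded or not.
    $sandbox['progress']++;
    $sandbox['current_node'] = $nid;
  }
}

$sandbox = ['progress' => 0, 'current_node' => 0];
$granted = [];
$nids = [10, 11, 12];
// nid 11 failed to load, so it is missing from $nodes.
$nodes = [10 => (object) ['nid' => 10], 12 => (object) ['nid' => 12]];

rebuild_step($nids, $nodes, $sandbox, $granted);
echo $sandbox['progress'], ' ', $sandbox['current_node'], PHP_EOL; // 3 12
```

Because progress counts nids rather than loaded nodes, it stays in step with the count used for 'max', and the cursor always moves past the last queried nid.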
Remaining tasks
Possibly add a test, although @catch noted in #46 that this may not be possible.
User interface changes
NA
API changes
NA
Data model changes
NA
Release notes snippet
NA
| Comment | File | Size | Author |
|---|---|---|---|
| #42 | 315302-42.patch | 1.4 KB | mstrelan |
| #41 | 315302-41.patch | 864 bytes | mstrelan |
| #19 | node-315302-racefix-1.patch | 851 bytes | tacituseu |
| #17 | node-315302-racefix.patch | 736 bytes | tacituseu |
| #8 | nodebug.patch | 1.07 KB | PaulMagrath |
Comments
Comment #1
Anonymous (not verified) commented: We can only review patch files, and no file has been marked as active.
Comment #2
btopro commented: +1 for this fix. I encountered the same error, and it was resolved by changing that one line.
Comment #3
coltrane commented: I believe this fixes a problem that has almost had me pulling my hair out. I've rebuilt permissions a hundred times before and never had it stall or consistently keep saying 'The content access permissions have not been properly rebuilt.' Disabling and uninstalling modules hasn't fixed it on the site suffering from this problem, so I dove into the batch process. It's easy to see that if not all nodes are processed in _node_access_rebuild_batch_operation(), then the batch process reports it as unfinished and node_access_needs_rebuild(FALSE) is never called. What's difficult for me to deduce at the moment is why node_load() might not return a full node object. However, since it's simply wrong to use $loaded_node->nid when $loaded_node might not be an actual node object, we really should be using $row['nid'].
Comment #4
coltrane commented: This is likely a problem in HEAD as well (http://api.drupal.org/api/function/_node_access_rebuild_batch_operation/7), because if $node is empty, then $context['sandbox']['current_node'] won't record the processed node ID.
Edit: I spoke with Dave Reid on IRC about this, and node_load_multiple() doesn't return any invalid nodes, so this bug probably won't occur in 7.
Comment #5
yched commented: The fix makes complete sense, nice catch. I guess such cases, where there is a record in the node table but node_load() fails, can happen with deleted node types.
D7 can be affected too, though: if *no* valid node is found within the nid range, then $context['sandbox']['current_node'] won't be updated, and an infinite loop ensues.
So, the patch in #3 is RTBC for D6, but this will need to be fixed in D7 first. Here's a patch; it needs review.
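yched's D7 scenario can be sketched the same way. This is a hypothetical simulation, not core code. It assumes a stand-in for node_load_multiple() that returns an array keyed by nid and silently omits nids that fail to load, as described in #4. Because the loop runs over the loaded nodes, a range in which nothing loads leaves the cursor where it was.

```php
<?php
// Hypothetical sketch of the D7 loop shape: iterate the loaded nodes, so the
// cursor only moves when at least one node in the range loads.
function d7_step(array $nids, array $loadable, int $current_node): int {
  // Stand-in for node_load_multiple(): keyed by nid, unloadable nids omitted.
  $nodes = [];
  foreach ($nids as $nid) {
    if (in_array($nid, $loadable, TRUE)) {
      $nodes[$nid] = (object) ['nid' => $nid];
    }
  }
  foreach ($nodes as $node) {
    $current_node = $node->nid;
  }
  return $current_node;
}

// None of nids 21-23 load, so the cursor stays at 20 and the next call
// queries exactly the same range again.
echo d7_step([21, 22, 23], [], 20), PHP_EOL; // 20
// If at least one node loads, the cursor moves to the last loaded nid.
echo d7_step([21, 22, 23], [21, 22], 20), PHP_EOL; // 22
```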
Comment #7
martin_q commented: This bug has just started to affect my D6 site. I can't see why the patch shouldn't work: the #3 patch works on my D6 site, and I can't see any errors in the automated testing results for #5. Can someone who knows more than me (that doesn't narrow it down much) review this manually and see if it's RTBC?
Thanks.
Comment #8
PaulMagrath commented: The patch is failing because in Drupal 7 the for loop has been changed into a foreach loop.
This should fix it:
I've attached a patch file for testing:
Comment #9
tobiasb commented: #8, where is the difference?
Comment #10
PaulMagrath commented: @tobiasb:
The difference is that instead of getting the nid from the loaded node, you use the nid from the array. The nid from the array will always be valid, whereas the nid from the loaded node will be undefined if the node fails to load for any reason.
Comment #11
Ainur commented: I had the same problem as PaulMagrath. Changing $context['sandbox']['current_node'] = $loaded_node->nid to $context['sandbox']['current_node'] = $row['nid']; worked well for me.
Comment #13
moshe weitzman commented: The code looks good and is more robust. The bot is happy.
Comment #14
webchick commented: Interesting. Over at #471080: Trigger watchdog when node rebuild permissions fails, which is essentially the same fix, chx threatens to mark it won't fix because it only masks the issue.
The big question in my mind is whether or not this will still pick it up when a node does indeed fail to rebuild permissions, due to corrupt data or similar. If it does, then this is good to go.
Could someone test this to make sure (for example, by manually inserting an invalid uid into the node table's uid column)? Even better, write automated tests for the rebuild permissions functionality to prove this, and/or point out where in core they already exist.
Comment #15
PaulMagrath commented: #8: nodebug.patch queued for re-testing.
Comment #16
PaulMagrath commented: I can understand chx's point of view. The fact is, though, that entries in the database are going to get corrupted sometimes, whether from hard drive failure, DB admins fiddling with table contents directly, or people misusing the APIs. Drupal shouldn't fail like this when that happens. This is a small change for a significant increase in the robustness of the function.
To answer the big question in your mind: there is currently no logging when a node fails to rebuild, and this patch doesn't add it. I believe that is being discussed over at #471080. This issue is for the fix itself only.
I've tested this fix to make sure it works, and others have tested it and found it to work as well. I agree that automated tests would be better, and if you have the time, it'd be great if you could write some!
In the meantime, I'm going to move this issue back to "reviewed & tested by the community". That reflects its current status and maturity. The lack of automated tests such as you describe is not directly relevant IMHO and shouldn't block this issue being fixed in HEAD.
Comment #17
tacituseu commented: I think the problem lies in another part of this function. There's a race condition between:
```php
$context['sandbox']['max'] = db_query('SELECT COUNT(DISTINCT nid) FROM {node}')->fetchField();
```
and
```php
$nids = db_query_range("SELECT nid FROM {node} WHERE nid > :nid ORDER BY nid ASC", 0, $limit, array(':nid' => $context['sandbox']['current_node']))->fetchCol();
```
If a node gets deleted or added at some point between the context initialization and the last iteration of the second query, the following code will result in an infinite loop:
Comment #19
tacituseu commented: Takes into consideration that DrupalDefaultEntityController removes (returned) and fixes a potential progress bar overflow.
Comment #20
Encarte commented: subscribing
Comment #21
dave reid commented: Related for Drupal 6 only (already fixed in D7+): #600836: Batch API never terminates if you set $context['finished'] > 1
Comment #22
mstrelan commented: To me it doesn't make sense that we run `$context['sandbox']['progress']++;` inside of `foreach ($nodes as $nid => $node) { }`, because it is never guaranteed that `count($nodes)` is equal to `count($nids)`, yet the batch process is based on $nids, not on $nodes.
In my case, I'm working with a D5 database I've inherited, and I'm upgrading it to D7 and adding Domain Access. There are about 70 nodes that don't have any revisions (don't ask me how this is possible), and rebuilding permissions stops at 99%.
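The mismatch between counting loaded nodes and counting queried nids can be illustrated with a small, hypothetical calculation. `final_progress()` is an illustrative helper. `$batches` stands for the nid lists returned by successive range queries, and `$loadable` is the subset that actually loads. Neither comes from core.

```php
<?php
// Total progress after all batches, counting either loaded nodes or nids.
function final_progress(array $batches, array $loadable, bool $count_nids): int {
  $progress = 0;
  foreach ($batches as $nids) {
    $nodes = array_intersect($nids, $loadable); // loaded nodes only
    // Counting nodes can fall short of max; counting nids cannot.
    $progress += $count_nids ? count($nids) : count($nodes);
  }
  return $progress;
}

$batches = [[1, 2, 3, 4], [5, 6, 7, 8]];
$loadable = [1, 2, 3, 4, 5, 6, 8]; // nid 7 has no revision and fails to load
$max = 8; // what COUNT(DISTINCT nid) FROM {node} would report

printf("per node: %d/%d, per nid: %d/%d\n",
  final_progress($batches, $loadable, FALSE), $max,
  final_progress($batches, $loadable, TRUE), $max);
// per node: 7/8, per nid: 8/8
```

With per-node counting, the finished fraction stays at 7/8 even though every nid has been visited. With around 70 unloadable nodes on a large site, it can plausibly sit just under 100%, as described above.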
There are two possible solutions:
1. Change the foreach loop to go through the nids, not the nodes.
2. Replace `$context['sandbox']['progress']++;` with `$context['sandbox']['progress'] += count($nids);` outside of the foreach loop.
Comment #23
Anonymous (not verified) commented: The patch will need to be 8.x-based first.
Comment #24
xjm commented: (Merging "node system" and "node.module" components for 8.x; disregard.)
Comment #25
damienmckenna commented: Closed a duplicate: #2292505: _node_access_rebuild_batch_operation does not handle a corner case
Comment #26
damienmckenna commented: Per tacituseu's comment #17, this issue occurs when the 'max' number of nodes at the start of the batch process falls out of sync with the per-iteration count from the second query. I've seen this happen on large sites that required the access rebuild on production: because the process took so long, other events were triggered that caused other nodes to be fixed separately, so the running total never reached the 'max'.
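The race described in #17 and #26 can be modelled as follows. This is a simplified, hypothetical model, not the core implementation. 'max' is counted once at the start, a node is deleted by another process after the first batch call, and the batch is assumed to finish only when progress equals max. `$max_calls` is an artificial cap standing in for the Batch API calling the operation again indefinitely.

```php
<?php
// Simulates batch calls; $deleted_after_first_call are nids removed by
// another process once the first call has run. Assumes $node_table is sorted.
function simulate(array $node_table, array $deleted_after_first_call, int $limit, int $max_calls): array {
  $sandbox = ['progress' => 0, 'current_node' => 0, 'max' => count($node_table)];
  for ($call = 1; $call <= $max_calls; $call++) {
    // Range query: next $limit nids greater than the cursor.
    $nids = array_slice(array_values(array_filter($node_table, fn($nid) => $nid > $sandbox['current_node'])), 0, $limit);
    foreach ($nids as $nid) {
      $sandbox['progress']++;
      $sandbox['current_node'] = $nid;
    }
    if ($call === 1) {
      // Another process deletes nodes that have not been visited yet.
      $node_table = array_values(array_diff($node_table, $deleted_after_first_call));
    }
    // Assumed termination rule: done only when progress reaches max.
    if ($sandbox['progress'] === $sandbox['max']) {
      return ['done' => TRUE, 'calls' => $call, 'progress' => $sandbox['progress']];
    }
  }
  return ['done' => FALSE, 'calls' => $max_calls, 'progress' => $sandbox['progress']];
}

$stuck = simulate([1, 2, 3, 4, 5, 6], [5], 2, 10);
$clean = simulate([1, 2, 3, 4, 5, 6], [], 2, 10);
echo json_encode($stuck), PHP_EOL; // {"done":false,"calls":10,"progress":5}
echo json_encode($clean), PHP_EOL; // {"done":true,"calls":3,"progress":6}
```

After nid 5 disappears, later calls return no nids at all, so progress stays at 5 of 6 and only the artificial cap stops the loop. Adding nodes instead would push progress past max, which is the progress bar overflow mentioned in #19.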
Comment #27
David_Rothstein commented: #2331113: Node access rebuilds are completely broken when being rebuilt through a batch process? is somewhat related (although not the same issue as this one).
Comment #36
pameeela commented: Not sure whether this is still valid, but the referenced code has changed, so the IS needs an update.
Comment #37
damienmckenna commented
Comment #41
mstrelan commented: This came up as the daily triage target for Bug Smash, and Lendude pinged me since I encountered this 11 years ago...
I've attached a patch against 10.1.x that implements my suggestion in #22. I haven't got around to writing a test for this, though, since it doesn't seem straightforward to replicate.
FWIW, when this was originally written in 6.x, the sandbox progress was incremented for each nid loaded from the database, whereas in 7.x it was changed to be incremented for each node that was successfully loaded. The patch changes it back to nids.
6.x:
7.x:
Comment #42
mstrelan commented: Fixed the phpstan baseline (in before the CCF fixers arrive).
Comment #43
ameymudras commented: I did a quick test and was able to get this working with the patch.
I have a suggestion regarding the code: rather than checking the nid below,
Why don't we use
Also, we need supporting tests for this use case.
Comment #44
mstrelan commented: @ameymudras, the main reason is that we're iterating through a list of $nids and confirming that a corresponding node exists in the $nodes array, so $node doesn't exist at this point. There are plenty of other ways to refactor this, but I think this keeps the changes as small as possible.
Comment #46
catch commented: I think this is actually major. I'm not sure how we'd write tests for it, since I think it requires a race condition or similar. Also, phpstan is finding the existing bug in the logic.
#42 looks good so moving back to needs review.
Comment #47
smustgrave commented: Wow, 2008!
Updated the issue summary as best I could based on patch #42.
Leaving the tests tag just in case, but noted in the IS that this may not be possible.
Applied patch #42 and ran rebuild from admin/reports/status/rebuild which completed without failure.
Change makes sense to me.
Comment #48
quietone commented: I'm triaging RTBC issues. I read the IS and skimmed the comments. It is very helpful that the proposed resolution identifies the patch being referred to. However, the sentence is rather cryptic, and I didn't understand it until I read the patch. I didn't find any unanswered questions or other work to do.
Leaving at RTBC.
Comment #49
kim.pepper commented: Should this be `isset()` instead?
Comment #50
mstrelan commented: @kim.pepper, probably, but it doesn't make a difference to this issue, as we will still increment the progress whether the node is empty or not.
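The difference between the two checks discussed in #49 and #50 can be seen on a small, hypothetical `$nodes` array. The array is illustrative and assumes a loadMultiple()-style result keyed by nid. The NULL and FALSE entries are artificial and are included only to show where the checks diverge.

```php
<?php
// How !empty() and isset() treat the cases that can appear in $nodes[$nid].
$nodes = [
  1 => (object) ['nid' => 1], // loaded normally
  2 => NULL,                  // key present but NULL
  3 => FALSE,                 // key present but FALSE
  // nid 4 is absent because it failed to load
];

foreach ([1, 2, 3, 4] as $nid) {
  printf("nid %d: !empty=%s isset=%s\n", $nid,
    var_export(!empty($nodes[$nid]), TRUE),
    var_export(isset($nodes[$nid]), TRUE));
}
// nid 1: !empty=true isset=true
// nid 2: !empty=false isset=false
// nid 3: !empty=false isset=true
// nid 4: !empty=false isset=false
```

The checks only disagree for a key whose value is FALSE (or another falsy non-NULL value). For an array that either holds a node object or omits the key, they behave the same. This is consistent with #50's point that the choice does not affect this fix.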
Comment #52
catch commented: Going to go ahead and commit this, given that it's been more or less the same fix since 2009 and has been RTBC multiple times since then.
Once again, I don't see how this is testable, since it depends on a race condition being triggered during the node access rebuild. However, it's also a good example of phpstan finding a (very old) bug.
We could open a follow-up for the empty vs. isset change. I also think it's probably time to look at a NodeAccessUpdater class, similar to the config update helper, to try to modernize some of this and make it a bit less tied to batch.