Updated: Comment #N
Problem/Motivation
The order in which data is saved, caches are cleared and hooks invoked is inconsistent and can result in bugs, e.g. calls in field update methods resulting in loading a partially saved entity because the configurable fields are saved after the cache clear.
Proposed resolution
Unify the call order to this:
1. Save/delete base stuff
2. save/delete configurable fields
3. reset caches
4. invoke entity method and hooks.
Remaining tasks
Check if this results in failing tests, review.
User interface changes
None.
API changes
Not really, other than the slightly different behavior of when caches are cleared.
Original report by @username
I was modifying a module which required me to modify a node and save it. However, the changes to that node kept reverting back to what it was before I did the save. Upon investigation I determined that it was the node_load cache and when I did a load just after the save with the flag to reset the cache, all worked as it should.
So first, is there a deliberate reason why node_save can save without keeping the node_load cache in sync?
Second, if it is not deliberate then perhaps a change to the node system along the lines of the following would be useful:
In node_load:
function node_load($param = array(), $revision = NULL, $reset = NULL) {
global $user;
static $nodes = array();
if ($reset) {
if (is_numeric($reset)) {
if (array_key_exists($reset, $nodes)) {
unset($nodes[$reset]);
return true;
} else {
return false;
}
} else {
$nodes = array();
}
}
In node_save (about 5 lines from the bottom):
// Call the node specific callback (if any):
if ($node->is_new) {
node_invoke($node, 'insert');
node_invoke_nodeapi($node, 'insert');
}
else {
node_invoke($node, 'update');
node_invoke_nodeapi($node, 'update');
node_load(null, null, $node->nid);
}
// Update the node access table for this node.
node_access_acquire_grants($node);
// Clear the cache so an anonymous poster can see the node being added or updated.
cache_clear_all();
}
Comment | File | Size | Author |
---|---|---|---|
#43 | d8_entity_order.patch | 5.17 KB | fago |
#43 | d8_entity_order.interdiff.txt | 2.95 KB | fago |
#35 | unify-cache-hooks-221081-35.patch | 2.22 KB | Berdir |
#28 | entity-reset-cache-ordering-221081-28.patch | 868 bytes | David_Rothstein |
#25 | node-save-cache-221081-20.patch | 3.35 KB | fago |
Comments
Comment #1
Reg CreditAttribution: Reg commentedThis is a little cleaner for the node_load part:
Comment #2
Reg CreditAttribution: Reg commentedI'm changing this to a bug as there is no way that I can see where what you save should be allowed to be out of sync with what you load at such a fundamental level.
Comment #3
moshe weitzman CreditAttribution: moshe weitzman commentedYes, it would be smart to invalidate this cache from within node_save()
Comment #4
moshe weitzman CreditAttribution: moshe weitzman commentedThis is fixed the patch proposed at http://drupal.org/node/111127#comment-768053. That patch is much bigger and intended for D7.
Comment #5
David_Rothstein CreditAttribution: David_Rothstein commentedHere is a patch for Drupal 6 that should fix the bug.
Note, I wrote this for Drupal 6 rather than Drupal 7 because:
(a) the bug appears to already be fixed in Drupal 7, although I think it may have been fixed inadvertently (see #424602: Unsaved node changes are reflected on subsequent calls to node_load()), and
(b) static caching in Drupal 7 is currently in a state of flux, so I wasn't even positive what the correct fix would be right now
Comment #6
schwern CreditAttribution: schwern commentedI just hit this bug in 6.x.
@#5 node_load() flushes the entire node cache. That seems a bit extreme. node_save() could instead insert the node it just saved into the node cache. That would require exposing the node_load() cache to node_save() as well. Sounds worth it.
Comment #7
schwern CreditAttribution: schwern commentedFWIW I didn't see a test for this in 7, so here's a test patch.
Comment #8
schwern CreditAttribution: schwern commentedDigging into this, there's no special logic in 7 to clear the cache on save. The difference between 6 and 7 is that in 7 node_load() returns the node from the cache and 6 stores and returns a COPY of the node from the cache. In 7 when you call node_load() and fiddle with the node you've already updated the cache.
There's two solutions for 6: stop copying the node before returning it from the cache; update the node cache in node_save(). The former has the advantage of compatibility.
In 7, the consequences of node_load() returning the real cached object means that this now becomes unstable:
What is $node->body? If the node cache happened to have been cleared between the two node_load() calls its going to be its "Something". If not, its "Something else". This exposes the existence of the cache to the user and allows action at a distance. ++ungood
The attached test class demonstrates the bug. It passes in 6 and fails in 7 (remove setUp to run it in 6).
Comment #9
David_Rothstein CreditAttribution: David_Rothstein commentedYup, exactly :) As I already noted above, please see #424602: Unsaved node changes are reflected on subsequent calls to node_load()
Comment #10
schwern CreditAttribution: schwern commentedHere's full patches with tests to solve this bug and #424602 in both 6.11 and 7.x. I don't think #424602 is really a duplicate of #154859 as nodes have their own in memory cache unrelated to the database caches.
I'm new to PHP and Drupal, so please excuse any non-idiomatic stylings.
I turned on all error_reporting() in the tests because it seems like a good idea to run the tests with all the warnings flipped on. It exposes some issues in simpletest and the node code. Feel free to turn them off if you don't want to deal with them right now.
Comment #11
yhager CreditAttribution: yhager commentedDue to the nature of patches for 6.x, I assume this won't go in as is. However, this is rather nasty bug to fall on... Maybe a clean workaround could be laid out for 6.x developers, something along the lines of:
if you node_save() something you might want to node_load() later, make sure you clean the nodes cache by calling node_load(array(), NULL, TRUE).
Subscribing.
Comment #12
hanoiiI just got to this issue and I think the scope of it is even wider, probably covered in other issues but worth mentioning, maybe the same happens on D7?
If you are in any node/xxx/yyy page (node/123/edit for instance), the menu system will call node_load(123) at the very beginning of the page request, and the node will eventually be statically cached in the node_load() function.
This means that after clicking SAVE in the node/123/edit page, the menu will first call node_load(123) function but the node has not been updated at all yet, so it's really caching the old node. So every node_load(123) function called along the update process will always get the old node.
Maybe the patches already solved this, but thought it worth mentioning.
Comment #13
David_Rothstein CreditAttribution: David_Rothstein commentedI think this issue is becoming too confused with #424602: Unsaved node changes are reflected on subsequent calls to node_load(). Please move any discussion related to that problem there (or really to the meta-issue which it was marked duplicate of). Probably I made a mistake by even referring to it in the first place -- although it (coincidentally) helps fix the bug here, it doesn't actually completely fix it, and plus, we shouldn't rely on it but rather should have an actual direct fix.
The attached patch for D7 combines the fix from #5 with a reworked version of the test from #7 (thanks @schwern, and don't be discouraged that your patch sat for so long! - back in April, many people weren't yet focused on the "bug-fixing" part of the D7 development cycle...) The test works with cloned nodes so that it will fail without the actual fix, regardless of what happens with #424602: Unsaved node changes are reflected on subsequent calls to node_load().
Regarding the comment above in #6, I think we actually do need to clear the entire node cache, rather than just the node being saved. Instead of explaining it here, I figured I'd just explain it in a code comment in the patch :)
Comment #14
moshe weitzman CreditAttribution: moshe weitzman commentedNot so sure about the clear all nodes upon saving one. Couldn't that be implemented by the text filter module or the node ref module in your examples? I don't think node.module itself needs to be so aggressive, since core does not support these things. My main concern is that we blow away the whole persistent entity cache - http://drupal.org/project/entitycache. Perhaps catch can chime in here.
Comment #15
catchThis should use entity_get_controller('node')->resetCache(); to clear the static.
entitycache.module doesn't clear the persistent cache when you do ->resetCache(); so it wouldn't break that particularly. We could add an optional $nid argument to resetCache() too I guess.
Comment #16
David_Rothstein CreditAttribution: David_Rothstein commentedOK, using the resetCache() function now instead.
Seems to me that's probably far enough to go for this patch, since it makes the behavior consistent with http://api.drupal.org/api/function/node_delete_multiple/7... Trying to deal with only clearing a single $nid or group of $nids really sounds more complicated, at least off the top of my head (can a text field easily figure out the cache clearing function for the type of object it is attached to and know how to call it?). Plus it seems like a lot of work for minimal gain, no? The code right above this clears the entire page cache, so clearing the static cache for the rest of the current request seems like nothing compared to that...
Comment #17
moshe weitzman CreditAttribution: moshe weitzman commentedMakes sense.
Could save a few cycles by having NodeSaveTestCase extend DrupalUnitTestCase instead of DrupalWebTestCase? Or will the node_save() fail in that state?
Comment #18
catchYeah this is fine, we'll just need to be very careful in entitycache to keep the persistent and static cache clearing separate, but that's already the case.
Cross posted with Moshe - UnitTestCase has no database available, at all, so anything which could possibly trigger a database hit is a no go.
Comment #19
webchickNo longer applies. :(
Comment #20
David_Rothstein CreditAttribution: David_Rothstein commentedRerolled.
The reroll was completely trivial, so should be RTBC again assuming the tests pass.
Comment #21
David_Rothstein CreditAttribution: David_Rothstein commentedThe tests passed :)
Comment #22
fagoThe node_load() cache being still there after doing a node_save() is currently the only possibility to retrieve the unchanged node after an update. To be able to get the unchanged node is really important when you want to react on changes, thus most rules rely on reacting on changes! This patch makes it impossible to react on this changes on API level!
Ideally the API would make the unchanged node object available in hook_node_presave and hook_node_update. Thus when we remove the only previously possibility to get the unchanged node, we really need to make it available properly instead.
Also FIELD_LANGUAGE_NONE is LANGUAGE_NONE now.
Comment #23
fagoWith the current behaviour of objects being shared #154859: Document that cached entity objects are not necessarily cloned any more we are already close to the desired behaviour no? If the changed was previously loaded with node_load() the node is already updated in the cache. However in some situations, the updated node is created fresh e.g. from form values or comes from somewhere else, e.g. the form cache - then the node_load() cache hasn't been updated yet. So what about just updating the cached node with the new one? That way we would save further lookups of a node we already had in memory.
If a node depends on another node it should really only save a pointer to it - thus the nid or vid. In that case everything is fine as any further request to get the the node will retrieve the updated node anyway. So fixing the cache of the changed node only should be fine.
Comment #24
fagoFor the issue of being able to react on changes I created #651240: Allow modules to react to changes to an entity
Comment #25
fagoI fixed the FIELD_LANGUAGE_NONE define and re-rolled the patch.
While I'm still not sure about the complete (static) cache clear I trust the opinion of performance specialists like catch. :)
Comment #26
fago#25: node-save-cache-221081-20.patch queued for re-testing.
Comment #28
David_Rothstein CreditAttribution: David_Rothstein commentedThis bug was mostly fixed as a result of #651240: Allow modules to react to changes to an entity, but I'm running into one scenario where it still occurs.
The issue is that if you call node_load() from within an implementation of (let's say) hook_node_update(), you get the old node rather than the new one that was just saved to the database. Someone else already pointed this out in the comments on api.drupal.org, and a realistic case where I just ran into it is #1541142: Immediately deploying an edited entity sends the old version of the entity rather than the new one.
The fix would be something like in the attached patch, which I'm sending to the testbot to see what happens. (However, it's not just nodes that this issue occurs for; other entities have the same problem.) Also, it appears that $entity->save() in Drupal 8 already does this correctly, so for Drupal 8 it may fix itself on its own once everything is converted to the new entity system.
I think this is a clear bug, but I'm not sure if it's backportable to Drupal 7 or not? It makes the hooks behave the way you'd expect them to, but I guess I can imagine it breaking some existing code...
Comment #29
David_Rothstein CreditAttribution: David_Rothstein commentedAnother issue still remaining is the inconsistency between save and delete.
In node_save(), we do this:
But in node_delete(), we do this:
(And a similar thing for other entities.)
It's not clear to me that this discrepancy exists for a reason.
(Not to mention that the code comment about clearing the page and block caches looks to be out of date, and if we don't clear the page cache when a node is deleted anymore I bet that's a bug too.)
Comment #30
Berdir#28: entity-reset-cache-ordering-221081-28.patch queued for re-testing.
Comment #32
Fabianx CreditAttribution: Fabianx commented#28: entity-reset-cache-ordering-221081-28.patch queued for re-testing.
Comment #34
Berdir#597236: Add entity caching to core shows another problem that we have right now, but the other way round.
We first clear the static cache after we clear the cache, so in case someone calls entity load in e.g. a field update method gets a partially saved entity because the new field values aren't there yet.
Comment #35
BerdirOk, what about this?
This unifies the calls to this order:
1. Save base stuff
2. save configurable fields
3. reset caches
4. invoke entity method and hooks.
This will also help with moving the hooks into the entity pre/post methods as the order won't change then.
Comment #36
amateescu CreditAttribution: amateescu commentedThat unified order looks fine.
Comment #37
fagoI agree the ordering makes sense - first do the storage, then reset caches and invoke methods/hooks.
Comment #38
catchNot sure about clearing caches before the insert etc. hooks fire - there's some modules that change data in those hooks (iirc ERS does). That's usually a hack to work around core entity storage limitations, some of which we've fixed so maybe it won't be necessary, but would be good to at least add a comment to those hooks to note they shouldn't change the stored data.
Comment #39
fagoWhat's ERS?
Totally, we should finally become clear on that. Please see #1729812: Separate storage operations from reactions
Comment #40
Xano35: unify-cache-hooks-221081-35.patch queued for re-testing.
Comment #41
catchERS is https://drupal.org/project/ers
Comment #42
BerdirNot sure what's the state here now, should we move cache clear further down or not? I'm fine either way, as long as we we don't keep it like it is now in save, which is broken, see #34.
Comment #43
fagoI think catch was requesting to improve comments:
So, let's do. Attached patch improved entity-crud hooks docs based upon the existing node-crud hook docs which slightly differ. The interdif
unifies things such that pre-operation hook us the "act on .." wording while the post-operation hook use the "respond to" wording
Given that changes we already be clear on entity-insert/update hooks are for reactions, i.e. it mostly solves #1729812: Separate storage operations from reactions already. Imo a logical consequence would be that we change the scope of the database transaction, but I'd suggest deferring this to #1729812: Separate storage operations from reactions.
Comment #44
BerdirLooks good to me!
Comment #45
webchickSounds like catch already was involved in this one...
Comment #46
catchYep extra comments are plenty. Committed/pushed to 8.x, thanks!
Comment #47
yched CreditAttribution: yched commentedThis means hook_entity_presave() is the only place where you can alter the entity values then, right ?
Can implementations of this hook tell between an insert and a delete at this point in the saving flow ?
Comment #48
catchShould have isNew() (or whatever the 8.x equivalent is now) available in presave().
Comment #49
yched CreditAttribution: yched commentedWell, currently the various StorageControllers only do ->enforceIsNew() *after* calling invokeHook('presave').
Comment #50
BerdirThat call is only a trick to make sure the entity is still seen as new while saving data. Above that, they use isNew() themself to figure out if this is an insert or update. So that's safe to use.
Comment #52
ndobromirov CreditAttribution: ndobromirov commentedThis is a bug reported as fixed in for 8. Should there be a back-port to 7?