To react on changes of a certain field of a node one needs to be able to compare the updated node with the original one. Thus for that we need the unchanged node!
Users of the rules module already heavily react on changes, as this is the common pattern there.

For d6 the node_load() cache being still there after doing a node_save() was the only possibility to retrieve the unchanged node after an update. However the current behaviour of #154859: Document that cached entity objects are not necessarily cloned any more or the dedicated fix of #221081: Entity cache out of sync and inconsistent when saving/deleting entities would break that. Well it wasn't that clean anyway.

Ideally the API would make the unchanged node object available in hook_node_presave and hook_node_update - this is really important for any module that want to react on a change. Thus I did a patch, that implements that. Note that this doesn't break any existing API as PHP is fine with passing more arguments to a function than it takes.

This problem applies to all *_save() functions in drupal, so it should be fixed for all of them. However I'd suggest to figure it out right for nodes first, then I'll roll a patch for the others.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

fago’s picture

Status: Needs review » Needs work

This patch won't work right in all cases as currently the object returned by node_load() would already contain any changes applied to that object instance. So it works right with our current node form workflow, but not when a module calls node_load() changes something and then calls node_save().

catch’s picture

Subcribing. A lot of code which currently updates regardless could check for changes first using this. hook_user_presave() currently adds the form values, but I'd much rather we had two objects.

fago’s picture

In the meantime I've implemented this for rules which keeps a static with clones of unchanged entities - this static is updated on load, insert and update. This works fine, except when entities are serialized over page loads but not loaded. E.g. when editing a comment, entity_load is not called, thus when the comment persisted through form storage is saved, there is no unchanged entity available on insert/update. On entitytype_presave, we can just call entity_load though.

Once we have hook_entity_presave we can fix that too, so it would work that way.

Generally I think this belongs into the controller + let the controller invoke hooks and just pass the unchanged entities too - of course chaning hooks is out of scope for d7. But I wonder whether it makes sense to bring this functionality into the controller still for d7, such that other modules beside Rules have an API for retrieving unchanged entities too?

Apart from that I think there is another problem when saving entities that have been serialized previously. This saved object is not the same as the one as retrieved from entity_load(), thus after saving entity_load() would be served from static cache but it won't return the recently saved object.

mitchell’s picture

Issue tags: +API addition

@fago: Do you intend to get this patch ready to be reviewed for D7 within a meaningful time? I'm guessing there will soon be a major thrust to clean up the D7 'needs work' issues (20 pages, whoah).

Adding 'api addition' tag.

fago’s picture

Title: allow modules to react on changes of a node » allow modules to react on changes to an entity
Component: node system » base system
Priority: Normal » Critical
Status: Needs work » Needs review
Issue tags: +API change
FileSize
6.84 KB

I just ran over this issue again.. :/ I think its rather problematic for d7 to be not able to reliable detect changes. Thus increasing priority to critical + rolling a patch.

Attached patch provides a new API function entity_load_unchanged(), which can be used to get the unchanged entity in presave hooks.
* entity_load_unchanged() just resets the cache of the entity in question and loads it again, thus bypassing the static cache.
* The patch comes with a small API change - it adds an optional $id parameter to the entity controllers resetCache() method. This shouldn't be problematic though, as most controllers in contrib inherit from the DrupalDefaultEntityController anyway, in which case they don't need to change anything. But still there is a change, thus assigning the API change tag.
* Patch comes with a test detecting changes on node_save() + fixes the existing node_save() test to actually test something.

Status: Needs review » Needs work

The last submitted patch, drupal_changes.patch, failed testing.

fago’s picture

Status: Needs work » Needs review
FileSize
5.67 KB

grml, I guess the Git patch fails to apply due to $Id$ in the removed simpletest module.
re-roll without that.

The module "modules/node/tests/node_presave_test.module" is not used any more + the code is in "modules/node/tests/node_test.module" now, so it should be removed - but not touching that for now.

webchick’s picture

Priority: Critical » Normal

Not critical. This functionality didn't exist in Drupals 1, 2, 3, 4, 5, 6 or 7, so it's not at all clear why we need to add it now.

Just store the original copy of the node as a node property or something?

fago’s picture

Priority: Normal » Critical

That's not true, till Drupal 6 a $node returned from node_load() did reflect the node currently stored in the DB - thus previously one could do a node_load() instead of the proposed entity_load_unchanged(). However this changed with #154859: Document that cached entity objects are not necessarily cloned any more for D7. Thus setting back to critical.

>Just store the original copy of the node as a node property or something?
Yep, we could just go and store a copy of each loaded entity somewhere, however I doubt this would be good for Drupal's memory usage.

gr33nman’s picture

http://drupal.org/files/issues/drupal_changes_0.patch

The above patch from reply #7 hangs and does not patch.

[removed by sun]

longwave’s picture

@gr33nman: patch -p0 drupal_changes_0.patch should be patch -p0 < drupal_changes_0.patch - the former hangs because it is waiting for you to provide patch data on standard input.

sun’s picture

I agree that this is a critical problem introduced for D7.

However, I'm worried that entity_load_uncached() doesn't track its own invocations/usage. Since Drupal is modular, there can be 1-N modules that need to retrieve the original resp. currently stored entity in an hook_entity_presave() in order to figure out what they need to do. In turn, entity_load_uncached() would be re-invoked N times and each time, the cache is reset and the entity is built from scratch.

2 possible solutions:

1) Add a static cache to entity_load_uncached() in order to track reloaded entities by $type and $id. If there's an entry for a $type+$id already, don't reset the cache.

2) Formalize usage of entity_load_uncached() a bit more and automatically invoke it in ENTITY_save() functions, before hook_presave() is invoked, so that presave hooks can access the currently stored entity in $entity->original or similar.

fago’s picture

Beside that, I realized there is another problem with solely relying on 'presave'. As said Drupal is modular, so other modules could issue changes during 'presave' too - what makes it impossible to have a reliable way to determine changes based upon it, as no module can really ensure it is the last module acting.

Thus, in order to ensure no changes are missed, the only reliable way is to go with the 'update' hook. That would be already enabled with option 2) is the original as available in $entity->original anyway.

Compared to 1) this has the downside of having the additional load() operation for each entity being saved. However, as soon as a module is looking for changes it doesn't make any difference + it's required to make it actually reliable. Still I think that's better than cloning each entity on each load(), thus wasting time and memory on *each* load.

BenK’s picture

Subscribing

chx’s picture

Let me get this straight, due to people not knowing PHP 5 object handling six years after its release we are adding weird workarounds in our code?

sun’s picture

@chx: Sounds off-topic. The problem is not PHP5 but our own static caching.

webchick’s picture

Right, #15 pretty much sums up my feelings on the matter.

IMO we need to fix this by either:
a) cloning in entity_load, which is what we did in D6 (I hear concerns about RAM usage; but I don't understand how it's different than D6).
b) documenting this as the caller's responsibility.

fago’s picture

ad a), I think we've commonly agreed to not do so in #154859: Document that cached entity objects are not necessarily cloned any more
b) doesn't help anyone facing *this* issue.

>Let me get this straight, due to people not knowing PHP 5 object handling six years after its release we are adding weird workarounds in our code?

Knowing PHP5 object handling doesn't help anyone when facing problems like this issue. PHP's object handling is just part of the problem making it difficult - together with Drupal's 'static cache' for entities. Thus this issue is not a language problem in general, it is a problem introduced by the way Drupal makes use of it.

Thus still I think option 2) is the best way to go, I'll roll a new patch asap. Probably tomorrow morning.

catch’s picture

"a) cloning in entity_load, which is what we did in D6 (I hear concerns about RAM usage; but I don't understand how it's different than D6)."
The difference compared to D6 is that comments, taxonomy terms, files and other things were never loaded entities in D6 at all, they were direct queries from the database, and in the case of those two without an API to act on. Since we now have a unified API for loading stuff like that, adding stuff like cloning to nodes affects those too.

sun’s picture

Title: allow modules to react on changes to an entity » Allow modules to react to changes to an entity
FileSize
11.06 KB

Implemented 2) as $entity->original.

Damien Tournoud’s picture

What's the point of loading the original entity unconditionally? If interested, a contribution module could use entity_load_unchanged() itself. The performance impact on that, especially for batch modifications, seems unnecessarily huge.

Status: Needs review » Needs work

The last submitted patch, drupal.entity-save-original.20.patch, failed testing.

james.wilson’s picture

Would adding in the entity_load_unchanged() function from #20 in addition to some documentation be enough to close this issue? (encapsulating the actual load from db within that function)

fago’s picture

Status: Needs work » Needs review
FileSize
14.92 KB

ad #20:
thanks, I took that as base for that patch. However, you changed the function signature of entity_load() and the controller load() method, which is at that point an unnecessarily API change + is incorrect, as FALSE is a valid parameter for entity_load() + no array. Thus I removed that part.

ad #21 & #23:

Beside that, I realized there is another problem with solely relying on 'presave'. As said Drupal is modular, so other modules could issue changes during 'presave' too - what makes it impossible to have a reliable way to determine changes based upon it, as no module can really ensure it is the last module acting.

Thus, in order to ensure no changes are missed, the only reliable way is to go with the 'update' hook.

However, enabling usage of the 'update' hook means one has to pre-load the unchanged entity before it is saved, i.e. on pre-save.

>The performance impact on that, especially for batch modifications, seems unnecessarily huge.
Agreed, that's problematic. Thus, we could go and leave it up the using module to pre-load the unchanged entity on 'presave', however, as soon as a single module is looking for changes it doesn't make any difference.

So what about the attached patch? For mass operations, it pre-sets $node->original in order to be efficient. I also improved the tests to also test determining changes on hook_node_update().

Status: Needs review » Needs work

The last submitted patch, drupal_changes.patch, failed testing.

fago’s picture

Status: Needs work » Needs review

ok, fixed that 2 fails.

Also node_save() doesn't yet clear the static cache upon node_save(). This is an issue in general (see explanation at #983090: Reset cache after save), however with that patch the static cache would even contain the original, unchanged node. Thus I fixed that for nodes and vocabularies, for the other entity types in core this was already correct.

Thus updated the patch to fix the tests + to fix clearing the static cache upon save.

fago’s picture

FileSize
16.15 KB
fago’s picture

FileSize
16.4 KB

added in a hunk testing the static cache being cleared upon save for nodes.

fago’s picture

All green, imho it's ready. Any comments?

sun’s picture

Status: Needs review » Reviewed & tested by the community

Yes, that makes perfectly sense. The code properly accounts for both situations, single entity updates and bulk entity updates, and uses the best (performance-wise) approach for the concrete situation at hand. Nicely done!

Anonymous’s picture

possibly this cuts against the "if you break it, you get to keep both pieces approach" we're taking to object handles, but - do we want entity_load_unchanged() to leave a handle in the static cache? it feels like we're changing the world just by trying to observe it, so maybe:

function entity_load_unchanged($entity_type, $id) {
  entity_get_controller($entity_type)->resetCache(array($id));
  $result = entity_get_controller($entity_type)->load(array($id));
  entity_get_controller($entity_type)->resetCache(array($id));
  return reset($result);
}
sun’s picture

@justinrandell: I don't really see the point of flushing the cache once more after the fact that we already flushed and rebuilt it for the sole purpose of ensuring that we at least flushed it once in order to retrieve the currently, actually stored entity, but maybe I'm missing your point.

Anonymous’s picture

@sun: my point may just be excessive hand-holding, but if you load an entity via _unchanged(), then do_some_stuff_that_calls_other_modules_code(), do_some_stuff_that_calls_other_modules_code() could change your 'unchanged' entity.

i'm ok if we're saying that people just have to not do that, but given this function is all about getting an entity that hasn't been changed, i wonder if we should make it so it *can't be changed* by code that runs elsewhere.

make sense?

sun’s picture

AFAIK, we can't technically enforce a "unchangeable" object, unless we'd wrap and *tach its properties onto a special class object as protected properties, and/or have the class use the magic __set() method to send any property changes to /dev/null

Anyway, I'd leave that edge-case for D8 or D9 to figure out - until then, it should be sufficient to clone $entity into $entity->original, resp., reload the stored entity into ->original upon saving, like the current patch does.

cosmicdreams’s picture

Woohoo! another critical down!

Anonymous’s picture

ok, guess i didn't make myself clear.

@ #34 - its not about making an object unchangeable, its about making it only changeable from within the scope of the code that calls entity_load_unchanged().

as the patch is now, its possible for the object returned by entity_load_unchanged() to be altered by code in a far away corner of drupal, because the entity's handle is left in the static cache, and will be returned by the next call to entity_load().

anyway, its definitely an edge case, and this is a critical bug, so i'll stop.

fago’s picture

ad #31:
Good catch.

>.., because the entity's handle is left in the static cache, and will be returned by the next call to entity_load()

Indeed, however this is only until the save operation finished as then the static cache is cleared - then you'll get the updated entity. That said, I see your point, however I do think its an edge case which can be only caused by a module doing an entity_load() on the entity itself in an entity's presave/update hook (which already has the entity object though) + apply updates to it. If you do that, I think its obvious that you mix up things then.
Thus, I think it's better to keep the static cache, so any further theoretical reads (during save) can benefit from it.

Related, multiple, simultaneous page requests doing saves are also problematic, however that should be fine for entity types making use of the dbtng transaction system. I guess all should, but not all do yet. But that's another issue.

fago’s picture

I try to give a summary for the patch in #28 for webchick/dries:

  • Because of #154859: Document that cached entity objects are not necessarily cloned any more reliable detecting changes to an entity in contrib isn't possible easily in contrib, nor efficient. The patch fixes that in core in an efficient way, such that any module may easily make use of $entity->original during updates in order to detect changes. Example use case: Detect when a node has been unpublished.
  • The patch pre-loads the original entity during pre-save, what is required to make change detection work during update hooks too. Changes can be only detected reliable on update, as no module can be sure it is the last one applying changes in 'presave'.
  • The patch keeps the optimized memory footprint for reads (no unnecessary entity clone for each load), while it allows for efficient bulk updates by allowing modules to pre-set $entity->original, thus saving the additional load. The patch implements that for the core-bulk-update-operations.
    Without that patch in core, one could solve this in contrib using the clone-on-load approach, thus doubling the memory usage of entities as soon as a single module tries to detect a change. Even worse, without anything in core, each module would have to clone the entity on its own. Cloning each entity in core though, would affect general memory usage - see #19.
  • The patch comes with tests, which detect changes on node presave and update + it fixes the related node_save() test case to actually test something. Related it also fixes node_save() to clear the static cache for the saved node - see #26.
  • API changes: The entity controllers resetCache() function signature has been modified to resetCache(array $ids = NULL) to enable resetting the cache for specific entities only. However usually modules inherit from DrupalDefaultEntityController anyway, thus the change would be only relevant to them if they override the method or implement the interface completely on their own.
  • Left for follow-up issues: a) ensure all entity types use transactions for their save() functions. b) remove "modules/node/tests/node_presave_test.module", see #7.
Dries’s picture

Status: Reviewed & tested by the community » Fixed

Looked at this patch several times, debating with myself whether this is actually critical, but eventually decided to commit this to CVS HEAD. Thanks.

yched’s picture

rfay’s picture

Based on #38's excellent issue summary, it looks to me like this API change doesn't have to be announced. Let me know if I'm wrong.

fago’s picture

Great to see this went in in time!

ad #41: While it's unlikely modules are affected, I'd still suggest to announce it E.g. the entity API is affected as it overrides resetCache(), but well I'm already aware of the change. ;)

Follow-ups:
#986018: Remove ununsed test files
#776222: Add transactions to more _save() functions

And another related issue:
#986024: Inconsistent static entity cache in hook_entity_[insert|update|delete]

Status: Fixed » Closed (fixed)
Issue tags: -API change, -API addition

Automatically closed -- issue fixed for 2 weeks with no activity.

jweowu’s picture

Issue summary: View changes

Would someone from this issue be able to comment on #1967266: Stop flushing entire entity load cache bins unnecessarily and in particular the issue of entity deletion functions continuing to flush their entire entity load cache bin, despite the ability (added here) for flushing only specific entity IDs.

See comment #6 in that issue for more details.

I note that in the patch committed here, there were whitespace (only) changes to taxonomy_vocabulary_delete() highlighting that in that function, an existing call to entity_get_controller('taxonomy_vocabulary')->resetCache() was not modified, whereas in taxonomy_vocabulary_save() there was a change from resetCache() to resetCache(array($vocabulary->vid))

That makes the difference seem purposeful, but I don't see why there's a difference between the two cases (and there are no comments to explain the reasoning).

edit: This question has come up before, but no answers were offered at that time.