Bad idea to try to override node_save?

By funkylaundry on 15 Nov 2015 at 14:33 UTC

I'm building some pretty complex logic that compute a number of calculated field across different content types.

I am currently using hook_node_insert, hook_node_update and hook_node_delete along with drupal_register_shutdown_function to postpone performing most actions until post save.

A user in here mentioned the danger of using node_save from within context described above and suggested using hook_node_presave, but unfortunately I can't see how I should get that to work in practice as I need to have the node saved to the database before I perform any calculations. Some of the calculations rely on fetching referenced notes and the calculation scheme is even structured so a "call to recalculate" propagates throughout all associated nodes.

The problem with this approach is that whenever node_save() is called to save the calculated fields to the node, it seems hook_node_update() is called again and I end up in an infinite loop. Somehow I had it working to a certain extend - probably because I was pretty strict about only calling node_save() when there where actually some changes to save - but investigating things a bit further I realize that even if I avoid the infinite loop, most of my code is run at least twice which is certainly not very good for performance.

So I got this idea that what if I could add my own (simple) functionality to node_save? Simply add an extra parameter to node save where you could tell that this should be a "safe" save and should not invoke any hooks. I'm still not sure exactly how to accomplish this but maybe I would just add a flag to the node and check for it within the relevant hooks. If you are calling node_save from within one of the aforementioned hooks you would pass in this parameter and the hooks would be bypassed.

I know you should not hack core, but is there a safe way to do this or should I look for a different path?

Comments

One should never modify

nevets commented 15 November 2015 at 15:17

One should never modify Drupal core, for one thing it makes it hard to update.

Because you are vague on the details of this code, it is hard to give specific advice on what you might do.

The way you describe this, I wonder if you should delay calculations till later using cron. So the code in hook_node_insert() and hook_node_update() would determine if the calculations need to be made. The module would implement a drupal queue and if a node needed to be processed it's need would be added to the queue. Then when the queue is processed the code would load the node, do the calculations and before saving the node it would do something like

$node->do_extended_calculations = TRUE;

You code in hook_node_insert() and hook_node_update() could then check for the existence of the flag and if present it could return immediately to avoid infinite loops.

While you could use the flag trick without using Drupal queues, since calculations could find other nodes to process, instead of doing so directly you could add them to the queue avoiding possibly long calculation times when processing a node.

I know why I should not hack

funkylaundry commented 16 November 2015 at 17:15

I know why I should not hack core - that's why I asked for a safe way, but I guess it would always at the very least involve duplicating core code into custom code and suddenly you will have to patch the custom code by hand every time there is an upgrade.

A cron job indeed sounds like the right solution. When I was drafting this feature I choose using hooks over a cron job, as I was imagining the cron job recalculating all nodes on every run. I figured it would create a lot of redundant processing and would create spikes in terms of server load, while doing it on the fly as the user creates content would distribute that load over time, only recalculating fields that where changed and fields that depend on those changed it generally would not be a very big task.

On the other hand, I realize that the current process is blocking user interaction every time he publishes content and given that Drupal is not terribly fast in the first place and this cannot really be cached, I realize that you might be right that this should be handled as a background task.

I suppose I could create a cron job which will - as you suggest - look for nodes that have pending calculations and then only recalculate the subset that has been changed. And if that is the case I could probably run it every 10 minutes or so (given that there are any changes) which would be absolutely acceptable in terms of the user experience.

I don't have much experience with cron jobs you, but I'll start doing some research. In the meantime I found a hack that enables me to get things working until I get cron set up:

Just before my main calculation function finishes I store the nid in an array which is referenced in a static variable so it stays there the next time the calculation function is called. In the beginning of the function I then implement a check that ensures that recalculation is not performed if the current node was already saved. I have still not tested it thoroughly yet, but it seems to work for now.

_

WorldFallz commented 20 November 2015 at 18:58

that's why I asked for a safe way

But that's just it-- there isn't a 'safe' way. Every time you hack core or contrib you create your own private fork with all the issues, overhead, headaches, and maintenance that entails.

And other than temporarily applying patches that haven't yet made their way into a release (which you can easily track with git and/or patch_status), there's really no reason to.

In over 8 years of working with drupal since drupal 5, I've not encountered a single scenario that required me to hack core/contrib. Whenever I encountered what initially seemed like such a situation it was either 1) a bug (which I usually participated in fixing) or 2) because I hadn't yet learned the proper way to do what I needed to do with the API.

In this particular case, I did something similar pretty much the way nevets describes-- using queues precisely to avoid forcing users to wait for long processing times.