In the past week, cron has stopped working on one of our sites. Although the max execution time is set at 240 seconds, it's still timing out.
I traced the issue to the _scheduler_unpublish function in scheduler.cron.inc.
This site is several years old, and has thousands of nodes that have been unpublished by this module over time. It appears that this function looks for all nodes with an unpublish date <= the current time, and doesn't filter out nodes that are already unpublished. The query itself isn't the issue, but all of the work done for each node in the query result takes a long time if the result contains thousands of nodes.
I'm curious if there is any reason to include already unpublished nodes in this functionality? I can't think of a reason why they would be included, but there are many use cases out there besides mine.
Comment | File | Size | Author |
---|
Comments
Comment #2
jonathan1055 CreditAttribution: jonathan1055 as a volunteer commentedHello Spry_Julia,
This is interesting. At first glance I cannot see why you should be having this problem, but clearly something is going on, and your detailed analysis is helpful.
No, there is no reason to include already unpublished nodes, and they should not be included. The query only selects rows which have an unpublish-on date which is > 0 but earlier than the current time:
When Scheduler unpublishes a node the unpublish_on value is removed (or should be) so this query is only meant to return just the nodes which need processing. So, some questions:
/admin/config/content/scheduler/list
how many nodes are shown? Do you just see the one which you think should be unpublished, or do you get lots of old ones aswell?scheduler
table. How many rows are there? It should only have one row for each node which is due to be published and/or unpublished. The row should be deleted when the node has been processedhook_scheduler_nid_list
orhook_scheduler_nid_list_alter
? These functions allow the node list to be manipulated_scheduler_publish()
or is it just a problem with_scheduler_unpublish()
?I'm sure we can help you out, as there must be a good reason why you are getting this.
Jonathan
Comment #3
JuliaKoelsch CreditAttribution: JuliaKoelsch at Spry Digital, LLC commentedHi Jonathan,
thanks so much for your detailed response!
It looks like my issue is that the unpublish_on value is not removed for some reason. So, every time cron run, it is re-processing thousands of already unpublished nodes. There are thousands of records in the scheduler table. And yes, the problem has been building over time. The site is several years old, and it looks like the scheduler table has all scheduled nodes since the site launched.
I'm looking at the _scheduler_unpublish function, but I can't identify what part of it removes the record from the scheduler table. If you could point me to the part of the code that does this, I can debug to see what's interfering with cleaning up the scheduler table.
Thanks again for your help,
Julia
Comment #4
jonathan1055 CreditAttribution: jonathan1055 as a volunteer commentedOK, yes we can work on this. I see why you had trouble identifying where the db rows are deleted, because it is not directly within
_scheduler_unpublish()
. In that function, do you see the bit:It is actions_do where the work is done, this will execute the 'node_save_action' action, which (for already existing nodes) calls all implementations of hook_node_update and it is therefore in scheduler_node_update where the updates are done, just the same as if the node was unpublished manually:
We have to cater for manual updates here, so we have the case that if either of the dates still remain in $node then a db_merge is done. If no dates remain then we call Scheduler's hook_node_delete, which deletes the db row. If you add debug to the top of
scheduler_node_update
and see what values it has for the two scheduler dates. The unpublish_on date should be empty at this stage, if all was working correctly, due to line 166 in scheduler.cron.inc$n->unpublish_on = NULL;
. Then the row would be deleted inscheduler_node_delete()
I'm interested to see what is going wrong with your site, and I'm sure we can solve it.
Jonathan
Comment #5
xpersonas CreditAttribution: xpersonas commentedI'm having the same issue. It's been happening for a long time. I'm just now addressing. I've debugged it a bit, and I can see that by the time we get to scheduler_node_update() function, the $node object being passed to it has the original unpublish date. Even though, in your code, you were trying to set it to null.
I believe this is because there are no Drupal hooks to act on a node AFTER it has been updated. So if we call hook_node_update, it will use the unchanged node values in its operation. Therefore, when scheduler_node_update() gets called, it's using the node values BEFORE it was altered.
That's if I'm understanding this right.
I made a patch. I moved the db_merge or db_delete actions on the scheduler table into the _scheduler_unpublish() function. I'm just trying this out. I'm not entirely sure if there are larger implications in doing this. So I would let the maintainer speak to that. At the very least, this hopefully helps the next person understand what's happening.
How I understand it...
In the code above, both $n and $context['node'] within the actions_do() have these values:
publish_on = "0"
unpublish_on = null
But when we get to scheduler_node_update($node), the $node class has these values:
publish_on = 0
unpublish_on = 1492354800 // the original, past, unpublished date. so it is not cleared out before saving to database
Comment #6
jonathan1055 CreditAttribution: jonathan1055 as a volunteer commentedHi xpersonas,
Thanks for your debug and useful information. The key point is that
$n->unpublish_on
and$context['node']->unpublish_on
do have the blanked value when processing control leaves_scheduler_unpublish()
but the value has been re-instated by the time we get toscheduler_node_update()
. This means that another 3rd-party module is incorrectly doing a node_load() or something like that, and modifying the$node
in theirhook_node_update()
which is being run before Scheduler'shook_node_update()
. This is incorrect - implementations of hook_node_update should not alter the $node parameter value.If you are interested to find a solution for this, you could make a dev replica of your site, with all your other modules enabled, then one by one disable them and find when scheduled unpublishing starts to work as intended. Then we can look at that module's code to see what they are doing wrong.
I am going to close this "works as designed" because the Scheduler codebase is working properly. But do not hesitate to re-open the issue if you have new information and want to progress it.
Jonathan