Applies to both Drupal 4.6.x & 4.7.0. While (inadvisedly) completing delivery 4,000+ emails via the simplenews.module and cron.php, we discovered that the apparent attempt in cron.php to prevent cron.php from running on top of itself while a previous run of cron.php is still running, does not work.

Since our (shared) hosting service has a once-per-day limit on (real Linux) cron runs, we ran cron.php manually by loading its URL into a browser. But since the host does not provide ssh (shell) services and cron.php provides no feedback itself and the Drupal admin logs provide none until a run of cron.php has finished, we tried stopping and reloading cron.php a few times manually.

Since the newsletter takes hours to send once the simplenews.module invokes cron.php to send the a newsletter to any subscribers it has not yet been sent to, our re-loading of cron.php restarted the whole sending process from the next subscriber address waiting as another server process thread while the old thread was still running causing overload on shared host, leading our hosting service to shut our site's account down until the problem was resolved and many, many of our subscribers receiving our (ridiculously large) newsletter many times each. Hence I've labelled this issue "critical".

While I am not 100% sure there might be some reason for cron.php to work this way, both its code and performance Drupal's admin logs suggest this is a bug.

In cron.php, before any modules' cron handlers are invoked, the global variable "cron_busy" is first checked to see "if the last cron run completed". If it is not set, it is give a default value of "false".

If cron_busy is true, cron.php sends this dubious message to the Drupal admin ("watchdog") logs: "Last cron run did not complete".

But then it goes out of the if control structure and always runs this code, whether an earlier cron.php run is still running or not(!?!):

// Iterate through the modules calling their cron handlers (if any):
// That is, it runs all modules' cron handlers!!! - ADM
module_invoke_all('cron');
// Clean up
variable_set('cron_busy', false);
// The implication of this is that if a second run of cron.php is shorter than an earlier one, 
// cron.php will then set cron_busy to "false" BEFORE THE EARLIER RUN COMPLETED 
// and then send this to the admin logs:
watchdog('cron', t('Cron run completed'));
// But an earlier cron.php run might still be running 
// and there is no indication anywhere of which cron run just completed.

The attached zip file includes a patch I believe fixes this apparent bug (simply by moving the above code into the else part of the if control structure at issue) as well as cron.test01.php and cron.test02.php which use this patch and can be used to test the control logic against Drupal's admin logs (and/or under Drupal 4.6.x, the current value of cron_busy using the variables.module), without actually running Drupal's cron handlers (and with timing feedback from cron*.php).

Note: this is the first patch I've created under Windows and I'm not sure if it was done right. It didn't seem to work with the GNUWin32 patch program using the Drupal handbooks instructions.

Recommended test method:
* While no crons are running, run cron.test01.php (which basically just runs "sleep(5);") and the logs should read "Cron run completed".
* Do the same first using cron.test02.php (which basically just runs "sleep(30);"), then run cron.test01.php in another browser window while the first cron is running. After both cron*.php files have finished the logs should have entries like this:

            cron 2006-06-27 15:04 Cron run completed adm details
warning cron 2006-06-27 15:04 Cron Running: Ignored attempted rerun. adm details

I believe this is a better (if not "correct") way for cron.php to work. The same test can be re-run changing the moving the above code from cron.php back outside the if control structure, as is the case for the current cron.php in both Drupal 4.6.x and 4.7.0, and both the effective results and admin log entries will be much less intuitively correct, if not simply wrong, since one cron.php run will run over top of an earlier one.

Note: This also affects poormanscron.module.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

beginner’s picture

Can you just post a patch?
http://drupal.org/patch

beginner’s picture

Version: 4.7.2 » x.y.z
ADMarshall’s picture

FileSize
1.19 KB

Ya, I RTFM'd already and have done diff-patch a gazillion times under Linux. But the GNUWin32 patch didn't seem to work the same way (under Windows).

The long explanation, convoluted explanation and test scripts were provided because I couldn't quite believe there could really be such a bug in cron.php and felt obliged to prove it -- and I didn't have time to make the explanation shorter and clearer.

I've extracted the patch out of the zip file for you, Beginner, and attached it to this "follow-up". Now it's your turn to RTFM... ;)

Btw, the patch is named "cron.php-v4.6.6_1.28.patch" because it was based on the v. 1.28 version of cron.php that came with Drupal 4.6.6. But it seems to apply to both the 4.7.0 and cvs versions as well.

beginner’s picture

Title: Cron Busy ReRuns: Patch for 4.6/4.7 to stop already running cron.php re-running » Cron Busy ReRuns: prevent already running cron.php from re-running
FileSize
1.05 KB

First of all, +1 on correcting this bug.
I noticed this problem the same way as you did: by sending our newsletter to 1000+ emails. Each time I would run cron.php again would mean that the people would receive the newsletter one more time, i.e. many people received multiple copies of the newsletter.

What's more, cron.php can be run by anybody. There is never a guaranty that it won't be run twice, even if you know what you are doing: search engines may have found a link to cron.php somewhere and try to index the page at the same time as your scheduled run in taking place.

Thanks for providing test files, in your zip.

The code needs some very minor adjustments, though:
1) you insert some new lines where you shouldn't.
2) comments must be indented, too. (// Clean up).
3) remove your comment block, with your name and email: there is no mention of the author of any patch anywhere in the code, and I guess this patch will not get committed with it.
4) improper use of capitalization in the watchdog message.

Now, an important question:
What if because of a severe failure (server crash?) the cron is no longer running, but the variable cron_busy is still set as TRUE?
Maybe we need to add a function in admin/settings to allow to reset the variable as FALSE in cases like.

I attach a new patch.

pwolanin’s picture

In search_cron(), a shutdown function is registered which does some cleanup if the function fails or times out. Can this be done in cron.php? Seems like this would be the approriate way to set the watchdog error "cron run did not complete" as well as resetting the 'cron_busy' variable?

beginner’s picture

Interesting idea.

http://php.net/register_shutdown_function
the comments mention differences between *nix and windows platforms.
I haven't had time to study the question in details, though.

drumm’s picture

Status: Needs review » Needs work

I'm concerned about cron runs which exceed the maximum execution time set by PHP.

I'm not sure, but registering a shutdown function still might not execute if the script exceeds any alotted time limit (especially since it can be an Apache-level time limit instead).

I think the best solution might be to add better locking to simplenews. (Or a queue API for Drupal so modules won't have to think about making a proper queue.)

Steven’s picture

I know from the search_cron() changes that register_shutdown_function() does allow you to get around PHP's time limit. Apache's limit, that I don't know.

ADMarshall’s picture

FileSize
705 bytes

First, apologies to beginner for my surly earlier reply. Once the firefighting was over, I realized I should have put up my patch and comments more concisely. Also, thanks for keeping an open mind and adding your patch and ideas, and for the coding tips, too.

I too haven't had time to look enough into the references, patch and ideas everyone's provided, but I also think the patch I provided is an insufficient, stop-gap measure:

  • I also wondered how this would affect a Windows platform and I have no idea about that, and
  • I didn't think of the need to reset cron_busy to false if the server crashed, but I'd definitely want that, too.
  • I too still think there should be some Drupal GUI admin interface allowing authorized users to manually start, monitor and stop cron.php and any process(es) it starts (PHP, Apache threads, BASH scripts or other, as drumm suggested), accounting for settings in all layers of the WAMP/LAMP platform, where feasible, without tech-challenged "end-users" having to manually load and/or reload cron.php, the Drupal admin/ logs page and PHP system()-call files into several browser windows, jumping between them to try catch cryptic PIDs, process names and, if they appear to be causing problems, kill them (under Linux, at least).

At least that's what we had to do. To try manage cron.php and the processes it runs, we had to use a two PHP BASH-system-call scripts; "ps.php" to monitor all processes run by our shared-host account's userid, which of course must be manually refreshed in a browser and "kill.php" which can kill any of those processes if it/they that appear to be causing problems, by copy-pasting a process' PID from ps.php's output, appending it to kill.php's URL as, eg, http://example.com/protected-scripts/kill.php?pid=9999. (Both these scripts are in the attached zip file, just in case it isn't immediately obvious how to make them, as it wasn't to me.)

But this is too much to ask most of my clients or their staff to do. Instead, what they really needed is some simple cron-admin GUI/Web interface they can use to start, stop and check or watch the cron and its child processes if potential cron-related problems are suspected or expected, or if they simply want to invoke all (or, better, selected) Drupal cron hooks manually.

Similar suggestions have been made in other Drupal issue or support posts, but I can't remember any leading to an solution usable by the "average Drupal end-user" among any of my clients at least.

pwolanin’s picture

While it would be lovely to be able to monitor/kill the processes started by cron, is there any cross-platform way to do so?

I can more readily imagine a new feature where cron is invoked from an admin page, with the option to only invoke the hook_cron implementations of a subset of the installed modules.

pwolanin’s picture

While it would be lovely to be able to monitor/kill the processes started by cron, is there any cross-platform way to do so?

I can more readily imagine a new feature where cron is invoked from an admin page, with the option to only invoke the hook_cron implementations of a subset of the installed modules.

ADMarshall’s picture

I can more readily imagine a new feature where cron is invoked from an admin page, with the option to only invoke the hook_cron implementations of a subset of the installed modules.

Ya, that sounds even better. And better still would be something like poormanscron where one could set different cron intervals for different hook_cron calls. For example those for 15 min for aggregator and simplenews and a few hours for search-index updating.

In case it might help towards related goals in this thread, or just help other admins on Linux platforms understand or manage poormanscron-related runs, here's a small proggie I was using for as much (to run in a separate browser window/tab):

<?php
 
include_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

// If not in 'safe mode', increase the maximum execution time:
if (!ini_get('safe_mode')) {
  set_time_limit(240);
}
if (variable_get('cron_busy', false)) {
  $cronbusy = variable_get('cron_busy', false);
  print "Cron is currently <strong>running</strong>!<br />";
  $thisrun = variable_get('poormanscron_lastrun', 0);
  print "This poormanscron-run was to have started at " . date("Y.m.d-H\hi\ms\s", $thisrun) . "<br />";

}
else {
  $cronbusy = variable_get('cron_busy', false);
  print "Cron is <strong>NOT</strong> currently running!<br />";
}

print "<strong>The current time on the server is: " . date("Y.m.d-H\hi\ms\s", time()) . "</strong><br />";

$cronlast = variable_get('cron_last', time());
print "Last cron run <strong>completed</strong> at " . date("Y.m.d-H\hi\ms\s", $cronlast) . "<br />";

$nextrun = variable_get('poormanscron_lastrun', 0)
             + 60 * variable_get('poormanscron_interval', 60);
print "<strong>Next poormanscron to run at " . date("Y.m.d-H\hi\ms\s", $nextrun) . "</strong><br />";

?>
pwolanin’s picture

Since this issue is not moving, maybe it would help to focus on the critical issue and keep new features separate.

So, it seem to me that the critical issue is simply preventing cron from running if another instance is in progress, and doing appropriate cleanup in case of timeout, crash, etc.

To make good progress on this, it seem as though it would be useful to be able to reproducibly cause these faults. Timout is probably the easiest by setting PHP's max_execution_time to something short on a test site.

Crash could perhaps be simulated by restarting apache?

chx’s picture

Assigned: ADMarshall » chx
Status: Needs work » Needs review
FileSize
1.17 KB

This is a simple way to a) prevent multiple crons from running b) automatically expire the lock after a certain time. If you just use TRUE/FALSE and the code aborts then you are stuck. I am using a time based semaphor so after four minutes you can run cron.

beginner’s picture

Status: Needs review » Needs work

Eh, that sounds clever. I like this idea.
the variable con_busy is no longer used, so maybe we need an update to remove it.

the function returns TRUE after the run. Do we need to return FALSE or 'busy' in the first part: if() { return 'busy'} ?

Dries’s picture

Code looks good. Two nits:

1. Add some code comments explaining that we can't run cron.php on top of each other and the side-effects it causes.

2. Maybe change variable_set('cron_start', 0) to variable_del('cron_start')?

pwolanin’s picture

is the 4 minute timeout delay long enough? See the original posting at the top:

Since the newsletter takes hours to send once the simplenews.module invokes cron.php to send the a newsletter to any subscribers it has not yet been sent to, our re-loading of cron.php restarted the whole sending process

Maybe the timeout delay needs to be settable somewhere (even if only in settings.php)?

beginner’s picture

About the time delay, look at the code just above what would be patched:

  if (!ini_get('safe_mode')) {
    set_time_limit(240);
  }

Either safe mode is on, in which case cron is probably only allocated 30 s. to do its job, or it's off, and it is allocated 4 min.
So, chx' s patch makes sense in this regard.

Having said that, I don't know what would happen if a newsletter takes more than 30 s. or more than 4 min. to send. But simplenews.module allows to configure the number of newsletters to send at each cron run, which I guess solves the problem.

Steven’s picture

From PHP.net: The set_time_limit() function and the configuration directive max_execution_time only affect the execution time of the script itself. Any time spent on activity that happens outside the execution of the script such as system calls using system(), stream operations, database queries, etc. is not included when determining the maximum time that the script has been running.

This means that assuming cron never runs for more than 4 minutes is false.

A proper solution is to use register_shutdown_function(). This allows us to always reset the "cron_start" variable when cron stops, for any reason. Modules should already be using this when the tasks they do can take too long (like node search indexing does). We can even check the value of "cron_start" in the shutdown function to see if the shutdown is graceful or not.

The only case where cron could then still lock itself out is when the database connection is lost in mid processing, so the variable cannot be reset. It's probably a good idea to build in a long maximum delay (e.g. an hour) after which cron is automatically unlocked.

Also, the watchdog message doesn't make sense at all, even if it was written under the assumption that cron never runs longer than 4 minutes:

+  if (variable_get('cron_start', FALSE) + 240 > time()) {
     watchdog('cron', t('Last cron run did not complete.'), WATCHDOG_WARNING);
   }

This message is only true if 4 minutes have not yet passed since the last invocation of cron.php. So the correct message would be "Attempting to re-run cron while it is still running."

pwolanin’s picture

thanks for clarifying this- from previous reading I was very wary of the magic 4 minute setting.

Steven’s picture

Status: Needs work » Needs review
FileSize
2.49 KB

Here's a patch for HEAD which introduces the approach I outlined... Summary:

  • We prevent cron from running multiple times. There is a time limit of 4 minutes per cron run, though this only counts actual PHP processing time, not waiting for DB, I/O, ...
  • We gracefully shut down even if the time limit is hit (though we do inform the admin of this).
  • We assume that any cron run that takes longer than an hour is in fact stuck (e.g. db connection dropped).
drumm’s picture

Looks okay on initial reading of the code. (This is not a full review.)

beginner’s picture

Status: Needs review » Needs work

The patch looks good and seems to answer all the different concerns expressed above (shutdown function and proper handling in case of a crash).

My two points in #15 still stand, though.

Steven’s picture

Status: Needs work » Fixed

beginner: I added an update to delete 'cron_busy'. The function does not need to return FALSE because null is equivalent to false for boolean evaluations (if).

Committed to HEAD as there were no other objections.

Anonymous’s picture

Status: Fixed » Closed (fixed)
lazly’s picture

Version: x.y.z » 4.7.9
alimosavi’s picture

Version: 4.7.9 » 6.17
Assigned: chx » alimosavi
Status: Closed (fixed) » Active

I get this error too , My Db size is > 1GB ,

pwolanin’s picture

Version: 6.17 » 4.7.9
Assigned: alimosavi » Unassigned
Status: Active » Closed (fixed)