Applies to both Drupal 4.6.x & 4.7.0. While (inadvisedly) completing delivery 4,000+ emails via the simplenews.module and cron.php, we discovered that the apparent attempt in cron.php to prevent cron.php from running on top of itself while a previous run of cron.php is still running, does not work.
Since our (shared) hosting service has a once-per-day limit on (real Linux) cron runs, we ran cron.php manually by loading its URL into a browser. But since the host does not provide ssh (shell) services and cron.php provides no feedback itself and the Drupal admin logs provide none until a run of cron.php has finished, we tried stopping and reloading cron.php a few times manually.
Since the newsletter takes hours to send once the simplenews.module invokes cron.php to send the a newsletter to any subscribers it has not yet been sent to, our re-loading of cron.php restarted the whole sending process from the next subscriber address waiting as another server process thread while the old thread was still running causing overload on shared host, leading our hosting service to shut our site's account down until the problem was resolved and many, many of our subscribers receiving our (ridiculously large) newsletter many times each. Hence I've labelled this issue "critical".
While I am not 100% sure there might be some reason for cron.php to work this way, both its code and performance Drupal's admin logs suggest this is a bug.
In cron.php, before any modules' cron handlers are invoked, the global variable "cron_busy" is first checked to see "if the last cron run completed". If it is not set, it is give a default value of "false".
If cron_busy is true, cron.php sends this dubious message to the Drupal admin ("watchdog") logs: "Last cron run did not complete".
But then it goes out of the if
control structure and always runs this code, whether an earlier cron.php run is still running or not(!?!):
// Iterate through the modules calling their cron handlers (if any):
// That is, it runs all modules' cron handlers!!! - ADM
module_invoke_all('cron');
// Clean up
variable_set('cron_busy', false);
// The implication of this is that if a second run of cron.php is shorter than an earlier one,
// cron.php will then set cron_busy to "false" BEFORE THE EARLIER RUN COMPLETED
// and then send this to the admin logs:
watchdog('cron', t('Cron run completed'));
// But an earlier cron.php run might still be running
// and there is no indication anywhere of which cron run just completed.
The attached zip file includes a patch I believe fixes this apparent bug (simply by moving the above code into the else
part of the if
control structure at issue) as well as cron.test01.php and cron.test02.php which use this patch and can be used to test the control logic against Drupal's admin logs (and/or under Drupal 4.6.x, the current value of cron_busy using the variables.module), without actually running Drupal's cron handlers (and with timing feedback from cron*.php).
Note: this is the first patch I've created under Windows and I'm not sure if it was done right. It didn't seem to work with the GNUWin32 patch program using the Drupal handbooks instructions.
Recommended test method:
* While no crons are running, run cron.test01.php (which basically just runs "sleep(5);") and the logs should read "Cron run completed".
* Do the same first using cron.test02.php (which basically just runs "sleep(30);"), then run cron.test01.php in another browser window while the first cron is running. After both cron*.php files have finished the logs should have entries like this:
cron 2006-06-27 15:04 Cron run completed adm details
warning cron 2006-06-27 15:04 Cron Running: Ignored attempted rerun. adm details
I believe this is a better (if not "correct") way for cron.php to work. The same test can be re-run changing the moving the above code from cron.php back outside the if
control structure, as is the case for the current cron.php in both Drupal 4.6.x and 4.7.0, and both the effective results and admin log entries will be much less intuitively correct, if not simply wrong, since one cron.php run will run over top of an earlier one.
Note: This also affects poormanscron.module.
Comment | File | Size | Author |
---|---|---|---|
#21 | cron_4.patch | 2.49 KB | Steven |
#14 | cron_start.patch | 1.17 KB | chx |
#9 | ps-kill-php-scripts.zip | 705 bytes | ADMarshall |
#4 | cron-rerun.diff.txt | 1.05 KB | beginner |
#3 | cron.php-v4.6.6_1.28.patch | 1.19 KB | ADMarshall |
Comments
Comment #1
beginner CreditAttribution: beginner commentedCan you just post a patch?
http://drupal.org/patch
Comment #2
beginner CreditAttribution: beginner commentedComment #3
ADMarshall CreditAttribution: ADMarshall commentedYa, I RTFM'd already and have done diff-patch a gazillion times under Linux. But the GNUWin32 patch didn't seem to work the same way (under Windows).
The long explanation, convoluted explanation and test scripts were provided because I couldn't quite believe there could really be such a bug in cron.php and felt obliged to prove it -- and I didn't have time to make the explanation shorter and clearer.
I've extracted the patch out of the zip file for you, Beginner, and attached it to this "follow-up". Now it's your turn to RTFM... ;)
Btw, the patch is named "cron.php-v4.6.6_1.28.patch" because it was based on the v. 1.28 version of cron.php that came with Drupal 4.6.6. But it seems to apply to both the 4.7.0 and cvs versions as well.
Comment #4
beginner CreditAttribution: beginner commentedFirst of all, +1 on correcting this bug.
I noticed this problem the same way as you did: by sending our newsletter to 1000+ emails. Each time I would run cron.php again would mean that the people would receive the newsletter one more time, i.e. many people received multiple copies of the newsletter.
What's more, cron.php can be run by anybody. There is never a guaranty that it won't be run twice, even if you know what you are doing: search engines may have found a link to cron.php somewhere and try to index the page at the same time as your scheduled run in taking place.
Thanks for providing test files, in your zip.
The code needs some very minor adjustments, though:
1) you insert some new lines where you shouldn't.
2) comments must be indented, too. (// Clean up).
3) remove your comment block, with your name and email: there is no mention of the author of any patch anywhere in the code, and I guess this patch will not get committed with it.
4) improper use of capitalization in the watchdog message.
Now, an important question:
What if because of a severe failure (server crash?) the cron is no longer running, but the variable cron_busy is still set as TRUE?
Maybe we need to add a function in admin/settings to allow to reset the variable as FALSE in cases like.
I attach a new patch.
Comment #5
pwolanin CreditAttribution: pwolanin commentedIn search_cron(), a shutdown function is registered which does some cleanup if the function fails or times out. Can this be done in cron.php? Seems like this would be the approriate way to set the watchdog error "cron run did not complete" as well as resetting the 'cron_busy' variable?
Comment #6
beginner CreditAttribution: beginner commentedInteresting idea.
http://php.net/register_shutdown_function
the comments mention differences between *nix and windows platforms.
I haven't had time to study the question in details, though.
Comment #7
drummI'm concerned about cron runs which exceed the maximum execution time set by PHP.
I'm not sure, but registering a shutdown function still might not execute if the script exceeds any alotted time limit (especially since it can be an Apache-level time limit instead).
I think the best solution might be to add better locking to simplenews. (Or a queue API for Drupal so modules won't have to think about making a proper queue.)
Comment #8
Steven CreditAttribution: Steven commentedI know from the search_cron() changes that register_shutdown_function() does allow you to get around PHP's time limit. Apache's limit, that I don't know.
Comment #9
ADMarshall CreditAttribution: ADMarshall commentedFirst, apologies to beginner for my surly earlier reply. Once the firefighting was over, I realized I should have put up my patch and comments more concisely. Also, thanks for keeping an open mind and adding your patch and ideas, and for the coding tips, too.
I too haven't had time to look enough into the references, patch and ideas everyone's provided, but I also think the patch I provided is an insufficient, stop-gap measure:
At least that's what we had to do. To try manage cron.php and the processes it runs, we had to use a two PHP BASH-system-call scripts; "ps.php" to monitor all processes run by our shared-host account's userid, which of course must be manually refreshed in a browser and "kill.php" which can kill any of those processes if it/they that appear to be causing problems, by copy-pasting a process' PID from ps.php's output, appending it to kill.php's URL as, eg,
http://example.com/protected-scripts/kill.php?pid=9999
. (Both these scripts are in the attached zip file, just in case it isn't immediately obvious how to make them, as it wasn't to me.)But this is too much to ask most of my clients or their staff to do. Instead, what they really needed is some simple cron-admin GUI/Web interface they can use to start, stop and check or watch the cron and its child processes if potential cron-related problems are suspected or expected, or if they simply want to invoke all (or, better, selected) Drupal cron hooks manually.
Similar suggestions have been made in other Drupal issue or support posts, but I can't remember any leading to an solution usable by the "average Drupal end-user" among any of my clients at least.
Comment #10
pwolanin CreditAttribution: pwolanin commentedWhile it would be lovely to be able to monitor/kill the processes started by cron, is there any cross-platform way to do so?
I can more readily imagine a new feature where cron is invoked from an admin page, with the option to only invoke the hook_cron implementations of a subset of the installed modules.
Comment #11
pwolanin CreditAttribution: pwolanin commentedWhile it would be lovely to be able to monitor/kill the processes started by cron, is there any cross-platform way to do so?
I can more readily imagine a new feature where cron is invoked from an admin page, with the option to only invoke the hook_cron implementations of a subset of the installed modules.
Comment #12
ADMarshall CreditAttribution: ADMarshall commentedI can more readily imagine a new feature where cron is invoked from an admin page, with the option to only invoke the hook_cron implementations of a subset of the installed modules.
Ya, that sounds even better. And better still would be something like poormanscron where one could set different cron intervals for different hook_cron calls. For example those for 15 min for aggregator and simplenews and a few hours for search-index updating.
In case it might help towards related goals in this thread, or just help other admins on Linux platforms understand or manage poormanscron-related runs, here's a small proggie I was using for as much (to run in a separate browser window/tab):
Comment #13
pwolanin CreditAttribution: pwolanin commentedSince this issue is not moving, maybe it would help to focus on the critical issue and keep new features separate.
So, it seem to me that the critical issue is simply preventing cron from running if another instance is in progress, and doing appropriate cleanup in case of timeout, crash, etc.
To make good progress on this, it seem as though it would be useful to be able to reproducibly cause these faults. Timout is probably the easiest by setting PHP's max_execution_time to something short on a test site.
Crash could perhaps be simulated by restarting apache?
Comment #14
chx CreditAttribution: chx commentedThis is a simple way to a) prevent multiple crons from running b) automatically expire the lock after a certain time. If you just use TRUE/FALSE and the code aborts then you are stuck. I am using a time based semaphor so after four minutes you can run cron.
Comment #15
beginner CreditAttribution: beginner commentedEh, that sounds clever. I like this idea.
the variable con_busy is no longer used, so maybe we need an update to remove it.
the function returns TRUE after the run. Do we need to return FALSE or 'busy' in the first part: if() { return 'busy'} ?
Comment #16
Dries CreditAttribution: Dries commentedCode looks good. Two nits:
1. Add some code comments explaining that we can't run cron.php on top of each other and the side-effects it causes.
2. Maybe change
variable_set('cron_start', 0)
tovariable_del('cron_start')
?Comment #17
pwolanin CreditAttribution: pwolanin commentedis the 4 minute timeout delay long enough? See the original posting at the top:
Maybe the timeout delay needs to be settable somewhere (even if only in settings.php)?
Comment #18
beginner CreditAttribution: beginner commentedAbout the time delay, look at the code just above what would be patched:
Either safe mode is on, in which case cron is probably only allocated 30 s. to do its job, or it's off, and it is allocated 4 min.
So, chx' s patch makes sense in this regard.
Having said that, I don't know what would happen if a newsletter takes more than 30 s. or more than 4 min. to send. But simplenews.module allows to configure the number of newsletters to send at each cron run, which I guess solves the problem.
Comment #19
Steven CreditAttribution: Steven commentedFrom PHP.net: The set_time_limit() function and the configuration directive max_execution_time only affect the execution time of the script itself. Any time spent on activity that happens outside the execution of the script such as system calls using system(), stream operations, database queries, etc. is not included when determining the maximum time that the script has been running.
This means that assuming cron never runs for more than 4 minutes is false.
A proper solution is to use register_shutdown_function(). This allows us to always reset the "cron_start" variable when cron stops, for any reason. Modules should already be using this when the tasks they do can take too long (like node search indexing does). We can even check the value of "cron_start" in the shutdown function to see if the shutdown is graceful or not.
The only case where cron could then still lock itself out is when the database connection is lost in mid processing, so the variable cannot be reset. It's probably a good idea to build in a long maximum delay (e.g. an hour) after which cron is automatically unlocked.
Also, the watchdog message doesn't make sense at all, even if it was written under the assumption that cron never runs longer than 4 minutes:
This message is only true if 4 minutes have not yet passed since the last invocation of cron.php. So the correct message would be "Attempting to re-run cron while it is still running."
Comment #20
pwolanin CreditAttribution: pwolanin commentedthanks for clarifying this- from previous reading I was very wary of the magic 4 minute setting.
Comment #21
Steven CreditAttribution: Steven commentedHere's a patch for HEAD which introduces the approach I outlined... Summary:
Comment #22
drummLooks okay on initial reading of the code. (This is not a full review.)
Comment #23
beginner CreditAttribution: beginner commentedThe patch looks good and seems to answer all the different concerns expressed above (shutdown function and proper handling in case of a crash).
My two points in #15 still stand, though.
Comment #24
Steven CreditAttribution: Steven commentedbeginner: I added an update to delete 'cron_busy'. The function does not need to return FALSE because null is equivalent to false for boolean evaluations (if).
Committed to HEAD as there were no other objections.
Comment #25
(not verified) CreditAttribution: commentedComment #26
lazly CreditAttribution: lazly commentedComment #27
alimosavi CreditAttribution: alimosavi commentedI get this error too , My Db size is > 1GB ,
Comment #28
pwolanin CreditAttribution: pwolanin commented