Hi,
This isn't a bug report per-se, but I was wondering if you have any thoughts on a better way to keep datasync_run.php running, other than having a very frequent cron invocation.
I see great potential for Datasync to be a generic module for doing tasks without blocking the user, for instance sending out a bunch of invite emails, processing some recently uploaded files, or (in my case) handling high levels of incoming SMS load by passing them to Datasync as jobs to be run at a controlled concurrency level.
The problem is the daemon shuts down after an hour (or when it runs out of memory), and won't start again until cron runs. On our server cron runs once every half hour, which means that there can be half-hour outages where datasync isn't doing anything. This is bad for interactive stuff like the SMS service I mentioned, where people are waiting for the response. A minute or two is not so bad, but half a hour is unacceptable.
The way I solved this before I discovered datasync was to force-spawn a new daemon process before the last one quits. In Datasync's case, that would be achieved by breaking out of the while(true) loop and having a bit of code that removes that process's pid, then spawns a new one before it exits. This means that the daemon is quitting and reloading (which helps with memory leak issues, static caches, etc), but there isn't a gap where the daemon isn't available to process jobs.
What do you think? If you like, I can attach a patch to do this (even making it optional to enable if you like?), or if you think this isn't important or have a better idea, I'm happy to hear it...
Thanks
| Comment | File | Size | Author |
|---|---|---|---|
| #15 | datasync-respawn2.patch | 2.75 KB | andrewlevine |
| #1 | datasync-respawn.patch | 1.67 KB | neilnz |
Comments
Comment #1
neilnz commentedIn fact, here's a working patch that implements my idea. Seems to work for me.
Cron still boots it up when it's not running, but if it's not killed by signal, it'll rebootstrap itself.
Comment #2
andrewlevine commentedI like this patch a lot. If accepted, I don't think this needs to be an option as it probably works better than cron anyway. I do think the 1 hour kill time should be an option or at least a ds_variable_get (regardless of this patch). I'm going to try to get Pierre to review this to make sure I'm not missing anything but then I will commit. Thanks for all the issues Neil.
Comment #3
andrewlevine commentedAlso, a reminder to myself (or someone else if they wany to do it :) : this change might require changes to README.txt and INSTALL.txt
Comment #4
pounardIsn't it a bit too much to make datasync_run.php doing active loop undefinitely? I mean, PHP is not really made to run in CLI, it could be a performance killer.
I'll review the patch as soon as I have free time (not for tonight I think).
Comment #5
andrewlevine commentedThe patch does not loop indefinitely. Instead of restarting on cron, it creates another instance of the datasync script right before the original one closes.
Comment #6
pounardYes I understood, but it acts as a deamon respawning itself then, right? Or maybe I misunderstood
Comment #7
neilnz commentedThe behaviour is very similar to what it was doing before. It'll still enter a sleep loop with 20 seconds between invocations, which should mitigate performance problems. The only difference is previously it would exit and hope that cron ran again within a few minutes and restarts the daemon from scratch.
The script was designed to exit after one hour, and the docs call for cron running at last once an hour. The reason for this is so the script dies just as cron runs and starts the new one. For me the script was dying a few seconds too late for the half-hourly cron I had, so it was quitting for half an hour.
I don't think there should be any issue with a daemon running from CLI that sleeps until there's work to do. It's possible the efficiency could be raised even more by switching from a poll loop to a UNIX socket with socket_select() blocking the loop until something is prodded to its socket to tell it to poll. This may have ramifications for the datasync scheduler though, which I believe relies on the post hook running every few seconds.
It'd be nice if the daemon could just run continuously without the hourly exit, but because Drupal isn't designed for long execution time, it tends to eat up memory with static caches until it shuts down. That (I believe) is why it does this teardown-restart cycle.
One issue with the respawning daemon though is that if the module is disabled, it's possible the daemon will continue to run indefinitely. It would probably be a good idea to put a check in the top of the script somewhere that makes sure the datasync module is still enabled?
Comment #8
pounardI'd prefer to make it run with system cron at regular interval (like Drupal cron) rather than have it running as a respawning deamon. But the respawning stuff is good as a fallback system when errors happens when long job are running.
Off-topic note: The problem with Drupal eating memory is when you run really massive jobs. I noted some problems, there is a really annoying memory leak in node_delete() but there isn't in node_save() (I did saved and updated 300,000 nodes without any memory limit problem, but node_delete() breaks after 2,000 nodes deleted with a memory limit of 256mo).
Comment #9
andrewlevine commentedPierre,
Why would you prefer to run with cron? I don't see the drawback. With this patch, the daemon is still spawned at cron, exactly how it is now. The only difference is that the daemon respawns another daemon right before it is about to exit. This means if you run cron.php every hour, 30 mins, 1min, that datasync won't be down needlessly waiting for cron to restart the daemon.
Comment #10
pounardYes, I aggree. I think it depends on one's need, I see two differents use cases:
I didn't mean to say this patch is bad, I would rather say it should be an optional feature.
Comment #11
andrewlevine commentedPierre,
DataSync runs as a daemon now without this patch. It would only change the method the daemon is kept alive. In your setup, do you let datasync die after an hour and then only turn it on with cron when you want to run it?
I don't mean to argue, I just want to understand your use-case to determine whether this makes sense to include for everyone. At the moment, I don't see why adding this feature would negatively affect any use-case.
Andrew
Comment #12
pounardOh you made a good point; I misunderstood this part of DataSync running part. You juste clarified it to me. I though DataSync were only living as long has it have jobs runnings.
So my comments are off topic then, forget it.
Comment #13
andrewlevine commentedPierre, I appreciate you taking a look at this and questioning it :). I will try to get this patch in shortly and then release a beta
Comment #14
pounardHehe, again, sorry for the misunderstanding :) At least I understood this part of DataSync (on which I didn't get interested before).
Comment #15
andrewlevine commentedcommitted. I modified the patch a little but the idea is the same. I've attached the patch in case anyone needs it. Thanks neilnz and pounard.