In some (reasonable!) mysql settings, big backups fail silently. This is because the mysql server closes the connection after some idle time. This is good, otherwise it may be blocked by zombie php processes. But: A big file backup behaves the same way (a DB backup otoh hits the database now and then ;-)).
When the mysql server closes, no dblog will be written any more about the problem.
Who is affected?
$ drush sqlq "show variables like 'wait_timeout'"
wait_timeout 180
Anyone whose wait_timeout is lower that a large backup may take.
Please,
* always use drush bam-backup to check if you really have "MySQL server has gone away" problem
* keep this issue focused on this problem (rather than other timeout problems)
* always report back with full drush output
With drush we get something like this:
$ drush bam-backup files backup_files backup_settings
PDOStatement::execute(): MySQL server has gone away database.inc:2227 [warning]
PDOStatement::execute(): Error reading result set's header database.inc:2227 [warning]
exception 'PDOException' with message 'SQLSTATE[HY000]: General error: 2006 MySQL server has gone away' in [error]
.../includes/database/database.inc:2227
...
#8
.../sites/all/modules/backup_migrate/includes/filters.inc(109):
backup_migrate_filters_invoke_all('post_backup', Object(backup_file), Object(backup_migrate_profile))
Possible resolutions
a) on every request increase wait timeout (add sql like "set wait_timeout = 600")
b) only on cli requests increase wait timeout
c) make BAM increase wait timeout on backups
d) make BAM ping the DB now and then
(this may not be possible when e.g. a tar process is forked)
Proposed: c)
Comment | File | Size | Author |
---|---|---|---|
#52 | backup_migrate-n2829492-52.patch | 4.28 KB | DamienMcKenna |
Comments
Comment #2
geek-merlinComment #3
geek-merlinComment #4
geek-merlinComment #5
geek-merlinSetting major as this causes silent data loss.
Comment #6
geek-merlinPatch flying in that should fix the issue. Untested yet.
Comment #7
geek-merlinFixed method name typo in the patch.
Comment #8
geek-merlinComment #9
geek-merlinNote: the standard setting of 1200s=20m may not be enough for very large file amounts (on my server >3gb).
Comment #10
Kris77 CreditAttribution: Kris77 commentedWill try your patch...
Comment #11
geek-merlinTake this. Fixed SQL syntax.
Comment #12
geek-merlinWow. placeholders don't work here. Also fixed that and it works on my box.
Comment #13
devad CreditAttribution: devad commentedThese are just some typical log error messages caused by low mysql wait_timeout settings. Posting for search purpose.
The first one is caused by cron backup run and the second one by manual backup run.
Comment #14
geek-merlinThanks @devad for the googlefood, yes messages like that are to be expected.
Did the patch work for you?
Comment #15
devad CreditAttribution: devad commentedHi @Alex.
I have used part of code from MySQL Wait Timeout module to fix my problem. I have put this line of code into my settings.php and it worked for me.
$databases['default']['default']['init_commands'] = array('SET SESSION wait_timeout = 300');
The module has Default, Drush and Cron parts, but i didn't need all that stuff, so I just used part from modules "Default" section and squeezed it into one line.
Comment #16
geek-merlin@devad: If you want to help, test this patch and report back.
Comment #17
devad CreditAttribution: devad commented@Alex: I can do tests via symplytest.me
However this test is rather specific due to large database need and it needs local testing engine which I am not good at.
Comment #18
pietrocap CreditAttribution: pietrocap commentedHi devad,
where did you put exactly the code "$databases['default']['default']['init_commands'] = array('SET SESSION wait_timeout = 300');" in your settings.php?
Comment #19
devad CreditAttribution: devad commented@pietrocap
I have put it at the very end of my settings.php file.
But I suppose that it can be anywhere behind $database array definition.
Comment #20
Kris77 CreditAttribution: Kris77 commentedHi @Axel.
I tried your patch but do not seem to work.
Just launch the bakcup remain on the page for a long time without doing anything.
Comment #21
pietrocap CreditAttribution: pietrocap commentedPath doesn't work for me. Same as #20
Comment #22
geek-merlin@Kris77, @pietrocap:
Note that big backups hit several time limits, and this patch only raises the DB time limit.
To overcome the other time limits (mainly server and php) it is best to use drush, as the webserver is not made for long-running processes.
To gather more insight into this, please test by running the backup from drush and report what you get, with something like:
Comment #23
RickJ CreditAttribution: RickJ commentedJust a note to say thanks for this fix - I'd been nearly tearing my hair out before I found this issue!
FYI, my scenario is nightly backups under cron, which worked fine, but then cron started failing to finish and clean up. This left the cron job flagged as still running, and the site stuck in maintenance mode.
The clue came when it occurred to me that the steady increase in site data had taken the backup time just over 3 mins, so it looked like a timeout issue - and led here. I considered just tweaking settings.php, but that affects all connections which is probably not desirable. This patch works perfectly.
Comment #24
geek-merlin#23:
> ... This patch works perfectly.
Yes it was hard to debug and good that it helped you too.
Comment #25
couturier CreditAttribution: couturier as a volunteer commentedSo @axel.rutz it sounds like this patch is ready to be ported?
Comment #26
Alex Andrascu CreditAttribution: Alex Andrascu at Intellix commentedIt would be good if we can have another pair of eyes reviewing this patch before setting it to RTBC. Let’s keep it on the backburner for now.
Comment #27
geek-merlin#26 alex:
> It would be good if we can have another pair of eyes reviewing this patch before setting it to RTBC.
Valid requrest. You feel like code review or live test?
For the records: We use it on currently 13 sites.
Comment #28
couturier CreditAttribution: couturier as a volunteer commentedI propose that we move forward with this patch. It solves a related timeout issue that users are still experiencing as of mid-November 2017, and @axel.rutz has thoroughly tested it on multiple sites.
Comment #29
RickJ CreditAttribution: RickJ commentedThis patch is still needed to prevent SQL timeouts, but the existing patch file won't apply to 7.x.3.5.
I've updated it to work with the current release.
Comment #30
geek-merlinIt looks we've had some pairs of eyes on that so tentatively setting RTBC.
Comment #32
RickJ CreditAttribution: RickJ commentedHiding old outdated patch file - does that stop it being tested?
Comment #33
geek-merlinThis is a straight reroll that is needed after #2940451: Unlimited max_execution_time treated as 0 seconds got in.
Patch can be applied with git am and contains 2 commits for improved niceness.
Comment #34
DamienMcKennaWhy is the first if() statement needed in this part?
I think it'd be fine to get rid of the outer if() statement.
Comment #35
RickJ CreditAttribution: RickJ commentedIt's not my patch, but that if() condition does appear a bit superfluous. It's hard to see what difference the source type makes to the logic of extending the SQL timeout to match the page timeout.
Comment #36
geek-merlinIf we have a DB backup, nothing needs to be done, no DB timeout will be hit as the DB is busy all the time.
We can of course bikeshed about code beauty here ;-).
Comment #37
DamienMcKennaIf you're backing up a giant database isn't it possible it might hit a timeout in _dump_table_data_sql_to_file()? There are certainly a number of issues related to exporting giant databases.
Comment #38
geek-merlinOK, that's a point. As raising the wait_timeout surely won't harm, feel free to drop the "if"!
Comment #39
geek-merlinOK, removed the surrounding if, rest unchanged.
Comment #40
RickJ CreditAttribution: RickJ commentedIt occurs to me that this feature is incomplete, as there's no admin facility to configure a value for the new setting backup_migrate_backup_max_time. So the value is in effect hard-coded as 1200.
I was considering adding it to the admin UI, but it would be the only global setting. All other settings are on a per-profile settings basis. For consistency this should be too. I'm happy to go ahead with the coding, but what's the consensus view?
Comment #41
DamienMcKennaAt the very least there should be a mention in the README.txt file about it.
Comment #42
RickJ CreditAttribution: RickJ commentedOK, I thought I'd have a go at doing it properly!
This patch will apply on top of the one in #39, I've uploaded it for sanity-check of the concept. In my testing it works fine.
It adds a "Max backup time" value in the backup Advanced settings, which applies both for one-off settings and profiles. The run-time code uses this value instead of looking for a non-existing settings variable.
Also for completeness, I've done the same for Restore profiles, with corresponding code when Restore is executed.
Let me know what you think. If it's OK I'll merge it with patch #39.
Comment #44
geek-merlin> It occurs to me that this feature is incomplete, as there's no admin facility to configure a value for the new setting backup_migrate_backup_max_time.
Just to note: backup_migrate_backup_max_time exists since 6.x.
That said, nothing against documenting it.
Comment #45
RickJ CreditAttribution: RickJ commentedHi Axel
Yes, sorry, I mis-worded my comment, the variable backup_migrate_backup_max_time is not itself new in patch #39. However, the issue still remains that although the code was written to check the value of that variable, there is no UI to actually set it. And if there were, it would be a global value, contrary to all the other settings.
Please check out my patch #42, it creates a setting that works the same way as all the other settings and options. I should have marked it Do Not Test, as it only applies after your patch #39, so it fails to apply directly to the current dev build.
I also think it's important to have the same timeout logic for restore, because exactly the same issue could arise. A backup that failed without the DB timeout fix would also probably fail to restore for the same reason.
Comment #46
geek-merlinHey Rick,
this is classical feature creep, so i'd say:
* the original patch fixes the problem
* documenting the hidden variable, or even moving the setting like you started, is feature creep and belongs in a followup issue, to not block and bikeshed a tested problem solution
What i'd say in that followup issue:
* your code looks solid and well-done
* that said, i'd say it's too complicated, one global variable that is set high enuff for the DB size and speed is perfectly fine.
* and as this variable usually will be set in settings.php, i'd say, let's document it in README, and not add a visual setting (that will not work if overridden via settings.php!)
* that said, i guess we never had a support request about this as 1200s=20m is so high that raising it is a very exceptional issue.
tl;dr: let's discuss in a followup, and please don't over-engineer!
Comment #47
RickJ CreditAttribution: RickJ commentedHi Axel
Thanks for your comments. I'd already realised earlier that I'd failed to consider settings.php as a means of setting the variable. And as you say, it's unlikely anyone would need to increase it, but the facility to add it to settings.php is there if required.
So I'm happy to trash my over-engineered solution!
However, I do think the code to extend the database timeout should be added to the restore case. Restores are rare compared to backups, so it's likely no-one's hit the problem, but it seems to me that a large backup that required the increased timeout would also probably need it if restored.
I'll get to this later, and add a patch to README, and the bit of code for restore.
Comment #48
geek-merlinCaring for restores can not be wrong. I'll be happy to review this!
Comment #49
RickJ CreditAttribution: RickJ commentedI've added the code to handle restore. To avoid duplicate code I factored the timeout-tweaking stuff into a new function _backup_migrate_set_timeout(). This led me to look at the existing function _backup_migrate_check_timeout(), which makes an explicit check to see if the time has been exceeded.
This function did not take account of the increased timeout, and used only the PHP 'max_execution_time' value. This appears obviously wrong, so I've also changed this code to use the extended timeout value.
Also added a note to README.txt.
Comment #50
geek-merlinI reviewed the patch and everything is as you described, and everything you described totally makes sense.
So if someone(tm) manually tests backup and restore, we can finally get this RTBC.
Comment #51
geek-merlinStraight reroll to HEAD.
Comment #52
DamienMcKennaA small improvement to the README.txt file, minor tweaks to the PHP code.
Comment #54
DamienMcKennaComment #56
DamienMcKennaCommitted.
Comment #57
geek-merlinYay! Thank ya all!
Comment #59
adam1 CreditAttribution: adam1 commentedRunning latest BAM 7.x-3.6, I regulary get below error messages.
Since in this version, the issue should have been fixed, I 'm asking why this is happening.