Problem/Motivation
Original report
Watchdog wid reaches the maximum storable integer
To reproduce: have a site which generates a large number of Notices and Warnings (e.g. uninitialised variables, unset array keys) per page load, and which receives a very large amount of traffic.
Expected behaviour: dblog automatically limits the number of log messages to the specified value, site continues to function without issue.
Actual behaviour: because dblog never resets the wid auto_increment value, the wid eventually hits the signed 32-bit maximum of 2147483647. At this point the site suffers a White Screen of Death: all requests fail with a 500 code, but no errors are logged in the server logs (as is usual with such errors) and altering error logging does not produce any on-screen errors, probably because the attempt to log the raised DB exception fails in its own right.
Detection of issue: the only ways to find out what is happening are a) to know this is a possible issue and check the current value of the auto_increment, or b) to use a debugger to step through the code line-by-line until a notice is generated.
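For detection method a), the counter can be read straight from the database. A minimal sketch for MySQL/MariaDB (assuming an unprefixed watchdog table; compare the result against 2147483647):

```sql
-- Read the next auto_increment value dblog will try to use.
-- If this is close to 2147483647, the site is about to break.
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'watchdog';
```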
Possible fixes or remediation:
1. Have dblog output the database error to the server logs, to screen, or somewhere accessible to developers.
2. Have dblog be aware of the maximum value an integer can take in the database used, and refrain from attempting to add any more entries once the wid gets close to that maximum.
3. Have dblog periodically reset all extant wids to start at 1, and reset the auto_increment.
Issue was observed in Drupal v7.67, but definitely affects all v7 variants, and may well affect v8 as well.
Steps to reproduce
Proposed resolution
Change the primary key (wid) from a regular serial field to a big serial field. The maximum value that can be stored rises from 2147483647 to 9223372036854775807.
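A sketch of what this looks like in an update function; the hook name and the exact $keys_new handling are illustrative, not taken from the committed patch:

```php
<?php

/**
 * Illustrative sketch (not the committed update) of widening watchdog.wid.
 *
 * Uses the Schema API's changeField(); a 'big' serial maps to
 * BIGINT AUTO_INCREMENT on MySQL and bigserial on PostgreSQL.
 */
function dblog_update_N() {
  $schema = \Drupal::database()->schema();
  $schema->changeField('watchdog', 'wid', 'wid', [
    'type' => 'serial',
    'not null' => TRUE,
    'size' => 'big',
    'description' => 'Primary Key: Unique watchdog event ID.',
  ], ['primary key' => ['wid']]);
}
```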
Remaining tasks
Postponed on #3191391: Schema::changeField() has bug when changing regular serial field to big serial field
User interface changes
API changes
Data model changes
Release notes snippet
Comment | File | Size | Author |
---|---|---|---|
#79 | interdiff_76-79.txt | 2.67 KB | Arantxio |
#79 | 3081144-79.patch | 4.17 KB | Arantxio |
Comments
Comment #2
cilefen CreditAttribution: cilefen as a volunteer commented
Comment #3
cilefen CreditAttribution: cilefen as a volunteer commented
This could technically happen in any table, but watchdog and cache tables are most susceptible.
Comment #4
Berdir
cache tables don't have autoincrement primary keys.
Comment #5
cilefen CreditAttribution: cilefen as a volunteer commented
Noted!
Comment #6
geek-merlin
Huh, this is tricky. According to the docs the counter can't be reset to a value smaller than the max.
So we have to either truncate the table or renumber it (e.g. delete and recreate the ID column, plus some voodoo to keep order).
Comment #8
catch
Do we really, really need autoincrement on the watchdog table? What about dropping the column?
Comment #9
Berdir
How would we identify a single log message then and e.g. view it in the site?
I'm not so sure about the severity here. Yes, it is ugly, but it's also pretty easy to fix: the clear-log confirmation form/action does a TRUNCATE query, which AFAIK resets the IDs. So if you run that when it happens you lose your recent logs, but everything should work again.
One strange thing is that the wid is apparently signed, so we're wasting 50% of the space we could have.
But even then, it's still quite a big number. If you have a site where this happens, you should probably fix whatever is logging so many notices, because both having so many PHP notices/warnings and logging them into the database is a serious performance issue. Or, as a quick fix, use syslog instead of the database.
While I personally never saw a big issue with it on our sites, it has long been the recommendation to disable dblog on a high-performance (or indeed any production) site and use something else.
So IMHO, I don't think this even qualifies as major, let alone critical?
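For reference, the quick fix described above relies on TRUNCATE resetting the counter, which holds for MySQL/MariaDB (a plain DELETE does not reset it, which matters again in #34):

```sql
-- Drops all log rows and resets the auto_increment counter to 1.
-- DELETE FROM watchdog would remove the rows but keep the counter.
TRUNCATE TABLE watchdog;
```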
Comment #10
catch
We might have to use a uuid instead of an autoincrement ID. I don't really love that either.
So, while this is true, we're talking about sites that have been running for multiple years.
Comment #11
Berdir
> So, while this is true we're talking about sites that are running for multiple years.
Sure. But unless I made a mistake, if a site logs 10 log entries per second, that's still ~7 years until you reach that limit (twice as long if we change it to an unsigned integer). At that point I'm pretty sure it would be a serious performance issue long before this happens, so you have bigger issues?
If you keep 100k log records, then at that rate your logs only cover ~3 hours; at that point, why have dblog enabled at all?
Comment #12
daffie CreditAttribution: daffie commented
I do not think that truncating the whole watchdog table is a good idea.
You want to throw out the old messages but keep the newer ones.
The problem with that is that the autoincrement counter cannot be reset.
Maybe it would be a good idea to change the autoincrement key to one that just generates a unique value, such as uniqid(). Then there would be no maximum value to reach.
Comment #13
geek-merlin
Here's an SQL fiddle showing how this can work, IF we want this.
http://sqlfiddle.com/#!9/19f165/1
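The fiddle's exact contents aren't reproduced here; a renumbering along the lines of #6 could look roughly like this on MySQL (sketch only, and it assumes no concurrent inserts):

```sql
-- Renumber wids to start from 1, preserving order, then reset the counter.
SET @n := 0;
UPDATE watchdog SET wid = (@n := @n + 1) ORDER BY wid;
ALTER TABLE watchdog AUTO_INCREMENT = 1;
```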
Comment #14
geek-merlin
That said, I'd rather make the wid a bigint.
Comment #15
catch
The one problem with uniqid() is that autoincrement currently guarantees that watchdog messages can be ordered chronologically; the timestamp doesn't, since things can obviously be logged within the same second.
Comment #16
daffie CreditAttribution: daffie commented
Let's then go for the bigint solution.
Comment #18
mindbet CreditAttribution: mindbet as a volunteer commented
Attached is a patch for suggestions from geek-merlin and daffie.
'wid' => [
'type' => 'serial',
'not null' => TRUE,
+ 'size' => 'big',
'description' => 'Primary Key: Unique watchdog event ID.',
],
Comment #19
daffie CreditAttribution: daffie commented
Comment #20
Berdir
Setting to needs review to let the tests run. Update tests are going to fail; this needs a corresponding update function.
Comment #21
daffie CreditAttribution: daffie commented
The patch needs an upgrade path, so that the size of the wid column of the watchdog table in existing sites is changed to big.
Comment #22
Berdir
And we also need an explicit test then, because as a non-breaking database structure change, nothing picks this up as a problem right now.
Comment #23
Berdir
Comment #24
phenaproxima
That was really friggin' tricky to write a test for, but I tried. I'm not sure this is exactly what we want, because even with the patch in #18, the test still fails for me. So either the test I wrote is flawed (very likely), or the bigint solution won't work. Thoughts?
Comment #26
geek-merlin
So #24 is the test only, which fails as it should.
(Yay! We can use anonymous classes now; I was not aware of this!)
If you post a test-plus-fix patch, maybe the failure messages will give someone a hint.
From a quick glance, it looks to me like the anonymous class does not get $decorated set (but maybe that's my lack of deep PHPUnit grokking).
Comment #27
Berdir
What I was asking for is an upgrade path test, which should be trivial :). That's certainly a very creative test, but yes, of course the current patch isn't going to fix anything; it's just going to make it happen later.
I still think that we don't have to do more than this to fix it, and I also don't think this is critical :). An alternative option would be a requirements check that warns when we're getting close to the limit and recommends emptying the table *before* it happens.
If we start to replace autoincrement(-ish) fields because of this, where does it stop? The watchdog table is probably the most likely place for it to happen, but it's not impossible for others either.
Comment #28
phenaproxima
Here's a patch combining the change in #18 with the test from #24. Let's see what it does. Also tagging for an upgrade path test per #27.
Comment #30
phenaproxima
Okay, here's an attempt at a tested update path.
This should probably be signed off on by someone who's more knowledgeable than me when it comes to databases, since it relies on PHP_INT_SIZE and assumes that a "normal" integer column will not contain a 64-bit value (which is likely incorrect on SQLite, for a start).
Comment #31
phenaproxima
For the record, I agree with this:
> I don't know what an alternative would be, but making the column support big integers is just kicking the can down the road, and may very well fail at a certain scale.
Why, exactly, can't we switch to using UUIDs?
Comment #33
mindbet CreditAttribution: mindbet as a volunteer commented
Thank you for letting me chime in; I hope I don't muddy the waters further.
The cleanest solution would be to log to watchdog using microtime() and not use an autoincrement primary key (wid).
We would need to change the pruning function in dblog_cron() so that it prunes based on a time span rather than a number of records, but that might actually be conceptually clearer for users.
The issue of an INT database primary key reaching its maximum is possible, but rare: 2^31 log entries at 1 kB per entry is 2 TB of data, and most hosts would stop you long before you reached that.
But if you were hosting on AWS, had not set a database size limit, and were the victim of a DDoS attack, you could possibly reach the autoincrement maximum.
Alternate approach:
In DbLog.php, catch the database error and turn off the dblog module:
\Drupal::service('module_installer')->uninstall(['dblog']);
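A rough sketch of what that could look like; the surrounding variables ($connection, $fields) and the placement inside DbLog::log() are assumptions here, and uninstalling a module from within a logger would need guards against recursion:

```php
<?php
// Hypothetical fragment, not core's actual DbLog::log() implementation.
use Drupal\Core\Database\DatabaseException;

try {
  $connection->insert('watchdog')
    ->fields($fields)
    ->execute();
}
catch (DatabaseException $e) {
  // Last resort: disable dblog rather than let every request fatal.
  \Drupal::service('module_installer')->uninstall(['dblog']);
}
```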
Comment #34
Berdir
> 2^31 log entries at 1k per entry is 2 TB of data -- most hosts would stop you long before you reached that.
> But if you were hosting on AWS, and had not set a database size limit, and if you were the victim of DDOS attack, you could possibly reach autoincrement max.
Deleting records doesn't reset the ID, so you don't actually need 2 TB of data; you could reach the maximum with the limit set to 1000.
Comment #35
catch
@phenaproxima: the autoincrement gives us a definite sequence to sort by; for debugging we need the sort to be more accurate than per-second.
As @mindbet mentions in #33 another option might be to log with microtime() but we'd need to make that change at the same time and update any sorts.
Comment #36
mindbet CreditAttribution: mindbet as a volunteer commented
Here is an updated patch to convert the watchdog primary key (wid) to bigint.
Comment #37
mindbet CreditAttribution: mindbet as a volunteer commented
Comments on approach:
It is easily possible to add 25 log events per second to watchdog.
I was able to do this in local dev using ab.
At 25 logs/sec, the INT space (2^31) will be full in 2.72 years.
The BIGINT space (2^63) will last 11+ billion years.
The microtime() suggestion, on further consideration, is not a good route.
microtime() produces either a float or a string. The float has limited precision, and the string looks something like this:
0.88814800 1593953860
That string would need to be processed for every log entry.
https://www.php.net/manual/en/function.microtime.php
I tested the SQL command
ALTER TABLE `watchdog` CHANGE `wid` `wid` BIGINT NOT NULL AUTO_INCREMENT COMMENT 'Primary Key: Unique watchdog event ID.'
on local, on 100,000 records, and it took about 2 seconds.
There is a lot of discussion of the INT to BIGINT problem online, for example see:
https://sqlperformance.com/2016/01/sql-indexes/widening-identity-column-1
Comment #38
mindbet CreditAttribution: mindbet as a volunteer commented
Corrected patch from #36.
Comment #39
mindbet CreditAttribution: mindbet as a volunteer commented
Revision of 3081144-30-PASS.patch to fix MySQL error 1075 (Incorrect table definition).
Comment #40
geek-merlin
As for the suggestion to use microtime(): it sounds appealing at first hearing.
But with an autoincrement ID we get a sort of integrity constraint: we know when a log entry has been deleted. We'd lose that.
Comment #41
mindbet CreditAttribution: mindbet as a volunteer commented
Adding related issue:
https://www.drupal.org/project/drupal/issues/3024410
Comment #42
mindbet CreditAttribution: mindbet as a volunteer commented
Here is a revised patch.
I used some direct SQL statements to work around issues where it did not appear that $schema->changeField was making all of the necessary changes.
There is no db-type-specific code for SQLite, because SQLite appears to handle 8-byte integers out of the box, so no update is needed there.
The test in #30 still needs work IMHO — we need to check that watchdog.wid can accept a value greater than 2**31, but also need to check that subsequent writes to watchdog are successful.
Comment #43
mindbet CreditAttribution: mindbet as a volunteer commented
Code syntax fixes, mostly to get the Postgres test working.
Comment #44
mindbet CreditAttribution: mindbet as a volunteer commented
Trying another way to get the Postgres changeField to work.
Comment #45
mindbet CreditAttribution: mindbet as a volunteer commented
Postgres changes are now made using the Schema API.
In order to do this, I needed to make changes to the pgsql driver, based in part on code in this issue:
https://www.drupal.org/project/drupal/issues/3028706
Comment #46
mindbet CreditAttribution: mindbet as a volunteer commented
Patch #46 fixes some things in patch #45.
The function updateSequenceOwnership is removed, because it belongs to issue #3028706, not this issue.
Also, a parameter that had been commented out for testing is now restored:
diff -u b/core/modules/dblog/dblog.install b/core/modules/dblog/dblog.install
--- b/core/modules/dblog/dblog.install
+++ b/core/modules/dblog/dblog.install
@@ -17,7 +17,7 @@
'wid' => [
'type' => 'serial',
'not null' => TRUE,
-# 'size' => 'big',
+ 'size' => 'big',
Comment #48
daffie CreditAttribution: daffie commented
1. Can we change !== to < ?
2. I think that this change is out of scope for this issue.
3. Can we just have one script for all databases and then add exceptions for specific databases?
4. These messages are not necessary.
5. Can we check, before running the update script, what the database field type is for "wid"? It should be a regular serial field and not big.
6. I do not think this is necessary. Just query the database for the field values of the field "wid"; it should be big now.
Comment #49
anmolgoyal74 CreditAttribution: anmolgoyal74 at OpenSense Labs for DrupalFit commented
Addressed points 1, 2 and 4.
Will be working on the remaining 3.
Comment #50
daffie CreditAttribution: daffie commented
Comment #51
daffie CreditAttribution: daffie commented
Postponing this issue on #3191391: Schema::changeField() has bug when changing regular serial field to big serial field.
Comment #54
stephencamilo CreditAttribution: stephencamilo as a volunteer commented
Comment #55
hestenet
Reset issue status.
Comment #58
quietone CreditAttribution: quietone at PreviousNext commented
Add template and add postponed issue to remaining tasks.
Comment #62
larowlan
Adding credit from #3024410: Watchdog wid column should be unsigned, which I closed as a duplicate.
Comment #63
andypost
Comment #64
dagmar
#3191391: Schema::changeField() has bug when changing regular serial field to big serial field was merged for D10, and I just provided the backport for D9.5. I think this is now unblocked.
Comment #65
dagmar
Comment #66
dagmar
Upgraded to work with Drupal 10.1.x. The lines from patch #49 regarding
core/lib/Drupal/Core/Database/Driver/pgsql/Schema.php
now seem to be included in another file:
core/lib/Drupal/Core/Database/Driver/pgsql/Schema.php
I think I also addressed point 3 from #48.
Comment #68
dagmar
Comment #69
Arantxio
Thanks for this input @dagmar. I think we could make this .install function even shorter using some inspiration from issue #3191391.
Let's see if this breaks the tests or not.
Comment #70
dagmar
I updated the issue summary to describe the proposed solution.
Comment #71
daffie CreditAttribution: daffie commented
Can we replace this part with:
- Insert a row into the table with a value for wid that is bigger than the regular maximum integer value.
- Test that the row exists in the table with the correct value for the wid field.
- Insert another row without a value for wid. With the auto-increment functionality it should get the previous wid value plus 1.
- Test that the new row exists in the table with the correct value for the wid field.
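A rough sketch of those four steps as test code; $required_fields stands in for the other NOT NULL watchdog columns (see #79), and the names here are illustrative rather than the committed test:

```php
<?php
// Illustrative only: assumes a kernel test with the dblog schema installed.
$connection = \Drupal\Core\Database\Database::getConnection();
$big_wid = 2147483647000; // Well past the signed 32-bit maximum.

// 1. Insert a row with an explicit wid beyond the old INT maximum.
$connection->insert('watchdog')
  ->fields(['wid' => $big_wid] + $required_fields)
  ->execute();

// 2. Assert that the row exists with exactly that wid.

// 3. Insert a row without a wid; auto-increment should assign
//    $big_wid + 1.
$new_wid = $connection->insert('watchdog')
  ->fields($required_fields)
  ->execute();

// 4. Assert that $new_wid equals $big_wid + 1.
```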
Comment #72
Akram Khan
Added updated patch and addressed #71.
Comment #73
daffie CreditAttribution: daffie commented
We do not need to prophesize the database connection; just use the regular one.
Can we replace PHP_INT_MAX with the regular maximum integer value times 1000, i.e. 2147483647000?
All we need to do is test that we can insert a value for the wid field that is bigger than 2147483647, and that beyond that value the auto-increment functionality still works. No prophecy functionality needed.
Comment #74
Akram Khan
Addressed #73.
Comment #75
daffie CreditAttribution: daffie at Finalist commented
Can you replace PHP_INT_MAX with 2147483647000?
Can we use the variable $connection? Please add:
$connection = Database::getConnection();
Comment #76
_pratik_ CreditAttribution: _pratik_ as a volunteer and at Specbee for Drupal India Association commented
Addressed #75 comment.
Thanks
Comment #78
daffie CreditAttribution: daffie at Finalist commented
@Akram Khan and @pratik: You are supposed to run the test you are working on, on your local machine, to make sure that it passes. Right now it does not.
Comment #79
Arantxio
I've adjusted the patch so the tests don't fail. The dblog table has a lot of NOT NULL fields, so we need to fill these in to insert the log entry.
I've also removed the uses of FakeLogEntries and ProphecyTrait; we still pass the test and we don't need to alter the baseline anymore.
Comment #80
daffie CreditAttribution: daffie at Finalist commented
All code changes look good to me.
I have updated the IS.
For me it is RTBC.
Comment #82
catch
Committed e9a82dd and pushed to HEAD. Thanks!