The primary database server crashed because of a full disk earlier today, at around 6:59am PST (14:59 UTC). The Nagios monitoring system, which normally alerts us in time to prevent such outages, had also crashed and did not notify the infrastructure team of the problem. By 7:21am PST (15:21 UTC) Jeff Sheltren had cleared enough space to bring the database server back online, and service returned to normal operation.

We are sorry for any inconvenience the outage may have caused. We are taking steps to prevent failed monitoring in the future by adding monitoring of our monitoring server. Brandon Bergren is also working to fix the issue with the cache_form table, which caused the disk to fill up.


vinoth.3v’s picture

cache_form, yes. We need a 'Cache Bin'-specific cache clearing system in core.

Vinoth - வினோத்

alliax’s picture

That's funny, because I think we Drupal admins all know that cache_form table only too well... :-)

vinoth.3v’s picture

No, what I mean is that we need a bin-specific, cached-item-specific cache expiration system in core, and Drupal should remove expired items individually instead of clearing the whole table. We already have an expire column, but I'm not sure why the table grows so big.
(I am also clearing that very big cache_form table manually :( )
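
As a rough illustration of removing expired items individually rather than emptying the whole table, here is a minimal sketch. It assumes a simplified stand-in for the cache_form table (only cid, data, and expire columns) held in an in-memory SQLite database, and it assumes that expire values of 0 and -1 are markers for permanent and temporary entries rather than timestamps. The function names are hypothetical and this is not how Drupal core does it.

```python
import sqlite3
import time


def make_demo_db():
    """Create an in-memory database with a simplified, hypothetical cache_form table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cache_form (cid TEXT PRIMARY KEY, data BLOB, expire INTEGER)"
    )
    return conn


def purge_expired(conn, now=None):
    """Delete only rows whose expire timestamp has passed.

    Rows with expire <= 0 are left alone, on the assumption that 0 and -1
    are special markers and not real timestamps.
    Returns the number of rows deleted.
    """
    if now is None:
        now = int(time.time())
    cur = conn.execute(
        "DELETE FROM cache_form WHERE expire > 0 AND expire < ?", (now,)
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = make_demo_db()
    conn.executemany(
        "INSERT INTO cache_form (cid, data, expire) VALUES (?, ?, ?)",
        [
            ("form_old", b"x", 100),         # expired long ago
            ("form_new", b"x", 10**10),      # expires far in the future
            ("form_permanent", b"x", 0),     # assumed permanent marker
        ],
    )
    print("rows purged:", purge_expired(conn, now=1000))
```

Run from cron or a similar scheduler, a purge like this would keep the table from accumulating stale rows between full cache clears.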

Vinoth - வினோத்

Marko B’s picture

Full disk, sounds funny :) like we are some amateurs here.

o_baeko’s picture

Anyone can miss obvious things. The important thing is to make sure it's not happening again, and also to make an effort to identify other "downers".

A full disk can happen in a million ways, so it really needs a general monitoring/warning system (to alert the admin). If it's important for Drupal's reputation, then it should be implemented in core. Additional functionality, like monitoring user-specific parts of the disk that should not grow (beyond a certain limit?), would also be helpful.
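
A minimal sketch of such a general disk-space warning, using only the Python standard library. The 80% and 90% thresholds and the function names are illustrative assumptions, and a real setup would send the result to an admin (by mail or a monitoring system) instead of printing it.

```python
import shutil


def classify_usage(percent_used, warn_at=80.0, critical_at=90.0):
    """Map a disk usage percentage to a status string (thresholds are assumptions)."""
    if percent_used >= critical_at:
        return "CRITICAL"
    if percent_used >= warn_at:
        return "WARNING"
    return "OK"


def check_disk(path="/", warn_at=80.0, critical_at=90.0):
    """Return (status, percent_used) for the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total
    return classify_usage(percent_used, warn_at, critical_at), percent_used


if __name__ == "__main__":
    status, percent = check_disk("/")
    print(f"{status}: {percent:.1f}% of disk used")
```

Calling `check_disk` on a specific directory, with lower thresholds, would cover the case of parts of the disk that should not grow beyond a certain limit, provided they sit on their own filesystem.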

Fixing the underlying bug (cache_form?) is the obvious step, but actually secondary.

Thanks for a fantastic product all of you!

JKingsnorth’s picture

Ah, you're monitoring your monitoring server. Presumably your monitoring server also monitors the monitor monitoring server.

Anonymous’s picture

Cache Form? Sounds familiar. Isn't that the smallest table in every Drupal database? Yes, that's it. Glad it's solved now.

Bokhorza’s picture

The cache form doesn't work on my website.
Any help, please?

Jaypan’s picture

That's not related to the topic at hand (Drupal downtime), and you should open a new thread in the 'Post Installation' forum for such queries.

Check out my Japan podcasts.

mo6’s picture

Great, but... who's going to monitor the "additional monitoring of our monitoring server"? :)