Background
Even though {cache_form} is a cache table, it is handled in a different manner than the other cache bins.
Normal full cache clears will call TRUNCATE TABLE, which on MySQL 5+ will drop and recreate the table.
{cache_form} never gets fully cleared under normal circumstances, but will get garbage collected. cids have a hard-coded TTL of 6 hours.
MySQL has a long-standing bug (http://bugs.mysql.com/bug.php?id=1341) where the shared InnoDB tablespace file, ibdata1, never shrinks, even when data is deleted from the system.
Now consider a Drupal installation which allows anonymous edits to pages, and where links to the edit pages are live on the site (these are in $tabs on node pages by default, so they will be followed). Each request to the node add or node edit page will result in a new form, with a new form build ID, and a new entry in {cache_form}.
Now consider a large site with lots of nodes. A spider comes along and starts crawling the site. It will crawl the node pages, and it will also crawl the node edit pages. A bad spider which crawls too fast will generate a lot of data in {cache_form} as a result.
This is also potentially a problem with other forms that get loaded, but node pages make this particularly problematic, as they are typically the largest forms in the system (because of fields), so the disk can fill fairly quickly.
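To put rough numbers on this, here is a back-of-the-envelope sketch in PHP. The crawl rate and per-form size are made-up illustrative values and the function name is hypothetical; only the 6-hour TTL comes from the description above. It estimates how much form data is live at once when entries arrive at a steady rate and garbage collection removes them as soon as they expire.

```php
<?php
/**
 * Estimates the steady-state size of {cache_form} in bytes.
 *
 * Hypothetical helper for illustration only; it is not part of Drupal.
 * With a constant arrival rate, the number of live entries is the rate
 * multiplied by the TTL.
 */
function cache_form_steady_state_bytes($requests_per_second, $bytes_per_form, $ttl_seconds = 21600) {
  return $requests_per_second * $ttl_seconds * $bytes_per_form;
}

// Assumed inputs: 5 form requests per second, about 100 KiB cached per form.
$bytes = cache_form_steady_state_bytes(5, 100 * 1024);
printf("%.1f GiB\n", $bytes / pow(1024, 3));
```

With these assumed inputs the estimate is about 10.3 GiB of live data. On a shared ibdata1 file that high-water mark stays allocated on disk even after the entries expire.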
Problem
Because of the MySQL bug described above, this space may never be reclaimed even after the cache is garbage collected, so the disk on the system described above will eventually fill up. This is somewhat analogous to the itok change to prevent image style derivatives from filling up the disk. Very Bad Things happen to a MySQL instance when the disk fills up. Typically, tables will crash hard. On a Drupal system, the first ones that crash tend to be the high-volume write tables (cache and friends, watchdog, sessions). Dropping and recreating these isn't a catastrophe, but the possibility also exists that this could happen to a table that has real data in it. In addition, trying to do anything on a system with a full disk is difficult.
Possible Solution / Mitigation?
While this problem can't be totally prevented, I suggest four things happen to help prevent it:
1. Alter the standard robots.txt to add a wildcard for Disallow: node/*/edit. This will prevent Google and Bing from crawling these by default. A user can override this if they choose by editing the file.
2. Alter node_page_edit() to add the robots noindex meta (this is because not all spiders honor wildcard disallows). This will prevent Google and Bing from crawling these by default. A user can override this if they choose via a page alter.
3. Alter form_set_cache() and the equivalent method in Drupal 8 to set $expires from a variable. This will allow site owners to choose smaller TTLs if they desire, which will help keep {cache_form} smaller, to an extent. The default can stay the same, so users won't be impacted unless they change it, and it can be a variable without a settings form. Alternatively, make an exception to the backport policy and apply #2091511: Make cache_form expiration configurable, to mitigate runaway cache_form tables to 7.x.
4. Edit the documentation to suggest using innodb_file_per_table=1 (this is the default as of MySQL 5.6.6), and to also remind site owners to maintain their servers and OPTIMIZE tables as needed. With file-per-table, full cache clears will drop and recreate cache tables, which will then reclaim disk space. Doing an OPTIMIZE on cache_form will reclaim space, as the table gets dropped and recreated with the existing data. Adding a hook_requirements() that checks innodb_file_per_table could also be an option.
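For the hook_requirements() idea in item 4, something along these lines might work on Drupal 7. This is only a sketch: the module name cacheform_guard is hypothetical, the messages are placeholders, and it assumes the check only makes sense on the mysql driver. It would live in the module's .install file.

```php
<?php
/**
 * Implements hook_requirements().
 *
 * Sketch only: "cacheform_guard" is a hypothetical module name.
 */
function cacheform_guard_requirements($phase) {
  $requirements = array();

  // Only report on the status page, and only for the MySQL driver.
  if ($phase != 'runtime' || Database::getConnection()->driver() != 'mysql') {
    return $requirements;
  }

  $t = get_t();

  // The system variable reads as 1 when file-per-table is on, 0 when off.
  $enabled = (bool) db_query('SELECT @@innodb_file_per_table')->fetchField();

  $requirements['cacheform_guard_file_per_table'] = array(
    'title' => $t('InnoDB file-per-table'),
    'value' => $enabled ? $t('Enabled') : $t('Disabled'),
    'severity' => $enabled ? REQUIREMENT_OK : REQUIREMENT_WARNING,
  );

  if (!$enabled) {
    // Placeholder wording; adjust to match the final documentation.
    $requirements['cacheform_guard_file_per_table']['description'] = $t('With innodb_file_per_table disabled, disk space used by cache tables may never be returned to the operating system.');
  }

  return $requirements;
}
```

A warning rather than an error seems appropriate here, since a site with the setting disabled still works and only carries the disk-growth risk described above.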
Sidenotes
This issue was discussed with the Security Team, and their decision was that this can be a public issue.
This is not a hypothetical. I have seen it happen in the wild.
Comments
Comment #1
mpdonadio
Comment #2
David_Rothstein commented
For what it's worth, that issue is already tagged for D7 backport (and seems like it would be perfectly safe to do so, at least on a quick glance)... so don't think any particular exception is needed?
Comment #3
quicksketch
Adding #2819375: Do not make entries in "cache_form" when viewing forms that use #ajax['callback'] (Drupal 7 port) as a related issue. We can potentially fix this problem that viewing forms causes database writes for most forms. That would drastically decrease the number of cache_form entries and reduce the risk caused by what should be harmless spidering.