Page cache size is unlimited when arbitrary query parameters are requested [#3011426]

We noticed on several of our sites that the page_cache table was steadily growing to hundreds of megabytes, containing thousands of rows. We didn't understand why as these sites were not very content heavy and should be very easily cacheable. The example site we used for debugging contains 589 pages and no authenticated users. So we expected to see 589 page_cache entries. Perhaps a few more caused by views pagers, exposed filters and so on. But certainly no more then 1.000.

Problem 1:
After some investigation we noticed a new page_cache entry was created for every unique URL, regardless wether or not that URL actually creates a different output... This is caused by adding query parameters to the URL.
As an example I created a simple bash script that requests url's with a variable id: http://example.tld/?id=x and loops over x from 1 to 10.000. Sure enough the page_cache generated 10.000 cache entries, resulting in 1.5GB of mysql data in the page_cache table.

This should not happen as the cache contexts for the response on this page does not contain the "url.query_args" context. So the cache system should know that the "id" query parameter does not result in any change and should not cause more then 1 entry in the cache_page.
Is this normal behaviour? I could find some related issues, but no clear description as to why this happens, nor a solution:
#2877498: Dynamic cache does not respect query parameters
#2062463: allow page cache cid to be alterable
#2662196: Cache route by Uri and not just Query+Path

I think this could potentially be exploited to overload websites or even crash mysql database or other caching software?

Problem 2:
If we would fix problem 1, a second problem still pops up. Because in our example site we have some modules active, some configuration, some blocks, some real content, ... Mainly a search block is placed in the header of our site on all pages. That in turn creates a form on our website and somehow that seems to add "url.query_args" to the cache context for all pages. So even if the cache id for the page_cache would filter on only query args passed in the cache context, this behaviour would still add all query parameters causing every query parameter to generate a new page_cache entry.
So we did some searching and found that Drupal core adds "url.query_args" in a few places where it should not be needed at all... See the listing below of what I could find.

Why is this implemented like this?
Probably lots of contrib modules already "rely" on this "bazooka"-behaviour...?

core/lib/Drupal/Core/Form/FormBuilder.php:

  /**
   * #lazy_builder callback; renders a form action URL.
   *
   * @return array
   *   A renderable array representing the form action.
   */
  public function renderPlaceholderFormAction() {
    return [
      '#type' => 'markup',
      '#markup' => $this->buildFormAction(),
      '#cache' => ['contexts' => ['url.path', 'url.query_args']],
    ];
  }

modules/user/src/Form/UserPasswordForm.php:

  public function buildForm(array $form, FormStateInterface $form_state) {
    ...
  
    $form['#cache']['contexts'][] = 'url.query_args';

    return $form;
  }

modules/views/views.theme.inc:

function template_preprocess_views_mini_pager(&$variables) {
  ...

  // This is based on the entire current query string. We need to ensure
  // cacheability is affected accordingly.
  $variables['#cache']['contexts'][] = 'url.query_args';

modules/views/src/Plugin/views/style/Table.php:

  public function getCacheContexts() {
    $contexts = [];

    foreach ($this->options['info'] as $field_id => $info) {
      if (!empty($info['sortable'])) {
        // The rendered link needs to play well with any other query parameter
        // used on the page, like pager and exposed filter.
        $contexts[] = 'url.query_args';
        break;
      }
    }

    return $contexts;
  }

Issue fork drupal-3011426

Show commands

Start within a Git clone of the project using the version control instructions.

Add & fetch this issue fork’s repository

Or, if you do not have SSH keys set up on git.drupalcode.org:

Add & fetch this issue fork’s repository

3011426-page-cache-creates changes, plain diff MR !11634
Check out this branch for the first time

Check out existing branch, if you already have it locally

About issue forks

Comments

Comment #1

5 November 2018 at 10:56

weseze created an issue. See original summary.

Comment #2

idebr commented 6 November 2018 at 14:27

Issue tags:

+D8 cacheability

Comment #3

weseze commented 9 November 2018 at 11:30

Is this something that would need to be fixed? Or is it expected behaviour?

I can commit some time if needed.

Comment #4

dww

we/he/they

commented 9 November 2018 at 16:02

Issue summary:

View changes

Problem #1 definitely seems like a bug if it's true, but I suspect the real issue is actually problem #2. If you're sure the cache contexts don't include query_args, there shouldn't be unique cache entries if you're only varying query_args. But it seems like the problem is that query_args is getting added for you, against your will. ;) I agree it's a problem that core does this "bazooka" behavior, but I'm not enough of an expert on the whole D8 caching world, so I can't really say if/how we can fix this. Curious to hear what the caching subsystem maintainers have to say.

Meanwhile, I removed one of the duplicate links to other issues from the summary, and converted the rest to use the [#nid] syntax (e.g. #2877498: Dynamic cache does not respect query parameters) so folks can see the titles / status without having to click through.

Comment #5

weseze commented 12 November 2018 at 07:32

@dww: problem #1 is definitely true. Event tried on a clean vanilla Drupal install where the cache context contained no trace of query_args. Still created a cache entry varied on each query parameter.

Adding in the right query_args cache context (problem #2) is probably easily fixed, making the cache subsystem work with them will probably be the hard part? (if my findings are correct though)

But before committing my time to create patches, I would need to know that is indeed not intended behaviour or I'm not doing something completely wrong with my cache setup ;)

Comment #6

Pascal- commented 27 November 2018 at 08:52

We're experiencing a similar (or the same?) issue with several of our websites.
Any (temp) fix for this? For now we're doing a manual cache rebuild every 2 days ...

Comment #7

weseze commented 28 November 2018 at 11:40

Other the shutting down page caching, or patching core, I don't see any solution...

We also have a few sites that are building up massive caches from time to time.
Having a regular cron task and some finetuned settings might fix this, see: https://www.drupal.org/node/2891281 Although it does not fix the problem, just clears the cache on your cron run...

Comment #8

ndobromirov commented 28 November 2018 at 16:19

You can always use MemCached, Redius or any other evicting cache storage.

Comment #9

weseze commented 29 November 2018 at 11:54

@ndobromirov: that does not fix the issue here...

Comment #10

chi commented 20 January 2020 at 05:43

Re: Problem 1

I think it is by design. Request URI is hard coded in cache ID used by Page Cache module. So setting "url.query_args" cache context makes no sense as it more specific. I suppose that's implemented this way because there is no way to determine a precise cache context from incoming request. So that Page Cache module relies on the hard coded cache context which is made of HTTP host, URI and request format. I think that's a correct approach. After all URI is a Uniform Resource Identifier, why shoud a Drupal site serve the same content for different URIs?

Note that the context does not include cookies so that Page Cache does not serve pages for authenticated users.

Re: Problem 2

Search form does not include "url.query_args" context for me. Also I think blocks are typically rendered in cache placeholder so they are not cached on page level. Pagination is by it's nature is depending on "page" query parameter and when used with views exposed filters or sortable tables it should also depend on other query parameters as well.

Anyway I propose creating a separate ticket for each issue where some Drupal element declares too general page context. Because it's a completely different problem and it affects Dynamic Page Cache as well.

Comment #11

chi commented 20 January 2020 at 06:25

Issue tags:

+Security

As an example I created a simple bash script that requests url's with a variable id: http://example.tld/?id=x and loops over x from 1 to 10.000.

That's a known issue. I believe that affects earlier Drupal versions as well. Note that even we fix somehow the problem #1 it will not protect a site from that issue because the are many pages that really depend on "url.query_args" cache context. If someone wants to fill cache tables on a site he will just pick one of such pages as base URL. Basically any page with pagination can be used that way.
curl http://example.com/some-view/?{1..1000}

And besides, there is another way to populate the database with a garbage. When DB Log is enabled any anonymous user can leave messages there by visiting non existing pages curl http://example.com/hi_admin_let's_put_some_garbage_to_your_database or by producing errors on a site with specially crafted URLs like http://example.com?page[]. Furthermore there is way to put a few thousands DB log entities at once with a single HTTP request (#2864933: ContextualController::render() should validate incoming IDs).

Comment #12

Pascal- commented 20 January 2020 at 10:02

A workaround that "fixed" the issue for me, is disabling the cache for views with large amounts of data.

Both in D7 and D8 I had views that were only accessible to admins with large amounts of data, those were causing the database to fill rapidly.
In the views settings, I've disabled the cache and that fixed it.

Comment #13

20 January 2020 at 10:02

Version:

8.6.x-dev

» 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #14

20 January 2020 at 10:02

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #15

20 January 2020 at 10:02

Version:

8.9.x-dev

» 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #16

20 January 2020 at 10:02

Version:

9.2.x-dev

» 9.3.x-dev

Comment #17

20 January 2020 at 10:02

Version:

9.3.x-dev

» 9.4.x-dev

Drupal 9.3.15 was released on June 1st, 2022 and is the final full bugfix release for the Drupal 9.3.x series. Drupal 9.3.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.4.x-dev branch from now on, and new development or disruptive changes should be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #18

gagarine commented 11 August 2022 at 16:15

Version:

9.4.x-dev

» 10.1.x-dev

Comment #19

11 August 2022 at 16:15

Version:

10.1.x-dev

» 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #20

ressa

he/him

commented 19 January 2024 at 18:37

A workaround that "fixed" the issue for me, is disabling the cache for views with large amounts of data.

I tried this (I am using Facets module, which requires this) but the View is still getting cached, and the cache_page table is getting views entries, which it should not.

I then tried disabling cache for everything under Performance (/admin/config/development/performance), setting "Caching | Browser and proxy cache maximum age" to <no caching> but even then, some entries were made in the cache table ...

I tried searching for drupal views "Caching:None" ignored and found the page Disable the cache for a specific view and tried uninstalling Internal Page Cache (page_cache) and Internal Dynamic Page Cache (dynamic_page_cache) modules, and finally, the View was not getting cached. But caching was pretty much disabled for all pages then.

Luckily, the CacheExclude module was also mentioned on that page. I tried it, and it works perfectly: After adding the view paths /mypath and /mypath/* to exclude also sub-pages, the view is not getting cached at all.

Comment #21

prudloff commented 22 March 2025 at 11:27

We also got bitten by this on a website where bot requests expanded massively the size of the SQL database.
The database_cache_max_rows setting helps with this but it is not enforced in real time.

Our current solution to this is :

switch to a Redis cache and set a memory limit
add rate limit on the server

Comment #22

gpotter commented 26 March 2025 at 18:27

Like @prudloff, we ran into the same issue.

The basic example is any controller that returns a render array. If a bot hits that route with unique query strings, it will create a new cache_page record, if that bot hammers on the page consistently, you end up with tens or even hundreds of gigabytes of data because the entire page is stored with the same content. This is namely the Internal Page Cache module that is causing the issue.

"database_cache_max_rows" doesnt work like prudloff mentioned because those max rows are only restricted on a cron run. So by the time the cron runs the site dies because there are too many records to restrict and delete old records.

We had similar solution ideas with Redis cache, or rate limits. Rate limits is a bit concerning of a solution as it could potentially rate limit a legitimate web crawler.

The primary issue on our server is the site would die on cron runs, so we clear cache before a cron run nightly. A cache clear runs at a good speed versus the db row constrain from cron. Probably because a cache clear is a simple quick truncate of the cache tables.

Comment #23

gagarine commented 26 March 2025 at 19:05

Priority:

Major

» Critical

This issue has been publicly known for years and still poses a serious risk.

In my opinion, the core problem is that adding irrelevant query parameters still returns a valid page. Instead, Drupal should return a 400 (Bad Request) response and avoid caching the page when “fake” query parameters are injected.

As it stands, adding arbitrary query strings (e.g., ?test, ?foo=bar) bypasses the page cache and triggers full page generation. An attacker can exploit this to cause a denial-of-service (DoS) by flooding the site with requests using unique query parameters, quickly overwhelming the backend.

What’s worse is that Drupal treats these URLs as legitimate and includes them—with the fake query strings—in the HTML it returns. The content is therefor not identical of the page requested on the normal URL.

This should absolutely be treated as a critical security issue.

Comment #24

cilefen commented 26 March 2025 at 21:46

If this is critical, are you saying that #2526150: Database cache bins allow unlimited growth: cache DB tables of gigabytes! does not work, or that it is ineffective?

I completely understand that even with row-limiting, an attacker can essentially replace useful cached objects with useless ones, and that Drupal could do better here. But can those outcomes render a site unusable or cause database deadlocks? Possibly.

Comment #25

catch

he/him

English

commented 26 March 2025 at 22:34

@cilefen

If this is critical, are you saying that #2526150: Database cache bins allow unlimited growth: cache DB tables of gigabytes! does not work, or that it is ineffective?

It's known to be ineffective because it relies on cron trimming the table size. This means that if the table is growing to ten times the row limit in-between cron runs, you not only have tables growing very large in-between cron runs, but also the cron logic has to delete a lot of rows. It doesn't mean the logic is 'bad' as such, it's just not a replacement for a 'real' cache backend like Redis or Memcache which can prune items more efficiently, both the pruning and the deciding which ones to prune.

@gagarine

In my opinion, the core problem is that adding irrelevant query parameters still returns a valid page. Instead, Drupal should return a 400 (Bad Request) response

This would require an 'allowed query parameters' API. But that API would have to validate both the parameter names and the values. And it couldn't just do that, because ?pager is a valid value and would have an integer range of 0-PHP_MAX_INT (or some arbitrary configured value). And unless it could be reliably implemented to validate the query parameters on a route-specific basis (which may not be the controller, it could be any block, JavaScript, 'destination' handling for a form etc.), then that still allows more than enough combinations of query parameters to create a very, very large number of cache entries. It would require more knowledge but it wouldn't prevent anything. And it wouldn't be possible to issue a 400 until Drupal 12 because loads of contrib modules would break.

One possibility here is to just completely prevent caching if there are any query parameters in the URL, those requests would have to rely on dynamic_page_cache instead.

What’s worse is that Drupal treats these URLs as legitimate and includes them—with the fake query strings—in the HTML it returns. The content is therefor not identical of the page requested on the normal URL.

Where are the query strings included in the HTML returned by the page?

Comment #26

26 March 2025 at 22:37

catch opened merge request !11634

Comment #27

catch

he/him

English

commented 26 March 2025 at 22:38

Title:	Page cache creates vast amounts of unneeded cache data	» Page cache size is unlimited when arbitrary query parameters are requested
Status:	Active	» Needs work

MR will fail at least one method in PageCacheTest if not elsewhere, because that test method relies on caching pages with query parameters. It would have to be switched to using key value or something.

Comment #28

gagarine commented 27 March 2025 at 10:25

This would require an 'allowed query parameters' API. But that API would have to validate both the parameter names and the values. And it couldn't just do that, because ?pager is a valid value and would have an integer range of 0-PHP_MAX_INT (or some arbitrary configured value).

Could the value be validated by the module that say they need this parameters? The validation may require a couple of DB queries, but it could be much lighter than building a full page.

Where are the query strings included in the HTML returned by the page?

The full URL is added in form action and in the pages's JS.
If this was not the case, we could use a checksum to avoid duplication in cache. But adding a parameter will still allow to bypass the cache.

Comment #29

catch

he/him

English

commented 27 March 2025 at 10:40

Could the value be validated by the module that say they need this parameters? The validation may require a couple of DB queries, but it could be much lighter than building a full page.

For views, which is the most likely way to end up with a pager query parameter, you would need to build and execute the entire view to validate the parameter to find out how many possible pages there might be - pretty much have built the page at that point.

The full URL is added in form action and in the pages's JS.
If this was not the case, we could use a checksum to avoid duplication in cache. But adding a parameter will still allow to bypass the cache.

That's by design - query parameters can be used to pre-fill form values and other things which affect how the form is built, also for the ?destination parameter.

#2503429: [PP-*] Allow both AJAX and non-AJAX forms to POST to dedicated URLs might allow the form action to be consistent each time, but that issue has had very little progress in the past 9 years because it will be incredibly hard to implement.

Comment #30

27 March 2025 at 10:40

Version:

11.x-dev

» main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Page cache size is unlimited when arbitrary query parameters are requested

Issue fork drupal-3011426

Comments

Re: Problem 1

Re: Problem 2

Referenced by