Problem/Motivation

All our render cache entries (pages, entities, blocks, views, and so on) are cached forever by default.

That's great. Except when you have a lot of data plus a lot of variations, and a cache backend that doesn't support something like LRU eviction, such as the default database backend. Then your cache table grows and grows, and suddenly it's not so great anymore :)

You could argue that sites with a lot of data should use Redis/Memcache instead of the database, but medium-sized sites without that many users can eventually run into problems too. It also matters when copying the database around, since only few people remember to ignore the cache tables.

Proposed resolution

There has been a lot of discussion around how many items you actually want to truncate:

  • $max_amount / 2
  • $max_amount
  • $max_amount with some jittering:
    Either the render cache service or the database backend (per-bin) should support an upper limit for the max-age. E.g. a day or a week. It would probably be great to vary that slightly for every write, e.g. +/- 5% so not all cache entries expire at the same time.

The solution of using just $max_amount was chosen for simplicity.
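
For illustration, the kind of per-bin override the comments below converge on looks like this in settings.php; note that the exact setting name evolves over the thread (max_expire, then max_lifetime, then an override for CACHE_PERMANENT):

// Cap cache items in the render bin at one week instead of keeping them
// forever. The key name here follows the later patches in this thread.
$settings['cache_database']['render']['max_lifetime'] = 86400 * 7;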

Remaining tasks

User interface changes

API changes

Data model changes

Comment | File | Size | Author
#260 | interdiff-2526150-249-260.txt | 1.78 KB | fgm
#260 | 2526150-dbcachemax-260.patch | 17.39 KB | fgm
#258 | interdiff-2526150-249-258.txt | 1.63 KB | fgm
#258 | 2526150-dbcachemax-258.patch | 17.39 KB | fgm
#255 | interdiff-2526150-255.txt | 547 bytes | flocondetoile
#255 | 2526150-255.patch | 17.33 KB | flocondetoile
#249 | interdiff-2526150.txt | 1.25 KB | dawehner
#249 | 2526150-249.patch | 17.27 KB | dawehner
#248 | interdiff-2526150.txt | 1.78 KB | dawehner
#248 | 2526150-247.patch | 17.23 KB | dawehner
#240 | interdiff-237-240.txt | 1000 bytes | mpdonadio
#240 | 2526150-240.patch | 16.42 KB | mpdonadio
#237 | interdiff-219-237.txt | 4.46 KB | mpdonadio
#237 | 2526150-237.patch | 16.42 KB | mpdonadio
#219 | interdiff-2526150.txt | 1.46 KB | dawehner
#219 | 2526150-219.patch | 15.95 KB | dawehner
#217 | interdiff-2526150.txt | 2.91 KB | dawehner
#217 | 2526150-217.patch | 15.95 KB | dawehner
#214 | interdiff-2526150.txt | 5.01 KB | dawehner
#214 | 2526150-214.patch | 14.75 KB | dawehner
#213 | 2526150-bounded_database_cache_bins__count-213.patch | 16.55 KB | Wim Leers
#213 | interdiff.txt | 7.29 KB | Wim Leers
#205 | 2526150-bounded_database_cache_bins__count-205.patch | 13.48 KB | Wim Leers
#205 | interdiff.txt | 2.34 KB | Wim Leers
#204 | 2526150-bounded_database_cache_bins__count-203.patch | 12.58 KB | Wim Leers
#204 | interdiff.txt | 1.7 KB | Wim Leers
#200 | 2526150-bounded_database_cache_bins__count-200.patch | 11.84 KB | Wim Leers
#200 | interdiff.txt | 1.7 KB | Wim Leers
#198 | 2526150-bounded_database_cache_bins__count-197.patch | 11.88 KB | Wim Leers
#198 | interdiff.txt | 9.59 KB | Wim Leers
#196 | 2526150-bounded_database_cache_bins__count-196.patch | 4.38 KB | Wim Leers
#196 | interdiff.txt | 2.53 KB | Wim Leers
#194 | 2526150-bounded_database_cache_bins__count-194.patch | 2.42 KB | Wim Leers
#170 | 2526150-bounded_database_cache_bins__size-170.patch | 3.36 KB | Wim Leers
#170 | 2526150-bounded_database_cache_bins__count-170.patch | 2.45 KB | Wim Leers
#170 | 2526150-slushi_cache-170.patch | 4.98 KB | Wim Leers
#128 | interdiff.txt | 1.86 KB | catch
#128 | 2526150.patch | 13.08 KB | catch
#125 | 2526150.patch | 13.08 KB | catch
#123 | interdiff.txt | 3.85 KB | catch
#123 | 2526150.patch | 3.85 KB | catch
#119 | 2526150-119.patch | 12.75 KB | alexpott
#119 | 117-119-interdiff.txt | 5.34 KB | alexpott
#117 | interdiff.txt | 2.06 KB | catch
#117 | 2526150.patch | 12.95 KB | catch
#116 | interdiff.txt | 1.98 KB | catch
#116 | 2526150.patch | 12.87 KB | catch
#115 | 2526150.patch | 10.89 KB | catch
#115 | interdiff.txt | 8.26 KB | catch
#112 | interdiff.txt | 2.13 KB | catch
#112 | 2526150.patch | 10.44 KB | catch
#108 | 2526150.patch | 10.18 KB | catch
#108 | interdiff.txt | 617 bytes | catch
#107 | interdiff.txt | 6.36 KB | catch
#107 | 2526150.patch | 10.14 KB | catch
#104 | 2526150.patch | 10.13 KB | catch
#104 | interdiff.txt | 5.6 KB | catch
#102 | 2526150.patch | 8.28 KB | catch
#100 | 2526150.patch | 6.56 KB | catch
#89 | 2526150.patch | 6.54 KB | catch
#87 | 2526150.patch | 6.54 KB | catch
#82 | 2526150-81.patch | 8.26 KB | catch
#79 | 2526150.patch | 1.5 KB | catch
#64 | interdiff.txt | 1.79 KB | Denchev
#64 | database_cache_backend-2526150-63.patch | 8.77 KB | Denchev
    PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 102,457 pass(es).
#60 | database_cache_backend-2526150-60.patch | 8.66 KB | borisson_
    PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 101,576 pass(es).
#60 | interdiff.txt | 6.19 KB | borisson_
#47 | interdiff.txt | 1.36 KB | olli
#47 | 2526150-47.patch | 8.71 KB | olli
    FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 101,574 pass(es), 4 fail(s), and 0 exception(s).
#46 | 2526150-46.patch | 8.72 KB | hussainweb
    FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 101,552 pass(es), 4 fail(s), and 0 exception(s).
#46 | interdiff-45-46.txt | 704 bytes | hussainweb
#45 | 2526150-45.patch | 8.69 KB | hussainweb
    FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 101,561 pass(es), 4 fail(s), and 0 exception(s).
#45 | interdiff-44-45.txt | 2.11 KB | hussainweb
#44 | 2526150-44.patch | 8.55 KB | hussainweb
    PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 101,560 pass(es).
#40 | 2526150-40.patch | 9.29 KB | hussainweb
    PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,937 pass(es).
#40 | interdiff-37-40.txt | 1.23 KB | hussainweb
#37 | 2526150-37.patch | 9.24 KB | hussainweb
    FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,938 pass(es), 2 fail(s), and 0 exception(s).
#37 | interdiff-36-37.txt | 1.61 KB | hussainweb
#36 | 2526150-36.patch | 8.14 KB | hussainweb
    FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,926 pass(es), 1 fail(s), and 0 exception(s).
#36 | interdiff-33-36.txt | 1.2 KB | hussainweb
#33 | 2526150-33.patch | 7.18 KB | hussainweb
    FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,933 pass(es), 2 fail(s), and 0 exception(s).
#33 | interdiff-31-33.txt | 5.55 KB | hussainweb
#31 | core-render_cache_max_age-2526150-31.patch | 6.13 KB | Denchev
    PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,931 pass(es).
#31 | interdiff.txt | 5.61 KB | Denchev
#28 | render_cache_or-2526150-28.patch | 6.33 KB | borisson_
    PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,858 pass(es).
#28 | interdiff.txt | 1016 bytes | borisson_
#26 | render_cache_or-2526150-26.patch | 5.39 KB | borisson_
    PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,696 pass(es).
#26 | interdiff.txt | 2.41 KB | borisson_
#25 | interdiff.txt | 3.13 KB | Denchev
#25 | core-render_cache_max_age-2526150-25.patch | 5.96 KB | Denchev
    PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,675 pass(es).
#23 | interdiff.txt | 2.11 KB | Denchev
#23 | core-render_cache_max_age-2526150-23.patch | 5.96 KB | Denchev
    PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 99,791 pass(es).
#20 | interdiff.txt | 6.78 KB | Denchev
#20 | core-render_cache_max_age-2526150-20.patch | 5.31 KB | Denchev
    PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,961 pass(es).
#17 | interdiff.txt | 4.79 KB | Denchev
#17 | core-render_cache_max_age-2526150-17.patch | 5.17 KB | Denchev
    FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] Failed to run tests: PHP Fatal error encountered during run_tests.sh. See review log for details.
#15 | interdiff.txt | 1.19 KB | Denchev
#15 | core-render_cache_max_age-2526150-15.patch | 1.15 KB | Denchev
    PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,655 pass(es).
#13 | core-render_cache_max_age-2526150-13.patch | 978 bytes | Denchev
    FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 32,317 pass(es), 780 fail(s), and 16 exception(s).

Comments

dawehner’s picture

I think making it per bin makes sense, as there might be other bins where this is useful.
Also having this as part of the cache configuration helps you to get an overview inside settings.php, in case needed.

Fabianx’s picture

I think if we do that, we should also have some pseudo-LRU:

if ($refresh_cache_time && $age_remaining < $refresh_cache_time) {
  // Set the cache entry again: treat this like a cache miss, but do not
  // rebuild the data, only re-cache it.
}

Proposed parameters are per bin:

- max_age: 86400
- max_age_jitter_percent: 5
- refresh_cache_time: 3600 // Do not let frequently accessed data expire, re-cache it instead.

< berdir> Fabianx-screen: that would be interesting, but then we need two max-ages.. since some items might have a "hard" max-age
< Fabianx-screen> berdir: Yes, but the cache backend can distinguish with a simple flag:
< Fabianx-screen> $data->is_default_max_age => TRUE / FALSE,

Berdir’s picture

I'm not a huge fan of introducing that complexity here. It would require storage changes in all the cache tables for one thing...

Wim Leers’s picture

Either the render cache service or the database backend (per-bin) should support an upper limit for the max-age. E.g. a day or a week.

I don't think this actually fixes the problem.

We only check whether a cache item is expired when retrieving that item. This means that any render cache item that is accessed infrequently (or never again) also never gets removed.

The only way I think we can solve this is by evicting expired cache items on cron, i.e. implementing LRU anyway.

Wim Leers’s picture

So Berdir pointed out that we already do what I just described, in \Drupal\Core\Cache\DatabaseBackend::garbageCollection(), called by system_cron(). OK :)

Fabianx’s picture

#5: system_cron() runs:

  foreach (Cache::getBins() as $cache_backend) {
    $cache_backend->garbageCollection();
  }

is that not sufficient?

Wim Leers’s picture

Given that, +1 to the proposed solution in principle:

Either the render cache service or the database backend (per-bin) should support an upper limit for the max-age. E.g. a day or a week. It would probably be great to vary that slightly for every write, e.g. +/- 5% so not all cache entries expire at the same time.

… but I still have some concerns:

  1. We want to prefer keeping render cache items of "bigger page components" cached, choosing to evict "smaller page components" from the cache instead. Because the smaller ones, when combined, result in the bigger ones being re-created in the cache anyway.
  2. In the future we'll probably want to send the Age header. We're able to base that on the render cache items' created timestamps, taking the maximum value of those. This change would mean the Age could never exceed a week, for no good reason other than sites with huge amounts of content getting into trouble.
  3. I wonder if we can choose not to impose a maximum max-age, but instead specify a maximum number of cache items for a certain cache bin, or even a maximum bin size (see http://stackoverflow.com/questions/9620198/how-to-get-the-sizes-of-the-t...). Given those restrictions, we'd then continue to do the same things as we do today in DatabaseBackend::garbageCollection(), but on top of that also delete the oldest cache items (by looking at the created timestamp), say, the oldest 1%.

I don't think we can implement 1. I do think 2 is a useful thing to take into account, but it's not a solution. But 3 seems like a viable alternative solution?
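
For concreteness, a count-bounded garbage collection along the lines of point 3 might look roughly like this; the $max_items limit and the 1% figure are hypothetical, and this is only a sketch, not an actual patch:

public function garbageCollection() {
  // ... the existing expiry-based deletion stays as-is ...

  // Hypothetical per-bin limit; a real patch would read this from
  // configuration rather than hard-coding it.
  $max_items = 100000;

  $count = (int) $this->connection->select($this->bin)
    ->countQuery()
    ->execute()
    ->fetchField();

  if ($count > $max_items) {
    // Delete the oldest 1% of items by creation time.
    $cids = $this->connection->select($this->bin, 'c')
      ->fields('c', ['cid'])
      ->orderBy('created', 'ASC')
      ->range(0, (int) ceil($count * 0.01))
      ->execute()
      ->fetchCol();
    if ($cids) {
      $this->connection->delete($this->bin)
        ->condition('cid', $cids, 'IN')
        ->execute();
    }
  }
}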


I do think the already proposed resolution works, but then my second point is a concern. I wonder what you guys think of point 3 as an alternative solution? Just trying to make sure we end up with the best possible solution :)

Finally: what do other CMSes/web frameworks do for this?

dawehner’s picture

At least typo3 just defines a default lifetime of an hour, see http://wiki.typo3.org/Caching_framework

Fabianx’s picture

#8:

I don't think it belongs in the render cache at all.

It is purely a database cache backend implementation detail and should have no effect on the other metadata that we store.

1. I think we should at least move cache.page to its own cache_page bin again.

2. For the Age header we should not depend on internal data of the cache backend (that is an implementation detail and not part of the interface), but rather provide our own in #cache (similar to downstream TTL, and ensuring a proper max-age works correctly).

3. I still think automatically refreshing still-active cache items is the sanest way to get pseudo-LRU into the database backend - configurable via settings for only that backend (so backends that are likely LRU already can support it, but don't need to).

That would also take care of #2, as the about-to-expire cache item would just be refreshed (and retain its age). It is in principle similar to how If-Modified-Since works, where the reverse proxy returns a 304 when the backend returns data that still matches the cache, and just serves its cached item (in this case the cache tags take care of that).

And I don't think it is complex to implement.

It is one Settings call in __construct(), 5 lines of code in get(), and 5 lines of code in set(), plus test coverage.
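
Roughly, under the parameter names from #3 (all of these names are assumptions, not an actual patch):

// In __construct(): one Settings call.
$settings = Settings::get('cache_database_' . $bin, []);
$this->maxAge = isset($settings['max_age']) ? $settings['max_age'] : 0;
$this->refreshCacheTime = isset($settings['refresh_cache_time']) ? $settings['refresh_cache_time'] : 0;

// In get(): a still-valid item that is close to expiring gets re-cached
// as-is (pseudo-LRU), without rebuilding it. This ignores distinguishing
// truly time-limited items (see the is_default_max_age flag idea above).
if ($this->refreshCacheTime && $cache->expire != Cache::PERMANENT
    && $cache->expire - REQUEST_TIME < $this->refreshCacheTime) {
  $this->set($cid, $cache->data, REQUEST_TIME + $this->maxAge, $cache->tags);
}

// In set(): cap permanent (or too-distant) expirations at max_age.
if ($this->maxAge && ($expire == Cache::PERMANENT || $expire > REQUEST_TIME + $this->maxAge)) {
  $expire = REQUEST_TIME + $this->maxAge;
}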

---

Thinking about 1. again - keep smaller fragments for longer, keep larger fragments shorter.

That kind of policy is pretty much the opposite of LRU.

I am not sure; it is something we should think about, but it is out of scope here I think.

Wim Leers’s picture

Thinking about 1. again - keep smaller fragments for longer, keep larger fragments shorter.

I said the opposite.

but it is out of scope here I think.

It is not, because it's related.

Everything I wrote in #8 was about this:

Just trying to make sure we end up with the best possible solution :)

i.e. making sure we take all things into account.

Fabianx’s picture

Okay, I actually would have thought the smaller fragments change less as they usually have less variation :p.

#11: The problem is that 3) is not LRU and makes refreshing impossible. The oldest active cache items are very likely the ones that were created first on a cold cache, but those are very likely the front page and other things that are accessed highly frequently and that an LRU policy would not expire.

So we would do the exact opposite, while the goal is to expire things that have not been accessed since day X. (LRU)

But LRU is very costly, because you need a write for each cache_get().

The nice thing about the refresh-pseudo-LRU is that you only need a refresh when you are near the set expiration time.

e.g. for my own sites based on DB I would probably use:

expire=1 week (+/- 10% jitter)
working set=1 day

Everything that was accessed within 1 day would be kept permanently, which is the working set of the site.

So old GA crawled urls would expire, but my front page would not.

Edit:

This would also in general favor larger fragments (as they would get accessed more often) and expire the smaller fragments automatically.

e.g. if my front page does not change and all that is accessed is my front page, then all the block and entity caches would expire after one week.

If then my front page changes, obviously there is a larger cache miss, but that is the trade-off to make and would happen with any other LRU system. (Interesting observation: LRU has problems with hierarchical cache items - need to do some research on whether there is something in the literature on that.)

However, if there are uncached pages, then the most active blocks would still be refreshed (unless they depend on the URL / route, obviously).

So I think the trade-off in general is okay.

---

Edit 2:

Putting a max number of items in place only works when there is pressure to evict, which again makes things more complex.

Given our pseudo-LRU it would work like this: check the configured number of items and only remove expired items until the threshold is reached again.

In this case however I would not restrict the working set, but rather only the number of expired items to keep.

So it is orthogonal to the approach and could help to keep some expired items around - however I don't think it is necessary.

Denchev’s picture

Status: Active » Needs review
FileSize
978 bytes
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 32,317 pass(es), 780 fail(s), and 16 exception(s). View

With this approach a user can define in settings.php a max age for the appropriate bin.

Status: Needs review » Needs work

The last submitted patch, 13: core-render_cache_max_age-2526150-13.patch, failed testing.

Denchev’s picture

Status: Needs work » Needs review
FileSize
1.15 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,655 pass(es). View
1.19 KB

Oops, I forgot to change some things from earlier. Here is the new one.

Berdir’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

Ok, we also need some tests, for example in DatabaseBackendUnitTest. Set the setting, then write something and check the expiration, possibly directly in the database.

Thinking about it, we should probably not call out to Settings directly in this class. Instead, let's add an optional array $configuration = [] to the constructor, store it in $this->configuration, and pass in only the settings for the specific cache bin from DatabaseBackendFactory.
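
A minimal sketch of that suggestion (signatures assumed from the surrounding discussion, not the final patch):

// DatabaseBackend gains an optional per-bin configuration array.
public function __construct(Connection $connection, CacheTagsChecksumInterface $checksum_provider, $bin, array $configuration = []) {
  // ... existing assignments ...
  $this->configuration = $configuration;
}

// DatabaseBackendFactory passes along only this bin's settings.
public function get($bin) {
  $settings = Settings::get('cache_database', []);
  $configuration = isset($settings[$bin]) ? $settings[$bin] : [];
  return new DatabaseBackend($this->connection, $this->checksumProvider, $bin, $configuration);
}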

Denchev’s picture

Status: Needs work » Needs review
FileSize
5.17 KB
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] Failed to run tests: PHP Fatal error encountered during run_tests.sh. See review log for details.. View
4.79 KB

Status: Needs review » Needs work

The last submitted patch, 17: core-render_cache_max_age-2526150-17.patch, failed testing.

Berdir’s picture

  1. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -10,6 +10,7 @@
     use Drupal\Core\Database\Connection;
     use Drupal\Core\Database\SchemaObjectExistsException;
    +use Drupal\Core\Site\Settings;
     
    
    @@ -150,6 +161,11 @@ protected function prepareItem($cache, $allow_invalid) {
    +    // Get defined settings from settings.php if there are any and set the cache.
    +    $settings = (isset($this->configuration)) ? $this->configuration : Settings::get('cache_database', []);
    +    if (isset($settings[$this->bin]) && ($expire == Cache::PERMANENT || $expire > $settings[$this->bin])) {
    +      $expire = $settings[$this->bin];
    +    }
    

    No fallback to Settings here.

    DatabaseBackendFactory needs to load them from Settings and then inject them.

    $configuration is not meant to be keyed by $this->bin. DatabaseBackendFactory just passes the configuration for that bin to it. But we should add a key inside to allow additional settings later.

    So: if (isset($this->configuration['max_expire']) ...

  2. +++ b/core/modules/system/src/Tests/Cache/DatabaseBackendUnitTest.php
    @@ -29,8 +29,8 @@ class DatabaseBackendUnitTest extends GenericCacheBackendUnitTestBase {
        */
    -  protected function createCacheBackend($bin) {
    -    return new DatabaseBackend($this->container->get('database'), $this->container->get('cache_tags.invalidator.checksum'), $bin);
    +  protected function createCacheBackend($bin, $configuration) {
    +    return new DatabaseBackend($this->container->get('database'), $this->container->get('cache_tags.invalidator.checksum'), $bin, $configuration);
    

    Don't change the arguments here.

    Just call new DatabaseBackend directly in the new test.

The new test looks like a good start, but test with a few different values. Check with an expiration that's one second lower and one higher than the limit.
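
Something along these lines, with made-up values and cids, and assuming the duration-based semantics the later patches settle on:

// Hypothetical test: expirations one second below and above the limit.
$configuration = ['max_expire' => 1000];
$backend = new DatabaseBackend($this->container->get('database'), $this->container->get('cache_tags.invalidator.checksum'), 'render', $configuration);

// One second below the limit: stored unchanged.
$backend->set('test_cache_low', 'foo', REQUEST_TIME + 999);
$this->assertIdentical($backend->get('test_cache_low')->expire, (string) (REQUEST_TIME + 999));

// One second above the limit: capped to REQUEST_TIME + max_expire.
$backend->set('test_cache_high', 'foo', REQUEST_TIME + 1001);
$this->assertIdentical($backend->get('test_cache_high')->expire, (string) (REQUEST_TIME + 1000));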

Denchev’s picture

Status: Needs work » Needs review
FileSize
5.31 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,961 pass(es). View
6.78 KB

1) Moved the loading of settings to the DatabaseBackendFactory constructor; we now pass in the value for the bin if available.
Also changed to

if (isset($this->configuration['max_expire']) ...

2) Reverted the changes to this part. DatabaseBackend is now called directly in the tests, and I added 2 more tests as suggested.

Berdir’s picture

Status: Needs review » Needs work
  1. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -42,6 +42,13 @@ class DatabaseBackend implements CacheBackendInterface {
       /**
    +   * The maximum cache expiration time.
    +   *
    +   * @var array
    +   */
    +  protected $configuration;
    
    @@ -50,14 +57,17 @@ class DatabaseBackend implements CacheBackendInterface {
    +   * @param array $configuration
    +   *   The maximum cache expiration time.
    
    +++ b/core/lib/Drupal/Core/Cache/DatabaseBackendFactory.php
    @@ -26,6 +27,13 @@ class DatabaseBackendFactory implements CacheFactoryInterface {
    +   * The maximum cache expiration time.
    +   *
    +   * @var array
    +   */
    +  protected $configuration;
    

    This should explain a bit better what exactly is in there.

    Something like "Per-bin database backend configuration."

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackendFactory.php
    @@ -36,6 +44,7 @@ class DatabaseBackendFactory implements CacheFactoryInterface {
         $this->checksumProvider = $checksum_provider;
    +    $this->configuration = Settings::get('cache_database', []);
    

    We also need to actually document this somewhere; often that's done in default.settings.php, see the other keys there.

    Also, instead of settings, we could use container parameters. The difference is that they would be configured in local.yml and passed to the service using %some_key.

    Cache backends are currently configured in settings, which is why I suggested this, but I'm also fine with a container parameter.

  3. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackendFactory.php
    @@ -48,7 +57,8 @@ function __construct(Connection $connection, CacheTagsChecksumInterface $checksu
    +    $configuration['max_expire'] = isset($this->configuration[$bin]) ? (int)$this->configuration[$bin] : null;
    

    Missing space after (int)

Wim Leers’s picture

Issue tags: +D8 cacheability
Denchev’s picture

Status: Needs work » Needs review
FileSize
5.96 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 99,791 pass(es). View
2.11 KB

1. Updated the explanation.
2. Added an explanation in default.settings.php. I'm not exactly sure what information is needed for this kind of setting, but here is my first try.
Since I added the explanation in default.settings.php and was already reading the settings with Settings::get('cache_database', []), I left it like that. If container parameters are better I can change it...
3. Fixed.

Berdir’s picture

Status: Needs review » Needs work
+++ b/sites/default/default.settings.php
@@ -691,3 +691,16 @@
+ * For example:
+ * @code ¶
+ * settings['cache_database']['render'] = 3600;
+ * @endcode
+ */
+# $settings['cache_database'][''] = 0;

Instead of a separate code example, just make the commented-out code be that example.

And we might even want to enable it by default and set it to a week or so.

To provide a data point here: we recently launched a small local news site. They didn't import their existing content, so they started with a practically empty database.

However, cache_render grew to 1.3GB within days, as far as I could see mostly due to 404 pages requested by Google and other bots. That never goes away without a manual full cache clear.

Denchev’s picture

Status: Needs work » Needs review
FileSize
5.96 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,675 pass(es). View
3.13 KB
borisson_’s picture

Issue tags: -Needs tests
FileSize
2.41 KB
5.39 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,696 pass(es). View

Removing the "needs tests" tag, because it already has tests. I also changed the DatabaseBackend to make sure that anything passed in $configuration is an actual array.

Also changed all array definitions to short syntaxes.

olli’s picture

Closed #1947852: Database cache backend garbage collection does not work with permanent entries as a duplicate.

  1. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -150,6 +160,10 @@ protected function prepareItem($cache, $allow_invalid) {
       public function set($cid, $data, $expire = Cache::PERMANENT, array $tags = array()) {
    +    // Check if maximum expiration settings for cache bin is defined
    +    if (isset($this->configuration['max_expire']) && ($expire == Cache::PERMANENT || $expire > $this->configuration['max_expire'])) {
    +      $expire = REQUEST_TIME + $this->configuration['max_expire'];
    +    }
    

    $expire is a timestamp and $configuration['max_expire'] is a duration, so I think we need to add REQUEST_TIME, i.e. compare against REQUEST_TIME + $this->configuration['max_expire']? (See the sketch after this list.)

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackendFactory.php
    @@ -36,6 +44,7 @@ class DatabaseBackendFactory implements CacheFactoryInterface {
    +    $this->configuration = Settings::get('cache_database', []);
    
    @@ -48,7 +57,8 @@ function __construct(Connection $connection, CacheTagsChecksumInterface $checksu
    +    $configuration['max_expire'] = isset($this->configuration[$bin]) ? (int) $this->configuration[$bin] : null;
    +    return new DatabaseBackend($this->connection, $this->checksumProvider, $bin, $configuration);
    

    So I would add $settings['cache_database']['default'] = 86400; to settings.php? How about something like $settings['cache_database']['default']['max_expire'] = ...?

  3. +++ b/core/modules/system/src/Tests/Cache/DatabaseBackendUnitTest.php
    @@ -53,4 +53,23 @@ public function testSetGet() {
    +    $backend->set('test_cache1', 'foo');
    ...
    +    $backend->set('test_cache1', 'foo');
    ...
    +    $backend->set('test_cache1', 'foo');
    

    One of these could use a $expire param greater than the 'max_expire' setting.
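
For clarity, the corrected check from point 1 would look roughly like the following (essentially what lands in #31):

// max_expire is a duration, so the upper bound on the stored expiry
// timestamp is REQUEST_TIME + max_expire.
if (isset($this->configuration['max_expire']) && ($expire == Cache::PERMANENT || $expire > REQUEST_TIME + $this->configuration['max_expire'])) {
  $expire = REQUEST_TIME + $this->configuration['max_expire'];
}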

#24:

cache_render grew to 1.3GB within days

Should we add a default?

borisson_’s picture

FileSize
1016 bytes
6.33 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,858 pass(es). View

#27

  1. I think the tests prove that the code works, so I don't think this is needed.
  2. @Denchev had already added this, but it looks like I had trouble applying his patch to settings.php locally. I added that code back to settings.php.
    No interdiff, sadly.
  3. Added a new test that does exactly that.
  4. See point 2: a default is added.
Denchev’s picture

Assigned: Unassigned » Denchev
Denchev’s picture

Status: Needs review » Needs work

@olli
1. You are right.
The tests have some flaws; I will update them and look at your other suggestions.

Denchev’s picture

Status: Needs work » Needs review
FileSize
5.61 KB
6.13 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,931 pass(es). View

1) Fixed set method.
2) Default value and bin now in settings and factory class.
3) Fixed tests.

Berdir’s picture

Status: Needs review » Needs work
  1. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -150,6 +160,10 @@ protected function prepareItem($cache, $allow_invalid) {
    +    // Check if maximum expiration settings for cache bin is defined
    +    if (isset($this->configuration['max_expire']) && ($expire == Cache::PERMANENT || $expire > $this->configuration['max_expire'] + REQUEST_TIME)) {
    +      $expire = $this->configuration['max_expire'] + REQUEST_TIME;
    +    }
    

    Coding style: comments need to end with a ".". Could also be a bit clearer:

    // If set, enforce an upper limit for the expiration to allow for garbage collection.

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackendFactory.php
    @@ -36,6 +44,9 @@ class DatabaseBackendFactory implements CacheFactoryInterface {
    +    // Set cache_database settings from settings.php and if none is set use
    +    // maximum expire for render bin for one week.
    +    $this->configuration = Settings::get('cache_database', ['render' => ['max_expire' => 86400 * 7]]);
    

    Suggestion:

    // Load settings; default to a max expiration of one week for the render cache bin.

  3. +++ b/core/modules/system/src/Tests/Cache/DatabaseBackendUnitTest.php
    @@ -53,4 +53,23 @@ public function testSetGet() {
    +    $configuration = ['max_expire' => 2800];
    +    $backend = new DatabaseBackend($this->container->get('database'), $this->container->get('cache_tags.invalidator.checksum'), 'render', $configuration);
    +    $backend->set('test_cache3', 'foo', REQUEST_TIME + 2801);
    +    $cached = $backend->get('test_cache3');
    +    $this->assertIdentical($cached->expire, (string) (REQUEST_TIME + 2800), 'Maximum cache expire time is not exceeded.');
    

    $configuration and the backend are always the same now, so you can just re-use them?

    Also, add another test for setMultiple(). That's a completely separate implementation right now that needs the same check.

  4. +++ b/sites/default/default.settings.php
    @@ -691,3 +691,11 @@
    + * Use this setting to set the maximum cache expiration time for a specific
    + * cache bin in the database.
    

    I would write this to allow for additional settings later. Make it a list (* - max_expire: ...) and also explain a bit what this is for: some cache bins can get very big, and persistent cache entries are never removed from them; setting this allows them to be garbage collected after the configured time, with the downside that they will have to be rebuilt if requested again. (One possible shape for this block follows below.)
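
One possible shape for that documentation block (wording assumed; compare the version that ends up in the later patches):

/**
 * Database cache backend: per-bin configuration.
 *
 * Available settings per bin:
 * - max_expire: The maximum lifetime, in seconds, of an individual cache
 *   item in this bin. Some cache bins can get very big and persistent
 *   entries are never removed from them; setting this allows such items
 *   to be garbage collected after the configured time, at the cost of
 *   rebuilding them if they are requested again.
 */
# $settings['cache_database']['render']['max_expire'] = 86400 * 7;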

hussainweb’s picture

Status: Needs work » Needs review
FileSize
5.55 KB
7.18 KB
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,933 pass(es), 2 fail(s), and 0 exception(s). View

I have made the changes specified in #31 and fixed some more nitpicks I found. I added a test as pointed out in point 3, but I am not sure it works the same way as the earlier test. From what I read of the code, I expect this to fail.

Status: Needs review » Needs work

The last submitted patch, 33: 2526150-33.patch, failed testing.

Berdir’s picture

Yes, I expected it to fail; now we need to fix the test failure :)

hussainweb’s picture

Status: Needs work » Needs review
FileSize
1.2 KB
8.14 KB
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,926 pass(es), 1 fail(s), and 0 exception(s). View

Attempting a fix. General observation: the code and tests assume that the max_expire setting is either defined in settings.php or overridden by the factory. What if both fail to set it? E.g., if settings.php just defines some other (future) element in the 'render' array (but not max_expire), then the default won't be applied, which means max_expire won't be set at all. In that case there would be no upper limit. Is this what is intended?

hussainweb’s picture

FileSize
1.61 KB
9.24 KB
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,938 pass(es), 2 fail(s), and 0 exception(s). View

Adding another test to see the effect for the scenario I described in #36.

The last submitted patch, 36: 2526150-36.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 37: 2526150-37.patch, failed testing.

hussainweb’s picture

Status: Needs work » Needs review
FileSize
1.23 KB
9.29 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 100,937 pass(es). View

Fixing failures.

olli’s picture

  1. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -150,6 +160,12 @@ protected function prepareItem($cache, $allow_invalid) {
    +      $expire = $this->configuration['max_expire'] + REQUEST_TIME;
    

    What do you think about renaming 'max_expire' to 'max_lifetime'?

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -205,11 +221,21 @@ public function setMultiple(array $items) {
         foreach ($items as $cid => $item) {
    +      // Check if maximum expiration settings for cache bin is defined. If set,
    +      // enforce an upper limit for the expiration to allow for garbage
    +      // collection.
    +      $max_expire = isset($this->configuration['max_expire']) ? $this->configuration['max_expire'] + REQUEST_TIME : Cache::PERMANENT;
    

    $max_expire assignment can be moved outside the loop.

  3. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackendFactory.php
    @@ -36,6 +44,9 @@ class DatabaseBackendFactory implements CacheFactoryInterface {
    +    $this->configuration = Settings::get('cache_database', ['render' => ['max_expire' => 86400 * 7]]);
    
    +++ b/sites/default/default.settings.php
    @@ -691,3 +691,17 @@
    +$settings['cache_database']['render'] = [
    +  'max_expire' => 86400 * 7,
    +];
    

    Why do we need both? Isn't it enough to set it in default.settings.php?

Berdir’s picture

3. Good question, I'm not sure. Agreed that having both is a bit confusing. The common pattern is to have the example in default.settings.php commented out; we could do that?

Wim Leers’s picture

#41.1 makes a lot of sense to me.

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -150,6 +160,12 @@ protected function prepareItem($cache, $allow_invalid) {
+    if (isset($this->configuration['max_expire']) && ($expire == Cache::PERMANENT || $expire > $this->configuration['max_expire'] + REQUEST_TIME)) {

@@ -205,11 +221,21 @@ public function setMultiple(array $items) {
+      if ($max_expire != Cache::PERMANENT && ($item['expire'] == Cache::PERMANENT || $item['expire'] > $max_expire)) {

Let's use strict equality (===).

hussainweb’s picture

FileSize
8.55 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 101,560 pass(es). View
hussainweb’s picture

FileSize
2.11 KB
8.69 KB
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 101,561 pass(es), 4 fail(s), and 0 exception(s). View

Fixing for comments in #41.

For point 3, I think we have the default in the constructor to make sure the value is there in case it is removed from settings.php. As @Berdir pointed out, we could comment it out.

For point 1, I am not sure max_lifetime makes more sense than max_expire, but I have renamed it.

hussainweb’s picture

FileSize
704 bytes
8.72 KB
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 101,552 pass(es), 4 fail(s), and 0 exception(s). View

The patch in #45 wasn't complete (it missed default.settings.php), but the interdiff was correct. I had also missed the comment in #43 and am fixing that now.

olli’s picture

FileSize
8.71 KB
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 101,574 pass(es), 4 fail(s), and 0 exception(s). View
1.36 KB

Sorry, I meant just renaming the setting 'max_expire'; the variable $max_expire is fine.

Wim Leers’s picture

Title: Render cache (or database backend per-bin?) needs support an upper limit for max-age » Database cache backend needs an optional max lifetime
olli’s picture

#42 So change the line in default.settings.php to

# $settings['cache_database']['render']['max_lifetime'] = 86400 * 7;

?

I think that setting (commented or not) should be before

# if (file_exists(__DIR__ . '/settings.local.php')) {
#   include __DIR__ . '/settings.local.php';
# }

, right?

The last submitted patch, 45: 2526150-45.patch, failed testing.

Wim Leers’s picture

  1. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -42,6 +42,13 @@ class DatabaseBackend implements CacheBackendInterface {
       /**
    +   * Per-bin database backend configuration.
    +   *
    +   * @var array
    +   */
    +  protected $configuration;
    
    +++ b/sites/default/default.settings.php
    @@ -691,3 +691,17 @@
    +/**
    + * Per-bin database backend configuration.
    + *
    + * Configure settings for database backend. Available settings are:
    + * - max_expire: Specify the maximum expiry for an individual cache item in the
    + *   selected bin. Some cache bins can get very big and persistent cache entries
    + *   are never removed. Setting this value will allow them to be garbage
    + *   collected after the configured time, and will need to be rebuilt if the
    + *   item is requested again.
    + */
    +$settings['cache_database']['render'] = [
    +  'max_lifetime' => 86400 * 7,
    +];
    

    So this is an array, just in case additional settings are added in the future, right?

    Are we sure that's a good idea?

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -191,14 +201,24 @@ public function setMultiple(array $items) {
    +    // Check if maximum expiration settings for cache bin is defined. If set,
    +    // enforce an upper limit for the expiration to allow for garbage
    +    // collection.
    ...
    +      // If maximum expiration is set, enforce an upper limit on the expiry.
    

    These comments are effectively saying the same thing.

    I think we can keep just the first comment.

    And I think this would be clearer:
    Check if a 'max_lifetime' setting is set for the current cache bin. If set, the lifetime of a cache item is limited to the specified 'max_lifetime', thus preventing unused cache items from lingering forever.

  3. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -191,14 +201,24 @@ public function setMultiple(array $items) {
    +    $max_expire = isset($this->configuration['max_lifetime']) ? $this->configuration['max_lifetime'] + REQUEST_TIME : Cache::PERMANENT;
    

    s/$max_expire/$expire/

  4. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackendFactory.php
    @@ -36,6 +44,9 @@ class DatabaseBackendFactory implements CacheFactoryInterface {
    +    // Load settings and if not set, default to maximum expiration of the render
    

    s/expiration/lifetime/

  5. +++ b/core/modules/system/src/Tests/Cache/DatabaseBackendUnitTest.php
    @@ -53,4 +53,78 @@ public function testSetGet() {
    +    // Empty configuration would force DatabaseBackend to select permanent
    +    // lifetime for cache objects.
    

    No configuration (the default) allows cache items to live forever.

  6. +++ b/core/modules/system/src/Tests/Cache/DatabaseBackendUnitTest.php
    @@ -53,4 +53,78 @@ public function testSetGet() {
    +    // Test set as well.
    

    s/set/set()/

The last submitted patch, 46: 2526150-46.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 47: 2526150-47.patch, failed testing.

olli’s picture

Related to #51.1, there is an issue that would add a similar per-bin setting: #1944740: Do not garbage collect as soon as items expire.

Wim Leers’s picture

#54: that's a good argument! :) However, that issue is going with a very different approach to creating a per-bin setting. Which one is preferable?

Fabianx’s picture

#54 #55:

I do think those two issues are actually orthogonal:

- This issue ensures that CACHE_PERMANENT (or a larger-than-allowed expire) is mapped to a max-expire value
- That issue ensures that expired items are not removed immediately

While the other issue would potentially prolong how long the items from this issue are kept in the cache, I do not think it would help with the original problem.

Also, staleness is currently not yet (widely?) used in core.

Train of unrelated thoughts:

It would be very useful to return an old value and then set up a queue or after-request item to re-populate the data, but that's not done currently.

We would also likely want to use max-age=0 (or -2) then anyway, to avoid having an expired item be stored inside another cache item.

Anyway.

TL;DR: This issue should continue as is.

Berdir’s picture

I think we had ideas for about 4 more possible settings in this issue alone - see the jitter/refresh in #3, for example, which would make sure that not all cache items expire at the same time and that we keep caches that are actively used. I think it's good to just focus on this single feature in this issue; contrib can add all kinds of additional features in alternative implementations, and I'm not sure the others are core-worthy.

@Fabianx: The rebuild-in-a-queue behavior would more likely be implemented using $allow_invalid = TRUE and not in the backend, since you actually need to know what you have to rebuild and the backend doesn't.

Fabianx’s picture

#57: I agree with #57.

I also agree on the queue point - that was my idea. I was just saying that adding even more support for $allow_invalid = TRUE is likely not that helpful for core itself, as there is no(t much) usage of that feature in core right now.

Wim Leers’s picture

Oh, yes, I was not asking to bring more settings in! I was only making sure we made the right design choice for this setting: A) will we likely add another setting? (yes) B) the other issue uses a different setting structure; which one do we want? (this one)

So I think that effectively just leaves #51 to be addressed?

borisson_’s picture

Status: Needs work » Needs review
FileSize
6.19 KB
8.66 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 101,576 pass(es). View

Fixed remarks from #51 and the test failures introduced in #45.

Wim Leers’s picture

Looks great!

I'd like to see one more person's thoughts on max_lifetime vs. max_expire. It was olli's idea, I +1'd it, but I wonder if some people are not in favor.

Berdir’s picture

  1. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -191,14 +201,23 @@ public function setMultiple(array $items) {
         foreach ($items as $cid => $item) {
           $item += array(
    -        'expire' => CacheBackendInterface::CACHE_PERMANENT,
    +        'expire' => $expire,
             'tags' => array(),
           );
     
    +      if ($expire !== Cache::PERMANENT && ($item['expire'] === Cache::PERMANENT || $item['expire'] > $expire)) {
    +        $item['expire'] = $expire;
    +      }
    

    That double check (in case no expire was set) is a bit confusing, we set the default and then set the value again.

    The thing is that in that loop, we actually build two arrays. So we just put expire in there to get it out again. I tried for quite some time to come up with a better check here but wasn't successful.

    What we should do anyway is move at least part of the comment down to the main logic and explain what it is doing.

  2. +++ b/sites/default/default.settings.php
    @@ -678,6 +678,21 @@
    + $settings['cache_database']['render'] = [
    +   'max_lifetime' => 86400 * 7,
    + ];
    

    based on above, I think we agreed to comment this out by default?

Wim Leers’s picture

#62.1: the first comment currently covers both. But let's have the first one say "set the correct default value: the maximum lifetime", and let the second give the details of the overriding that are currently in the first comment.

Denchev’s picture

FileSize
8.77 KB
PASSED: [[SimpleTest]]: [PHP 5.5 MySQL] 102,457 pass(es). View
1.79 KB

Removed debug code.
#62.2: commented out.
And added some comments for #62.1.

Berdir’s picture

Looks good to me, +1 to max_lifetime.

Keeping the first comment makes sense to me, that explains the overall concept, the new one just those specific lines.

Wim Leers’s picture

Status: Needs review » Reviewed & tested by the community

Alright, thanks for the many rerolls! :)

catch’s picture

Status: Reviewed & tested by the community » Needs work
  1. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -191,14 +201,25 @@ public function setMultiple(array $items) {
    +    $expire = isset($this->configuration['max_lifetime']) ? $this->configuration['max_lifetime'] + REQUEST_TIME : Cache::PERMANENT;
    

    When reading this it looks like it will be CACHE_PERMANENT unless configured.

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackendFactory.php
    @@ -36,6 +44,9 @@ class DatabaseBackendFactory implements CacheFactoryInterface {
    +    $this->configuration = Settings::get('cache_database', ['render' => ['max_lifetime' => 86400 * 7]]);
    

    But for cache.render we're always setting it - should we reference that in the comment above?

    If I was seeing the 7-day expiry, I'd probably check the backend first, then settings.php, then only after that find this in the factory if I was lucky.

    Also reading this, I wonder whether we couldn't put the default configuration into the container definition for cache.render, then it'd be kept closer to where the bin is defined - the factory doesn't know anything about specific bins in HEAD.

hussainweb’s picture

@catch: There is a small catch in point 2. It looks like we are always setting 'max_lifetime' even if it's not set in settings.php, but that's not true: if settings.php sets some other keys in that array but not max_lifetime, that value will not be set. That is where the CACHE_PERMANENT fallback you pointed out in point 1 comes in.

catch’s picture

@hussainweb - yes it's possible to not set it, I just think the factory is a strange place to put the default setting that most sites will end up using.

Berdir’s picture

@catch: Putting it in the definition (which is something we already do for the backend configuration) wouldn't really make sense for configuration that's specific to one backend, IMHO. Or maybe it would if we named it database_configuration or something like that?

Wim Leers’s picture

I'd very much prefer this to be in the container definition for a specific cache bin as well. But if I remember correctly from an IRC conversation, that was considered impossible. I wish I had written down on this issue why.

Berdir’s picture

  cache.bootstrap:
    class: Drupal\Core\Cache\CacheBackendInterface
    tags:
      - { name: cache.bin, default_backend: cache.backend.chainedfast }
    factory: cache_factory:get
    arguments: [bootstrap]

I think we could extend it to also support other settings in that tag; I'm not sure how complex that would get.

But as I said, that would make it a generic cache concept and not something that's specific to the database backend.

The last submitted patch, 64: database_cache_backend-2526150-63.patch, failed testing.

catch’s picture

Coming back to this:

- I really think it should be per-bin. For example, we'd want cache_bootstrap to stay at CACHE_PERMANENT, but maybe bring cache_render down to 12 hours.

- It's not really database-specific; it's valuable for any cache backend that doesn't support LRU.

This means we need a term for non-LRU cache backends; I don't have one though.

ndobromirov’s picture

There you have it: NonLruCacheInterface :).

catch’s picture

Well there's that, but LRU is normally used as shorthand for 'storage size capped'.

APCu has a capped storage size, but that doesn't mean you can just keep throwing items at it. Once it's full, the behaviour depends on the TTL, but the default is just to empty itself out entirely.

The way we deal with that is to not recommend APCu for any unbounded cache bins - instead it's restricted to bins where we expect the total size of cache items to stay within a reasonable limit, so that it never fills up at all.

catch’s picture

Version: 8.0.x-dev » 8.1.x-dev
Priority: Major » Critical
Status: Needs work » Needs review
FileSize
1.5 KB

OK, here's a suggestion that makes fewer assumptions about storage: a 'configurable permanent TTL'.

If it's set, then cache storage can set 'permanent' cache entries to that length or higher, up to permanent again.

Here's an untested draft with the interface and the DatabaseBackend implementing it.

If the concept seems fine, we still need to sort out the configuration key for the bins, what it gets set to, and add tests.

I'm also going to bump this to critical, since the meta was triaged as critical and I don't think that meta gets fixed without this in some form.
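
For reference, the draft interface sketched in this patch looks roughly like this; this is a reconstruction from the method name that shows up in the later reviews (e.g. #104), not the patch itself:

interface ConfigurablePermanentTtlInterface {

  /**
   * Sets the minimum TTL to use in place of CACHE_PERMANENT.
   *
   * Backends may store 'permanent' items with this TTL or any higher value,
   * up to and including permanent, e.g. to add jitter.
   */
  public function setMinimumPermanentTtl($ttl);

}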

Status: Needs review » Needs work

The last submitted patch, 79: 2526150.patch, failed testing.

Fabianx’s picture

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -194,8 +199,10 @@ protected function doSetMultiple(array $items) {
+        $item['expire'] = $this->maximumPermanentTtl;

minimum vs. maximum mixup ...

----

I re-read my comments again:

1. I think this is a good case for a decorator class, e.g. BoundedDatabaseBackend, plus a trait. Then every cache backend can implement this by implementing the interface and using the trait.

2. I still think that:

- a) Random Jitter is a good idea (it is horrible if all items expire at the same time)

- b) A mechanism for refreshing cache items that are accessed before the TTL expires is added (pseudo-LRU)

As the setting is per bin, a site owner can e.g. choose to use cache_page with expire + refresh to ensure larger items that are frequently accessed stay in the cache, while smaller items that are not accessed at all expire (as Wim Leers wanted).

On the other hand, someone who wants to cache smaller items could just use a larger TTL for cache_render and a smaller one for cache_page.

To put some code behind this:

Jitter is as simple as:

    foreach ($items as $cid => $item) {
+      if (!isset($item['expire']) || $item['expire'] === CacheBackendInterface::CACHE_PERMANENT) {
+        $item['expire'] = $this->addTtlJitter($this->minimumPermanentTtl);
+      }
       $item += array(
-        'expire' => CacheBackendInterface::CACHE_PERMANENT,
         'tags' => array(),
       );
// ...

protected function addTtlJitter($ttl) {
  // Vary the TTL by +/- ttlJitter (e.g. 0.10 for 10%) so that not all
  // items expire at the same time.
  $diff = $ttl * $this->ttlJitter;
  return rand($ttl - $diff, $ttl + $diff);
}

While refreshTtl would work like:

protected function prepareItem($cache, $allow_invalid) {
  // ...

+  // Re-cache an originally-permanent item when it is close to expiring.
+  if ($this instanceof TtlAwareCacheBackendInterface && $cache->ttl === CacheBackendInterface::CACHE_PERMANENT && $cache->expire - $this->refreshTtl <= REQUEST_TIME) {
+    // Refresh the item in the cache; could also just update $expire.
+    $this->setMultiple([$cache]);
+  }

  return $cache;
}

I see the problem though that we have nowhere to save that the item should be cached permanently instead of an item that truly is time dependent - however I think we should save the ttl in any case in addition to the expire timestamp. That would solve that for all cache backends.

e.g. practical example: homepage should never be expired, is accessed several times per hour, permanent items are cached for a week, e.g.:

  • Set minimum expire TTL = 1 week
  • Set jitter = 10% = 0.10 (so that not all items expire at the same time)
  • Set refreshTime = 1 week - 1 day = 6 days (working set: 1 day)

So every item that was originally cached CACHE_PERMANENT and is accessed within the delta of minimumExpireTtl - refreshTime before its expiry will be refreshed.

But it also happens only when the item is close to be expired.

And I do think this is superior to just expiring all content 7 days after the last cache clear, which then acts like a cache clear all (for cache_render, for example).

And we all know that cache_clear_all can be problematic for even moderately frequently visited sites.

catch’s picture

Status: Needs work » Needs review
FileSize
8.26 KB

I think this is a good case for a decorator class, e.g. BoundedDatabaseBackend, plus a trait. Then every cache backend can implement this by implementing the interface and using the trait.

I struggle to see many people using this except with the database backend, so I'm not sure it's useful in practice. If the idea is to keep the database backend as a reference implementation, then it seems good for that.

Random Jitter is a good idea (it is horrible if all items expire at the same time)

I also think that's a good idea, but it's 100% an implementation detail - except for the new methods mentioning a 'minimum' permanent TTL so it can vary upwards. I think we should add that before RTBC, though, yes.

The refresh idea is a good one, and I've done similar last-minute-update logic in custom code before, but it's going to require an additional column in the database schema and an update. Also, we'd have to do that in a way that doesn't blow up before the update runs, which is not going to be straightforward - the column will be missing on any ->get() call.

Patch update that fixes the minor issues above, and doesn't explode doing drush cr.

If this passes, still needs tests, still needs jitter, and then we need to figure out whether to try the refresh or not here or move that to a follow-up.

Also not sure what to do about the change to CacheBackendFactoryInterface. We can't change the actual interface - as you can see from the patch I had to update the various factories including ones that don't need this.

We could allow the individual factories to add the optional parameter to the method, but then there's no interface to communicate what the param means.

Or possibly a new interface and something like addMethodCall..

Status: Needs review » Needs work

The last submitted patch, 82: 2526150-81.patch, failed testing.

Fabianx’s picture

I think addMethodCall is the way to go via the 'calls' property.

I see that it would be nicer to add as a parameter, but changing the interface is a no-go obviously.

And adding getWithTtl() feels just wrong.

Maybe we could use a 'tag', which is like a named parameter and a compiler pass to add the tag information as a method call?

--

On the other hand could we not introduce a new FactoryInterface that all core factories that want to support this would implement?

catch’s picture

I think an entirely new factory interface might be the way to go. End of the day here but will have a look tomorrow.

catch’s picture

Status: Needs work » Needs review

Small update:

- added the new factory interface
- switched to an $options array for the service argument and new factory interface. That would potentially allow us to make things like the refresh and jitter configurable if we wanted to.

If that looks OK, will start looking at some of the rest from #81.

catch’s picture

alexpott’s picture

Status: Needs review » Needs work
+++ b/core/core.services.yml
@@ -207,7 +207,7 @@ services:
+    arguments: [default, ['maximum_permanent_ttl', 86400]]

Shouldn't this be minimum_permanent_ttl?

catch’s picture

Status: Needs work » Needs review
FileSize
6.54 KB

Yes it should.

alexpott’s picture

Issue tags: +Needs tests

The setting of the expire time to the minimum ttl looks like it can be tested.

Fabianx’s picture

#88, #89: But is it not in fact a maximum?

e.g. PERMANENT == unlimited, while this is the maximum ttl time that is allowed to be set.

Though that is wrong, too. As you could set a higher value ...

I am still confused by the wording.

Agree with the tests.

Wim Leers’s picture

I'm also confused. I'd also expect this to be called "maximum". But, really, it's not even maximum or minimum, it's just mapping PERMANENT to some configurable number. Perhaps a clearer name would therefore be overridePermanentMaxAge.

I'd also go with max-age rather than TTL because Cache-Control: max-age and because https://www.drupal.org/developing/api/8/cache/max-age.

catch’s picture

Definitely needs tests - should be easy to add to the database backend tests.

On $minimum_permanent_ttl - I don't like the name either, clearly it's confusing even me.

In practice:

- it should be configurable via the container, as in the patch.
- backends should set the ttl no lower than the configured ttl, but for example if we use jitter, then they're going to set it higher than that (so it's not a maximum)
- backends could do other things like whitelist particular cids as permanent
- they could also ignore it and use permanent.

What about $overridePermanentTtl?

I'm not at all sure about max-age - we already use expire in the database backend, which is confusing enough, and this isn't going to be used for the Cache-Control header anywhere. When it's applied to the cache_render bin, that may not be obvious at all.

Wim Leers’s picture

TTL === max-age, no?

catch’s picture

Yes, it's the same thing; I just associate it with the Cache-Control header more, whereas TTL makes me think more of APCu or similar. I guess it's a question of whether that association with the header is a good thing or not.

catch’s picture

Or $override_permanent ;)

Wim Leers’s picture

   * @param int $expire
   *   One of the following values:
   *   - CacheBackendInterface::CACHE_PERMANENT: Indicates that the item should
   *     not be removed unless it is deleted explicitly.
   *   - A Unix timestamp: Indicates that the item will be considered invalid
   *     after this time, i.e. it will not be returned by get() unless
   *     $allow_invalid has been set to TRUE. When the item has expired, it may
   *     be permanently deleted by the garbage collector at any time.

Unfortunately the cache back-end interface is designed for absolute timestamps, so it's neither TTL nor max-age. Otherwise just following its existing terminology would've been perfect…

catch’s picture

Well except the argument is a ttl - we convert it to an expiry in the backend. Similarly CACHE_PERMANENT is treated as a ttl - it really means never as opposed to just being a far future expiry.

Wim Leers’s picture

Similarly CACHE_PERMANENT is treated as a ttl

It's not, it's an absolute time: "infinity". But PHP doesn't support infinity, so we map it to -1.

Well except the argument is a ttl - we convert it to an expiry in the backend.

True. But that's my point: we wouldn't need to discuss "TTL" vs "max-age" if we could just have it be the same kind of thing as what we have, which is absolute numbers: timestamps.

So, we simply can't be consistent with what we have. If you then prefer TTL, works for me.

catch’s picture

It's not, it's an absolute time: "infinity". But PHP doesn't support infinity, so we map it to -1.

Is now + infinity different from just infinity?

Added tests. Also added some basic jitter to the database backend.

Not doing anything about the wording yet - I'm not tied to TTL at all, but I'm not sure we have a better option yet.

No interdiff - I will start posting them from this point on though, since this is more or less minimally viable now and the patch is getting bigger.

Status: Needs review » Needs work

The last submitted patch, 100: 2526150.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
8.28 KB

Arggh missing new files.

Wim Leers’s picture

Arggh missing new files.

Newb! :P

  1. +++ b/core/core.services.yml
    @@ -207,7 +207,7 @@ services:
    +    arguments: [default, ['maximum_permanent_ttl', 86400]]
    
    +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -42,6 +42,11 @@ class DatabaseBackend implements CacheBackendInterface {
    +  protected $minimumPermanentTtl = CacheBackendInterface::CACHE_PERMANENT;
    

    Still inconsistent.

  2. +++ b/core/lib/Drupal/Core/Cache/ChainedFastBackendFactory.php
    @@ -68,21 +68,16 @@ public function __construct(Settings $settings = NULL, $consistent_service_name
    +        // @todo: check interface.
    

    Do you mean an assert($service instanceof SomeInterface)?

  3. +++ b/core/lib/Drupal/Core/Cache/ConfigurablePermanentTtlInterface.php
    @@ -0,0 +1,21 @@
    + * collection. The actual TTL used may be the same or higher than configured.
    

    This last sentence is meant to communicate that there may be jitter? Not very clear to me.

    Also, why >=, and not <=? Documenting the rationale for that would be good I think?

  4. +++ b/core/lib/Drupal/Core/Cache/ConfigurablePermanentTtlInterface.php
    @@ -0,0 +1,21 @@
    +  function setMinimumPermanentTtl($ttl);
    +}
    

    Missing newline between these two.

  5. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -240,6 +252,14 @@ protected function doSetMultiple(array $items) {
    +   * Calculate permanent expiry.
    

    Missing @return.

  6. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -240,6 +252,14 @@ protected function doSetMultiple(array $items) {
    +    return rand($minimum, $minimum * 1.5) + REQUEST_TIME;
    

    Shouldn't this 1.5 be a constant on the class as well, to make it clear what it's for?

  7. +++ b/core/modules/system/src/Tests/Cache/DatabaseBackendUnitTest.php
    @@ -51,6 +52,21 @@ public function testSetGet() {
         $this->assertIdentical($cached_value_short, $backend->get($cid_short)->data, "Backend contains the correct value for short, non-ASCII cache id.");
    +
    +  }
    

    Extraneous \n.

catch’s picture

Should cover everything in #104.

Status: Needs review » Needs work

The last submitted patch, 104: 2526150.patch, failed testing.

Fabianx’s picture

I think my vote is for override_permanent_maxage or override_permanent_ttl.

That says much more clearly that this is an override for permanent cache items.

catch’s picture

Status: Needs work » Needs review
FileSize
10.14 KB
6.36 KB
-    $consistent = $this->container->get($this-consistentServiceName);

+    $consistent = $this->container->get($this->consistentServiceName);

That was the test failure.

Switched to $override_permanent_ttl for now.

catch’s picture

Need to use the constant in the test also.

Opened a follow-up to discuss the refresh - that's not necessary to fix the critical for me, and having to do a schema update to cache tables feels very 8.2.x-only. #2705993: Add auto-refresh/quasi-LRU to database backend

catch’s picture

Adds two new interfaces and updates the container, so tagging for rc target triage.

catch’s picture

Title: Database cache backend needs an optional max lifetime » Make CACHE_PERMANENT configurable in the database cache
Fabianx’s picture

+++ b/core/lib/Drupal/Core/Cache/ChainedFastBackendFactory.php
@@ -68,26 +68,34 @@ public function __construct(Settings $settings = NULL, $consistent_service_name
+    $consistent = $this->container->get($this->consistentServiceName);
+    if ($consistent instanceof ConfigurableCacheBackendFactoryInterface) {
+      $consistent_backend = $consistent->get($bin, $options);
+    }
+    else {
+      $consistent_backend = $consistent->get($bin);
+    }

I think this would be much easier to read with a helper function.


$backend = $this->getCacheBackendWithOptions($name, $options);

The rest looks great to me.

catch’s picture

Yes that's better.

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

RTBC - Looks great to me - except perhaps test coverage might need to be extended a little?

Wim Leers’s picture

Status: Reviewed & tested by the community » Needs work
  1. +++ b/core/core.services.yml
    @@ -207,31 +207,31 @@ services:
    +    arguments: [default, ['override_permanent_ttl', 86400]]
    ...
    +    arguments: [entity, ['override_permanent_ttl', 86400]]
    ...
    +    arguments: [menu, ['override_permanent_ttl', 86400]]
    ...
    +    arguments: [render, ['override_permanent_ttl', 86400]]
    ...
    +    arguments: [data, ['override_permanent_ttl', 86400]]
    

    Why not make this 86400 a container parameter itself, so you only have to override that one parameter on your sites if you need a different TTL?

  2. +++ b/core/lib/Drupal/Core/Cache/ChainedFastBackendFactory.php
    @@ -68,26 +68,46 @@ public function __construct(Settings $settings = NULL, $consistent_service_name
    +   * Get a cache backend with options if it supports it.
    

    s/Get/Gets/

  3. +++ b/core/lib/Drupal/Core/Cache/ChainedFastBackendFactory.php
    @@ -68,26 +68,46 @@ public function __construct(Settings $settings = NULL, $consistent_service_name
    +   * @return
    

    Missing type.

  4. +++ b/core/lib/Drupal/Core/Cache/ConfigurableCacheFactoryInterface.php
    @@ -0,0 +1,23 @@
    +interface ConfigurableCacheFactoryInterface {
    

    Why not let this extend \Drupal\Core\Cache\CacheFactoryInterface?

  5. +++ b/core/lib/Drupal/Core/Cache/ConfigurablePermanentTtlInterface.php
    @@ -0,0 +1,26 @@
    + * configured ttl could potentially be set very low for some use cases.
    

    s/ttl/TTL/

  6. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -42,6 +49,11 @@ class DatabaseBackend implements CacheBackendInterface {
    +   * The overridden permanent TTL.
    

    "override" in variable name vs "overridden" in documentation.

    Note this is not even the overridden value, this is literally the value to use. It just might be overridden.

    So, I think $permanentTtl would be a better name.

  7. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -240,6 +259,16 @@ protected function doSetMultiple(array $items) {
    +   * Calculate permanent expiry.
    

    s/Calculate/Calculates/

  8. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -240,6 +259,16 @@ protected function doSetMultiple(array $items) {
    +   * @return an expires timestamp based on the override TTL with added 'jitter'.
    

    Wrong doxygen.

catch’s picture

Status: Needs work » Needs review
FileSize
8.26 KB
10.89 KB

#1 done.

Minor points also done.

#6 done, and I wish we'd thought of that about 20 comments ago; much better.

#4: not sure it's clearer, since they both have a single method with the same name but different arguments. But PHP allows it, so why not; I don't think it's less clear either, just fairly neutral on it.

@Fabianx on tests: I thought about testing that an expired entry doesn't return a valid cache item, but we already have tests for that, and it's tricky to test with REQUEST_TIME without manipulating the value, which would then invalidate the test here. Could potentially test ChainedFast explicitly though.

catch’s picture

Added explicit testing of the ChainedFastBackend when both the fast and consistent backends have permanent overridden.

catch’s picture

That test passed but didn't test anything very interesting; here's a better one.

Fabianx’s picture

Assigned: Denchev » catch
Status: Needs review » Reviewed & tested by the community

That is enough on tests for me and all feedback has been addressed.

I think this should be ready now for core committer review.

alexpott’s picture

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -240,6 +259,17 @@ protected function doSetMultiple(array $items) {
+    return rand($minimum, $minimum * self::JITTER_MULTIPLIER) + REQUEST_TIME;

From https://secure.php.net/manual/en/function.rand.php

Note: On some platforms (such as Windows), getrandmax() is only 32767. If you require a range larger than 32767, specifying min and max will allow you to create a range larger than this, or consider using mt_rand() instead.

I think we need to use mt_rand() here. The attached patch does this.

Patch includes a few coding standards fixes too.

Fabianx’s picture

RTBC on the interdiff

jibran’s picture

Seems ready to me as well but I think we should improve some documentation.

  1. +++ b/core/lib/Drupal/Core/Cache/ConfigurablePermanentTtlInterface.php
    @@ -0,0 +1,27 @@
    + * This allows cache backends with restrictions on storage space to set a
    + * shorter TTL for 'permanent' cache items, so that they get expired by garbage
    + * collection. The actual TTL used may be the same or higher than configured,
    + * to allow 'jitter' to be applied so that items set immediately after a cache
    + * clear do not all expire at the same time. Cache backends should assign jitter
    + * higher rather than lower to avoid the risk of cache stampedes, since the
    + * configured TTL could potentially be set very low for some use cases.
    

I must say, some awesome documentation. Can we add some of it above JITTER_MULTIPLIER, or at least add a @see?

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -235,6 +254,17 @@ protected function doSetMultiple(array $items) {
    +    return mt_rand($minimum, $minimum * self::JITTER_MULTIPLIER) + REQUEST_TIME;
    

We could also explain this line, as per #119.

olli’s picture

Status: Reviewed & tested by the community » Needs work
  1. +++ b/core/core.services.yml
    @@ -207,31 +208,31 @@ services:
    +    arguments: [default, ['override_permanent_ttl', '%cache.configuration.override_permanent_ttl%']]
    

    ', ' -> ': '?

  2. +++ b/core/core.services.yml
    @@ -207,31 +208,31 @@ services:
         factory: cache_factory:get
    -    arguments: [render]
    +    arguments: [render, ['override_permanent_ttl', '%cache.configuration.override_permanent_ttl%']]
    

    This doesn't work with code that uses cache_factory directly like render_cache.

  3. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -37,6 +44,11 @@ class DatabaseBackend implements CacheBackendInterface {
       /**
    +   * The permanent TTL.
    +   */
    +  protected $permanentTtl = CacheBackendInterface::CACHE_PERMANENT;
    

Missing @var?

  4. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackendFactory.php
    @@ -34,16 +34,14 @@ function __construct(Connection $connection, CacheTagsChecksumInterface $checksu
    +      $class->setOverridePermanentTtl($options['override_permanent_ttl']);
    

    ->setPermanentTtl(

catch’s picture

Status: Needs work » Needs review
FileSize
3.85 KB
3.85 KB

Addressing #121 and #122.

However #122-2 is a good spot, and not sure what to do about that yet.

Status: Needs review » Needs work

The last submitted patch, 123: 2526150.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
13.08 KB

Rebased.

Status: Needs review » Needs work

The last submitted patch, 125: 2526150.patch, failed testing.

The last submitted patch, 125: 2526150.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
13.08 KB
1.86 KB
catch’s picture

To be honest, the more I work on this patch, the less I like it. For a site that doesn't get many 404s, it could degrade performance rather than improve it, and the back and forth here shows that the concept is quite hard to grasp. There are also other options here, such as setting an arbitrary limit on cache items and pruning in garbageCollection() if it goes over; I like that option less than this one though.

I'd be tempted to bump #2699613: Set a shorter TTL for 404 responses in page_cache module and #2699627: url.path cache context for breadcrumbs is unnecessarily granular to critical and mark this postponed on those. They'll benefit sites running any cache backend and if we can get them tested on a site like @casey's that's suffering from this issue, it'd then allow us to see whether this issue is really critical by itself.

catch’s picture

Priority: Critical » Major
Status: Needs review » Postponed

Doing that for now.

Berdir’s picture

Status: Postponed » Active

That happened, and this is still a problem. 404s are just one example; you can also just use arbitrary query arguments to fill up the cache.

xjm’s picture

Issue tags: -rc target triage
Fabianx’s picture

What might make more sense is to add a "lurker" cron job, which removes cache entries that have already been invalidated via their cache tags.

Though I am not sure if our structure allows for that, given we only have a checksum in the DB backend. However, for serialized render arrays we could at least do it, since there we have the cache tag information.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

geerlingguy’s picture

I just ran into this after launching a new D8 site that uses the database backend (currently memcached or redis isn't an option); see #2814015: cache_render table becomes extremely large over time.

We don't have anything out of the ordinary, but there are a couple of views with exposed filters in blocks, and those blocks have 100,000+ rows in the cache_render table (one per filter set and URL permutation; the block only appears on one route, but the path can have filter arguments like ?abc-xyz), which never expire unless I run drush cr manually (then the table just fills up again until the next cache clear).

We were dealing with a database around 10-15 MB total before launching the prod site; now that there's live traffic, the database balloons out to 10+ GB every day until we clear the cache. Backups and full database copies take 30+ minutes instead of 30s...

I believe I have seen maybe 5-10% of all Drupal sites I've ever worked with using a cache backend other than the database, and I see no reason that will change with Drupal 8—this could be a huge problem for anyone trying to run a small-to-medium sized Drupal 8 site on shared hosting with even pretty basic sites.

Berdir’s picture

If a single day is enough to generate that amount of cache records, this issue alone will not solve it for you unless you plan to set the expiration to just hours.

Having that many variations sounds like you want to prevent those blocks from being cached at all, because they probably vary per user or similar, and caching them adds more overhead than not caching. Try just disabling the cache in the views settings; that should bubble up. Page cache will still cache the pages, and for anonymous users, dynamic page cache + optionally BigPipe will cache the rest of the page and just inject those blocks, while BigPipe will let you serve the rest of the page while still building those views.

Berdir’s picture

That said, I agree that this is a critical scalability issue and I also still think that we should try to get a minimal* fix in and document that nicely somewhere, possibly even have a sane default in core.

* I still believe that jitter and the like isn't necessary for a minimal solution, as that IMHO occurs naturally on most sites: requests and cache writes are distributed over time. And sites big enough for that to become relevant probably should look into alternative cache backends anyway.

With the rise of container-based hosting like Pantheon, Platform.sh, Acquia and so on, which AFAIK all offer Redis/Memcache, I do think that using such a backend will be more common than it used to be.

geerlingguy’s picture

Closed #2814015: cache_render table becomes extremely large over time as a duplicate.

And yes, more common... but at least 96% of the sites I've ever dealt with are not on Pantheon, Platform.sh, Acquia, etc. There's a silent majority of Drupal sites out there running on Hostgator, Dreamhost, preconfigured DigitalOcean VPSes, etc., and this problem (along with some others) doesn't help silence the mantra of "Drupal 8 is not for shared hosting"...

I agree with a minimal fix for now; the jitter doesn't seem (to me) like it would be a major issue/blocker.

Fabianx’s picture

#138: I do agree that some kind of expiration would be nice.

However please be aware that there is a working contrib backend by larowlan already:

https://github.com/larowlan/slushi_cache

I still think that using jitter is simple and can be achieved easily (and I'm not sure what catch's last patch is missing anyway, except for reviews and tests?), and we should:

- a) make simple changes
- b) ensure it will work for more complex cases, too.

Core has a higher responsibility in that regard.

--

To the issue above:

Likely it is indeed best to max-age=0 the view result with that many permutations (are they all valid?) and have the whole block be placeholder-ed automatically.

sajiniantony’s picture

I have upgraded the site to 8.2; it shows issues with the size of the cache table and also a deadlock error. Any help is appreciated.

xmacinfo’s picture

With the rise of container-based hosting like Pantheon, Platform.sh, Acquia and so on, which AFAIK all offer Redis/Memcache, I do think that using such a backend will be more common than it used to be.

This is a bold statement!

100% of the Drupal sites I am working with are not on those platforms.

catch’s picture

We don't have anything out of the ordinary, but there are a couple views exposed filters in blocks, and those blocks have 100,000+ rows in the cache_render table (one per filter set and url permutation—the block only appears on one route, but the path can have filter arguments like ?abc-xyz)

Do you have 10s/100s of thousands of requests changing those filters then? I wonder whether views with exposed filters wouldn't benefit more from some flood control or similar: stop entries getting in there in the first place. We'd have to do the flood control when rendering the view and then set max_age=0 when flood control kicks in.

Wim Leers’s picture

Backups and full database copies take 30+ minutes instead of 30s...

Well, that is a huge flaw in your backup system: you should never ever back up the cache_* tables.

But of course we should still solve the problem described in this issue.

catch’s picture

Assigned: catch » Unassigned

Unassigning me since I'm not actively working on this at the moment.

Wim Leers’s picture

AFAICT we have agreement this is a good idea.

It seems the last point of disagreement is whether or not to include jitter?

Wim Leers’s picture

Title: Make CACHE_PERMANENT configurable in the database cache » Make CACHE_PERMANENT configurable in the database cache — currently allows unlimited growth: cache DB tables of gigabytes!
Related issues: +#2828639: [BUGFIX] getRelated is using wrong cache tags

This problem also came up in #2828639: [BUGFIX] getRelated is using wrong cache tags.

And I know that some Acquia customers have run into this very problem too.

larowlan’s picture

FWIW Slushi cache solves this via contrib in the meantime.

malcomio’s picture

I observed a similar issue on shared hosting, which did not seem to be addressed by installing Slushi cache.

Some shared hosting providers will shut sites down with minimal warning if database sizes grow too large - as geerlingguy says in #138, issues like this are likely to push more small sites away from Drupal.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Jelle_S’s picture

We experienced this when a Google bot was crawling our faceted search; the cache_render table was growing to a couple of GB within minutes. We had to put our site in maintenance mode and exclude the faceted search in robots.txt to deal with it. The site is up and running again now, with a cron job clearing the cache every 6 hours just to be sure... Not exactly ideal. We thought about putting Varnish in front of it, but since the Varnish bin is shared with other websites, it might compromise those if it grows this fast...

catch’s picture

@malcomio Slushi cache should work fine for shared hosting; what makes you think it wouldn't? You'd have to configure the TTL quite low if you're getting crawled, though.

malcomio’s picture

@catch on the side project site where I experienced this issue, it still occurred after installing Slushi cache, although looking at settings.php now, I think that it may not have the correct configuration. I'll check and if I'm still having problems, I'll create an issue on the module.

haasontwerp’s picture

Subscribing, experiencing this for a site I've built. Slushi cache won't help (or I'm probably configuring it wrong). The DB without cache is about 75 MB; it grows to 250+ MB in one day. And then the hosting provider just shuts down database user access, so I get an understandably angry client on the phone, and I'm getting tired of manually clearing the cache x times a day. A bigger database won't fix this; it will just take a little longer to fill up to the maximum size.

Wim Leers’s picture

@Jelle_S: Faceted search is the sort of module that should take responsibility for not caching every single variation, because it's a module that allows insane numbers of variations. Of course it does not make sense to cache them all.

anavarre’s picture

Just registering a real-life example with a 8.2 site. PHP 5.6, core modules + Honeypot and a custom multimedia-related module, no memcache or alternate cache backend.

+--------------------------+--------------+------------+--------+-------------+---------------------+
| Tables in abcdefgh_db    | Size on Disk | Size in DB | Rows   | Index in DB | Index-to-data ratio |
+--------------------------+--------------+------------+--------+-------------+---------------------+
| cache_render             | 8312.00 MB   | 7978.88 MB | 220276 | 49.84 MB    | 0.01                |
| cache_dynamic_page_cache | 2924.00 MB   | 2706.72 MB | 43703  | 24.00 MB    | 0.01                |
+--------------------------+--------------+------------+--------+-------------+---------------------+

Rebuilt caches / Truncated tables

+--------------------------+--------------+------------+--------+-------------+---------------------+
| Tables in abcdefgh_db    | Size on Disk | Size in DB | Rows   | Index in DB | Index-to-data ratio |
+--------------------------+--------------+------------+--------+-------------+---------------------+
| cache_render             | 176.00 MB    | 156.70 MB  | 7444   | 2.55 MB     | 0.02                |
| cache_dynamic_page_cache | 96.00 MB     | 78.52 MB   | 1406   | 1.52 MB     | 0.02                |
+--------------------------+--------------+------------+--------+-------------+---------------------+

4 days after:

+--------------------------+--------------+------------+--------+-------------+---------------------+
| Tables in abcdefgh_db    | Size on Disk | Size in DB | Rows   | Index in DB | Index-to-data ratio |
+--------------------------+--------------+------------+--------+-------------+---------------------+
| cache_render             | 5120.00 MB   | 4689.84 MB | 116175 | 33.98 MB    | 0.01                |
| cache_dynamic_page_cache | 2348.00 MB   | 2207.66 MB | 39439  | 18.86 MB    | 0.01                |
+--------------------------+--------------+------------+--------+-------------+---------------------+
Wim Leers’s picture

The Local Actions and Local Tasks blocks also vary by URL, and they're enabled for all users by default, even though most sites only need those blocks for authenticated users (because it's possible that they're used on a wiki-like site that allows anonymous users to use them).

Those two blocks in particular can cause enormous amounts of cache entries. Which further confirms the need for this issue to be solved.

anavarre’s picture

Version: 8.3.x-dev » 8.4.x-dev
anavarre’s picture

#156: should the default for Local Actions and Local Tasks be to not show up for anonymous users, then? If yes, perhaps that should be a follow-up issue.

geerlingguy’s picture

I just noticed one site (the one that originally surfaced this issue for me) hit a full disk on the database server (100 GB) from the cache_render table expanding again. A drush cr fixed it for now, but this happened because for a few weeks, no code deployments were done (thus no cache rebuilds took place).

The entire database (besides cache_render) is under 1 GB :-/

mariagwyn’s picture

We have set Local Actions and Local Tasks blocks on the site mentioned by Aurelian above to 'authenticated' only. I am monitoring to see if this actually fixes the issue.

Berdir’s picture

No, it doesn't "fix" it. The result is a lot of useless cache entries, but they're practically empty and hardly take up any space. Any page that has those small blocks also has internal/dynamic page cache entries, which are a lot bigger. It helps a little bit, and we're working on a few other things that all also help, but each only so much.

The only really reliable solution is using a cache backend with a fixed size, like Redis or Memcache. A max age as proposed here would at least help small to medium sized sites to not grow too large over time. And while this is not yet fixed, I recommend everyone who can't use Redis/Memcache to use https://www.drupal.org/project/slushi_cache.

What I still don't understand is why we can't just move forward with one simple setting as initially proposed; it's not perfect, but it's simple and fairly reliable. I think that sites that grow so quickly that this isn't enough really shouldn't use the database backend anyway. I can work on updating the patch for that if we can agree on going forward with it.

catch’s picture

I'm OK with this going in without jitter, we can just move that to a follow-up and it's only an implementation detail in the database cache backend.

Wim Leers’s picture

The only really reliable solution is using a cache backend with a fixed size, like Redis or Memcache.

This. I already wrote this in #8.3, almost two years ago.

Caching with unbounded storage makes no sense. In practice, every cache has a limited size. A database isn't designed to store caches, but if it is, then we should limit the number of items we store.

What I still don't understand is why we can't just move forward with one simple setting as initially proposed, that's not perfect but it's simple and fairly reliable. I think that sites that grow so quickly that this isn't enough really shouldn't use the database backend anyway. i can work on updating the patch for that if we can agree on going forward with that.

+1. If you update the patch, I promise swift reviews. Let's finally get this done. This is biting many D8 sites.

lukas.fischer’s picture

+1 for #163

hkirsman’s picture

+1. Just discovered my fairly small site has a database of 15 GB when it's really just 50 MB.

xpersonas’s picture

Agreed. +1 for #163

phanosd’s picture

Amazing that this issue has actually been running for so many years.

+1. Just discovered my brand new, flashy, high-traffic-optimized website of 42.9 MB on Friday night went to 1 GB over the weekend. Thanks, cache_render.

Do we actually have a way to solve this that is practical and long term?

mariagwyn’s picture

Can someone clarify which of these patches may be workable at this point, and what needs to be fixed on them to get a working solution? Happy to apply a patch and test, but the existing patches are old.

Wim Leers’s picture

Assigned: Unassigned » Wim Leers

No movement in a month :( I promised swift reviews in #163 of a reroll. So I'm going to turn it around: I'll do a reroll, hopefully swift reviews will follow!

Wim Leers’s picture

To do what I said, the "reroll of the simple approach", I had to first find the simple approach. I went back all the way to the beginning.

  • The patch that I originally RTBC'd (on August 12, 2015) in #66 seems to be close to what we want. It was un-RTBC'd because there was discussion A) about it being specific to the database cache back-end, and B) about making it a per-bin setting.
  • To address A, @catch rolled #79, as a PoC. He made it an actual working patch in #82 and #87.
  • Then @Fabianx started talking in #81 about "bounded DB backend", jitter and pseudo-LRU.
  • @catch added jitter support in #104.
  • @catch continued to refine the patch in subsequent comments, but it remained fundamentally the same. It was RTBC'd in #118 by @Fabianx.
  • @olli spotted a critical flaw that all of us missed, in #122.2. All prior patches fail for cases where code is using the cache factory directly!
  • Then in #129 @catch comes to the realization that he doesn't like this approach/patch. He thinks postponing this on #2699613: Set a shorter TTL for 404 responses in page_cache module and #2699627: url.path cache context for breadcrumbs is unnecessarily granular would be good,
    because those would already solve the problem for many sites.
  • Since then, both #2699613 and #2699627 landed, and the problems still exist. We see this on Acquia Cloud. And @Berdir also said it in #131.
  • @Berdir in #137 and again in #161 asks why we can't solve this with a simple setting. But of course, @olli spotted that critical flaw. So I started looking at how slushi_cache solves this. Turns out it solves this in a very simple way: by accepting a single "permanent TTL" setting instead of one per bin (Slushi Cache calls this the "melt time" 👏), it avoids the problem that @olli pointed out. And it is a simple decorator of the existing code (see the sketch after this list). We could literally adopt Slushi cache as-is IMO. Or we could merge its code into DatabaseBackend.
    It does NOT meet two requirements that @catch posed though: A) per-bin configurability, B) being generic, instead of specific to the DatabaseBackend. But that's the thing, this problem is pretty much specific to the DatabaseBackend!
  • However, #152 and #153 show that they don't fix the entire problem. They sure reduce the problem's impact, but we don't know whether that will be enough.
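
To make that concrete, here is a minimal sketch of such a decorator's set() method (the property names $decorated and $meltTime are mine, for illustration; see slushi_cache itself for the real code):

// Sketch only: rewrite "permanent" into "now + melt time" before delegating
// to the wrapped backend. Cache is \Drupal\Core\Cache\Cache.
public function set($cid, $data, $expire = Cache::PERMANENT, array $tags = []) {
  if ($expire === Cache::PERMANENT) {
    $expire = REQUEST_TIME + $this->meltTime;
  }
  $this->decorated->set($cid, $data, $expire, $tags);
}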

So, if we want "the simple approach", I think we basically want to move Slushi Cache into core? 2526150-slushi_cache-170.patch does this, with two added features: per-bin overrides (but in a way that won't break like @olli pointed out for the earlier patches), plus allowing 0 or -1 to opt out (the lack of which was a flaw in Slushi Cache).


However … it looks like @Berdir and I are thinking something similar: the root cause is much simpler. @Berdir in #161:

The only really reliable solution is using a cache backend with a fixed size, like Redis or Memcache. A max age as proposed here would at least help small to medium sized sites to not grow too large over time.

I in #163:

Caching with unbounded storage makes no sense. In practice, every cache has a limited size. A database isn't designed to store caches, but if it is, then we should limit the number of items we store.

So here's a provocative thought… what if we do limit the amount of space that database tables can consume? LRU requires bookkeeping, so that's off the table. If you want LRU, use Redis or Memcached. But we can easily achieve fixed-size DB cache bins:

  1. Add a serial field to DatabaseBackend::schemaDefinition()
  2. Do something like DELETE FROM cache_data WHERE fixed_size_id <= ((SELECT MAX(fixed_size_id) FROM cache_data) - 1000)

So here's a patch implementing that. Including an upgrade path even. 2526150-bounded_database_cache_bins__count-170.patch
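
In DatabaseBackend terms, the pruning could look roughly like this (a sketch of the idea rather than the literal patch; $max_rows stands for the configured limit, and fixed_size_id is the serial column from step 1):

// Sketch only: on garbage collection, trim the bin to the newest $max_rows rows.
$max_id = (int) $this->connection
  ->query('SELECT MAX(fixed_size_id) FROM {' . $this->connection->escapeTable($this->bin) . '}')
  ->fetchField();
if ($max_id > $max_rows) {
  $this->connection->delete($this->bin)
    ->condition('fixed_size_id', $max_id - $max_rows, '<=')
    ->execute();
}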

But then I wondered whether we could do even better: what if we could actually assign size limitations not by count but by space?
Turns out that MySQL at least allows this: https://stackoverflow.com/a/18238810 — you can query information_schema. And then if we construct the right query, we can get the number of megabytes consumed by a given table: https://stackoverflow.com/a/9620273. What remains then of course, is removing a number of rows to make it fit in that size again. We can't know the size of an individual record, so we'd have to assume they're all sized the same, and then remove the percentage that would make it fit within the budget again. Before you get too excited: this would then be MySQL-specific, which would be … sad. Anyway, very very rough PoC for this attached.

Wim Leers’s picture

So, #170 contains 3 approaches, all fairly simple:

  1. slushi_cache: moves https://www.drupal.org/project/slushi_cache into core, with minimal adjustments. This is probably the "simple" approach that was being asked for earlier in the thread.
  2. count: makes the DB cache bins bounded (fixed size), by row count
  3. size: makes the DB cache bins bounded (fixed size), by disk space

Options 1 and 2 are database engine-agnostic. Option 3 is database engine-specific, and therefore an unlikely choice.

Option 1 does not protect against surges in activity, e.g. bots crawling your site or malicious users attacking it; option 2 does.

Wim Leers’s picture

And finally, a provocative related question: #2888838: Remove render caching?.

Wim Leers’s picture

Priority: Major » Critical

And I just realized this was made major instead of critical in #130 because it was thought that two other issues would help (see #129), but they did not. Therefore the original criticality still applies. Back to critical.

Wim Leers’s picture

Title: Make CACHE_PERMANENT configurable in the database cache — currently allows unlimited growth: cache DB tables of gigabytes! » Database cache bins allow unlimited growth: cache DB tables of gigabytes!

The current title implies one particular solution. As of #170, there is a very different option. So, retitling for clarity & simplicity.

The last submitted patch, 170: 2526150-slushi_cache-170.patch, failed testing. View results

catch’s picture

So here's a provocative thought… what if we do limit the amount of space that database tables can consume? LRU requires bookkeeping, so that's off the table. If you want LRU, use Redis or Memcached. But we can easily achieve fixed-size DB cache bins:

I was thinking about this issue this morning, and had a similar but different thought:

- have a per-bin settings-configurable number of rows
- when the size of the bin exceeds the number of rows, find the median created timestamp per-bin and delete all items created before that timestamp (i.e. about 50% of the table).
- do this on cron so there's no cost with cache sets.
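
Sketching that idea in code (hypothetical and untested; uses the cache_render bin and a $max_rows variable as placeholders):

// Hypothetical sketch of the median-based pruning described above.
$count = (int) $this->connection->query('SELECT COUNT(*) FROM {cache_render}')->fetchField();
if ($count > $max_rows) {
  // Fetch the 'created' value halfway through the bin, oldest first...
  $median = $this->connection
    ->queryRange('SELECT created FROM {cache_render} ORDER BY created ASC', (int) ($count / 2), 1)
    ->fetchField();
  // ...and drop everything created before it, i.e. roughly the older 50%.
  $this->connection->delete('cache_render')
    ->condition('created', $median, '<')
    ->execute();
}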

This will clear some space without resulting in a complete cache clear, tables will be able to go over the limit for up to three hours.

My concern with both that approach and the one suggested by Wim is that it could lead to thrashing, but again memcache is the real answer to that.

Haven't reviewed the patches yet.

effulgentsia’s picture

For what it's worth, I like #176 a lot!

My concern with both that approach and the one suggested by Wim is that it could lead to thrashing

Really? What's a scenario in which #176 would result in thrashing? It's true that frequently used items could get purged shortly after their last use, but never shortly after their first use.

Wim Leers’s picture

#176 is a more advanced (and complicated) version of 2526150-bounded_database_cache_bins__count-170.patch. Why do we want to do stuff with median timestamps etc.? Why not just look at the row count versus the allowed row count, and nothing else?

I'm probably missing something :)

tedfordgif’s picture

@effulgentsia, I think @catch is saying that large sets of cache items could get cleared at once, and then regenerated at once (see jitter). It would probably depend on the cron interval, content, and access patterns.

The median timestamp approach in #176 appears intended to better preserve recently-created cache entries, but could still allow a large cache table size, and doesn't really approximate LRU. For example, if you get traffic that maxes the cache out during each cron interval, you cycle between ~50% and 100% of the peak cache size, assuming evenly-distributed timestamps. The cache hit ratio and % cleared might be unpredictable if the requests during a cron interval are front- or back-loaded. If you're getting the kind of traffic that makes the above concerns valid, though, maybe you should be using another cache backend.

The simple row-count approach provides a more predictable upper bound on DB size, which is the main issue here.

@catch, Wim's row-count approach only runs on cron, as you suggested.

catch’s picture

Really? What's a scenario in which #176 would result in thrashing? It's true that frequently used items could get purged shortly after their last use, but never shortly after their first use.

A bad but relatively common scenario I have in my head is something like this:

Cron run every ten minutes.

There 2 requests per second to unique pages from a crawler.

Mean of 20 render cache items per page.

This means:

1,200 unique pages visited in ten minutes.

24,000 cache items created in ten minutes.

Cache item limit of 10,000.

Whether you go down to 10,000 or 5,000 items, you're going to be deleting tens of thousands of rows per hour then recreating them again.

If we take a Drupal.org node with 300 comments by, say, 50 unique users, there could easily be 400-500 individual render cache items per page on a cold render cache, so you could hit a high row limit from a much smaller number of pages.

@tedfordgif thanks for answering Wim's #178; that's exactly the design and its drawbacks. This wasn't a competitor to Wim's suggestions, it was just what I was mulling over this morning, and I haven't really thought it through beyond that (hence why I didn't post anything).

effulgentsia’s picture

Why do we want to do stuff with median timestamp etc? Why not just look at the row count versus allowed row count, and nothing else?

I think doing rows is fine. But why add the bounded_id column? Why not COUNT and ORDER BY created? Which I think would have the same effect of deleting the oldest first, which is the key feature in both #170 and #176. Or is the problem that adding an index on a decimal field is less efficient than on a serial field?

24,000 cache items created in ten minutes...Cache item limit of 10,000.

I think that's the problem with the #180 scenario. You shouldn't have a site that, as part of its normal operation, creates more cache items between cron runs than your limit. If it's an occasional blip, that's one thing, but if it's happening all the time, it means you need to do one of:

  • Increase the limit
  • Switch to a more advanced cache backend
  • Disable render caching
  • Write a module to disable caching conditionally (e.g., during crawler requests)
catch’s picture

You shouldn't have a site that, as part of its normal operation, creates more cache items between cron runs than your limit. If it's an occasional blip, that's one thing, but if it's happening all the time,

The question for me is: how would you find that out without manually auditing the cache tables? With the current issue you find out because your database is huge. One thing we could do is track on cron runs how many items were deleted vs. how many were deleted last cron run (or similar) and log it; then at least there's an indication of what's happening.

Wim Leers’s picture

#178: Exactly, it could still allow a large cache table size. Rather than a simple "fixed size, period", it's "while the limit is exceeded, delete ~50% and hope that will keep it within bounds". So I agree with "The simple row-count approach provides a more predictable upper bound on DB size, which is the main issue here": my intuition says it's better to KISS to solve the problem for now; we can introduce fancier algorithms later. "Fixed size, but only kinda, with deletion based on the median timestamp" is conceptually a lot harder to understand than just "fixed size". So I fear doing this would introduce another Drupalism.

#180: I think this is about solving the common case. The default could be 10,000 entries. But for bigger sites, you'd want to tweak that number.

#181:

  1. Because I forgot that created even existed :) Happy to reroll if that approach is what we end up agreeing on.
  2. I agree that it's a matter of tweaking for those sites that see a lot of traffic. The default value should cater to small and medium traffic sites.

#182: We could do tracking, but yesterday's traffic is not necessarily an indication of today's traffic. Plus, then we're making it pretty complex.


It sounds to me like we've entered a bikeshed where the color is replaced with "What should the default DB cache bin row count limit be?" as the primary question and "Do we want special weighted ways of making it fit within the limit?" as the secondary question. The problem is that "the best weighted way" will depend on the particular site. That's why I'm arguing to go with the simplest possible solution.

catch’s picture

Rather than a simple "FIXED SIZE, PERIOD", it's "while limit exceeded, delete ~50% and hope that will keep it within bounds".

The row limit patch isn't a fixed size though; it trims down to the maximum row count every cron run, i.e. it will immediately exceed the fixed size as soon as a new cache entry is created, until the next cron run.

Given that, and unless we run those queries on insert which would be a performance hit, I'm not sure what this gets us beyond the configurable TTL for permanent entries - with that the longevity of a cache entry is predictable at least and it'll prevent unbounded growth.

Wim Leers’s picture

Given that […] I'm not sure what this gets us beyond the configurable TTL for permanent entries - with that the longevity of a cache entry is predictable at least and it'll prevent unbounded growth.

With a configurable TTL for permanent entries, you still have an unbounded number of cache items. If your site has a traffic surge, or is DDoSed or spidered or something else, then the gigabytes of data will sit there for a full day (assuming that configurable permanent TTL defaults to 86400, as in today's patches).

With a fixed size, you prevent that.

i.e. it will immediately exceed the fixed size as soon as a new cache entry is created, until the next cron run.

This is correct. But I think this is less problematic than the problem I described above.


I've laid out my thinking, but I don't feel strongly which approach should be the one. Either one is a big step forward. I'll let @catch choose.

mariagwyn’s picture

2526150-bounded_database_cache_bins__count-170.patch being tested now. Applied without error, no other apparent issues on the site. After only about an hour, I can say that the DB size has remained stable on a site which bloats (very) rapidly. As of right now:

+-----------------------+--------------+------------+--------+-------------+---------------------+
| Tables in dbstage1 | Size on Disk | Size in DB | Rows   | Index in DB | Index-to-data ratio |
+-----------------------+--------------+------------+--------+-------------+---------------------+
| search_index          | 52.00 MB     | 20.58 MB   | 381744 | 22.58 MB    | 1.10                |
| search_dataset        | 18.00 MB     | 10.52 MB   | 1068   | 0.00 MB     | 0.00                |
| comment_field_data    | 15.00 MB     | 2.52 MB    | 10184  | 4.50 MB     | 1.79                |
| comment__comment_body | 15.00 MB     | 6.52 MB    | 8523   | 0.91 MB     | 0.14                |
| node_revision__body   | 12.00 MB     | 4.52 MB    | 1062   | 0.22 MB     | 0.05                |
+-----------------------+--------------+------------+--------+-------------+---------------------+
+-----------------------------+--------------+------------+--------+-------------+---------------------+
| Tables in dbstage2 | Size on Disk | Size in DB | Rows   | Index in DB | Index-to-data ratio |
+-----------------------------+--------------+------------+--------+-------------+---------------------+
| search_index                | 36.00 MB     | 14.55 MB   | 331914 | 13.55 MB    | 0.93                |
| cache_field                 | 29.00 MB     | 21.56 MB   | 9688   | 0.34 MB     | 0.02                |
| field_data_comment_body     | 27.00 MB     | 12.52 MB   | 17760  | 6.55 MB     | 0.52                |
| watchdog                    | 25.00 MB     | 16.52 MB   | 11584  | 0.56 MB     | 0.03                |
| search_dataset              | 17.00 MB     | 9.52 MB    | 1054   | 0.00 MB     | 0.00                |

For comparison, this is the same DB before a 'drush cr':

+--------------------------+--------------+------------+--------+-------------+---------------------+
| Tables in dbprod1    | Size on Disk | Size in DB | Rows   | Index in DB | Index-to-data ratio |
+--------------------------+--------------+------------+--------+-------------+---------------------+
| cache_render             | 4912.00 MB   | 4580.58 MB | 91155  | 22.72 MB    | 0.00                |
| cache_dynamic_page_cache | 1692.00 MB   | 1482.86 MB | 29326  | 16.84 MB    | 0.01                |
| cache_data               | 232.00 MB    | 196.89 MB  | 72672  | 10.58 MB    | 0.05                |
| search_index             | 76.00 MB     | 40.48 MB   | 380034 | 24.33 MB    | 0.60                |
| cache_entity             | 76.00 MB     | 64.66 MB   | 7341   | 1.52 MB     | 0.02                |
+--------------------------+--------------+------------+--------+-------------+---------------------+
+----------------------------+--------------+------------+--------+-------------+---------------------+
| Tables in dbprod2 | Size on Disk | Size in DB | Rows   | Index in DB | Index-to-data ratio |
+----------------------------+--------------+------------+--------+-------------+---------------------+
| search_index               | 36.00 MB     | 14.55 MB   | 332827 | 13.55 MB    | 0.93                |
| cache_field                | 29.00 MB     | 21.56 MB   | 10739  | 0.34 MB     | 0.02                |
| field_data_comment_body    | 27.00 MB     | 12.52 MB   | 17835  | 6.55 MB     | 0.52                |
| watchdog                   | 25.00 MB     | 16.52 MB   | 11312  | 0.56 MB     | 0.03                |
| search_dataset             | 17.00 MB     | 9.52 MB    | 912    | 0.00 MB     | 0.00                |
+----------------------------+--------------+------------+--------+-------------+---------------------+

I will ping back later today or tomorrow with a new size report.

tedfordgif’s picture

@mariagwyn, it would be helpful to see the size of and number of rows in your cache_render table. How often do you run cron?

If you're using MySql, the following query might be helpful (replace 'drupal' with your database name):

SELECT table_name AS `Table`, TABLE_ROWS as 'Num. of Rows', round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB`  FROM information_schema.TABLES where table_schema = 'drupal' and table_name = 'cache_render'  ORDER BY (data_length + index_length) DESC limit 10;
effulgentsia’s picture

I've laid out my thinking, but I don't feel strongly which approach should be the one.

Seems to me the only difference between the two approaches is: when num_items_during_cron > max_items, should we prune to max_items (suggested by Wim) or to max_items/2 (suggested by catch)? However, the "2" in catch's formula seems pretty arbitrary. So I'd like to propose a 3rd option: prune to max_items * (max_items / num_items_during_cron).

What this does is make it so that we create ahead of time an amount of free space that's a function of how much we're likely to need.

In other words, if our limit is 10,000 but we're currently at 11,000, then we can prune to 10K * 10K/11K = 9091, which leaves nearly 1K for future growth prior to hitting the limit again. On the other hand, if we're currently at 40,000 (got to 4x our limit since the last cron run), then we'd prune down to 2,500. Of course, if >30K items got added since last cron run, then the freed up 7.5K will likely get used up pretty quickly, and we'll be back in a situation of exceeding our limit, but per #181, I think such a large disparity between the limit and what the site uses within a single cron interval is a bigger problem than this issue's scope should be.
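
In code the pruning target is a one-liner (sketch; variable names follow the prose above):

// e.g. 10000 * (10000 / 11000) ≈ 9091, or 10000 * (10000 / 40000) = 2500.
$prune_to = (int) round($max_items * ($max_items / $num_items_during_cron));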

mariagwyn’s picture

@tedforgif:

For comparison, here is prod. Note that it was cleared earlier today but has been growing in size:

+--------------------------+--------------+------------+--------+-------------+---------------------+
| Tables in prodDB    | Size on Disk | Size in DB | Rows   | Index in DB | Index-to-data ratio |
+--------------------------+--------------+------------+--------+-------------+---------------------+
| cache_render             | 360.00 MB    | 299.92 MB  | 12937  | 3.53 MB     | 0.01                |
| cache_dynamic_page_cache | 120.00 MB    | 102.72 MB  | 2989   | 1.55 MB     | 0.02                |
| search_index             | 76.00 MB     | 40.48 MB   | 380034 | 24.33 MB    | 0.60                |
| cache_entity             | 68.00 MB     | 59.64 MB   | 7326   | 0.47 MB     | 0.01                |
| cache_data               | 32.00 MB     | 20.59 MB   | 7405   | 1.52 MB     | 0.07                |
+--------------------------+--------------+------------+--------+-------------+---------------------+

Num rows:

+--------------+--------------+------------+
| Table        | Num. of Rows | Size in MB |
+--------------+--------------+------------+
| cache_render |        12983 |     303.45 |
+--------------+--------------+------------+
1 row in set (0.00 sec)

On Testing site/DB, with db copied immediately after 'drush cr' on prod:

DB Size:

+-----------------------+--------------+------------+--------+-------------+---------------------+
| Tables in dbstage1 | Size on Disk | Size in DB | Rows   | Index in DB | Index-to-data ratio |
+-----------------------+--------------+------------+--------+-------------+---------------------+
| search_index          | 52.00 MB     | 20.58 MB   | 381744 | 22.58 MB    | 1.10                |
| search_dataset        | 18.00 MB     | 10.52 MB   | 1068   | 0.00 MB     | 0.00                |
| comment_field_data    | 15.00 MB     | 2.52 MB    | 10184  | 4.50 MB     | 1.79                |
| comment__comment_body | 15.00 MB     | 6.52 MB    | 8523   | 0.91 MB     | 0.14                |
| node_revision__body   | 12.00 MB     | 4.52 MB    | 1062   | 0.22 MB     | 0.05                |

And number of rows:

+--------------+--------------+------------+
| Table        | Num. of Rows | Size in MB |
+--------------+--------------+------------+
| cache_render |           64 |       1.53 |
+--------------+--------------+------------+
Wim Leers’s picture

Seems to me the only difference between the two approaches is

Indeed. Well, roughly, because @catch's proposal is more complex than max_items/2.

In other words, if our limit is 10,000 but we're currently at 11,000, then we can prune to 10K * 10K/11K = 9091, which leaves nearly 1K for future growth prior to hitting the limit again.

I think this is simple enough to be "KISS". But I need to repeat my warning from #183: We could do tracking, but yesterday's traffic is not necessarily an indication of today's traffic.


@mariagwyn Thanks for sharing those numbers. It confirms that this patch is solving the symptoms. But your data doesn't yet tell us whether this would negatively impact response times. How did cache hit ratios change?

tedfordgif’s picture

I'm still in favor of the simple row limit. We're not trying to eliminate the (render_)cache entirely in this ticket, so emptying the cache to some percent of the max rows setting is not a goal. If we're getting the same traffic pattern during each cron interval, any cache entries we remove will be recreated, so we might as well just leave them in the cache if they fit in the "limit".

I initially liked the approach in #188 for having math, and I thought some more about the various ways of calculating the average size of the cache items, but I'm still firmly in the camp of KI(really)SS, meaning simple row limit. It's tunable both by cron frequency and row limit, and doesn't try too hard.

As mentioned previously, future gold-plating options include creating a db_cache_profiler module, which would help you tune those two parameters by sampling cache get/set, or db_cache_tuner, which keeps things tuned in real time. But the right approach is to get a real cache outside Drupal.

effulgentsia’s picture

For what it's worth, I agree. I think simply pruning to max_items is fine for resolving this critical. It means we'll essentially always be slightly over the limit, and therefore we might need to figure out a good setting name for it ("max", "limit", and "bound" aren't entirely accurate), but I think that's OK.

andrewbelcher’s picture

+1 for the simple approach: it solves the problem and is really simple to understand. Re #192, perhaps an example in default.settings.php with help text along the lines of "Set an upper row limit on the database cache bins, enforced on cron runs" would suffice?
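
Something like this, for example (the setting name follows the $settings['database_cache_max_rows'] mentioned later in this thread; the exact array shape and the numbers are my assumption):

/**
 * Example only: cap the number of rows per database cache bin,
 * enforced on cron runs.
 */
# $settings['database_cache_max_rows']['default'] = 10000;
# $settings['database_cache_max_rows']['bins']['render'] = 50000;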

Wim Leers’s picture

Okay, seems like we have consensus for 2526150-bounded_database_cache_bins__count-170.patch. Now let's get that going:

  1. simplify (remove the new column being added, @effulgentsia in #181)
  2. test coverage
  3. resolve todo 1: make it configurable per bin
  4. resolve todo 2: bikeshed the size limit

First: rebasing, because #2775381: response.gzip is redundant and should be removed conflicts with it.

catch’s picture

Status: Needs review » Needs work

That patch doesn't remove the column. Edit: I missed the 'first rebasing', sorry...

Call it a 'target size' or 'target row count'? That removes the expectations around bounding/limits.

Wim Leers’s picture

Status: Needs work » Needs review
FileSize
2.53 KB
4.38 KB
  2. test coverage

Done.

Wim Leers’s picture

  3. resolve todo 1: make it configurable per bin

Done.

Wim Leers’s picture

  1. simplify (remove the new column being added, @effulgentsia in #181)

I started working on this, but this is not as trivial as #181 makes it out to be. This is not possible in MySQL:

But why add the bounded_id column? Why not COUNT and ORDER BY created?

You can't delete while ordering at the same time. You'd need to SELECT all the CIDs to delete in one query, then do a DELETE FROM cache_blah WHERE cid IN $ids. That could potentially be many IDs to gather. Or, alternatively, do a join.

Both of those sound far more expensive than tracking an auto-increment ID and then having a simple single-scan query.

P.S.: also: if there are many rows with the exact same timestamp, I'm not sure the behavior is defined.

Wim Leers’s picture

That leaves just

  4. resolve todo 2: bikeshed the size limit

@catch suggested 10K in #180, so setting that.

Wim Leers’s picture

So we still need to look into:

  1. simplify (remove the new column being added, @effulgentsia in #181)
  4. resolve todo 2: bikeshed the size limit

… but neither of them actually blocks commit. This patch has full test coverage. Point 1 would just be clean-up, and could happen at any time. Point 4 can take forever, but I sure hope it won't.

Also created a change record: https://www.drupal.org/node/2891281.

Please review!

The last submitted patch, 198: 2526150-bounded_database_cache_bins__count-197.patch, failed testing. View results

Status: Needs review » Needs work

The last submitted patch, 200: 2526150-bounded_database_cache_bins__count-200.patch, failed testing. View results

Wim Leers’s picture

Apparently there's one particularly well hidden creation of a DB cache bin. And it's a crucial one.

Wim Leers’s picture

#204 made me realize I didn't add support for "no maximum" yet. Fixed.

Also updated the Cache documentation in core/core.api.php.

mariagwyn’s picture

Testing patch: https://www.drupal.org/files/issues/2526150-bounded_database_cache_bins_...

For comparison:

BEFORE 'drush cr' and no patch:

+--------------------------+--------------+------------+--------+-------------+---------------------+
| Tables in StageDB    | Size on Disk | Size in DB | Rows   | Index in DB | Index-to-data ratio |
+--------------------------+--------------+------------+--------+-------------+---------------------+
| cache_render             | 1088.00 MB   | 1013.91 MB | 34402  | 6.61 MB     | 0.01                |
| cache_dynamic_page_cache | 360.00 MB    | 328.83 MB  | 6955   | 4.59 MB     | 0.01                |
| search_index             | 76.00 MB     | 40.48 MB   | 380384 | 24.33 MB    | 0.60                |
| cache_entity             | 76.00 MB     | 63.64 MB   | 8323   | 0.47 MB     | 0.01                |
| cache_data               | 76.00 MB     | 61.84 MB   | 19656  | 3.52 MB     | 0.06                |
+--------------------------+--------------+------------+--------+-------------+---------------------+

AFTER 'drush cr', no new patch:

+-----------------------+--------------+------------+--------+-------------+---------------------+
| Tables in StageDB     | Size on Disk | Size in DB | Rows   | Index in DB | Index-to-data ratio |
+-----------------------+--------------+------------+--------+-------------+---------------------+
| search_index          | 52.00 MB     | 20.58 MB   | 379779 | 22.58 MB    | 1.10                |
| search_dataset        | 18.00 MB     | 10.52 MB   | 981    | 0.00 MB     | 0.00                |
| comment_field_data    | 15.00 MB     | 2.52 MB    | 10172  | 4.50 MB     | 1.79                |
| comment__comment_body | 15.00 MB     | 6.52 MB    | 9996   | 0.91 MB     | 0.14                |
| node_revision__body   | 12.00 MB     | 4.52 MB    | 1085   | 0.22 MB     | 0.05                |
+-----------------------+--------------+------------+--------+-------------+---------------------+

No observable errors, will check back on DB size later.

The last submitted patch, 204: 2526150-bounded_database_cache_bins__count-203.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Status: Needs review » Needs work

The last submitted patch, 205: 2526150-bounded_database_cache_bins__count-205.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

tedfordgif’s picture

Status: Needs work » Needs review

Providing values less than -1 for $settings['database_cache_max_rows']* has the interesting theoretical behavior of deleting additional rows that were added between the select and delete queries. I'm ambivalent about needing to fix that.

If "$max_bounded_id - $this->maxRows" is negative, we don't need to run the delete query.

We may end up leaving fewer rows than the limit in the cache bin due to holes in the "$max_bounded_id - $this->maxRows" range introduced by deleting expired items. It doesn't matter which order the two delete queries are executed in. However, if you do the expire query first, then instead of MAX(bounded_id) do the select() equivalent of

SELECT bounded_id
FROM {this_cache_bin}
ORDER BY bounded_id DESC
LIMIT :this_max_rows, 1

you get the desired result. It's slower than the MAX() query, but this only runs on cron. Of course, a null result would mean you don't need to run the delete query. Apologies for not having the time to make a proper patch.
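
A sketch of that select() equivalent with the DB API (assuming $this->bin, $this->maxRows and the bounded_id column from the current patch):

// Find the bounded_id of the ($this->maxRows + 1)-th newest row, if any.
$cutoff = $this->connection->select($this->bin, 'b')
  ->fields('b', ['bounded_id'])
  ->orderBy('b.bounded_id', 'DESC')
  ->range($this->maxRows, 1)
  ->execute()
  ->fetchField();
// A FALSE result means the bin is still under the limit: nothing to delete.
if ($cutoff !== FALSE) {
  $this->connection->delete($this->bin)
    ->condition('bounded_id', $cutoff, '<=')
    ->execute();
}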

dawehner’s picture

Issue summary: View changes

… but neither of them actually blocks commit. This patch has full test coverage. Point 1 would just be clean-up, and could happen at any time. Point 4 can take forever, but I sure hope it won't.

I think having the simple solution now is fine. People can always customize it and maybe provide something better tailored to their use cases, but those are not the people we should worry about; they have already fixed the problem. We should think more about the people who have the problem right now.

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -326,6 +338,16 @@ public function invalidateAll() {
+      if ($this->maxRows !== -1) {
+        $query = $this->connection->select($this->bin);
+        $query->addExpression('MAX(bounded_id)', 'bounded_id');
+        $max_bounded_id = $query->execute()->fetchField();
+        $this->connection->delete($this->bin)
+          ->condition('bounded_id', $max_bounded_id - $this->maxRows, '<=')
+          ->execute();
+      }

I clicked on the various stackoverflow links from wim above and saw that all of them suggested to use "DELETE WHERE X IN (SELECT ...)". This would be one database query instead of the two suggested here. I think skipping the DB -> PHP -> DB round trip is valuable, for example in terms of saving memory. I just had a look at whether it's possible, and given that \Drupal\Core\Database\Query\ConditionInterface::condition supports a SelectInterface as $value, it totally seems to be.

+++ b/core/tests/Drupal/KernelTests/Core/Cache/DatabaseBackendTest.php
@@ -12,6 +12,13 @@
+  public static $maxRows = 100;

Nitpick: why is this public?

Berdir’s picture

  1. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -17,6 +17,15 @@
    +   * -1 means no maximum.
    

    should we maybe add a constant for -1, like we have for other things?

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -45,14 +54,17 @@ class DatabaseBackend implements CacheBackendInterface {
        */
    -  public function __construct(Connection $connection, CacheTagsChecksumInterface $checksum_provider, $bin) {
    +  public function __construct(Connection $connection, CacheTagsChecksumInterface $checksum_provider, $bin, $max_rows) {
    

    Always the BC question. I checked slushi cache and they don't override the constructor, so it wouldn't break (and it is also not really needed anymore after this, I guess).

    I guess we can say that this should always be created through the constructor and should be safe. But it would also be relatively safe to default to 10k.

  3. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackendFactory.php
    @@ -43,7 +54,20 @@ public function __construct(Connection $connection, CacheTagsChecksumInterface $
    -    return new DatabaseBackend($this->connection, $this->checksumProvider, $bin);
    +    $max_rows_settings = $this->settings->get('database_cache_max_rows');
    +    // First, look for a cache bin specific setting.
    +    if (isset($max_rows_settings['bins'][$bin])) {
    +      $max_rows  = $max_rows_settings['bins'][$bin];
    +    }
    +    // Third, use configured default backend.
    +    elseif (isset($max_rows_settings['default'])) {
    +      $max_rows = $max_rows_settings['default'];
    +    }
    +    else {
    +      // Fall back to the default max rows if nothing else is configured.
    +      $max_rows = 10000;
    +    }
    

    would it help with readability and further changes here to put this in a separate protected method, like $this->getMaxRowsForBin($bin) or so? (See the sketch after this list.)

  4. +++ b/core/modules/system/system.install
    @@ -1984,3 +1984,14 @@ function system_update_8401() {
    + */
    +function system_update_8402() {
    +  foreach (\Drupal\Core\Cache\Cache::getBins() as $bin => $cache_backend) {
    +    if ($cache_backend instanceof \Drupal\Core\Cache\DatabaseBackend) {
    +      db_drop_table("cache_$bin");
    +    }
    +  }
    +}
    

    shouldn't use full namespace and \Drupal::database()->schema()

    Also, I'd say it is possible to have a schema defined but never actually accessed, in which case the table wouldn't exist. Wrap it in a tableExists() to be sure to avoid any errors here?
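
Regarding point 3 above, a sketch of that helper (hypothetical: getMaxRowsForBin() is just the suggested name, and 10000 the hard-coded default currently in the patch):

protected function getMaxRowsForBin($bin) {
  $max_rows_settings = $this->settings->get('database_cache_max_rows');
  // First, look for a cache bin specific setting.
  if (isset($max_rows_settings['bins'][$bin])) {
    return $max_rows_settings['bins'][$bin];
  }
  // Second, use the configured default.
  if (isset($max_rows_settings['default'])) {
    return $max_rows_settings['default'];
  }
  // Fall back to the default max rows if nothing else is configured.
  return 10000;
}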

catch’s picture

You'd need to SELECT all CIDs to delete in one query, then do a DELETE FROM cache_blah WHERE cid IN $ids.

You could do this no?

$cid = SELECT cid FROM {cache_} ORDER BY cid ASC LIMIT 10000, 1;

DELETE FROM {cache_} WHERE cid < $cid;

Or what dawehner suggests in #210 if that works cross-database.

Also #180 wasn't a suggestion of 10k; it was using 10k as an example of where something could go wrong on a moderate-sized site. Maybe 10k's OK, but we could probably get a rough size estimate from @mariagwyn's queries in #206.

Wim Leers’s picture

  • Should now be green.
  • #206: do you see the new queries appear in MySQL's slow query log?
  • #210: nitpick: fixed.
  • #211.1: done.
  • #211.2: I think you're saying you have some light BC concerns, but not really in practice? Also: Slushi cache doesn't override the constructor because they don't allow per-bin overrides. Finally: I could remove this change by just calling Settings::get() globally if people think that's preferable.
  • #211.3: done. Note that I modeled this on \Drupal\Core\Cache\CacheFactory::get().
  • #211.4: good catch! Fixed.

#210 + #212: RE DB query + schema:

This would be one database query instead of two as suggested here.

I think skipping the DB -> PHP -> DB layer is valueable, for example in terms of saving memory.

Or what dawehner suggests in #210 if that works cross-database.

Yes, but:

  1. \Drupal\Core\Database\Query\Delete does not support ->join() (\Drupal\Core\Database\Query\Select does support this)
  2. I literally can't figure out how to do a subquery with our DB layer. Zero mentions of "subquery" in database.api.php. This is maddening.
  3. I'd be happy to use a "direct" query (skipping our DB abstraction layer), but I don't know the differences between the various SQL databases we support well enough to know what then actually works cross-database.

Can somebody else please tackle #210+#212?

dawehner’s picture

I literally can't figure out how to do a subquery with our DB layer. Zero mentions of "subquery" in database.api.php. This is maddening.

Let me quote myself :)

I just had a look whether its possible, and I think given that \Drupal\Core\Database\Query\ConditionInterface::condition supports a SelectInterface as $value it totally seems.

So, let's try to pass along a SelectInterface object:

>>> $select = \Drupal::database()->select('node');
=> Drupal\Core\Database\Driver\mysql\Select {#7413}
>>> $select->fields('node', ['nid']);
=> Drupal\Core\Database\Driver\mysql\Select {#7413}
>>> $delete = \Drupal::database()->delete('node');
=> Drupal\Core\Database\Driver\mysql\Delete {#7415}
>>> $delete->condition('node.nid', $select);
=> Drupal\Core\Database\Driver\mysql\Delete {#7415}
>>> (string) $delete
=> """
   DELETE FROM {node} \n
   WHERE node.nid = (SELECT node.nid AS nid\n
   FROM \n
   {node} node)
   """

Note: This is not a valid query right now, but I think you get the direction :)

Now that I've written about it, I remember that before MySQL 5.6, SELECT ... IN (subquery) could be really painfully slow. Given that we only require const MYSQLND_MINIMUM_VERSION = '5.0.9';, we should avoid it.

I like @catch's suggestion, as it requires no change in the DB schema as well.

Always the BC question. I checked slushi cache and they don't override the constructor so wouldn't break (and it is also not really needed anymore after this I guess).

IMHO we should try to make changes like this as less BC breaking as possible. Not though for just this reason, but also to show other people how to make changes like this in the least problematic way. Core should set a good example for others.

Status: Needs review » Needs work

The last submitted patch, 214: 2526150-214.patch, failed testing. View results

Wim Leers’s picture

@dawehner++
@dawehner++

  1. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -60,9 +60,9 @@ class DatabaseBackend implements CacheBackendInterface {
    +  public function __construct(Connection $connection, CacheTagsChecksumInterface $checksum_provider, $bin, $max_rows = 50000) {
    

    This 50000 should match whatever default limit we settle on, it's 10000 at the moment.

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -345,12 +345,18 @@ public function garbageCollection() {
    +        $first_invalid_cid = $this->connection->select($this->bin)
    ...
    +          ->orderBy("{$this->bin}.cid", 'ASC')
    ...
    +            ->condition("{$this->bin}.cid", $first_invalid_cid, '<')
    

    This is ordering by cid, it should be ordering by created.

  3. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackendFactory.php
    @@ -36,12 +36,12 @@ class DatabaseBackendFactory implements CacheFactoryInterface {
    -   *   The settings array.
    +   *   (optional) The settings array.
        */
    -  public function __construct(Connection $connection, CacheTagsChecksumInterface $checksum_provider, Settings $settings) {
    +  public function __construct(Connection $connection, CacheTagsChecksumInterface $checksum_provider, Settings $settings = NULL) {
         $this->connection = $connection;
         $this->checksumProvider = $checksum_provider;
    -    $this->settings = $settings;
    +    $this->settings = $settings ?: Settings::getInstance();
    

    +1!

  4. +++ b/core/modules/system/system.install
    @@ -1986,18 +1984,3 @@ function system_update_8401() {
    -function system_update_8402() {
    

    Yay!

  5. The test failure is in the newly added test coverage that ensures this is working correctly:
    1) Drupal\KernelTests\Core\Cache\DatabaseBackendTest::testGarbageCollection
    Failed asserting that 101 is identical to 100.
    

    Should be easy to fix. Also: yay for tests.

dawehner’s picture

This 50000 should match whatever default limit we settle on, it's 10000 at the moment.

Oops, I copied it from the wrong part of core.api.php

This is ordering by cid, it should be ordering by created.

Yes, we indeed need that, given that CID is not a numerical ID. Sadly this means we need a new index on 'created', which also means a little more work on every insert.

Wim Leers’s picture

Status: Needs work » Needs review

Great! Let's see if this passes tests :) I can only find nits.

I think this is RTBC as soon as it is green, but I probably can't RTBC it because I worked too much on it.

  1. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -62,7 +62,7 @@ class DatabaseBackend implements CacheBackendInterface {
        *   (optional) The maximum number of rows that are allowed in this cache bin table.
    

    Nit: 80 cols.

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -505,6 +505,7 @@ public function schemaDefinition() {
             'expire' => ['expire'],
    +         'created' => ['created'],
    

    Supernit: the new line here has one leading space too many.

  3. +++ b/core/modules/system/system.install
    @@ -1984,3 +1986,20 @@ function system_update_8401() {
     }
    +
    +
    +/**
    

    Supernit: should be one \n, not two.

dawehner’s picture

Supernit: the new line here has one leading space too many.

Ha, I knew something looked odd.

Wim Leers’s picture

I think this is RTBC as soon as it is green, but I probably can't RTBC it because I worked too much on it.

Who's willing to RTBC? :) 🙏

dawehner’s picture

I would put my bets on catch.

xjm’s picture

Berdir’s picture

#2892179: Follow-up for #2775381: clearing render cache in PerformanceForm::submitForm() is obsolete has an interesting impact on this as we go from 10k entries in cache_render to 10k entries in cache_render and cache_page each.

Was there a discussion on the default size? Can't seem to find it.

cache_render entries are usually fairly small, but the page cache stores the full HTML page. On a newspaper site that I just checked, a randomly picked cached page for an article was ~100k, plus a few k for cache tags and other metadata. According to https://www.keycdn.com/support/the-growth-of-web-page-size/, the average size of the HTML part is 67kb.

At 100k per entry, 10k entries results in ~1GB; at the average of 67k it is, surprise, around 670MB. Plus indexes, cache tags, ...

That's still a pretty large number for a small site, assuming they actually do get 10k different requests.

Does make me wonder if the default should be lower.

Wim Leers’s picture

No, the actual number wasn't discussed. I mistook @catch's mention of 10K in #180 for an actual suggestion. So I implemented it in #200. He clarified in #212 that he didn't mean to suggest that. He said there that we might want to base our limit on the data in #206. So, this:

+--------------------------+--------------+------------+--------+-------------+---------------------+
| Tables in StageDB        | Size on Disk | Size in DB | Rows   | Index in DB | Index-to-data ratio |
+--------------------------+--------------+------------+--------+-------------+---------------------+
| cache_render             | 1088.00 MB   | 1013.91 MB | 34402  | 6.61 MB     | 0.01                |
| cache_dynamic_page_cache | 360.00 MB    | 328.83 MB  | 6955   | 4.59 MB     | 0.01                |
| search_index             | 76.00 MB     | 40.48 MB   | 380384 | 24.33 MB    | 0.60                |
| cache_entity             | 76.00 MB     | 63.64 MB   | 8323   | 0.47 MB     | 0.01                |
| cache_data               | 76.00 MB     | 61.84 MB   | 19656  | 3.52 MB     | 0.06                |
+--------------------------+--------------+------------+--------+-------------+---------------------+

A newspaper site is probably higher end and has more KBs of HTML. @mariagwyn's data suggests a much smaller size. Still, I think it's fine to base it on 67KB (the average you cite) or 100KB (rounding up, and taking future growth into account).

Although I do want to call out that it's actually quite hard to have 10K entries in cache_dynamic_page_cache — you need a pretty big/complex site to reach those numbers — many/most small/medium Drupal sites won't get to that point.

Honestly, I don't care too much about the exact number — we can still tweak that in the future. I think @Berdir may in fact be one of the people with the most data available to them, so I'd suggest … let @Berdir decide? :) As long as you can provide a rationale for the number you propose, @Berdir, so that we can revisit it in the future in a structured, reasoned way, then I'm happy to trust & use whatever "max rows" limit you propose!

Wim Leers’s picture

Oh and in #223 I think you meant #2889603: Split the internal page cache from the rest of the render cache, not #2892179 :)

It's important that we keep that in mind while doing this, indeed, but it also means you'll actually be able to control the render cache & page cache separately: you can limit the render cache to a much lower number than the page cache, or vice versa, depending on your use case. Right now, it's impossible to impose different limits. My point being: there's a good side to #2889603 for this issue as well!

xjm’s picture

(Don't think the tag change was intentional. Carry on.)

fgm’s picture

Although I do want to call out that it's actually quite hard to have 10K entries in cache_dynamic_page_cache — you need a pretty big/complex site to reach those numbers — many/most small/medium Drupal sites won't get to that point.

Actually, on an almost-small site (15k nodes, 1 user) I'm currently working on, I just checked and see 11k items currently in the DPC bin, and 80k items set in that bin over the last 3 days, so you definitely don't need to be newspaper-size to reach these numbers: I think a good number of D8 sites will be at that scale or above. For comparison, the render bin currently has 340k items, and got 525k items set in 3 days.

Wim Leers’s picture

@xjm Thanks!

@fgm 15K nodes is not "small" anymore in my book :) It all depends on your POV.
DPC roughly scales linearly with the amount of content in your case. That makes sense.

fgm’s picture

@Wim Leers : I meant small as in something that can be done by a two-person army, as opposed to media sites which tend to be in the umpteen millions of nodes, but OK.

On another train of thought, I had another idea this weekend for this issue:

  • Instead of focusing on purge-time strategies, what if we changed the logic at set time? I mean, whenever an item is set in the cache with Cache::PERMANENT, automatically add an actual TTL computed from a random-ish value within some site-dependent range (probably built from the total number of rows in all entity base tables or something like that),
  • retain use of Cache::PERMANENT to estimate staleness
  • but use /that/ value at purge time.

It would enable existing logic to keep working on items classified as permanent, while allowing expirations to proceed normally. Plus, it would stop our use of a negative "magic number" in TTLs, allowing them to become unsigned.
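
A rough sketch of that set-time idea (everything here is a placeholder: the one-week base TTL, the +/- 10% jitter, and where exactly in a backend's set() this would live):

// Inside a cache backend's set() implementation:
if ($expire === Cache::PERMANENT) {
  // Derive a site-dependent base TTL; one week is just a placeholder.
  $base_ttl = 604800;
  // Jitter, so that not all "permanent" items expire at the same moment.
  $jitter = random_int(-intdiv($base_ttl, 10), intdiv($base_ttl, 10));
  // Store this value for purge purposes, while items remain logically
  // permanent for staleness checks.
  $expire = \Drupal::time()->getRequestTime() + $base_ttl + $jitter;
}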

Wim Leers’s picture

#229: the approach you propose is what this issue was doing originally. The problem is that you can then still easily grow beyond your bounds, because doing TTL estimation based on current number of rows in a cache table does not say anything about future row insertions.

The current patch also allows expirations to proceed normally, as long as the number of rows is smaller than the maximum number of rows.

fgm’s picture

Well, I was not advocating basing the estimate on the number of rows in the table, but on the sum of the number of rows in the entity base tables, which is IMHO a decent heuristic of what to expect for many (though not all) bins, be it directly (render, page, entity, data) or indirectly (other bins), because sites with more content tend to have more complexity anyway, leading to bigger menus, containers, config, etc.

But sorry to have missed that in the initial patches; I read them too fast.

mpdonadio’s picture

  1. +++ b/core/core.api.php
    @@ -606,6 +606,17 @@
    + * For cache bins that are stored in the database, the number of rows is limited
    + * to 10,000 by default. This can be changed for all database cache bins:
    + * @code
    

    10000 is used as a literal in a few places in the patch. Do we want a constant?

  2. +++ b/core/modules/system/system.install
    @@ -1984,3 +1986,18 @@ function system_update_8401() {
    +function system_update_8402() {
    +  foreach (Cache::getBins() as $bin => $cache_backend) {
    +    if ($cache_backend instanceof DatabaseBackend) {
    +      $table_name = "cache_$bin";
    +      $schema = Database::getConnection()->schema();
    +      if ($schema->tableExists($table_name)) {
    +        $schema->dropTable($table_name);
    +      }
    +    }
    +  }
    +}
    

    Consider the following:

    - I installed Drupal before this patch. DatabaseBackend got used.
    - I installed Memcache. MemcacheBackend (or whatever it is called) is used, but the old cache_* tables are still in the database.
    - I update Drupal w/ this patch applied, and this hook runs. Code here will only update bins (and therefore tables) that are DatabaseBackend.
    - I uninstall Memcache. Now DatabaseBackend, but there is a schema mismatch on the tables that existed but were essentially unused when Memcache was installed.

    I think we can safely remove the `if ($cache_backend instanceof DatabaseBackend)` check because of the `if ($schema->tableExists($table_name)) {` check, and this will fix that scenario.

    ?

ndobromirov’s picture

@mpdonadio from #232.1 - I think it's better for this to be configurable. The default will be 10,000. There will be no UI for it, but it can be changed if that's deemed necessary for a particular site.

Berdir’s picture

@mpdonadio did not suggest not making it configurable, just that we make the default value a constant, so it's easier to change in the future if we want to. At least that's what I understood.

And yes, agreed with the update function, that's a very good point. It's quite common that production uses memcache/redis but the database tables are still there, and then developers sync that to their local site where they use the database. It's no longer that big of a deal because the only change is the added index; we no longer have the field.

Re #224: all our D8 sites use redis, so I can't directly answer that question. What I can do is show how it distributes across the different bins on our smaller and bigger sites given a fixed max size, which I think is also quite interesting.

On a small site with a limited amount of content, redis is using only 56M of memory: 8200 keys. 2900 of those are in the render cache bin (about 480 of those are pages), 330 in dynamic_page_cache, 1500 in data, 1000 in entity (including files and so on). So without limits, the render cache seems to contain about 10x the number of entries of the dynamic page cache, but most of them are far smaller than page cache and dynamic page cache entries. It also means that the default limits we have would not do anything on this site. Which IMHO also means that we could go (even considerably) lower with our limit by default, as long as we clearly document that bigger sites might want to increase it if they have the storage to do so. Some of the reports in this issue claim their sites were shut down by their hoster at 250MB.

Looking at a bigger site, redis uses the available 1GB of memory, with 100k keys. The site doesn't use dynamic_page_cache currently, and the entries are distributed among the bigger bins like this:

render: 22712 (6300 of those are page cache).
data: 20138
entity: 14530
menu: 3563

The site has far more content than it currently stores in the cache (but most of that isn't requested frequently), so this is how redis distributes it based on its LRU algorithm. But most sites like that use memcache/redis, so I think we don't have to worry too much about those. Our main use case is smaller sites like the first one. And 10k seems quite large for those; 5k or even lower seems like it should be enough as a default. And I think we might want to document that sites with very low limits on how much database storage they have should set it as low as 1k?

Wim Leers’s picture

Assigned: Unassigned » catch

#232.2 That's an impressive edge case that you caught there! :)

#234: document that sites that have very low limits on how much database storage they have should set it as low as 1k? — if that's the case, I'd rather set the default to 1K. We should ensure Drupal runs fine for those without technical knowledge. Those that run bigger sites are more likely to have the necessary skills/insights/knowledge to increase these limits. Besides, even "just" 1000 cache table rows gives you a fair amount of scalability gain: 1000 Dynamic Page Cache table rows already means hundreds of pages that barely cost any processing power.

Based on the argument made in #234, I'm fine with setting the default limit to 1000. Assigning to @catch to get his thoughts on this.

fgm’s picture

Based on the metrics in #234, I did a check on sizes on a running medium-size site (+/- 23k entities) for comparison, over a 2-week period. None of the bins have any evictions, so capacities are not saturated here.

  • data: 125k items, 800k items set, 170M = 1.3k/item
  • dpc: 15k items, 175k items set, 255M = 17k/item
  • entity: 20k items, 100k items set, 40M = 2k/item
  • menu: 6k items, 43k items set, 15M = 0.7k/item
  • render: 300k items, 2M items set, 730M = 2.4k/item

This seems to be pretty much in line with the figures in #234:

  • render contains 20 times the number of entries in DPC, but is only 3 times larger.
  • all the bigger items in render are actually page cache items
  • Also, over these 2 weeks, items tend to be re-set 6 to 10 times, making it more or less once every other day. Not sure what to make of that.
mpdonadio’s picture

Addresses my comments in #232.

+function system_update_8402() {
+  foreach (Cache::getBins() as $bin => $cache_backend) {
+    // Try to delete the table regardless of which cache backend is handling it.
+    // This is to ensure the new schema is used if the configuration for the
+    // backend class is changed after the update hook runs.
+    $table_name = "cache_$bin";
+    $schema = Database::getConnection()->schema();
+    if ($schema->tableExists($table_name)) {
+      $schema->dropTable($table_name);
+    }
+  }
+}

DatabaseBackend::removeBin() has the dropTable() inside a try/catch. Do we want one here? Not sure what to do if it actually throws an exception here...
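
If we do want one, it could look like this (a sketch only, assuming the same use statements as the patch; whether silently swallowing the exception is the right call is exactly the open question):

function system_update_8402() {
  $schema = Database::getConnection()->schema();
  foreach (Cache::getBins() as $bin => $cache_backend) {
    $table_name = "cache_$bin";
    try {
      if ($schema->tableExists($table_name)) {
        $schema->dropTable($table_name);
      }
    }
    catch (\Exception $e) {
      // Mirror DatabaseBackend::removeBin() and ignore the failure; it is
      // not clear what else an update hook could usefully do here.
    }
  }
}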

catch’s picture

Based on the argument made in #234, I'm fine with setting the default limit to 1000. Assigning to @catch to get his thoughts on this.

I still think this is going to result in cache churn, such that sites end up doing cache rebuilds on expensive caches much more regularly, similar to the Drupal 7 bug where the entire cache was cleared every cron run - i.e. the cure could end up being worse than the disease here. So I'm really not sure what to do about this issue at all at this point.

FatherShawn’s picture

If the limit is configurable, and a reasonable value is chosen, then we will be able to tune this to the particular site. With some experience we can adjust the default in future versions.

mpdonadio’s picture

Per IRC w/ @catch and @berdir, we are going to lower the limit to 5000.

Made followup #2896473: Add monitoring/reporting of cache table pruning.

Berdir’s picture

Status: Needs review » Reviewed & tested by the community

I updated the change record with the new default, and also included a query to show the size of all cache tables, as well as a mention of the follow-up.

I think we are good to go here.

Andre-B’s picture

Since this is the default implementation for all database cache backends, I believe the limit of 5k is not sufficient for most mid-size Drupal projects.

If I understand correctly, this would be used for cache_entity, cache_menu, cache_render and the newly added cache_page. I did a quick check on one of our smallest projects, which already exceeds 5000 entries in the cache_entity, cache_menu and cache_page bins.

Regarding the limit for cache_page: do not forget that redirects, taxonomy term pages (including pagers), search results + pagers all count towards the same limit - I believe with any usual project this limit can and will be reached sooner rather than later.

Berdir’s picture

Then you can increase the setting; that's pretty easy and documented, if you do have the storage available for that.

And how old are the oldest cache entries in that table? Have they been created in the last 24h, the last few days, more than a week ago? I'm not worried if they are older than a few days; chances are that especially the render cache entries (cache_page in 8.x only exists as of today or so) have been invalidated in that time already anyway and are dead weight (due to a node_list or rendered cache tag invalidation, for example).

xjm’s picture

Issue summary: View changes
Status: Reviewed & tested by the community » Needs work

Does the page cache split affect the default size of the bin?

  1. +++ b/core/core.api.php
    @@ -606,6 +606,17 @@
    + * to 5000 by default. This can be changed for all database cache bins:
    + * @code
    + *  $settings['database_cache_max_rows']['default'] = 50000;
    + * @endcode
    

This says 5000, but then the example sets it to 50,000.

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -17,6 +17,25 @@
    +   * * @see ::MAXIMUM_NONE
    +   *
    +   * @var int
    

    @see should be at the end of the docblock, and also not have two asterisks.

  3. Should we also be adding a hunk to settings.php, or changing one? Should we be adding any recommendations for users in the documentation on how to affect the setting? It seems like there's more documentation in the change record than there is here.
dawehner’s picture

This says both 5000 sets it to 50,000.

Well, the idea is to provide an example of a different value :)

dawehner’s picture

I'll work on expanding the documentation a bit.

xjm’s picture

Couple other small fixes:

  1. +++ b/core/core.api.php
    @@ -606,6 +606,17 @@
    + * @code
    + *  $settings['database_cache_max_rows']['default'] = 50000;
    + * @endcode
    + *
    + * Or per bin (in this case removing the maximum):
    + * @code
    + *  $settings['database_cache_max_rows']['bins']['dynamic_page_cache'] = -1;
    + * @endcode
    

    The code snippets are also indented incorrectly (one space instead of two).

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -17,6 +17,25 @@
    +   * -1 means "no maximum".
    

    "...when used as the maximum number of rows for the cache backend."

I thought about whether it made sense for the constants to only be exposed to the database implementation, and I think it does, so no actionable feedback there.

dawehner’s picture

Status: Needs work » Needs review
FileSize
17.23 KB
1.78 KB
dawehner’s picture

I hadn't seen #247 when posting. I adapted the message from #247.2 to fit into 80 chars.

xjm’s picture

The docs improvements look good to me, thanks! One small nitpick:

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -18,6 +18,11 @@ class DatabaseBackend implements CacheBackendInterface {
+   * @see Read more how to change it at the cache paragraph in core.api.php.

@see doesn't really work this way, so we could probably use @link, since we can't really link to a specific entry in core.api.php.

Still wondering if we should put something about this in default.settings.php. There's a section with the header "Page caching:" that is actually just the comment for the example of $settings['omit_vary_cookie'] = TRUE;. Then there are sections on $settings['cache_ttl_4xx'] = 3600; and $settings['form_cache_expiration'] = 21600;, APC configuration, etc. If I were reading settings.php while configuring a particular site, I'm not sure how I'd find the relevant section in core.api.php. (This also affects HEAD I think, but we're expanding its impact here.) Thoughts?

xjm’s picture

+++ b/core/core.api.php
@@ -606,6 +606,29 @@
+ * For cache bins that are stored in the database, the number of rows is limited
+ * to 5000 by default. This can be changed for all database cache bins:
+ * @code
+ * $settings['database_cache_max_rows']['default'] = 50000;
+ * @endcode

Also, to clarify this, I think we should say "This can be changed for all database cache bins. For example, to instead limit the number of rows to 50,000:"

anavarre’s picture

Sharing a bit more real-life data. 3 random sites, different traffic patterns.

site1

  • cache_render: 1230249 rows, 28493.25MB
  • cache_dynamic_page_cache: 13603 rows, 813.20MB

site2

  • cache_render: 1843819 rows, 29701.78MB
  • cache_dynamic_page_cache: 140986 rows, 19052.25MB

site3

  • cache_render: 766997 rows, 37994.30MB
  • cache_dynamic_page_cache: 152522 rows, 5526.31MB

This obviously doesn't account for changes in #2889603: Split the internal page cache from the rest of the render cache

Berdir’s picture

Status: Needs review » Needs work

@anavarre:

Not 100% sure if you are trying to say something specific with that data :)

Same questions about your data as I asked @Andre-B in #243. How old are those cache entries: is this data generated in hours, days, weeks? Easy to check with a SELECT FROM_UNIXTIME(MIN(created)) FROM cache_render. Either way, your sites all seem to be a bit bigger, with 13k-150k entries in dynamic page cache, which means you have at least that many context-variation * URL combinations. IMHO, it seems acceptable that a site like that needs to customize the cache bin size if they want it to use that amount of data.

If the data is indeed generated in just hours, then I guess you could have the scenario that @catch is worried about, where we are writing and deleting so much data that the overhead of cleaning it up is basically larger than what we save by caching it in the first place. But we have the follow-up to provide some reporting on that, if we figure out a nice way to do that, so sites in such cases are informed about this and can customize it. And I still think that scenario is preferable to a full disk or a very large bill from a hosting provider that auto-grows the disk ;)

I also still have a @todo for writing a blog post on optimizing cache_render usage a bit; a few things were already mentioned here, like hiding the local tasks/actions blocks for anonymous users.

@xjm #247: About the separate page cache bin: I asked/discussed the same in #223 and I think it is OK. On large enough sites, we go from typically 3 bins reaching the max (render, dynamic page cache, data) to 4. cache_page will likely be the largest of those, although probably comparable to dynamic page cache.

Setting to needs work for #251.

Andre-B’s picture

@Berdir - the data I mentioned can be generated in less than 12 hours. Usual bot traffic from Google, Microsoft, various SEO tools and whatever robots you can imagine will easily fetch more than 5k URLs within a day. If your site has no Varnish in front of it to do time-based caching, you rely heavily on Drupal's page cache being there. Regarding the invalidation aspect - yes, that was an issue, and we fixed it with two solutions:
1) The first was to eliminate offending cache tags as much as we could; we had a scenario in which it was perfectly fine for a view not to be refreshed if the page itself hadn't changed.
2) The second was to modify the page cache itself to behave like a time-based cache and cache those pages for 30 minutes to 7 days.

With either solution, we implemented ways for editors to refresh pages before the TTL is reached - this can probably be achieved with the Purge module as well.

After all, defining which caches are meaningful and should be kept had the most impact and reduced the cache_render table the most for us. The IO waste for unused cache entries is important as well - especially if you have to clean up that mess on cron.

flocondetoile’s picture

Status: Needs work » Needs review
FileSize
17.33 KB
547 bytes

Patch updated with #251

anavarre’s picture

#253

Not 100% sure if you are trying to say something specific with that data :)

Oh sorry, I was trying to give real-life data for sites I know have moderate to very large cache_render and cache_dynamic_page_cache tables. I was expecting the number of rows would help with the target number being discussed here.

Easy to check with a SELECT FROM_UNIXTIME(MIN(created)) FROM cache_render.

Not so easy, actually. With 1M+ rows this is a very expensive operation. What I can say is that, due to #2752961: No reliable method exists for clearing the Twig cache, every code deploy needs a drush cr, which gives a good indication those tables are being rebuilt frequently (every 2 weeks in the worst-case scenario seems reasonable).

a few things were already mentioned here, like hiding local tasks/actions blocks for anon users.

I tried that a few weeks ago as it seemed a very straightforward win. Couldn't notice a significant decrease in the table usage in the long run, unfortunately.

xjm’s picture

Status: Needs review » Needs work

#250 is also not addressed yet.

fgm’s picture

Status: Needs work » Needs review
FileSize
17.39 KB
1.63 KB

Here is a rerolled version accounting for the changes in #250 and also

- fixing the wording in the next comment paragraph, which didn't make sense (to me)
- fixing the capitalization in the MySQL example in the same paragraph.

anavarre’s picture

+++ b/core/core.api.php
@@ -620,8 +621,8 @@
  * information_schema.TABLES where table_schema = '***DATABASE_NAME***' and
+++ b/core/core.api.php
@@ -620,8 +621,8 @@
  * limit 10;

If you're going to capitalize the SQL query, then we need consistency.

fgm’s picture

Indeed. Sadly, the name of the table is actually uppercase, so it looks a bit strange nonetheless.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

anavarre’s picture

Spent some time testing the patch (applies successfully). This looks to be working okay, on a fresh Drupal install.

cache_render doesn't exist yet:

$ drush @drucker.local sqlq "SELECT COUNT(*) FROM cache_render;"
 [error]  Query failed.

Create a bunch of test nodes:

$ drupal create:nodes article --limit="1000" --title-words="9" --time-range="1" --quiet

Then crawl a few of them:

$ for i in {1..100} ; do http --output=/dev/null http://drucker.local/node/$i ; done

Now we have enough stored items to start testing:

$ drush @drucker.local sqlq "SELECT COUNT(*) FROM cache_render;"
521

Allow only 50 entries in cache_render:

echo "\$settings['database_cache_max_rows']['bins']['render'] = 50;" >> settings.php

Running cron:

$ drush @drucker.local cron
 [notice] Starting execution of comment_cron().
 [notice] Starting execution of dblog_cron(), execution of comment_cron() took 23.59ms.
 [notice] Starting execution of field_cron(), execution of dblog_cron() took 0.85ms.
 [notice] Starting execution of file_cron(), execution of field_cron() took 14.66ms.
 [notice] Starting execution of history_cron(), execution of file_cron() took 18.39ms.
 [notice] Starting execution of node_cron(), execution of history_cron() took 0.59ms.
 [notice] Starting execution of search_cron(), execution of node_cron() took 25.96ms.
 [notice] Starting execution of system_cron(), execution of search_cron() took 9601.28ms.
 [notice] Starting execution of update_cron(), execution of system_cron() took 10.84ms.
 [notice] Execution of update_cron() took 1128.71ms.
 [notice] Cron run completed.

And now the cache_render table is down to 50 items as anticipated.

$ drush @drucker.local sqlq "SELECT COUNT(*) FROM cache_render;"
50
flocondetoile’s picture

About #250

Still wondering if we should put something about this in default.settings.php

+1 for putting some words about this setting, and a code example, in this file, as for many other available settings. And settings.php is, I think, a much more widely read file than core.api.php, at least on a first look.

Berdir’s picture

Status: Needs review » Needs work
  1. +++ b/core/core.api.php
    @@ -606,10 +606,34 @@
      *
    - * @see https://www.drupal.org/node/1884796
    + * @link https://www.drupal.org/node/1884796 @endlink
      * @}
    

    This was the wrong @see being changed. Having a @see with a link is valid; what is not valid is the @see with just text a few lines below.

    @see https://www.drupal.org/docs/develop/coding-standards/api-documentation-a...

    ;)

  2. +++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
    @@ -17,6 +17,30 @@
    +   *
    +   * @see Read more how to change it at the cache paragraph in core.api.php.
    +   */
    

    this one.