Problem/Motivation
All our render cache entries (pages, entities, blocks, views and so on) are cached forever by default.
That's great, except when you have a lot of data plus a lot of variations and a cache backend that, like the default database backend, doesn't support something like LRU eviction. Then your cache table grows and grows and it's suddenly not so great anymore :)
You could argue that sites with a lot of data should use Redis/Memcache rather than the database, but medium-sized sites without that many users can run into problems eventually too, also with regard to copying the database around, since only a few people remember to ignore cache tables.
Proposed resolution
There has been a lot of discussion around how much you actually want to truncate:
- $max_amount / 2
- $max_amount
- $max_amount with some jittering:
Either the render cache service or the database backend (per-bin) should support an upper limit for the max-age. E.g. a day or a week. It would probably be great to vary that slightly for every write, e.g. +/- 5% so not all cache entries expire at the same time.
Using just $max_amount was chosen for simplicity.
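For illustration only, such a per-bin limit could be configured in settings.php along these lines (the key names here are hypothetical; the actual naming is debated at length in the comments below):

```php
// Hypothetical settings.php fragment: cap the lifetime of otherwise
// permanent items in the render cache bin at one week (604800 seconds).
$settings['cache_database']['render']['max_expire'] = 604800;
```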
Remaining tasks
User interface changes
API changes
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #297 | 2526150-297.patch | 17.55 KB | mpdonadio |
| #294 | interdiff-290-294.txt | 3.07 KB | Anonymous (not verified) |
| #294 | 2526150-294.patch | 17.56 KB | Anonymous (not verified) |
| #290 | 2526150-290.patch | 18.25 KB | wim leers |
| #267 | interdiff-266-267.txt | 1.06 KB | mpdonadio |
Comments
Comment #1
dawehner commented
I think making it per bin makes sense, as there might be other bins where this is useful.
Also having this as part of the cache configuration helps you to get an overview inside settings.php, in case needed.
Comment #2
fabianx commented
Comment #3
fabianx commented
I think if we do that, we should also have some pseudo-LRU:
Proposed parameters are per bin:
- max_age: 86400
- max_age_jitter_percent: 5
- refresh_cache_time: 3600 // Do not let frequently accessed data expire, re-cache it instead.
< berdir> Fabianx-screen: that would be interesting, but then we need two max-ages.. since some items might have a "hard" max-age
< Fabianx-screen> berdir: Yes, but the cache backend can distinguish with a simple flag:
< Fabianx-screen> $data->is_default_max_age => TRUE / FALSE,
Comment #4
berdir commented
I'm not a huge fan of introducing that complexity here. It would require storage changes in all the cache tables, for one thing...
Comment #5
wim leers commented
I don't think this actually fixes the problem.
We only check if a cache item is expired when retrieving that cache item. Which means that any render cache item that is accessed infrequently (or never again) also does not get removed.
The only way I think we can solve this is by evicting expired cache items on cron, i.e. implementing LRU anyway.
Comment #6
wim leers commented
So berdir pointed out that we already do what I just described, in \Drupal\Core\Cache\DatabaseBackend::garbageCollection(), called by system_cron(). Ok :)
Comment #7
fabianx commented
#5: system_cron() runs:
Is that not sufficient?
Comment #8
wim leers commented
Given that, +1 to the proposed solution in principle:
… but I still have some concerns:
1. The Age header: we're able to base that on the render cache items' created timestamps, taking the maximum value of those. This change would mean the Age would never exceed a week, for no good reason other than sites with huge amounts of content getting into trouble.
3. Keep DatabaseBackend::garbageCollection(), but on top of that also delete the oldest cache items (by looking at the created timestamp): say, the 1% oldest cache items.
I don't think we can implement 1. I do think 2 is a useful thing to take into account, but it's not a solution. But 3 seems like a viable alternative solution?
I do think the already proposed resolution works, but then my second point is a concern. I wonder what you guys think of point 3 as an alternative solution? Just trying to make sure we end up with the best possible solution :)
Finally: what do other CMSes/web frameworks do for this?
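Point 3 above could be sketched roughly as an addition to the database backend's garbage collection. This is illustrative only (not the actual implementation), assuming direct access to the bin's table via Drupal's database API:

```php
// Sketch: on cron, delete the oldest ~1% of items in a bin by their
// 'created' timestamp. Illustrative only, using the cache_render table.
$total = $connection->select('cache_render')->countQuery()->execute()->fetchField();
$limit = (int) ceil($total * 0.01);
$oldest = $connection->select('cache_render', 'c')
  ->fields('c', ['cid'])
  ->orderBy('created', 'ASC')
  ->range(0, $limit)
  ->execute()
  ->fetchCol();
if ($oldest) {
  $connection->delete('cache_render')
    ->condition('cid', $oldest, 'IN')
    ->execute();
}
```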
Comment #9
dawehner commented
At least TYPO3 just defines a default lifetime of an hour; see http://wiki.typo3.org/Caching_framework
Comment #10
fabianx commented
#8:
I don't think it belongs in render cache at all.
It is a pure database cache backend implementation detail and should have no effect on other metadata that we store extra.
1. I think we should at least move cache.page to its own cache_page bin again.
2. For the age header we should not depend on internal data of the cache backend (that is an implementation detail and not part of the interface), but rather provide our own in #cache. (similar with downstream-ttl and ensuring proper max-age is working correctly).
3. I still think that automatically refreshing still-active cache items is the sanest solution to have pseudo-LRU in the database backend - configurable via settings for only that backend (so others that are likely LRU already can support it, but don't need to).
That would then also take care of #2, as the about-to-expire cache item would just be refreshed (and retain its age). It is also in principle similar to how If-Modified-Since works, where the backend returns a 304 when its data still matches the cache, and the reverse proxy just returns its cached item.
( in this case the cache tags take care of that.)
And I don't think it is complex to implement.
It is one Settings call in __construct() and 5 lines of code in get() and 5 lines of code in set(), plus test coverage.
---
Thinking about 1. again - keep smaller fragments for longer, keep larger fragments shorter.
That kind of policy is very opposite to LRU.
I am not sure, it is something we should think about, but it is out of scope here I think.
Comment #11
wim leers commented
I said the opposite.
It is not, because it's related.
Everything I wrote in #8 was about this:
i.e. making sure we take all things into account.
Comment #12
fabianx commented
Okay, I actually would have thought that smaller fragments change less, as they usually have less variation :p.
#11: The problem is that 3) is not LRU and makes refreshing impossible: the oldest active cache items are very likely the ones that were created first on a cold cache, but those are very likely the front page and other things that are accessed highly frequently and that an LRU policy would not expire.
So we would do the exact opposite, while the goal is to expire things that have not been accessed since day X. (LRU)
But LRU is very costly, because you need a write for each cache_get().
The nice thing about the refresh-pseudo-LRU is that you only need a refresh when you are near the set expiration time.
e.g. for my own sites based on DB I would probably use:
expire=1 week (+/- 10% jitter)
working set=1 day
Everything that was accessed within 1 day would be kept permanently, which is the working set of the site.
So old GA crawled urls would expire, but my front page would not.
Edit:
This would also in general favor larger fragments (as they would get accessed more often) and expire the smaller fragments automatically.
e.g. if my front page does not change and all that is accessed is my front page, then all the block and entity caches would expire after one week.
If my front page then changes, obviously there is a larger cache miss, but that is the trade-off to make and it would happen with any other LRU system. (Interesting observation: LRU has problems with hierarchical cache items - I need to do some research on whether there is something in the literature on that.)
However if there are uncached pages then obviously the most active blocks would still be refreshed (unless they depend on url / route obviously).
So I think the trade-off in general is okay.
---
Edit 2:
Putting a max number of items in place only works when there is pressure to evict, which again makes this more complex.
Given our pseudo-LRU it would work like this: it would check the configured number of items and only expire expired items until the threshold is reached again.
In this case however I would not restrict the working set, but rather only the number of expired items to keep.
So it is orthogonal to the approach and could help to keep some expired items around - however I don't think it is necessary.
Comment #13
thenchev commented
In this approach a user can define in settings.php a max age for the appropriate bin.
Comment #15
thenchev commented
Oops, forgot to change some stuff from earlier. Here is the new one.
Comment #16
berdir commented
Ok, we also need some tests, for example in DatabaseBackendUnitTest. Set the setting, then write something and check the expiration, possibly directly in the database.
Thinking about it, we should probably not call out to Settings directly in this class. Instead, let's add an optional array $configuration = [] to the constructor, store it in $this->configuration, and then pass in only the settings for the specific cache bin, from DatabaseBackendFactory.
Comment #17
thenchev commented
Comment #19
berdir commented
No fallback to Settings here.
DatabaseBackendFactory needs to load them from Settings and then inject them.
$configuration is not meant to be keyed by $this->bin. DatabaseBackendFactory just passes the configuration for that bin to it. But we should add a key inside to allow adding additional settings later.
So: if (isset($this->configuration['max_expire']) ...
Don't change the arguments here.
Just call new DatabaseBackend directly in the new test.
The new test looks like a good start, but test with a few different values. Check with an expiration that's one lower and one higher.
Comment #20
thenchev commented
1) Moved the loading of settings to the DatabaseBackendFactory constructor, and now we are passing the value for the bin if available.
Also changed to
2) Removed changes to this part. Now calling DatabaseBackend directly in tests, and added 2 more tests as suggested.
Comment #21
berdir commented
This should explain a bit better what exactly is in there.
Something like "Per-bin database backend configuration."
We also need to actually document this somewhere; often that's done in default.settings.php, see the other keys there.
Also, instead of settings, we could also use container parameters. The difference is that they would be configured in local.yml and passed in to the service using %some_key.
Cache backends are currently in settings, that's why I suggested this. But I'm also fine with a container parameter.
Missing space after (int)
Comment #22
wim leers commented
Comment #23
thenchev commented1. Updated explanation
2. Added an explanation in default.settings.php. Not exactly sure what information is needed for this kind of setting, but here is my first try.
Since I added the explanation in default.settings.php and I already added the settings with Settings::get('cache_database', []); I left it like that. If container parameters are better I can change...
3. .
Comment #24
berdir commented
Instead of a code example, just make the commented-out code that example.
And we might even want to enable it by default and set it to a week or so.
To provide a data point here. We recently launched a small local news site. They didn't import their existing content so they started with a practically empty database.
However, cache_render grew to 1.3GB within days. As far as I could see, mostly due to 404 pages from Google and other bots. That never goes away without a manual full cache clear.
Comment #25
thenchev commented
Comment #26
borisson_ commented
Removing the "needs tests" tag, because it already has tests. I also changed the DatabaseBackend to make sure that anything passed in $configuration is an actual array.
Also changed all array definitions to short syntax.
Comment #27
olli commented
Closed #1947852: Database cache backend garbage collection does not work with permanent entries as a duplicate.
$expire is a timestamp and configuration['max_expire'] is a duration, so I think we need to add REQUEST_TIME to $expire > $this->configuration['max_expire']?
So I would add $settings['cache_database']['default'] = 86400; to settings.php? How about something like $settings['cache_database']['default']['max_expire'] = ...?
One of these could use an $expire param greater than the 'max_expire' setting.
#24:
Should we add a default?
Comment #28
borisson_ commented
#27
No interdiff sadly.
Comment #29
thenchev commented
Comment #30
thenchev commented
@olli
1. You are right.
The tests have some flaws; I will update them and look at your other suggestions.
Comment #31
thenchev commented1) Fixed set method.
2) Default value and bin now in settings and factory class.
3) Fixed tests.
Comment #32
berdir commented
Coding style: comments need to end with a ".". Could also be a bit clearer:
// If set, enforce an upper limit for the expiration to allow for garbage collection.
Suggestion:
// Load settings, defaulting to a max expiration of one week for the render cache bin.
$configuration and backend are always the same now, so you can just re-use it?
Also, add another test for setMultiple(). That's a completely separate implementation right now that needs the same check.
I would write this to allow for additional settings later. Make it a list (* - max_expire: ...) and also explain a bit what this is for: that some cache bins can get very big and persistent cache entries are never removed from them. Setting this will allow them to be garbage collected after the configured time, with the downside that they will have to be rebuilt if requested again.
Comment #33
hussainweb commented
I have made the changes specified in #31 and fixed some more nitpicks I found. I added a test as pointed out in 3, but I am not sure if it would work the same way as the earlier test. From what I read of the code, I expect this to fail.
Comment #35
berdir commented
Yes, I expected it to fail; now we need to fix the test failure :)
Comment #36
hussainweb commented
Attempting a fix. General observation: the code and tests assume that the max_expire setting would be defined in settings.php or overridden by the factory. What if both fail to set it? E.g., if settings.php just defines some other (future) element in the 'render' array (but not max_expire), then the default won't be set, which means max_expire won't be set at all. In that case, there would be no upper limit at all. Is this what is intended?
Comment #37
hussainweb commented
Adding another test to see the effect of the scenario I described in #36.
Comment #40
hussainweb commented
Fixing failures.
Comment #41
olli commented
What do you think about renaming 'max_expire' to 'max_lifetime'?
$max_expire assignment can be moved outside the loop.
Why do we need both? Isn't it enough to set it in default.settings.php?
Comment #42
berdir commented
3. Good question, I'm not sure. Agreed that having both is a bit confusing. The common pattern is to have the example in default.settings.php commented out; we could do that?
Comment #43
wim leers commented
#41.1 makes a lot of sense to me.
Let's use strict equality (===).
Comment #44
hussainweb commented
First, rerolling for conflicts from #2336627: Deadlock on cache_config (DatabaseBackend::setMultiple()).
Comment #45
hussainweb commented
Fixing for comments in #41.
For point 3, I think we have the default in the constructor to make sure the value is there in case it is removed from settings.php. As @Berdir pointed out, we could comment it out.
For point 1, I am not sure if max_lifetime makes more sense than max_expire, but I have renamed it.
Comment #46
hussainweb commented
The patch in #45 wasn't complete (it missed default.settings.php), but the interdiff was correct. I had missed the comment in #43 and am fixing that now.
Comment #47
olli commented
Sorry, I meant just rename the setting 'max_expire'; the variable $max_expire is fine.
Comment #48
wim leers commented
Comment #49
olli commented
#42 So change the line in default.settings.php to
?
I think that setting (commented or not) should be before
, right?
Comment #51
wim leers commented
So this is an array, just in case additional settings are added in the future, right?
Are we sure that's a good idea?
These comments are effectively saying the same thing.
I think we can keep just the first comment.
And I think this would be clearer:
Check if a 'max_lifetime' setting is set for the current cache bin. If set, the lifetime of a cache item is limited to the specified 'max_lifetime', thus preventing unused cache items from lingering forever.
s/$max_expire/$expire/
s/expiration/lifetime/
No configuration (the default) allows cache items to live forever.
s/set/set()/
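Pulling the suggestions in this review together, the clamp in DatabaseBackend::set() could end up looking roughly like this (a sketch, not the actual patch; assumes the per-bin configuration injected by the factory):

```php
// Sketch: limit the lifetime of otherwise-permanent cache items in this
// bin. Assumes $this->configuration was injected by DatabaseBackendFactory.
if (isset($this->configuration['max_lifetime'])) {
  $max_expire = REQUEST_TIME + $this->configuration['max_lifetime'];
  if ($expire === CacheBackendInterface::CACHE_PERMANENT || $expire > $max_expire) {
    $expire = $max_expire;
  }
}
```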
Comment #54
olli commented
Related to #51.1, there is one issue that would add a similar per-bin setting: #1944740: Do not garbage collect as soon as items expire.
Comment #55
wim leers commented
#54: that's a good argument! :) However, that issue is going with a very different approach to creating a per-bin setting. Which one is preferable?
Comment #56
fabianx commented#54 #55:
I do think those two issues are orthogonal actually:
- This issue ensures that CACHE_PERMANENT or > expire is mapped to a max-expire value
- That issue ensures that expired items are not directly removed
While the other issue would potentially prolong the time that items from this issue are kept in the cache, I do not think it would help with the original problem.
Also staleness currently is not yet (widely?) used in core.
Train of unrelated thoughts
It would be very useful to return an old value, then set up a queue or after-request item to re-populate the data, but that's not done currently.
Also likely would want to use a max-age=0 (or -2) then anyway, to avoid having an expired item be stored in another cache item.
Anyway.
TL;DR: This issue should continue as is.
Comment #57
berdir commented
I think we had ideas for about 4 more possible settings in this issue alone; see the jitter/refresh in #3, for example, which would make sure that not all cache items expire at the same time and that we keep caches that are actively used. I think it's good to just focus on this single feature in this issue; contrib can add all kinds of additional features in alternative implementations. I'm not sure the others are core-worthy.
@Fabianx: The rebuild-in-a-queue behavior would more likely be implemented using $allow_invalid = TRUE and not in the backend, since you actually need to know what you have to rebuild, and the backend doesn't.
Comment #58
fabianx commented
#57: I agree with #57.
I also agree on that, and that was my idea. I was just saying that adding even more support for $allow_invalid = TRUE is likely not that helpful for core itself, as there is no(t much) usage of that feature in core right now.
Comment #59
wim leers commented
Oh, yes, I was not asking to bring more settings in! I was only making sure we made the right design choice for this setting: A) will we likely add another setting (yes), B) the other issue uses a different setting structure, which one do we want (this one).
So I think that effectively just leaves #51 to be addressed?
Comment #60
borisson_ commented
Fixed remarks from #51 and the test failures introduced in #45.
Comment #61
wim leers commented
Looks great!
I'd like to see one more person's thoughts on max_lifetime vs. max_expire. It was olli's idea, I +1'd it, but I wonder if some people are not in favor.
Comment #62
berdir commented
That double check (in case no expire was set) is a bit confusing: we set the default and then set the value again.
The thing is that in that loop, we actually build two arrays. So we just put expire in there to get it out again. I tried for quite some time to come up with a better check here but wasn't successful.
What we should do anyway is move at least part of the comment down to the main logic and explain what it is doing.
Based on the above, I think we agreed to comment this out by default?
Comment #63
wim leers commented
#62.1: the first comment covers both, currently. But let's let the first then say "set the correct default value: the maximum lifetime", and the second can give the details of the overriding that are currently in the first comment.
Comment #64
thenchev commented
Removed debug code.
#62.2: commented out.
And added some comments for #62.1.
Comment #65
berdir commented
Looks good to me, +1 to max_lifetime.
Keeping the first comment makes sense to me, that explains the overall concept, the new one just those specific lines.
Comment #66
wim leers commented
Alright, thanks for the many rerolls! :)
Comment #67
catch commented
When reading this, it looks like it will be CACHE_PERMANENT unless configured.
But for cache.render we're always setting it. Should we reference that in the comment above?
If I was seeing the 7-day expiry, I'd probably check the backend first, then settings.php, then only after that find this in the factory if I was lucky.
Also reading this, I wonder whether we couldn't put the default configuration into the container definition for cache.render, then it'd be kept closer to where the bin is defined - the factory doesn't know anything about specific bins in HEAD.
Comment #68
hussainweb commented
@catch: There is a small catch in point 2 you mentioned. It looks like we are always setting 'max_lifetime' even if it is not set in settings.php. That's not true, though: if in settings.php we set some other keys in that array but not max_lifetime, that value will not be set. That is where the CACHE_PERMANENT fallback you pointed out in point 1 comes in.
Comment #69
catch commented
@hussainweb - yes, it's possible to not set it, I just think the factory is a strange place to put the default setting that most sites will end up using.
Comment #70
berdir commented
@catch: Putting it in the definition (which is something we already do for the backend configuration) wouldn't really make sense for configuration that's specific to one backend, IMHO. Or maybe if we name it database_configuration or something like that?
Comment #71
wim leers commented
I'd also very much prefer to have this in the container definition for a specific cache bin. But if I remember correctly from an IRC conversation, that was considered impossible. I wish I had written down why on this issue.
Comment #72
berdir commented
I think we could extend it to also support other stuff in that tag; not sure how complex it would be.
But as I said, that would make it a generic cache concept and not something that's specific to the database backend.
Comment #74
catch commented
Comment #75
catch commented
Comment #76
catch commented
Coming back to this:
- I really think it should be per-bin. For example we'd want cache_bootstrap to stay with CACHE_PERMANENT, but maybe bring cache_render down to 12 hours.
- it's not really database specific, it's valuable for any cache backend that doesn't support LRU.
This means we need a term for non-LRU cache backends, I don't have one though.
Comment #77
ndobromirov commented
There you have it: NonLruCacheInterface :).
Comment #78
catch commented
Well, there's that, but LRU is normally used as shorthand for 'storage size capped'.
APCu has a capped storage size, but this doesn't mean you can just throw items at it. Once it's full, the behaviour depends on the TTL, but the default is just to empty itself out entirely.
The way we deal with that is to not recommend APCu for any unbounded cache bins - instead it's restricted to ones where we expect the total size of cache items to be within a reasonable limit, so that it never gets full at all.
Comment #79
catch commented
OK, here's a suggestion that makes fewer assumptions about storage: 'configurable permanent ttl'.
If it's set, then cache storage can set 'permanent' cache entries to that length or higher, up to permanent again.
Here's an untested draft with the interface and CacheBackendInterface implementing it.
If the concept seems fine, we still need to sort out the configuration key for the bins, what it gets set to, and add tests.
I'm also going to bump this to critical, since the meta was triaged as critical and I don't think that meta gets fixed without this in some form.
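The attached draft isn't reproduced in the thread; conceptually, a 'configurable permanent TTL' interface might look something like this (the names here are illustrative, not the actual patch):

```php
/**
 * Illustrative sketch only - not the interface from the attached patch.
 *
 * Backends implementing this may store otherwise-permanent cache items
 * for at least the given TTL, up to CACHE_PERMANENT again.
 */
interface ConfigurablePermanentTtlInterface {

  /**
   * Sets the TTL to substitute for CACHE_PERMANENT cache items.
   *
   * @param int $ttl
   *   Time to live, in seconds.
   */
  public function setPermanentTtl($ttl);

}
```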
Comment #81
fabianx commented
Minimum vs. maximum mixup ...
----
I re-read my comments again:
1. I think this is a good case for a decorator class, e.g. BoundedDatabaseBackend, plus a trait. Then every cache backend can implement that by implementing the interface + using the trait.
2. I still think that:
- a) Random Jitter is a good idea (it is horrible if all items expire at the same time)
- b) A mechanism for refreshing cache items that are accessed before the TTL expires is added (pseudo-LRU)
As the setting is per bin, a site owner can e.g. choose to use cache_page with Expire + Refresh to ensure that larger items that are frequently accessed stay in the cache, while smaller items that are not accessed at all would expire (as Wim Leers wanted).
On the other hand someone that wants to cache smaller items, could just use a larger ttl for cache_render and a smaller one for cache_page.
To give my code some perspective.
Jitter is as simple as:
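The snippet itself is not included above; the jitter idea can be sketched like this (the 5% figure and variable names are assumptions, not the author's code):

```php
// Sketch: apply +/- 5% random jitter to a TTL so that items written at
// the same time do not all expire at the same moment.
$jitter_percent = 5;
$jitter = (int) round($ttl * $jitter_percent / 100);
$ttl += mt_rand(-$jitter, $jitter);
```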
While refreshTime would work like:
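The refresh snippet is likewise not included; roughly, the pseudo-LRU check in get() could look like this (a sketch with assumed property and variable names):

```php
// Sketch: when a "permanent" item is read shortly before its forced
// expiry, re-save it so the expiry moves forward (pseudo-LRU).
if (!empty($item->is_default_max_age) && ($item->expire - REQUEST_TIME) < $this->refreshTime) {
  $this->set($cid, $item->data, REQUEST_TIME + $this->permanentTtl, $item->tags);
}
```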
I do see the problem that we have nowhere to record that an item should be cached permanently, as opposed to an item that truly is time-dependent - however, I think we should save the ttl in any case, in addition to the expire timestamp. That would solve this for all cache backends.
e.g. practical example: homepage should never be expired, is accessed several times per hour, permanent items are cached for a week, e.g.:
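The concrete values for this example were not carried over; they could look like this as hypothetical settings (key names assumed):

```php
// Hypothetical per-bin settings for the example: permanent items live
// one week, and items read within refresh_time of expiry are re-saved.
$settings['cache_database']['render']['permanent_ttl'] = 604800; // 1 week.
$settings['cache_database']['render']['refresh_time'] = 86400;   // 1 day.
```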
So every item accessed within the delta of minimumExpireTtl - refreshTime, and originally cached as Cache::PERMANENT, will be refreshed.
But it also happens only when the item is close to being expired.
And I do think this is superior to just expiring all content 7 days after the last cache clear, which is like a cache_clear_all (for cache_render, e.g.).
And we all know that cache_clear_all can be problematic for somewhat more frequently visited sites.
Comment #82
catch commented
I struggle to see this being used much outside the database backend, so I'm not sure that's useful in practice. If the idea is to keep the database backend as a reference implementation, then it seems good for that.
I also think that's a good idea, but it's 100% implementation - except for the new methods mentioning 'minimum' permanent ttl so it can vary upwards. I think we should add that before RTBC though yes.
The refresh idea is a good one, and I've done similar last-minute-update logic in custom code before, but it's going to require an additional column in the database schema and an update. Also, we'd have to do that in a way that doesn't blow up before the update runs, which is not going to be straightforward - the column will be missing on any ->get() call.
Patch update that fixes the minor issues above, and doesn't explode doing drush cr.
If this passes, still needs tests, still needs jitter, and then we need to figure out whether to try the refresh or not here or move that to a follow-up.
Also not sure what to do about the change to CacheBackendFactoryInterface. We can't change the actual interface - as you can see from the patch I had to update the various factories including ones that don't need this.
We could allow the individual factories to add the optional parameter to the method, but then there's no interface to communicate what the param means.
Or possibly a new interface and something like addMethodCall..
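For reference, the addMethodCall route would mean configuring the backend via the service definition, something like this in a compiler pass (the method name here is hypothetical):

```php
// Hypothetical: inject the permanent TTL via a method call on the
// service definition instead of changing the factory interface.
$definition = $container->getDefinition('cache.backend.database');
$definition->addMethodCall('setPermanentTtl', [604800]);
```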
Comment #84
fabianx commented
I think addMethodCall is the way to go, via the 'calls' property.
I see that it would be nicer to add as a parameter, but changing the interface is a no-go obviously.
And adding getWithTtl() feels just wrong.
Maybe we could use a 'tag', which is like a named parameter, plus a compiler pass to add the tag information as a method call?
--
On the other hand could we not introduce a new FactoryInterface that all core factories that want to support this would implement?
Comment #85
catch commented
I think an entirely new factory interface might be the way to go. End of the day here, but will have a look tomorrow.
Comment #86
catch commented
Small update:
- added the new factory interface
- switched to an $options array for the service argument and new factory interface. That would potentially allow us to make things like the refresh and jitter configurable if we wanted to.
If that looks OK, will start looking at some of the rest from #81.
Comment #87
catch commented
Comment #88
alexpott commented
Shouldn't this be minimum_permanent_ttl?
Comment #89
catch commented
Yes, it should.
Comment #90
alexpott commented
The setting of the expire time to the minimum ttl looks like it can be tested.
Comment #91
fabianx commented
#88, #89: But is it not in fact a maximum?
e.g. PERMANENT == unlimited, while this is the maximum ttl time that is allowed to be set.
Though that is wrong, too. As you could set a higher value ...
I am still confused by the wording.
Agree with the tests.
Comment #92
wim leers commented
I'm also confused. I'd also expect this to be called "maximum". But, really, it's not even maximum or minimum; it's just mapping PERMANENT to some configurable number. Perhaps a clearer name would therefore be .
I'd also go with max-age rather than TTL, because of Cache-Control: max-age and because of https://www.drupal.org/developing/api/8/cache/max-age.
Comment #93
catch commented
Definitely needs tests - should be easy to add to the database backend tests.
On $minimum_permanent_ttl - I don't like the name either, clearly it's confusing even me.
In practice:
- it should be configurable via the container, as in the patch.
- backends should set the ttl no lower than the configured ttl, but for example if we use jitter, then they're going to set it higher than that (so it's not a maximum)
- backends could do other things like whitelist particular cids as permanent
- they could also ignore it and use permanent.
What about $overridePermanentTtl?
I'm not at all sure about max-age - we use expire in the database backend already, which is already confusing, and this isn't going to be used for the Cache-Control header anywhere. When it's applied to the cache_render bin, that may not be obvious at all.
Comment #94
wim leers commented
TTL === max-age, no?
Comment #95
catch commented
Yes, it's the same thing; I just associate it with the Cache-Control header more, whereas with TTL I think more of APCu or similar. I guess it's a question of whether the association with the header is a good thing or not.
Comment #96
catch commented
Or $override_permanent ;)
Comment #97
wim leers commented
Unfortunately the cache back-end interface is designed for absolute timestamps, so it's neither TTL nor max-age. Otherwise just following its existing terminology would've been perfect…
Comment #98
catch commented
Well, except the argument is a ttl - we convert it to an expiry in the backend. Similarly, CACHE_PERMANENT is treated as a ttl - it really means "never", as opposed to just being a far-future expiry.
Comment #99
wim leers commented
It's not, it's an absolute time: "infinity". But PHP doesn't support infinity, so we map it to -1.
True. But that's my point: we wouldn't need to discuss "TTL" vs. "max-age" if we could just have it be the same kind of thing as what we already have, which is absolute numbers: timestamps.
So, we simply can't be consistent with what we have. If you then prefer TTL, works for me.
Comment #100
catch commented
Is now + infinity different from just infinity?
Added tests. Also added some basic jitter to the database backend.
Not doing anything with the wording yet - I'm not tied to ttl at all, but not sure we have a better option yet.
No interdiff - will start posting them from this point though since this is more or less minimally viable now and the patch is getting bigger.
Comment #102
catch commented
Arggh, missing new files.
Comment #103
wim leers commented
Newb! :P
Still inconsistent.
Do you mean an assert($service instanceof SomeInterface)?
This last sentence is meant to communicate that there may be jitter? Not very clear to me.
Also, why >=, and not <=? Documenting the rationale for that would be good, I think.
Missing newline between these two.
Missing @return.
Shouldn't this 1.5 be a constant on the class as well, to make it clear what it's for?
Extraneous \n.
Comment #104
catch commented
Should cover everything in #103.
Comment #106
fabianx commented
I think my vote is for override_permanent_maxage or override_permanent_ttl.
That says much clearer that this is an override for permanent cache items.
Comment #107
catch commented
Was the test failure.
Switched to $override_permanent_ttl for now.
Comment #108
catch commented
Need to use the constant in the test also.
Opened a follow-up to discuss the refresh - that's not necessary to fix the critical for me, and having to do a schema update to cache tables feels very 8.2.x-only. #2705993: Add auto-refresh/quasi-LRU to database backend
Comment #109
catchAdds two new interfaces and updates the container, so tagging for rc target triage.
Comment #110
catchComment #111
fabianx commentedI think this would be much easier to read with a helper function.
The rest looks great to me.
Comment #112
catchYes that's better.
Comment #113
fabianx commentedRTBC - Looks great to me - except perhaps test coverage might need to be extended a little?
Comment #114
wim leersWhy not make this 86400 a container parameter itself, so you only have to override that one parameter on your sites if you need a different TTL?
s/Get/Gets/
Missing type.
Why not let this extend
\Drupal\Core\Cache\CacheFactoryInterface?
s/ttl/TTL/
"override" in variable name vs "overridden" in documentation.
Note this is not even the overridden value, this is literally the value to use. It just might be overridden.
So, I think
$permanentTtl would be a better name.
s/Calculate/Calculates/
Wrong doxygen.
Comment #115
catch#1 done.
Minor points also done.
#6 done and wish we'd thought of that about 20 comments ago, much better.
#4 because not sure if it's clearer since they both have a single method with the same name and different arguments. But PHP allows it so why not, don't think it's less clear, just a bit neutral.
@Fabianx on tests. I thought about testing that the expires doesn't return a valid cache entry, but we already have tests for that, and it's tricky to test with REQUEST time without manipulating the value, which would then invalidate the test here. Could potentially test ChainedFast explicitly though.
Comment #116
catchAdded explicit testing of the ChainedFastBackend when both the fast and consistent backends have permanent overridden.
Comment #117
catchThat test passed but didn't test anything very interesting, better one.
Comment #118
fabianx commentedThat is enough on tests for me and all feedback has been addressed.
I think this should be ready now for core committer review.
Comment #119
alexpottFrom https://secure.php.net/manual/en/function.rand.php
I think we need to use mt_rand() here. Patch attached does this.
Patch includes a few coding standards fixes too.
Comment #120
fabianx commentedRTBC on the interdiff
Comment #121
jibranSeems ready to me as well but I think we should improve some documentation.
I must say some awesome documentation. Can we add some of it above the
JITTER_MULTIPLIER or at least add @see?
We can also explain this line as well as per #119.
Comment #122
olli commented
This doesn't work with code that uses cache_factory directly like render_cache.
missing
@var?
->setPermanentTtl(
Comment #123
catchAddressing #121 and #122.
However #122-2 is a good spot, and not sure what to do about that yet.
Comment #125
catchRebased.
Comment #128
catchComment #129
catchTo be honest the more I work on this patch, the less I like it. For a site that doesn't get many 404s, it could degrade performance rather than improving it, and the back and forth here shows that the concept is quite hard to grasp. There are also other options here such as setting an arbitrary limit on cache items and pruning in garbageCollection if it goes over - I like that option less than this one though.
I'd be tempted to bump #2699613: Set a shorter TTL for 404 responses in page_cache module and #2699627: url.path cache context for breadcrumbs is unnecessarily granular to critical and mark this postponed on those. They'll benefit sites running any cache backend and if we can get them tested on a site like @casey's that's suffering from this issue, it'd then allow us to see whether this issue is really critical by itself.
Comment #130
catchDoing that for now.
Comment #131
berdirThat happened, this is still a problem. 404 is just one example, you can also just use arbitrary query arguments to fill up the cache.
Comment #132
xjmComment #133
fabianx commentedWhat might make more sense is to add a "lurker" cron job, which expires cache entries that are actually expired per the cache tags.
Though I am not sure if our structure allows for that, given we only have a checksum in the DB backend. However, for serialized render arrays we could do it at least, as there we have the cache tag information.
Comment #135
geerlingguy commentedI just ran into this after launching a new D8 site that uses the database backend (currently memcached or redis isn't an option); see #2814015: cache_render table becomes extremely large over time.
We don't have anything out of the ordinary, but there are a couple views exposed filters in blocks, and those blocks have 100,000+ rows in the cache_render table (one per filter set and url permutation—the block only appears on one route, but the path can have filter arguments like
?abc-xyz), which never expire unless I run drush cr manually (then they just fill up again until the next cache clear).
We were dealing with a database around 10-15 MB total before launching the prod site; now that there's live traffic, the database balloons out to 10+ GB every day until we clear the cache. Backups and full database copies take 30+ minutes instead of 30s...
I believe I have seen maybe 5-10% of all Drupal sites I've ever worked with using a cache backend other than the database, and I see no reason that will change with Drupal 8—this could be a huge problem for anyone trying to run a small-to-medium sized Drupal 8 site on shared hosting with even pretty basic sites.
Comment #136
berdirIf a single day is enough to generate such an amount of cache recoreds, this issue alone will not solve it for you unless you plan to set the expiration to just hours.
Having that many variations sounds like you want to prevent those blocks from being cached at all, because they probably vary per user or so, and caching implies more overhead than not caching. Try just disabling the cache in the views settings; that should bubble up. Page cache will still cache them, and for anonymous users, dynamic page cache + optionally BigPipe will allow caching the rest of the page and just injecting those blocks, while BigPipe will let you serve the rest of the page while still building those views.
Comment #137
berdirThat said, I agree that this is a critical scalability issue and I also still think that we should try to get a minimal* fix in and document that nicely somewhere, possibly even have a sane default in core.
* I still believe that jitter isn't necessary for a minimal solution, as that IMHO occurs naturally on most sites: you get the requests and cache writes distributed over time. And sites big enough for that to become relevant probably should look into alternative cache backends anyway.
With the rise of container based hosting like pantheon, platform.sh, acquia and so on which afaik all offer redis/memcache, I do think that using such a backend will be more common than it used to be.
Comment #138
geerlingguy commentedClosed #2814015: cache_render table becomes extremely large over time as a duplicate.
And yes, more common... but at least 96% of the sites I've ever dealt with are not on Pantheon, Platform.sh, Acquia, etc. There's a silent majority of Drupal sites out there running on Hostgator, Dreamhost, preconfigured DigitalOcean VPSes, etc., and this problem (along with some others) doesn't help silence the mantra of "Drupal 8 is not for shared hosting"...
I agree with a minimal fix for now—the jitter doesn't seem (to me) like it would be a major issue/blocker.
Comment #139
fabianx commented#138: I do agree that some kind of expiration would be nice.
However please be aware that there is a working contrib backend by larowlan already:
https://github.com/larowlan/slushi_cache
I still think that using jitter is simple and can be achieved easily (and not sure what catch's last patch is missing anyway except for reviews and tests?) and we should:
- a) make simple changes
- b) ensure it will work for more complex cases, too.
Core has a higher responsibility in that regard.
--
To the issue above:
Likely it is indeed best to max-age=0 the view result with that many permutations (are they all valid?) and have the whole block be placeholder-ed automatically.
Comment #140
sajiniantony commentedI have upgraded the site to 8.2 version, it shows issue with the size of cache table and also deadlock error.Any help is appreciated
Comment #141
xmacinfoThis is a bold statement!
100% of the Drupal sites I am working with are not on those platforms.
Comment #142
catchDo you have 10s/100s of thousands of requests changing those filters then? I wonder a bit whether caching with views exposed filters wouldn't benefit better from some flood control or something - stop entries getting there in the first place. Would have to do the flood control when rendering the view and then set max_age=0 when flood control kicks in.
Comment #143
wim leersWell, that is a huge flaw in your backup system: you should never ever back up the
cache_* tables.
But of course we should still solve the problem described in this issue.
Comment #144
catchUnassigning me since I'm not actively working on this at the moment.
Comment #145
wim leersAFAICT we have agreement this is a good idea.
It seems the last point of disagreement is the inclusion of jitter or not?
Comment #146
wim leersThis problem also came up in #2828639: [BUGFIX] getRelated is using wrong cache tags.
And I know that some Acquia customers have run into this very problem too.
Comment #147
larowlanFWIW Slushi cache solves this via contrib in the meantime.
Comment #148
malcomio commentedI observed a similar issue on shared hosting, which did not seem to be addressed by installing Slushi cache.
Some shared hosting providers will shut sites down with minimal warning if database sizes grow too large - as geerlingguy says in #138, issues like this are likely to push more small sites away from Drupal.
Comment #150
jelle_sWe experienced this error when a google bot was crawling our facetted search, the cache_render table was growing to a couple of GB within minutes. We had to put our site in maintenance mode and exclude the facetted search in robots.txt to deal with it. Site is up and running again now, with a cronjob clearing the cache every 6 hours just to be sure... Not exaclty ideal. We thought about putting varnish in front of it, but since the varnish bin is shared with other websites it might compromise those if it grows this fast...
Comment #151
catch@malcomio Slushi cache should work fine for shared hosting, what makes you think it wouldn't? You'd have to configure the TTL time quite low if you're getting crawled though.
Comment #152
malcomio commented@catch on the side project site where I experienced this issue, it still occurred after installing Slushi cache, although looking at settings.php now, I think that it may not have the correct configuration. I'll check and if I'm still having problems, I'll create an issue on the module.
Comment #153
haasontwerp commentedSubscribing, experiencing this for a site i've built. Slushi cache won't help (or i'm probably setting it wrong). DB without cache is about 75 MB big, it grows to 250+ MB in one day. And then the hosting provider just shuts down database user access, so i get a understandably angry client on the phone and i'm getting tired of manualy clearing cache x times a day. Bigger database won't fix this, it will just take a little longer to fill up the DB to max size.
Comment #154
wim leers@Jelle_S: Faceted search is the sort of module that should take responsibility to not cache every single variation. Because it's a module that allows insane numbers of variations. So of course it does not make sense to cache them all.
Comment #155
anavarreJust registering a real-life example with a 8.2 site. PHP 5.6, core modules + Honeypot and a custom multimedia-related module, no memcache or alternate cache backend.
Rebuilt caches / Truncated tables
4 days after:
Comment #156
wim leersThe Local Actions and Local Tasks blocks also vary by URL, and they're enabled for all users by default, even though most sites only need those blocks for authenticated users (because it's possible that they're used on a wiki-like site that allows anonymous users to use them).
Those two blocks in particular can cause enormous amounts of cache entries. Which further confirms the need for this issue to be solved.
Comment #157
anavarreComment #158
anavarre#156 should the default for Local Actions and Local Tasks be to not show up for anonymous users, then? If yes, perhaps that should be a follow-up issue.
Comment #159
geerlingguy commentedI just noticed one site (the one that originally surfaced this issue for me) hit a full disk on the database server (100 GB) from the cache_render table expanding again. A
drush cr fixed it for now, but this happened because for a few weeks, no code deployments were done (thus no cache rebuilds took place).
The entire database (besides cache_render) is under 1 GB :-/
Comment #160
mariagwyn commentedWe have set Local Actions and Local Tasks blocks on the site mentioned by Aurelian above to 'authenticated' only. I am monitoring to see if this actually fixes the issue.
Comment #161
berdirNo, it doesn't "fix" it. They result are a lot of useless cache entries, but they're practically empty and hardly take up any size. Any page that has those small blocks also has internal/dynamic page which are a lot bigger. It helps a little bit, and we're working on a few other things that all also help, but each only so much.
The only really reliable solution is using a cache backend with a fixed size like Redis or Memcache. A max age as proposed here would at least help small to medium sized sites not grow too large over time. And while this is not yet fixed, I recommend everyone who can't use Redis/Memcache to use https://www.drupal.org/project/slushi_cache.
What I still don't understand is why we can't just move forward with one simple setting as initially proposed; it's not perfect, but it's simple and fairly reliable. I think that sites that grow so quickly that this isn't enough really shouldn't use the database backend anyway. I can work on updating the patch for that if we can agree on going forward with it.
Comment #162
catchI'm OK with this going in without jitter, we can just move that to a follow-up and it's only an implementation detail in the database cache backend.
Comment #163
wim leersThis. I already wrote this in #8.3, almost two years ago.
Caching with unbounded storage makes no sense. In practice, every cache has a limited size. A database isn't designed to store caches, but if it is used for that, then we should limit the number of items we store.
+1. If you update the patch, I promise swift reviews. Let's finally get this done. This is biting many D8 sites.
Comment #164
lukas.fischer commented+1 for #163
Comment #165
hkirsman commented+1 Just discovered my fairly small site with database of 15gb when it's really just 50mb.
Comment #166
xpersonas commentedAgreed. +1 for #163
Comment #167
phanosd commentedAmazing that this issue is actually running for so many years.
+1 just discovered my brand new flashy bling bling high traffic optimized website of 42.9mb on Friday night went onto 1GB over the weekend. Thanks cache render
Do we actually have a way to solve this that is practical and long term?
Comment #168
mariagwyn commentedCan someone clarify which of these patches may be workable at this point and what need to be fixed on them to get a working solution? Happy to apply a patch and test, but the existing patches are old.
Comment #169
wim leersNo movement in a month :( I promised swift reviews in #163 of a reroll. So I'm going to turn it around: I'll do a reroll, hopefully swift reviews will follow!
Comment #170
wim leersTo do what I said, the "reroll of the simple approach", I had to first find the simple approach. I went back all the way to the beginning.
because those would already solve the problem for many sites.
slushi_cache solves this. Turns out it solves this in a very simple way: by accepting a single "permanent TTL" setting instead of one per bin (Slushi Cache calls this the "melt time" 👏), it avoids the problem that @olli pointed out. And it is a simple decorator of the existing code. We could literally adopt Slushi cache as-is IMO. Or we could merge its code into DatabaseBackend.
It does NOT meet two requirements that @catch posed though: A) per-bin configurability, B) being generic, instead of specific to the DatabaseBackend. But that's the thing, this problem is pretty much specific to the DatabaseBackend!
So, if we want "the simple approach", I think we basically want to move Slushi Cache into core? 2526150-slushi_cache-170.patch does this. With two added features: per-bin overrides (but in a way that it won't break like @olli pointed out for earlier patches) + allowing to specify 0 or -1 to opt out (that was a flaw in Slushi Cache).
However … it looks like @Berdir and I are thinking something similar: the root cause is much simpler. @Berdir in #161:
I in #163:
So here's a provocative thought… what if we do limit the amount of space that database tables can consume? LRU requires bookkeeping, so that's off the table. If you want LRU, use Redis or Memcached. But we can easily achieve fixed-size DB cache bins:
- serial field added to DatabaseBackend::schemaDefinition()
- DELETE FROM cache_data WHERE id <= ((SELECT MAX(fixed_size_id) FROM cache_data) - 1000)

So here's a patch implementing that. Including an upgrade path even.
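The count-based pruning can be illustrated end to end. A rough sketch using SQLite in place of Drupal's database API (table and column names are simplified, not the real cache schema; note that MySQL may require the subquery on the same table to be wrapped in a derived table, while SQLite accepts it directly):

```python
import sqlite3

MAX_ROWS = 1000

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache "
    "(id INTEGER PRIMARY KEY AUTOINCREMENT, cid TEXT, data BLOB)"
)
conn.executemany(
    "INSERT INTO cache (cid) VALUES (?)",
    [("item:%d" % i,) for i in range(2500)],
)

# Single-scan prune on garbage collection: keep only the MAX_ROWS most
# recently inserted rows, using the auto-increment column as insertion order.
conn.execute(
    "DELETE FROM cache WHERE id <= (SELECT MAX(id) FROM cache) - ?",
    (MAX_ROWS,),
)
remaining = conn.execute("SELECT COUNT(*) FROM cache").fetchone()[0]
```

Because the cutoff is computed from the highest id rather than an exact row count, gaps left by deleting expired entries mean the bin can end up holding fewer than MAX_ROWS rows.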
2526150-bounded_database_cache_bins__count-170.patch
But then I wondered whether we could do even better: what if we could actually assign size limitations not by count but by space?
Turns out that MySQL at least allows this: https://stackoverflow.com/a/18238810 — you can query
information_schema. And then if we construct the right query, we can get the number of megabytes consumed by a given table: https://stackoverflow.com/a/9620273. What remains then of course, is removing a number of rows to make it fit in that size again. We can't know the size of an individual record, so we'd have to assume they're all sized the same, and then remove the percentage that would make it fit within the budget again. Before you get too excited: this would then be MySQL-specific, which would be … sad. Anyway, very very rough PoC for this attached.Comment #171
wim leersSo, #170 contains 3 approaches, all fairly simple:
- slushi_cache: moves https://www.drupal.org/project/slushi_cache into core, with minimal adjustments — this is probably the "simple" approach that was being asked for in
- count: makes the DB cache bins bounded (fixed size), by row count
- size: makes the DB cache bins bounded (fixed size), by disk space
Option 1 does not protect against surges in activity, e.g. bots crawling your site, or malicious users attacking your site, option 2 does.
Comment #172
wim leersAnd finally, a provocative related question: #2888838: Optimize render caching.
Comment #173
wim leersAnd I just realized this was made major instead of critical in #130 because it was though that two other issues would help (see #129), but they did not. Therefore the original criticalness still applies. Back to critical.
Comment #174
wim leersThe current title implies one particular solution. As of #170, there is a very different option. So, retitling for clarity & simplicity.
Comment #176
catchI was thinking about this issue this morning, and had a similar but different thought:
- have a per-bin settings-configurable number of rows
- when the size of the bin exceeds the number of rows, find the median created timestamp per-bin and delete all items created before that timestamp (i.e. about 50% of the table).
- do this on cron so there's no cost with cache sets.
This will clear some space without resulting in a complete cache clear, tables will be able to go over the limit for up to three hours.
My concern with both that approach and the one suggested by Wim is that it could lead to thrashing, but again memcache is the real answer to that.
Haven't reviewed the patches yet.
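The median-created pruning described above could look roughly like this (an illustrative Python sketch of the idea, not the proposed PHP implementation; names are invented):

```python
import statistics

def prune_to_median(items, max_rows):
    """If the bin exceeds max_rows, drop every entry created before the
    median 'created' timestamp, i.e. roughly the older half of the bin."""
    if len(items) <= max_rows:
        return items
    median = statistics.median(created for _, created in items)
    return [(cid, created) for cid, created in items if created >= median]

# Ten entries created at t = 0..9 with a limit of 4: the median is 4.5,
# so the five entries created at t >= 5 survive.
items = [("cid:%d" % i, i) for i in range(10)]
kept = prune_to_median(items, max_rows=4)
```

As the example shows, the bin can still exceed max_rows right after pruning; the approach only halves it each cron run rather than enforcing a hard bound.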
Comment #177
effulgentsia commentedFor what it's worth, I like #176 a lot!
Really? What's a scenario in which #176 would result in thrashing? It's true that frequently used items could get purged shortly after their last use, but never shortly after their first use.
Comment #178
wim leers#176 is a more advanced (and complicated) version of
2526150-bounded_database_cache_bins__count-170.patch. Why do we want to do stuff with median timestamp etc? Why not just look at the row count versus allowed row count, and nothing else?I'm probably missing something :)
Comment #179
tedfordgif commented@effulgentsia, I think @catch is saying that large sets of cache items could get cleared at once, and then regenerated at once (see jitter). It would probably depend on the cron interval, content, and access patterns.
The median timestamp approach in #176 appears intended to better preserve recently-created cache entries, but could also still allow large cache table size, and doesn't really approximate LRU. For example, if you get traffic that maxes the cache out during each cron interval, you cycle between ~50% and 100% of the peak cache size, assuming evenly-distributed timestamps. The cache hit ratio and % cleared might be unpredictable if the requests during a cron interval are front- or back-loaded. If you're getting the kind of traffic that makes the above concerns valid maybe you are using another cache backend, though.
The simple row-count approach provides a more predictable upper bound on DB size, which is the main issue here.
@catch, Wim's row-count approach only runs on cron, as you suggested.
Comment #180
catchA bad but relatively common scenario I have in my head is something like this:
Cron run every ten minutes.
There 2 requests per second to unique pages from a crawler.
Mean of 20 render cache items per page.
This means:
1,200 unique pages visited in ten minutes.
24,000 cache items created in ten minutes.
Cache item limit of 10,000.
Whether you go down to 10,000 or 5,000 items, you're going to be deleting tens of thousands of rows per hour then recreating them again.
If we take a Drupal.org node with 300 comments by say 50 unique users, there could easily be 400-500 individual render cache items per page on a cold render cache, so you could hit a high row limit from a much smaller number of pages.
@tedfordgif thanks for answering Wim's #178; that's exactly the design and drawbacks. This wasn't a competitor to Wim's suggestions, it was just what I was mulling over this morning, and I haven't really thought it through beyond that (hence why I didn't post anything).
Comment #181
effulgentsia commentedI think doing rows is fine. But why add the
bounded_id column? Why not COUNT and ORDER BY created? Which I think would have the same effect of deleting the oldest first, which is the key feature in both #170 and #176. Or is the problem that adding an index on a decimal field is less efficient than on a serial field?
Comment #182
catchQuestion for me is - how would you find that out without manual auditing of cache tables? With the current issue you find out because your database is huge. One thing we could do is track on cron runs how many items were deleted vs. how many deleted last cron run or similar and log it, then at least there's an indication what's happening.
Comment #183
wim leers#178: Exactly, it could still allow a large cache table size. Rather than a simple "FIXED SIZE, PERIOD", it's "while limit exceeded, delete ~50% and hope that will keep it within bounds". So I agree with — my intuition says it's better to KISS to solve the problem for now, then we can introduce more fancy algorithms later. "fixed size but only kinda and with deletion based on median timestamp" is conceptually a lot harder to understand than just "fixed size". So I fear doing this would introduce another Drupalism.
#180: I think this is about solving the common case. The default could be 10,000 entries. But for bigger sites, you'd want to tweak that number.
#181:
created even existed :) Happy to reroll if that approach is what we end up agreeing on.
It sounds to me like we've entered a bikeshed where the color is replaced with as the primary question and as the secondary question. The problem is that "the best weighted way" will depend on the particular site. That's why I'm arguing to go with the simplest possible solution.
Comment #184
catchThe row limit patch isn't a fixed size though, it's trimming down to the maximum size/row every cron run - i.e. it will immediately exceed the fixed size as soon as a new cache entry is created, until the next cron run.
Given that, and unless we run those queries on insert which would be a performance hit, I'm not sure what this gets us beyond the configurable TTL for permanent entries - with that the longevity of a cache entry is predictable at least and it'll prevent unbounded growth.
Comment #185
wim leersWith a configurable TTL for permanent entries, you still have an unbounded number of cache items. If your site has a traffic surge, or is DDoSsed or spidered or something else, then the gigabytes of data will set there a full day (assuming that configurable permanent TTL defaults to
86400as in today's patches).With a fixed size, you prevent that.
This is correct. But I think this is less problematic than the problem I described above.
I've laid out my thinking, but I don't feel strongly which approach should be the one. Either one is a big step forward. I'll let @catch choose.
Comment #186
mariagwyn commented2526150-bounded_database_cache_bins__count-170.patch being tested now. Applied without error, no other apparent issues on the site. After only about an hour, I can say that the DB size has remained stable on a site which bloats (very) rapidly. As of right now:
For comparison, this is the same DB before a 'drush cr':
I will ping back later today or tomorrow with a new size report.
Comment #187
tedfordgif commented@mariagwyn, it would be helpful to see the size of and number of rows in your cache_render table. How often do you run cron?
If you're using MySql, the following query might be helpful (replace 'drupal' with your database name):
Comment #188
effulgentsia commentedSeems to me the only difference between the two approaches is: when
num_items_during_cron > max_items, should we prune to max_items (suggested by Wim) or to max_items/2 (suggested by catch)? However, the "2" in catch's formula seems pretty arbitrary. So I'd like to propose a 3rd option: prune to max_items * (max_items / num_items_during_cron).
In other words, if our limit is 10,000 but we're currently at 11,000, then we can prune to 10K * 10K/11K = 9091, which leaves nearly 1K for future growth prior to hitting the limit again. On the other hand, if we're currently at 40,000 (got to 4x our limit since the last cron run), then we'd prune down to 2,500. Of course, if >30K items got added since last cron run, then the freed up 7.5K will likely get used up pretty quickly, and we'll be back in a situation of exceeding our limit, but per #181, I think such a large disparity between the limit and what the site uses within a single cron interval is a bigger problem than this issue's scope should be.
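The proposed formula, sketched in Python for illustration (the function name is invented; the examples match the numbers worked through above):

```python
def prune_target(max_items, num_items_during_cron):
    """Prune to max_items * (max_items / num_items_during_cron): the more
    the bin overshot since the last cron run, the more headroom is freed."""
    if num_items_during_cron <= max_items:
        return num_items_during_cron  # under the limit, nothing to prune
    return round(max_items * max_items / num_items_during_cron)

# 1K over the limit leaves nearly 1K of headroom; 4x over the limit
# prunes down to a quarter of the limit.
prune_target(10_000, 11_000)  # -> 9091
prune_target(10_000, 40_000)  # -> 2500
```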
Comment #189
mariagwyn commented@tedforgif:
For comparison, here is prod. Note that it was cleared earlier today but has been growing in size:
Num rows:
On Testing site/DB, with db copied immediately after 'drush cr' on prod:
DB Size:
And number of rows:
Comment #190
wim leersIndeed. Well, roughly, because @catch's proposal is more complex than
max_items/2.
I think this is simple enough to be "KISS". But I need to repeat my warning from #183:
@mariagwyn Thanks for sharing those numbers. It confirms that this patch is solving the symptoms. But your data doesn't yet tell us whether this would negatively impact response times. How did cache hit ratios change?
Comment #191
tedfordgif commentedI'm still in favor of the simple row limit. We're not trying to eliminate the (render_)cache entirely in this ticket, so emptying the cache to some percent of the max rows setting is not a goal. If we're getting the same traffic pattern during each cron interval, any cache entries we remove will be recreated, so we might as well just leave them in the cache if they fit in the "limit".
I initially liked the approach in #188 for having math, and I thought some more about the various ways of calculating the average size of the cache items, but I'm still firmly in the camp of KI(really)SS, meaning simple row limit. It's tunable both by cron frequency and row limit, and doesn't try too hard.
As mentioned previously, future gold-plating options include creating a db_cache_profiler module, which would help you tune those two parameters by sampling cache get/set, or db_cache_tuner, which keeps things tuned in real time. But the right approach is to get a real cache outside Drupal.
Comment #192
effulgentsia commentedFor what it's worth, I agree. I think simply pruning to
max_items is fine for resolving this critical. It means we're always over the limit, and therefore, we might need to figure out a good setting name for it ("max", "limit", and "bound" aren't entirely accurate), but I think that's ok.
andrewbelcher commented+1 for the simple approach - solves the problem and is really simple to understand. re #192, perhaps an example in
default.settings.php with help text along the lines of "Set an upper row limit on the database cache bin, enforced on cron run." would suffice?
wim leersOkay, seems like we have consensus for
2526150-bounded_database_cache_bins__count-170.patch. Now let's get that going:
First: rebasing, because #2775381: response.gzip is redundant and should be removed conflicts with it.
Comment #195
catchThat patch doesn't remove the column. Edit: I missed the 'first rebasing', sorry...
Call it a 'target size' or 'target row count'? That removes the expectations around bounding/limits.
Comment #196
wim leersDone.
Comment #198
wim leersDone.
Comment #199
wim leersI started working on this, but this is not as trivial as #181 makes it out to be. This is not possible in MySQL:
You can't delete while ordering at the same time. You'd need to SELECT all CIDs to delete in one query, then do a DELETE FROM cache_blah WHERE cid IN $ids. That could potentially be many IDs to gather. Or alternatively, do a join.
See:
P.S.: also: if there are many rows with the exact same timestamp, I'm not sure the behavior is defined.
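The two-step pattern (SELECT the stale CIDs, then DELETE them by ID) can be demonstrated with SQLite standing in for MySQL; table and column names are illustrative, not the real cache schema:

```python
import sqlite3

MAX_ROWS = 3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (cid TEXT, created REAL)")
conn.executemany(
    "INSERT INTO cache VALUES (?, ?)",
    [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)],
)

# Step 1: gather the CIDs of everything beyond the newest MAX_ROWS.
# (LIMIT -1 means "no limit" in SQLite, so OFFSET skips the rows we keep.)
stale = [row[0] for row in conn.execute(
    "SELECT cid FROM cache ORDER BY created DESC LIMIT -1 OFFSET ?",
    (MAX_ROWS,),
)]

# Step 2: delete the stale entries with a cid IN (...) query.
placeholders = ",".join("?" for _ in stale)
conn.execute("DELETE FROM cache WHERE cid IN (%s)" % placeholders, stale)

remaining = sorted(r[0] for r in conn.execute("SELECT cid FROM cache"))
```

On a large table this can mean shipping many IDs from the database into PHP and back, which is exactly the cost the comment above is weighing against a join.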
Comment #200
wim leersThat leaves just
@catch suggested 10K in #180, so setting that.
Comment #201
wim leersSo we still need to look into:
… but neither of them actually blocks commit. This patch has full test coverage. Point 1 would just be clean-up, and could happen at any time. Point 4 can take forever, but I sure hope it won't.
Also created a change record: https://www.drupal.org/node/2891281.
Please review!
Comment #204
wim leersApparently there's one particularly well hidden creation of a DB cache bin. And it's a crucial one.
Comment #205
wim leers
#204 made me realize I didn't add support for "no maximum" yet. Fixed.
Also updated the Cache documentation in
core/core.api.php.
Comment #206
mariagwyn commented
Testing patch: https://www.drupal.org/files/issues/2526150-bounded_database_cache_bins_...
For comparison:
BEFORE 'drush cr' and no patch:
AFTER 'drush cr', no new patch:
No observable errors, will check back on DB size later.
Comment #209
tedfordgif commented
Providing values less than -1 for $settings['database_cache_max_rows']* has the interesting theoretical behavior of deleting additional rows that were added between the select and delete queries. I'm ambivalent about needing to fix that.
If "$max_bounded_id - $this->maxRows" is negative, we don't need to run the delete query.
We may end up leaving fewer rows than the limit in the cache bin due to holes in the "$max_bounded_id - $this->maxRows" range introduced by deleting expired items. It doesn't matter which order the two delete queries are executed in. However, if you do the expire query first, then instead of MAX(bounded_id) do the select() equivalent of
you get the desired result. It's slower than the MAX() query, but this is only running in cron. Of course a null result would mean you don't need to run the delete query. Apologies for not having the time to make a proper patch.
Comment #210
dawehner
I think having the simple solution now is fine. People can always customize it and maybe provide something better tailored to their use cases, but those are not the people we should care about; they have already fixed the problem. We should think more about the people who have the problem right now.
I clicked on the various stackoverflow links from wim above and saw that all of them suggested using "DELETE WHERE X IN (SELECT ...)". This would be one database query instead of two as suggested here. I think skipping the DB -> PHP -> DB round trip is valuable, for example in terms of saving memory.
I just had a look at whether it's possible, and I think given that
\Drupal\Core\Database\Query\ConditionInterface::condition supports a SelectInterface as $value, it totally seems so.
Nitpick: why is this public?
Comment #211
berdir
should we maybe add a constant for -1, like we have for other things?
Always the BC question. I checked slushi cache and they don't override the constructor so wouldn't break (and it is also not really needed anymore after this I guess).
I guess we can say that this should always be created through the constructor and should be safe. But it would also be relatively safe to default to 10k.
would it help with readability and further changes here to put this in a separate protected method, like $this->getMaxRowsForBin($bin) or so?
shouldn't use full namespace and \Drupal::database()->schema()
Also, I'd say it is possible to have a schema defined but not actually ever accessed it, then the table wouldn't exist. Wrap in a tableExists() to be sure to avoid any errors here?
Comment #212
catch
You could do this, no?
$cid = SELECT cid FROM {cache_} ORDER BY cid ASC LIMIT 10000, 10001;
DELETE FROM {cache_} WHERE cid < $cid;
Or what dawehner suggests in #210 if that works cross-database.
Also #180 wasn't a suggestion for 10k, it was using 10k as an example for where something could go wrong with a moderate-sized site. Maybe 10k's OK but we could probably get a rough size estimate from @mariagwyn's queries in #206.
Comment #213
wim leers
Settings::get() globally if people think that's preferable.
\Drupal\Core\Cache\CacheFactory::get().
#210 + #212: RE DB query + schema:
Yes, but:
\Drupal\Core\Database\Query\Delete does not support ->join() (\Drupal\Core\Database\Query\Select does support this). See database.api.php. This is maddening.
Can somebody else please tackle #210 + #212?
Comment #214
dawehner
Let me quote myself :)
So well, let's try to pass along a
SelectInterface object:
Note: This is not a valid query right now, but I think you get the direction :)
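A pseudocode sketch of that direction in Drupal's query-builder style (names illustrative; as the comment says, this is not a valid query as-is, since MySQL rejects deleting from a table that a subquery in the same statement also reads):

```php
// Subquery selecting the CIDs of all rows past the newest $max_rows.
$subquery = $connection->select('cache_render', 'c')
  ->fields('c', ['cid'])
  ->orderBy('c.created', 'DESC')
  ->range($max_rows, PHP_INT_MAX);

// Delete every row whose CID the subquery returns, in one statement.
$connection->delete('cache_render')
  ->condition('cid', $subquery, 'IN')
  ->execute();
```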
Now that I wrote about it, I remember that before MySQL 5.6, SELECT ... IN (subquery) was potentially really painfully slow. Given that we require
const MYSQLND_MINIMUM_VERSION = '5.0.9';, we should avoid it.
I like @catch's suggestion, as it requires no change to the DB schema either.
IMHO we should try to make changes like this as little BC-breaking as possible. Not just for this reason, though, but also to show other people how to make changes like this in the least problematic way. Core should set a good example for others.
Comment #216
wim leers
@dawehner++
@dawehner++
This 50000 should match whatever default limit we settle on, it's 10000 at the moment.
This is ordering by
cid, it should be ordering by created.
+1!
Yay!
Should be easy to fix. Also: yay for tests.
Comment #217
dawehner
Oops, I copied it from the wrong part of
core.api.php.
Yes, we indeed need that, given that CID is not a numerical ID. Sadly this means we need a new index on 'created', which also means we need to do a little bit more work on every insert.
Comment #218
wim leers
Great! Let's see if this passes tests :) I can only find nits.
I think this is RTBC as soon as it is green, but I probably can't RTBC it because I worked too much on it.
Nit: 80 cols.
Supernit: the new line here has one leading space too many.
Supernit: should be one
\n, not two.
Comment #219
dawehner
Ha, I knew something looked odd.
Comment #220
wim leers
Who's willing to RTBC? :) 🙏
Comment #221
dawehner
I would put my bets on catch.
Comment #222
xjm
Comment #223
berdir
#2892179: Follow-up for #2775381: clearing render cache in PerformanceForm::submitForm() is obsolete has an interesting impact on this, as we go from 10k entries in cache_render to 10k entries in cache_render and cache_page each.
Was there a discussion on the default size? Can't seem to find it.
cache_render entries are usually fairly small, but the page cache is the full HTML page. On a newspaper site that I just checked, a randomly picked cached page for an article was ~100k plus a few k for cache tags and other metadata. According to https://www.keycdn.com/support/the-growth-of-web-page-size/, the average size of the HTML part is 67kb.
With 100k, 10k entries results in ~1GB, with the average 67k, it is, surprise!, around 670MB. Plus indexes, cache tags, ...
That's still a pretty large number for a small site, assuming they actually do get 10k different requests.
Does make me wonder if the default should be lower.
Comment #224
wim leers
No, the actual number wasn't actually discussed. I mistook @catch's mention of 10K in #180 as an actual suggestion. So I implemented it in #200. He clarified in #212 that he didn't mean to suggest that. He said there that we might want to base our limit off of the data in #206. So, this:
A newspaper site probably is higher end and has more KBs of HTML. @mariagwyn's data suggests a much smaller size. Still, I think it's fine to base it on 67KB (the average you cite) or 100 KB (the rounding up, and taking into account future growth).
Although I do want to call out that it's actually quite hard to have 10K entries in
cache_dynamic_page: you need a pretty big/complex site to reach those numbers; many/most small/medium Drupal sites won't get to that point.
Honestly, I don't care too much about the exact number; we can still tweak that in the future. I think @Berdir may in fact be one of the people with the most data available to them, so I'd suggest … let @Berdir decide? :) As long as you can provide a rationale for the number you propose, @Berdir, so that we can revisit it in the future in a structured, reasoned way, then I'm happy to trust & use whatever "max rows" limit you propose!
Comment #225
wim leers
Oh and in #223 I think you meant #2889603: Split the internal page cache from the rest of the render cache, not #2892179 :)
It's important that we keep that in mind while doing this, indeed, but it also means you'll actually be able to control the render cache & page cache separately: you can limit the render cache to a much lower number than the page cache, or vice versa, depending on your use case. Right now, it's impossible to impose different limits. My point being: there's a good side to #2889603 for this issue as well!
Comment #226
xjm
(Don't think the tag change was intentional. Carry on.)
Comment #227
fgm
Actually, on an almost-small site (15k nodes, 1 user) I'm currently working on, I just checked and see 11k items currently in the DPC bin, and 80k items set in that bin over the last 3 days, so you definitely don't need to be newspaper-size to reach these numbers: I think a good number of D8 sites will be at that scale or above. For comparison, the render bin currently has 340k items, and got 525k items set in 3 days.
Comment #228
wim leers
@xjm Thanks!
@fgm 15K nodes is not "small" anymore in my book :) It all depends on your POV.
DPC roughly scales linearly with the amount of content in your case. That makes sense.
Comment #229
fgm
@Wim Leers: I meant small as in something that can be done by a two-person army, as opposed to media sites which tend to be in the umpteen millions of nodes, but OK.
On another train of thought, I had another idea this weekend for this issue:
- when caching with Cache::PERMANENT, automatically add an actual TTL computed from a random-ish value within some site-dependent range (probably built from the total number of rows in all entity base tables or something like that),
- keep Cache::PERMANENT to estimate staleness
Comment #230
wim leers
#229: the approach you propose is what this issue was doing originally. The problem is that you can then still easily grow beyond your bounds, because doing TTL estimation based on the current number of rows in a cache table does not say anything about future row insertions.
The current patch also allows expirations to proceed normally, as long as the number of rows is smaller than the maximum number of rows.
Comment #231
fgm
Well, I was not advocating basing the estimate on the number of rows in the cache table, but on the sum of the number of rows in the entity base tables, which is IMHO a decent heuristic of what to expect for many (although not all) bins, be it directly (render, page, entity, data) or indirectly (other bins), because sites with more content tend to have more complexity anyway, leading to bigger menus, containers, config, etc.
But sorry to have missed that in the initial patch, I read them too fast.
Comment #232
mpdonadio
10000 is used as a literal in a few places in the patch. Do we want a constant?
Consider the following:
- I installed Drupal before this patch. DatabaseBackend got used.
- I installed Memcache. MemcacheBackend (or whatever it is called) is used, but the old cache_* tables are still in the database.
- I update Drupal w/ this patch applied, and this hook runs. Code here will only update bins (and therefore tables) that are DatabaseBackend.
- I uninstall Memcache. Now DatabaseBackend, but there is a schema mismatch on the tables that existed but were essentially unused when Memcache was installed.
I think we can safely remove the `if ($cache_backend instanceof DatabaseBackend)` check because of the `if ($schema->tableExists($table_name)) {` check, and this will fix that scenario.
?
Comment #233
ndobromirov commented
@mpdonadio from #232.1 - I think it's better for this to be configurable. The default will be 10,000. There will be no UI for it, but it can be changed if it's deemed necessary for a particular site.
Comment #234
berdir
@mpdonadio did not suggest not making it configurable, just that we make the default value a constant, so it's easier to change in the future if we want to. At least that's what I understood.
And yes, agreed with the update function, that's a very good point. It's quite common that production uses memcache/redis but the database tables are still there and then developers sync that to their local site where they use the database. It's no longer that big of a deal because the only change is the added index, we no longer have the field.
Re #224, all our D8 sites use redis, so I can't directly answer that question. What I can do is answer how it distributes on our smaller and bigger sites across the different bins given a fixed max size, which I think is also quite interesting.
On a small site with a limited amount of content, redis is using only 56M of memory. 8200 keys. 2900 of those are in the render cache bin (about 480 of those are pages), 330 in dynamic_page_cache, 1500 in data, 1000 in entity (including files and so on). So without limits, the render cache seems to contain about 10x the amount of entries of dynamic page cache. But most of them are far smaller than page cache and dynamic page cache entries. It also means that the default limits we have would not do anything on this site. Which IMHO also means that we could go (even considerably) lower with our limit by default. As long as we clearly document that bigger sites might want to increase it if they have the storage to do so. Some of the reports in this issue claim their sites were shut down by their hoster at 250MB.
Looking at a bigger site, redis uses the available 1GB of memory, with 100k keys. The site doesn't use dynamic_page_cache currently, and the entries are distributed among the bigger bins like this:
render: 22712 (6300 of those are page cache).
data: 20138
entity: 14530
menu: 3563
The site has far more content than it currently stores in the cache (but most of that isn't requested frequently), so this is how redis distributes it based on its LRU algorithm. But most sites like that use memcache/redis, so I think we don't have to worry too much about those. Our main use case is smaller sites like the first one. And 10k seems quite large for those, 5k or even lower seems like it should be enough as a default, and I think we might want to document that sites that have very low limits on how much database storage they have should set it as low as 1k?
Comment #235
wim leers
#232.2 That's an impressive edge case that you caught there! :)
#234: if that's the case, I'd rather set the default to 1K. We should ensure Drupal runs fine for those without technical knowledge. Those who run bigger sites are more likely to have the necessary skills/insights/knowledge to increase these limits. Besides, even "just" 1000 cache table rows gives you a fair amount of scalability gain: 1000 Dynamic Page Cache table rows already means hundreds of pages that barely cost any processing power.
Based on the argument made in #234, I'm fine with setting the default limit to 1000. Assigning to @catch to get his thoughts on this.
Comment #236
fgm
Based on the metrics in #234, I did a check on sizes on a running medium-size site (+/- 23k entities) for comparison over a two-week period. None of the bins have any evictions, so capacities are not saturated here.
This seems to be pretty much in line with the figures in #234:
Comment #237
mpdonadio
Addresses my comments in #232.
DatabaseBackend::removeBin() has the dropTable() inside a try/catch. Do we want one here? Not sure what to do if it actually throws an exception here...
Comment #238
catch
I still think this is going to result in cache churn, such that sites end up doing cache rebuilds on expensive caches much more regularly, similar to the Drupal 7 bug where the entire cache was cleared every cron run - i.e. the cure could end up being worse than the disease here. So I'm really not sure what to do about this issue at all at this point.
Comment #239
fathershawn
If the limit is configurable, and a reasonable value is chosen, then we will be able to tune this to the particular site. With some experience we can adjust the default in future versions.
Comment #240
mpdonadio
Per IRC w/ @catch and @berdir, we are going to lower the limit to 5000.
Made followup #2896473: Add monitoring/reporting of cache table pruning.
Comment #241
berdir
I updated the change record with the new default, and also included a query to show the size of all cache tables as well as a mention of the follow-up.
I think we are good to go here.
Comment #242
Andre-B
Since this is the default implementation for all database cache backends, I believe the limit of 5k is not sufficient for most midsize Drupal projects.
If I understand correctly this would be used for cache_entity, cache_menu, cache_render and the newly added cache_page. I did a quick check on one of our smallest projects which already exceeds 5000 entries on cache_entity, cache_menu and cache_page bins.
Regarding the limit for cache_page: do not forget that redirects, taxonomy term pages (including pagers), search results + pagers count towards the same limit - I believe with any usual project this limit can and will be reached sooner rather than later.
Comment #243
berdir
Then you can increase the setting, that's pretty easy and documented, if you do have the storage available for that.
And how old are the oldest cache entries in that table? Have they been created in the last 24h, last few days, more than a week? I'm not worried about it if they are older than a few days, chances are that especially the render cache entries (cache_page in 8.x only exists as of today or so) have been invalidated in that time already anyway and are dead weight (due to a node_list or rendered cache tag invalidation for example).
Comment #244
xjm
Does the page cache split affect the default size of the bin?
This says 5000 but sets it to 50,000.
@see should be at the end of the docblock, and also not have two asterisks.
settings.php, or changing one? Should we be adding any recommendations for users in the documentation on how to affect the setting? It seems like there's more documentation in the change record than there is here.
Comment #245
dawehner
Well, the idea is to provide an example of a different value :)
Comment #246
dawehner
I'll work on expanding the documentation a bit.
Comment #247
xjm
Couple other small fixes:
The code snippets are also indented incorrectly (one space instead of two).
"...when used as the maximum number of rows for the cache backend."
I thought about whether it made sense for the constants to only be exposed to the database implementation, and I think it does, so no actionable feedback there.
Comment #248
dawehner
Comment #249
dawehner
I hadn't seen #247 when posting. I adapted the message in #247.2 to fit into 80 chars.
Comment #250
xjm
The docs improvements look good to me, thanks! One small nitpick:
@see doesn't really work this way, so we could probably use
@link, since we can't really link to a specific entry in core.api.php.
Still wondering if we should put something about this in
default.settings.php. There's a section with the header "Page caching:" that is actually just the comment for the example of $settings['omit_vary_cookie'] = TRUE;. Then there are sections on $settings['cache_ttl_4xx'] = 3600; and $settings['form_cache_expiration'] = 21600;, APC configuration, etc. If I were reading settings.php on configuring a particular site, I'm not sure how I'd find the relevant section in core.api.php. (This also affects HEAD I think, but we're expanding its impact here.) Thoughts?
Comment #251
xjm
Also, to clarify this, I think we should say "This can be changed for all database cache bins. For example, to instead limit the number of rows to 50,000:"
Comment #252
anavarre
Sharing a bit more real-life data. 3 random sites, different traffic patterns.
site1
site2
site3
This obviously doesn't account for changes in #2889603: Split the internal page cache from the rest of the render cache
Comment #253
berdir
@anavarre:
Not 100% sure if you are trying to say something specific with that data :)
Same questions about your data as I asked @Andre-B in #243. How old are those cache entries; is this data generated after hours, days, weeks? Easy to check with a select from_unixtime(min(created)) from cache_render. Either way, your sites all seem to be a bit bigger, with 13k-150k entries in dynamic page cache, which means you have at least that many context-variations*urls. IMHO, it seems acceptable that a site like that needs to customize the cache bin size if they want to have it use that amount of data.
If the data is indeed generated in just hours, then I guess that you could have a scenario that @catch is worried about, where we are writing and deleting so much data that the overhead of cleaning it up is basically larger than what we save by caching it in the first place. But we have the follow-up to provide some reporting on that if we figure out a nice way to do that, so such cases are informed about this and can customize it. And I still think that scenario is preferable to a full disk or a very large bill from a hosting provider that is auto-growing the disk ;)
I also still have a @todo for writing a blog post on optimizing cache_render usage a bit, a few things were already mentioned here, like hiding local tasks/actions blocks for anon users.
@xjm #247: About separate page cache bin, I asked/discussed the same in #223 and I think it is OK. On large enough sites, we go from typically 3 bins reaching the max (render, dynamic page cache, data) to 4. cache_page will likely be the largest of those, although probably comparable to dynamic page cache.
Setting to needs work for #251.
Comment #254
Andre-B
@Berdir - that data I mentioned can be generated in less than 12 hours. Usual bot traffic from Google, Microsoft, various SEO tools and whatever robots you can imagine will easily fetch more than 5k URLs within a day. If your site has no Varnish in front of it to do time-based caching, you rely heavily on Drupal's page cache being there. Regarding the invalidation aspect - yes, that was an issue and we fixed it with two solutions:
1) The first solution was to eliminate offending cache tags as much as we could; we had a scenario in which not refreshing a view if the page itself hadn't changed was perfectly fine.
2) The second solution was to modify the page cache itself to behave like a time-based cache and cache those pages for 30 minutes to 7 days.
With either of those solutions we implemented ways for editors to refresh pages before the TTL was reached - this can probably be achieved using the Purge module as well.
After all, defining which caches are meaningful and should be kept had the most impact and reduced the cache_render table the most for us. The IO waste from unused cache entries is important as well - especially if you have to clean up that mess on cron.
Comment #255
flocondetoile
Patch updated with #251
Comment #256
anavarre
#253
Oh sorry, I was trying to give real-life data for sites I know are having moderate to very large
cache_render and cache_dynamic_page_cache tables. I was expecting the number of rows would help with the target number being discussed here.
Not so easy, actually. With 1M+ rows this is a very expensive operation. What I can say is that, due to #2752961: No reliable method exists for clearing the Twig cache, every code deploy needs to
drush cr, which gives a good indication those are being rebuilt frequently (every 2 weeks in the worst-case scenario seems reasonable).
I tried that a few weeks ago as it seemed a very straightforward win. Couldn't notice a significant decrease in the table usage in the long run, unfortunately.
Comment #257
xjm
#250 is also not addressed yet.
Comment #258
fgm
Here is a rerolled version accounting for the changes in #250 and also
- fixing the wording on the next comment paragraph which didn't make sense (to me)
- fixing the capitalization in the MySQL example in the same paragraph.
Comment #259
anavarre
If you're going to capitalize the SQL query, then we need consistency.
Comment #260
fgm
Indeed. Sadly, the name of the table is actually uppercase, so it looks a bit strange nonetheless.
Comment #262
anavarre
Spent some time testing the patch (applies successfully). This looks to be working okay. Fresh new Drupal install.
cache_render doesn't exist yet:
Create a bunch of test nodes:
$ drupal create:nodes article --limit="1000" --title-words="9" --time-range="1" --quietThen crawl a few of them:
$ for i in {1..100} ; do http --output=/dev/null http://drucker.local/node/$i ; doneNow we have enough stored items to start testing:
Allow only 50 entries in
cache_render:
echo "\$settings['database_cache_max_rows']['bins']['render'] = 50;" >> settings.php
Running cron:
And now the
cache_render table is down to 50 items as anticipated.
Comment #263
flocondetoile
About #250
+1 to putting some words about this setting, and a code example, in this file, like many other available settings. And the settings.php file is, I think, read much more than core.api.php, at least on an initial look.
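For illustration, such a default.settings.php addition might look like this (a sketch; the 5000 default matches the value settled on earlier in this thread, and -1 as "no limit" follows the convention discussed in #211):

```php
/**
 * Database cache bin row limits.
 *
 * Each database cache bin is pruned down to this many rows on cron run.
 * Increase for large sites with ample storage; use -1 for no limit.
 */
# $settings['database_cache_max_rows']['bins']['render'] = 5000;
# $settings['database_cache_max_rows']['bins']['dynamic_page_cache'] = 5000;
```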
Comment #264
berdir
This was the wrong @see being changed. Having a @see with a link is valid; what is not valid is the @see with just text a few lines below.
@see https://www.drupal.org/docs/develop/coding-standards/api-documentation-a...
;)
this one.
Comment #265
star-szr
Discussed with @xjm @effulgentsia @cilefen and we agreed this is a critical issue: this isn't just happening on large sites, and it can lead to sites going down because of full disks, etc.
To give another little bit of anecdotal data (I can provide more data on request if that's helpful). One of the client sites I've worked on recently that I would consider a fairly small site (< 400 nodes and not a lot of other entities/data) has a ~3.8GB cache_render table. It looks like in this specific case it's mostly cached search results (Search API module). The oldest cache record is ~2 days old.
SELECT table_name AS `Table`, TABLE_ROWS as 'Num. of Rows', round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB` FROM information_schema.TABLES where table_schema = 'drupal' and table_name = 'cache_render' ORDER BY (data_length + index_length) DESC limit 10;
(Replaced the actual database name with 'drupal')
Comment #266
mpdonadio
Reroll for clean interdiff.
$ git checkout 1ead5f1b4697a74c69c95ddbd2a2ede600751f37
$ git rebase 8.4.x
First, rewinding head to replay your work on top of it...
Applying: 2526150-dbcachemax-260.patch
Using index info to reconstruct a base tree...
M core/lib/Drupal/Core/DrupalKernel.php
M core/modules/system/system.install
Falling back to patching base and 3-way merge...
Auto-merging core/modules/system/system.install
CONFLICT (content): Merge conflict in core/modules/system/system.install
Auto-merging core/lib/Drupal/Core/DrupalKernel.php
error: Failed to merge in the changes.
Patch failed at 0001 2526150-dbcachemax-260.patch
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".
Just had to update the update hook number.
Comment #267
mpdonadio
#264 should be fixed.
Comment #268
berdir
Thanks @mpdonadio. Assuming this passes, let's finally get this in :)
Comment #271
catch
I'm still not keen on the implementation here and think it's going to lead to cache churning that will also be hard to detect (i.e. frequently requested cache items, such as a footer block on every page, being constantly removed and recreated).
On the other hand, the only alternative that comes to mind (apart from the original idea of a non-permanent expiry timestamp) would be some kind of LRU approximation, such as updating the created timestamp when we read a cache item and then deleting oldest first - we can't do that on every cache request, but could do it based on mt_rand(1, 1000) == 500 or similar. To be honest I'm not keen on that idea either, because it's adding a lot of complexity when memcache or redis do this properly, and extra writes on most page requests.
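A pseudocode sketch of that sampled-LRU idea (names illustrative; the direction, not an implementation):

```php
// On roughly 1 in 1000 cache reads, refresh the item's 'created'
// timestamp so frequently read items survive oldest-first pruning.
if (mt_rand(1, 1000) === 500) {
  $this->connection->update($this->bin)
    ->fields(['created' => round(microtime(TRUE), 3)])
    ->condition('cid', $cid)
    ->execute();
}
```

The sampling keeps the extra write from happening on every read, at the cost of making eviction order only approximately least-recently-used.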
So... I'm going to commit this to 8.5.x and 8.4.x, but if sites still run into trouble with it, then I think we should look at a sample-based LRU approximation or switch it back to the expires idea. Created a follow-up for that here #2904569: LRU approximation for database cache.
Comment #272
jibran
Published the change notice as well.
Comment #273
tacituseu commented
This commit is most likely responsible for failures in HEAD (see #216.5):
https://www.drupal.org/pift-ci-job/748061
https://www.drupal.org/pift-ci-job/748066
Comment #274
berdir
Yes, also seeing this on https://www.drupal.org/pift-ci-job/748057
I guess this happens when we manage to write two cache entries in the same microsecond or so? Should we add a usleep() in that loop?
Comment #275
wim leers
I last commented here >1 month ago, in #235. A lot has happened since!
Thoughtful documentation improvements and great questions from @Berdir.
@catch (in #271): I agree that this patch is not perfect, but that's 100% because we're trying to use a SQL database for caching, which is … not what SQL databases are designed for. None of these problems exist at all when using Redis/Memcached/…. So we'll always have a non-perfect trade-off. I'm not too concerned about constant recreation of render cache items, because you still have Page Cache + Dynamic Page Cache in front of those too.
#274: yes, sleeping for a bit seems sensible. Shall we do this in a follow-up?
Comment #279
cilefen commented
I reverted because DatabaseBackendTest::testGarbageCollection is failing on HEAD.
https://www.drupal.org/pift-ci-job/748136
Comment #280
star-szr
I looked at this for a while and, with the caveat that I'm not really familiar with this patch, here are some thoughts.
My impression is that the test is randomly failing because sometimes multiple records are added with the same value for the
created column. So after garbage collection, the total number of rows is less than the configured value for max rows.
Garbage collection logic: if multiple rows have the same value for
created, it will result in somewhat unpredictable garbage collection.
This assertion assumes that garbage collection will be consistent in terms of the number of records it removes.
For this assertion, would it be sufficient to check that the garbage collection reduced the rows to fall within the limit, rather than checking for an exact match?
Comment #281
wim leers
#280: yep, that's what @Berdir suggested in #274.
usleep() has a resolution of one 10^6th of a second (microsecond); the created column has a resolution of one 10^3rd of a second (millisecond). Therefore we need to usleep(1000).
usleep()has a resolution of one 10^6th of a second (microsecond), thecreatedcolumn has a resolution of one 10^3rd of a second (millisecond). Therefore we need tousleep(1000).(Observing my change, it takes ~100 ms longer to run the tests, because we have 100 test rows, and are pausing 1 millisecond between each insertion. So it appears to be working correctly :))
Comment #282
star-szr
Ah, I totally missed some of the comments between the commit and the revert! 😅 Thank you @Berdir @Wim Leers.
Comment #283
joelpittet
Nit, can be fixed on commit:
First comment, is followed by Third. Second disappeared, send out a search party! IMO, it doesn't need the numbering.
This looks great otherwise; glad it's configurable and has a reasonable default :)
Comment #284
cilefen commented
Do we need those comments? The code is clear.
Comment #285
joelpittet
I don't think so, but hesitated to say so in my comment. +1 to scrapping them.
Comment #286
heddn
Does this deserve a comment explaining why we are sleeping?
Comment #287
mpdonadio
TBH, I would bump it to 1500us. Not sure if I trust PHP to sleep accurately. Could also use microtime() to check for ms rollovers and/or calculate a better sleep value.
Comment #288
wim leers
#283 + #284 + #285: those comments are copy/pasted from
\Drupal\Core\Cache\CacheFactory::get(). If we want to remove those comments here, should we remove them there too?
#286 + #287: Hah, first we talk about removing comments, now we talk about adding one. Bumping to 1500 would make the requested comment rather unclear. Why not bump it to 2000, or 5000? Why does 1500 solve the problem when 1000 does not? And using
microtime() there sounds like it'd become very complex.
Let me know what you decide, I'm happy to reroll. But let's first get on the same page about what we want to happen.
Comment #289
berdir
Not sure about the comments regarding First/Third; we could open a separate issue to clean that up everywhere?
I think 1000 is fine and logically makes sense. I'm pretty sure that usleep() is guaranteed to sleep at least as long as requested, and even if not, we still do a *database insert query* in the loop, which isn't free. We already spend more than 1 ms per iteration in most cases; only occasionally does 1 out of 100 calls happen to be timed so well (probably starting almost exactly at the beginning of a millisecond) that it finishes within the same millisecond.
That said, adding a comment that explains that a bit makes sense. Something like "Ensure that each cache happens in a different millisecond by waiting 1ms (1000 micro seconds). The garbage collection might otherwise keep less than exactly 100 records, which is acceptable for real-world cases.".
Comment #290
wim leers
#289++. Done.
Comment #291
Anonymous (not verified) commented
The First/Third comments are a nit; they cannot hold up this critical issue and can be moved to a separate issue (per #288, #289). Otherwise, we would need to insist on fixing them here.
So, back to RTBC.
Comment #292
anavarre
Either "each cache item that is created" or "each cache item created". Alternatively, "each cache entry is created in...".
s/micro seconds/microseconds
Comment #294
Anonymous (not verified) commented
#293: random fail #2857843: Random fail in Drupal\KernelTests\Core\Entity\ContentEntityChangedTest::testChanged.
Review changes:
- Nit typo.
- Improved the annotation after the appearance of Settings::getInstance(). Fixed the number but kept the numbering, because it may help prevent reordering of these conditions.
- Nit typo: unused statement.
- Used the second variant from #292: "happens in...".
- Per #292.2: s/micro seconds/microseconds.
Other lines changed due to the changed length of the preceding lines.
Comment #295
wim leers
#294 fixes the nits from #292, plus a few more. Back to RTBC.
Comment #296
catch
Needs a re-roll...
Comment #297
mpdonadio
Just needed a simple rebase:
$ git checkout 087c5cd541f11b643044d2354c044d3aee2491cb
$ git apply --index 2526150-294.patch
$ git rebase origin/8.4.x
First, rewinding head to replay your work on top of it...
Applying: 2526150-294.patch
Using index info to reconstruct a base tree...
M core/tests/Drupal/KernelTests/Core/Cache/DatabaseBackendTest.php
M core/tests/Drupal/KernelTests/Core/Command/DbDumpTest.php
Falling back to patching base and 3-way merge...
Auto-merging core/tests/Drupal/KernelTests/Core/Command/DbDumpTest.php
Auto-merging core/tests/Drupal/KernelTests/Core/Cache/DatabaseBackendTest.php
Comment #298
catch
Comment #301
catch
Committed/pushed to 8.5.x and cherry-picked to 8.4.x. Thanks!
Comment #303
wim leers
Created #2919697: Update project page — deprecate for >=8.4.x to update the https://www.drupal.org/project/slushi_cache contrib module's project page and deprecate it for >=8.4.x.
Comment #304
wim leers
Comment #305
andypost
It was a bad idea to inject the Settings singleton; filed a meta to clean this up: #3299828: Stop storing Settings singleton in object properties.