Problem/Motivation

The current implementations of ChainedFastBackend and DatabaseCacheTagsChecksum are not efficient when dealing with changes, i.e. writes to the cache backends.

- DatabaseCacheTagsChecksum relies heavily on the database to validate cache item checksums, so ChainedFastBackend completely avoids using tags on the fastBackend.

- ChainedFastBackend invalidates the entire fastBackend on every cache write, both for the current web head and for other heads.

This behaviour penalizes applications where data changes a lot. ChainedFastBackend even had to be disabled during the install process, where cache writes happen all the time. We don't want a Drupal that only works for read-only applications.

Proposed resolution

- Create a new Cache API (CacheRawBackend) that is simpler, faster and lighter than the current one (see the sketch after this list). This new API saves data as-is in the storage backend so that it can leverage native functionality such as counter() (apc_inc for example) and touch(). This new Cache API is not tag aware.

- Create a new ChainedFastBackend (ChainedRawFastBackend) on top of the new Cache API. The old implementation of ChainedFastBackend cannot be used on the new Cache API because the new API is not able to store cache metadata such as the created timestamp, so it needs a new invalidation strategy. Like the new Cache API, the new ChainedRawFastBackend is not tag aware.

- Implement an alternative to DatabaseCacheTagsChecksum that is built on top of the new ChainedRawFastBackend.

- Improve the current ChainedFastBackend to not invalidate the fastBackend for its own web head. It should only invalidate the fastBackend for other heads/processes.

- Bring back the usage of tags for the fastCache backend in ChainedFastBackend. This is needed to prevent the complete invalidation of the fastBackend when something changes. It should not be a problem now that cache tags are backed by a fast in-memory storage backend.

- Bring back ChainedFastBackend during the install process. It should actually be faster than not using ChainedFastBackend at all.
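
For illustration, here is a minimal sketch of what such a raw backend interface could look like, based only on the description above (the method names and signatures are assumptions, not the final API from the patch):

namespace Drupal\Core\Cache;

/**
 * Illustrative sketch of a tag-unaware raw cache backend (assumption only).
 *
 * Items are stored natively so that backends can expose their own atomic
 * operations instead of wrapping every value in a cache item object.
 */
interface CacheRawBackendInterface {

  /**
   * Returns the raw stored value for the given cache ID, or FALSE on a miss.
   */
  public function get($cid);

  /**
   * Stores a raw value, optionally with an expiration timestamp.
   */
  public function set($cid, $data, $expire = CacheBackendInterface::CACHE_PERMANENT);

  /**
   * Atomically increments a stored integer, e.g. apc_inc()/apcu_inc() on APCu.
   */
  public function counter($cid, $increment = 1, $default = 0);

  /**
   * Resets the expiration of an item without rewriting its data.
   */
  public function touch($cid, $expire = CacheBackendInterface::CACHE_PERMANENT);

}

An APCu implementation could then map counter() straight onto apc_inc()/apcu_inc(), and backends with a native touch operation (Memcache, for example) could map touch() onto that, which the wrapped cache item objects used today make impossible.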

Remaining tasks

- Profile before doing any further work and decide if it is worth the effort.
- Review all the code. I've already spotted some bugs in it.
- Write test coverage - lots of it.

User interface changes

None

API changes

No changes to existing APIs; only new APIs and changes to implementation details.

Data model changes

None.


Comments

david_garcia created an issue. See original summary.

david_garcia’s picture

Status: Active » Needs review
FileSize
7.1 KB

Status: Needs review » Needs work

The last submitted patch, 2: patch.patch, failed testing.

david_garcia’s picture

Status: Needs work » Needs review
FileSize
7.36 KB

Temporarily removing persisting invalidations in __destruct().

Status: Needs review » Needs work

The last submitted patch, 4: patch.patch, failed testing.

Fabianx’s picture

Priority: Normal » Major
Issue tags: +D8 cacheability, +Performance, +rc target triage

Major + Performance tags: Very great catch!!!

Wim Leers’s picture

Issue tags: +Needs tests

Very, very interesting :)

Berdir’s picture

This will need to be profiled.

The problem with keeping the cache tags for example is that we have to check them every time we fetch caches. That will likely add quite a few additional DB queries; we at least need to know how many exactly, because we will definitely trade some runtime performance on completely warm caches for being faster when changes happen.

Berdir’s picture

On the other side, as I mentioned in the other issue, every time we invalidate even just one cache tag, we completely invalidate every fast chained backend, likely repeatedly (we usually don't do a single invalidation but multiple, through multiple calls). So every saved node, user, comment, ... invalidates the fast chained backends.

Another idea for optimization would be to somehow track which cache tags are really used by a backend. I'm not sure about an efficient and safe way to do so. We could have a cache entry that e.g. has a list of the first few characters and whether we have any cache tags or not. That could e.g. allow us to filter out any node*, user*, 4xx* and so on cache tags.

I don't think the invalidate on terminate idea is going to work. For starters, it will definitely break tests as they need to be able to do changes in the test runner process that will invalidate caches on child web requests.

david_garcia’s picture

This will need to be profiled.

Sure.

The problem with keeping the cache tags for example is that we have to check them every time we fetch caches.

Yep, the first thing that I did was to write an alternative backend for cache tags on Couchbase. Maybe a concept similar to fastChainedBackend, but for cache tags, would do the job. Constantly retrieving cache tags from the database was one of the first issues that came up when profiling D8.

I don't think the invalidate on terminate idea is going to work. For starters, it will definitely break tests as they need to be able to do changes in the test runner process that will invalidate caches on child web requests.

Being able to move this to shutdown would be a great win. I've noticed that the same binary gets outdated again and again on the same request.

But I can't really see why this error is showing up in the test bot when moving invalidation to shutdown:

General error: failed to instantiate user-supplied statement class

Berdir’s picture

... Maybe a concept similar to fastChainedBackend, but for cache tags, would do the job....

Yes, that would be nice :)

david_garcia’s picture

Status: Needs work » Needs review
FileSize
11.06 KB

Current tests needed some tweaking to account for the changes. Let's see if this gets a green light from the test bot.

Status: Needs review » Needs work

The last submitted patch, 12: 2611400-fastchained.patch, failed testing.

Fabianx’s picture

Just me thinking out loud:

Best theoretical strategy would be (a rough sketch follows below):

- 1. global timestamp / atomic counter - if valid, receive items directly without checking cache tags (no need to check tags if there was no write / invalidation at all) - that is like a mega cache tag
- 2. Retrieve from fast BE with cache tags as usual
- 3. Retrieve from consistent backend with cache tags as usual

Probably contrib
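
A rough sketch of that three-tier lookup, just to make the flow concrete (binIsUntouched(), getWithTags() and promoteToFast() are hypothetical helpers, not existing core APIs):

/**
 * Hypothetical three-tier get(), following the strategy outlined above.
 */
function chained_get_sketch($cid, $fast, $consistent, $bin_state) {
  // 1. "Mega cache tag": if nothing in this bin has been written or
  //    invalidated since our last check, trust the fast backend without
  //    checking individual cache tags at all.
  if ($bin_state->binIsUntouched()) {
    if ($item = $fast->get($cid)) {
      return $item;
    }
  }
  // 2. Otherwise consult the fast backend, validating its cache tags.
  elseif ($item = $fast->getWithTags($cid)) {
    return $item;
  }
  // 3. Fall back to the consistent backend (tags validated there as usual)
  //    and promote the result into the fast backend for the next request.
  if ($item = $consistent->get($cid)) {
    $fast->promoteToFast($cid, $item);
    return $item;
  }
  return FALSE;
}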

david_garcia’s picture

What about Cache Tags on top of ChainedFastBackend? See attached patch.

Summary of changes in the patch:

  • ChainedFastBackend is used again during install
  • ChainedFastBackend invalidations do not affect the environment that made the invalidation, only other environments - the fastBackend is not completely cleared when doing a set() anymore.
  • Cache tags are now built on top of ChainedFastBackend, so they do not need to be read from persistent storage on every single request, plus you get all the improvements made to ChainedFastBackend.
  • Because cache tags are now built on top of ChainedFastBackend, persistent storage is never hit when reading cache tags. You only need to rebuild the fastCache tag storage if another web head/process invalidates a cache tag.

Still needs work, but the current patch will suffice to do benchmarks to check out the performance impact of the change.

To be able to further optimize tag invalidations on top of cache backends, these should be expanded to include counter()/increment() functionality. After all, underlying storage backends such as APC/Wincache/Couchbase already support increments. The problem here is the way cache backends store their data.
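
As an illustration of what native increments would buy here, a tag invalidation could become a single atomic operation against the storage engine (APCu functions shown; the 'cachetag:' key prefix is an assumption):

/**
 * Sketch: invalidating cache tags as native atomic increments in APCu.
 */
function raw_tag_invalidate_sketch(array $tags) {
  foreach ($tags as $tag) {
    // apcu_inc() increments in place, with no read/unserialize/serialize
    // cycle. If the key does not exist yet the increment can fail, so fall
    // back to storing an initial value (the small race here is harmless,
    // since any change to the value invalidates the tag either way).
    apcu_inc('cachetag:' . $tag, 1, $success);
    if (!$success) {
      apcu_store('cachetag:' . $tag, 1);
    }
  }
}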

david_garcia’s picture

Fixed a bug + cleaned up unused code.

I did some very basic profiling and these were the results.

Clean install - with quick edit module uninstalled - and 6 nodes.

- Open the home page and the edit page of a node in different windows.
- Warm both them up.
- Profile home page.
- Save the node from the edit page.
- Profile the home page.

Without the patch - current head:

Wall time:

Warm home page: 208ms
After saving a node: 375ms

Queries:

Warm home page: 93
After saving a node: 160

With the patch:

Wall time:

Warm home page: 172ms (-36ms ~ -17%)
After saving a node: 341ms (-34ms ~ -10%)

Queries:

Warm home page: 47 (-46 ~ -50%)
After saving a node: 106 (-54 ~ -33%)

I expect these savings to be greater the more content the page has, the more cache tags are involved and, of course, the more binaries you route to ChainedFastBackend (only the default ones during the tests).

Now this needs third-party opinion and profiling (just to make sure I did not mess up too much).

Of course this will not pass tests as is, and there is still some work to be done.

Greetings!

Fabianx’s picture

Nice results!

It makes a lot of sense to store cache tags in the same fashion.

Beware of one thing: APCu / memcache / redis are (to some extent) volatile backends, which means it is difficult for a checksum provider to decide between NULL and 0.

There are two strategies:

a)

- Write through to MySQL backend (as done here)
- But then also fetch any NULL entries from MySQL backend (to see if they vanished in between)

b)

- Write all tags only to volatile backend during cache get
- Then if encountering NULL => we know it's a "cache miss"

I still think overall the whole cache tags API could need/use an "atomic counter" concept for the BE storage, the same as the fast chained backend itself could use it.

Obviously there are many ways to achieve the same goals, so these are just some ideas / thoughts.

Keep up the great work!

david_garcia’s picture

I still think overall the whole cache tags API could need/use an "atomic counter" concept for the BE storage, the same as the fast chained backend itself could use it.

The "lastWriteTimestamp" in fastChained backend does that job. It serves as a trigger to notify other webheads that they need to refresh their fastBackend for a specific binary.

If tags sit on top ChainedFastBackend, then they also have this out-of-the-box.

The real issue making cache tags work on top of the current cache backends is that we added overhead when processing cache tags that was not there before.

A tag stored in Wincache/APCu as a raw value takes up 16 B, while a "drupal" cache object containing the same data takes up 196 B. That's a 12x increase in storage needs. It means we could previously store up to 65,000 tags in 1 MB of memory, but now we can only store 5,000 tags. I can easily imagine an application with hundreds of thousands, if not millions, of cache tags. 1 million tags took 15 MB before the patch, and now they take 200 MB. For APCu/Wincache that is a big leap.

Furthermore, by storing cache tags as raw values, we can leverage the native touch()/counter() functions of the underlying storage backends, and we won't need all the processing overhead of the current storage backends (serializing/unserializing, preparing items, etc.).

The only way to overcome this is to have new and simple Cache Backends, the patch in #15 had an interface for this: CacheRawBackendInterface.

These backends would be like the current ones, but much, much simpler - literally just a bridge to the underlying storage. They would store values - when possible - using their native representation, allowing us to expose counter()/touch() from the underlying storage directly.
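
To make the storage difference concrete, this is roughly the contrast being described (the field list and key names below are illustrative, not the exact format of any particular backend):

// What a regular cache backend stores for a single tag today: a full cache
// item object with metadata that has to be processed on every access.
$item = new \stdClass();
$item->cid = 'node:42';
$item->data = 7;                   // The actual invalidation counter.
$item->created = microtime(TRUE);
$item->expire = -1;                // Permanent.
$item->tags = [];
apcu_store('drupal.cache.cachetags.node:42', $item);

// What a raw backend would store: just the native integer, which the
// storage engine can then increment or touch in place.
apcu_store('drupal.rawcache.cachetags.node:42', 7);
apcu_inc('drupal.rawcache.cachetags.node:42');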

Fabianx’s picture

#18: Yes, but cache tags don't need the full cache API. Overall they are just atomic counters. That is what I mean by that API.

david_garcia’s picture

FileSize
60.16 KB

This is my last attempt before finally choosing what's the right way to solve this, before it gets out of hand :)

In this last patch I've added a new Cache API :( that is much simpler than the current Cache API, but is faster, uses less storage and leverages native capabilities of the storage engines such as increment/counter and touch (resetting the expiration without setting the value).

Of course, the old ChainedFastBackend does not work on top of this new API because there are no timestamps to use for invalidation, so another invalidation strategy is used.

In the end there's a new Cache API "CacheRawBackendInterface" and a new ChainedFastBackend "ChainedFastRawBackend" that are used by the CacheTagInvalidator to store tags.

The old ChainedFastBackend is still used as the regular cache backend.

This new Cache API might have other uses, such as abstracting ApcuFileCacheBackend.php away from APCu, or wherever we need something really fast that does not have all the added weight of the regular Cache API.

For anyone willing to profile this, the last patch contains both approaches and a switch to use one or another.

Simply apply the patch and modify the $strategy variable in __construct() of the CacheCacheTagsChecksum class:

public function __construct($connection) {
    // TODO:: THIS IS A QUICK WORKAROUND TO TEST THE SOLUTION.
    $bin = 'tags';
    $strategy = 'new-cache-api';
    switch($strategy) {
      case 'new-cache-api':
        [...]
        break;
      case 'old-cache-api':
        [...]
        break;
    }
  }

If you apply the patch on an already installed system, you need to rebuild the service container.

Wim Leers’s picture

Without the patch - current head:

Wall time:

Warm home page: 208ms
After saving a node: 375ms

Queries:

Warm home page: 93
After saving a node: 160

With the patch:

Wall time:

Warm home page: 172ms (-36ms ~ -17%)
After saving a node: 341ms (-34ms ~ -10%)

Queries:

Warm home page: 47 (-46 ~ -50%)
After saving a node: 106 (-54 ~ -33%)

Wow.

Can you describe your environment? Operating system, PHP version, rough hardware description, etc.

Awesome work here!

david_garcia’s picture

Can you describe your environment? Operating system, PHP version, rough hardware description, etc.

IIS / Windows 8.1 / PHP 5.6.15 / APCu / Zend Opcache / MySQL 5.6.26 - probably not properly set up, because I never, never, never use APCu, Zend Opcache or MySQL. But still, the changes between runs are significant.

Profiled with uProfiler+XHGUI with profile_bultin_functions=false.

There is still work to be done, but current patch should be enough to evaluate the performance impact of the change. Someone needs to confirm if those benchmarks are reliable or I simply messed up.

Even if the gain is real, it is going to be impossible to get this properly done in less than 9 days. But at least there will be a clear path on how to make cache tags stop hammering the database like there is no tomorrow.

catch’s picture

This is something we can definitely do in a minor release, and might be able to do in a patch release - new code is added but there's no deprecation and it's mostly implementation details.

david_garcia’s picture

Issue summary: View changes
david_garcia’s picture

Fixed a few bugs and implemented some of the pending stuff.

david_garcia’s picture

Just for the record - I am running a backported version of the ChainedFastBackend in this patch on production sites and seeing issues with Memcache stampede protection in ChainedFastBackend::getLastWriteTimestamp.

vurt’s picture

I repeated the test scenario on Debian Jessie (PHP 5.6) with a remote (well tuned) MySQL-Server.

One node

Wall time:

Warm home page: 316ms
After saving a node: 663ms

Queries:

Warm home page: 76
After saving a node: 171

With the patch:

Wall time:

Warm home page: 309ms
After saving a node: 502ms

Queries:

Warm home page: 54
After saving a node: 75

1000 nodes

Wall time:

Warm home page: 389ms
After saving a node: 767ms

Queries:

Warm home page: 76
After saving a node: 148

With the patch:

Wall time:

Warm home page: 389ms
After saving a node: 510ms

Queries:

Warm home page: 54
After saving a node: 85

I think the savings are impressive. Nice work! Each database query that can be saved is worth a lot. The database is the hardest part to scale...

david_garcia’s picture

That is ~38% faster with ~50% fewer queries after "things change" - looks promising.

And I have some ideas on how this can be further improved (see the sketch after this list):

- Store/Retrieve invalidation timestamps for all binaries in a single place, and retrieve them all at once.

and/or

- Have a global invalidation timestamp - for all binaries - so that we do not retrieve this timestamp for every single binary unless it is strictly necessary.
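
A sketch of the first idea, fetching the per-bin last-write timestamps with a single multi-get instead of one lookup per bin (the cache ID layout is an assumption):

/**
 * Sketch: fetch the last-write timestamps for all fast-chained bins at once.
 */
function get_all_bin_timestamps_sketch(\Drupal\Core\Cache\CacheBackendInterface $consistent, array $bins) {
  $cids = [];
  foreach ($bins as $bin) {
    $cids[] = 'last_write_timestamp_' . $bin;
  }
  // getMultiple() returns the found items keyed by cache ID, so a single
  // call to the consistent backend covers every bin.
  $timestamps = [];
  foreach ($consistent->getMultiple($cids) as $cid => $item) {
    $timestamps[$cid] = $item->data;
  }
  return $timestamps;
}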

I got the stampede protection issue solved in my D7 backport of ChainedFastBackend - that is working like a charm in production.

david_garcia’s picture

I need this working - and maintainable - before the end of January so I moved it to contrib:

https://www.drupal.org/project/supercache

Besides, it might be good to mature this thing somewhere else before considering it in core.

xjm’s picture

Issue tags: -rc target triage

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Fabianx’s picture

#29: I am currently working on improving chained fast and my thoughts go in a similar direction.

Also, we should potentially support async in the future. So if we add a completely new Cache API v2, we should be sure we can make use of async cache fetches.

--

However:

This issue is complicated and also the supercache is pretty complicated.

Can we split this up into tiny steps instead of a full re-architecture?

Yes, the raw cache backends are awesome, but I cannot see why they cannot be implemented on top of the existing structure.

david_garcia’s picture

This issue is complicated and also the supercache is pretty complicated.

True... it must be split into smaller pieces to deal with.

Yes, the raw cache backends are awesome, but I cannot see why they cannot be implemented on top of the existing structure.

The first step would be to see what to do with rawcache. Rawcache is much like the current cache backends but aiming at:

  • Being faster (no cache tags)
  • Allowing us to consume the very common touch/counter functionality of cache backends (Redis, Couchbase, Wincache and APC all implement this in one way or another). To achieve this you need to store cache items natively, which, as a side effect, removes the ability to store any sort of metadata along with the cache item itself.

I've reviewed the implementation in Supercache and I believe rawcaches should be even simpler than they are now, and should drop support for a TTL on cache items (you could have some sort of global TTL just to prevent very old cache data from hanging around); a per-item TTL makes no sense, especially now that the regular cache backends support cache tags.

Even though this could more or less be built on top of the current cache backends, it would be confusing and even more complex. You could - for example - store the cache item metadata as another cache item itself in the storage backend.

Fabianx’s picture

#33 Thanks for the top-level explanation.

I also realized that it would save a lot of write traffic to not write through when the data currently in the cache is the same (and has not been invalidated). And yes, the timestamp makes that pretty much impossible - though just refreshing the timestamp would still be simpler.

It would still be good if we started the other way round:

What can we do with the cache backends we have right now, and then gradually work our way to where we can just plug in the raw cache backend API and be done?

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Fabianx’s picture

Version: 8.2.x-dev » 8.1.x-dev

I have checked the patch again and it needs work:

The reason is that

- For ChainedFast it is possible to invalidate at the end of the request. The biggest problem is:

If you have process A taking 2s and process B taking 30s on web heads A and B (or on drush and CLI), and we write invalidations at the end of the request, then all caches built with data from A are invalid.

Example (that made it more clear to me):

- Block has text 'HI'

- Block is stored in A and B with 'Hi'

- Block is updated in B with 'Hello World' and B takes 30s, the cache for the block is invalidated, but the timestamp / counter / cache tags are not updated until end-of-request.

- A is retrieving the Block with text 'Hi' from APCu, because the backend is not invalidated.

- A is storing the Block text 'Hi' in the page cache for '/'.

Now B finishes and invalidates the bin counter and the cache tags.

- Now the Block in A is invalid (both in APCu and the DB) and the cache in '/' is also invalid.

So that's 30s of caches we can't use, as they are now invalid - but on the other hand we could use the older data during that time.

But it makes 'when-do-users-see-new-content' depend on the longest-running request.

A possible solution for that is a pseudo-timeout listener in core / utility component, which means that you can do e.g.:

\Drupal::service('timeout')->setTimeout($callable, 500);

We would need to add code to common services to call the timeout service to check if something is scheduled, but that would work as we have some very often called functions, e.g. hook invocations, event listeners, database layer, cache layer, etc.

I would likely start with the hook / event listeners, as it's unlikely that something calls no hooks.

One problem is guzzle, because that can lead to a long wait time for a third party service, but we could flush all data before that.

catch’s picture

Version: 8.1.x-dev » 8.3.x-dev

Here's a possible approach that should be low-risk:

Process A writes, last_write_timestamp =1

Process A writes again, checks last_write_timestamp - it’s still 1, so doesn't bother to increment again.

Process B writes, it hasn’t written already, so last_write_timestamp gets updated to 2.

Process A writes again, last_write_timestamp is 2, this is different from 1, so it writes it as 3.

That way if you're doing something with drush locally, you'd only get a single invalidation per-request (at the start, rather than the end).

It would mean more reads from the backend doing the setting, but they should be cheap compared to whatever else it's doing if it's writing cache items.
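
A minimal sketch of that bookkeeping (the property and cache ID names are assumptions; core's ChainedFastBackend keeps a comparable timestamp in its consistent backend):

/**
 * Sketch of the write deduplication described in #37.
 */
class DedupedInvalidationSketch {

  /** @var \Drupal\Core\Cache\CacheBackendInterface */
  protected $consistentBackend;

  /** Timestamp this process last wrote, or NULL if it has not written yet. */
  protected $lastWrittenTimestamp;

  /** Cache ID of the per-bin last-write timestamp (assumed layout). */
  protected $timestampCid = 'last_write_timestamp_cache_render';

  public function __construct(\Drupal\Core\Cache\CacheBackendInterface $consistent_backend) {
    $this->consistentBackend = $consistent_backend;
  }

  public function markBinAsOutdated() {
    $cache = $this->consistentBackend->get($this->timestampCid);
    $current = $cache ? $cache->data : 0;
    // If the stored timestamp is still the one this process wrote last, no
    // other head has invalidated in between, so skip the redundant write.
    if ($this->lastWrittenTimestamp !== NULL && $current === $this->lastWrittenTimestamp) {
      return;
    }
    // Otherwise bump the timestamp so other heads refresh their fast items.
    $this->lastWrittenTimestamp = round(microtime(TRUE), 3);
    $this->consistentBackend->set($this->timestampCid, $this->lastWrittenTimestamp);
  }

}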

Fabianx’s picture

The problem with invalidating at the end of the request - and why Supercache is flawed in that respect (unless my example can be disputed) - is cache collectors, because those don't use invalidation, just cache sets.

Cache collectors create a lock, get the data, merge the key into the array, set the data, and release the lock.

However, if the invalidation only happens at the end of the request, then the cache collector has written stale data that is treated as valid.

So either we need a ->merge() operation on our caching layer, which means the CF backend would only read from the consistent backend for the cache_get().

That has the downside that you always have a DB query, even though the data would match the rest of the bin still perfectly.

OR

We set a cache tag for the bin itself, which we increment on invalidations and store within the cache item.

But that has the downside that the cache collector could become completely invalid not just on the fast, but also in the consistent backend.

OR

We do what catch suggested in #37, which is indeed very low-risk.

--

But as it stands, Supercache (and the patch in this issue) would - according to my understanding - not work properly with cache collectors, as invalid data could be persisted in the end.

However, I like all the ideas very much, and I also think we could just extend our cache backends with ->counter(), etc.

david_garcia’s picture

@Fabianx, @Catch: the underlying code for the patch that went into https://www.drupal.org/project/supercache has since had lots of bug fixes and improvements.

I believe the approach for this issue should be to more or less merge what that module does back into core, and ditch the original patch posted here.

david_garcia’s picture

Example (that made it more clear to me):

- Block has text 'HI'

- Block is stored in A and B with 'Hi'

- Block is updated in B with 'Hello World' and B takes 30s, the cache for the block is invalidated, but the timestamp / counter / cache tags are not updated until end-of-request.

- A is retrieving the Block with text 'Hi' from APCu, because the backend is not invalidated.

- A is storing the Block text 'Hi' in the page cache for '/'.

Now B finishes and invalidates the bin counter and the cache tags.

- Now the Block in A is invalid (both in APCu and the DB) and the cache in '/' is also invalid.

If cache tags are properly set up, won't the block in A (which was generated with outdated data) and the page cache in '/' be automatically invalidated the moment B persists its cache tag invalidations?

BTW: for batch invalidation of tags using the database as persistent storage, an issue should be opened to allow for that first (https://www.drupal.org/node/2730871).

Indeed, if you are using DB for persistent storage you won't be making the most out of Supercache.

Fabianx’s picture

#40 Thanks for the response.

Even when reading the new supercache code, I think there is still a race condition related to the timestamp.

HEAD broken example:

- Any invalidation in the request => $this->last_invalidation = now(), $this->lastWriteTimestamp = now(), $consistent->get('last_write_timestamp') == now()

AND now() == t(42)

- Now request B comes in, reads t(42) and $this->last_invalidation = NULL, $this->lastWriteTimestamp == $consistent->get('last_write_timestamp') == 42

- So B reads 'foo' from consistent, it stores:

$fast_backend->set('foo', ['bar']) // t(80)

- Now we have a cache->set('foo', ['bar', 'baz']);

- And $created is internally set to t(100) on 'A' and in Consistent Backend

- We see that $this->last_invalidation === $consistent->get('last_write_timestamp'), but we always write, so now() == 120

- Now request on head B reads 'foo' _again_ to write the new data (which is typical for cache collectors), but $this->lastWriteTimestamp == 42 still and statically cached.

- Now request B checks 'foo', which has t(80), which is > 42 => valid

=> B has read invalid data

=> HEAD fails for our static caching case, too.

=> => => Cache collectors are evil

---

Or rather cache collectors MUST have a current timestamp, so:

lock();

// This is new: force the chained fast backend to re-read the last-write
// timestamp from the consistent backend, so the get() below is not served
// a stale copy from the fast backend.
if ($this->cache instanceof ResettableCacheInterface) {
  $this->cache->reset();
}

// The usual cache collector write-back: merge the newly collected keys into
// the existing item and persist it, all while holding the lock.
$cache = $this->cache->get($cid);
$items = $cache ? $cache->data : array();
$items = array_merge($items, $updated);
$this->cache->set($cid, $items);

unlock();
Fabianx’s picture

#40: I made more examples now.

The first was just a problem with invalid caches that need to be thrown away, but cache collectors are just plain broken by this.

Usually we have the pattern:

- cache miss => rebuild => cache set (we never read before write)

And for dependent data:

- cache miss => rebuild => cache set + tags for dependencies we read in rebuild

Any cache write for dependencies must also invalidate the tag first, so we usually do _just_ invalidate and rebuild on demand and mostly depend on cache tag based invalidation now.

So usually invalidation is completely different from updating the cache. They are two different operations.

That works quite well.

But cache collectors do:

- cache get - cache miss - partial rebuild - ... - end of request - lock() - cache get - update - cache set - unlock()

And while we accept a certain degree of race there, due to statically caching the partial rebuild, the time between cache get and cache set is quite short, while a statically cached timestamp could be valid and is very likely valid due to the first cache get().

Fabianx’s picture

The main reason why catch's approach cannot work the way this currently works is that the fast backend creates a _NEW_ expire timestamp, as if the item were valid now().

e.g. we probably need a ChainedFastCacheableBackendInterface for APCu and add:

$this->fastBackend->setWithMetadata($item->cid, $item->data, [], ['created' => $item->created]);

OR

$this->fastBackend->setWithCreated($item->cid, $item->data, [], $item->created);

if we are sure we never need to overwrite any other metadata.

Because if ->created always matches what is in the DB (or the timestamp, if that is larger), then we still have race conditions, but fewer of them, because reading from consistent and writing to fast is no longer an operation that refreshes the cache item. So this is likely something we should do in any case.

- Cache item enters cache at t=20 in some web head

Case A:

- A invalidates the bin for some other item (t=40)
- B reads the item from consistent, and stores it with t=40 (=max(20,40)) into its fast cache backend
- A invalidates the bin again (t=60), but does not write the ts
- B reads the item from fast cache backend as ->created t=40 matches t=40 of last_write_ts
- B writes it again, the item was valid, but the bin was not => FINE

Case B: Clear invalidation / Delete

- A invalidates the bin for some other item (t=40)
- B reads the item from consistent, and stores it with t=40 (=max(20,40)) into its fast cache backend
- A invalidates the cache item itself by clearing it (t=60), but does not write the ts
- B reads the item from fast cache backend as ->created t=40 matches t=40 of last_write_ts
- B writes it again, the item was valid, but the consistent item was not => NOT FINE

=> on clears we need to write through to the bin in every case

Case C: Cache->set()

- A invalidates the bin for some other item (t=40)
- B reads the item from consistent, and stores it with t=40 (=max(20,40)) into its fast cache backend
- A sets the cache item to something else (t=60), but does not write the ts
- B reads the item from fast cache backend as ->created t=40 matches t=40 of last_write_ts
- B writes it again, the item was valid, but the consistent item was not => NOT FINE

=> on set we need to write through to the bin in every case

Case D: Cache Tag invalidation

- A invalidates the bin for some other item (t=40)
- B reads the item from consistent, and stores it with t=40 (=max(20,40)) into its fast cache backend
- A invalidates the tags of the item (t=60), but does not write the ts
- B reads the item from fast cache backend as ->created t=40 matches t=40 of last_write_ts and as there are no tags, cache tag invalidation does not interest
- B writes it again, the item was valid, but the consistent item was not => However as the cache tags had been cached as well from the first consistent read, this works by chance as it writes the item with the old cache tags, ts = 60, fast backend has ts = 60 information.

But if B reads by chance again just from the fast backend on the next request and then writes, then the data will be stored with the right cache tags suddenly. As the invalidation of the cache tag was never seen.

So the idea to store created like this does not help to make catch's idea workable :/.

I think its possible to write test cases for the cases above.

david_garcia’s picture

catch’s picture

Couple more ideas:

1. In the fast backend, keep a lastValidTimestamp and lastWriteTimestamp. lastValidTimestamp is compared against cache->created. lastWriteTimestamp is compared with the timestamp in the consistent backend, as long as they're the same, we don't update lastValidTimestamp. This ought to address the original concern in the issue summary with local writes, but without requiring cache tags in the fast backend.

2. When we cache set, get lastWriteTimestamp from both backends as well as lastValidTimestamp, compare them, and if lastValidTimestamp is updated, throw away the write. This starts to help with the race condition Fabianx mentions.

Cache collectors still need some help to get a fresh cache entry when they request it the second time (i.e. just before the write, because we want to cumulatively merge cache items). I wonder whether we could always re-fetch lastWriteTimestamp whenever a duplicate cache ID is requested (requires keeping an array of already-seen cache IDs). I've seen non-cache-collector cases in the wild where the same cache item is fetched over and over in a request, but it's usually a symptom of a problem rather than by design - we static cache most things.
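
A tiny sketch of the duplicate-cache-ID tracking mentioned above (the class and method names are illustrative, not core APIs):

/**
 * Sketch: detect a repeated get() of the same cache ID within one request.
 */
class DuplicateGetTrackerSketch {

  /** @var bool[] Cache IDs already requested during this request. */
  protected $seenCids = [];

  /**
   * Returns TRUE when $cid was already requested earlier in this request,
   * i.e. when the caller should re-fetch lastWriteTimestamp from the
   * consistent backend before trusting its fast-backend copy.
   */
  public function isDuplicateGet($cid) {
    $seen = isset($this->seenCids[$cid]);
    $this->seenCids[$cid] = TRUE;
    return $seen;
  }

}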

Fabianx’s picture

#45

2. The problem is that we can also have dependent caches:

e.g.

function a() {
  if ($cached = $cache->get('a')) {
    return $cached->data;
  }

  $a_data = calc_a();
  $cache->set('a', $a_data, Cache::PERMANENT, ['tag:a']);
  return $a_data;
}

function b() {
  if ($cached = $cache->get('b')) {
    return $cached->data;
  }

  // b() depends on a(), and stores its result with both tags.
  $a_data = a();
  $b_data = calc($a_data);
  $cache->set('b', $b_data, Cache::PERMANENT, ['tag:a', 'tag:b']);
  return $b_data;
}

But because we throw away the cache tags in the fast backend and rely purely on the last-write timestamp, we never know that b depends on a.

Therefore just checking for duplicate get() (to detect CC pattern) is not enough, but we indeed have to throw away all set's() that try to set data once invalid data has been read at all from this bin.

And for this scenario not even reset() would help as even if we read a new timestamp, the data written can still be wrong.

TL;DR: If we want to support dependent caches with chained fast, then we need to push the cache tags and validate from the fast backend.

Now of course we could 'cache the cache tags' in the fast backend, but that is another discussion again.

The advantage there is that we would know a read conflict and could invalidate an invalidly written item.

Basically I think we run into the problems of both ACID and CAP here - however, for a limited subset of functionality, and especially just CLI + web, fast chained might still work well enough.

Therefore I think that for many web servers https://www.drupal.org/project/lcache will be the better choice for 8.3.x / 8.4.x, as it seems to have a good theoretical background for its concurrency model, while chained fast seems like it does (timestamps) but, due to the implementation, actually has several shortcomings.

Obviously it makes zero sense to read the timestamp every time as it would be a round-trip to the database every time.

Another advantage is that with lcache you have just one round-trip to the database (checking that there are zero transactions on a different web head) if only one web server is used, because drush only needs to write through to the transaction log, but can't read from APCu due to the missing SHM segment, so it would not need to be kept in sync.

Caveat: I have not yet checked lcache for the e.g. cache collector problem and how it resolves conflicts on the same data and dependent data.

In practice we could and should implement the same for chained_fast though, because the one webhead + cli is way easier to deal with than concurrent web heads in general.

e.g. if it's the same head that wrote => no invalidation needed ever
if it is drush that wrote => always invalidate

There is still potential for races, but way more manageable.

catch’s picture

Therefore just checking for duplicate get() (to detect CC pattern) is not enough, but we indeed have to throw away all set's() that try to set data once invalid data has been read at all from this bin.

That's what #45.2 should deal with though. i.e. we optimize the case where we've only written locally, but we're more aggressive about throwing away items where another web head has written in-between. Between the two we should get better consistency with ~neutral performance.

The issue with cache collectors is that in this case, we don't want to throw away the write (unless there was an invalidation), but we do want to get a fresh item before we write back.

catch’s picture

Also #45 would improve the situation for cache collectors. If we add a reset() (to force updating the timestamp, in this case it's because we're about to write and need to check something first), then we get the fresh cache entry. If that cache entry was written by another process and it shares the same local cache then we get a proper cumulative cache write, because lastValidTimestamp doesn't need to be updated (assuming the bin hasn't been written on another web head). If another web head has written to the bin, then we end up discarding it, but that protects us against invalidations.

There's a higher risk of discarding valid cache entries the more web heads are added of course.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

andypost’s picture

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.