Problem/Motivation
It is possible that invalidating cache tags causes race conditions in combination with slow database transactions, so that an tag is invalidated during a transaction, then another process reads that and fetches the old values from the database before the transaction is committed.
Proposed resolution
Based on #2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock, we can delay cache tag invalidation to happen after the database transaction has been committed.
Note: This will need a new version of the core patch that I will upload soon.
Remaining tasks
Write an explicit test for it.
User interface changes
API changes
Data model changes
Release notes snippet
| Comment | File | Size | Author |
|---|---|---|---|
| #25 | redis-delayed-tag-invalidation-3018203-25.patch | 8.83 KB | berdir |
| #24 | redis-delayed-tag-invalidation-3018203-24.patch | 8.83 KB | berdir |
| #17 | redis-delayed-tag-invalidation-3018203-17-interdiff.txt | 786 bytes | berdir |
| #17 | redis-delayed-tag-invalidation-3018203-17.patch | 7.41 KB | berdir |
| #15 | interdiff.txt | 587 bytes | slasher13 |
Comments
Comment #2
berdirHere's a first patch, all cache related tests are passing, the only fails are in the queue implementation, but it looks like core changed/improved the test coverage there.
Comment #3
wim leersComment #4
berdirAnd a second iteration, this also uses that pattern for direct deletions, which are specifically used for deleting cached entities which is the problem we are fighting.
It could be used for other interactions too, but deleteAll(), invalidation and so on are not often used and not during transactions.
Comment #5
berdirTracked down a problem we sometimes had when a cache item was loaded again during the transaction that had been deleted before, this caused some nasty loops or other problems. Forgot to check for scheduled deletions when fetching items again.
This now also relies on the latest iteration of the core patch for the $success argument.
Comment #6
davidwhthomas commented@Berdir, many thanks for the update.
We are also hitting with this issue when the db transaction and cache invalidation is not in sync + stale cache and looking into applying these core + redis patches.
Just noting after applying the latest core patch ( from https://www.drupal.org/files/issues/2019-09-10/2966607-127.patch ) and the redis patch here, this error:
[15-Oct-2019 03:46:37] PHP Fatal error: Class Drupal\redis\Cache\RedisCacheTagsChecksum contains 1 abstract method and must therefore be declared abstract or implement the remaining methods (Drupal\redis\Cache\RedisCacheTagsChecksum::getTagInvalidationCounts) in /.../web/modules/contrib/redis/src/Cache/RedisCacheTagsChecksum.php on line 14"Seems need to also implement
RedisCacheTagsChecksum::getTagInvalidationCountsedit: perhaps that function isn't used so much for redis, but this approach could be ok to add that function, just a suggestion:
Comment #7
berdirYes, this patch has not been updated to be used with the most recent core patch. We're using it together with https://www.drupal.org/files/issues/2019-01-26/2966607-93.patch.
I'll update it once the core patch lands or that version no longer applies.
Comment #8
davidwhthomas commentedThanks Berdir!
Here's an updated patch for now, just in case useful later
Comment #9
berdirThanks, that implementation isn't correct however.
The work I did in the core issue in recent patches allows us to massively simplify the implementation here again, as all the static cache handling is now in the trait and we only need to do the actual work.
interdiff against my latest patch.
Not tested yet.
Comment #10
berdirSame patch but without the .info.yml change.
Comment #11
wim leers#6 + #7: Wow, nice coincidence in reviving this issue today and then having the core patch land the same day :D
I was gonna ask "why not queue tests?" but then I realized you'd need a Redis instance 🤦♂️ Good thing I did not ask that!
Does this need manual testing? I think it's sufficient for Berdir to test this? Does this warrant an
8.x-2.xbranch?Comment #12
davidwhthomas commentedNice, thanks for the clarification! (#9) I'm not so familiar with the update and your insight is much appreciated.
I'm going to test out this latest redis patch together with https://www.drupal.org/files/issues/2019-09-10/2966607-127.patch it looks like a good update.
Just one thing I'd like to clarify, if the cache tag is queued for delayed invalidation in the request, there is logic to prevent queuing it again, however after the transaction and the cache invalidation, want to double-check the same tags be invalidated later in the same request (e.g another node update transaction for the same node in same request?)
The reason/use case being we have some long running processes (drush queue workers for import) if the same node is updated in the same drush request want to be sure the cache can be invalidated later in that same request. That looks to be the case, but good to confirm. (will test it out)
edit: just saw Wim's comment (posted as I was typing this) and the fact the related core patch is now merged, great stuff! Will test it out with that, many thanks guys.
Comment #13
berdir> Does this need manual testing?
The more testing the better. But the existing patch has been tested on large D8 production sites since last february, so I think it's pretty stable.
> Does this warrant an 8.x-2.x branch?
I don't think so, there is no public API change here. We just need 8.8, so I'm going to wait on that being released and then probably commit it.
Comment #14
wim leersIsn't the bump in core requirement sufficient to warrant this? Your call of course, but in the Drupal world we're unnecessarily afraid to bump major versions :D
Ohhhh … you'll be able to specify
core_version_requirement: ^8.8🤓🥳Comment #15
slasher13Make it compatible with php-redis 5
see https://www.drupal.org/project/redis/issues/3068810
Comment #16
davidwhthomas commentedJust noting a followup after some patch testing:
We had some issues with the patch from https://www.drupal.org/files/issues/2019-09-10/2966607-127.patch together with the redis patch in #10 ( on core v8.5.15 )
1. New field config import failed with
drush config-importPerhaps missing field storage or similar information for that step. Config could import later ok via the UI + cache clear
2. Deleting a field to test, the field still showing the field UI list, even though deleted.
3. and also when deleting a comment from a comment list View: (empty comment view row render issue?)
Seems maybe it was expecting cleared cache but the cache wasn't yet cleared in that transaction?
Removing the redis patch but keeping the core patch was ok and allowed both those test cases to pass.
Seems may need a core update to 8.8? or some other update to the redis patch part.
Comment #17
berdirYes, you're right, that didn't work yet at all, as I said, that was completely untested :)
This should work better.
Comment #18
davidwhthomas commentedThanks @Berdir! the updated patch in #17 fixes the issues mentioned in #16 and it appears to be working well. Will continue to test it out but looks good for now. Nice one!
Comment #19
wim leersDrupal 8.8.0 shipped. What's blocking this from getting committed + shipped in a release? :)
Comment #20
berdirWell, because of https://www.drupal.org/project/memcache/issues/2996615#comment-13401697 ;)
To commit this, I need to require 8.8, which I plan to do around 9.0 beta for my projects.
Comment #21
wim leersThat's fair. But … wouldn't it be acceptable in this case to already tag a release now that requires Drupal 8.8? Yes, that would mean dropping 8.7 support, but for 8.7, people can continue to rely on the latest release (1.2)?
Comment #22
berdirThat is correct, and there are different opinions on that among contrib maintainers. https://dev.acquia.com/drupal9/deprecation_status for example says about 8.8 deprecations "Once 8.7 is unsupported", but you could either interpret that as "once your module no longer supports it" or until is actually no longer supported.
Many adopted the core cycle of supporting the version with security support as long as it's supported. I'm somewhere in the middle and start requiring the latest version somewhere inbetween.
One problem with requiring 8.8 early, until we have minor/patch releases for contrib, is that if redis would have a security release then I couldn't go back to requiring 8.7 and would force users that are still on 8.7 to update to 8.8 over night to be safe. I have a plan if that should happen between ~march and 8.9 release for one of my modules but I want to limit the time window a bit ;)
Comment #23
wim leersYep.
That's fair — I would just say that right now it makes sense to weigh the likelihood of a security release in the mean time (very low given the maturity and surface area of this particular module) versus the benefit of getting Drupal 8.8 sites to benefit from this scalability improvement. I personally think the latter means it's worth tagging a new release. It's fine if you don't agree! I just wanted to make the case, that's all :)
Comment #24
berdirHere's a patch that ads the core_version_requirement and also fixes a last deprecation notice and some travis fixes, so I can test it there too: https://github.com/md-systems/redis/pull/31
Comment #25
berdirFixing the default theme, needs classy for now.
Comment #27
berdirSo, after some consideration, I decided to get this in now as it is not just D9 preparation but fixes really nasty inconsistencies.
Comment #28
wim leers🥳