Problem/Motivation

When you have a cache collector that has very long cache ID then the lock acquire can fail with an exception.

It was actually a weird migration of user roles that caused this for us, but if you have long user role machine names and a user has many roles then this could happen with the LocaleLookup.

Proposed resolution

Hash the lock name if it's > 255 characters. Or is the cache collector implementation responsible for generating a cache ID that's shorter? we actually allow cache ID's that are longer now and they're automatically hashed.

Remaining tasks

User interface changes

API changes

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Berdir’s picture

Note: Until #2501183: LocaleLookup cache ID is using numeric indexes of the roles field, not role ID's is fixed, you actually need 100 roles or so for a user to trigger this ;) But it will be easier after that.

alexpott’s picture

Priority: Normal » Major

I think this is a major bug given that we have no limit on cid length.

jhedstrom’s picture

We actually have a test now that is throwing this exception, but not failing. #2535774: Error handler swallows catchable fatal errors surfaces this failing.

jhedstrom’s picture

Status: Active » Needs review
FileSize
622 bytes

Probably not the best test in the world, but it illustrates the hidden exception.

jhedstrom’s picture

This exception is actually not due to cid length, but rather to invalid characters. Hashing here would solve both length and characters.

catch’s picture

Issue tags: +rc target triage

Status: Needs review » Needs work

The last submitted patch, 4: 2501191-04-TEST-ONLY.patch, failed testing.

jhedstrom’s picture

Status: Needs work » Needs review
FileSize
3.05 KB
2.44 KB

This adds a more appropriate integration test of the cache collector. I added a @todo because I'm not sure why a cache get would still work after the destruct method is called. This method is what writes to the semaphore table (by way of lock), so we want that in the test.

Status: Needs review » Needs work

The last submitted patch, 8: 2501191-08-TEST-ONLY.patch, failed testing.

jhedstrom’s picture

Status: Needs work » Needs review
FileSize
4.81 KB
1.76 KB

This adds a cid/lock normalization method. It is copied directly from DatabaseBackend, and might make more sense as a static utility method somewhere.

dawehner’s picture

Status: Needs review » Reviewed & tested by the community

Nevermind my review ... you copied existing code from the cache backend, so both points are not valid.

  1. +++ b/core/lib/Drupal/Core/Cache/CacheCollector.php
    @@ -265,6 +266,30 @@ protected function updateCache($lock = TRUE) {
    +    // Nothing to do if the ID is a US ASCII string of 255 characters or less.
    +    $cid_is_ascii = mb_check_encoding($cid, 'ASCII');
    

    IMHO we usually don't require to have mb enabled, right? I guess when mb is not enabled we could just go with the hash directly?

  2. gg

  3. +++ b/core/lib/Drupal/Core/Cache/CacheCollector.php
    @@ -265,6 +266,30 @@ protected function updateCache($lock = TRUE) {
    +    // Return a string that uses as much as possible of the original cache ID
    +    // with the hash appended.
    +    $hash = Crypt::hashBase64($cid);
    

    Do we really need a cryptographic hash? Isn't it enough to go with something like sha512?

Berdir’s picture

There's an existing issue about mb_check_encoding() that results in completely broken sites without mbstring, so I actually that we shouldn't add it in more places until that's fixed.

If we move that code into a helper somewhere then we can discuss to refactor/fix that in the existing other issue but if not, then we should do it correctly i think.

The last submitted patch, 8: 2501191-08-TEST-ONLY.patch, failed testing.

dawehner’s picture

xjm’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/system/src/Tests/Menu/MenuRouterTest.php
@@ -204,6 +204,7 @@ protected function doTestExoticPath() {
+    $this->assertNoText(t('The website encountered an unexpected error. Please try again later.'));

Let's change this to assert a 200 instead. Asserting that there is no error text is not a reliable way to provide coverage as it can yield false positives.

effulgentsia’s picture

Issue tags: -rc target triage +rc target

Discussed with @xjm, and we agree that fixing conditions that trigger exceptions is a great thing to do prior to release, so long as doing so isn't disruptive, which this doesn't appear to be.

jhedstrom’s picture

Assigned: Unassigned » jhedstrom

re #15: That test was mainly meant to illustrate the error existed (in our current tests, before tests for this issue were specifically added). I'll reroll to check for a 200.

jhedstrom’s picture

Assigned: jhedstrom » Unassigned
Status: Needs work » Reviewed & tested by the community

Nevermind, the original error isn't caught by checking for a 200 (that's one reason it has remained hidden in our tests). It properly fails once #2535774: Error handler swallows catchable fatal errors is applied. We can either leave this explicit check for the error message, or remove it entirely since there are now explicit tests here?

(Setting to RTBC to get xjm's attention)

The last submitted patch, 8: 2501191-08-TEST-ONLY.patch, failed testing.

  • catch committed 4a4edff on 8.0.x
    Issue #2501191 by jhedstrom: CacheCollector can result in an exception...
catch’s picture

Status: Reviewed & tested by the community » Fixed

Given this unblocks the error handler issue, committed/pushed to 8.0.x as is.

Let's improve the test coverage in the other issue though.

  • catch committed 4a4edff on 8.1.x
    Issue #2501191 by jhedstrom: CacheCollector can result in an exception...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.