Drupal\Core\Cache\DatabaseBackend->get() can raise an "Undefined array key" error.

------ Original report ------
Sometimes I get the following in watchdog when saving a file to s3. The strange thing is that the file is uploaded.

Warning: Undefined array key "s3fs:uri:s3://attachments/test.pdf" in Drupal\Core\Cache\DatabaseBackend->getMultiple() (regel 121 van /data/code/project/www/core/lib/Drupal/Core/Cache/DatabaseBackend.php)
#0 /data/code/project/www/core/includes/bootstrap.inc(347): _drupal_error_handler_real(2, 'Undefined array...', '/data/code/jobb...', 121)
#1 /data/code/project/www/core/lib/Drupal/Core/Cache/DatabaseBackend.php(121): _drupal_error_handler(2, 'Undefined array...', '/data/code/jobb...', 121)
#2 /data/code/project/www/core/lib/Drupal/Core/Cache/DatabaseBackend.php(92): Drupal\Core\Cache\DatabaseBackend->getMultiple(Array, false)
#3 /data/code/project/www/modules/contrib/s3fs/src/StreamWrapper/S3fsStream.php(1391): Drupal\Core\Cache\DatabaseBackend->get('s3fs:uri:s3://a...')
#4 /data/code/project/www/modules/contrib/s3fs/src/StreamWrapper/S3fsStream.php(1362): Drupal\s3fs\StreamWrapper\S3fsStream->readCache('s3://attachment...')
#5 /data/code/project/www/modules/contrib/s3fs/src/StreamWrapper/S3fsStream.php(1283): Drupal\s3fs\StreamWrapper\S3fsStream->getS3fsObject('s3://attachment...')
#6 /data/code/project/www/modules/contrib/s3fs/src/StreamWrapper/S3fsStream.php(1098): Drupal\s3fs\StreamWrapper\S3fsStream->stat('s3://attachment...')
#7 [internal function]: Drupal\s3fs\StreamWrapper\S3fsStream->url_stat('s3://attachment...', 6)
#8 /data/code/project/www/core/lib/Drupal/Core/File/FileSystem.php(559): file_exists('s3://attachment...')
#9 /data/code/project/www/modules/contrib/s3fs/src/S3fsFileService.php(429): Drupal\Core\File\FileSystem->getDestinationFilename('s3://attachment...', 0)
#10 /data/code/project/www/core/modules/file/src/Upload/FileUploadHandler.php(193): Drupal\s3fs\S3fsFileService->getDestinationFilename('s3://attachment...', 0)
#11 /data/code/project/www/core/modules/file/file.module(846): Drupal\file\Upload\FileUploadHandler->handleFileUpload(Object(Drupal\file\Upload\FormUploadedFile), Array, 's3://attachment...', 0)
#12 /data/code/project/www/core/modules/file/file.module(722): file_save_upload('field_objection...', Array, 's3://attachment...', NULL, 0)
#13 /data/code/project/www/core/modules/file/file.module(1447): _file_save_upload_from_form(Array, Object(Drupal\Core\Form\FormState))
#14 /data/code/project/www/core/modules/file/src/Element/ManagedFile.php(76): file_managed_file_save_upload(Array, Object(Drupal\Core\Form\FormState))
#15 /data/code/project/www/modules/custom/project/src/Plugin/Field/FieldWidget/AdditionsFileWidget.php(324): Drupal\file\Element\ManagedFile::valueCallback(Array, Array, Object(Drupal\Core\Form\FormState))
#16 [internal function]: Drupal\project\Plugin\Field\FieldWidget\AdditionsFileWidget::value(Array, Array, Object(Drupal\Core\Form\FormState))
#17 /data/code/project/www/core/lib/Drupal/Core/Form/FormBuilder.php(1268): call_user_func_array(Array, Array)

Steps to reproduce

$cache = \Drupal::cache();
$cache->set('test1', 'lowercase');
$cache->get('test1'); //  Good here
$cache->get('test1 '); // The trailing space causes an issues because mysql ignores trailing spaces on primary keys and will return a result of cid 'test1' instead of 'test1 '

Proposed resolution

When MySQL queries with a condition on a varchar, and the string has a trailing space, MySQL will return a result where there is a value for the same string without a trailing space. In the database cache backend, this creates a mis-match between the passed in cache IDs and the returned cache ID, resulting in a PHP warning.

When we normalize cache IDs, we avoid hashing ASCII cache IDs and send the literal. When the cache ID has a trailing space, we can hash that too, then it'll genuinely be unique (i.e. return a cache miss).

Test only job: https://git.drupalcode.org/project/drupal/-/jobs/2310869

Issue fork drupal-3368537

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

sboden created an issue. See original summary.

cmlara’s picture

Title: "Undefined array key" while saving » "Undefined array key" error from Drupal\Core\Cache\DatabaseBackend->get
Project: S3 File System » Drupal core
Version: 8.x-3.2 » 11.x-dev
Component: Code » cache system
Priority: Minor » Normal
Issue summary: View changes
Related issues: +#2909174: Check if cache cid exist, +#2352207: Database cache backend does not treat cid as case sensitive, +#2833951: Issue with directories and trailing whitespaces

I'm going to send this over to Drupal Core as this error itself would appear to originate from the cache system.

If we could see the file URI that would be helpful for the core developers.

I was able to reproduce this with the following code (going off my knowledge of database primary key limitations for mysql) which may or may not be the cause of the original post

$cache = \Drupal::cache();
$cache->set('test1', 'lowercase');
$cache->get('test1'); //  Good here
$cache->get('test1 '); // The trailing space causes an issues because mysql ignores trailing spaces on primary keys and will return a result of cid 'test1' instead of 'test1 '

https://dev.mysql.com/doc/refman/5.7/en/char.html for documentation on trailing pad characters and character primary keys.

@sboden Note that even if this is space related that even if core fixes this we will still not support paths with trailing spaces until 4.x at the earliest, see #2833951: Issue with directories and trailing whitespaces. While we have work to do in s3fs, the cache system is suppose to handle converting strings into a form that can be stored/retrieved so a trailing slash in a cache::get() is not to my knowledge a fault in s3fs.

twod’s picture

We get the same type of error when page cache stores the results of a view with an exposed filter of some value, and you filter again for the same string with a different combination of uppercase/lowercase characters.

Page cache generated a cache id with the first search string and stored that in the cache table via the database backend.
The next time page cache asks for the same string (only case differs) the database backend performs a case-insensitive query and returns the original item, which has a different cache id compared to what it put in the mapping array.

IIRC MariaDB, MySQL and SQLite all do case-insensitive queries by default, unsure about PostgreSQL, so maybe it would be easiest to make the database backend just lowercase the cids when normalizing them.
Other cache backends, like memory, will of course treat these as different cache ids. The interface does not explicitly mention how this is intended to work so maybe we can get away with that?

twod’s picture

There is indeed a test which validates that all cache backends have case-sensitive cache ids in GenericCacheBackendUnitTestBase, but what makes it case-[in]sensitive is (obvious in hind-sight), the selected collation for the cid column.

We were using utf8mb4_sv_0900_ai_ci, and switching to utf8mb4_sv_0900_as_cs for just that column on the cache tables fixes the issue with notices for us.

alexpott’s picture

Can people who have the problem originally reported here confirm the following details:

  1. What database you using - including version - you can get this information on admin/reports/status
  2. If you connect to your database run the query select cid from cache_page limit 1; and then use the cid returned to a select where you have changed the case of the cid select cid from cache_page where cid = 'cid_case_changed';
  3. The output of the SQL statement show create table cache_page; (mysql) or the CLI command pg_dump --table cache_page --schema-only (postgres)

For the create table I get

| cache_page | CREATE TABLE `cache_page` (
  `cid` varchar(255) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '' COMMENT 'Primary Key: Unique cache ID.',
  `data` longblob COMMENT 'A collection of data to cache.',
  `expire` bigint NOT NULL DEFAULT '0' COMMENT 'A Unix timestamp indicating when the cache entry should expire, or -1 for never.',
  `created` decimal(14,3) NOT NULL DEFAULT '0.000' COMMENT 'A timestamp with millisecond precision indicating when the cache entry was created.',
  `serialized` smallint NOT NULL DEFAULT '0' COMMENT 'A flag to indicate whether content is serialized (1) or not (0).',
  `tags` longtext COMMENT 'Space-separated list of cache tags for this entry.',
  `checksum` varchar(255) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL COMMENT 'The tag invalidation checksum when this entry was saved.',
  PRIMARY KEY (`cid`),
  KEY `expire` (`expire`),
  KEY `created` (`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='Storage for the cache API.' |

And although the collation for the table is utf8mb4_0900_ai_ci the queries are case-sensitive.

mysql>  select cid from cache_page where cid = 'http://drupal8alt.test/fr:';
+----------------------------+
| cid                        |
+----------------------------+
| http://drupal8alt.test/fr: |
+----------------------------+
1 row in set (0.00 sec)

mysql>  select cid from cache_page where cid = 'Http://drupal8alt.test/fr:';
Empty set (0.00 sec)

That's because the cid column has it's pwn collation of ascii_bin.

catch made their first commit to this issue’s fork.

catch’s picture

The trailing space issue is easy to reproduce without needing any special collation, i.e. the steps given in #2. I've pushed some test coverage that should show it on gitlab too hopefully.

catch’s picture

Here we go:

https://git.drupalcode.org/project/drupal/-/jobs/2308350

Fail      Other      phpunit-55.xml       0 Drupal\KernelTests\Core\Cache\Datab
    PHPUnit Test failed to complete; Error: PHPUnit 10.5.29 by Sebastian
    Bergmann and contributors.
    
    Runtime:       PHP 8.3.10
    Configuration: /builds/project/drupal/core/phpunit.xml.dist
    
    ..........W...                                                    14 / 14
    (100%)
    
    Time: 00:11.572, Memory: 8.00 MB
    
    1 test triggered 1 PHP warning:
    
    1)
    /builds/project/drupal/core/lib/Drupal/Core/Cache/DatabaseBackend.php:139
    Undefined array key "trailing-space-test"
    
    Triggered by:
    
    * Drupal\KernelTests\Core\Cache\DatabaseBackendTest::testSetGet
     
    /builds/project/drupal/core/tests/Drupal/KernelTests/Core/Cache/DatabaseBackendTest.php:51
    
catch’s picture

Priority: Normal » Major
Status: Active » Needs review

And pushed a separate commit with a suggested fix - comment could probably use some work, I couldn't find MySQL docs specific to treatment of trailing spaces in indexes as opposed to comparisons in general.

Major because this is potentially contributing to the 404 issue reported in #3456244: Normalize the incoming path with urldecode directly in RouteProvider.

cmlara’s picture

All MySQL collations are of type PAD SPACE. This means that all CHAR, VARCHAR, and TEXT values are compared without regard to any trailing spaces. “Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant.

For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters results in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicate-key error.

https://dev.mysql.com/doc/refman/5.7/en/char.html

catch’s picture

Issue summary: View changes
catch’s picture

Issue summary: View changes

alexpott’s picture

Status: Needs review » Reviewed & tested by the community

This looks great now. Nice reviews @cmlara and nice MR @catch.

catch’s picture

Title: "Undefined array key" error from Drupal\Core\Cache\DatabaseBackend->get » Ensure trailing whitespace at the end of a cache ID results in a unique cache item

Oh good spot on the whitespace, I briefly noticed it but thought it was my eyes playing tricks on me instead of an actual bug :(

Retitling for the actual problem we're fixing.

  • larowlan committed fb2469d3 on 11.x
    Issue #3368537 by catch, TwoD, alexpott, cmlara, sboden: Ensure trailing...

  • larowlan committed ea0088d7 on 10.4.x
    Issue #3368537 by catch, TwoD, alexpott, cmlara, sboden: Ensure trailing...

  • larowlan committed 71b0922a on 11.0.x
    Issue #3368537 by catch, TwoD, alexpott, cmlara, sboden: Ensure trailing...
larowlan’s picture

Version: 11.x-dev » 10.3.x-dev

Committed to 11.x and backported to 10.4.x and 11.0.x
Waiting for a second opinion about 10.3.x, leaving at RTBC

alexpott’s picture

@larowlan thanks for the commit. We have decent evidence this is behind some of the original reports on #3456244: Normalize the incoming path with urldecode directly in RouteProvider and will fix a possible regression in 10.3.x

  • alexpott committed 3974f844 on 10.3.x authored by larowlan
    Issue #3368537 by catch, TwoD, alexpott, cmlara, sboden: Ensure trailing...
alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Discussed with @catch and @larowlan - we agreed to backport this to 10.3.x

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

wim leers’s picture

WOW, what a crazy edge case! 🤯