Drupal\Core\Cache\DatabaseBackend->get() can raise an "Undefined array key" error.
------ Original report ------
Sometimes I get the following in watchdog when saving a file to s3. The strange thing is that the file is uploaded.
Warning: Undefined array key "s3fs:uri:s3://attachments/test.pdf" in Drupal\Core\Cache\DatabaseBackend->getMultiple() (regel 121 van /data/code/project/www/core/lib/Drupal/Core/Cache/DatabaseBackend.php)
#0 /data/code/project/www/core/includes/bootstrap.inc(347): _drupal_error_handler_real(2, 'Undefined array...', '/data/code/jobb...', 121)
#1 /data/code/project/www/core/lib/Drupal/Core/Cache/DatabaseBackend.php(121): _drupal_error_handler(2, 'Undefined array...', '/data/code/jobb...', 121)
#2 /data/code/project/www/core/lib/Drupal/Core/Cache/DatabaseBackend.php(92): Drupal\Core\Cache\DatabaseBackend->getMultiple(Array, false)
#3 /data/code/project/www/modules/contrib/s3fs/src/StreamWrapper/S3fsStream.php(1391): Drupal\Core\Cache\DatabaseBackend->get('s3fs:uri:s3://a...')
#4 /data/code/project/www/modules/contrib/s3fs/src/StreamWrapper/S3fsStream.php(1362): Drupal\s3fs\StreamWrapper\S3fsStream->readCache('s3://attachment...')
#5 /data/code/project/www/modules/contrib/s3fs/src/StreamWrapper/S3fsStream.php(1283): Drupal\s3fs\StreamWrapper\S3fsStream->getS3fsObject('s3://attachment...')
#6 /data/code/project/www/modules/contrib/s3fs/src/StreamWrapper/S3fsStream.php(1098): Drupal\s3fs\StreamWrapper\S3fsStream->stat('s3://attachment...')
#7 [internal function]: Drupal\s3fs\StreamWrapper\S3fsStream->url_stat('s3://attachment...', 6)
#8 /data/code/project/www/core/lib/Drupal/Core/File/FileSystem.php(559): file_exists('s3://attachment...')
#9 /data/code/project/www/modules/contrib/s3fs/src/S3fsFileService.php(429): Drupal\Core\File\FileSystem->getDestinationFilename('s3://attachment...', 0)
#10 /data/code/project/www/core/modules/file/src/Upload/FileUploadHandler.php(193): Drupal\s3fs\S3fsFileService->getDestinationFilename('s3://attachment...', 0)
#11 /data/code/project/www/core/modules/file/file.module(846): Drupal\file\Upload\FileUploadHandler->handleFileUpload(Object(Drupal\file\Upload\FormUploadedFile), Array, 's3://attachment...', 0)
#12 /data/code/project/www/core/modules/file/file.module(722): file_save_upload('field_objection...', Array, 's3://attachment...', NULL, 0)
#13 /data/code/project/www/core/modules/file/file.module(1447): _file_save_upload_from_form(Array, Object(Drupal\Core\Form\FormState))
#14 /data/code/project/www/core/modules/file/src/Element/ManagedFile.php(76): file_managed_file_save_upload(Array, Object(Drupal\Core\Form\FormState))
#15 /data/code/project/www/modules/custom/project/src/Plugin/Field/FieldWidget/AdditionsFileWidget.php(324): Drupal\file\Element\ManagedFile::valueCallback(Array, Array, Object(Drupal\Core\Form\FormState))
#16 [internal function]: Drupal\project\Plugin\Field\FieldWidget\AdditionsFileWidget::value(Array, Array, Object(Drupal\Core\Form\FormState))
#17 /data/code/project/www/core/lib/Drupal/Core/Form/FormBuilder.php(1268): call_user_func_array(Array, Array)
Steps to reproduce
$cache = \Drupal::cache();
$cache->set('test1', 'lowercase');
$cache->get('test1'); // Good here
$cache->get('test1 '); // The trailing space causes an issues because mysql ignores trailing spaces on primary keys and will return a result of cid 'test1' instead of 'test1 '
Proposed resolution
When MySQL queries with a condition on a varchar, and the string has a trailing space, MySQL will return a result where there is a value for the same string without a trailing space. In the database cache backend, this creates a mis-match between the passed in cache IDs and the returned cache ID, resulting in a PHP warning.
When we normalize cache IDs, we avoid hashing ASCII cache IDs and send the literal. When the cache ID has a trailing space, we can hash that too, then it'll genuinely be unique (i.e. return a cache miss).
Test only job: https://git.drupalcode.org/project/drupal/-/jobs/2310869
Issue fork drupal-3368537
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
- 3368537-undefined-array-key
changes, plain diff MR !9030
- 3456244-11x-dont-trim-path
changes, plain diff MR !9045
Comments
Comment #2
cmlaraI'm going to send this over to Drupal Core as this error itself would appear to originate from the cache system.
If we could see the file URI that would be helpful for the core developers.
I was able to reproduce this with the following code (going off my knowledge of database primary key limitations for mysql) which may or may not be the cause of the original post
https://dev.mysql.com/doc/refman/5.7/en/char.html for documentation on trailing pad characters and character primary keys.
@sboden Note that even if this is space related that even if core fixes this we will still not support paths with trailing spaces until 4.x at the earliest, see #2833951: Issue with directories and trailing whitespaces. While we have work to do in s3fs, the cache system is suppose to handle converting strings into a form that can be stored/retrieved so a trailing slash in a cache::get() is not to my knowledge a fault in s3fs.
Comment #3
twodWe get the same type of error when page cache stores the results of a view with an exposed filter of some value, and you filter again for the same string with a different combination of uppercase/lowercase characters.
Page cache generated a cache id with the first search string and stored that in the cache table via the database backend.
The next time page cache asks for the same string (only case differs) the database backend performs a case-insensitive query and returns the original item, which has a different cache id compared to what it put in the mapping array.
IIRC MariaDB, MySQL and SQLite all do case-insensitive queries by default, unsure about PostgreSQL, so maybe it would be easiest to make the database backend just lowercase the cids when normalizing them.
Other cache backends, like memory, will of course treat these as different cache ids. The interface does not explicitly mention how this is intended to work so maybe we can get away with that?
Comment #4
twodThere is indeed a test which validates that all cache backends have case-sensitive cache ids in
GenericCacheBackendUnitTestBase, but what makes it case-[in]sensitive is (obvious in hind-sight), the selected collation for the cid column.We were using utf8mb4_sv_0900_ai_ci, and switching to utf8mb4_sv_0900_as_cs for just that column on the cache tables fixes the issue with notices for us.
Comment #5
alexpottCan people who have the problem originally reported here confirm the following details:
select cid from cache_page limit 1;and then use the cid returned to a select where you have changed the case of the cidselect cid from cache_page where cid = 'cid_case_changed';show create table cache_page;(mysql) or the CLI commandpg_dump --table cache_page --schema-only(postgres)For the create table I get
And although the collation for the table is utf8mb4_0900_ai_ci the queries are case-sensitive.
That's because the cid column has it's pwn collation of ascii_bin.
Comment #7
catchThe trailing space issue is easy to reproduce without needing any special collation, i.e. the steps given in #2. I've pushed some test coverage that should show it on gitlab too hopefully.
Comment #9
catchHere we go:
https://git.drupalcode.org/project/drupal/-/jobs/2308350
Comment #10
catchAnd pushed a separate commit with a suggested fix - comment could probably use some work, I couldn't find MySQL docs specific to treatment of trailing spaces in indexes as opposed to comparisons in general.
Major because this is potentially contributing to the 404 issue reported in #3456244: Normalize the incoming path with urldecode directly in RouteProvider.
Comment #11
cmlarahttps://dev.mysql.com/doc/refman/5.7/en/char.html
Comment #12
catchComment #13
catchComment #16
alexpottThis looks great now. Nice reviews @cmlara and nice MR @catch.
Comment #17
catchOh good spot on the whitespace, I briefly noticed it but thought it was my eyes playing tricks on me instead of an actual bug :(
Retitling for the actual problem we're fixing.
Comment #21
larowlanCommitted to 11.x and backported to 10.4.x and 11.0.x
Waiting for a second opinion about 10.3.x, leaving at RTBC
Comment #22
alexpott@larowlan thanks for the commit. We have decent evidence this is behind some of the original reports on #3456244: Normalize the incoming path with urldecode directly in RouteProvider and will fix a possible regression in 10.3.x
Comment #24
alexpottDiscussed with @catch and @larowlan - we agreed to backport this to 10.3.x
Comment #27
wim leersWOW, what a crazy edge case! 🤯