There is no validation of encoding on any of the configuration object types, which throws unhandled exception.

Problem/Motivation

Configuration system handles look up of objects defined in Drupal.
Objects are stored in "config" database table, with object stored in "name" field (eg node.type.article, views.view.articles, user.role.anonymous, etc)

- "name" field is defined as "varchar_ascii" in Drupal, and varchar(255) with collation ascii_general_ci in Database
- "name" is used to look up routes, node types, view names, user roles, etc etc

To resolve URL "node/add/article" route is matched to node/add/{node_type}
readMultiple() is called in core/lib/Drupal/Core/Config/DatabaseStorage.php to check configuration object {node_type} exists

There is no validation of encoding on any of the configuration object types, which throws unhandled exception:

The website encountered an unexpected error. Try again later.

Drupal\Core\Database\DatabaseExceptionWrapper: SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (ascii_general_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '=': SELECT "name", "data" FROM "config" WHERE "collection" = :collection AND "name" IN ( :names__0 ); Array ( [:collection] => [:names__0] => node.type.хакер ) in Drupal\Core\Config\DatabaseStorage->readMultiple() (line 110 of core/lib/Drupal/Core/Config/DatabaseStorage.php).

This leads to unnecessary hits of database which can lead to server outage.
The error message is not handled and white screen of death is displayed.

This issue affects all routes of configuration objects, example:

- /node/add/öüä
- /media/add/öüä
- /admin/structure/views/view/öüä

Steps to reproduce

Navigate to a route and insert non-ASCII values into URL:
- /node/add/тест
- /media/add/тест
- /admin/structure/views/view/тест
- /views/ajax?view_name=view_тест&view_display_id=page_1&_drupal_ajax=1

Proposed resolution

Allow param converters to add a route requirement for the param they are converting and use this to add a regex that permits only printable ASCII characters for config entities.

Remaining tasks

User interface changes

If you change a URL and use an invalid name for a config entity you will get a 404 instead of an exception.

Introduced terminology

N/a

API changes

New interface for param converters to implement if they want to add a route requirement \Drupal\Core\ParamConverter\ParamConverterRequirementsInterface

Data model changes

N/a

Release notes snippet

N/a

Issue fork drupal-3475540

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

jannakha created an issue. See original summary.

jannakha changed the visibility of the branch 3475540-unhandled-exception-ascii to hidden.

jannakha’s picture

Version: 11.0.x-dev » 11.x-dev
jannakha’s picture

Status: Active » Needs review

before I start writing tests - please review/comment/provide suggestions to the solution - MR!9542!
Thanks!

jannakha’s picture

Issue summary: View changes
cilefen’s picture

I don't believe this is a bug and I think it is the caller's responsibility to filter input. By that I mean that each use-case needs fixing separately.

What does this have to do with security?

vladimiraus’s picture

cilefen’s picture

Indeed but I would say that particular one is a node system bug. Any code that loads configuration objects should understand the rules: that their keys are ASCII only.

jannakha’s picture

@cilefen
Re: "caller's responsibility to filter input" - it's a lookup into config database which doesn't validate non-ascii input, there are levels of abstractions between config object and node/webform/view/or other caller which will make it quite hard to validate it on per-caller basis.

* there's a case-by-case fix which doesn't fix much: https://www.drupal.org/project/drupal/issues/3457963

It was discovered during a pen test and flagged as a potential security issue since:
- This leads to unnecessary hits of database which can lead to server outage.
- The error message is not handled and white screen of death is displayed.

examples:
https://events.drupal.org/node/add/тест
https://www.cyber.gov.au/node/add/тест

Stack trace of node/add/{node_type} and each other config has its own trace:

#0 /web/core/lib/Drupal/Core/Database/Connection.php(883): Drupal\mysql\Driver\Database\mysql\ExceptionHandler->handleExecutionException(Object(PDOException), Object(Drupal\Core\Database\StatementWrapperIterator), Array, Array)
#1 /web/core/lib/Drupal/Core/Config/DatabaseStorage.php(119): Drupal\Core\Database\Connection->query('SELECT [name], ...', Array, Array)
#2 /web/core/lib/Drupal/Core/Config/CachedStorage.php(95): Drupal\Core\Config\DatabaseStorage->readMultiple(Array)
#3 /web/core/lib/Drupal/Core/Config/ConfigFactory.php(165): Drupal\Core\Config\CachedStorage->readMultiple(Array)
#4 /web/core/lib/Drupal/Core/Config/ConfigFactory.php(136): Drupal\Core\Config\ConfigFactory->doLoadMultiple(Array)
#5 /web/core/lib/Drupal/Core/Config/Entity/ConfigEntityStorage.php(181): Drupal\Core\Config\ConfigFactory->loadMultiple(Array)
#6 /web/core/lib/Drupal/Core/Entity/EntityStorageBase.php(312): Drupal\Core\Config\Entity\ConfigEntityStorage->doLoadMultiple(Array)
#7 /web/core/lib/Drupal/Core/Entity/EntityRepository.php(183): Drupal\Core\Entity\EntityStorageBase->loadMultiple(Array)
#8 /web/core/lib/Drupal/Core/Entity/EntityRepository.php(175): Drupal\Core\Entity\EntityRepository->getCanonicalMultiple('node_type', Array, Array)
#9 /web/core/lib/Drupal/Core/ParamConverter/EntityConverter.php(134): Drupal\Core\Entity\EntityRepository->getCanonical('node_type', '\xD0\xB4\xD1\x8B\xD0\xBB\xD0\xB2\xD0\xBE\xD0\xB0\xD0\xB4\xD1...', Array)
#10 /web/core/lib/Drupal/Core/ParamConverter/ParamConverterManager.php(100): Drupal\Core\ParamConverter\EntityConverter->convert('\xD0\xB4\xD1\x8B\xD0\xBB\xD0\xB2\xD0\xBE\xD0\xB0\xD0\xB4\xD1...', Array, 'node_type', Array)
#11 /web/core/lib/Drupal/Core/Routing/Enhancer/ParamConversionEnhancer.php(45): Drupal\Core\ParamConverter\ParamConverterManager->convert(Array)
#12 /web/core/lib/Drupal/Core/Routing/Router.php(270): Drupal\Core\Routing\Enhancer\ParamConversionEnhancer->enhance(Array, Object(Symfony\Component\HttpFoundation\Request))
#13 /web/core/lib/Drupal/Core/Routing/Router.php(150): Drupal\Core\Routing\Router->applyRouteEnhancers(Array, Object(Symfony\Component\HttpFoundation\Request))
#14 /web/core/lib/Drupal/Core/Routing/AccessAwareRouter.php(90): Drupal\Core\Routing\Router->matchRequest(Object(Symfony\Component\HttpFoundation\Request))
#15 /vendor/symfony/http-kernel/EventListener/RouterListener.php(105): Drupal\Core\Routing\AccessAwareRouter->matchRequest(Object(Symfony\Component\HttpFoundation\Request))
#16 [internal function]: Symfony\Component\HttpKernel\EventListener\RouterListener->onKernelRequest(Object(Symfony\Component\HttpKernel\Event\RequestEvent), 'kernel.request', Object(Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher))
jannakha’s picture

Assigned: jannakha » Unassigned
cilefen’s picture

Maybe throw a specific exception for this?

cilefen’s picture

Even an InvalidArgument exception explaining that only ASCII is allowed would be clear to callers.

smustgrave’s picture

+1 for throwing an exception mentioned in #13

jannakha’s picture

so I've added the exception at Drupal\Core\Config\DatabaseStorage->readMultiple() level, but it's not being caught anywhere in the stack and each stack trace is different for different config types (see below).

Please advise/recommend where/how exception should be thrown/caught.

IMHO handling it on Drupal\Core\Config\DatabaseStorage->readMultiple() without additional exceptions is sufficient as it's such an edge case which is used for exploitation/attack only (there's already check for if (empty($names)) which doesn't throw any exceptions).

Here's a stack trace for: /node/add/тест

The website encountered an unexpected error. Try again later.

InvalidArgumentException: The array contains invalid values. in Drupal\Core\Config\DatabaseStorage->readMultiple() (line 116 of core/lib/Drupal/Core/Config/DatabaseStorage.php).
Drupal\Core\Config\CachedStorage->readMultiple(Array) (Line: 165)
Drupal\Core\Config\ConfigFactory->doLoadMultiple(Array) (Line: 136)
Drupal\Core\Config\ConfigFactory->loadMultiple(Array) (Line: 181)
Drupal\Core\Config\Entity\ConfigEntityStorage->doLoadMultiple(Array) (Line: 312)
Drupal\Core\Entity\EntityStorageBase->loadMultiple(Array) (Line: 183)
Drupal\Core\Entity\EntityRepository->getCanonicalMultiple('node_type', Array, Array) (Line: 175)
Drupal\Core\Entity\EntityRepository->getCanonical('node_type', 'тест', Array) (Line: 134)
Drupal\Core\ParamConverter\EntityConverter->convert('тест', Array, 'node_type', Array) (Line: 100)
Drupal\Core\ParamConverter\ParamConverterManager->convert(Array) (Line: 45)
Drupal\Core\Routing\Enhancer\ParamConversionEnhancer->enhance(Array, Object) (Line: 270)
Drupal\Core\Routing\Router->applyRouteEnhancers(Array, Object) (Line: 150)
Drupal\Core\Routing\Router->matchRequest(Object) (Line: 90)
Drupal\Core\Routing\AccessAwareRouter->matchRequest(Object) (Line: 105)
Symfony\Component\HttpKernel\EventListener\RouterListener->onKernelRequest(Object, 'kernel.request', Object)
call_user_func(Array, Object, 'kernel.request', Object) (Line: 111)

here's a stack for: /admin/structure/views/view/тест

The website encountered an unexpected error. Try again later.

InvalidArgumentException: The array contains invalid values. in Drupal\Core\Config\DatabaseStorage->readMultiple() (line 116 of core/lib/Drupal/Core/Config/DatabaseStorage.php).
Drupal\Core\Config\CachedStorage->readMultiple(Array) (Line: 165)
Drupal\Core\Config\ConfigFactory->doLoadMultiple(Array) (Line: 136)
Drupal\Core\Config\ConfigFactory->loadMultiple(Array) (Line: 181)
Drupal\Core\Config\Entity\ConfigEntityStorage->doLoadMultiple(Array) (Line: 312)
Drupal\Core\Entity\EntityStorageBase->loadMultiple(Array) (Line: 487)
Drupal\Core\Config\Entity\ConfigEntityStorage->loadMultipleOverrideFree(Array) (Line: 478)
Drupal\Core\Config\Entity\ConfigEntityStorage->loadOverrideFree('тест') (Line: 80)
Drupal\Core\ParamConverter\AdminPathConfigEntityConverter->convert('тест', Array, 'view', Array) (Line: 64)
Drupal\views_ui\ParamConverter\ViewUIConverter->convert('тест', Array, 'view', Array) (Line: 100)
Drupal\Core\ParamConverter\ParamConverterManager->convert(Array) (Line: 45)
Drupal\Core\Routing\Enhancer\ParamConversionEnhancer->enhance(Array, Object) (Line: 270)
Drupal\Core\Routing\Router->applyRouteEnhancers(Array, Object) (Line: 150)
Drupal\Core\Routing\Router->matchRequest(Object) (Line: 90)
Drupal\Core\Routing\AccessAwareRouter->matchRequest(Object) (Line: 105)
Symfony\Component\HttpKernel\EventListener\RouterListener->onKernelRequest(Object, 'kernel.request', Object)
call_user_func(Array, Object, 'kernel.request', Object) (Line: 111)

Here's stack trace for a contrib module: /webform/тест:

The website encountered an unexpected error. Try again later.

InvalidArgumentException: The array contains invalid values. in Drupal\Core\Config\DatabaseStorage->readMultiple() (line 116 of core/lib/Drupal/Core/Config/DatabaseStorage.php).
Drupal\Core\Config\CachedStorage->readMultiple(Array) (Line: 165)
Drupal\Core\Config\ConfigFactory->doLoadMultiple(Array) (Line: 136)
Drupal\Core\Config\ConfigFactory->loadMultiple(Array) (Line: 181)
Drupal\Core\Config\Entity\ConfigEntityStorage->doLoadMultiple(Array) (Line: 312)
Drupal\Core\Entity\EntityStorageBase->loadMultiple(Array) (Line: 183)
Drupal\Core\Entity\EntityRepository->getCanonicalMultiple('webform', Array, Array) (Line: 175)
Drupal\Core\Entity\EntityRepository->getCanonical('webform', 'тест', Array) (Line: 134)
Drupal\Core\ParamConverter\EntityConverter->convert('тест', Array, 'webform', Array) (Line: 100)
Drupal\Core\ParamConverter\ParamConverterManager->convert(Array) (Line: 45)
Drupal\Core\Routing\Enhancer\ParamConversionEnhancer->enhance(Array, Object) (Line: 270)
Drupal\Core\Routing\Router->applyRouteEnhancers(Array, Object) (Line: 150)
Drupal\Core\Routing\Router->matchRequest(Object) (Line: 90)
Drupal\Core\Routing\AccessAwareRouter->matchRequest(Object) (Line: 105)
Symfony\Component\HttpKernel\EventListener\RouterListener->onKernelRequest(Object, 'kernel.request', Object)
call_user_func(Array, Object, 'kernel.request', Object) (Line: 111)
cilefen’s picture

Indeed, what I am suggesting is a developer experience improvement that would highlight the precise problem of invalid input to the function. It is up to the callers to provide valid input in the first place or to handle the exception.

smustgrave’s picture

Issue tags: +Needs tests

Will also need test coverage.

cafuego’s picture

I will add a dissenting opinion, just to be helpful. Rather than be limited to us-ascii, Drupal should handle UTF8 throughout. That way, users and developers would be able to properly use Drupal in their local language.

cilefen’s picture

Title: Unhandled exception when looking up a configuration objects by name which contains non-ASCII characters » Throw an understandable exception when there is an attempt to load config entities with disallowed characters
Category: Bug report » Task
Status: Needs review » Needs work
Issue tags: +DX (Developer Experience)

@cafuego That is a separate matter. The reasons for ASCII are from #1923406: Use ASCII character set on alphanumeric fields so we can index all 255 characters and I suggest creating a new issue if you want to challenge those decisions in a way that works across all supported database engines.

jannakha’s picture

patch of MR9542 @ 265e30ba9c9e65396211044895ec0c1e50c52449 for temporary fix

arunkumark made their first commit to this issue’s fork.

arunkumark’s picture

Status: Needs work » Needs review

Added test for the Special character and additional check added into the Configuration exist method.

herved’s picture

Interesting, I found some incoming requests to our production going to /filter/tips/。。。, which is accessible as anonymous.
This throws such errors as well.

cilefen’s picture

cilefen’s picture

Status: Needs review » Needs work

This needs work based on the plan for this issue, which is to throw an understandable exception.

tobiasb’s picture

Such request are done by spambots, which triggers a false-positive in your app-monitoring software. So I am happy, with a notfound exception, but not with any other which triggers a false-positive problem.

cilefen’s picture

See comment #16. I believe this component should throw a specific exception when there is invalid input and the callers must be fixed to send only valid input.

tobiasb’s picture

@cilefen

The caller are spambots. Spambots does not do what we want. ;-)

I mean do we have issue like:

Why does this not work?

$config = \Drupal::configFactory()->getEditable('Über.settings');

or

drush cget 'Über.settings'

The patch does what would happen, when you only use ASCII or postgres (?). Page 404, so I am happy. :D

The exception DatabaseExceptionWrapper in the new test are not thrown or should not. So try/catch can be removed.

I would replace it with just a request to /node/add/Über which should return a 404.
With postgres this should already throw a not found exception.

And because we do not use native typehints to force config name is string. the $value should not be null in mb_check_encoding (deprecated).

cilefen’s picture

Configuration object names must be ASCII only. Other input is invalid. In my opinion the better developer experience is to throw an exception when there is invalid input than to silently do nothing. Calling functions should handle the exception.

cilefen’s picture

The only way to have a consistent experience across database engines is to throw an exception on invalid input.

jannakha’s picture

If exception is to be thrown, consider:
where the exception should be thrown:
- at DB level
- at config level
- at entity level
- at routing level

where exception should be handled:
- at config level
- at entity level
- at routing level
- at request processing level

In the end, exception would be visible to the client as 404 (as route is not found)

cilefen’s picture

Not all situations would be 404s.

oily’s picture

I think that a simple InvalidArgumentException that returns a short, simple explanatory message to the screen is all that is needed.

oily’s picture

Issue summary: View changes
oily’s picture

Issue summary: View changes
oily’s picture

@jannakha I would appreciate it if you can take a look at the kernel test you created and see if it can be made to fail then pass when the MR is applied.

Screenshot of the InvalidArgumentException

Unlike the DatabaseStorageWrapper exception no database query is involved. Instead, the InvalidArgumentException is thrown if validation of the 'name' argument fails. The validation does not rely on or require a database query.

oily’s picture

If anyone believes that this issue should be fixed by presenting a 404 or other 4xx status code to the user with custom message and possibly also a log message, please create a child issue.

I would suggest the following steps:

  1. Create a unique Exception e.g. one that extends the InvalidArgumentException class
  2. Create an event listener that listens for the unique Exception to be thrown
  3. Redirect to a 4xx status code page and display a message
  4. Possibly add a log entry also
oily’s picture

I have edited the kernel test so it now passes and created instead Functional test coverage. I have reverted DatabaseStorage.php to pre-issue version and found that the Functional test fails. Then checking out the latest version of that file (the latest MR) the tests all pass. So I am flagging 'Needs Review'

oily’s picture

Status: Needs work » Needs review
oily’s picture

Issue summary: View changes
smustgrave’s picture

Status: Needs review » Needs work

Appears to still have open threads to address.

oily’s picture

@smustgrave I have added Exceptions to the read and exists methods and checked for not NULL.

oily’s picture

After throwing InvalidArgumentException for the DatabaseStorage::exists and DatabaseStorage::read methods, I found that the former breaks an existing kernel assertion/ test. So I have reverted to the existing DatabaseExceptionWrapper class to fix the test. The tests are back to green. Setting back to 'Needs review'.

oily’s picture

Status: Needs work » Needs review
oily’s picture

Issue tags: -Needs tests
oily’s picture

marcin dębicki’s picture

Patch of MR9542 @ 4090fac2899a7df61a6db7ebe553ad39ecdd3a8c for temporary fix

smustgrave’s picture

Status: Needs review » Needs work

Small update around the test, rest looks good!

vladimiraus’s picture

#47 works well

tobiasb’s picture

Status: Needs work » Needs review
vladimiraus’s picture

Status: Needs review » Needs work

We do not need to use use function... for global functions.
Was that suggested by AI?

tobiasb’s picture

Status: Needs work » Needs review
smustgrave’s picture

Status: Needs review » Needs work

Appears to have unresolved threads.

jannakha’s picture

just curious,
- why https://www.cyber.gov.au/node/add/testXYZ returns 404, but
- why https://www.cyber.gov.au/node/add/тест would return unhandled exception?
should it just return 404?

Patch throws an exception, but it has to be handled somewhere - at the moment any Drupal website shows white screen of death on 'node/add/тест'.

usingsession’s picture

I didn't find any problems while testing the patch (MR). Everything works.

Regarding this:

just curious,
- why https://www.cyber.gov.au/node/add/testXYZ returns 404, but
- why https://www.cyber.gov.au/node/add/тест would return unhandled exception?
should it just return 404?

My opinion is that in this case, it is justified, since we cannot check that this configuration does not exist in a database table to return a 404. But we know for sure that according to Drupal, the configuration name must contain only [^a-z0-9_]+ (at least from the administrative panel we cannot create it with a different name). Therefore, I believe that there should be validation of configuration objects, namely their names, and the patch only fixes the result of the lack of this check.

Since the main responsibility of DatabaseStorage is to perform queries rather than validate the encoding of parameter, especially considering that if the database uses the utf8mb4_unicode_ci collation, there will be no issues with the query.

jannakha’s picture

Status: Needs work » Needs review
Issue tags: +user experience

other question:
What will be a suggested way to handle those exceptions to actually render a 404 page?
White screen of death is not a great user experience and bad look for a website reputation:

Example:

Temporarily Unavailable
The website that you're trying to reach is having technical difficulties and is currently unavailable.

We are aware of the issue and are working hard to fix it. Thank you for your patience.

Full exception stack is different for different types of config objects (nodes, views, etc):

View:

The website encountered an unexpected error. Try again later.

InvalidArgumentException: The name views.view.тест of the configuration object contains invalid (non-ASCII) characters. in Drupal\Core\Config\DatabaseStorage::Drupal\Core\Config\{closure}() (line 123 of core/lib/Drupal/Core/Config/DatabaseStorage.php).
array_walk(Array, Object) (Line: 121)
Drupal\Core\Config\DatabaseStorage->readMultiple(Array) (Line: 95)
Drupal\Core\Config\CachedStorage->readMultiple(Array) (Line: 165)
Drupal\Core\Config\ConfigFactory->doLoadMultiple(Array) (Line: 136)
Drupal\Core\Config\ConfigFactory->loadMultiple(Array) (Line: 163)
Drupal\Core\Config\Entity\ConfigEntityStorage->doLoadMultiple(Array) (Line: 313)
Drupal\Core\Entity\EntityStorageBase->loadMultiple(Array) (Line: 480)
Drupal\Core\Config\Entity\ConfigEntityStorage->loadMultipleOverrideFree(Array) (Line: 471)
Drupal\Core\Config\Entity\ConfigEntityStorage->loadOverrideFree('тест') (Line: 80)
Drupal\Core\ParamConverter\AdminPathConfigEntityConverter->convert('тест', Array, 'view', Array) (Line: 64)
Drupal\views_ui\ParamConverter\ViewUIConverter->convert('тест', Array, 'view', Array) (Line: 100)
Drupal\Core\ParamConverter\ParamConverterManager->convert(Array) (Line: 45)
Drupal\Core\Routing\Enhancer\ParamConversionEnhancer->enhance(Array, Object) (Line: 244)
Drupal\Core\Routing\Router->applyRouteEnhancers(Array, Object) (Line: 124)
Drupal\Core\Routing\Router->matchRequest(Object) (Line: 89)
Drupal\Core\Routing\AccessAwareRouter->matchRequest(Object) (Line: 101)
Symfony\Component\HttpKernel\EventListener\RouterListener->onKernelRequest(Object, 'kernel.request', Object) (Line: 246)
Symfony\Component\EventDispatcher\EventDispatcher::Symfony\Component\EventDispatcher\{closure}(Object, 'kernel.request', Object) (Line: 206)
Symfony\Component\EventDispatcher\EventDispatcher->callListeners(Array, 'kernel.request', Object) (Line: 56)
Symfony\Component\EventDispatcher\EventDispatcher->dispatch(Object, 'kernel.request') (Line: 159)
Symfony\Component\HttpKernel\HttpKernel->handleRaw(Object, 1) (Line: 76)
Symfony\Component\HttpKernel\HttpKernel->handle(Object, 1, 1) (Line: 53)
Drupal\Core\StackMiddleware\Session->handle(Object, 1, 1) (Line: 48)
Drupal\Core\StackMiddleware\KernelPreHandle->handle(Object, 1, 1) (Line: 28)
Drupal\Core\StackMiddleware\ContentLength->handle(Object, 1, 1) (Line: 201)
Drupal\page_cache\StackMiddleware\PageCache->fetch(Object, 1, 1) (Line: 138)
Drupal\page_cache\StackMiddleware\PageCache->lookup(Object, 1, 1) (Line: 87)
Drupal\page_cache\StackMiddleware\PageCache->handle(Object, 1, 1) (Line: 61)
Drupal\advban\AdvbanMiddleware->handle(Object, 1, 1) (Line: 48)
Drupal\Core\StackMiddleware\ReverseProxyMiddleware->handle(Object, 1, 1) (Line: 51)
Drupal\Core\StackMiddleware\NegotiationMiddleware->handle(Object, 1, 1) (Line: 53)
Drupal\Core\StackMiddleware\AjaxPageState->handle(Object, 1, 1) (Line: 123)
Drupal\cloudflare\CloudFlareMiddleware->handle(Object, 1, 1) (Line: 51)
Drupal\Core\StackMiddleware\StackedHttpKernel->handle(Object, 1, 1) (Line: 715)
Drupal\Core\DrupalKernel->handle(Object) (Line: 19)

Node:

The website encountered an unexpected error. Try again later.

InvalidArgumentException: The name node.type.тест of the configuration object contains invalid (non-ASCII) characters. in Drupal\Core\Config\DatabaseStorage::Drupal\Core\Config\{closure}() (line 123 of core/lib/Drupal/Core/Config/DatabaseStorage.php).
array_walk(Array, Object) (Line: 121)
Drupal\Core\Config\DatabaseStorage->readMultiple(Array) (Line: 95)
Drupal\Core\Config\CachedStorage->readMultiple(Array) (Line: 165)
Drupal\Core\Config\ConfigFactory->doLoadMultiple(Array) (Line: 136)
Drupal\Core\Config\ConfigFactory->loadMultiple(Array) (Line: 163)
Drupal\Core\Config\Entity\ConfigEntityStorage->doLoadMultiple(Array) (Line: 313)
Drupal\Core\Entity\EntityStorageBase->loadMultiple(Array) (Line: 183)
Drupal\Core\Entity\EntityRepository->getCanonicalMultiple('node_type', Array, Array) (Line: 175)
Drupal\Core\Entity\EntityRepository->getCanonical('node_type', 'тест', Array) (Line: 134)
Drupal\Core\ParamConverter\EntityConverter->convert('тест', Array, 'node_type', Array) (Line: 100)
Drupal\Core\ParamConverter\ParamConverterManager->convert(Array) (Line: 45)
Drupal\Core\Routing\Enhancer\ParamConversionEnhancer->enhance(Array, Object) (Line: 244)
Drupal\Core\Routing\Router->applyRouteEnhancers(Array, Object) (Line: 124)
Drupal\Core\Routing\Router->matchRequest(Object) (Line: 89)
Drupal\Core\Routing\AccessAwareRouter->matchRequest(Object) (Line: 101)
Symfony\Component\HttpKernel\EventListener\RouterListener->onKernelRequest(Object, 'kernel.request', Object) (Line: 246)
Symfony\Component\EventDispatcher\EventDispatcher::Symfony\Component\EventDispatcher\{closure}(Object, 'kernel.request', Object) (Line: 206)
Symfony\Component\EventDispatcher\EventDispatcher->callListeners(Array, 'kernel.request', Object) (Line: 56)
Symfony\Component\EventDispatcher\EventDispatcher->dispatch(Object, 'kernel.request') (Line: 159)
Symfony\Component\HttpKernel\HttpKernel->handleRaw(Object, 1) (Line: 76)
Symfony\Component\HttpKernel\HttpKernel->handle(Object, 1, 1) (Line: 53)
Drupal\Core\StackMiddleware\Session->handle(Object, 1, 1) (Line: 48)
Drupal\Core\StackMiddleware\KernelPreHandle->handle(Object, 1, 1) (Line: 28)
Drupal\Core\StackMiddleware\ContentLength->handle(Object, 1, 1) (Line: 201)
Drupal\page_cache\StackMiddleware\PageCache->fetch(Object, 1, 1) (Line: 138)
Drupal\page_cache\StackMiddleware\PageCache->lookup(Object, 1, 1) (Line: 87)
Drupal\page_cache\StackMiddleware\PageCache->handle(Object, 1, 1) (Line: 61)
Drupal\advban\AdvbanMiddleware->handle(Object, 1, 1) (Line: 48)
Drupal\Core\StackMiddleware\ReverseProxyMiddleware->handle(Object, 1, 1) (Line: 51)
Drupal\Core\StackMiddleware\NegotiationMiddleware->handle(Object, 1, 1) (Line: 53)
Drupal\Core\StackMiddleware\AjaxPageState->handle(Object, 1, 1) (Line: 123)
Drupal\cloudflare\CloudFlareMiddleware->handle(Object, 1, 1) (Line: 51)
Drupal\Core\StackMiddleware\StackedHttpKernel->handle(Object, 1, 1) (Line: 715)
Drupal\Core\DrupalKernel->handle(Object) (Line: 19)

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.

smustgrave’s picture

Status: Needs review » Needs work

This one needs a rebase.

tobiasb’s picture

Status: Needs work » Needs review

Rebased.

dries made their first commit to this issue’s fork.

dries’s picture

I ran into this issue on my personal blog, so I took a closer look and helped review and test it. Disclaimer: I used Claude Code, but reviewed everything manually.

I just committed a change because testExceptionIsThrownWithNonAsciiCharacters contains dead code. See diff.

PHPUnit’s expectException() and expectExceptionMessage() are setters, so each call overwrites the previous one. In the original test, they are called twice, which means only the second pair is actually asserted. I verified this manually by replacing the first expectExceptionMessage() with a random string. The test still passes.

$this->expectException(\InvalidArgumentException::class);          // line 157 — set
$this->expectExceptionMessage('The config name part of the URI...'); // line 158 — set

// We cannot test multiple exceptions at once.
$this->expectException(\InvalidArgumentException::class);          // line 161 — overwrites 157
$this->expectExceptionMessage('The name config.testóú...');        // line 163 — overwrites 158

As a result, read() with non-ASCII input is never actually tested, only readMultiple().

My fix splits the test into two separate methods: one for read() and one for readMultiple(), so each exception is properly asserted.

dries’s picture

I just committed another change to the tests: see diff.

The MR adds ASCII validation to read() and readMultiple(), but exists() also queries the config table by name and will hit the same collation error with non-ASCII input. In other words, the solution is incomplete.

The existing tests tried to show this by calling exists('config.testáéóú') and catching a DatabaseExceptionWrapper. However, this check was part of testExceptionIsThrownIfQueryFails, which drops the config table and recreates it with a broken schema. In that setup, exists() fails because the table is broken, not because of non-ASCII input, so the test was not actually verifying what it claimed.

So I moved the test into a dedicated test method that runs against a normal/correct config table, where the collation mismatch is the actual cause of the failure. After my commit, you can see that the tests still pass, meaning that we throw a DatabaseExceptionWrapper (i.e. exists() causes a collation error). We clearly isolated the problem, which I think is an improvement.

I am happy to work on a fix, but I have a question: should exists() return FALSE or throw an InvalidArgumentException?

Returning FALSE feels natural to me, but it would be inconsistent with read() and readMultiple(), which throw an exception. Most (all?) PHP methods that are of type exists() for lack of a better term (e.g. file_exists(), class_exists(), array_key_exists(), etc) return FALSE.

That said, it also feels weird to return FALSE when we don't actually go to the database ... So throwing InvalidArgumentException might actually the right call?

dries’s picture

One more commit: I added the same mb_check_encoding guard to exists() that read() and readMultiple() use. It now throws InvalidArgumentException, so I updated the isolated test accordingly.

@olly pointed out in #43 that this breaks existing tests, but that has been over a year, and there were other changes since. Let's see what happens ...

dries’s picture

Looks like the change in exists() works now ... 👍

This was discussed already (e.g. #28, #37, #54, #55) but the statusCodeEquals(500) feels a bit iffy because the correct response is almost certainly a 404, not a 500.

The route parameter goes through the parameter converter, hits config entity storage, throws the InvalidArgumentException, and returns a 500. Hence the statusCodeEquals(500) assert in one of the tests.

This could be handled at a higher level, i.e. do better parameter validation at the routing level. Per the previous comments that could belong in its own issue, or it could now be a relatively easy addition to this one.

dries’s picture

I did some more digging to think through the 404 vs 500 problem, and the actual bug I'm seeing on my site. Here are my current learnings and thinking.

Most config entity IDs use the machine_name data type (/^[a-z0-9_]+$/, defined in core.data_types.schema.yml), though some entity types use custom patterns (shortcut sets allow dashes, blocks allow dots, language entities use a langcode pattern with uppercase). Either way, all variants are ASCII-only.

These constraints are validated on write (the config entity validation enforces the schema constraints, and UI forms use them too). However, they are not validated during entity loading.

To fix the original issue, we could add a check in ConfigEntityStorage::doLoadMultiple() that skips non-ASCII IDs before they become config names. It's a 3-line change:

foreach ($ids as $id) {
  // Config entity IDs use ASCII-only patterns (machine_name, langcode,
  // etc). Non-ASCII IDs can never be valid and would cause a database
  // collation error if queried.
  if (!mb_check_encoding($id, 'ASCII')) {
    continue;
  }
  $names[] = $prefix . $id;
}

I tested it on my localhost and it results in a 404 on all 3 paths given in the issue summary (i.e. /node/add/öüä, /media/add/öüä, /admin/structure/views/view/öüä).

Since non-ASCII IDs can never be valid, they are skipped by the check, and result in an "entity not found". It's maybe a bit strange to silently skip over them, but it should be fine because of the validation that happens on write.

So, with this 3-line check, the entity repository returns NULL, the param converter throws ParamNotConvertedException, and the routing enhancer turns that into a 404. Now the behavior for /node/add/invalid-id and /node/add/öüä both result in a 404, rather than a 404 and 500 respectively. And because the check sits inside the storage layer, every caller (routing, REST, Drush, contrib) hits it automatically ...

I considered using the strict machine name pattern /^[a-z0-9_]+$/ instead because mb_check_encoding($id, 'ASCII') always felt a bit odd, but that would reject valid IDs from entity types with custom patterns. As explained in the second paragraph, entity IDs are not fully standardized. I also didn't see a way to validate IDs using a config entity specific validator. I contemplated loading the typed config manager etc, but that would add a lot of overhead in a hot code path. In absence, I concluded that the mb_check_encoding is probably the right common-denominator. It's fast, and in a way, the entity ID standard is ASCII-only.

This 3-line fix could be complementary to the DatabaseStorage validation already in this MR. The InvalidArgumentException in read(), readMultiple(), and exists() remains a safety net for code that calls config storage directly.

But it could also be instead of the DatabaseStorage validation changes we made. With the 3-line change, the safety net doesn't really trigger anymore. I'll leave that up to the entity system maintainers to decide as I'm not an expert on it.

I didn't commit these test changes, but I've added a patch to the comment if that helps any of you evaluate this. I'd be happy to commit it in case we want to include it and run the tests. I'd personally include it as it's only a 3-line change and it's directly related to the bug report. If not, I'm happy to create a separate issue for this too. Hope this helps!

jannakha’s picture

patch #65 seems like a good way to return a 404 error (rather than 500)
can it be added to MR?

dries’s picture

It could be included, but it would be good to get some extra eyes on it. Should it live in doLoadMultiple, or does it belong at a level higher? If so, we'd need to add the check in multiple places. Or should doLoadMultiple catch exceptions instead, and skip over those?

alexpott’s picture

@dries your comments lead me to ponder if maybe we could instead put your improvement in the routing system. What would be really neat is if \Drupal\Core\ParamConverter\ParamConverterManager::setRouteParameterConverters() could allow a param converter to set a regex requirement if one is not set and the param converter supported it. We'd need to add a new interface to support getting a regex but I think this should be possible. The param converter could then get the regex from config schema for config entities and from the field system for content entities.

alexpott’s picture

I also think making the suggested change in #65 is worth is as a defence in depth but ideally the routing system would catch this for us.

alexpott’s picture

Here's some sub-optimal code to set the route regex for a config entity type on a route.

            // The schema key for config entities is prefix + '.*'
            $schema_definition = \Drupal::service('config.typed')->getDefinition($config_entity_type->getConfigPrefix() . '.*');
            $id_key = $config_entity_type->getKey('id');
            $schema_definition = \Drupal::service('config.typed')->getDefinition($schema_definition['mapping'][$id_key]['type']);
            if (isset($schema_definition['constraints']['Regex']['pattern'])) {
              $route->setRequirement($bundle_entity_type_id, trim($schema_definition['constraints']['Regex']['pattern'], '/'));
            }

Just pasting it here in case someone has a go at implementing #68

alexpott’s picture

I've thrown up an MR with #68 mostly implemented. I've manually tested and run through the route building the debugger and this works quite nicely apart from for the multi-part IDs eg core.entity_form_display.*.*.* and where the ID is based on another config entity eg. editor.editor.* - I think we will be able to support both cases with a little work. See https://git.drupalcode.org/project/drupal/-/merge_requests/15283

dries’s picture

@alexpott: thanks for the quick response and code. I decided to write some test to help me learn about your proposed changes. I just pushed them to your branch. It includes a kernel test for getRouteRequirement() and a functional test that hits /node/add/öüä and asserts a 404. They pass on my localhost.

While writing the tests, I noticed one thing myself. Config entities with multi-wildcard schemas (like entity_view_mode with core.entity_view_mode.*.*) don't match the simple prefix.* lookup, so getRouteRequirement() returned NULL and their routes are unprotected. Since all config entity IDs are ASCII-only (this is what this entire issue is about), one option is to fall back to a generic ASCII pattern like [a-zA-Z0-9_.\-]+ when the specific schema pattern can't be determined. I included this in my commit but happy to revert if that is wrong, or a better solution emerges.

Also, the pattern extraction uses trim($pattern, $delimiter) then trim($pattern, '^$'). I wondered if we have regex with flags and it appears we do. In core.data_types.schema.yml we have pattern: '/^[a-z][a-z0-9+\-\.]*$/i', for example. trim only removes the leading / (the trailing character is i, not /). I replaced it with a preg_match that properly separates the delimiter, pattern body, and flags.

I can't say I fully understand your code yet, but I feel my additions are directionally correct.

dries’s picture

Just to summarize, we now have 3 layers going on (sorry for the scope creep!):

1. "Route requirements" that reject non-matching URLs at the router level. The most architecturally correct layer, in my opinion. I believe this catches 95%+ of the invalid IDs at the right level.

2. ConfigEntityStorage::doLoadMultiple() skips non-ASCII IDs before they reach config storage. Useful because not everything goes through routing (e.g. Drush, programmatic entity loads, etc). Plus, we might not always get route requirements? I must admit that I don't fully understand the multi-wildcard schemas, potential contrib gaps, etc in 1. This layer might be optional, but it probably doesn't hurt as it's just 3 lines of code that are pretty fast.

3. DatabaseStorage throws InvalidArgumentException for non-ASCII names in read(), readMultiple(), exists(). This is a good safety net for any code that calls config storage directly.

dries’s picture

Some great progress, Alex.

I've been digging into this more this morning, and I have some more thoughts based on what I've learned so far. I'm not an expert on the Entity system, so please take this with a grain of salt.

The config schema defines what a valid entity ID looks like (e.g. machine_name with /^[a-z0-9_]+$/). And the typed data system can already validate against it: calling $entity->getTypedData()->validate() on a config entity with an invalid ID produces the right violations. This is tested in ConfigEntityValidationTestBase::testInvalidMachineNameCharacters(). So far so good! :)

But this validation is only used during config import. Strangely, it does not seem to be called during save(), load(), or routing. Also, forms use the MachineName element, which is separate from the schema. If someone or something changes or overwrites the schema pattern, the form element doesn't know. It seems fragile that the MachineName validation is not necessarily the same as the entity ID validation. Lastly, save() only checks for empty IDs and length.

So, this all leads to the question: should the schema not be the single source of truth, and should it not be used everywhere?

For save: ConfigEntityStorage::save() could call $entity->getTypedData()->validate() before writing. The infrastructure already exists, it just needs to be wired up.

For forms: if ConfigEntityStorage::save() validated against the schema via $entity->getTypedData()->validate(), the MachineName element wouldn't need to do validation anymore (in this case)?

For load and routing: Alex's MR already reads the regex pattern from the config schema via TypedConfigManagerInterface::getDefinition(). That is the same schema that $entity->getTypedData()->validate() reads from. That is a great step in the right direction, as both paths start to use the same source of truth.

The one question is where the schema-to-regex lookup lives. Alex's MR puts it in the param converter. I think it might actually belong on the entity type, similar to how hasIntegerId() already describes the ID format and is used in DefaultHtmlRouteProvider to set \d+ route requirements. A new getIdConstraint() method would generalize this for both content and config entities.

(I also think it may lead to a path where hasIntegerId() could be deprecated, but I've not verified that. At a minimum, something like getIdConstraint() could replace hasIntegerId() in DefaultHtmlRouteProvider. It would be an improvement.)

This is obviously bigger than the current issue but I think it points to the right long-term direction: one schema, used for import, save, load, and routing, rather than solving the same problem differently at each layer.

So, maybe we create a new issue for this, and get 3475540-unhandled-exception-ascii-2 committed here? I think 3475540-unhandled-exception-ascii-2 could be RTBC.

dries’s picture

One more thought, separate from my previous comment, as it's more radical. :)

We should question how much validation infrastructure we actually need. We've basically operated without ID validation on load or save for the entire lifetime of Drupal 8 through 11. Sure, we validate on config import, but that is only so useful if everything else bypasses it.

I'm not saying we shouldn't do any of this. But it's always worth questioning new requirements before adding complexity.

Sometimes less is more. The validation gap is real, and the existing validation inconsistent. In many ways, the system hasn't been perfect (this issue proves that), but it's also been largely fine.

We could just validate everything as 'needs to be ASCII' and not allow entity types to deviate from that. That could be a very simple but effective improvement to a system that has worked largely fine?

Food for thought?

alexpott’s picture

Status: Needs review » Needs work

@dries I think #76 points us in a good direction. I think this issue should:

  • Add the ASCII check to loadMultiple()
  • Add a simple ascii regex on config entity routes where there is no requirement already. It think this regex perhaps should be more permissive… i.e. just be printable ascii.. [\x20-\x7E]  … we could increase the complexity and not permit the characters listed in \Drupal\Core\Config\ConfigBase::validateName()  but not sure that’s worth it.

We should also open a follow-up to discuss whether we can improve the route requirement based on the regex constraint and/or whether we can ties the param conversion and entity validation using constraints together in a better way.

alexpott’s picture

Actually @tim.plunkett indicated they are ambivalent about the low level changes to config and the more I think about it the more i agree. We're just just one exception for another to no real benefit. The improvement here is we're stopping the routing system from being able to cuase these exceptions and that's delivered by https://git.drupalcode.org/project/drupal/-/merge_requests/15283

dries’s picture

I like the "simple regex" approach in 253487f0. It's much simpler, fixes the bug, and defers complexity for now. In general, it makes sense to move your branch forward. Thank you for all the work on this, Alex!

Direct calls to ConfigFactory, Drush, and programmatic entity loads would still throw a DatabaseExceptionWrapper on MySQL. The DatabaseStorage validation addresses that, and ensures consistent behavior across database backends. It feels like a minor improvement, and is well-tested now, but it might be optional. I'll let others decide.

I created #3582417: Use config schema as single source of truth for entity ID validation to explore using config schema as the single source of truth for entity ID validation across save, load, and routing.

I'm glad we explored different options and reined it back in.

alexpott’s picture

Status: Needs work » Needs review

alexpott changed the visibility of the branch 3475540-unhandled-exception-ascii-2 to hidden.

borisson_’s picture

Status: Needs review » Reviewed & tested by the community

I think this simple regex looks great, I was wondering if this would break any current sites - but because we're not adding to an existing interface but creating a new one this seems like it will not be an issue.

alexpott’s picture

Title: Throw an understandable exception when there is an attempt to load config entities with disallowed characters » Use a route requirement to prevent non-ASCII characters causing an exception when looking up a config entity
Issue summary: View changes

I've fixed up the issue summary and title to align with the solution we've ended up with.

alexpott’s picture

Category: Task » Bug report

Also this feels like a bug fix. 500s are becoming 404s

dries’s picture

I did a review. Overall this is clean and much simpler than the earlier iterations.

I added two comments that could be safely ignored.

Everything seems to work on my localhost.

We still need a final verdict on the DatabaseStorage validation, I believe.

Thank you, @alexpott.

alexpott’s picture

I've renamed the interface and improved the comment to be more precise.

dries’s picture

I agree this is RTBC.

Once it is in, we can reopen the discussion to decide on the DatabaseStorage validation (hidden branch), if needed.

needs-review-queue-bot’s picture

Status: Reviewed & tested by the community » Needs work
StatusFileSize
new91 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

alexpott’s picture

Status: Needs work » Reviewed & tested by the community

Rebased on main and fixed the conflict.

catch’s picture

One question on the MR, it's about the presence or absence of a single word in a comment so leaving RTBC.

alexpott’s picture

Responded in the MR. I think the comment is fine.

catch’s picture

Version: main » 11.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Committed/pushed to main, thanks!

Will need a backport MR for 11.x

  • catch committed 69709341 on main
    fix: #3475540 Use a route requirement to prevent non-ASCII characters...

smustgrave’s picture

Status: Patch (to be ported) » Reviewed & tested by the community

Relatively easy backport, just 1 test needed to manually apply.

  • godotislate committed 677b3b1b on 11.x
    fix: #3475540 Use a route requirement to prevent non-ASCII characters...
godotislate’s picture

Status: Reviewed & tested by the community » Patch (to be ported)

Committed 677b3b1 and pushed to 11.x. Thanks!

Consulting other RMs on whether this should go to 11.4.x, leaving at Patch (to be ported) for now.