Problem/Motivation

in core/lib/Drupal/Core/Config/DatabaseStorage.php

public function readMultiple(array $names) {
    $list = [];
    try {
      $list = $this->connection->query('SELECT name, data FROM {' . $this->connection->escapeTable($this->table) . '} WHERE collection = :collection AND name IN ( :names[] )', [':collection' => $this->collection, ':names[]' => $names], $this->options)->fetchAllKeyed();
      foreach ($list as &$data) {
        $data = $this->decode($data);
      }
    }
    catch (\Exception $e) {
      // If we attempt a read without actually having the database or the table
      // available, just return an empty array so the caller can handle it.
    }
    return $list;
  }

if there is any connection failure, exception will be catched, and an empty value will be returned.
According to the comment, the caller should handle it

while in
core/lib/Drupal/Core/Config/CachedStorage.php -> readMultiple

      $list = $this->storage->readMultiple($names_to_get);
      // Cache configuration objects that were loaded from the storage, cache
      // missing configuration objects as an explicit FALSE.
      $items = [];
      foreach ($names_to_get as $name) {
        $data = isset($list[$name]) ? $list[$name] : FALSE;
        $data_to_return[$name] = $data;
        $items[$cache_keys_map[$name]] = ['data' => $data];
      }
      $this->cache->setMultiple($items);

looks like it just set the FALSE to the cache backend

This is here because config could be fetched before the database is setup and it should succeed - and tables might only be created for some config upon first write - so lack of a table is not a failure.

However the catch is not just catching table not exists failures but also connection failures. So it can happen that the connection is lost and instead of an exception the code assumes no table and therefore gives a successful empty response even if there is configuration data in the database. This then gets injected into cache so even once the connection to the database is restored, the configuration is always returned empty

Steps to reproduce

Can be reproduced in a test by creating an invalid table with missing columns that will force an exception when calling the method. (Note: due to SQLite quoting this won’t work with a SQLite database)

Proposed resolution

Do not set empty config to the cache backend when there is an exception on getting the data, unless that exception is because the table does not exist. Instead, throw the exception to end the request.

We can detect table not exists failures by subsequently checking for the table in the exception. If it is not there we have confirmed it is safe to return empty. If it is indeed there then we can assume the query failed in some other way that needs rethrowing. In a connection loss instance this check for table will itself detect the loss of connection and throw (e.g. MySQL has gone away)

MR used

MR 2839 = 11.x

Remaining tasks

Review

User interface changes

NA

API changes

NA

Data model changes

NA

Release notes snippet

NA

Issue fork drupal-3181013

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

Pan Lee created an issue. See original summary.

Pan Lee’s picture

Issue summary: View changes
Pan Lee’s picture

Issue summary: View changes
alexpott’s picture

Version: 8.8.x-dev » 9.2.x-dev
Status: Active » Needs review
StatusFileSize
new2.48 KB
new3.74 KB

I agree that this looks a bit odd.

I think we can fix this by checking if the table exists.

The last submitted patch, 4: 3181013-4.test-only.patch, failed testing. View results

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

driskell’s picture

Due to https://www.drupal.org/project/drupal/issues/3267207 this patch won't work properly with MySQL installations, as the tableExists will return false if the connection is broken.

driskell’s picture

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

driskell’s picture

Priority: Major » Critical
StatusFileSize
new3.74 KB

Added patch for 9.4+ where there is a new method "exists" exhibiting the same issue that has caused me two outages so far and at least one "weird missing button issue"

driskell’s picture

StatusFileSize
new4.12 KB

Resolved comment changes in the patch so they are maintained properly.

Status: Needs review » Needs work

The last submitted patch, 12: 3181013-cache-corruption-9-4.patch, failed testing. View results

driskell’s picture

StatusFileSize
new4.3 KB
driskell’s picture

Status: Needs work » Needs review

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

kristen pol’s picture

Status: Needs review » Needs work

@Driskell Thanks for updating this issue. It wasn't clear what changes you have made as there is no interdiff on the patch changes and you have switched to an MR. So I have diff'ed the changes manually between the latest MR and patch in #4. There are some minor changes listed below that are needed. Thanks.

2c2
< index bca57388b9..f42553f580 100644
---
> index 7626bf50802c49a5a8ac3f2a64a7bb51892a27b9..bb13dac7218c53ab259207f656730f3a3da0f6e4 100644
5c5,15
< @@ -90,8 +90,11 @@ public function read($name) {
---
> @@ -72,6 +72,9 @@ public function exists($name) {
>        ], $this->options)->fetchField();
>      }
>      catch (\Exception $e) {
> +      if ($this->connection->schema()->tableExists($this->table)) {
> +        throw $e;
> +      }
>        // If we attempt a read without actually having the database or the table
>        // available, just return FALSE so the caller can handle it.
>        return FALSE;
> @@ -90,6 +93,9 @@ public function read($name) {
9,10d18
< -      // If we attempt a read without actually having the database or the table
< -      // available, just return FALSE so the caller can handle it.
14,15c22,23
< +      // If we attempt a read without actually having the table available,
< +      // return FALSE so the caller can handle it.
---
>        // If we attempt a read without actually having the database or the table
>        // available, just return FALSE so the caller can handle it.
17,19c25
<      return $data;
<    }
< @@ -108,8 +111,11 @@ public function readMultiple(array $names) {
---
> @@ -108,6 +114,9 @@ public function readMultiple(array $names) {
23,24d28
< -      // If we attempt a read without actually having the database or the table
< -      // available, just return an empty array so the caller can handle it.
28,29c32,33
< +      // If we attempt a read without actually having the table available,
< +      // return an empty array so the caller can handle it.
---
>        // If we attempt a read without actually having the database or the table
>        // available, just return an empty array so the caller can handle it.
31,32d34
<      return $list;
<    }
34c36
< index b26b3fd53c..07f857c7e7 100644
---
> index b26b3fd53c6b373c56ed2d37bb109851321c52da..d1385e2e8a7bb2f6333b8e03e02bfd13d00ff62c 100644
45c47
< @@ -42,4 +43,53 @@ protected function delete($name) {
---
> @@ -42,4 +43,62 @@ protected function delete($name) {
50c52
< +   * Tests \Drupal\Core\Config\DatabaseStorage creates a table when necessary.
---
> +   * Tests \Drupal\Core\Config\DatabaseStorage does not fail when table does not exist.
53a56
> +    $this->assertFalse($this->storage->exists('config.settings'));
65c68
< +   * Tests \Drupal\Core\Config\DatabaseStorage throws expected exceptions.
---
> +   * Tests \Drupal\Core\Config\DatabaseStorage throws expected exceptions when connection fails.
81a85,92
> +      $this->storage->exists('config.settings');
> +      $this->fail('Expected exception not thrown');
> +    }
> +    catch (\Exception $e) {
> +      $this->assertInstanceOf(DatabaseExceptionWrapper::class, $e);
> +    }
> +
> +    try {

I see that the code was changed as follows:

  1. Added logic to the new exists method
  2. Reverted comments in new logic (see below)
  3. Added test code for the new exists method
  4. Comments for the tests have been changed (see below)

The comment changes that were made in #4 have been reverted. They should be the following in all places:

 // If we attempt a read without actually having the table available,
 // return FALSE so the caller can handle it.

Also, the test comments are now more than 80 characters and should be reverted back to what they were or reformatted.

  1. +++ b/core/tests/Drupal/KernelTests/Core/Config/Storage/DatabaseStorageTest.php	2022-10-07 12:45:33.000000000 +0100
    @@ -42,4 +42,62 @@
    +   * Tests \Drupal\Core\Config\DatabaseStorage does not fail when table does not exist.
    
  2. +++ b/core/tests/Drupal/KernelTests/Core/Config/Storage/DatabaseStorageTest.php	2022-10-07 12:45:33.000000000 +0100
    @@ -42,4 +42,62 @@
    +   * Tests \Drupal\Core\Config\DatabaseStorage throws expected exceptions when connection fails.
    
driskell’s picture

Hi Kristen

The patch in #4 is for 9.2

The MR and patches I added are for 9.4/9.5 which has new exists method so I thought an interdiff would confuse as it’s somewhat a new patch for 9.4+ as opposed to a change to the 9.2 one.

I did apply the code patch manually though and yes you’re right the comments were changed in the 9.2 patch. I’ll resolve that in an update to the MR soon and I will hide the patches in preference of the MR.

Thank you.

Jason

driskell’s picture

Status: Needs work » Needs review

I've updated the MR with the comment changes so it matches the 9.2 patch, and hidden the patches where I didn't include the comment changes.

Thanks for the review! And apologies for missing the comment change!

driskell’s picture

Got PHPCS working and fixed up some comments.

mxr576’s picture

Status: Needs review » Needs work
driskell’s picture

@mxr576 I've applied your requested changes.

I have also added further checks as I have two open problem reports on my side over the last few months:
* listAll we traced to returning empty array during a network glitch and that also cached an empty list somewhere and broke something
* getAllCollectionNames we traced as well to doing the same thing

Further to adding the checks to the above two so they behave like read/readMultiple/write/exists, I've also added the same check to deleteAll in anticipation that it can cause a similar outcome (leaving config behind but caching have no config, and causing a split between cache and what's in storage accessible directly)

driskell’s picture

Status: Needs work » Needs review
driskell’s picture

Fixed line length issues of test comments from what @kirsten-pol mentioned earlier

smustgrave’s picture

Status: Needs review » Needs work
Issue tags: +Needs Review Queue Initiative

This issue is being reviewed by the kind folks in Slack, #needs-review-queue-initiative. We are working to keep the size of Needs Review queue [2700+ issues] to around 400 (1 month or less), following Review a patch or merge request as a guide.

Reviewing MR #2839. Hiding files to avoid confusion.

Code wise everything looks fine

But when I ran the tests locally without the fix to DataStorage.php they still passed, when they should fail to show the bug.

Thanks!

driskell’s picture

@smustgrave

I just ran the tests without the fix and received a failure. I then applied the fix and it worked again.
However, from a hunch, I hooked up SQLite as the test database, and yes it did not fail!

Bit of debugging later...

https://www.sqlite.org/lang_keywords.html
> If a keyword in double quotes (ex: "key" or "glob") is used in a context where it cannot be resolved to an identifier but where a string literal is allowed, then the token is understood to be a string literal instead of an identifier.

The query that is meant to fail because of a missing column actually now resolves to a "does this string equal this string" and is accepted as a string expression!
> SELECT 1 FROM "test57957719"."config" WHERE "collection" = :collection AND "name" = :name LIMIT 0, 1

Will have a think about a way to make the test fail properly on SQLite as well.

driskell’s picture

OK, tracked the quoting to double quotes back to https://www.drupal.org/project/drupal/issues/3126658
It seems not to have taken into account SQLite.
I checked and the ANSI quoting in Pgsql and MySQL should be fine as that forces double quote as identifier and it will never be string literal.
In SQLLite it actually is string literal if the column does not exist.

I don't know but... I am wondering if the above needs to be a separate issue as that's in a different module. So I'll get a ticket raised.
I've noticed there's some duplication of tests in the MR anyway so I'll push a cleaned up version - and I'll set the other ticket for SQLLite as blocking this one.

driskell’s picture

Version: 10.1.x-dev » 9.5.x-dev
driskell’s picture

Status: Postponed » Needs review

@smustgrave

I'm guessing the impact to all SQLite queries of a change to that would prevent this from being a bug fix in 9.5.

However, perhaps there just needs to be an acceptance that this test won't function properly with SQLite until the other issue is fixed so maybe there's a decision to be made here - in that the test will properly work on MySQL so the automated tests will work fine. It only will cause a problem for running locally with SQLite. Just I feel this patch has saved me many issues and is hugely beneficial - our incidents relating to cache corruption in this area are eliminated so far.

So perhaps the SQLite issue does not postpone this? So I'm U-turning as usual and returning to Needs review to get some thoughts.

andypost’s picture

The workaround looks incomplete tableExists() making a query on exception, but when other process creating table it could be wrong assumption

driskell’s picture

@andypost

At the moment it fails silently and corrupts cache during normal day to day running. The scenario of a writer creating the table and causing a valid read to then throw exception is true, but I think that would only occur once, when the table doesn't yet exist, and only in the perfect race between read and the very first ever write to the table, and it would recover completely on the second request and never happen again. Unfortunately it's difficult to quantify the real impact of that scenario but it seems to be small, and much smaller than the current issue's impact.

Technically speaking I think the correct fix is to detect the specific error of the table not existing. However, this might be complex across the different drivers and versions supported to detect the specific error message, and would require driver changes. If that was opted into it would be better I think as a post-task after this to get those driver implementation for throwing a specific exception for table missing, for use in this code, so we're never handling arbitrary exceptions and only handling the exception we intend to: Table does not exist. Arguably that would be easier to test too and wouldn't need the quoting change for SQLite.

What are your thoughts?

driskell’s picture

Not seen it before but found an issue that was adding the table exception checks that would also resolve this: https://www.drupal.org/project/drupal/issues/2371709

catch’s picture

The workaround looks incomplete tableExists() making a query on exception, but when other process creating table it could be wrong assumption

This is fine - the worst that happens is a cache miss, and it's how we handle it in other subsystems.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new1.48 KB

The Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

longwave’s picture

I have a busy site that is experiencing occasional strange bugs where fields suddenly no longer exist (causing subsequent fatal errors in different places each time), but this is fixed on cache rebuild, and I am wondering if this is the culprit.

I think the fix here looks good for the problem at hand but I wonder if there is a way of refactoring this so we explicitly create the table separately and then the service itself doesn't have to be concerned with catching exceptions at all; it can just pass them up to the caller. This should be a followup, though.]

ericgsmith’s picture

Version: 9.5.x-dev » 11.x-dev

ericgsmith’s picture

Status: Needs work » Needs review

I suspect we also hit this issue recently. The symptoms were very similar to #37 - config status showed a list of config as in sync only - clearing the cache went back to showing no differences.

Fixed CS issue reported in #36 and opened new MR to 11.x - setting back to needs review.

ericgsmith’s picture

Status: Needs review » Needs work

Ah - setting back to needs work. As was indicated in #26 the test is failing on SQLite.

Have triggered the additional pipelines for PostgreSQL and SQLite and it indeed is still failing with SQLite https://git.drupalcode.org/issue/drupal-3181013/-/pipelines/47013

driskell’s picture

Status: Needs work » Reviewed & tested by the community

I am putting this back into Needs Review because a failing test is impossible for SQLite due to a known issue (see #3349286). The test system online is MySQL so reproduces it correctly. You cannot test this locally with a SQLite database, it won't fail.

As for the other issues, #2371709 should be a follow on, it's a bit more involved. And hopefully at some point #3349286 can be looked at to fix the SQLite issue. But I don't feel either of these block this fix. The first improved it. The second is purely relating to local tests.

driskell’s picture

Status: Reviewed & tested by the community » Needs review
ericgsmith’s picture

Having also been meaning to come back to say we can almost certainly confirm this issue is what is responsible for the config shown as overridden.

Further investigation showed multiple errors in the logs that mysql had gone away over about 1 minute. We can also see higher than expected numbers of evictions in memcache during this period from a combination of high traffic and our memcache cache bin size being lower than what the site needed. Immediately after the mysql gone away logs we start seeing warning of missing fields that imply the field does not exist, and as mentioned in #40 this was visible via drush config status and resolved after a cache clear.

We are now running this patch in production now along side increasing the cache bin size of memcache.

driskell’s picture

I've updated the MR to include a skip for SQLLite and linked it to the issue about the double quoting that prevents the test from working.

driskell’s picture

Sorry - that last pushed commit was to fix the URL to be correct issue

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new1.52 KB

The Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

driskell’s picture

Status: Needs work » Needs review

Resolve review queue bot issue

smustgrave’s picture

Status: Needs review » Needs work

Can the 9.5.x MR be closed since D9 is EOL please

The other MR seems to have some test failures potentially.

driskell’s picture

@smustgrave I rebased it to 11.x and updated the MR. Not sure how to update the other one so I kept my existing.
It skips the test for SQLite since it will fail due to the linked issue. But it successfully fails without the fix on MariaDB / MySQL and passes after the patch.

driskell’s picture

Status: Needs work » Needs review

Not sure the test failures are related.

smustgrave’s picture

Issue summary: View changes
Status: Needs review » Needs work

Glad switching targets didn't cause issues, usually changing that causes issues.

Checking the issue summary

I am not sure the exact fix yet, but in my opinion, we should not set a permanent faulty cache to the cache backend when there is an exception on getting the data

Isn't a full solution, can the proposed solution be documented helps reviewers/committers.

Added missing sections too, left TBD for sections that may apply but not familiar with issue to know myself. If not applicable just change to NA.

driskell’s picture

Issue summary: View changes
Status: Needs work » Needs review

@smustgrave I updated issue summary. It was proposed resolution that was taken so I just dropped the not sure bit and I’ve updated other parts to reflect final status. Thanks for checking

driskell’s picture

Issue summary: View changes

Added more detailed description of problem and solution to IS

smustgrave’s picture

Issue summary: View changes
Status: Needs review » Reviewed & tested by the community

Thanks issue summary reads much better now!

  • catch committed d8b92d57 on 10.1.x
    Issue #3181013 by Driskell, alexpott, ericgsmith, Pan Lee, smustgrave,...

  • catch committed 562a13e2 on 10.2.x
    Issue #3181013 by Driskell, alexpott, ericgsmith, Pan Lee, smustgrave,...

  • catch committed 201fd312 on 11.x
    Issue #3181013 by Driskell, alexpott, ericgsmith, Pan Lee, smustgrave,...
catch’s picture

Version: 11.x-dev » 10.1.x-dev
Status: Reviewed & tested by the community » Fixed

Committed/pushed to 11.x, cherry-picked to 10.2.x and 10.1.x, thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.