Contact person responsible and ready to help

@Fabianx

Problem/Motivation

Drupal promises you can use different cache and lock backends like Memcache, Redis, etc. instead of the Database backend but that is essentially not true:

- Because Drupal uses transactions, and the same database connection is used for the cache and lock backends, there is an implicit coupling of the transaction to the cache or lock consistency.

Examples:

function x() {
  lock_acquire();
  $transaction = db_transaction();
  lock_release();
}

Looks right doesn't it? And it will always work correctly, because the lock will get released at the same time as the transaction, BUT as soon as you use another backend, you have introduced a race condition, where the lock is released, before the new data was written.

Correct would be:

function x() {
  lock_acquire();

  $transaction = db_transaction();
  $transaction->commit();

  lock_release();
}

@see #356399: Optimize the route rebuilding process to rebuild on write

Another example is the recent usage of drupal.org having to use the cache_consistent module.

Here the race condition was more like:

  node_save($node);
  // This invalidates the field cache -> with DB: okay, with memcache: BOOM
  // A new request comes in and loads the old node
  // Repopulates the field cache.
  // Now the transaction is committed.

All of this happens, because core takes an implicit coupling of cache, lock and database for granted.

Proposed resolution

- Use a different DB connection for cache and lock backends, so that writes / reads to / from those systems are no longer coupled.
- Obviously there is a drawback to several connections, so lets make that configurable.

BUT the default at least for the test suite should be to use one DB connection per backend.

OR: if we never use transactions on lock() or cache() backends, we could have one DB connection that is transactional and one that is not.

Remaining tasks

User interface changes

API changes

Issue fork drupal-2347867

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

catch’s picture

A non-transactional database connection seems good, then the various database backends can just use that.

catch’s picture

Title: Decouple cache and lock backends from the main database connection » Race conditions with lock/cache using non-database storage - add a non-transactional database connection

Re-titling

catch’s picture

Also log and queue should use this.

moshe weitzman’s picture

Would anyone be willing to Assign this issue to themselves. It is a Critical so needs an owner. I'm a base system maintainer and doing triage on critical bugs in this Component.

I personally dont think this is Critical but if someone owns it then I wont play status pong.

fabianx’s picture

Assigned: Unassigned » fabianx

I think it is base only, because it spans so many systems. Taking it for now.

fabianx’s picture

Assigned: fabianx » Unassigned

Unassigning per assigning policy, but adding myself as "owner" to the issue queue, so I am there as responsible contact for this issue.

If that is not enough Moshe, I will assign myself back. Please let me know and thanks for the triage!

fabianx’s picture

Issue summary: View changes
fabianx’s picture

Issue summary: View changes
moshe weitzman’s picture

@Fabianx. The policy you refer to is https://www.drupal.org/node/2172049? Looking at the revision history, that content was authored by one person. I think it overstates the problems of using the Assigned field. I think it is fine and traditional to use it as the main coordinator for an issue. As a counterpoint, any list or JSON request can show the Assigned person but cannot show someone who 'claims' an issue in the Comments.

berdir’s picture

StatusFileSize
new2.26 KB

Discussed this with @dawehner a bit today.

An easy improvement that we can see is define a database.secondary (or any other name) and use that by default for cache, lock, log and queue (and maybe others?). That falls back to default.default if it's not defined, to enable it, you just have to copy your database definition and add it as database.secondary. Similar to how we used to do it with slave, but with a different purpose.

Not sure what else we could do here, we can't configure that to not use transactions as far as I can see and I also don't think we want to make this the default.

At least needs documentation and a test that the second connection is not affected by the

berdir’s picture

Status: Active » Needs review
berdir’s picture

Issue tags: +Ghent DA sprint
pfrenssen’s picture

Issue summary: View changes
dawehner’s picture

Tried to start writing a test here: http://privatepaste.com/c06d2019b9 but it failed.

berdir’s picture

As discussed, that is not PHP :)

Explicitly unsetting() the $transaction makes the test pass for me. You can find similar code in the further up in that test.

Honestly, I'm not sure why this is critical. Yes, it is can be a problem, but it is something that *always* existed how is it suddenly a release blocking bug in 8.x ?

catch’s picture

Yes the race condition in field caching is marked major, and it's clear from the patch we can fix this in a minor release, so going ahead and downgrading to major.

catch’s picture

Priority: Critical » Major
dawehner’s picture

I would like to see a name for the second database connection so it becomes obvious for what kind of services you want to use it for.

One example could be something like "additional" but this also not seems to be satisfying.

berdir’s picture

Agreed, I was thinking about something like database.nontransactional or so, but that's only TRUE if nobody actually does use transactions. So it would just be a convention and not a rule..?

And the database backend actually uses transactions in setMultiple().

mikeytown2’s picture

D7 cache backend that uses a different db connection for lock/cache operations
https://www.drupal.org/project/apdqc

Note that if we're looking to do this for locking in D8 we should address this issue as well #1225404: Modernize lock acquisition to RAII. I ended up using namespace and an eval file_get_contents fallback if needed inside of apdqc.lock.inc, as I want to wait for all async queries to finish before releasing a lock inside of memcache.

damien tournoud’s picture

It sounds very dangerous to just start making lock and cache non-transactional without fixing all the known race conditions, and having a clear understanding on how to uncover and fix the unknown race conditions.

At this point, the cache being transactional is the only thing that makes Drupal not poop under itself when even some light concurrency happens on an entity, and I'm not talking about concurrency between two save operations, I'm taking about a save and a load operation, see #1679344: Race condition in node_save() when not using DB for cache_field.

fabianx’s picture

The point that this was critical (and no I don't want to block release and delay Drupal 8 for it, so I agree) is that we lie to ourselves currently:

- We make everything nicely de-coupled and pluggable

But in reality you can't use a different cache backend for reliability reasons. If it works, its luck.

The deadlocks we see on the database are showing the concurrency issues we have in core and that others see as inconsistency and data loss issues.

And the only reason we don't take concurrency into account is that we take the transaction for granted as #22 stated.

Making core use a different connection for it would uncover those issues that are the _real_ issues we have.

However with most everything going via Entity Base thats stored now in the database and using transactions, I think its a solvable problem.

Status: Needs review » Needs work

The last submitted patch, 16: database-secondary-2347867-16.patch, failed testing.

jhedstrom’s picture

Status: Needs work » Needs review
StatusFileSize
new4.67 KB

This is a reroll of #16, with the change to use 'nontransactional' rather than 'secondary'.

fabianx’s picture

Nice! :)

Status: Needs review » Needs work

The last submitted patch, 26: database-nontransactional-2347867-26.patch, failed testing.

david_garcia’s picture

I'm adding this as a possibly related issue:

#2572283: Neither REPEATABLE READ nor READ COMMITTED transaction isolation levels are always appropriate

I'm trying to spot issues that wil benefit of such a change, and this looks like one.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

damienmckenna’s picture

Version: 8.1.x-dev » 8.2.x-dev

Bumping to 8.2.x.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

catch’s picture

Issue tags: +Triaged core major

Discussed this on a triage call with the other committers, and we agreed this should be major (although not necessarily on the solution).

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

hchonov’s picture

Status: Needs work » Needs review
StatusFileSize
new4.77 KB

Re-roll only.

Status: Needs review » Needs work

The last submitted patch, 36: 2347867-36.patch, failed testing. View results

hchonov’s picture

Status: Needs work » Needs review
StatusFileSize
new4.74 KB
new476 bytes

Well the last patch is a little bit old. We should use only factory instead of factory class and method.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

anavarre’s picture

StatusFileSize
new4.77 KB

Patch no longer applies.

Checking patch core/core.services.yml...
error: while searching for:
      - [setContainer, ['@service_container']]
  cache.backend.database:
    class: Drupal\Core\Cache\DatabaseBackendFactory
    arguments: ['@database', '@cache_tags.invalidator.checksum']
  cache.backend.apcu:
    class: Drupal\Core\Cache\ApcuBackendFactory
    arguments: ['@app.root', '@site.path', '@cache_tags.invalidator.checksum']

error: patch failed: core/core.services.yml:192
error: core/core.services.yml: patch does not apply
Checking patch core/modules/dblog/dblog.services.yml...
Checking patch core/tests/Drupal/KernelTests/Core/Database/TransactionTest.php...
wim leers’s picture

fabianx’s picture

Yeah, I have since also re-thought my approach:

While in a transaction:

- cache_get: Allow - if cache_prefix or cache tag had not been invalidated in this request [use simplified memcache key prefix algo, we could also be more strict and do invalidation by bin]
(else we run 100% cache-less, which is not really what we want even though most things are static caches anyway).

- cache_set: Disallow, treat like a record_cache_clear_all($cid), record
- cache_clear_all: Disallow, treat like a record_cache_clear_all($bin, $prefix)
- tag invalidation: Disallow, treat like record_cache_tag_invalidation

After the transaction is committed => commit all the cache clears, which is simple as just need to execute it.

That will solve the cache inconsistency for memcache / redis AND also reduce lock wait timeout exceeded problems within transactions as there will be no writes to the cache bins during the transaction.

--

Funnily enough #2966607: Invalidating 'node_list' and other broad cache tags early in a transaction severely increases lock wait time and probability of deadlock already does a lot of that (the tag invalidation part).

fabianx’s picture

Here is another argument for the original idea that in addition to fixing the cache coherency issues, that Core absolutely should not run cache operations within a transaction:

https://dev.mysql.com/doc/refman/5.7/en/truncate-table.html

ends a transaction prematurely as this is an explicit COMMIT.

Unless I misunderstand something it means if you do:

$transaction = db_transaction();
try {
  node_save($node_a);
  cache_clear_all("*", "cache_foo", TRUE); // This is an implicit COMMIT according to MySQL docs. corresponds to `$this->cache->deleteAll();`

  node_save($node_b);
  // here an Exception is thrown.
  node_save($node_c);
}
catch (Exception $e) {
  $transaction->rollback();
}

unset($transaction);

that you end up with just node A saved, but not node B and node C.

Expected would be that no node is saved.

This needs testing, but it seems to be a realistic bug.

EDIT: Tested, the bug is real!

Another possibility to solve the truncate issue is to revert back to a much slower DELETE instead.

----

Reproduce statements:

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT into flood (event, identifier, timestamp, expiration) VALUES ('foo', 'x', 1,2);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT into flood (event, identifier, timestamp, expiration) VALUES ('foo', 'x', 1,2);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT into flood (event, identifier, timestamp, expiration) VALUES ('foo', 'x', 1,2);
Query OK, 1 row affected (0.00 sec)

mysql> select * from flood;
+-----+-------+------------+-----------+------------+
| fid | event | identifier | timestamp | expiration |
+-----+-------+------------+-----------+------------+
|   1 | foo   | x          |         1 |          2 |
|   2 | foo   | x          |         1 |          2 |
|   3 | foo   | x          |         1 |          2 |
+-----+-------+------------+-----------+------------+
3 rows in set (0.00 sec)

mysql> rollback;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from flood;
Empty set (0.00 sec)

# Starting 2nd test

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT into flood (event, identifier, timestamp, expiration) VALUES ('foo', 'x', 1,2);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT into flood (event, identifier, timestamp, expiration) VALUES ('foo', 'x', 1,2);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT into flood (event, identifier, timestamp, expiration) VALUES ('foo', 'x', 1,2);
Query OK, 1 row affected (0.00 sec)

mysql> truncate cache;
Query OK, 0 rows affected (0.01 sec)

mysql> select * from flood;
+-----+-------+------------+-----------+------------+
| fid | event | identifier | timestamp | expiration |
+-----+-------+------------+-----------+------------+
|   4 | foo   | x          |         1 |          2 |
|   5 | foo   | x          |         1 |          2 |
|   6 | foo   | x          |         1 |          2 |
+-----+-------+------------+-----------+------------+
3 rows in set (0.00 sec)

mysql> rollback;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from flood;
+-----+-------+------------+-----------+------------+
| fid | event | identifier | timestamp | expiration |
+-----+-------+------------+-----------+------------+
|   4 | foo   | x          |         1 |          2 |
|   5 | foo   | x          |         1 |          2 |
|   6 | foo   | x          |         1 |          2 |
+-----+-------+------------+-----------+------------+
3 rows in set (0.01 sec)

That means our cache truncate can lead to data inconsistencies by ending transactions early. Fortunately deleteAll() are not that often used, so this is not a big problem in practice, but it is a bad side effect regardless.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

wim leers’s picture

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

quietone’s picture

Assigned: fabianx » Unassigned
Issue tags: +Bug Smash Initiative

There has been no activity here for 2 years 7 months when @Wim Leers asked if this issue was still relevant. And un-assigning since there has been not reply.

Since we need more information to move forward with this issue, I am keeping the status at Postponed (maintainer needs more info). If we don't receive additional information to help with the issue, it may be closed after three months.

Thanks!

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

pstewart’s picture

Status: Postponed (maintainer needs more info) » Needs review
Related issues: +#2833539: SQLSTATE[40001]: Serialization failure: 1213 Deadlock

I think this issue is still relevant: I still regularly get deadlock errors in my production logs, and deadlocks actually become highly probable in situations where synchronous requests are expected (for example, on a Commere site using an offsite paytment gateway, the customer return and independent payment gateway notification requests will be executing simultaneously more often than not. A big part of the problem is that entity saves have potential to do huge amounts of work in the pre-save and post-save phases, which all occur inside the transaction of the entity save. There's a strong argument that post-save should not be occurring inside the transaction (the clue's in the name: stuff occurring in the post save phase should be expecting the save to have been completed and committed). Most notably though, the cache invalidation is occurring inside the save transaction, and that really shouldn't be the case, and leads to deadlocks when competing processes are both trying to cause caches to be updated and the tables are not accessed in order.

For example, process A locks cache_config and then tries to lock cache_bootstrap for something unrelated, but happens to be doing it in an entity save cache context so has to hold both locks simultaneously to fulfil the transaction. Meanwhile process B locks cache_bootstrap and then tries to lock cache_config during a (possibly completely unrelated) entity save, and a deadlock occurs.

I believe #44 is correct: cache operations should not happen inside a transaction.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new144 bytes

The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

catch’s picture

This came up in #2833539-134: SQLSTATE[40001]: Serialization failure: 1213 Deadlock - see the comment which reaches similar conclusions to this issue (but also has at least one possibly alternative approach).

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

greenskin’s picture

Re-rolled patch from #41.

joseph.olstad’s picture

joseph.olstad’s picture

 ╰❯ $ diff D10.0.x-nontransactional_connection-2347867-61.patch D10.1.x-nontransactional_connection-2347867-62.patch 
2c2
< index e3c1e094..f2ed065d 100644
---
> index ed531f5027..13bbd46cdd 100644
5c5
< @@ -213,7 +213,7 @@ services:
---
> @@ -220,7 +220,7 @@ services:
14,15c14
< @@ -386,6 +386,10 @@ services:
<      class: Drupal\Core\Database\Connection
---
> @@ -405,6 +405,10 @@ services:
17a17
>    Drupal\Core\Database\Connection: '@database'
25,26c25,26
< @@ -485,7 +489,7 @@ services:
<        - [setContainer, ['@service_container']]
---
> @@ -527,7 +531,7 @@ services:
>    Drupal\Core\Queue\QueueFactory: '@queue'
30a31
>    Drupal\Core\Queue\QueueDatabaseFactory: '@queue.database'
33,34c34
<      arguments: ['@request_stack']
< @@ -870,13 +874,13 @@ services:
---
> @@ -959,13 +963,13 @@ services:
51c51
< index 96ac2ee9..78cac386 100644
---
> index 96ac2ee948..78cac38694 100644
63,64c63,64
< diff --git a/core/tests/Drupal/KernelTests/Core/Database/TransactionTest.php b/core/tests/Drupal/KernelTests/Core/Database/TransactionTest.php
< index e41df19c..c1145c5f 100644
---
> diff --git a/core/tests/Drupal/KernelTests/Core/Database/DriverSpecificTransactionTestBase.php b/core/tests/Drupal/KernelTests/Core/Database/DriverSpecificTransactionTestBase.php
> index ebbf60826a..e09d8dce65 100644
75c75
< @@ -573,4 +574,54 @@ public function testQueryFailureInTransaction() {
---
> @@ -575,4 +576,54 @@ public function testQueryFailureInTransaction() {
joseph.olstad’s picture

According to the current dictionary, nontransactional is not a word.

I would argue otherwise as this makes perfect sense to those of us who know what transactional processing is in a relational database sense from a contemporary RDBMS world.

How do we go about adding nontransactional to the dictionary so that we can continue with automated testing?

I don't see the problem with using nontransactional in this case. However if someone has a better idea by all means go for it.

berdir’s picture

You can add it to misc/cspell/dictionary.txt.

joseph.olstad’s picture

Thanks @Berdir, here's three straight up rerolls of #60 with the nontransactional added to the cspell dictionary. One for each branch.

joseph.olstad’s picture

Status: Needs work » Needs review

Rerolled, has tests, passes tests

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new119 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

joseph.olstad’s picture

StatusFileSize
new5.43 KB

Rerolled 66 to resolve a little fuzz on the 11.x patch
the 10.1.x patch still applies cleanly.

joseph.olstad’s picture

Status: Needs work » Needs review
smustgrave’s picture

Status: Needs review » Needs work
Issue tags: +Needs change record

This looks good!

Wonder if a change record could be added though for the new service database.nontransactional? If it's something that other contrib modules could leverage.

Think it could be RTBC after that.

Thanks.

joseph.olstad’s picture

@smustgrave, I have created a draft change record.

https://www.drupal.org/node/3384048

smustgrave’s picture

Status: Needs work » Reviewed & tested by the community
Issue tags: -Needs change record

Thanks!

joseph.olstad’s picture

Status: Reviewed & tested by the community » Needs work

latest patch needs a reroll

kostyashupenko’s picture

Status: Needs work » Needs review
StatusFileSize
new5.42 KB

Rerolled

mondrake’s picture

Component: base system » database system
Status: Needs review » Needs work

This is a good idea, but as it is this patch is practically not doing what it's meant to do.

+++ b/core/core.services.yml
@@ -415,6 +415,10 @@ services:
+  database.nontransactional:
+    class: Drupal\Core\Database\Connection
+    factory: Drupal\Core\Database\Database::getConnection
+    arguments: [nontransactional]

This service will try to fetch a db connection with 'nontransactional' as a target, which unless it is defined in the settings.php, will fall back to the same 'default' target, see this code in Database::getConnection:

    // If the requested target does not exist, or if it is ignored, we fall back
    // to the default target. The target is typically either "default" or
    // "replica", indicating to use a replica SQL server if one is available. If
    // it's not available, then the default/primary server is the correct server
    // to use.
    if (!empty(self::$ignoreTargets[$key][$target]) || !isset(self::$databaseInfo[$key][$target])) {
      $target = 'default';
    }

So you would still be doing the cache and log operations on the same default Connection object.

The test works because you are preliminarily adding the 'nontransactional' database target in

+++ b/core/tests/Drupal/KernelTests/Core/Database/DriverSpecificTransactionTestBase.php
@@ -680,4 +681,55 @@ public function testConnectionDeprecations(): void {
+    // Setup a secondary database connection.
+    $connection_info = Database::getConnectionInfo('default');
+    Database::addConnectionInfo('default', 'nontransactional', $connection_info['default']);

So something will have to happen similarily in the getConnection() method if (hopefully) we won't be changing settings.php.

I also think if we open such a transaction-free connection, we should prevent it from executing startTransaction(). Could be a follow up, but IMHO it would be better to get it straight from the beginning.

mondrake’s picture

Also, the IS could use some updates… the code is still D7.

bradjones1 made their first commit to this issue’s fork.

bradjones1’s picture

Status: Needs work » Needs review

OK, marking this NR.

Digging into this a bit, I discovered/decided a few things:

  • Most of Drupal core expects that you'll really only have one default target and (maybe) a replica target. Non-replica targets for a database key, especially default, aren't anticipated in production and this results in a required bugfix here. Basically, the assumption was that if there were more than one target, it must be a replica, and thus we need to disable replicas during certain actions. This was discovered during a regression surfaced in testing where the session was re-started due to the replica kill switch service.
  • Speaking of replicas, we use that term a lot around the DB API and it's not entirely clear that replica is a reserved word that specifically designates a replica or group of replica connection targets. It was particularly difficult to find all instances of this "DB replica" vs. other uses of the word. I think constants are a great answer to this question, so in the course of fixing above bug in adding this functionality, I extracted all these words into constants.
  • I landed on a decorator pattern for the non-transactional connection, which wraps the underlying driver's connection object. As I have navigated in the course of my work on JSON data support, messing with signatures for the various connection classes and interfaces is no small feat, and there are also contrib drivers to think about. Decoration is a good solution here, but as you can see it results in a lot of boilerplate. I marked this class both final and internal, and it did still require a change broadening the potential return types of connection methods. (see the event enable and disable methods.) As I expect this to really be an internal API (there's little to no reason to override the non-transactional decorator class) I'm hoping this is sufficient. Otherwise we need to add a bunch of code to the connection classes that boils down to throwing an exception if you try and start a transaction on this connection. I don't think that's worth it.
  • This also required a few tweaks to performance testing (though with no real change in performance) due to there now being two connections always in play - transactional and non.
bradjones1’s picture

Title: Race conditions with lock/cache using non-database storage - add a non-transactional database connection » Race conditions with lock/cache, session storage - add a non-transactional database connection
mondrake’s picture

Great idea to use a decorator. Added some comments/ideas inline - free thoughts.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new90 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

mondrake’s picture

Status: Needs work » Needs review

Opened a new branch and MR (MR!8762) to try and introduce a minimal interface, common to both Connection and NonTransactionalConnection. The idea is to use this interface where we can inject both.

mondrake’s picture

mondrake’s picture

MySQL passes tests, but SQLite and PostgreSql fail quite spectacularly

bradjones1’s picture

Added a comment on the alternative MR. My feeling is that we should add the non-transactional class now, which solves a real bug (or two) and move adding a connection interface to a new issue. I certainly did consider the question of an interface for handling this, when pursuing my solution, but it would be a non-trivial deprecation dance to get it done overall. And as you have found, there are wrinkles when considering multiple drivers/db types.

mondrake’s picture

Looking a bit at SQLite docs, https://www.sqlite.org/wal.html and https://www.sqlite.org/lockingv3.html, it looks to me that SQLite cannot perform a DML statement in the non-transactional connection while an open transaction in the normal connection is holding the flushing of a DML statement. For example: INSERT in nonTransactionConnection while a transaction is open in Connection, and an INSERT is in the transaction --> Database Locked exception.

Could not find a way to circumvent this, so the problem here is bigger I am afraid - and do not think depends on the MR.

On the interface - I added the open() method, but in fact it could even be empty and just be there to indicate the commonality between Connection and NonTransactionalConnection; also, we do not necessarily need to change everywhere, only where we could have one of the two alternative classes.

bradjones1’s picture

Status: Needs review » Needs work

Thanks for the pointer on SQLite - that's unfortunate. As with a few other issues I've been working on lately, it seems this is another area where SQLite isn't really an equally-qualified production DB for Drupal if you're using features that are well-supported in other relational DBs. The test failures re: a locked DB certainly prove out the failure mode you're describing.

The only real workaround on this is to just keep the current behavior with SQLite, that is, don't use the new non-transactional connection. That would involve overriding these services for SQLite (which is already supported in all but one of the services involved - that being queue). I don't think this is a blocker, in so far as we can't not fix the related bugs just because SQLite can't hang.

At first glance, the PGSQL failure seems a little stranger. This is due to a difference in how MySQL and PgSQL handle transaction isolation. Basically, MySQL gives each transaction its own snapshot for reads and writes, while PgSQL allows select queries inside a transaction to see changes committed since the transaction opened. I updated the test to reflect this... it doesn't actually matter, but the test was written too strictly to pass for both MySQL and PgSQL.

Re: the interface question, I'm open to that change, but haven't incorporated it yet into this MR. I think it's better to either have a complete interface or an empty one. That said, I'm not sure typehinting both the non-transactional and transactional connections with the same interface gets us much, because you likely want to use the non-transactional connection sparingly. I'm concerned about situations where you'd get a non-transactional connection passed where it's not appropriate. At least currently we explicitly have to state that it's acceptable.

bradjones1’s picture

So I chased my tail a bit on sqlite-specific stuff, namely that I forgot to put the dblog override in a service provider because otherwise it registers itself as a logger when dblog module isn't loaded, and that breaks things.

I also discovered that you can't currently do a backend-override on a lazy service - see #3461330: Lazy services (backed by proxy classes) can't be `backend_overridable` because the proxy service isn't tagged - I'm including a fix in this issue but if it needs to be broken out into a separate one... then at least there's a separate issue. Oy vey.

bradjones1’s picture

Status: Needs work » Needs review
bradjones1’s picture

Status: Needs review » Needs work

SQLite tests failing in a new way but much closer. Will pick up tomorrow.

mondrake’s picture

In the alternate MR, made the interface empty. So we can typehint with the interface where we can accept both Connection and NonTransactionalConnection, or the more specific class where needed.

Also, tried a different approach to mark a db driver allowing or not concurrent non-transactional connections, which does not involve services.

Finally - diverging MySql from SQLite here is starting to scare me a bit, we would end up everywhere in very different db state results depending on whether the db supports or not the concurrent non-transactional connection...

bradjones1’s picture

Recapping my debugging this morning.

I think backend-overriding queue.database has been broken since at least #3416357: Convert QueueFactory to use a service locator introduced a service locator.

Here's what happens:

  • BackendCompilerPass adds an alias for the overridden service's ID pointing at the override service.
  • The service locator is looking for services tagged with queue_factory, which now results in an empty ServiceLocator passed to the queue service.
  • CoreServiceProvider does tag instances of QueueFactoryInterface with queue_factory, but that doesn't appear to extend to services defined by modules, including sqlite. I think that it probably should but I'm not sure why it isn't working. Probably an issue with order of operations in the service providers and passes.
  • Manually tagging sqlite.queue.database, the backend-override service, as a queue_factory results in it showing up in the service locator instance passed to the queue service, but it's named sqlite.queue.database, not queue.database, which is the hard-coded default that is requested. Service locators aren't aware of aliases.

Even if the override service got the tag automatically, that doesn't change the fact that queue is requesting queue.database as a default, which is an alias. I think the answer is in how BackendCompilerPass registers the overrides. In cases where the service is requested directly from the "main" container, this all works fine. But laziness causes this go to awry.

I'll see about opening a separate issue to track this, but it is a hard blocker for the sqlite carve-out we need here.

bradjones1’s picture

I'm ambivalent on a flag that indicates the specific nature of the read-committed implementation. It seems like it's mostly only really useful for tests, but perhaps a public method for indicating this behavior would help some power-user developers. I'm tempted to not add to the API surface of the connection for something that we can address in tests - it's only really a MySQL vs. Postgres difference. But I don't care much either way, there are bigger issues uncovered here.

diverging MySql from SQLite here is starting to scare me a bit, we would end up everywhere in very different db state results depending on whether the db supports or not the concurrent non-transactional connection

I don't think we really have any alternative. I would suggest couching it as "SQLite diverging from MySQL/MariaDB/Postgres," and not the other way around. As explored in detail over at #3343634: Add "json" as core data type to Schema and Database API, SQLite is divergent from the other, non-file based drivers in a number of ways and I imagine we will continue to root out these differences as Drupal becomes more advanced and modern under the hood. I think there is a loose but growing consensus that running Drupal on SQLite in production will mean accepting these limitations or trade-offs. If we insist on 100% parity, it means that we sacrifice innovation for a small minority of implementations. I don't think we should do so.

bradjones1’s picture

The failure on SQLite is #3458975: GenerateThemeTest::testContribStarterkitDevSnapshotWithGitNotInstalled fails on sqlite. Seems like there is some sort of regression with PgSQL though... playing whack-a-mole. This issue has turned into fixing what appears to be a very-not-stable backend-override API more than the original goal. Gotta fix all that though, at this point.

bradjones1’s picture

Looks like all tests are passing now, next step is to incorporate some of the changes from the alternative MR.

bradjones1’s picture

Status: Needs work » Needs review

Hacked a bit more on this tonight and am more happy with where this is going. Kind of a compromise between the two MRs.

Per #95:

In the alternate MR, made the interface empty. So we can typehint with the interface where we can accept both Connection and NonTransactionalConnection, or the more specific class where needed.

I toyed with this but eventually landed on a fuller interface. Reason being, the non-transactional connection can "do" most everything the wrapped/transactional connection can, so let's make it a first-class citizen. I think it's also good to move toward an interface for Connection, instead of treating the abstract class as a makeshift interface. I am not wed to this but the interface does help with static analysis way more than using __call().

Also, tried a different approach to mark a db driver allowing or not concurrent non-transactional connections, which does not involve services.

I pulled the flag into my MR and the new interface. I kept the service override, however, since SQLite is still the outlier here and all the services in question are backend-overridable. This is what that model is for, so let's use it instead of hiding the logic.

Finally - diverging MySql from SQLite here is starting to scare me a bit, we would end up everywhere in very different db state results depending on whether the db supports or not the concurrent non-transactional connection...

As discussed a bit above... I'm not sure there's really any way around this. Yes, you will get some different results with SQLite but that's no different than now, and in the vast majority of production use cases this will result in more deterministic and predictable behavior, not less. I think that's a win.

smustgrave’s picture

Status: Needs review » Needs work

Not sure which MR is to be reviewed but 8273 needs rebase

If you are another contributor eager to jump in, please allow the original posters at least 48 hours to respond to feedback first, so they have the opportunity to finish what they started!

mondrake’s picture

@smustgrave 8273 is the one for review; 8762 is only there as a reference if anyone wants to go through all the comments

bradjones1’s picture

Status: Needs work » Needs review

8273 is the active/main MR for review. Updated per feedback, rebased and back to NR.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new90 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.