Problem/Motivation
Whenever a request is processed, the database.replica_kill_switch service checks if the replica database should be ignored. The problem is that the event subscriber method \Drupal\Core\Database\ReplicaKillSwitch::checkReplicaServer always fetches data from the session, which can cause a session to be started when it has already been stopped.
```php
if ($this->session->has('ignore_replica_server')) {
  if ($this->session->get('ignore_replica_server') >= $this->time->getRequestTime()) {
    Database::ignoreTarget('default', 'replica');
  }
  else {
    $this->session->remove('ignore_replica_server');
  }
}
```
The \Drupal\Core\Database\ReplicaKillSwitch::trigger() method has a preflight check that verifies a replica server even exists before setting a session variable. checkReplicaServer() should perform the same check.
Steps to reproduce
On kernel.terminate, perform a subrequest. The subrequest triggers the replica killswitch check, which throws an exception because headers have already been sent.
Proposed resolution
Add this check before reaching into the session service, which would otherwise be lazily instantiated and started:

```php
$connection_info = Database::getConnectionInfo();
// Only set ignore_replica_server if there are replica servers being used,
// which is assumed if there are more than one.
if (count($connection_info) > 1) {
  // ... existing session logic ...
}
```
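Putting that guard together with the existing session logic, checkReplicaServer() would look roughly like this (a sketch only; the event parameter is abbreviated and the final patch may differ):

```php
public function checkReplicaServer($event) {
  // Bail out before touching the session service at all when no replica is
  // configured; this is what prevents the session from being lazily started.
  $connection_info = Database::getConnectionInfo();
  if (count($connection_info) <= 1) {
    return;
  }
  if ($this->session->has('ignore_replica_server')) {
    if ($this->session->get('ignore_replica_server') >= $this->time->getRequestTime()) {
      Database::ignoreTarget('default', 'replica');
    }
    else {
      $this->session->remove('ignore_replica_server');
    }
  }
}
```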
Remaining tasks
User interface changes
API changes
Data model changes
Release notes snippet
| Comment | File | Size | Author |
|---|---|---|---|
| #29 | 3181946-29.patch | 7.24 KB | kim.pepper |
| #28 | interdiff.txt | 1.68 KB | joelpittet |
| #28 | 3181946-28-d10-replica-session-killswitch.patch | 4.55 KB | joelpittet |
| #26 | 3181946-nr-bot.txt | 144 bytes | needs-review-queue-bot |
| #23 | interdiff_22_23.txt | 516 bytes | ranjith_kumar_k_u |
Issue fork drupal-3181946
Comments
Comment #3
mglaman commented
Comment #4
amateescu commented
The patch makes perfect sense to me :)
Comment #5
longwave commented
It should be possible to add a test for this?
Comment #6
sam152 commented
Edit: encountered a different bug, please ignore.
One question though, does the bug described in the IS still exist for sites with a replica configured?
Comment #7
catch commented
Let's add test coverage.
@Sam152 we have to start a session when there actually is a replica database, so it should be 'by design' in that case - the session value is the only thing that tells us to ignore the replica.
Comment #8
kim.pepper commented
Is a session necessary though? Would the duration of the current request be sufficient?
Comment #9
dww commented
I believe this issue is causing PHPUnit tests to fail on a client site I'm working on right now. The tests all pass in our local Docker env where there's only a single DB configured. On test instances on the client's hardware, there's a replica DB configured, and (some of) the tests fail with this error. If I remove the DB replica config from settings.local.php, the tests all pass. However, adding this patch doesn't help with the replica configured, since then the connection info count is > 1.
Therefore, I'm leaning towards the @kim.pepper question/solution: do we have to use a session for this at all, even in the replica case? Can't this trigger last the duration of the request, instead of trying to be time-based via sessions?
See also some additional related background / digging at #2941687: Subrequests in an KernelEvents::TERMINATE event subscriber (f.e. cron tasks doing subrequests) try to restart a session, resulting in runtime exceptions
Comment #10
mglaman commented
#9: are you thinking we should just flag an attribute on the request object vs the session? I'd like that.
Comment #11
dww commented
@mglaman re: #10: Yes. See #2941687-17: Subrequests in an KernelEvents::TERMINATE event subscriber (f.e. cron tasks doing subrequests) try to restart a session, resulting in runtime exceptions where @EclipseGc writes:
Plus @kim.pepper's comments here and elsewhere.
So yes, flagging the replica ignore in the request object instead of using the session at all...
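As an illustration of that direction (a sketch under stated assumptions: the class name ReplicaKillSwitchRequest does show up in later patches, but the method body here is a guess, not the actual patch):

```php
use Drupal\Core\Database\Database;
use Symfony\Component\HttpFoundation\RequestStack;

/**
 * Sketch: flags the replica killswitch on the request instead of the session.
 */
class ReplicaKillSwitchRequest {

  public function __construct(protected RequestStack $requestStack) {
  }

  /**
   * Ignores the replica for the remainder of the current request only.
   */
  public function trigger(): void {
    $request = $this->requestStack->getCurrentRequest();
    if ($request) {
      // A request attribute lives and dies with this request; nothing is
      // persisted, so no session is ever started.
      $request->attributes->set('ignore_replica_server', TRUE);
    }
    Database::ignoreTarget('default', 'replica');
  }

}
```

The trade-off is that the flag disappears when the request ends, so the user's follow-up request may read stale data from the replica; that is exactly the time-based behaviour debated elsewhere in this thread.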
Comment #12
mglaman commented
Thanks, I didn't mean to sound dense, I just wanted to make sure :)
I'll pick this up sometime soon to propose moving from the session to a request object flag.
Comment #13
longwave commented
The session-based killswitch was originally added in Drupal 7 as db_ignore_slave(). I found the original issue in git history, and there is some interesting reading there before we consider removing the time-based component of this and reducing it to the current request only:
#314358: Add option to ignore slave server for next requests
#302300: Support adding and ignoring targets
The ultimate source of this feature appears to be the following two blog posts:
https://dri.es/database-replication-lag
https://dri.es/scaling-with-mysql-replication
Comment #14
dww commented
@mglaman: I didn't mean to imply you were being dense. I was just giving credit to the folks who influenced my thinking on this topic, lest anyone think I was claiming that solution as my own idea. ;)
@longwave: Very helpful, thanks! Will definitely read those threads ASAP.
Comment #16
dan2k3k4 commented
Getting this error when trying to log in to a site I recently updated:
Comment #17
longwave commented
@dan2k3k4 Do you use redirect_after_login or a similar module? Please see #3214949: Headers have already been sent after upgrade to Drupal 9.2 (can't login)
Comment #18
dan2k3k4 commented
@longwave ahh yep, thanks! I have that module installed, and temporarily disabling it fixed the issue for me. I had been Googling the stack trace and ended up on this issue, my bad.
Comment #19
Claudio Fabio Mazzarago commented
Hi, I have the same issue as @dan2k3k4, and I fixed it by disabling the "redirect after login" module.
This is the issue: https://www.drupal.org/project/redirect_after_login/issues/3214949
C.
Comment #20
kim.pepper commented
Here's an implementation of a request-based killswitch.
Comment #22
arturs.v commented
Currently testing the patch from #20 in my project. First signs seem promising.
I had to make one small change: on latest core (9.3.3), after applying the patch, Drupal throws
Service "database.replica_kill_switch.request" must implement interface "Symfony\Component\EventDispatcher\EventSubscriberInterface".
Switching the use statement to Symfony\Component\EventDispatcher\EventSubscriberInterface fixed it.
Comment #23
ranjith_kumar_k_u commented
Fixed CS error.
Comment #26
needs-review-queue-bot commented
The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #28
joelpittet commented
D10 fails on this patch with:
TypeError: Drupal\Core\Database\ReplicaKillSwitchRequest::__construct(): Argument #1 ($requestStack) must be of type Drupal\Core\Http\RequestStack, Symfony\Component\HttpFoundation\RequestStack given, called in /var/www/html/public/core/lib/Drupal/Component/DependencyInjection/Container.php on line 259 in Drupal\Core\Database\ReplicaKillSwitchRequest->__construct() (line 26 of core/lib/Drupal/Core/Database/ReplicaKillSwitchRequest.php).
This patch should work for D11 as well.
Comment #29
kim.pepper commented
Fixes the patch conflicts and adds a kernel test.
I think we need to work out how this would be used instead of the session version? Do we switch to it and deprecate? How do we support both?
Comment #30
smustgrave commented
Are there any BC concerns if we switch?
Comment #31
smustgrave commented
Moving to Needs Work for an issue summary update, and to determine whether there are any backwards compatibility concerns with switching.
Comment #32
jonmcl commented
I have reviewed some of the patches, including MR!25, and I do not think any of them address the fundamental problem: if the session has been ended, the ReplicaKillSwitch service is still going to attempt to start a session.
In our (extreme edge) case, we are attempting to save some entities in a shutdown function. The session has been closed, and our shutdown function then saves a pending entity change. SqlContentEntityStorage::save then calls \Drupal::service('database.replica_kill_switch')->trigger(), which ends up throwing an exception that causes a transaction rollback (inside SqlContentEntityStorage::save), and our attempt to save changes to an entity (in the shutdown function) is lost.
Incidentally, we don't have a replica server, but we do have another database connection, so our Database::getConnectionInfo() count is 2.
I think the right solution here might be to check the session's ::isStarted() before attempting to get data from it or set data on it.
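A minimal version of that guard, sketched against what core's trigger() currently does (the maximum_replication_lag default of 300 seconds is existing core behaviour; treat the exact shape of the method as an assumption):

```php
public function trigger() {
  // Do not persist the flag if no session is open: starting one here is what
  // throws once headers have been sent or the request has ended.
  if (!$this->session->isStarted()) {
    return;
  }
  $connection_info = Database::getConnectionInfo();
  // Only set ignore_replica_server if there are replica servers being used,
  // which is assumed if there are more than one.
  if (count($connection_info) > 1) {
    // Ignore the replica for long enough to let replication catch up.
    $duration = Settings::get('maximum_replication_lag', 300);
    $this->session->set('ignore_replica_server', $this->time->getRequestTime() + $duration);
  }
}
```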
Comment #36
catch commented
The theory behind starting the session when this was originally added is that if someone has just triggered saving an entity, then while the replica is catching up to the primary database, you want that user to see fresh data while they're browsing around the site, and not potentially stale data from the replica.
We could change that logic, and that would mean that the user potentially sees stale data until things catch up again. However, this was all added before render caching, and if anyone browses a listing query that's served from a replica database listing stale data, then it'll get cached for other people too, and then this logic does nothing useful anyway. So I'm not sure it makes any sense any longer. Sites using this feature just have to deal with the possibility that some stale listing data might get cached sometimes.
I think it's completely reasonable to avoid setting anything to the session unless the session is open, an actual user triggering an entity save without a session open would be extremely rare. So that might also be fine.
The MR will need to be targeted against 11.x, and it looks like the branch needs a rebase.
Comment #37
jonmcl commented
I pushed up my changes on https://git.drupalcode.org/issue/drupal-3181946/-/tree/3181946--replicak...
I tried to create my new branch off of 11.x, but things seemed to go awry and that's probably not the correct way to get this updated for 11.x. When I created the MR, it showed thousands of changes even though the new MR was targeting 11.x.
Someone with better git.drupalcode.org skills than me is needed to clean things up.
Comment #39
catch commented
When you branch in an issue fork, it uses 11.x from the issue fork, and that can be very old on an old issue - so you would need to rebase the branch with origin 11.x again.
Thinking about alternatives to session - one possible option that would also fix caching would be to write a timestamp to state instead of session, and then not allow anything to query the replica for x seconds after the timestamp. This would also prevent stale data from the replica going into the render cache too.
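Roughly, with the State API (the key name replica_kill_switch.expires is hypothetical, used only for illustration):

```php
// When an entity save happens (replacing ReplicaKillSwitch::trigger()):
$duration = Settings::get('maximum_replication_lag', 300);
\Drupal::state()->set('replica_kill_switch.expires', \Drupal::time()->getRequestTime() + $duration);

// On every request (replacing the session check): ignore the replica for
// ALL users until the timestamp expires, which also keeps stale listings
// out of the render cache.
$expires = \Drupal::state()->get('replica_kill_switch.expires', 0);
if ($expires >= \Drupal::time()->getRequestTime()) {
  Database::ignoreTarget('default', 'replica');
}
```

The cost of this approach would be that any one user's save blocks the replica site-wide for the whole window, rather than only for the user who saved.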
Comment #40
jonmcl commented
@catch I am wondering if my understanding of ReplicaKillSwitch is correct.
This is to make it so that MY queries, in this current request, go to the primary db server instead of any of the replicas. The idea being that my current request just made a change to the database. That change goes through the primary connection. If my current request, being processed in the same thread, has a select query, the select goes through the primary connection again so that it is assured to have the updated data?
Then, because the kill switch is in my session for a period of time, my next request to load and display the updated node (and update entity & render caches) is guaranteed to get fresh data from the primary because the kill switch is sending me there. I suppose there is a small risk that another user's request comes in at that moment and, because they don't have the kill switch, they get stale data from a replica and replace the invalidated cache items with stale data.
Comment #41
catch commented
That's the right idea, yes. Except loading a node never goes to the replica; in core I think the only places that support it are search and views, so the only place to get stale data would be listing queries.
However there could be dozens of pages on which any one node appears - taxonomy term listings etc. and it would be easy for another user to warm the cache of those pages.