Problem
Upstream issue: https://bugs.xdebug.org/view.php?id=2222
Drupal does auto-commit transactions on shutdown, as discussed at #1608374: Database autocommit disabling at shutdown. At the moment, quite some code relies on that behavior.
For that to work, Drupal\Core\Database\Transaction::__destruct
must run before Drupal\mysql\Driver\Database\mysql\Connection::__destruct
. If the order is reversed, this results in nasty fatal errors like the following:
Fatal error: Uncaught Error: Call to a member function inTransaction() on null in /var/www/web/core/modules/mysql/src/Driver/Database/mysql/Connection.php on line 483
Error: Call to a member function inTransaction() on null in /var/www/web/core/modules/mysql/src/Driver/Database/mysql/Connection.php on line 483
Call Stack:
15.7059 86985504 1. Drupal\Core\Database\Transaction->__destruct() /var/www/web/core/lib/Drupal/Core/Database/Transaction.php:0
15.7059 86985504 2. Drupal\Core\Database\Connection->popTransaction($name = 'drupal_transaction') /var/www/web/core/lib/Drupal/Core/Database/Transaction.php:71
15.7059 86985504 3. Drupal\mysql\Driver\Database\mysql\Connection->popCommittableTransactions() /var/www/web/core/lib/Drupal/Core/Database/Connection.php:1447
15.7059 86985880 4. Drupal\mysql\Driver\Database\mysql\Connection->doCommit() /var/www/web/core/modules/mysql/src/Driver/Database/mysql/Connection.php:415
As a result the transaction is not committed. In our project, this suddenly (=zero change) popped up and reliable killed every "drush cr" or simple drush commands like "drush user:unblock".
According to https://www.php.net/manual/en/language.oop5.decon.php the object destruction order during shutdown is random:
PHP possesses a destructor concept similar to that of other object-oriented languages, such as C++. The destructor method will be called as soon as there are no other references to a particular object, or in any order during the shutdown sequence.
Steps to reproduce
See steps to reproduce in #146
We ran into this at drunomics on a rather large project, on CI builds, suddenly without changes to code or the docker-controlled environment. The error only happened during CLI invocations, but was reproducable also without drush.
It turns out that most of the time the object destruction order is correct and no problems arise. But somehow for CLI invocations the order can become reliable different, thus causing the problem.
Proposed resolution
When a transaction is created register a shutdown function to ensure all transactions are committed before PHP starts to destroy objects.
Workarounds
- ensure that develop is not in your xdebug.mode value
- pin xdebug in CI image to xdebug 3.2.2
API changes
Add \Drupal\Core\Database\Database::commitAllOnAllConnections()
Add \Drupal\Core\Database\Connection::commitAll()
Add \Drupal\Core\Database\Transaction\TransactionManagerBase::commitAll()
All functions are additions and marked as @internal
Data model changes
None
Release notes snippet
N/a - sites do not need to make any changes.
Comment | File | Size | Author |
---|
Issue fork drupal-3405976
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
- 3405976-10.1.x-shutdown-functions changes, plain diff MR !6717
- 3405976-mondrake-approach changes, plain diff MR !5774
- 3405976-transactions changes, plain diff MR !5715
- 3405976-10.2.x-fix-postgres changes, plain diff MR !6731
- 3405976-11.x-fix-postgres changes, plain diff MR !6730
- 3405976-minimal-approach-shutdown-functions changes, plain diff MR !6668
- 3405976-10.2.x-shutdown-functions changes, plain diff MR !6716
- 3405976-minimal-approach changes, plain diff MR !6614
- 3405976-with-3421164 changes, plain diff MR !6620
- 3405976-mondrake-approach-10.2.x changes, plain diff MR !6562
- 10.1.x changes, plain diff MR !5677
- 3405976-transaction-autocommit-during changes, plain diff MR !5674
Comments
Comment #3
fagoNot sure what went wrong with this MR - here a patch file.
Comment #4
fagoComment #5
fagoComment #6
petar_basic CreditAttribution: petar_basic at drunomics commentedComment #7
mglamanWe're hitting this same order and stack trace! I can only reproduce this on our CI environment. And only when executing Drush commands.
Comment #8
mglamanNote: we're hitting this on MySQL and SQLite drivers.
Comment #10
fagoThe previous patch was not correctly committing the transactions. Fixed it, now changes actually persist as the should.
> We're hitting this same order and stack trace! I can only reproduce this on our CI environment. And only when executing Drush commands.
ha, same for us. But it was reproduable with php-cli without drush also + on multiple CI servers.
I was already thinking the problem is probably in the other drivers as well. Let's make sure the mysql works fine, then we can port the fix to all drivers + to 10.2.x. we'll do some more testing with it
Comment #11
fagoComment #12
mglamanFor 11.x we'll need to refactor for this change record: https://www.drupal.org/node/3381002
And then roll a 10.1.x specific fix, it seems
Edit: I also wish I could know _what_ changed to surface this. I rolled back from an 8.1.26 container to 8.1.25 and it still occurred. I'm really confused.
Comment #13
mondrakeIMHO, another reason we should do #3398767: [PP-1] Allow explicit COMMIT in Transaction objects and eventually get rid of commits on objects destruction.
Comment #14
fago> Edit: I also wish I could know _what_ changed to surface this. I rolled back from an 8.1.26 container to 8.1.25 and it still occurred. I'm really confused.
I can totally second that. We rolled back all changes, made sure images did not change, tried multiple servers, everything. No idea what suddenly caused this still, I'm quite puzzled about that. We've seen the issue to pop up on 01.12 morning, since it's reliably failing on CI.
Comment #15
mglamanI tried on 10.2, same error just different flavor of text:
Comment #16
mondrakeWhat #15 says for D10.2 is that the TransactionManager object is being destructed while the root Transaction object in the stack is still active.
I guess you know that.
In the hope that this is not because of serialization/deserialization effects, the only reason I can imagine is unexpected destruction order.
While I strongly believe that #3398767: [PP-1] Allow explicit COMMIT in Transaction objects and later issues to remove commits during object destruction should be the master way forward, I think that a temporary fix (for D10.2) could be to commit the client transaction (is this the wanted behavior?) during TransactionManager destruction if the stack is not empty, instead of the current assert.
In fact when I wrote the TransactionManager code I left the assert there to see if there were side effects. This is one.
EDIT - Suggestion above still remains a last resort, TBH, if this cannot be solved in calling code. Is drush-only code being called in the traces reported? If so, would it be possible to check if a $transaction is being opened anywhere, and if so, can it be explicitly committed with
unset($transaction)
before letting the object go out of scope?Comment #17
mglaman@fago said they could reproduce without Drush
In my case, the Drush command only has this code interacting with Drupal core https://github.com/drush-ops/drush/blob/12.x/src/Commands/core/UserComma...
Comment #18
mglamanFor now, in our CI, I'm manually applying this patch. It's MR #5677 but on the parent Connection class. And we're getting builds to proceed.
Comment #19
mondrakeCan you just try this (in D10.2)?
even though, now that I look more closely, there's a risk that this occurs with post-transaction callbacks still to be called. But if we are in this predicament, then it's very possible that those callbacks have been destructed already. Fun.
Comment #20
mglamanI'm signing off for the workday, tomorrow AM CDT I'll bump to 10.2 and try the patch.
Comment #21
mondrake#17 so drush it's not explicitly opening a transaction, which is likely being opened when saving the user entity by Drupal core. Looks like it's not drush, but definitely seems like restrained on CLI? Is it reproducible at all on webserver PHP?
Comment #22
mondrake#17 alternatively it would be interesting to
unset
$account andgc_collect_cycles()
when it's no longer necessary within that method and before returning from itComment #23
fago> Is it reproducible at all on webserver PHP?
Not for me. Only via CLI, with and without drush.
Comment #25
Driskell CreditAttribution: Driskell as a volunteer and at Other Media commentedCan you try disabled XDEBUG?
I have this error but only on a test environment. Disabling XDEBUG resolves it so it seems XDEBUG is doing something in my case.
In general within PHP, __destruct should call immediately as a variable leaves scope (it has always worked like this for a long time AFAIK - and is related to the "when no more references"). AFAIK only a circular reference is deferred beyond the scope until a collection happens that can work it out.
It looks like perhaps something is causing an interference here and causing a transaction object to not get immediately collected.
Comment #26
mondrakeLooking at #17 again, can anyone try this either? This implicitly COMMITS the transaction that saves the user entity just created.
Comment #27
mglamanOh! We have it turned on in CI for code coverage. And Xdebug had a new release on 2023-11-30 https://xdebug.org/download/historical
I'm in trainings but I'll see if I can have someone test and verify this.
Comment #28
Driskell CreditAttribution: Driskell as a volunteer and at Other Media commented@mglaman
I managed to dive into my specific example and did indeed find that a specific exception that was handled somewhere - if I changed how it handled it it seems to work again.
I did produce a reproduction for Xdebug on this: https://bugs.xdebug.org/view.php?id=2222 (ignore the summary - I tend to write fast without thinking and then update after reading - and it's a rather assumptive title! the reproduction is clear though)
So I think this is likely an issue in 3.3.0 of Xdebug.
Comment #29
catchIf this is an xdebug regression, we could still potentially commit one of the workarounds here with a @todo to remove, or just add this to the 'known issues' for 10.2.0. I'm not sure which is best and it probably depends how quickly it's fixed upstream too.
Comment #30
mglamanFor our CI image we pinned Xdebug and all is fixed:
And then when running PHPUnit we've always had
XDEBUG_MODE=coverage php vendor/bin/phpunit
. So we stopped enabling the extension entirely.RE #29 I say we mark it as a known issue, it affects all versions of Drupal and will be fixed in a patch release of Xdebug. Or by downgrading to the last minor version.
Comment #31
catchTagging for release notes, we can add the snippet directly into the draft.
Comment #32
derickr CreditAttribution: derickr commentedAlthough this is perhaps caused by a change in Xdebug, the real culprit is lined out in the initial report:
Comment #33
alexpott@derickr thanks for commenting here. I agree that we should probably fix things here. But I also think that xdebug should not be changing the order objects are destructed. This is a behaviour change that can make debugging things really hard.
Comment #34
alexpottWhat's interesting is that I cannot reproduce the bug. I'm on
I've got 10.1.x checked out, installed drush and have installed the standard install profile.
I do
vendor/bin/drush ucrt test
and everything works just fine. Same when I repeat the steps on 10.2.x. I wonder why that is.Comment #35
Driskell CreditAttribution: Driskell as a volunteer and at Other Media commented@derickr Reading the statement I believe the way Drupal is currently using transaction objects is fine. It assigns one and then allows it to go out of scope to commit automatically.
The statement "The destructor method will be called as soon as there are no other references to a particular object" covers this. The key phrase here is "as soon as" which leaves nothing to assumption. This occurs the moment the method ends, or when the stack unwinds during an exception.
Having said that, perhaps readability is improved by not having this paradigm in Drupal as this maybe relies on more advanced knowledge of object destruction. Though I think one could argue the approach in PHP currently is implied and the most reasonable assumption, and only circular references are the advanced concept.
@alexpott All my tests were with PHP 8.1 so perhaps the behaviour changes in 8.2? In my specific case it was also during menu link building in cache clear so could be related to the menu and content in my case too.
Comment #36
alexpott@Driskell tried PHP 8.1 too...
Comment #37
mglamanI could not reproduce locally – host, VM, Docker, only on our CI instance. I don't know how or why. But rolling back to 3.2.2 did fix it. However, I am on macOS and don't have a Linux device available. But I don't think @fago could reproduce on a Linux machine, either.
@Driskell, do you have information about the environment you could reproduce it with? Since our CI is hosted via a cloud provider, I'm pretty disconnected from knowing more of those details beyond the fact that it's a Linux machine running and executing Docker.
Comment #38
derickr CreditAttribution: derickr commentedIt probably depends on which values `xdebug.mode` has. It should include `develop` for Xdebug to cache exceptions, their stack traces, and a reference to local variables (which is why they're kept in scope longer).
Comment #39
mglamanAh, that makes sense. I didn't check what mode
docker-php-ext-enable xdebug
set. And would explain why PHPUnit didn't crash inXDEBUG_MODE=coverage
.Comment #41
alexpottI've created an MR for 11.x/10.2.x that fixes the issue for those branches. We have a bit more to do because of the new TransactionManager.
I've tested this locally on xdebug 3.3.0 with develop in the xdebug.mode setting and without it
drush ucrt
is broken and with the fix it works as expected.I've removed the 10.2.0 release notes tag because this affects all versions of Drupal including Drupal 7.
Comment #42
mondrakeWell at least for TransactionManager I do not think this is right. We’re in a very unpredictable situation here, it could be the TransactionManager object is already gone when destroying the Connection one.
Also, differently from
popTransaction
,removeStackItem
is actually doing nothing on the db.To me, here we should just commit the client transaction on the db (note: if there is still a client connection open, which may not be the case if we anticipate that the destruction order is unpredictable), and forget about the stack itself, which is gone anyway after destruction.
So for me #19 is still the best fix if destroying TransactionManager before the Transaction objects. If we are destroying the Connection before anything else, then a straight commit on the client connection.
This seems rather untestable, too?
Comment #43
olegiv CreditAttribution: olegiv as a volunteer commentedI have temporarily disabled xdebug for CLI mode – it fixed the issue in my case.
It looks like the issue started after the recent xdebug update on 2023-11-30.
Comment #44
alexpott@mondrake if the transaction manager object is gone then we're fine. The stack will have been emptied all ready and the transactions committed. In my opinion, that doesn't matter at all. The problem here is that we have to ensure that the stack is empty before closing the connection. If the manager doesn't exist or the stack is already empty then there is not problem.
I think regardless of xdebug's behaviour we need to fix this in some way because we should not be relying on the order of object desctruction.
Comment #45
alexpottChanged the code to call unpile each member of the stack.
Comment #46
mondrake#44
Only if all the individual Transaction objects would have been destructed BEFORE destruction of TransactionManager. Otherwise, the stack would be still active. #15 tells us that.
Why? The state of the stack is irrelevant here IMHO. The TransactionManager is already destroyed or will be destroyed as part of the shutdown anyway. What matters is that if a client transaction is still open on the db at this stage, we should commit it (hence enforcing auto-commit mode) rather than letting it rollback as the standard behaviour would:
and we need to commit it from the Connection object, if we cannot rely on the TransactionManager to exist.
Working on an alternative proposal, will post a patch.
More in general, I'm more and more convinced we should abandon implicit commits, do #3398767: [PP-1] Allow explicit COMMIT in Transaction objects / #1025314: Transactions should be allowed to be committed explicitly, make core code explicitly commit transactions, and deprecate committing as part of object destruction. We would not be here if we will.
Comment #47
eastdrive CreditAttribution: eastdrive commentedI'm seeing this when trying to delete nodes in the admin UI, singly or batched.
Drupal 10.1.6
PHP 8.1.6
MySQL 8.0.32
Running locally with docker compose.
Comment #48
Driskell CreditAttribution: Driskell as a volunteer and at Other Media commented> and we need to commit it from the Connection object, if we cannot rely on the TransactionManager to exist.
I don't think you can assume the transaction is complete. Maybe you're in this destruction because of an error and it is incomplete. This could potentially commit a partial transaction and remove all usefulness of transactions.
> I think regardless of xdebug's behaviour we need to fix this in some way because we should not be relying on the order of object desctruction.
During normal execution there is no issue with destruction order from what I can see. When it goes out of scope and last reference is gone, it is destructed and freed, as per the PHP documentation. Ordering is only involved when you get to shutdown and things are left behind and that in itself, before we even worry about ordering, is a problem - such as in this case where Xdebug is preventing destruction of objects - or if somebody stored a transaction object somewhere.
Perhaps here it just needs a better exception - throw from connection if an open transaction and roll it back - in transaction just ignore missing connection as you'll know it will only be missing if you're in shutdown, and a transaction should never be destructing there. It'll make the error message clearer - you won't get an inTransaction on null - you'll get a "leaked transaction" message which is what Xdebug is doing here. Of course, doesn't workaround the Xdebug issue.
Issue with any change to destruction code is that I don't think you could ever workaround the Xdebug issue safely. If you think about it - if transaction objects are getting held open, every transaction is going to become one. Then you get to shutdown and collection, and you have absolutely no idea if the transaction was successful. And you still have the issue of one massive transaction when it should have been individual.
> More in general, I'm more and more convinced we should abandon implicit commits
I think this makes most sense.
Comment #49
derickr CreditAttribution: derickr commentedFWIW, I reject that this is an Xdebug **issue**. It might well cause this problem, but Xdebug isn't doing anything wrong here. What happens here is that Xdebug now keeps a reference to the last 8 exceptions, including their stack trace (With local variables). Any of these will therefore have their destructor called when Xdebug releases these exceptions. (A workaround therefore could be to just throw and catch exceptions (I joke)).
> During normal execution there is no issue with destruction order from what I can see.
From an earlier comment is a bit strange, as "destruction order" isn't defined as being reliable (from https://www.php.net/destruct#language.oop5.decon.destructor):
> The destructor method will be called as soon as there are no other references to a particular object, or in any order during the shutdown sequence.
That also means, that your workaround (ie fix) could be to keep a reference yourself? Or as you say, it's better to not rely on ```__destruct``` at all.
From the Xdebug side, I don't know how to fix this; except for disabling the improvements to handling transactions while debugging and through the `xdebug_get_function_stack()` improvements, and I am not really willing to do that. Alternatively, I could add a function to clear out these cached exceptions, but that would also mean that you'd have to conditionally call this in places.
FWIW, I don't actually understand *why* you have these exceptions with references to these connection/transaction objects though. Unless you're using exceptions for code flow, which you surely did not expect?
cheers,
Derick
Comment #50
izus CreditAttribution: izus as a volunteer commentedIn our case we :
1) Disabled xdebug
2) Restarted php-fmp
and it Worked :)
Comment #51
Driskell CreditAttribution: Driskell as a volunteer and at Other Media commented@derickr
> FWIW, I don't actually understand *why* you have these exceptions with references to these connection/transaction objects though. Unless you're using exceptions for code flow, which you surely did not expect?
Please see reproduction case in: https://bugs.xdebug.org/view.php?id=2222
There is nothing in code adding the variable "$destructing" to an exception. Just merely the presence of the exception is causing this. I think you'd be absolutely perfectly fine to store a reference to the exception and anything inside it. Just here it seems to be somehow grabbing everything in scope in each method in the call chain, which I am sure must be unintentional and I hope, fixable.
Comment #52
derickr CreditAttribution: derickr commentedAlthough there is no code that adds that variable to an exception, that doesn't mean that it doesn't get added.
Each exception has an exception trace (so that you can call `getTrace()` on the exception), and for each stack element it keeps a reference to each of the arguments, and each of the local variables.
The only alternative would be to 1. not keep a "reference" to local variables (which means I can't show them), or 2. convert them to a string representation (which is also not great).
A script to demonstrate what's going on (copied from https://bugs.xdebug.org/view.php?id=2222):
Outputs:
Comment #53
Driskell CreditAttribution: Driskell as a volunteer and at Other Media commented@derickr Thanks - I asked my question on the bug tracker as it's probably not for here.
Comment #54
mondrake#49 I am working on a patch and will post soon, so I'll keep this assigned to me.
#48
I agree 200%, but here we need to keep supporting current behavior. Reviewing #3398767: [PP-1] Allow explicit COMMIT in Transaction objects and getting it in would be a first step to remove the implicit behavior on destruction, which is causing this trouble.
Yes and no. Yes multiple 'drupal' transaction objects. No in fact there's only one REAL db transaction active, the nesting is a code logic thing, not a corresponding db thing. If you COMMIT that REAL db transaction, you're done vs current behavior. BUT - there's no way to safely execute 'drupal' post-transaction callbacks at this stage, they might have been already destructed.
Comment #55
mondrakeThis patch is what I would suggest doing. Covers independently destruction of both Connection and TransactionManager. As noted, not much can be done to try and perform post-transaction callbacks.
In general, my suggestion would be to won't fix this, and start working toward changing from implicit to explicit commits. I think this could happen by D11 if we do not lose time.
Comment #56
Driskell CreditAttribution: Driskell as a volunteer and at Other Media commentedI agree with @mandrake that I think best approach is to move to explicit commit.
I previously before this edit put I don’t think the patch is feasible but I think I just realised this auto commit is happening anyway due to the Transaction destruct so this just continues that. (Sorry - a bit slow to this!)
I guess I just still concerned. Such as this collapsing all transactions into a single transaction that is auto committed at the end and I’m sure this might have implications - you could be sending a response and it not be guaranteed when it is received that the transaction committed. More edge case tho now so maybe the patch still reasonable. But perhaps it should be logging some errror at least?
Comment #57
Driskell CreditAttribution: Driskell as a volunteer and at Other Media commentedProbably worth an IS update regarding the Xdebug impact and that in normal scenario transaction destruction is safe as it happens way before shutdown?
Comment #58
Wim LeersNote that this also affects Drupal 9. See #3406976: GitLab CI pipeline broke due to wodby/php:8.1-dev 4.41.0 installing XDebug 3.3 for example.
EDIT: I see now that has already been said 👍
Comment #59
mondrakeI filed #3406985: [PP-1] Convert all transactions to explicit COMMIT.
Comment #60
andreic CreditAttribution: andreic commentedThe patch in #55 fixed the same error I had on a 10.1.7 installation.
Comment #61
alfattal CreditAttribution: alfattal as a volunteer commentedThe patch in #55 failed to apply with the following error:
It also failed in the test for (10.1.x: PHP 8.1 & MySQL 8).
Comment #62
alfattal CreditAttribution: alfattal as a volunteer commentedComment #65
ashetkar CreditAttribution: ashetkar as a volunteer commentedWith #3 Patch CI pipeline working fine for Drupal core 10.1.5 but patch #55 failing for drupal core 10.1.5
Comment #66
alexpottI turned @mondrake's patch into an MR against 11.x.
Has anyone got instructions on how to reproduce because I can't reproduce this against the standard install. There is no exception triggered for communication purposes when doing a user create or cache clear.
Comment #67
alexpottHiding the files so the MR is the thing to review.
Comment #68
mondrakeAdded some comments inline.
The more I see this, the sicker I feel... dumping state across objects for use during destruction...
Comment #69
astoker88 CreditAttribution: astoker88 commentedFor anyone struggling short term here, whilst the patch in 55 isn't working (Drupal 10.1.7, PHP 8.2
MySQL 8.0) for a lot of folks and work is ongoing - i can confirm the issue is solved for me by deleting the xdebug.ini.
Comment #70
alexpott@astoker88 the simplest way to fix this is to ensure that develop is not set in xdebug.mode. You can still debug :)
Comment #71
akoe CreditAttribution: akoe as a volunteer commentedWe ran into the same issue with our CI containers that were failing only when run in pipeline. And we weren't able to reproduce this locally with the same containers in any way.
Our stack: MySQL 5.7; PHP 8.1 and Drupal 9 and 10
In the pipeline we could narrow it down to a Drush cache clear for the router cache:
drush cache:clear router
and also running the router rebuild method directly triggered the error:
drush ev "\Drupal::service('router.builder')->rebuild();"
Thankfully the workaround disabling xdebug via xdebug.mode=off is working for our usecase. And we will be testing patches for D10 also.
Comment #72
alfattal CreditAttribution: alfattal as a volunteer commented@akoe May I ask what patches you're testing for Drupal 10? Are they local patches or from other issues? Thank you!
Comment #73
xjmThe MR should be made against 11.x FWIW (which means filing the issue against 11.x too nowadays). We'll need to decide based on the eventual solution how far it's safe to backport it. This issue sounds bad enough that could even be backported as far as 10.1 (at release manager discretion) if we are able to come up with a safe fix.
Comment #74
alfattal CreditAttribution: alfattal as a volunteer commented@alexpott Not sure why you changed the status to Needs Review, you opened a MR based on a patch that does NOT apply and thus, it needs works!
Comment #75
alexpott@alfattal the MR does apply to 11.x which is where this has to be fixed first. The fix for 10.1.x is a little different because there is no transaction manager. If you want a fix for 10.1.x - maybe the patch in #3 will work for you but the final fix for 10.1.x is likely to look a little different. In my opinion the best fix you can do right now is ensure that
develop
is not in your xdebug.mode value as then you don't need a patch at all.Comment #76
YesCT CreditAttribution: YesCT at Lullabot for State of Georgia commentedLike @mglaman in comment #30 https://www.drupal.org/project/drupal/issues/3405976#comment-15346751
we were able to fix the sudden failing of our build step in behat, unit, and functional phpunit tests by updating the image CI was using in our webhead-ci/Dockerfile with this change:
Thank you folks for sharing what symptoms you were seeing and your workarounds!!
---
We are running D10. We did try a patch and it helped get our unit tests to pass, but we were still getting different errors with behat and functtional phpunit tests with the patch. Pinning xdebug got all our tests passing.
Comment #77
YesCT CreditAttribution: YesCT at Lullabot for State of Georgia commentedI added workarounds and the upstream issue to this summary.
Comment #78
Julien Tekrane CreditAttribution: Julien Tekrane for European Commission and European Union Institutions, Agencies and Bodies commented#75 resolved my problem. Thanks alexpott
Comment #79
OIOIOOOI CreditAttribution: OIOIOOOI commentedBeen using Drupal 10.1.6 and stumbled upon an issue that eventually led me to this thread.
Clean installation under PHP8.2 and xdebug 3.3.0, the entities created (nodes/terms) stubbornly were not deleted when doing so in UI.
After enabling the database logging it turned out that after the whole set of DELETE queries, a ROLLBACK was issued.
Step debugging this brought me to the __destruct() method being talked here.
Disabling xdebug extension fixes the issue, however believe the patch in #75 should also do the trick for those who use xdebug.
Comment #80
mferanda CreditAttribution: mferanda commentedDisabling xdebug resolved the issue on 9.5.11 and 10.1.7
Comment #81
pgrandeg CreditAttribution: pgrandeg as a volunteer commentedI faced te same problem while deleting content with Drupal 10.1.7 and latest wodby/drupal-php. Disabling xdebug configuration from docker-compose.yml fixed the issue
Comment #82
daffie CreditAttribution: daffie at Finalist commentedI see a lot of changes in the MR from @mondrake, which adds code that relies on deprecated code. When we do that we cannot remove the deprecated code. So, can we please not do that. :)
If I understand the problem correctly is that it only a problem when the database connection desctucted, before the Transaction object is destructed. The logical solution for me would be to have
Connection->__destruct()
check through the transaction manager if there is still an active transaction(s) and if so then to commit that transaction(s). If it is more complicated or if my solution does not work, can you explain why I am wrong.Comment #83
Driskell CreditAttribution: Driskell as a volunteer and at Other Media commentedIn a way ideally Xdebug would make the exception tracking it’s doing opt-in or at least have an opt-out (the tracking keeps the transaction objects in scope if exception is thrown and caught so there’s a history of exceptions stored during debugging)
That would help for all use cases and if it’s runtime configurable allow Drupal as a framework to opt out until it removes reliance on the transaction object destructor to commit transactions.
Comment #84
mondrake@daffie the issue reported here is present in all Drupal versions, both in new and in deprecated code path, so a fix here would need to cover both paths - yes, that means also fixing deprecated code. At least, if we want to port it back to 10.2 and potentially below. Given this is critical, I assume it's reasonable to expect a backport.
To me, as I said earlier, the future though is to get rid of commit on destruction, and go for explicit commit. At least to me, the path to get there is set: #3398767: [PP-1] Allow explicit COMMIT in Transaction objects, then #3406985: [PP-1] Convert all transactions to explicit COMMIT, then a to-be issue to deprecate commit on destruct.
We are currently blocked by #3384999: Introduce a Connection::executeDdlStatement method.
Comment #85
derickr CreditAttribution: derickr commented@ #83 : I don't know how to make this opt-in without also not adding a setting. FWIW, relying on a specific destruction order has always a problem with Drupal. It was never something that was guaranteed to happen.
Comment #86
alexpottI've just hit this problem while upgrading Thunder to use core 10.2.1 (https://github.com/thunder/thunder-distribution/pull/762)
I'm quite concerned because:
The good news is that 10.2.x MR fixes the problem.
Comment #87
Carlos Romero CreditAttribution: Carlos Romero at Hiberus commentedHello everyone,
It is happening to me on a drupal 9.5.11 with php 8.0.30 and xdebug 3.3.1, when I deactivate xdebug the error disappears, I have activated it again and with patch #3 it no longer gives an error.
Thanks.
Comment #90
fvorillion CreditAttribution: fvorillion commentedI have the same issue on Drupal 10.2.2, with Docker, PHP 8.2.14 and XDebug 3.3.1 enabled with default configuration.
If I force to disable XDebug, the problem was solved :
It works also with the good xdebug config for my env :
And without any of the two solutions above applied, the MR on 10.2.x branch solved also the problem : https://git.drupalcode.org/project/drupal/-/merge_requests/5774
- composer require cweagans/composer-patches
- add patch on extra :
Comment #91
longwave@alexpott re #86 I don't think this is specific to 10.2, other users are reporting issues on 9.5 in this issue, and someone at my organisation has run into this today in ddev with xdebug on a 9.5 site - interestingly I work on the same site and have not run into it yet.
In the without-xdebug case I wonder if there is something more complex in Thunder and also the transaction manager changes in 10.2 mean that an object reference is held longer/differently and so the destruct order is accidentally changed, but it's very difficult to work out.
Comment #92
solideogloria CreditAttribution: solideogloria commented@fvorillion Just make sure you don't actually use the MR's patch URL in the composer file. MR patch files will change anytime a person commits a change, and the change might break or compromise your site. You should download the patch to a folder in your project and use
./patches/5774.patch
or something similar.Comment #93
solideogloria CreditAttribution: solideogloria commentedAfter downloading and applying the current MR patch, I still get errors for any transaction while xdebug is enabled.
The errors also contain the serialized DB connection with a bunch of sensitive info like DB, username, password, so I have redacted that information (even though this is from a local DDEV instance).
Comment #94
larowlanPossibly related #3418428: “Official” Docker image doesn't create user profile on fresh installation
Comment #95
alexpott@longwave yes I'm aware that this is in not 10.2 only when xdebug is involved but I've got the first instance of this issue being a problem where xdebug is not involved and the situation where this occurs is 10.2 only. I think this is a much much bigger problem as running xdebug in production is not recommended. Thunder doesn't do anything special on the DB layer and yes, investigating and reproducing this issue in the situation is not easy. I wish we had an MR for 10.2.x (and earlier Drupal versions) that had no API change so we can easy fix the problem. And then implement the better solutions being proposed. I think we need a pragmatic solution.
Comment #96
Mile23From my MR comment:
I'm not sure about the internals of the DB layer, so maybe it's naive to say. :-) But just keeping a reference to things you expect to exist in your class seems like a non-breaking change that could be applied to past versions.
Comment #97
rfayComment #100
jrglasgow CreditAttribution: jrglasgow commentedI was unable to get either merge request (10.1.x and 11.x) to apply to my 10.2.x sites, I hand applied the patch and created a new branch to be merged into 10.2.x. It is working for me on my local.
Comment #101
Mile23Like this:
Now you always have a reference to the transaction manager for the life of the object, and the transaction manager will not be garbage-collected before the Connection is destroyed.
If PHP has decided that a referenced object can be destroyed through GC before there are 0 references to it, then we have a problem we can not solve here.
Comment #102
Driskell CreditAttribution: Driskell as a volunteer and at Other Media commented> If PHP has decided that a referenced object can be destroyed through GC before there are 0 references to it, then we have a problem we can not solve here.
With XDebug develop mode since 3.2+ if a function throws then XDebug will capture the stack trace along with all in scope variables in all frames. This means the transaction object never reaches 0 at end of function as XDebug holds a reference. Then end of script occurs and AFAIK XDebug doesn’t free - so end of script cleanup kicks in and picks at random to free (as far as PHP is concerned the only way the objects are still alive at this point is circular reference). So it’s less that they free before hitting 0 and more than they will never reach 0 so PHP just picks them off at random.
I kind of agree that this is somewhat unfixable and even if it was fixed it would be just too complex. The only way forward is probably to remove implicit transaction objects entirely or throw warning of no compatibility for develop mode in XDebug.
Comment #104
alexpottI've opened an MR that uses service destruction and the kernel to work around this bug. It's a more minimal approach to resolving the issue in 10.2 and 10.1 that results in me being to run the
\Drupal\Tests\mysql\Kernel\mysql\TransactionTest
successfully locally with xdebug 3.3.0 in develop mode. If I run against HEAD it is broken horribly.Yes it would be great to fix in a more explicit way but Drupal's transaction committing has been implicit for a very long time so I think moving away from the properly is going to take time and a lot of work.
@mondrake what do think about doing something like https://git.drupalcode.org/project/drupal/-/merge_requests/6614 as stop gap and then trying to fix this probably in another issue.
Comment #105
alexpottTIL the kernel is not terminated in the interactive installer during installation. Wowzers.
Comment #106
alexpottOkay so Thunder's tests pass with the minimal mr with all the installer kernel termination fixes... https://github.com/thunder/thunder-distribution/pull/773
Comment #107
mondrakeCool. Left one comment inline in the MR.
Comment #108
alexpottThanks for the review @mondrake. I left replies to your comments on the MR.
Comment #112
mondrakeTricky stuff. Needs pipeline execution for the other dbs, I can not run those.
While this also fixed some things along, IMHO the future should be along the lines of #84. Hopefully we'll get there, all these dances at destruction time seem a bit fragile.
Comment #113
catchOne comment (in two suggestions) on the MR.
Comment #114
solideogloria CreditAttribution: solideogloria commentedI still have the error in #93 when using xdebug after applying the latest MR diff to 10.2.3.
Can someone review if this is related or if the error should be a different issue or something?
Comment #115
alexpottThanks for the review @catch - I've made those changes.
@solideogloria can you re-test with the latest MR. If it doesn't work can you give as much information as possible so we can work out why. Thanks!
Comment #116
longwaveI hate to ask but is there any way we could add some form of test coverage to this, to avoid us refactoring it again in the future and breaking it again?
Also this fix only applies to the default connection via the database service, I assume anyone doing anything with other connections is on their own. Would they leave a dangling transaction to be auto committed? Should we document this for them somehow?
Comment #117
alexpottWe have fails in
Drupal\Tests\standard\FunctionalJavascript\StandardPerformanceTest
to work why this code is causing extra queries there... but not on SQLite or MySQL...Comment #118
solideogloria CreditAttribution: solideogloria commentedThe latest commit broke something.
The line of the error:
Comment #119
catchWe could probably loop over all database connections and commit their transactions, but it'd need a diversion from the service approach then.
Comment #120
solideogloria CreditAttribution: solideogloria commentedI don't know if it's relevant, but I do have external DB connections defined. Let me know if you need any additional info for debugging in addition to what I've given.
Comment #122
alexpott@solideogloria can you supply a full stack trace \Drupal\Core\DrupalKernel::initializeServiceProviders() does not call terminate. Plus can you provide more information about what you're doing when things do not work. As far as I can see you are doing a drush cache rebuild... this works fine for me. Are you sure the patch as applied correctly? We're not changing any line near 1360... I'm attaching a patch for 10.2.x that allows me to install Drupal and rebuild the cache using Drush as the MR does not apply to 10.2.x. Please try this instead.
Comment #124
alexpottThe postgres fails are nothing to do with this issue - they're happening in HEAD - see #3420401-10: StandardPerformanceTest fails randomly on MySQL and consistently on Postgres...
@solideogloria / re #118 I'm pretty sure that was a patch applying issue - got exactly the same fail on Thunder (see https://github.com/thunder/thunder-distribution/actions/runs/7920011867/...). The patch in #122 fixes that.
Comment #125
solideogloria CreditAttribution: solideogloria commentedThe MR diff applies to 10.2.3 for me. Note that the .diff can sometimes apply differently than the .patch; maybe that's why.
Full stack trace from using 6614.diff from #118 and clearing the cache w/ Drush.
It does look like the .diff applied incorrectly, though it did "succeed". It put the following code in the wrong function
Comment #126
solideogloria CreditAttribution: solideogloria commentedWhen using the #122 patch and
drush cr
, I get the same error in #93Comment #127
solideogloria CreditAttribution: solideogloria commentedHere is some additional debugging. The error in #93 occurs for an
INSERT INTO "watchdog"
query.$this->connection
is NULL, and the query is being used to log an error, but since the connection is NULL, an exception occurs instead of the logging succeeding.Since the actual exception logging fails, here's the call stack prior to the error:
And the exception it was trying to log:
Comment #128
alexpott@solideogloria I've got a couple of questions about #93. Can you confirm that the bug goes away if the php.ini setting xdebug.mode does not contain develop. I.e. it is something like
xdebug.mode=debug
Also can you try with the patch attached as this might reveal the error that's causing the error... it makes a small change to _drupal_get_error_level() so errors in error handling don't get in the way.
Comment #129
solideogloria CreditAttribution: solideogloria commentedTrying with the updated patch and with
xdebug.mode=debug
, I still get the error. The cache rebuild says is completed, but the error happens after that:Comment #130
alexpottHere's another patch for 10.2.x that has the latest changes that changes thing to do a commit on all open db connections. @solideogloria it's really interesting that disabling xdebug seems to fix your problem but not changing the mode. That's the first reported instance of this. Can you confirm that you only get the errors with xdebug enbled? Also can you try the latest patch attached here as you did mention that you have multiple connections.
Comment #131
solideogloria CreditAttribution: solideogloria commentedI was trying to change the mode with
ini_set('xdebug.mode', 'debug');
in settings.php, but apparently that doesn't work, asddev drush eval "echo ini_get('xdebug.mode');"
still showsdebug,develop
.I looked up how to change the mode with DDEV and now I don't get the error after properly setting the mode to
debug
.Comment #132
solideogloria CreditAttribution: solideogloria commentedUsing #130, I do still get an error, only when xdebug is enabled with develop. The error is very similar to #93, but I will include it:
Comment #133
solideogloria CreditAttribution: solideogloria commentedAlso, even after commenting out my other DB connections, I still have the error... but now the error only happens roughly every other time I clear the cache..., alternating a success and a failure.
Comment #134
solideogloria CreditAttribution: solideogloria commentedEven so, the cache clear does succeed sometimes, with all my connections enabled (though I know the connections themselves are being used for queries during a cache clear). So it seems to be something unreliable that causes the error.
Does it perhaps matter which order things are destructed?
It appears that whether I have all my custom connections enabled or not, the error happens roughly 50-60% of the time either way.
Comment #135
solideogloria CreditAttribution: solideogloria commentedI tried on another site using #130, and it works fine (though I can't reproduce the error on that site). I'm going to try to narrow down if there's a certain patch or something I'm using that is causing an issue.
Comment #136
solideogloria CreditAttribution: solideogloria commentedI have no idea. I removed all other patches I'm using from both contrib and core, and I still get the issue, and I can reliably reproduce it on the site.
Is there a way to debug this further or look into the transaction stack or destruct order?
Comment #137
solideogloria CreditAttribution: solideogloria commentedAlso, is it possible for the logging of DB credentials during serialization of the connection object to be prevented?
Comment #138
alexpott@solideogloria yeah if you can find out what's causing this that'd be awesome. If you could share your composer.json and core.extension configuration with me that might help. Might some contrib that's causing the problem. Or maybe (even better) there's some way you could share your build with me. Are you on slack?
Comment #139
solideogloria CreditAttribution: solideogloria commentedI am not on slack. I am going to try one more thing. I uninstalled all custom code. That didn't fix it, so it's in core or contrib. I will continue uninstalling modules and clearing the cache afterwards until I see it fixed or run out of modules to uninstall. That will hopefully narrow it down.
I probably won't finish this step until next week.
Comment #140
alexpott@solideogloria does the bug only happen when you clear the caches with Drush? Do you all get the error when you clear via the UI. Just wondering if it is something special about using Drush?
Comment #141
alexpottI've moved the commits even later during execution - moved them to shutdown functions. This has some interesting effects on kernel tests which mean we need to commit transactions when we close connections - which makes sense to me. That in turn makes \Drupal\KernelTests\Core\Database\DriverSpecificTransactionTestBase::testTransactionManagerFailureOnPendingStackItems() pretty meaningless in its current form.
@solideogloria does this change in approach fix things for you? New 10.2.x patch attached.
Comment #143
solideogloria CreditAttribution: solideogloria commentedThe issue only happens when using Drush. I don't see it when clearing the cache via the UI. I am using Drush 12.
When using #141, I still get the error with the same frequency. Same trace as #132
Comment #144
solideogloria CreditAttribution: solideogloria commentedOkay, after uninstalling a lot of modules a few at a time and deleting content in bulk, I discovered the source of the issue.
The error stops occurring if I delete all custom menu links by navigating to
/admin/modules/uninstall/entity/menu_link_content
(routesystem.prepare_modules_entity_uninstall
).This is content for the "Custom Menu Links" core module (menu_link_content)
Comment #145
solideogloria CreditAttribution: solideogloria commentedI am going to update this comment with further debugging steps taken.
$this->menuLinksRebuild();
inMenuRouterRebuildSubscriber::onRouterRebuild
fixes the issue.MenuRouterRebuildSubscriber::menuLinksRebuild()
.$this->menuLinkManager->rebuild();
inMenuRouterRebuildSubscriber::menuLinksRebuild
fixes the issue.$definitions = $this->getDefinitions();
inMenuLinkManager::rebuild
does not fix the issue.$definitions = $this->getDiscovery()->getDefinitions();
inMenuLinkManager::getDefinitions
, which is callingDerivativeDiscoveryDecorator::getDefinitions
DerivativeDiscoveryDecorator::getDerivatives
fixes the issue, if you only return an empty array when it's called for menu links. I usedif (count($base_plugin_definitions) === X) { return []; }
with X set to the inspected length of the array.if (count($base_plugin_definitions) === X) { return $this->getDerivativesDebug($base_plugin_definitions); }
, withgetDerivativesDebug
being a copy of the original function. That way I can set breakpoints only for this occurrence. I couldn't find a nice way to do it with conditional breakpoints.if ($deriver) {
block.foreach
loop inside the if statement has no effect, so the issue is with$deriver->getDerivativeDefinitions($plugin_definition);
.$base_plugin_id
the error occurs for:menu_link_content
, and only this one. Change the if statement toif ($deriver && $base_plugin_id !== 'menu_link_content') {
fixes the issue. This traces the problem toMenuLinkContentDeriver::getDerivativeDefinitions
$plugin_definitions[$menu_link_content->uuid()] = $menu_link_content->getPluginDefinition();
. So loading the menu links is not the issue.Comment #146
solideogloria CreditAttribution: solideogloria commented@alexpott The issue is caused by having a menu link that, in the "Link" field, has a path that is a node's redirect.
To reproduce my issue:
develop
in the mode.ddev xdebug on
/test1
. Save./test2
. Save. This step and the next step can be in any order.drush cache:rebuild
drush cache:rebuild
A workaround for the issue
Edit any menu links that have a link to a redirect path and use the canonical path to the node instead.
This still should be fixed for real. I deliberately wanted the menu link to go to the redirect, though I can change it if it causes an issue that won't be fixed.
Comment #147
alexpottThanks @solideogloria - excellent information - I can now reproduce the bug you're seeing and can fix it. Patch incoming!
Comment #148
alexpottHere's a patch that should hopefully filx @solideogloria's problem and call terminate consistently on kernel's in core. It's also going to solve locks not being released by drush cache-rebuild - so that will be improved too... going to look for an existing issue about drupal_rebuild not releasing locks because my guess it that there must be one!
Going to drop the shutdown approach because it adds complexity and if you are doing database transaction in shutdown functions then I think that is quite a special case.
Comment #150
alexpottImproved the issue summary.
Comment #151
solideogloria CreditAttribution: solideogloria commentedI confirmed that the patch #148 fixes the issue.
Would it be possible to write a test following the steps in #146? Or is that not necessary?
Comment #152
alexpott@solideogloria we can't test the xdebug mode issue because we don't have xdebug running on gitlabci - but what we can do is test that locks are released after calling drupal_rebuild().
Comment #153
solideogloria CreditAttribution: solideogloria commentedAh, that's right. I didn't think of that.
Comment #154
alexpottUnfortunately we can't test the locks being release either as that is caused by the transactions not being committed- so everything is caused by PHP's object destruction not occurring as we expect - and we have no way to cause this other than with xdebug in develop mode.
So I don't think we can add a truly meaningful test here. What we can test is that calling terminate or shutdown causes database transactions to be committed.
Comment #156
alexpottAfter sleeping on this I realised that we can fix this in a more reliable way - we can trigger a shutdown function if a transaction is created. This way we don't need to worry about kernel termination. Re-opened that MR and here's a 10.2.x patch for that approach.
Comment #157
alexpottDiscussed the two approaches with @catch and @longwave and we all preferred the shutdown function approach because:
Updated issue summary accordingly.
Comment #162
alexpottWe have MRs for 11.x, 10.2.x and 10.1.x
I think this is as good as we can make it.
Comment #163
longwave+1 to the approach and the patch looks good but I haven't had time yet to try and reproduce again with XDebug enabled.
@solideogloria @Driskell if you can confirm that this latest set of patches fix your issues here that would be helpful to getting this committed.
Comment #164
mondrake+1, I like this... besides being more self-contained, it will also be simpler to adjust if in the future we will change default commit from implicit to explicit. Left a couple of questions on the MR, not sure how feasible/relevant.
Comment #165
alexpottThanks @mondrake for the review - I've responded on the MR to your questions / points.
Comment #166
mondrakeGreat work... I'd RTBC this but think the honours better be done by someone who can test in the specific context of using xdebug.
Comment #167
solideogloria CreditAttribution: solideogloria commentedI tested #156, and the updated approach fixes the issues. (I tested both the generic issue, and the issue with menu links to redirects)
Comment #168
mferanda CreditAttribution: mferanda commentedTested the patch against Drupal 10.2.x: Successfully fixed the issue.
For my test, I re-enabled xDebug and verified that the old issue was happening. After applying the patch, the issue seemed to be resolved.
Comment #172
catchI wrote some code for one of the MRs here but none of it got used in the final version so I think I'm fine to commit. I think this is the best we can do, it makes things more robust and it makes the weirdness of the current implementation more explicit with the workaround, which isn't bad thing even if it's not pretty.
Committed/pushed to 11.x, cherry-picked to 10.3.x, and the separate MR to 10.2.x
I haven't committed the 10.1.x MR to 10.1.x, it's in security-only mode so we'd need to do a 'bonus' patch release or bundle the commit with a security release for it to actually be useful to people. Tempted to leave things there and people can apply it manually if they need it, so marking fixed, but if someone feels really strongly about a backport to 10.1.x, please re-open.
Comment #173
solideogloria CreditAttribution: solideogloria commentedThank you everyone for helping fix this issue.
Comment #174
catchThis broke pgsql tests, should've thought to kick off an MR run on the other databases. https://git.drupalcode.org/project/drupal/-/jobs/874457
Not rolling back yet since it's explicitly the pgsql transaction coverage that's failing so maybe we just need to fix an assumption? But if it's tricky we should roll back and re-commit with that fixed.
Comment #180
alexpottCreated two MRs one for 11.x and 10.3.x and one for 10.2.x that fix the postgres test. That particular test is pretty icky - I did nearly remove it at one point :(
Comment #183
alexpottTests on the postgres mrs re green on all db drivers. Yay
Comment #187
catchCommitting the test change from needs review given HEAD is failing and the adjustments are minimal, agreed the test is not fun.
Comment #192
derickr CreditAttribution: derickr commented#131 : You can't set `xdebug.mode` with ini_set, only in php.ini (or related files).
Comment #193
herved CreditAttribution: herved commentedHello, and sorry to comment on a closed issue, but is this whole xdebug issue considered work in progress? I see some follow-ups.
I stumbled on a really nasty issue where after submitting a form, status messages would randomly not be displayed.
Session data would not get committed in time (after
\Drupal\Core\Session\SessionHandler::write
) before the next GET request retrieves the session data.Or even users would not be created in time/fast enough for the next event to succeed.
All kind of inexplicable events, absolute horror to debug.
Even after updating to core 10.2.4 which includes this issue, the same happens.
Only the workarounds from the IS work (ensure that develop is not in your xdebug.mode value OR pin xdebug in CI image to xdebug 3.2.2).
Edit: We also stumbled on this in Drush https://github.com/drush-ops/drush/pull/5869 causing batches to fail (again, all kind of crazy/inexplicable issues, even on core 10.2.4 or 11.x-dev).
Comment #194
mondrakeYes, this fix was not sufficient for Drush, see https://github.com/drush-ops/drush/issues/5892.
Only disabling xdebug worked.
I am repeating myself, but I think this will only be solved by getting rid of the implicit commit on destruct.