Problem/Motivation
On #2665216: Deprecate Drupal\Core\Database\Connection::nextId() and the {sequences} table and schema, We made quite a few changes to deprecate Drupal\Core\Database\Connection::nextId() and the {sequences} table and schema, but in doing that ran into a race condition issue pertaining batch processes. I think it would be smart to separate the issue into two, to keep the changes per issue relatively small.
Proposed resolution
Fix batch process race conditions by making ‘bid (batch id)’ auto-increment
User interface changes
TBD
API changes
Change the {batch} table [bid] field to serial.
Created a new getId() function (replacing deprecated function nextId())
Drush impact
Drush has an impact for this change, that is currently fixed for Drush 12 only:
https://github.com/drush-ops/drush/pull/5509
| Comment | File | Size | Author |
|---|
Issue fork drupal-3337513
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
- 3337513-fix-batch-process
changes, plain diff MR !3514
Comments
Comment #2
gidarai commentedThe patch is by @Mondrake from https://www.drupal.org/project/drupal/issues/2665216 and seems like a good patch to start working from.
Comment #3
catchI am trying to work out if the ID for non-progressive batches is completely redundant, or if it's 'used' and doesn't matter what it is, or if it matters somehow.
This logic in batch_set() does seem to rely on the ID regardless of if the batch is progressive or not.
However, the ID is completely ignored in
batch_progress()for non-progressive batches, so perhaps this is fine?But... if it's fine for non-progressive batches to not have an ID at all, then can we get rid of this from the patch?
Comment #4
gidarai commentedComment #5
daffie commentedComment #6
gidarai commented@catch I agree, it might however be a fallback, so i made a new patch without the:
to test wether it will fail or not.
Comment #7
smustgrave commentedFailures in #6
Comment #8
gidarai commentedThe builds with the latest patch from comment #6 have finished, however it looks like the fallback option will be necessary as the builds resulted in approximately 3k errors.
Comment #9
gidarai commentedAdded a new patch which contains an auto increment fix for pgsql from #838992: Change the uid field from integer to serial by leveraging NO_AUTO_VALUE_ON_ZERO on MySQL
I also re-added the following lines of code, because of the builds at comment #6 failing:
Comment #10
mondrakeI think the failures in #6 are due to this code in form.inc
i.e. the
$batch['id']is non existing at that point for non-progressive batches. Maybe'drupal_batch:' . $batch['id'] ?? '' . ':' . $set_id,could just fix that?Comment #11
gidarai commentedAdded new patch using @mondrake 's suggestion to fix errors on comment #6
Comment #12
alexpottGreat idea splitting this out from the other issue.
Are we concerned about the impact of not assigning an ID to non-progressive batches and whether or not code in custom or contrib might depending on this being set. Maybe it should be:
$batch['id'] = $batch['progressive'] ? \Drupal::service('batch.storage')->getId() : '';A function called doCreate doing a database update seems really problematic.
And a function called doGet doing an insert is also interesting.
Comment #14
gidarai commented@Alexpott i've made the changes you suggested, but didn't really know what to rename the "doGetId" function to. I noticed that the function is used as a sub-function in "getId", but it does indeed do an insert. What do you suggest the new function name to be?
Comment #15
mondrakeMaybe
doGetId() => doInsert()
doCreate() => doUpdate()
so the do*() reflect the actual database operations?
Comment #16
mondrakeThe following PHPStan baseline entry should be adjusted to count: 1
Comment #17
gidarai commentedChanged PHPStan baseline entry to count 1
Comment #18
daffie commentedIt is a solution we can take. For me the methods
doCreate()anddoGetId()are 2 helper methods and their name should reflect that. I do get the point that we would create adoCreate()method that does an update query. Naming things is not always easy in IT.Comment #19
mondrakehttps://dev.to/thinkster/the-hardest-thing-in-programming-384g
😊
Comment #20
mondrake@gidarai I will roll the next patch ok?
Comment #21
gidarai commented@Mondrake Of course, go for it!
Comment #23
daffie commentedWe need a test for the function
system_update_10101().Comment #24
mondrakeSure we need a test there. Can I get a review before getting into that?
Comment #25
mondrakeAdded the update path test.
Comment #26
smustgrave commentedUpgrade status looks good. As an assertion before the run, in this case try/catch which was a great solution btw.
Comment #27
mondrakeThanks - the PGSQL failure is real though, NW to get it sorted. On it.
Comment #28
mondrakeThe PgSql failure will be fixed by #3028706: New serial columns do not own sequences, so better wait on that one.
Comment #29
mondrakeI was wrong in #28, #3028706: New serial columns do not own sequences is not fixing the pgsql issue, we still need to understand why after the update inserting to the table we got all the lastIDs to 0.
Comment #30
daffie commentedFor the change to the users table with added something special for PostgreSQL databases. Maybe we should do the same here.
From : https://git.drupalcode.org/project/drupal/-/blob/9.4.8/core/modules/user...
Comment #31
mondrakeThanks @daffie.
No I actually found out the problem to be a trickier one - in the update path test the call to
changeField()is done by the SUT, and the test database connection then gets bogged, itslastId()method does not understand that the field is now a serial so it keeps returning 0. Must be some static caching somewhere.#30 will be solved by #3028706: New serial columns do not own sequences in a consistent way, and it's not strictly necessary here (although I'd say it'd better to get that in before this).
Working on an update to the MR.
Comment #32
mondrakeLet's see the level of bot happiness now.
Comment #33
mondrakeComment #34
daffie commentedThe MR looks good.
I create a couple of remarks on the MR.
Comment #35
mondrakeComment #36
daffie commentedAll my points have been addressed.
Testing has been added.
The MR is for me RTBC.
The CR needs to be updated. We are not using the keyvalue store as a replacement.
Comment #38
mondrakeUpdated CR. Now we probably will need another one for #2665216: Deprecate Drupal\Core\Database\Connection::nextId() and the {sequences} table and schema, but one step at a time.
Comment #39
daffie commentedAll my points have been addressed.
Testing has been added.
The IS and the CR are in order.
For me it is RTBC.
The MR from @mondrake is the one to merge.
Comment #40
mondrakerebased
Comment #41
quietone commentedI was reading the change record and I don't understand why it discusses a postponed issue. Oh, I see this is one change record for two issues. That is not a good idea because the two issue may get committed to different versions.
The CR should also be in clear English, see Technical writing (English) and Write a change record for a Drupal core issue.
I am tagging this for change record updates.
Comment #42
daffie commentedI have updated the CR.
Back to RTBC.
Comment #43
catchI checked the change record. I think it's fine for it to reference the postponed issue - we explicitly split this issue out of that one, but they're closely related and it's useful context.
Committed/pushed to 10.1.x, thanks!
I had to regenerate the phpstan baseline on commit
Comment #45
catchComment #47
smustgrave commentedFYI I've been having issues with running updates on 10.1
Reversing this patch I was able to run the updates.
Comment #48
catchRe-opening this but not rolling back just yet since it's only in 10.1
I think we need to open a ticket against drush, and depending what happens there, maybe it's an easy fix in drush, or if not we might need to roll back until the drush ticket is ready.
Comment #49
andypostFiled issue for drush as it needs BC too https://github.com/drush-ops/drush/issues/5494
Comment #50
smustgrave commentedSo reverted the changes in core/lib/Drupal/Core/Batch/BatchStorage.php and things appear to be working again.
Comment #51
rkollerI also ran into the issue but with a different error. @catch asked me to post the output in here as well.
i am able to reproduce the error consistently. i set up a new install of 10.1.x-dev on ddev with php 8.2 and mariadb 10.5. after that i apply the patch from #2492171-386: Provide options to sanitize filenames (transliterate, lowercase, replace whitespace, etc). after that `drush updatedb`is executed as shown in the gist (created one since the output is too long for the issue here):
https://gist.github.com/rpkoller/ec980e0bd95d071102bf006a7a19f6d3
Comment #53
catchSince this can be reproduced consistently by @rkoller, I've gone ahead and reverted this. Batch has logic for ::ensureTableExists() etc., but either the patch is breaking that, or it's leading to a situation where existing breakage is being exposed.
Comment #54
rkollerper @catch's suggestion i've tested the following now:
1. install 10.1.x-dev
2. went to
/admin/modules/updateand clickedcheck manually3. then went into phpmyadmin and checked if the batch table was in place and it was
4. applied the patch from #2492171-368: Provide options to sanitize filenames (transliterate, lowercase, replace whitespace, etc)
5. went to update.php and ran the update => it ran through without any errors
i did an extra test while writing up this comment
1. install 10.1.x-dev
2. go into phpadmin -> the batch table is missing
Comment #55
andypostIt means insert into batch skipping ensureTable
Comment #57
mondrakeNo drush expert - but if I understand correctly batches in drush are run via it's own
batch.inccode, i.e. code in Drupal'sform.incis not executed when the update runs via drush. That would explain why the table doesn't exist - in the new code that is ensured inBatchStorage::getid()which drush doesn't execute.IMHO it would be easier to fix drush if this were in.
EDIT: commented upstream too https://github.com/drush-ops/drush/issues/5494#issuecomment-1484207843
Comment #58
catchI don't think this is a drush problem, or not exclusively - if you do a clean install, add an update (via a patch that adds one or manually), then go straight to update.php before executing any other batches, you'll hit this.
Comment #59
mondrakeah right - quite an edge case though. We can fix here but the drush problem will remain, more likely to happen when drush is used in test scripts.
Comment #60
mondrakesecond thinking: if we install anew a code base with this issue included, getId() should be ensuring the table, no? so what’s failing?
Comment #61
andyposteven to start update
drushwill need to get batch::ID using old method and probably will fail after running updateComment #62
mondrakeFor me the drush fix is quite clear, I will try to post a PR on github. To test it, though, it would be better to have this in Drupal core. What's still unclear to me is how comes you can end up with a new install (10.1.x including the patch here), call update.php and fail because the table is missing.
Comment #63
mondrakeFiled https://github.com/drush-ops/drush/pull/5509 for downstream drush
Comment #64
mondrakehttps://github.com/drush-ops/drush/pull/5509 downstream is making progress, but testing it without the change here in the codebase would be somehow missing sense.
Can I suggest to get this back into core, so we can test drush and address the edge case of #58 separately?
Comment #65
mondrakehttps://github.com/drush-ops/drush/pull/5509 was merged downstream.
Comment #66
andypostAt least the new auto-increment should get initial value from current `sequences` value to allow finish batches already running/created
Comment #67
mondrake#66 I do not see why that would be needed. AFAICS the important thing is to get a unique ID for the new ones.
Comment #68
daffie commentedThe fix for Drush has landed. See: https://github.com/drush-ops/drush/pull/5509
Comment #69
catchLooks like the fix hasn't landed in drush 11 yet?
Comment #70
andypostMaybe update can use "known ensureTable()" from core and fix it as well if no override applied and table exists?
Comment #71
mondrake#69 no, because there's nothing to fix there at the moment while this is not committed. So we're in a catch 22 I'm afraid...
Comment #73
catchOK I think the answer here is to commit this to 11.x (10.2.x) - that will give it time to percolate through drush versions and updates, and more time to flush out any remaining issues too. I fixed the deprecation version on commit.
Committed/pushed to 11.x, thanks!
Comment #75
mondrakeAdded the Drush PR fix in the IS, for the record.
Comment #77
berdirComing here via https://github.com/drush-ops/drush/issues/5797.
The drush PR that was merged seems incorrect and is not consistent with what core is doing.
The commit here has a specific fallback logic for the update path that does a manual INSERT query. The merge request against drush does not have that, it falls back to \Drupal::database()->nextId(); but does not actually insert the row, so drush never actually persists the batch.
I've proposed a new PR that implements the same logic as core does here.
Comment #78
a.dmitriiev commentedJust for the record, the PR from Berdir was merged to Drush 12.4.3 https://github.com/drush-ops/drush/releases/tag/12.4.3 and I can confirm that it works with Drupal 10.2.0-rc1. In my use case module on installation does batch processing. Now it works fine with Drush 12.4.3.