Problem/Motivation
As discovered and discussed in comments #2960643-27: Cannot load entity by uuid after rename - #2960643-32: Cannot load entity by uuid after rename the config entity updater might misbehave because it updates the #finished key in the sandbox based on the last entity type.
So if there are multiple entity types being updated in a single update and the last entity type has fewer entities than the previous ones, then not all previous ones will be updated as the sandbox will be flagged as finished.
Proposed resolution
Only allow the ConfigEntityUpdater->update() to be triggered for a single config entity type per hook_update_N(). If it is invoked for a a different config entity type an exception is thrown.
+ Re-run all updates that are updating multiple entity types in core in a single update.
+ Publish a change record that custom and contrib has to re-run their updates as well if they update multiple entity types in a single update.
Remaining tasks
User interface changes
API changes
Data model changes
Release notes snippet
ConfigEntityUpdater now enforces that it is only used for one update function at a time via an exception, whereas this behaviour was previously silently broken.
| Comment | File | Size | Author |
|---|---|---|---|
| #39 | 3092714-39.patch | 10.58 KB | alexpott |
| #39 | 37-39-interdiff.txt | 3.45 KB | alexpott |
| #37 | 3092714-37.patch | 8.71 KB | alexpott |
| #37 | 36-37-interdiff.txt | 978 bytes | alexpott |
| #36 | 3092714-36.patch | 8.73 KB | alexpott |
Comments
Comment #3
hchonovComment #4
hchonovComment #5
hchonovComment #6
hchonovAnd re-running the updates that update multiple config entity types.
Comment #7
hchonovFixing the kernel test ConfigEntityUpdaterTest, which was searching for the sandbox keys in the root level of the sandbox array.
Comment #10
hchonovAnd here a test that shows the problem and helped me actually fix correctly the problem as shown in the interdiff :).
Comment #12
hchonovComment #13
cilefen commented(fix typo in title)
Comment #14
berdirYes, the extra key works too, I think we can assume that what we write into $sandbox is internal and not an API :)
That said, sandbox_keys doesn't make much sense to me. We store much more in there, the key is just the key to the actual data. what about "config_entity_updater"?
One thing that's a bit weird is that for entity types/sandbox_key's that are done (finished = 1), this will basically do array_splice([], 0, $this->batchSize), which is kinda pointless, so likely call loadMultiple([]) several times for already finished types, but would make the patch quite a bit bigger to wrap that in another condition. And it's not going to have a measurable performance difference. Would be different if we'd do a query each time..
So this now only cares about it being 1 or not, which technically is indeed the only thing that matters, but it will also result in a completely bogus progress and means it's kinda pointless to exactly calculate $sandbox['sandbox_keys'][$sandbox_key]['#finished'] in the first place.
There are two options to improve it. The first is pretty easy and actually less code than what you have: https://3v4l.org/4l1hG
That's not very exact in case one key is 1k items to process and the other 3 only 10, as you'll immediately reach 75% and then it will go slowly. But it's probably good enough.
The other would be to calculate the total count and total count of remaining entities and then calculate the global finished based on that. Should be doable too but we're talking about config entities here, so we don't expect more than a handful of batch runs in total anyway. And using this to update multiple content entity types would need a completely different implementation anyway (we can't load 1M node ids into $sandbox)
Comment #15
berdirAnother thing to keep in mind. Not a big deal when called 2-3 times, but it means that each batch run actually processes up to $number_of_keys * $batch_size, so the generic loop in the other issue would could be hundreds of entities at least on the first run. e.g. 20 entity types * 50...
To respect the batch size, we'd need to subtract the already processed items on each run.
Given the following data set:
A: 20
B: 40
C: 130
with batch size = 50, the first run would process all 20 of A and 30 of B, then it would process the remaining 10 of B and 40 of C and so on..
Alternatively, instead of all those changes, we could just decide to not actually support multiple calls to it in the same function, knowing that is broken. We'd just throw an exception if $sandbox is already initialized with a different key. Would require to split two core updates retroactively, which we have to anyway to call it again (we'd move the first to a new function), and we couldn't do generic things. I'd expect this would only minimally affect contrib. I'd be OK with that :)
Comment #16
alexpottNice find.
I think that this makes a lot of sense. Calling this twice in the same function breaks the batch size setting and makes everything more complex.
Comment #17
geaseI think 'sandbox_keys' is not obvious enough name for this key. Though name collision is not very likely, $sandbox in ConfigEntityUpdater is not isolated either. So why not call it explicitly 'config_entity_updater_keys' or just 'config_entity_updater'?
For the sake of readability, why not assign $sandbox['sandbox_keys'][$sandbox_key] to a temporary variable?
As far as I see this piece of code, it assigns $sandbox['#finished'] the value of #finished of the first unfinished sandbox key. That means that instead of growing steadily with iterations from 0 to 1 #finished will jump unpredictably, and so will the progress bar. Why not use something like
Upd: While I was writing this, Berdir came up with essentially the same comments ).
Comment #18
geaseWell, first I should notice that my approach above with array_reduce is not perfect, cause we need to take into account the relative size of batch (number of entities with certain key relative to the entire number of entities to be processed). That's not a big deal to add this to calculation, but we need to actually calculate the #finished only after last key is processed. This is first. Second is concern in #15 about not respecting the size of entity update batch. That makes me suggest that we simplify things inside ConfigEntityUpdater and can assign $sandbox['#finished'] = $sandbox['sandbox_keys'][$sandbox_key]['#finished], which will work in case of single call to update(). And if we make multiple calls, we should calculate $sandbox['#finished'], and (somehow) reset entity update batch size, outside of ConfigEntityUpdater::update(), just in the body of post update hook.
Comment #19
hchonovI would rather argue that instead we should document the config entity updater about this and that the batch size will be per entity type updated and not for all entities across all entity types. If we don't do this then we cannot rely on the config entity updater for updating a lot of entity types without knowing exactly which - the example is the referenced issue where the problem has been discovered. This will be a major drawback and also a regression and not really something a developer would expect.
Comment #20
alexpott@hchonov the class was designed to be a helper so update functions didn't have to know about the entity update batch size - if you call it twice from within the same update then it breaks that. So I think as @Berdir points out we have two options:
$sandboxI prefer option 2 because it keeps the code simpler and it means that we respect the entity update batch size. I disagree that it is a major drawback. Having two update function updating separate entities is a good thing. It is a separation of concerns and make updates likely to work whatever the complexity and size of the site being updated.
Comment #21
hchonovAnd what about 2. being a regression? Even in core we have 2 updates relying on the config entity updater for updating multiple entity types in the same update.
And how do we write an update where we need to resave multiple entity types without knowing which ones we need to resave? The old way and doing the hard work manually while still ensuring the batch size is respected ... Exactly like in the initial issue. In this cases we are not allowed to use it, so we now increase the developer overhead and require implementing different solutions instead of reducing it and having one solution for both use cases. This only increases the frustration when writing such updates. Why? Because
will not be valid anymore. Should we then remove the config updater when developers would need to learn about the batch size anyway?
Comment #22
berdir> And what about 2. being a regression? Even in core we have 2 updates relying on the config entity updater for updating multiple entity types in the same update.
2 update functions that we *know* are broken. similar functions in contrib will be broken too. So in a way, we'll tell those modules that they need to fix their update functions.
generic update-all/many-entity-types use cases are very rare already in core and should be even more rare in contrib. If you have a complex use case you can still implement the batch yourself.
And you could still wrap it yourself. You'd first set your own sandbox, and then process the entity types one-by-one, and once it reports #finished = 1, you clean it up, set your actual finished and restart it with the next. Not pretty but quite doable.
Also, I've rewritten the specific use case in the other issue to not use the config entity updater and just loop over the configs directly.
Comment #23
alexpott@hchonov well the problem is that if we allow multiple calls from within the same function we have to allow for things that @Berdir describes in #15. And I think causing the regression is better than have the bug. This should be a critical - it results in update functions claiming they're completed when they are not.
Comment #24
alexpottI've had a play around with an idea of adding some additional functionality to the ConfigUpdater to allow it to add new batch sets so we can get around this problem by executing each update as a separate part of the batch. This works but it introduces two problems
Given these two problems I'm not sure that this solution is worth pursuing.
@hchonov
is not really true though. In the vast majority of cases an update is concerned with a single entity type and the config entity updater works perfectly. And in the case of text_post_update_add_required_summary_flag() and system_post_update_extra_fields() it's really not onerous to split these into separate updates.
Comment #25
alexpottHere's what an implementation of only allowing the ConfigEntityUpdater once per update would look like. Obviously we need to add docs.
Comment #26
berdirconsidering that we can't get rid of the $sandbox_key without API change anyway, could we just use $sandbox_key for that?
The second call actually did work, because that's the one that won the #finished conflict, so I think it might be enough to split of only the first call into a new function?
Comment #27
alexpottThanks for the review @Berdir.
Fixed #26.2
But I think #26.1 is not right. We can't use $sandbox_key here because that contains the entity type. I'm not changing the key at all. I'm adding a new key to lock the update to being called from a single location with the same $sandbox.
Comment #28
hchonovWe need a new update invoking the old update in order to ensure all configs have been updated, as the update might have been performed already.
Same.
Comment #29
hchonovChanged the priority accidentally.
Comment #30
alexpott@hchonov that's the opposite of what @Berdir said in #26.2 - who's right? I think @Berdir's reasoning is correct
Comment #31
hchonovOk, enough work for today. I've clearly missed that explanation. It totally makes sense.
+1 for RTBC, but cannot set myself as I've uploaded patches as well. Berdir?
Comment #32
alexpottUpdated issue summary so that it is inline with the chosen solution. We still need a change record.
Comment #33
berdirWhat I meant is that we would do $sandbox['config_entity_updater'] = $sandbox_key; and if you call it with another $sandbox key, we abort?
You could mess with it by doing more than one update callback on the same entity type but that definitely never worked because it would clash on the sandbox values like count, so I think that would be simpler and doesn't require us to load the backtrace, which is afaik slow-ish.
We also wouldn't have to change this, because calling it with the *same* $entity_type_id would still work just fine.
Because the implementation doesn't care if you call it once or twice or 7 times from the same function. As your workaround with the parent proves. The only thing it cares about is that you don't mix different entity types.
Comment #34
alexpott@Berdir I don't think this is performance critical enough to worry about performance. I think the strictness of storing where we're called from has an advantage. I wouldn't be surprised to see something like:
And this would be broken.
Yes it should be written as
But it would look it works.
I think if we want to support calls from different locations then we need to go back to the previous patches from hchonov.
Comment #35
berdirYes, performance isn't my main concern, just mentioned that too.
IMHO the debug_backtrace() implementation requires weird test hacks to fake being the same call and is quite hard to understand.
Yes, the code example in #34 would not fail, but I don't think it's very realistic that people would actually do that. Because...
> But it would look it works.
Not really. The problem with different entity types is that it works as long as you don't actually have to rely on multiple calls/batch. Your code example, while not failing, would pretty obviously not work, because it has the same key, would not re-initialize the sandbox and would either never run the callback or only on the second chunk of the batch. So if you really do two operations like that, a simple check against an updated entity will show that it didn't do both changes.
Comment #36
alexpottAh - good point. I'm convinced!
Comment #37
alexpottA bit of consistency.
Comment #38
berdirThey key need needs to be config_entity_updater, otherwise it will never throw the exception as $sandbox_key is based on $entity_type_id. So both tests should fail now (The tests are correct, the implementation isn't).
Comment #39
alexpottThe key is
config_entity_updater. I try to run tests before submitting patches :)Anyhow I'd be umming and ahhing about keeping the $sandbox_key variable as it is no longer variable. So I've moved it to a class constant because it makes things simpler.
Comment #40
tim.plunkettThe IS proposes a way to accommodate updating multiple types at once, but the patch forbids it.
Comment #41
alexpottUpdated the issue summary and created the issue summary https://www.drupal.org/node/3100978
Comment #42
alexpottComment #43
berdirI suppose we could also make this an actual function so we can reuse it between the two updates, but not sure how important that really is at this point, would actually result in a bigger patch now. Might make sense if we have to do another update on multiple types in the future.
In my opinion, this is ready. I think @alexpott is still not 100% convinced about checking for the entity type vs the backtrace. But I think this is the cleaner option and the protection is good enough. See the example in #34 and my reply in #35. Doing it in the backtrace requires workarounds in tests that could in theory also be used in an upate function (if someone puts the update code with an argument for the entity type into a helper function..)
Comment #46
catchCommitted/pushed to 9.0.x/8.9.x/8.8.x, thanks!
Backporting to 8.8.x since it's a critical upgrade path and the new post update is only doing something we were supposed to be doing something before.
Comment #48
catchComment #49
fgmUnless I'm missing something, the change record examples don't work: $this is not defined the closure is not running within a class instance ?
Comment #50
berdirIt's only abstract code anyway, but I changed it to do_something() instead of $this->something() now ;)
Comment #52
xjmComment #53
paulmartin84 commentedI'm running into issues related to this change when updating from 8.7 to 8.8
In https://www.drupal.org/project/drupal/issues/3092714#comment-13343349 There was an assumption that, post-update hooks get called with a clean batch $sandbox
For post-update hooks at least, this does not seem to be the case, which causes the above solution not to work.
I logged the output of "$sandbox['config_entity_updater']['entity_type']" at the start of all post update-hook that implement ConfigEntityUpdater that haven't run yet, and they all output block which happen to be the first update hook that is being run. I then removed that update hook and they all now output "entity_form_display" which is was the 2nd update hook to be run
This shows that sandbox is not clean at the start of a post-update hook and It now throws an exception for every post-update hook across any module that implements ConfigEntityUpdater except for the very first update hook that runs.
I'm not sure what needs to be done to tidy all this up. I guess 2 solutions would be
1. Ensure that post-update hooks do receive a clean sandbox
2. Implement a different approach for restricting a single ConfigEntityUpdater operation per update hook.
Comment #54
lykyd commentedAlso running into the issue when trying to update from 8.7.12 to 8.8.8
[ERROR] Updating multiple entity types in the same update function is not supported
Comment #55
berdirIf you get that error then you have an update function that has multiple calls, there's no other explanation. And core doesn't have that, so it has to be a custom or contrib module. What updates are being reported as pending? Check those modules and report it there.
Comment #56
lykyd commentedHere is what I could get from the failed deployment logs :
Executing required previous updates
Executing update function "8801" of module "system"
Executing update function "8802" of module "system"
Executing update function "8803" of module "system"
Executing update function "8804" of module "system"
Executing update function "8805" of module "system"
Executing update function "8800" of module "locale"
Executing update function "add_status_extra_filter" of module "media"
Executing update function "create_language_content_settings" of module "path"
Executing update function "entity_reference_autocomplete_match_limit" of module "system"
Executing update function "extra_fields_form_display" of module "system"
Executing update function "fix_jquery_htmlprefilter" of module "system"
Executing update function "layout_plugin_schema_change" of module "system"
Executing update function "configure_status_field_widget" of module "taxonomy"
Executing update function "add_required_summary_flag" of module "text"
[ERROR] Updating multiple entity types in the same update function is not
supported
Comment #57
berdirIf you run drush updb again, which updates does it still report as pending?