Problem/Motivation
There is no way for a hook_update_N() implementation to come before the automated updates performed by the entity manager when an entity type changes. See #2464427-96: Replace CacheablePluginInterface with CacheableDependencyInterface and #2533282-8: Make schema methods' visibility public in SqlContentEntityStorage.
An example problem:
- An update involves changes to both the views schema and an entity type.
- The views hook_update_N() needs to run before the entity type updates, because ViewsEntitySchemaSubscriber::onEntityTypeUpdate() saves all views when an entity type changes.
Proposed resolution
Add the ability to make updates come before the automatic entity type updates using hook_update_dependencies(). It could return something like:
$dependencies['core']['update_entity_definitions'] = ['views' => 8001];
The current solution treats the automatic entity updates just like a regular update: if an update needs to ensure that it comes before or after them, it needs to implement hook_update_dependencies(). With this patch, if there are no dependencies, the order is essentially random; it's up to how PHP manages arrays.
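As a sketch of the proposed API (the hook body below is illustrative of the array shape shown above; it is not code from the patch):

```php
/**
 * Implements hook_update_dependencies().
 *
 * Illustrative sketch: force core's automatic entity definition updates
 * ('update_entity_definitions' is the pseudo-update proposed in this
 * issue) to run only after views_update_8001() has finished.
 */
function views_update_dependencies() {
  $dependencies['core']['update_entity_definitions'] = [
    'views' => 8001,
  ];
  return $dependencies;
}
```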
Remaining tasks
- Agree on an approach
- Write a patch
- Review and commit
User interface changes
None
API changes
None
Data model changes
None
Comment | File | Size | Author |
---|---|---|---|
#51 | interdiff-49-51.txt | 15.24 KB | xjm |
#51 | update-2535082-51.patch | 28.49 KB | xjm |
#49 | 2535082-49.patch | 26.23 KB | jhedstrom |
Comments
Comment #1
alexpott: This patch makes the automatic entity update more like a regular update that can be part of hook_update_dependencies(). It means we have to do some special-casing of the update, but I can't see any way of avoiding that if we want to use the dependency graph to do this.
Comment #2
alexpott: We don't have to change update_do_one() - we can just call a different operation, like we do in HEAD.
Comment #3
alexpott: I think we might have to force everything that does not have to come before the automatic entity updates to come after - otherwise the order will be whatever PHP arrays and our graph implementation come up with.
Comment #5
jibran: Drupal\system\Tests\Update\DependencyOrderingTest: 4 passes, 1 fail. Test needs an update.
Comment #7
alexpott: Fixing tests. All installs need the \Drupal::service('entity.definition_update_manager')->applyUpdates(); call - not just the Standard profile.
Comment #8
jibran: #7 seems like a nasty bug. I don't know about the update.inc changes, but the rest seems good to me, including tests, and it made #2464427: Replace CacheablePluginInterface with CacheableDependencyInterface green. Nice work finding and fixing it @alexpott.
Comment #9
alexpottAdding some tests for the change in #7
Comment #10
alexpottLet's use the core namespace here instead of the system namespace.
Comment #11
alexpottComment #12
webchickSince we're aiming to get the beta-to-beta upgrade path supported by the next beta, and since this is a blocker to that happening, tagging "beta target."
Comment #13
Fabianx CreditAttribution: Fabianx as a volunteer commentedThis is pretty nice overall.
Do we need a change record?
Even if it just affects drush I think knowing the new update dependency is important.
Comment #14
alexpottI've created https://www.drupal.org/node/2535396 and also added to the issues that introduced the automatic entity storage update code.
Comment #15
Fabianx CreditAttribution: Fabianx as a volunteer commentedWonderful change record!
Comment #16
jhedstrom: I haven't had a chance to test this yet, but looking at the docs, can a module specify more than one update hook to run first? In head2head, we'll probably want to run *all* of them first.
Comment #17
alexpott: @jhedstrom it just needs to specify a dependency on the latest head2head update, as all the others are guaranteed to run before it. This is how the system works.
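Sketched as code (the update number 8012 and the idea that it is head2head's latest update are invented for illustration):

```php
/**
 * Implements hook_update_dependencies().
 *
 * Hypothetical sketch: make the automatic entity definition updates wait
 * for head2head's latest update (8012 is an invented number). All earlier
 * head2head updates are implicitly ordered before 8012 anyway, so a
 * single dependency covers them all.
 */
function head2head_update_dependencies() {
  $dependencies['core']['update_entity_definitions'] = [
    'head2head' => 8012,
  ];
  return $dependencies;
}
```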
Comment #18
jhedstrom@alexpott ah, that makes total sense--I was too focused on the details :)
Comment #19
jhedstrom: So I was going to test this in combination with #2530940-12: Provide beta7 update tests, but that test went green with this patch applied to core before I implemented the update dependencies hook. So something is slightly off here, since update hooks are now running before the automated schema updates regardless of hook_update_dependencies().
Comment #20
jhedstrom: It appears to be due to module weight in update_resolve_dependencies(). The order is fine until a certain point, and then the beta2beta hooks are suddenly on top (having not implemented hook_update_dependencies()). If I implement hook_update_dependencies() and specify that the entity schema updates should run first, that works.
Comment #21
jhedstrom: Having said all that, does it matter if update_entity_definitions runs in between other update hooks (in a somewhat random order, until update hooks start to declare dependencies on it)?
alexpott: @jhedstrom I documented this behaviour in the issue summary.
Comment #23
alexpott: The point is that now you can control whether an update comes before or after an entity update. However, I think we have a problem because of a situation like this:
The module is then upgraded on a site which is at version 8000. This would result in an incalculable dependency tree :(
Comment #24
jhedstrom: Re #22, sorry, I missed that bit.
Regarding the incalculable dependencies, perhaps we could also add a way to entirely disable the automatic updates for the really tricky situations?
Comment #25
plach: I discussed #23 with @alexpott and @berdir; you can find the full log attached.
The idea we came up with is assigning an identifier (a hash) to each single entity update, e.g. "add FOO field" or "delete FOO field", and adding the ability to specify a dependency on a single entity update.
There are tricky edge cases though (surprise). Consider the following scenario, where pseudo-hashes denote entity updates and the order implies update dependencies:

- 8001: Enable the Foo submodule that defines the foo1 field.
  - #addf001: Add the foo1 field schema.
  - #addf002: Add the foo2 field schema.
- 8002: Migrate foo1 to foo2.
  - #de1f001: Delete the foo1 field schema.
  - #addf001: Add the foo1 field schema again, with a slightly different purpose.
- 8003: Populate foo1 by deriving its data from foo2 values.

If we apply the updates incrementally, everything will work fine (hopefully :), but if we go straight from Bar 1.0 to Bar 1.3 we will have a problem: since the foo1 and foo2 field definitions would both be defined in the Foo submodule codebase, #addf001 will happen just once, while #de1f001 will never be detected. This will result in unmet dependencies, since #addf001 will be marked as depending on #de1f001 (which is not defined). Ideally the following sequence would work:

- 8001
- [#de1f001] (ignored)
- #addf001
- #addf002
- 8002
- 8003

but only if we ignored the unmet dependency, which may be fair, since blocking the update process on a non-existing update may not make sense. Another possibility could be not adding the dependency, since for incremental updates it wouldn't be needed, and only when updating from 1.0 to 1.3 would it be problematic.
Overall I'm not sure whether we should block this bug fix on such an edge case, but maybe someone can come up with a more common use case.
If we decide to implement individual entity updates, we will need to key the change list by update ID and add an ::applyUpdate($id) method to the entity update manager. Btw, this would also let modules apply their own updates on install, instead of requiring users to apply updates manually after installing them, which feels quite counter-intuitive. This would basically address parts of #2346013: Improve DX of manually applying entity/field storage definition updates.
Comment #26
jhedstrom: Re #25, this seems overly complex to me.
The problems encountered thus far weren't around adding or deleting fields in the entity storage, but more simplistic schema changes (e.g., the varchar_ascii type change), or unrelated changes that threw exceptions, such as the removal of a views plugin.
I think the patch in #10 is probably good as-is (for instance, it allows one to go from beta7 directly to beta12+ within a single run of updates). The alternative being pursued in the head2head module was to set a state that disabled auto-updates (#2532356: Provide a wrapper to core entity.definition_update_manager to temporarily disable updates), and this also worked fine for a very complex update path (7 to 12), the only difference being that one would have to manually disable auto-updates, run updates, then re-enable and run updates again.
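The state-based workaround from the linked head2head issue could look roughly like this (the state key and the idea that a decorator around entity.definition_update_manager checks it are assumptions based on that issue's title, not actual core API):

```php
// Hypothetical sketch: a wrapper service around
// entity.definition_update_manager checks a state flag and skips
// applyUpdates() while it is set. The state key name is invented.
\Drupal::state()->set('head2head.disable_entity_updates', TRUE);
// ... run all pending hook_update_N() implementations via update.php ...
\Drupal::state()->set('head2head.disable_entity_updates', FALSE);
// ... run update.php again so the re-enabled automatic entity
// definition updates can be applied on their own.
```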
Comment #27
jhedstrom: I also meant to add that, for the unresolvable dependency issue noted above, the second update hook could invoke the same updates as the auto-updates by simply including the relevant call, or by directly calling update_entity_definitions(), rather than declaring those as an update dependency.
Comment #28
Fabianx CreditAttribution: Fabianx for Drupal Association commented#27: Unfortunately that problem still exists.
- Schema adds new [something]
- Update A_8001 updates the content with a default value for [something]
--
- Schema changes from [something] to [some_bar]
- Update A_8002 wants to fix the data to change from something to some_bar.
=> Per the solution here:
Update A_8002 needs to run first, because else the schema is invalid when it tries to load the data to fix the data.
So:
- Schema adds new [something]
- Update A_8001 updates the content with a default value for [something]
--
- Update A_8002 wants to fix the data to change from something to some_bar
- Schema changes from [something] to [some_bar]
--
However schema updates are not versioned, so when you come from 8000 then you end up with either:
- Schema adds new [something]
- Schema changes from [something] to [some_bar]
- Update A_8001 updates the content with a default value for [something] - FAIL
- Update A_8002 wants to fix the existing data to change from something to some_bar. - FAIL
--
or
- Update A_8001 updates the content with a default value for [something] - FAIL
- Update A_8002 wants to fix the data to change from something to some_bar. - FAIL
- Schema adds new [something]
- Schema changes from [something] to [some_bar]
--
=> Probably entity schema updates need to be versioned and old schemas preserved.
e.g. a possible solution could theoretically be:
- Schema adds new [something]
- Schema changes from [something] to [some_bar]
- Update A_8001 updates the content with a default value for [something], but has a stored state for schema that is overridden and loaded - OK.
- Update A_8002 wants to fix the existing data to change from something to some_bar, but has a stored state for schema that is overridden and loaded. - OK
=> My proposal:
When you write a data-fixing hook_update_N(), store the current schema state in some dump file and override the entity schema before your update starts, then remove the override again.
That could also be a closure for sanity, e.g.:
etc.
That would also fix the problem in the IS and entity updates could happily come first again independently.
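The closure idea from this comment might look something like the following (an entirely hypothetical API; neither with_schema_override() nor the dump file exist in core, they are invented to illustrate the proposal):

```php
/**
 * Hypothetical sketch of #28: run a data-fixing update against a stored
 * (old) schema snapshot, so the update sees the columns it expects even
 * if the live entity schema has already moved on.
 */
function mymodule_update_8001() {
  // with_schema_override() and 'mymodule.8001.schema.dump' are invented
  // names: the idea is that the old schema state is loaded from the dump,
  // the override is active inside the closure, and it is removed again
  // when the closure returns.
  with_schema_override('mymodule.8001.schema.dump', function () {
    // Inside the closure the old schema (with [something]) is active,
    // so the data fix can load and re-save content safely.
    // ... fix data here ...
  });
}
```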
Comment #29
plach@jhedstrom:
Well, I'm not sure that's a good argument to ignore #23; in fact, disabling entity updates altogether might not be a solution if we end up with different update functions with competing dependencies on them. However, being able to mark an individual update as "fixed" before it runs, and then manually applying equivalent changes in a regular update function, should work. And that's one of the proposals in #2346013: Improve DX of manually applying entity/field storage definition updates.
@Fabianx:
Not sure whether a full entity schema dump would help. Regular update functions need to transition between known states. Entity updates are currently a black box, because they are applied in bulk and cannot be identified individually.
All the examples above rely on the implicit assumption of knowing the current state, which may be true only if the module is relying on entity updates concerning items that the module itself, or a related known module (i.e. one with a soft/hard dependency relating the two), provides. IMO a dump would be useful only if it were possible to dump individual changes.
Comment #30
plach: If targeting individual updates is deemed too complex, targeting all the updates for a certain provider could also work: mark all the Foo module updates as fixed, and re-create them manually via a regular update function.
jhedstrom@plach, sorry, I didn't mean to imply that either #23 or #25 should be ignored. I meant to suggest a workaround for #23 (by manually calling the entity updates from within the second hook). For #25 and #28, perhaps we could do a follow-up major issue, since this patch addresses what will most likely be very common scenarios.
Comment #32
alexpott: The problem is that with the current patch the ordering is not reliable. If you use hook_update_dependencies() to say that update_test_order_update_8002 should come after the entity updates but update_test_order_update_8003 should come before, what actually happens is this...
This behaviour can't be right - but I have no idea what the actual behaviour should be.
Comment #33
alexpott: Discussed with @plach and @catch on IRC. @plach and I have concluded that perhaps the best approach is this:
The attached patch does 1 and 2, but I can't see how to do 3. Further discussion of that is needed. This patch also calls into question whether entity APIs are reliable during hook_update_N(). (Well, actually that is called into question by the issues we're having in HEAD2HEAD and #2464427: Replace CacheablePluginInterface with CacheableDependencyInterface.)
No interdiff because it is a new approach; it's almost the size of the patch.
Comment #34
plach: Quick review:
- Didn't we agree to throw an exception here?
- Also, I'd find applyFieldUpdate and applyEntityUpdate more consistent as names.
- Shouldn't we use ::getChangeList() here?
- Missing breaks.
Comment #35
alexpottThanks @plach
Comment #36
plachThanks @alex, I'll perform a more thorough review later tonight. Meanwhile it would be good to hear what @jhedstrom thinks about this new approach.
Couldn't it just catch the exception?
Comment #37
jhedstromThe approach outlined in #33 makes perfect sense to me. I've done manual testing with head2head of the patch in #35, and it is working as expected.
+1 for RTBC.
Comment #38
Berdir: Nice to see that install profiles don't have to call this themselves anymore.
Wondering if we can optimize those calls and do it only once?
I've seen very long lists of field updates when, e.g., a certain field type's schema changes. I guess we do a lot of cache clearing right now. That said, if we can check that there are usually no calls in between those updates (unless one is called specifically, and I guess that's exactly when we need this call?), then that is also fine. The cache clear itself is fast: we just re-empty a bunch of properties and invalidate tags, which are optimized to be ignored on repeated calls.
Looks good to me as well otherwise. I'll leave it to @plach to do his final review, but I'm fine to RTBC this as soon as he is happy with it.
There are still some important things left to do, like the ability to tell the entity manager that you did some manual schema updates that it couldn't handle. But we can tackle that in non-critical issues, as they should be API additions (e.g. the ability to update last installed field definitions).
Comment #39
plachLooks great, only a couple of minor issues, RTBC otherwise.
Missing trailing dot.
Shouldn't we assert also that the schema was actually updated correctly?
+1
I think #2346013: Improve DX of manually applying entity/field storage definition updates is the right issue for that.
Comment #40
effulgentsia CreditAttribution: effulgentsia at Acquia commentedThis patch looks great, and I agree is a step forward, so +1 for RTBC and commit once Berdir, plach, and alexpott are satisfied.
I don't think the use cases from #28 have been addressed though. For example, let's say path.module has an 8001 that removes its custom storage and lets regular field storage be used, and includes its own update function to migrate the data from the old custom storage to the regular field storage. Then, let's say path.module has an 8002 that renames the pid property to path_id (core wouldn't do such a thing, because that would be an API change, but a contrib field type module might, especially during its beta cycle).
So, how would such a module do its update if a site goes straight from the pre-8001 codebase to the 8002 codebase? Because path_update_8001(), if it calls \Drupal::service('entity.definition_update_manager')->applyFieldUpdate(EntityDefinitionUpdateManagerInterface::DEFINITION_UPDATED, 'node', 'path') with the most recent codebase, would end up with a schema whose column is already path_id, but the rest of the path_update_8001() function assumes pid.
Possible solutions:
I don't think we need to solve that in this issue, but I'm wondering if there's any sense of if/how we'll solve it and what to tell developers in the meantime.
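For reference, the per-update call pattern quoted in this comment would sit inside a hook_update_N() roughly like this (the update number and the surrounding data-migration comments are illustrative, not from the patch):

```php
use Drupal\Core\Entity\EntityDefinitionUpdateManagerInterface;

/**
 * Illustrative sketch: apply one specific pending field schema change
 * before migrating data, instead of letting all pending entity
 * definition updates run in bulk.
 */
function path_update_8001() {
  // Apply only the 'path' field update on the 'node' entity type.
  \Drupal::service('entity.definition_update_manager')
    ->applyFieldUpdate(
      EntityDefinitionUpdateManagerInterface::DEFINITION_UPDATED,
      'node',
      'path'
    );
  // ... then migrate data from the old custom storage into the
  // freshly updated field storage ...
}
```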
Comment #41
Berdir: I don't think that specific example could work, since I very much doubt we can automatically update a schema that already exists and is a new entity type, so I guess that would need to be worked out manually anyway. But I get what you mean.
As discussed above, I think that #2346013: Improve DX of manually applying entity/field storage definition updates will need to add a public API to set the last installed field storage definitions/schema manually. That might require quite a bit of code to initialize them, but it should be able to deal with some cases.
Other ideas:
* Introduce some sort of callback system for the entity type/field type providers, where the entity manager can call out to those and tell them that it has a set of old/changed/new/deleted field storage definitions (entity type provider) or an old/new single storage definition (field type) that it can't handle. Then they can do whatever they need to get from the old state to the new, including things like renames if it actually is a rename. Those callbacks could get very complicated as well if they have to support a set of different changes over time in every combination, but that might be better suited to how entity schema updates work? For example, they only ever have to worry about getting to the final state from different originating states, and can be updated over time.
* Push the schema introspection API, so that the entity update manager doesn't have to rely on stored schemas that it *thinks* match what is in the database, but can just ask the database about the current schema. That could avoid the thing above that requires us to manually build and store the current state.
Comment #42
Fabianx CreditAttribution: Fabianx as a volunteer commented: #41 sounds very sensible to me.
1. The callbacks idea sounds nice and feels like a proper step - even if that can get complicated too, it allows hooking into the updates to do things when they are done, so schema and DB are always in sync.
2. I think we should do that anyway. Is that currently still on the table for 8.0.x, or was it postponed to 8.1.x? (Or in other words: do you have an issue link?)
Comment #43
catch#2 there is #1119094: Add a test to verify the schema matches after a database update (which I've just re-opened as it's relevant now again). I can't find an issue for the schema introspection API itself yet.
Comment #44
alexpottAddressed @Berdir's review in #38 and @plach's review in #39.
It seems we have all the follow-ups in place.
I still have a concern about firing the event and views listener, as the only reason this fixes #2464427: Replace CacheablePluginInterface with CacheableDependencyInterface is that there are no updates which call applyEntityUpdate() or applyFieldUpdate(). That said, I guess I've answered this already: if that happens, then we just need to add a hook_update_dependencies() for that update and say it has to run after the views update in #2464427: Replace CacheablePluginInterface with CacheableDependencyInterface. This makes me concerned about the number of hook_update_dependencies() implementations we're going to have in 8.
Comment #45
effulgentsia CreditAttribution: effulgentsia at Acquia commented#38 and #39 said they were +1 on RTBC once those nits were addressed, and #44 addresses them, so RTBC. Great work here!
Comment #46
dawehner: Nice work!
Comment #47
xjm: Overall this approach makes sense to me, and the docs/tests/etc. look great.
Doesn't need to block the patch, but I think it would be good to explicitly state here that the entity updates will happen automatically at the end if this is not used. (Also, hook_update_N() should have parens so api.d.o can automatically link it.)
(Still reviewing the patch.)
Comment #48
jhedstrom: There's #301038: Add a cross-compatible database schema introspection API, which was the @todo linked from #2497323: Create a php script that can dump a database for testing update hooks.
Comment #49
jhedstrom: This updates the docs as per #47.
Comment #50
xjm: Well THIS is interesting... uh. So yes, profiles other than Standard should get the entity updates that are necessary based on things done during installation...
I started to try to understand the big picture for install_install_profile() by looking at install_tasks(), and regretted it.
I looked at this awhile and talked to @alexpott to convince myself that the change didn't mean that entity update task items would overwrite each other, but @alexpott pointed out that there aren't ever multiple entity update batch items: this one batch item runs all of them, no matter how many there are. In the one use of the callback, in DbUpdateController::triggerBatch() where it sets the batch, these parameters are actually hardcoded for that one use ever. And, as far as I can see, there is no way ever to alter it. Unless, like, someone altered the route to override the batch, but that is cloud cuckoo land. (Pronounced "coo-coo".)
I also asked myself whether it was possible any other code anywhere was calling the batch operation directly such that we would need to document this BC break, but it seems extremely unlikely, so I don't think we should create a CR. Which is probably obvious but I just wanted to make sure we'd thought about it.
"Call the entity update service" is vague -- an @see to the fully namespaced relevant methods would be helpful.
Minor: All five of these assertions say "an non-existent".
It's unclear just by reading the test why these are "nonsense ops". I am guessing because the entity was not created in the pending update and the field was not deleted?
This state flag and the assertion message with it seem kind of confusing and maybe could use an inline comment or something. I think part of the problem is that the state key name does not describe what its contents actually are. See the quoted context line for an assertion message that actually explains what's going on.
Minor: more missing parens.
This is some magical s@#$ with conditionally declaring the hook implementations, but it seems okay to make the test work.
This one could actually use some inline comments clarifying as well; it took me awhile to puzzle out.
+1 for adding these assertions.
I'll try to improve some of the above.
Also, I was initially confused by the CR https://www.drupal.org/node/2535396, since the title doesn't really seem to describe the change introduced by this patch. It's amazing that we never had a CR at all for the original beta blocker, but I searched for awhile and couldn't find one. The only one is the original CR for the addition of automatic schemata for entities: https://www.drupal.org/node/2259243
I'll see if I can add anything to the CR for this issue to spare others the confusion of "huh, that changed a long time ago". (Edit: I didn't do that last bit.)
Comment #51
xjmOkay, that was unexpectedly brain-bending. I've added some detailed documentation to the test that should hopefully aid future generations.
SqlContentEntityStorageSchemaIndexTest is probably not named correctly anymore; it should be something like SqlContentEntityStorageSchemaUpdateTest. But it didn't seem worth adding noise to the patch for that.
Leaving at RTBC because it's just docs and assertion text updates.
Comment #52
xjmcuz
Comment #53
xjmShouldn't the CR https://www.drupal.org/node/2535396 include examples of the API added by this patch as well? I.e. the "during" case (I think).
Comment #54
effulgentsia CreditAttribution: effulgentsia at Acquia commentedRe #53, yeah, the CR was written in #14, but the patch completely changed approach in #33, so let's get that updated. Keeping at RTBC though for maximum exposure of the patch prior to it being committed.
Comment #55
alexpott: Nice updates @xjm. I've updated the CR.
Re #50.2, emma disagrees.
Comment #56
plachThis looks ready to me, RTBC +1. Assigning to @catch since this was discussed with him at length.
Can be fixed on commit:
These do not wrap at column 80 correctly.
Comment #57
catch: I looked at @plach's comment about line length to fix on commit, but I think the line might have been missing a space, because I got to 81 chars, so I left those as is.
I think we have more to do here, but apart from #2464427: Replace CacheablePluginInterface with CacheableDependencyInterface we're missing concrete use-cases, so this feels as far as we can go for now.
Committed/pushed to 8.0.x, thanks!
Comment #60
divined CreditAttribution: divined commentedThe same error on 8.2:
Failed: The SQL storage cannot change the schema for an existing field (field_1 in contact_message entity) with data.