Problem/Motivation
In #2705389: Selected content entities should extend EditorialContentEntityBase or RevisionableContentEntityBase we want to enable revisions everywhere. But in order for this to be committed in 8.x we need a data upgrade/migration path.
What currently stops us is that we need to change the entity type definition to enable revisions. Why is this a problem? Because \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::onEntityTypeUpdate throws an EntityStorageException if \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::requiresEntityDataMigration returns TRUE (which it will for existing sites).
Proposed resolution
A service that can be called from within an update hook.
Pass in the entity type object and it will create / remove tables and columns, update the installed entity and field definitions, and migrate any existing data.
This issue will also add tests for the service to test switching an entity_test entity from non-revisionable to revisionable.
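As a rough sketch of the proposed resolution (the 'entity.schema_converter' service name, the convertToRevisionable() method and the entity type ID are all illustrative placeholders, not the final API), the call from an update hook might look like:

```php
<?php

/**
 * Hypothetical example of invoking the proposed converter service from an
 * update hook. Service name, method and entity type ID are placeholders.
 */
function mymodule_update_8001() {
  $entity_type = \Drupal::entityTypeManager()->getDefinition('mymodule_item');
  // The service would create/remove tables and columns, update the installed
  // entity and field definitions, and migrate any existing data.
  \Drupal::service('entity.schema_converter')->convertToRevisionable($entity_type);
}
```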
Comment | File | Size | Author
---|---|---|---
#152 | interdiff-2721313.txt | 1.97 KB | catch
#148 | 2721313-142.patch | 44.03 KB | timmillwood
#142 | interdiff.txt | 3.27 KB | amateescu
#142 | 2721313-142-combined.patch | 384.03 KB | amateescu
#142 | 2721313-142.patch | 44.03 KB | amateescu
Comments
Comment #2
dawehner
What blocks us from writing individual update functions which move the data over from one table storage to another one?
Comment #3
timmillwood
As mentioned to @dawehner in person, I don't think using update hooks is scalable.
Depending on how much we want to be revisionable this will need to work in different ways.
Personally I would love to see core enforce all content entities are revisionable (even if they don't use revisions). Therefore the migration will have to migrate all content entity schemas.
I still think we should do this migration through \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::onEntityTypeUpdate using the Migrate module.
Comment #4
dixon_
I'm starting to like the idea of update hooks the more I think about it. I have updated the OP with two possible approaches. I'm also bumping this to Major as this is a big blocker for the first phase of the now-official Workflow initiative.
Comment #5
dixon_
Comment #6
dawehner
I personally think that the initial addition of revisions for certain entity types can be totally done in specific update hooks, given it's basically a table copy and column copy plus removal of non-revisionable fields. The scaling aspect could probably be solved by some clever SQL queries.
On the other hand, it would be super nice to have automatic migration from one schema to the other using the Migrate module, but given the possible operations, they can't always be automated.
Comment #7
timmillwood
If we decide to go with the update hook approach, this will have to ship in the same patch that enables the revisionable entities: #2705389: Selected content entities should extend EditorialContentEntityBase or RevisionableContentEntityBase
Comment #8
timmillwood
The update hooks might simply call \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::onEntityTypeCreate, then loop through the non-revision tables and copy data to the revision tables.
Comment #9
dixon_
Comment #10
timmillwood
The code for this needs to ship with #2705389: Selected content entities should extend EditorialContentEntityBase or RevisionableContentEntityBase
Comment #11
timmillwood
Comment #12
timmillwood
Adding a starting point for this.
Comment #13
timmillwood
This includes a database dump of a standard site with the entity_test module enabled, although entity_test_mulrev was made non-revisionable before creating the dump, so the test tests making it revisionable again.
Comment #15
timmillwood
Comment #17
timmillwood
It seems making this testable with entity_test is a little more complex than I expected.
Comment #18
timmillwood
Using the shortcut entity as an initial test for this upgrade path. This patch therefore makes the shortcut entity revisionable. We can then make all other entities revisionable in #2705389: Selected content entities should extend EditorialContentEntityBase or RevisionableContentEntityBase.
Comment #20
timmillwood
This should fix some of the failing tests, but others look to be due to the update hooks only making schema changes and not migrating any data.
TODO: Make sure revision tables and fields are populated
Comment #22
timmillwood
Implemented a first step toward copying data to the new schema.
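A minimal sketch of what such a first step might look like, assuming the shortcut tables discussed later in this thread (table and column names are illustrative, and the real patch does considerably more):

```php
<?php

// Illustrative only: copy every row from the base table into the new
// revision table, seeding the revision ID from the entity ID.
$database = \Drupal::database();
$rows = $database->select('shortcut', 's')->fields('s')->execute();
foreach ($rows as $row) {
  $database->insert('shortcut_revision')
    ->fields([
      'id' => $row->id,
      // With no revision history yet, the entity ID doubles as the
      // initial revision ID.
      'revision_id' => $row->id,
      'langcode' => $row->langcode,
    ])
    ->execute();
}
```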
Comment #24
timmillwood
Need to work out why the langcode field in shortcut_field_revision is being created with a primary key.
shortcut module
Update #8001
(
[:db_insert_placeholder_0] => 2
[:db_insert_placeholder_1] => 2
[:db_insert_placeholder_2] => en
[:db_insert_placeholder_3] => All content
[:db_insert_placeholder_4] => internal:/admin/content
[:db_insert_placeholder_5] => a:0:{}
[:db_insert_placeholder_6] => 1
)
in Drupal\Core\Database\Connection->handleQueryException() (line 668 of /home/timmillwood/Code/drupal1/core/lib/Drupal/Core/Database/Connection.php).
Comment #25
timmillwood
So it looks like langcode is getting added as a primary key in \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::initializeDataTable and \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::initializeRevisionDataTable; not sure I understand why yet. Any insight, anyone?
Comment #26
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
@timmillwood, the 'langcode' field is added as a primary key in most entity tables for (entity query) performance purposes. Not sure why that would affect the data copy operation though...
Comment #27
dawehner
Just a quick happy reminder. This seems to be quite an assumption, which is maybe better added at some higher level of the stack.
Comment #28
timmillwood
@amateescu - I assumed the primary key on 'langcode' is causing the issue outlined in #24. Maybe it's a deeper issue.
@dawehner - I have been thinking that the part of the patch you mention should go into its own issue. The problem I am having is when I add the 'revision_id' field to an existing table: if the table has content (such as the Shortcut module adds on install), then it causes issues if the new field can't be null. Maybe there could be something to pass into installFieldStorageDefinition() to either add some data or not add the NOT NULL attribute, which could then be added later.
Comment #29
timmillwood
So it looks like this issue with langcode is because revision_id isn't getting a primary key set.
Comment #30
timmillwood
Manually adding the primary keys back in works!
Comment #31
timmillwood
This is failing in SQLite and PostgreSQL because the NOT NULL "fix" has only been done for MySQL. I'm trying to work on a proper fix for this in #2746279: Adding not null fields to table which have data.
Comment #32
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Trying out the alternative approach for setting the initial values for the 'revision_id' field from #2753475: Support using the values of another field as initial values when adding a new field.
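For context, Drupal's schema API supports an 'initial' key when adding a column, and #2753475 adds 'initial_from_field'. A sketch of how that sidesteps the NOT NULL problem from #28 (the field spec here is illustrative, not the one from the patch):

```php
<?php

$schema = \Drupal::database()->schema();
// Adding a NOT NULL column to a table that already has rows fails unless
// the existing rows get a value. 'initial' populates them; with #2753475,
// 'initial_from_field' => 'id' could seed it from the entity ID instead.
$schema->addField('shortcut', 'revision_id', [
  'type' => 'int',
  'unsigned' => TRUE,
  'not null' => TRUE,
  'initial' => 0,
]);
```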
Comment #33
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Comment #34
timmillwood
Re-roll and clean up.
Comment #37
timmillwood
Can't seem to replicate these failures locally.
Comment #38
jibran
Tagging it so that I can keep track of it.
Comment #39
jeqq
Removed trailing whitespace.
Comment #40
jeqq
Comment #41
jeqq
Some tests are failing because Drupal project tests are failing; they are not related to this patch.
Comment #42
Bojhan CreditAttribution: Bojhan as a volunteer and commented
Comment #44
timmillwood
Yay! All tests passing again!
Please review so we can get this committed and get the ball rolling on the parent issue.
Comment #45
jibran
I think @effulgentsia and @plach should review the 'system.entity_schema_updater' changes.
Comment #46
Berdir
First review.
What makes this necessary exactly? Ah, the revision user field? A bit weird to have that on shortcuts, especially given that they don't have a user.
Why is this in system.module?
This seems to be doing something very specific but the method name is generic. Either it should have a better method name or it should just be in the update function directly.
For example, we might want to think about the order in which we do things. The idea is to get the currently installed type, then call the method that makes the changes, and only when that was successful update the type.
This also seems a bit strange, calling onEntityTypeCreate() on an existing entity type?
I think that only works because the storage currently silently skips existing tables, which can cause weird bugs/behavior (see https://www.drupal.org/node/2751363). If we change that, this will break.
I know the example basically says that, but that really isn't going to work if you have tens or hundreds of thousands of entities. Sure, not the case for shortcuts (I hope...), but for terms? Absolutely possible.
This needs to use the batch API.
From what I understand, this is basically an attempt to make the update code for multiple entity types re-usable. Updates tend to be complicated and it is likely we'll need entity-type-specific customizations. But if we do keep this, then we should at least give this very specific class/service/method names, and possibly have fewer public methods that are more closely tied to the update/batch API.
Meaning, this would then accept a $sandbox argument and implement something like the example in https://api.drupal.org/api/drupal/modules%21system%21system.api.php/func...
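The batched pattern being described can be sketched as a hook_update_N() implementation driven by a $sandbox (the entity type, batch size and hook name below are illustrative):

```php
<?php

/**
 * Sketch of a $sandbox-driven batched update, per the pattern @Berdir
 * references. Entity type and batch size are placeholders.
 */
function mymodule_update_8002(&$sandbox) {
  $storage = \Drupal::entityTypeManager()->getStorage('shortcut');
  if (!isset($sandbox['progress'])) {
    $sandbox['progress'] = 0;
    $sandbox['max'] = $storage->getQuery()->count()->execute();
  }
  // Process a small slice per request so large sites don't time out.
  $ids = $storage->getQuery()
    ->range($sandbox['progress'], 50)
    ->execute();
  foreach ($storage->loadMultiple($ids) as $entity) {
    // Re-saving writes the entity into the new (revisionable) schema.
    $entity->save();
    $sandbox['progress']++;
  }
  // The update system re-invokes the hook until '#finished' reaches 1.
  $sandbox['#finished'] = empty($sandbox['max']) ? 1 : $sandbox['progress'] / $sandbox['max'];
}
```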
Comment #47
timmillwood
Only really fixing #46.2, but it's quite a big change, so I thought it best to upload the patch.
Comment #48
timmillwood
Trying to address #46.3 and #46.5.
Comment #49
jhedstrom
Some initial review. I think the biggest question is around using a filled vs. bare database for the tests in #1. The rest are mostly style/nits.
Shouldn't this test check with some content (e.g. the drupal-8.filled.standard.php.gz starting db)?
Also a nit, the docblock should be {@inheritdoc}.
Indentation nit.
$exists will always be set, so this could just be if ($exists && ...).
Some nits: these need parameter descriptions.
Comment #50
timmillwood
Addressing issues in #49.
Comment #52
timmillwood
Looks like the filled db has 4 shortcut entities.
Comment #53
jhedstrom
This is looking good. Are there any outstanding tasks left to do here?
We're testing shortcut here, and then presumably this test would be updated over in #2705389: Selected content entities should extend EditorialContentEntityBase or RevisionableContentEntityBase to check other entity types?
Comment #54
timmillwood
@jhedstrom - I don't think there are any remaining tasks.
Yes, #2705389: Selected content entities should extend EditorialContentEntityBase or RevisionableContentEntityBase is where we will update other content types to be revisionable, add update hooks for them, and update the test.
Comment #55
jhedstrom
I think this is ready for RTBC, but as @jibran mentions in #45, it would be good to get additional reviews of the new service. However, that was 10 days ago, and it would also be nice for this to move forward. Neither of those folks are on IRC at the moment, either.
Comment #56
jhedstrom
Here's a deeper review of the new service to try and move this along.
Nit: needs a space between the description and the @param statements. Also needs variable descriptions for each param.
There's a lot going on in this method. Can you add inline code comments at each section to explain the process for easier potential debugging in the future?
I had to dig a bit to figure out we were looping through 5 at a time. An explicit comment here would be good.
Nit: variable description.
Whitespace.
Rather than hard-code these here, can we re-use the RevisionLogEntityTrait::revisionLogBaseFieldDefinitions() method so labels and definitions are always in sync?
Comment #57
Berdir
6: No, we can't. This is an update function; just like a schema definition, it must not change, otherwise it will get confusing for sites that already got an updated definition and try to do another update on top of it. We have to keep the original definition here.
Comment #58
jhedstrom
But this is a re-usable update-helper service. In theory, it could be used on one entity type now and another in 6 months. If for some reason entity revisions have another column added, or a column schema change, or a label change, etc., shouldn't the update hook run with the latest definition? It isn't changing existing fields, it's adding new ones.
Comment #59
Berdir
Well, we'll see how re-usable it exactly is :) Still not 100% sure about where it should live exactly and how it should be named.
Update functions and re-usability usually don't go well with each other. Update functions are always for very specific versions and must be guaranteed to work with any combination of old and new core versions (as far as supported), so they must *never* change.
Comment #60
timmillwood
Fixing nits from #56.
Agree with #57 and #59.
Comment #61
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
First review pass.. it looks pretty good already :)
Missing space before '5'.
Wouldn't it be simpler to write this as a merge query? Or, even better, if it depends only on a single lookup key, it can be an upsert query.
We shouldn't hardcode the name of the 'revision' entity key here; it needs to be an optional parameter.
Same for the revision table names. They might change in the future, so the only way for an update function to perform the exact same task at any point in time is to not depend on any live/changeable state.
@Berdir's feedback was not addressed here...
This method seems to deal only with revisionable fields, so it should be named a bit better.
Why is only this field conditioned by the 'revision' entity key? All the other fields from this method don't make sense without a 'revision' key either.
Like above, we shouldn't hardcode the names of the base fields.
Why are we updating the indexes only for the revision data table? The base, data and revision tables need the same treatment I think, no? :)
Is there a reason for having two update functions?
Missing empty lines and a slightly more descriptive text for this test class :)
Missing empty line.
Missing docblock for this method.
We should also test loading a revision of this entity type in order to be sure that the revision tables were created and populated properly.
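The merge/upsert distinction raised above, sketched with Drupal's database API (table and column names are illustrative):

```php
<?php

$database = \Drupal::database();
// A merge query inserts or updates based on one or more key columns.
$database->merge('shortcut_revision')
  ->keys(['id' => 1])
  ->fields(['langcode' => 'en'])
  ->execute();
// When the row is identified by a single unique key, an upsert is simpler
// (and can map to a native INSERT ... ON CONFLICT on some databases).
$database->upsert('shortcut_revision')
  ->key('id')
  ->fields(['id', 'langcode'])
  ->values([1, 'en'])
  ->execute();
```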
Comment #62
timmillwood
Addressing items in #61:
1. Done
2. Done
3. Done
4. The table names are being fetched from the $entity_type, all we're hardcoding here is the property.
5. Not sure how to address this yet.
6. Done
7. Done
8. Where can we fetch them from?
9. Revision data table was the only one that caused an issue.
10. Yes, @Berdir's suggestion: the first doesn't loop, the second does.
11. Done
12. Done
13. Done
Comment #63
jibran
Some more review points and some suggestions. Nice work @timmillwood.
Param description missing.
This should be a single line.
This is not correct.
Not needed at all.
I think $this->container->get() is preferred instead of \Drupal::service() in tests.
We should create and save a new revision of a shortcut here and assert some values.
Comment #64
jibran
Comment #65
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
For #62.4 and .8: That's my point exactly. We shouldn't fetch them from $entity_type, because that might change in the future, but rather add them as method parameters and let the update function pass them in.
#62.9: That doesn't mean all the other tables don't need this as well :)
#62.5: Copying over the code seems to be the only "good" solution here..
Comment #66
Berdir
If you get the entity type definition from the entity definitions update manager, then no, it won't change. The problem will be that not all places (like the storage handler) will do that if you just go through that for some part of the code.
Comment #67
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
@Berdir, that's true, but right now we're not getting the entity type correctly in the update function.
Comment #68
timmillwood
Fixing items from #63 before I refactor all the things.
Comment #69
timmillwood
Working in the open.
Comment #71
jibran
Thanks for the quick feedback fixes, @timmillwood.
Sorry, I meant a new revision of an existing shortcut.
Comment #72
timmillwood
Progress update.
Comment #74
timmillwood
Making some progress.
Comment #76
timmillwood
Added a change record - https://www.drupal.org/node/2774203
Comment #77
timmillwood
I think this direction would need a lot of sign-off, but it works.
Comment #79
timmillwood
Adding the exception back in.
Comment #81
timmillwood
Should fix some of the tests, but not sure how to fix \Drupal\Tests\Core\Entity\Sql\SqlContentEntityStorageSchemaTest::setUpStorageDefinition.
Comment #83
timmillwood
Converting the RevisionableSchemaConverter service into just a standard class and marking it as @internal.
Haven't fixed either of the two failing tests :(
Comment #84
timmillwood
Looking at duplicating onEntityTypeCreate, calling it onEntityTypeCreateMissingTables.
Comment #86
timmillwood
Looks like in #84 I accidentally removed the initial_from_field.
Comment #87
timmillwood
Updating #86 with a change of method name and an interface addition.
Starting to feel happy with this solution.
Comment #89
timmillwood
It helps when changing a method name to change it everywhere.
Comment #91
dawehner
There are a couple of bugs in the current patch with regard to configurable fields:
See https://www.drupal.org/files/issues/interdiff_22477.txt for fixes around that.
Comment #92
dawehner
A couple more bugs:
Comment #93
dawehner
Here is a patch which should fix all points from above.
Comment #94
dawehner
Some additional problems:
* The *_field_revision table currently just gets one copy per language
* The data is not returned for multiple languages
* We can no longer use upsert, as the key space needs to also include at least the langcode
Here is an interdiff; sorry, I don't have more time right now, but it should be doable to apply this interdiff onto the patch.
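Since upsert() works off a single unique key column, once the key space grows to include the langcode, a merge query with compound keys is the way out (table and column names illustrative):

```php
<?php

// Keyed on entity ID, revision ID and langcode together, so each
// translation of each revision gets its own row instead of one row per
// entity being kept.
\Drupal::database()->merge('shortcut_field_revision')
  ->keys([
    'id' => 1,
    'revision_id' => 1,
    'langcode' => 'en',
  ])
  ->fields(['title' => 'All content'])
  ->execute();
```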
Comment #95
dawehner
@amateescu
Do you mind commenting on the stuff you worked on / found out regarding bugs over the last week? People are using this patch for actual sites, so continued progress is kinda helpful :)
Comment #96
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
@dawehner, sure, here's the list of major bugs that I found in the current patch:
1.
Updating all base fields unconditionally like this leads to complete data loss for every base field which uses a dedicated table (i.e. multi-valued base fields).
2.
Sidetracking (or not calling) \Drupal\Core\Entity\EntityDefinitionUpdateManagerInterface::updateEntityType() at all means that entity type events are not fired by \Drupal\Core\Entity\EntityTypeListener, which in turn means that, for example, \Drupal\views\EventSubscriber\ViewsEntitySchemaSubscriber does not have a chance to run, so all the views for the entity type that's being updated will be broken because of the base (data) table changes.
3. An additional problem found by @Berdir is that we don't clean up the base table columns for fields that are now stored in the revision data table.
The second point is the most painful one, because it means that the data migration process needs to run within the boundaries of \Drupal\Core\Entity\EntityDefinitionUpdateManager::updateEntityType(), which poses a problem as I couldn't yet find a clean backwards-compatible way to pass the $sandbox parameter over from a hook_update_N() implementation.
I'm still working on updating the patch to fix all of the above, so I'm going to assign the issue to myself for now.
Comment #97
dawehner
@amateescu++
Comment #98
plach
Quick review, while I try to get familiar with this issue:
Just a note: this implementation will be tied to hook_update_N() functions, so I guess we will have to "version" it if we need to perform changes in the future, so we get a predictable/reproducible update behavior. That is, we will have to provide a RevisionableSchemaConverterV2 (actual name TBD), possibly extending the previous one, if we need to perform any change to the inner logic after the update functions using the previous version are officially released.
Not sure whether this is a new policy, but there are a few PHP docs missing here and below.
This interface and its implementation look pretty SQL-specific, so they should live in the Drupal\Core\Entity\Sql namespace.
Comment #99
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
In addition to #98, I discussed the issue with @plach, and his idea is to rename all the existing tables, then call updateEntityType() in order to create the tables/schema from scratch, and after that copy the data over from the original tables. I've already started experimenting with this idea.
Comment #100
jibran
+1 to #99. As per my experience with DER, it's better to create them from scratch than to alter the existing ones.
Comment #101
Berdir
I like that idea as well. It's kind of like the migrate-based updates dream that we have, except doing it by hand.
We could keep the old backup tables around and start using some kind of naming pattern for them, although we need to be careful to keep it short, as we could accidentally rename them to something that's too long. Maybe combined with a manual clean-up task that reports those tables on admin/reports/status; that could also be a follow-up.
What would be neat is if we could create the new tables with a different prefix and then rename them when the data is successfully migrated. Then we wouldn't leave the system in an entirely broken state if something goes wrong. But I guess that won't be easy...
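The rename-then-recreate flow from #99/#101 might be sketched like this (the table list, the 'old_' prefix and the surrounding context are illustrative):

```php
<?php

$schema = \Drupal::database()->schema();
// 1. Keep the existing data around under backup names.
foreach (['shortcut', 'shortcut_field_data'] as $table) {
  $schema->renameTable($table, 'old_' . $table);
}
// 2. Let the definition update manager build the new revisionable schema
//    from scratch for the updated entity type definition ($entity_type is
//    assumed to be the revisionable definition).
\Drupal::entityDefinitionUpdateManager()->updateEntityType($entity_type);
// 3. A batch would then copy rows back from the 'old_' tables, after which
//    they can be dropped (or kept as the backup @Berdir suggests).
```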
Comment #102
plach
I thought a bit more about this and I'm concerned that the approach I discussed with @amateescu in #99 may lead to an update workflow that's not predictable/reproducible (this is similar to the first comment I had in #98). If we generate the schema, then it will rely on whatever runtime version of the entity schema handler class is being executed. It would suck, but I'm wondering whether we need to hardcode the schema definition for each entity type in the various update functions.
Comment #103
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
#102 is correct in assuming that the approach discussed in #99 would make updates unpredictable, because of this code from \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::getEntitySchema() (which is called by onEntityTypeCreate() through onEntityTypeUpdate()):
But that doesn't necessarily mean we have to hardcode the schema in the update functions; instead we should finally fix #2274017: Make SqlContentEntityStorage table mapping agnostic, which would allow \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::getEntitySchema() to return the schema for the passed-in $entity_type definition, and not use the actual/current one available in code, right?
Comment #104
plach
Sorry, but wouldn't that still be tied to the "version" of the schema handler class?
Comment #105
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Right, we're tied to whatever SqlContentEntityStorageSchema decides to return at the time the update function runs, but isn't that the whole point of dynamically generated entity schemas, instead of hardcoding them in hook_schema() implementations? :)
Comment #106
catch
The trouble with #105 is if there are conflicting updates/contrib modules: say a contrib update is written on a core version where this patch has been applied, but then runs on a site where it hasn't (alongside a core update). If we hard-code the schema in the update, then whatever the state of the site, you always get the same outcome.
It would be really great to not worry about this, but don't think we can stop yet.
Comment #107
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Ok, here we go. Since we can't rely on generating the schema automatically, the best we can do in this patch is to provide a helper for getting the schema that we want to upgrade to, and, of course, the data copying and entity definition update operations.
In order to get the hard-coded schema for an update function, this code can be used:
An interdiff to #93/#94 is provided, but due to the extensive changes that I went through in the last two weeks, it's pretty much useless and is just reverting the previous code :/
There's only one thing that I still need to figure out: how to get the schema for configurable fields, because \Drupal\Core\Entity\EntityFieldManagerInterface::getFieldStorageDefinitions() only returns the storage definitions for base fields. But until then, this patch is ready for review.
Comment #108
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
I just realized this morning that we cannot use a fully hard-coded schema definition in the update functions because that won't contain the schema for custom base fields added via hook_entity_base_field_info().
So we're back to square 1, I guess.. :(
Comment #109
catch
So the closest we can get to a fixed schema looks like this, after talking with amateescu in IRC:
1. Use the fixed known schema from core
2. Invoke hook_entity_base_field_info() + _alter() on it to pick up contrib alterations.
3. Collect schema from configurable fields (they'll be the same before and after the specific update runs)
#2 can change over time and depending on which modules are installed, however:
- if the core is updated in a separate deployment from contrib, then the result is predictable
- if contrib adds a dependency on the core update so that they run first, then the result is predictable
- if neither of those happen, then the core update might pre-empt the contrib update and make the same change, this may or may not be OK.
- if we wanted to force it, we could add a new hook in-between hook_update_N() and hook_post_update_N() (I facetiously suggested hook_pre_post_update_N()) so that core's guaranteed to run last.
amateescu mentioned using migrate instead - that also gives us fixed start and finish points (or removes the need to care about them), and that seems worth looking at.
I don't think we can revert to a dynamic update and hope it works - no matter how difficult this is to get right, debugging real-life data-integrity issues and update conflicts on tens of thousands of sites is worse.
Comment #110
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
I thought about this a lot during the weekend and I think I have another viable alternative that eliminates all the risks involved with the fixed + dynamic schema approach discussed in #109.
The main point is that the current design of the storage schema was written with the migrate use case in mind, so all the previous patches are kinda working against it. So why not "go with the flow" and let the storage + schema handle it for us in a post_update hook, by loading the unrevisionable entities and saving them as revisionable.
There are two ways in which this can be accomplished:
1. leave the old tables in place and load the entities using the last installed entity type definition (which is unrevisionable at this point), inject the revisionable entity type and a "temporary" table mapping in the storage and save the entities, which will write the final (revisionable) data in temporary entity tables; at the end of the process, remove the old tables and put the temporary ones in place.
2. rename the old tables to temporary names, create the schema from scratch with the new (revisionable) entity type definition; load the old data by using the last installed entity definition and a table mapping that returns the temporary tables, save the data using the current (revisionable) entity type definition; at the end of the process, remove the temporary tables with the old data.
Approach #1 has one drawback: as soon as the annotation for an entity type is updated with a revision entity key (i.e. when the code is updated), the storage is not able to load anything anymore, so the site will be in a broken state until the post_update batch is done and the new tables are put into place.
This would have happened with all the previous patches in this issue as well!
Approach #2 is better in this regard because the new entity tables are created at the beginning of the process, so the site will not throw exceptions, but its drawback is that the content will initially "disappear" and it will be brought back gradually as the post_update batch runs.
Given that #1 (and all the previous patches from this issue) leaves the site in a state of "throw exceptions all over the place", I'm leaning towards approach #2 which gives us a less and less broken state while the batch is running.
Comment #111
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Here's a patch which implements the approach from #110.2. The entire logic and code from RevisionableSchemaConverter is now *much* simpler and easier to follow.
There are a couple of @todo's left in the patch, but it's definitely ready for a good review :)
Comment #112
jibran
Good review? I don't know, but it is a review nonetheless. No one has reviewed it in the last 10 days, so here it goes. Mostly nits and docs suggestions. I agree with @amateescu. I attached the interdiff for minor stuff; feel free to add that to the patch or ignore it. It is a good idea to not convert any core entity (i.e. shortcuts) in this issue. And thank you! It is looking quite good now.
1. I think we should document the whole of #110.2 here.
2. Do we want to fix it here or in the follow-up?
3. I'm confused by reading this: why can't we wrap this into
? A comment to state the reason will help a lot to understand this better.
4. Oh! yeah, let's do it now.
5. Let's rename $base_table to $temporary_base_table for better understanding.
6. Very helpful comment. Thank you!
7. Nice!
8. Why are we not calling initTableLayout() here?
9. Should we move these to the interface?
10. We should update the docs of getEntitySchema() to mention the difference between the two, and why can't we update getEntitySchema() to accommodate this with an additional default parameter?
11. Space missing after if.
12. Let's rename $revision_name to $revision_table_name.
13. I'm going to copy this and create a revert-schema Drush command out of it.
14. This file can be removed.
15. Let's convert to a multi-line array.
Nice @amateescu++. :D
Comment #113
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Ok, here's the 'final' version of the patch, with no more @todos and complete test coverage.
I've updated the test with a base field added through hook_entity_base_field_info() and also a configurable field.
Re #112:
1: Documented what we're doing in the doc block of \Drupal\Core\Entity\Sql\RevisionableSchemaConverter::convertToRevisionable().
2: That's actually already done by \Drupal\system\Controller\DbUpdateController::triggerBatch(); removed the @todo.
3: Didn't understand that comment.
4: Done.
5: 11: 12: 14: 15: Already done in your interdiff.
8: Because we don't need to?
9: 10: Nope, they were leftovers from a previous approach, now removed from the patch.
Comment #114
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
And here's a version of the patch without the database dump, just to make reviews a bit easier :)
Comment #115
jibran
Thanks for addressing the feedback. Let me elaborate on point #3. Other than that, this is ready for me.
At this point, we have already changed the tables. Why can't we use the storage's table mapping instead of replicating the code?
Comment #116
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Because the storage creates a DefaultTableMapping class, and we need a TemporaryTableMapping, which has a 'special' generateFieldTableName() method.
Comment #117
jibran
Got it, thanks for the explanation. As I said in #115, the patch is ready. It is pretty self-explanatory. The documentation is there and makes a lot of sense. The update path test is verifying all the possible scenarios. As per #95, people are using this functionality, so I think we should add a change notice for this as well.
The only thing I can think of is sub-system maintainer review (i.e. entity system maintainers) and framework manager review (i.e. @effulgentsia or @alexpott). So, tagging the issue and moving it to RTBC to get more eyes on this.
Comment #118
catch
schemea.
last installed
Shouldn't this order by ID ASC to make it explicit?
Also, is 10 all we can manage here? I would think 100 would still be safe for memory limits while saving a lot of batch overhead. And/or we could allow the batch size to be set in $settings too.
There's no error handling here. Should we be trying to catch an exception and replace the old storage back where it was? If we could use transactions that'd be great, but of course impossible with batch/UI updates.
The other thing going on here is that during the update any requests to the site are going to be missing entities. Maintenance mode possibly makes this OK, but could also mean sites being in maintenance mode for hours.
There's one possible way to prevent that issue, although I have a feeling we discussed it in IRC and it wasn't possible for some reason, but it would be good to confirm here:
1. Create the new entity schema with temporary table names
2. Load entities from the current schema, save them into the new one
3. Rename the old entity tables to temp names, rename the new ones to the proper names
4. Drop the old names and update the schema definition
That would at least in theory allow the site to operate (if only read only), and the advantage is that if it fails somewhere, you'd still have a working site just with some redundant temporary tables up until the final switch around.
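The table-swap sequence in steps 1-4 above could be sketched with Drupal's database Schema API roughly as follows; the $table_map contents and the tmp_/old_ name prefixes are illustrative, not taken from the patch:

```php
<?php
// Rough sketch of steps 1-4 above via Drupal's Schema API. The $table_map
// variable (current name => temporary name) and prefixes are hypothetical.
$schema = \Drupal::database()->schema();
$table_map = ['entity_test_update' => 'tmp_entity_test_update'];

// Steps 1-2 happen elsewhere: create the tmp_* tables with the new
// (revisionable) schema and copy the entities over.

// Step 3: move the old tables out of the way, then promote the new ones.
foreach ($table_map as $current => $temporary) {
  $schema->renameTable($current, 'old_' . $current);
  $schema->renameTable($temporary, $current);
}

// Step 4: once the installed definitions are updated, drop the backups.
foreach (array_keys($table_map) as $current) {
  $schema->dropTable('old_' . $current);
}
```

The appeal of this ordering is that a failure before step 3 leaves only redundant temporary tables behind, so the live site keeps working.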
Comment #119
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Fixed #118.1, 2 and 3.
For #118.4, it's true that we should have a fallback mechanism in case an exception is thrown at any time during the batch requests. I'll implement it shortly :)
That was not only an IRC discussion; it's also documented in #110 (5th paragraph) why I chose to implement it "the other way around" :)
Comment #120
catch
OK, I knew it was somewhere, #110 is the one.
I think we can get around the 'instant change' issue by using hook_entity_type_alter().
i.e.
- add a new, (possibly hidden?) module, which alters the entity definition
- make the module required so it's always on for new sites
- when we update, do the full data migration, and enable the module at the end.
This way we explicitly change the entity definition only when the data is already migrated. If the migration fails, then you have some tables to delete, but your site still works otherwise.
Alternatively, put the alter in an existing module and use a state flag in the update to change it, but that seems a bit more fragile.
Comment #121
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
So #120 is one possibility which would allow the site to 'function' (in maintenance mode at least) until the update process is done, but maybe we can also try to do #2554235: Make the content entity storage and entity query use the last installed definitions instead of the ones living in code, or just a small part of it, and use the last installed definitions in the storage handler.
This is a lot of food for thought so I'm going to wait on some feedback from the entity (storage) maintainers before working in either direction :)
Comment #122
catch
#121 also sounds like a plan, so yeah let's get more opinions.
My main concern here isn't so much leaving the site in working state during the update (although that's nice too), but ensuring the smallest possible window to leave the site in an unrecoverable state if something goes wrong. This is a non-optional upgrade path that people could end up doing as part of an update to a security release, so if we can make it so that a failure (or the most likely failure of a bug in a TaxonomyTerm::save() somewhere, I've seen network requests inside entity saving before) leaves the site working with cruft and a pending update, that's better than fatals due to schema mis-matches and having to restore a backup and insecure code base.
Comment #123
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Since there are no other opinions in over a month, I've rewritten this patch for the third time to fully account for #118.4, which means that error handling is now as robust as it can get, and also fully tested :)
Comment #125
dawehner
Recreating the tables and then renaming them seems to be a really good approach here!
It would be great to document why this is needed.
I'm a bit nervous about this line (strict comparison). 100/100 is indeed 1, but note: 1.0 !== 1
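The concern is visible in PHP's own comparison semantics — a batch progress value that becomes a float will never be identical to the integer 1:

```php
<?php
// PHP's "/" operator returns an int only when both operands are ints and
// the division is exact; otherwise the result is a float.
var_dump(100 / 100 === 1);    // true: int(1) === int(1).
var_dump(100.0 / 100 === 1);  // false: float(1) is not identical to int(1).
var_dump(100.0 / 100 == 1);   // true: loose comparison coerces the types.
```

This is why a loose '==' (or an explicit cast) is the safer check for batch completion here.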
Isn't it a bit dangerous to drop the old tables first and then bring in the new ones? What about renaming the old ones so we could recover somehow, and dropping them at the end of the process? So I'm wondering whether a try/catch around the next lines, with a rename in case of an exception, could reduce some problems during it.
I was wondering whether database table operations can actually be wrapped inside a transaction, but yeah https://stackoverflow.com/questions/4692690/is-it-possible-to-roll-back-... says no, at least for mysql.
This no longer matches.
For some better test coverage, can we check that the IDs stay the same? Otherwise there is no guarantee that existing entity references will continue to work.
I am wondering whether the scope of the try statement should be increased, like the entire copy method.
Nitpick: For slightly better readability I would have negated the condition.
Why does the dump contain this update function in the first place?
We should check IDs as well, as written above.
As a sanity check, I would also save the entities to ensure this works as well.
Comment #126
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
@dawehner, thanks for reviewing! :)
Re #125:
1. That's because we want to be sure we always get the latest entity definition from code. Added a comment.
2. Ok, let's use '==' then :)
3. Done, we rename the initial tables, use a try/catch for the whole code block below and bring back the original tables if needed.
4. Bummer :/
5. Fixed that and also added a setting for the update step size.
6, 10. Added assertions for entity IDs as well.
7. I think it's nice to be able to tell exactly which entity failed, so I just expanded the scope of the try/catch to the scope of the foreach statement.
8. Done :)
9. Because I installed the module with the post update function in place, but there's no real reason for it to be there. I removed it and updated the dump. Note that the dump change is not in the interdiff because it would be useless.
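For the step-size setting mentioned in point 5, the override would live in settings.php. A sketch — the 'entity_update_batch_size' name matches what ended up in core's default.settings.php (default 50), but treat it as an assumption in the context of this patch:

```php
<?php
// settings.php: number of entities migrated per batch step during an
// entity schema update. Setting name assumed from core's default.settings.php.
$settings['entity_update_batch_size'] = 100;
```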
Comment #127
joelpittet
Looks like this needs a re-roll, but it needs maintainer/manager review, so I didn't NW the issue.
Comment #128
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
We now have a dedicated issue for adding test database dumps that we can also use here. Here's an updated patch which depends on #2856808: Break out the 'entity_test_update' entity type into its own module and add additional test db dumps.
Comment #129
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
I was doing a self-review here because no one else seems to want to look at it :) and I noticed that the test was in the wrong place, so I moved it and then refactored the code a bit to make it easier to review and extend in the future with translatable/non-translatable support.
Comment #130
jibran
Do you think it's time to contact subsystem maintainer(s)?
Comment #131
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
@catch contacted them a few months ago and we haven't heard anything back since then. There were also multiple tweets from @drupaldeploy and @dickolsson that this issue is very important for the Workflow Initiative and that it needs review.
I don't know what else to do..
Comment #132
jibran
Sending a message through the contact form always works for me. Just sent a message to @plach and @effulgentsia using the contact form.
Comment #133
jibran
@plach will have a look at it soon. Yay!
Comment #134
plach
Big +1 on this new approach, @amateescu++
The patch looks good to me, below you can find a few things I've noticed. Tomorrow I'll try to perform some manual testing and step through the new code to refresh my mind, but my gut feeling is this is very close to RTBC, at least on my end :)
Just curious: are we planning to create post update functions for core entity types in a follow-up or in separate issues?
I'm not a fan of using a dynamic property here, mainly because it's introducing an extraneous concept to the entity type definition that does not really belong there. What about adding a property + setter method to SqlContentEntityStorage to mark an entity type (ID) as not requiring a data migration?
I'm wondering whether we could end up with existing tables having the $table_name name, if the exception happens late enough. Would it make sense to drop any existing table just in case?
I'm not sure about this, but could simply re-throwing the exception mess up the backtrace? Would it make sense to wrap $e into a new exception?
Minor: the PHP doc refers to the "wrong" line, we could switch them to slightly improve readability.
If we're not in maintenance mode, max could change at each step. I guess we should update its value just before checking whether the batch process is finished.
In practice this will almost always work, but to make this code more robust, could we just skip definitions that are already marked as revisionable? Right now we're assuming an implementation, but we only know about the interface contract here :)
Wrong PHP doc.
Comment #135
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
@plach, thanks a lot for taking the time to review this!
The plan is to open separate follow-up issues to make entity types revisionable and publishable, but that's quite a long way off because of all the dependencies that need to land first. See #2820848: Make BlockContent entities publishable for an example :)
#134.1: That's actually the thing that I struggled with *a lot*. The problem is the long indirection call chain that goes from \Drupal\Core\Entity\EntityDefinitionUpdateManager::updateEntityType() to \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::requiresEntityDataMigration(), because we have no way to influence the storage object instantiated by \Drupal\Core\Entity\EntityTypeListener::onEntityTypeUpdate().
#134.2: Sure, that's a good idea. Fixed.
#134.3: There are many places in core where we simply re-throw the original exception, most notably in the database layer and the render system, so I don't think it's a problem.
#134.4, 5, 6, 7: Fixed :)
Comment #136
plach
Sorry for not being more clear, but I was suggesting to skip the ::isBaseField() check altogether: any field definition has an ::isRevisionable() method, so we should rely on that alone. The result should be the same as achieved by the original code in the large majority of cases, but this way we would also be accounting for custom/exotic field definition implementations.
Aside from that, the code looks great. I'll play a little with this during the weekend, I hope to be able to RTBC it by Monday :)
Comment #138
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Right, that makes perfect sense.
Comment #139
plach
Ok, manually tested this here and it seems to work nicely, although we will definitely need work in the individual storage handlers to make them able to deal with the various supported table mappings instead of the currently assumed ones. That said, this is RTBC on my end, aside from the minor stuff below.
Normally the is prefix is used by the property getter, can we rename this property to just $temporary?
Still bugged by this, but unless we can make sure that the entity type manager always returns the same storage object (which would make sense to me), I see no other fix. However, since entity types support arbitrary properties, could we at least do the following?
We're not dropping the old_ tables at the end of the process, is that intentional?
I think this variable name is misleading: can we use $step_size or just $size here?
Comment #140
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
@plach, it's so great to hear that! This issue has been my archenemy for quite a few months :D
Fixed all the points from #139.
Comment #141
plach
Thanks!
Bear with me :)
Can we add an assertion in the test to make sure backup tables are gone?
Also, can we call them backup tables instead of initial tables in the comment to avoid misunderstandings? ;)
Comment #142
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Sure we can ;)
I also ran the test locally with the line that removes the backup tables commented out and it fails like it should.
Comment #143
plach
Awesome, thanks!
Comment #144
plach
Actually, postponed on #2856808: Break out the 'entity_test_update' entity type into its own module and add additional test db dumps, but feel free to move this back to RTBC once that's in :)
Comment #145
plach
me--
Comment #146
jibran
Back to RTBC after #2856808: Break out the 'entity_test_update' entity type into its own module and add additional test db dumps is merged.
Comment #148
timmillwood
I think we will need a re-upload of the #142 patch for the testbot now that #2856808: Break out the 'entity_test_update' entity type into its own module and add additional test db dumps is in.
Comment #149
catch
Comment #150
timmillwood
This issue has been RTBC for 2 weeks. Any chance we can get it reviewed / committed?
Comment #152
catch
Fixed a couple of coding standards issues on commit (see interdiff).
Committed/pushed to 8.4.x, thanks! Sorry this took so long to get back to.
Comment #153
catch
Comment #154
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Thanks, @catch!
I've rewritten and published the change record.
Comment #155
Wim Leers
Impressive work!
Comment #156
dixon_
Thanks everyone for all the work on this! Very important milestone for the initiative!