Problem/Motivation
When creating a new base field storage definition an initial value may need to be assigned as the field value might be required. This value may or may not match the default value, in fact required fields may not have a default value at all.
A real-life example of this is https://github.com/md-systems/file_entity/tree/schema_handling, where we need to add a type
field to the file
entity, which will store the field bundle. Core SQL storage requires an initial value for this field as it's marked as NOT NULL
.
We need to figure out a way to specify and apply an initial value for each field storage definition.
Proposed resolution
Add two methods to \Drupal\Core\Field\BaseFieldDefinition
: setInitialValue()
and setInitialValueFromField()
and use that information when we create the table columns in \Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::getSharedTableFieldSchema
.
Note that support for "initial value from field" has been added specifically for this use case: #2753475: Support using the values of another field as initial values when adding a new field
Remaining tasks
Solution definitionImplementation- Reviews
User interface changes
Nope.
API changes
API additions: the two new methods on BaseFieldDefinition
.
Comment | File | Size | Author |
---|---|---|---|
#76 | interdiff.txt | 691 bytes | amateescu |
#76 | 2346019-76.patch | 20.1 KB | amateescu |
#70 | interdiff.txt | 1.09 KB | amateescu |
#70 | 2346019-70.patch | 19.86 KB | amateescu |
#68 | interdiff.txt | 1.42 KB | amateescu |
Comments
Comment #1
plachQuoting some relevant comments from other issues:
@Berdir, #1498720-122: [meta] Make the entity storage system handle changes in the entity and field schema definitions:
@yched, #1498720-126: [meta] Make the entity storage system handle changes in the entity and field schema definitions:
Comment #2
yched CreditAttribution: yched commentedDemoting as per #2232477-26: Fatal when adding new fields with NOT NULL constraints in a base table that contains existing entities ?
Comment #3
effulgentsia CreditAttribution: effulgentsia commentedOnce we support 'initial', then ideally, it wouldn't be table mapping dependent. In other words, it should result in the same data state regardless of whether it's in a shared table or a dedicated table (whether that's due to being a bundle field or due to any other table-mapping-specific logic). That would mean that in the case of a dedicated table, when we add the field, we should create records in the dedicated table for every entity that already exists. I discussed this with pwolanin at DrupalCon AMS, and he believes that this is actually a reasonably fast operation (<1 second even for millions of records), if implemented entirely in SQL (not a PHP iteration). I don't know the proper SQL statement to do this, but perhaps pwolanin or someone else can help with that when the time comes.
Comment #4
yched CreditAttribution: yched commentedFully agreed - that's the tricky thing with supporting 'initial' :-)
That's good news. But : once we support 'initial', then ideally it wouldn't be storage engine dependant either - meaning, it should be something alternate storage backends (typically, Mongo...) could support as well.
Is "update millions existing documents to add a new sub-entry" something that we can expect to be "reasonably fast" with a Mongo storage ? I guess yes ?
Comment #5
fagoSeems reasonable. I think we can default to the default value when there is no initial value.
I'd love to see this fixed for configurable fields also - e.g. considering REST output after adding a field with a default value, it would make a ton of sense to have that default value in the output for existing entities also.
Comment #6
andypostThe related issue affected the similar problem but on field level, comment filed use some hacks to get default value
Comment #7
plachComment #9
jibranComment #10
timmillwoodI am facing this exact issue in #2746279: Adding not null fields to table which have data.
Comment #12
dpiThis issue affects the ability to convert a bundle-less entity type into one with bundles. Adding a new bundle field to the existing content entity, then using
installFieldStorageDefinition
does not work as expected. This is due to the enforcement of all entity key fields need to be NOT NULL (SqlContentEntityStorageSchema::getSharedTableFieldSchema
), but the field is initially created as IS NULL and populated with NULL values.If we had the ability to specify an initial value then the conversion to NOT NULL should go smoother...
Associated: Allow creation of registrant bundles (dpi/rng#108)
Comment #13
BerdirThere's a way to work around that using a custom storage schema class, see http://cgit.drupalcode.org/file_entity/tree/src/FileEntityStorageSchema.....
Comment #14
timmillwood#2820848: Make BlockContent entities publishable depends on this.
Comment #15
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedThe ability to use initial values for base fields is badly needed for the Workflow initiative, so I propose to tackle that first in this issue and create a followup for handling configurable fields as well.
Here's an initial patch that will have a lot of fails, so I'm looking for an approval for the direction/approach before spending a lot more time to fix everything :)
Note that it also includes the patch from #2390495: Support marking field storage definitions as required, which is needed for proper handling of 'not null' columns.
Comment #16
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedComment #17
BerdirI think we identified another thing that needs this in #2820848: Make BlockContent entities publishable, we should initialize the content translation metadata fields with a proper initial value, otherwise we have weird things like a status that is NULL. Not sure if we want to do that here or as a follow-up.
Comment #18
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedAs long as they're base fields (and they are, afaik), I have no problem with handling them in this issue. The final patch will probably be a bit bigger than needed, but I guess that's ok if we get more eyes for reviews :)
Comment #20
BerdirOk :)
The initial problem is partially hidden because content_translation always considers the default translation to be published it seems. But while testing #2820848-71: Make BlockContent entities publishable, I did fine at least one bug when adding new translations of an entity that was created before translations were enabled. That would be an actual bug that we can extend tests for I think.
Comment #21
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedHere's some fixes for most of the fails.
Comment #23
jibranPatch looks great. So, this means every custom entity has to update the basefield definitions and provide an upgrade path. I think, that is fine.
Comment #24
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedNot necessarily, we can actually do it for every entity type in a single update function, like in the patch attached.
Also fixed the rest of the fails :)
Comment #25
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedMaybe a better fix for this would be to make
node_update_8002()
depend onnode_update_8300()
.Comment #27
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedA few improvements.
Comment #29
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedI've decided to open a separate issue for this: #2840595: The 'Source feed' field of aggregator items has to be updated and marked as required
And for the last 4 fails of the patch above, I opened #2841291: Fix NOT NULL handling in the entity storage and 'primary key' changes when updating the storage definition of an identifier field
Along with #2390495: Support marking field storage definitions as required, we're now blocked on 3 issues here :/
Comment #30
jibranAdded PP prefix for easy tracking also the issue summary is quite outdated.
Comment #31
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented#2390495: Support marking field storage definitions as required is in so we're down to only 2 blockers. However, there's also a new soft-blocker for one of the child issues: #2842480: system_update_8203() is named incorrectly
Comment #32
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedThere's just one dependency left, #2841291: Fix NOT NULL handling in the entity storage and 'primary key' changes when updating the storage definition of an identifier field, so let's how these two patches are working combined.
Comment #36
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedDiscussed this issue with @Berdir and we agreed to split the NOT NULL handling part of this patch to another issue in order to make everything easier to review. Here it is: #2850686: Handle NOT NULL correctly at the storage level for required fields.
Comment #37
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedAfter #2841291: Fix NOT NULL handling in the entity storage and 'primary key' changes when updating the storage definition of an identifier field and #2850686: Handle NOT NULL correctly at the storage level for required fields, this simple patch should be all that's needed here!
Comment #38
timmillwoodQuick note to make sure it doesn't get forgotten:
Needs an actual doc.
Comment #39
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedFixed #38, added support for 'initial values from field' and wrote full test coverage.
This patch is ready for final review IMO, but it still depends on #2841291: Fix NOT NULL handling in the entity storage and 'primary key' changes when updating the storage definition of an identifier field.
Comment #40
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedThis needed a reroll for the changes in #2841291: Fix NOT NULL handling in the entity storage and 'primary key' changes when updating the storage definition of an identifier field and HEAD.
Comment #41
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedRerolled for the short array syntax conversion and also made the code that compares the current field schema with the last installed field schema to not take into account any values for 'initial' and 'initial_from_field'.
Comment #42
jibranThere is still IS tag on the issue. I think we have a green patch now so we can update it.
$initial_storage_value = $storage_definition->getInitialValue() will return false if empty so !empty check is not required.
Can we please add comments to explain the situation in which errors are thrown?
Can we please add a comment here? Can we use $this->getMainPropertyName() here instead?
This !is_array check is not needed.
Comment #43
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedNope,
$storage_definition->getInitialValue()
will return an empty array if no initial value is provided.I think each exception message is very clear about the error that triggered it.
Yes we can, fixed.
It is needed because we want to support passing a string to
setInitialValue()
.Comment #44
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedRerolled the combined patch for the reroll of its dependencies.
Comment #45
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedOh, and we have a new dependency: #2615496: A serial/primary key field can not be added to an existing table for some databases :/
Comment #46
xjmMarked #2274597: Not all required fields are marked as required as a duplicate of this issue as per discussion with other committers and the entity and field maintainers back in December. We also agreed that this issue is major priority and probably can be considered a bug.
I also have some note about adding the "related block content issue etc." but I have no idea what that means, so hopefully someone else does. :)
Comment #47
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented#2615496: A serial/primary key field can not be added to an existing table for some databases is in. We're getting closer :)
Comment #48
jibranLet's update the issue summary in meanwhile.
Comment #49
tstoecklerThis should use
$this->fieldStorageDefinitions
per #2867325: SqlContentEntityStorageSchema needlessly fetches storage definitions it already has.I hit this due to a problem with this patch. See the attached patch for the (minimally-invasive) workaround I employed that fixed this for me. Would love to hear your thoughts on this.
I think an explanation of our concrete example is the best way of understanding the problem:
We currently have an entity type that has a
status
field that is a string field. Following Drupal 8.3's best practice we are now adding a booleanstatus
field fromEntityPublishedTrait
to all of entity types. So I want to rename our existing string field (toprocessing_status
in this particular case) and then add the standard booleanstatus
field. My update hook to achieve this conceptually looks like the following:The problem is that when
SqlContentEntityStorageSchema
validates the "from" field and makes sure that the field types of the new field and the "from" field match that does not fetch the last installed field definition of the "from" field, but the one generated from the new code. And in my case thestatus
field is already a boolean in code even though at this point in the update function it should still be considered a string, and the last installed field storage definition rightly is still a string at this point.So I solved this by allowing the field storage definitions to be set in the storage schema and then setting them from
FieldStorageDefinitionListener::onFieldStorageDefinitionCreate()
(see the attached patch). Not sure if there are more elegant ways of solving this. Note also (as I also noted in the code comment in the patch) this would become a non-issue with #2554235: Make the content entity storage and entity query use the last installed definitions instead of the ones living in code.To come around full-circle: my patch only works if
SqlContentEntityStorageSchema
actually uses the field storage definitions that it has as a class property (and that can be set externally with my patch).Comment #50
tstoecklerOops, here's the patch I promised.
Comment #51
tstoecklerThis also needed a re-roll for #2856808: Break out the 'entity_test_update' entity type into its own module and add additional test db dumps.
Comment #52
tstoecklerAnd here, finally, is an interdiff for the additional change I mentioned at the top of ##4. I'll shut up now...
Comment #53
tstoecklerOK, one more... ;-) Here's a combined patch for the one #51 to see if it's still green.
Comment #54
tstoecklerOK, now back to needs work for #49.
Comment #55
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented#2841291: Fix NOT NULL handling in the entity storage and 'primary key' changes when updating the storage definition of an identifier field was discussed in depth during a recent Workflow Initiative meeting at Drupalcon Baltimore and everyone agreed that it is a risky change which will require per-entity type updates and a *lot* of work for keeping BC. Since that will take quite some time to implement properly, I had to take another look at the code in
\Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::getSharedTableFieldSchema()
and realized that the dependency on that issue was there simply because I started working on it from that angle, by fixing NOT NULL handling first.But this API addition can be done without it, even as the code comment says:
So let's get initial values support in first and fix NOT NULL handling as the followup.
I agree with #49 so here's a reroll of #44, which also includes the interdiff from #52.
Comment #56
tstoecklerDo you have any thoughts about #50?
Comment #57
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented@tstoeckler, the funny thing is that I hit a very similar bug in #2860654-9: Field storage CRUD operations must use the last installed entity type and field storage definitions, which is best explained by the code comment I wrote for that issue:
If we combine that with the problem you explained in #50, the only conclusion is that any CRUD operation of field storage definitions need to use the last installed entity type definition *and* the last installed field storage definitions.
Comment #58
timmillwoodTempted to RTBC but I think this needs another review / feedback on #57 from @tstoeckler.
Also, I guess we need a change record?
Comment #59
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedHere's a change record for this: https://www.drupal.org/node/2877964
As for #49/#50, I tried various approaches in #2860654-11: Field storage CRUD operations must use the last installed entity type and field storage definitions but, as I said there, we need to fix the entity update system as a whole so the only thing we can do in this patch is use the last installed definitions directly when validating the "from" field.
Also updated the issue summary so this should be good to go :)
Comment #60
hchonov@amateescu, looking at the Change Record I am not sure if the initial value has to be set only in the update function installing a new base field or in the base field definition in the entity class as well. Because the definition is cached I would say it has to be provided at both places otherwise there will be a mismatch between the installed definition and the one from code.
Comment #61
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented@hchonov, that's a good point. The patch already takes care of that and strips the "initial value" stuff from the definition before caching it, so you only need to specify it in the update function.
Updated the CR to reflect this. Thanks for the pointer! https://www.drupal.org/node/2877964/revisions/view/10481455/10481577
Comment #62
timmillwoodCan't see any reason not to RTBC.
Comment #63
tstoecklerSorry, but I think #50 is still a needs work. As you do in #2860654-11: Field storage CRUD operations must use the last installed entity type and field storage definitions I think for this to go in we need to make the StorageSchema always use the last installed field definitions and I don't think that should be postponed on any issue as it will cause trouble for people doing what I tried to do in #49, especially as there really is no workaround.
Comment #64
tstoecklerOh wait, sorry, #59 already does that. I will verify that that fixes my issue that I had above.
Comment #65
tstoecklerYay, my test worked! Back to RTBC now. Thank you all for your patience and for you great work on this!
I attached some files for folks playing along at home. To reproduce what I did, download all the files from this comment + patch #59 from this issue into your Drupal root and then run the following from there:
Comment #66
tstoecklerOops, forgot the two other files.
Comment #67
larowlanfyi unset takes multiple args.
Can we add a comment here as to why we get the first value of the array. Personally I prefer
reset()
over explicitly using0
but thats just a preference. What happens if an initial value needs to be set as multiple values? Not supported? Do we validate that?This suggests that we support multiple values? Yet earlier we explicitly use delta 0? Am I missing something? Note: not mentioned in change record
Why not set the initial value and return when we know its an empty array. Would make the logic simpler. Early returns++
We only check the first key here? What if someone passes mixed keys? Wouldn't it be better to use array_filter with is_numeric
Oh for php7 and null coalesce :)
Shouldn't we be adding a test for the other formats too?
Do we need a test for multi-value field?
Comment #68
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented@larowlan, thanks for reviewing!
Re #67:
1. I know it does but in this case I think the code is more readable when separated on two lines.
2. Added a comment and opened a follow-up issue for supporting configurable and multi-value fields: #2883851: Add initial values support for configurable and multi-valued fields
3. I would like to see if we can support multiple values or not in the followup mentioned above, so I'd prefer not to lock down the API (even through documentation) until we know for sure.
4. 5. That's a direct copy of
\Drupal\Core\Field\BaseFieldDefinition::setDefaultValue()
, so let's open a followup to improve both if you feel strongly about it :)6. Yeah :P
7. We are testing other formats in
\Drupal\Tests\Core\Entity\BaseFieldDefinitionTest::testFieldInitialValue()
8. Nope, but #2883851: Add initial values support for configurable and multi-valued fields should add one.
Comment #70
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedA local testing change slipped in #68. Putting back to RTBC because the only difference to #59 is the code comment added in the interdiff from #68.
Comment #72
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedThose were some CI errors.
Comment #73
catchI thought I'd posted a review on here already but apparently not, just to say it looks good to me, giving larowlan a chance to get back on it.
Comment #74
larowlanDon't hold back on my behalf
Comment #75
catchDiscussed this briefly with larowlan, he mentioned possibly throwing an exception from setInitialValue() until #2883851: Add initial values support for configurable and multi-valued fields is done, that seems like it'd make what's supported clearer until the follow-up gets resolved.
I'm 50/50 on the early return - I really like them on longer methods, but less so when the method is already short.
Comment #76
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedSure thing, I don't mind throwing an exception there for now :)
Comment #77
larowlanThanks, +1 RTBC
Comment #78
catchCommitted 29b7ea9 and pushed to 8.4.x. Thanks!
Comment #80
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedThanks everyone for the reviews! The next one in line is #2854732: Use initial values for content translation metadata fields and fix existing data and we're just two issues away from being able to convert core's entity types to revisionable and publishable :)
Comment #81
Wim LeersThis also unblocked #2841291: Fix NOT NULL handling in the entity storage and 'primary key' changes when updating the storage definition of an identifier field.
Comment #83
BerdirSo this causes an interesting problem, specifically, it results in an entity schema definition mismatch on all fields that had previously manually set an 'initial' value through a custom StorageSchema handler.
Because what happens now is that we remove those two keys from the *generated* schema but it might still exist in the stored installed schema.
I've seen this so far in paragraphs (status, behavior_settings) as well as in file_entity (type). I think the solution is to process both the stored and the generated schema equally, to consistently ignore that value, or explicitly process the installed schema and remove it from that? I'll open an issue for that.
Comment #84
BerdirCreated #2925550: Fields with a manually defined initial value in the schema have an Entity schema definition mismatch after updating to Drupal 8.4