Splitted off from #937442: Field type modules cannot maintain their field schema (field type schema change C(R)UD is needed):
Relatively complex problem space; the below list of problems are depending on each other, so it does not make sense to split them into separate issues. The resolution needs to take all of them into account.
Problem
- The serialized
{field_config}.data
column contains values it should not contain. - field_update_field() relies on mistakenly duplicated ("cached") field schema information in
{field_config}.data
.
Goal
- Only store intended data in
$field['data']
. - Require to pass explicit field update information to field_update_field().
Details
- Various functions in Field API contain an attempt to extract (arbitrary) field data that needs to be serialized into the
{field_config}.data
column:
// The serialized 'data' column contains everything from $field that does not // have its own column and is not automatically populated when the field is // read. $data = $field; unset($data['columns'], $data['field_name'], $data['type'], $data['locked'], $data['module'], $data['cardinality'], $data['active'], $data['deleted']); // Additionally, do not save the 'bundles' property populated by // field_info_field(). unset($data['bundles']);
This code is repeated in various places throughout Field module. It does not properly remove
- all {field_config} table column keys
- field storage engine keys (settings)
- the 'data' key itself
- field schema information keys ('columns', 'primary key', 'unique keys', 'indexes', 'foreign keys')
- field info properties
Technically, the current code serializes anything that happens to be set on a $field and is not explicitly unset() into $field['data'].
Most probably, we need to flip the logic around and only serialize certain keys (first and foremost, $field['settings']). Alternatively, we could introduce a special $field['data_serialize'] key that contains a list of $field properties that should be serialized. So in case a module needs its additional keys to be serialized into the data column, it has to add the corresponding $field keys to the list in $field['data_serialize']. (This reminds me a bit of the $user->data problem... the "proper" fix for that was to keep 'data' in 'data', instead of unserializing and tacking it onto the top-level $field array.)
- field_update_field() currently has only one function argument, $field, which is taken as base to update a field. The code automatically retrieves the current field configuration and the currently defined field schema to build two variables out of that:
$prior_field
and$field
. Those two variables are passed along to the responsible field storage engine to perform the required updates.Now it gets weird:
- To update a field, you would have to pass the old field structure in $field. Because field_update_field() invokes hook_field_schema() to retrieve the current schema for $field. So passing the old structure is technically the only way to have a difference between $prior_field and $field.
- But right in the first line, field_update_field() does
$prior_field = field_read_field($field['field_name']);
, so the passed $field is supposed to be the new field structure. $prior_field is effectively read from the current database information via field_read_field(). However, $prior_field therefore does not contain field schema information. So $prior_field and $field are always identical... - ...if there was not the circumstance that a field's schema information keys are not removed from the serialized
{field_config}.data
column. This means that the field schema is partially cached in the serialized data and therefore partially tacked back onto the $prior_field, because field_read_fields() only populates (overwrites)$field['columns']
, but not$field['indexes']
or any other field schema key.This is the only reason why tests are passing currently. The current field_sql_storage tests assert that indexes can be updated. That only works, because $field['indexes'] is cached in $field['data'] and is not reset with current field schema information by field_read_fields(). In turn, $prior_field and $field are effectively different in field_update_field().
I think there are only two valid ways out:
- Change field_update_field($field) into field_update_field($prior_field).
So field_update_field() would automatically retrieve the current structure, but $prior_field must be properly and correctly prepared by the caller. While this proposal probably makes sense for module updates, it makes not really sense for runtime operations; i.e., modules passing a new, wanted $field structure, perhaps merely containing new $field['settings'], not really caring for the current and previous field's schema or whatnot. So I guess this proposal makes not much sense.
- Add a second argument; i.e., field_update_field($field, $prior_field = NULL).
Only if $prior_field is not set, we automatically retrieve and set it (like now), because that means that only the field configuration needs to be updated. To perform schema updates, $prior_field must contain the previous field structure, so that storage engines can properly update it.
Comment | File | Size | Author |
---|---|---|---|
#11 | drupal.field-config-data.11.patch | 28.56 KB | sun |
#9 | drupal.field-config-data.8.patch | 4.05 KB | sun |
#7 | drupal.field-config-data.7.patch | 4.09 KB | sun |
#1 | 937554-1_cleanup_field_data.patch | 2.75 KB | alex_b |
Comments
Comment #1
alex_b CreditAttribution: alex_b commentedAttached patch is split out from sun's patch on #937442: Field type modules cannot maintain their field schema (field type schema change C(R)UD is needed).
It does a more comprehensive cleanup of keys before saving.
However, I don't follow where 'field_update_field() relies on bogus data values'. Specifically:
I believe you mean field storage information, and that information _is_ present in $prior_field (see field_read_fields()).
Downgrading this for now. Seems to me rather like a 'cleanup' issue than a critical bug. I may be missing sth here.
Comment #2
chx CreditAttribution: chx commentedComment #3
yched CreditAttribution: yched commentedsubs for now
Comment #4
yched CreditAttribution: yched commented- Agreed on the code structure - more maintainable.
- Agreed on the 'translatable' column, currently not removed from the 'data' blob
- 'storage_type', 'storage_module', 'storage_active' and 'data' are {field_config} columns too, so they fall under the 1st category ?
Note that I considered relying on {field_config} schema a couple times here, but that sounds a bit scary.
- About 'storage' : that's the only place where $field['storage']['settings'] are stored. We can't ditch them.
Powered by Dreditor.
Comment #5
yched CreditAttribution: yched commentedHmm - we take care of adding those entries if missing at write time in field_create_field() (also true for 'settings' found in $instance arrays - instance settings, widget settings, formatter settings), so there should be no need to initialize them on read ?
Powered by Dreditor.
Comment #6
sunActually, I spent some time thinking through this in the past days, and I think we should at least *try* whether the following, reversed proposal is doable:
Upon read, unserialize {field_config}.data on $field, like it is now. But upon save, only move $field['settings'] into $field['data']['settings'], and only serialize $field['data']. See what breaks.
In fact, this is the identical problem we had with {users}.data, which led to similar problems and even security issues. Therefore, the only clean way to prevent bad things from happening would be to not unserialize onto $field at all, but always keep $field['data'] in $field['data']. However, I'm quite sure that many people will veto this change.
Comment #7
sunSo let's simply try + see what breaks.
I'm totally sure this breaks, but then again, that would be just another argument for doing a dedicated $field['data'] (like $user->data); i.e., without overloading on $field.
Comment #8
olobank CreditAttribution: olobank commentedi get this error message:
Fatal error: Call to undefined function content_types() in /home/jobbergu/public_html/modules/country/country_select/country_select.module on line 53. what did i do wrong?
Comment #9
sunerm, ok, serializing twice definitely breaks more :P
Comment #11
sunAlright, so while that approach would be doable, it would really be a good idea to keep $field['data'] simply in $field['data'].
Ugh. I just see that the same problem exists for field instances, widgets, and formatters.
Attached patch is merely to show what I mean.
Comment #12
sunForgot to change status. Also, this bug is major at minimum, given that we already serialize and store bogus data, and also, given that all of us already know the consequences of this code pattern from $user->data.
Comment #14
sunThe current code in HEAD
1) relies on certain but arbitrary circumstances to work.
2) uses a pattern that is commonly known to break and lead to security issues.
Therefore, bumping to critical and tagging as release blocker. Both points are sufficiently described in the opening issue description.
--
The fact that #11 failed probably means that further functionality, unrelated to plain field settings, currently relies on the behavior that everything and anything gets serialized into {field_config}.data.
Comment #15
catchThe problem with users.data was that we serialized the password field which had previously been kept out of that column by hard-coded lists of things to exclude from it. That was specific to user module, field config doesn't have highly sensitive information that's hashed in one column and has to be manually kept out of another, so I don't see why the 'security' flag gets raised here at all.
In general, while this is a bit messy, I really can't see any reason to justify it being critical, so I'm bumping down to 'major'. I'd want to see how this renders Field API "unusable" before seeing this bumped back up, and there's no sign of that.
Comment #16
sunAs explained in the OP, field_update_field() relies on mistakenly serialized and therefore "cached" field schema information to do its job. This "cache" obviously fails too quickly and too easily.
Furthermore, the current logic of field_update_field() implies that schema updates would be as simple as comparing an old schema to a new schema. If schema updates would be that simple, we would not remotely consider to use hook_module_N() anywhere else.
Therefore, the current code is not "messy". It is fundamentally flawed.
Sure, it is not "unusable". But the result of any field_update_field() operation is arbitrary. The same applies to field instances, widgets, and formatters, which all happen to use the same serialization logic.
So if a crucial API that's responsible for data storage doing "whatever" is not critical, then I'm losing my last bits of hope for sanity with regard to our issue priorities.
Comment #17
catchThe field API doesn't support changes to field schema, see http://drupal.org/node/937442#comment-3576322 which is now marked as a documentation issue. Since the schema update you speak of will not be happening during Drupal 7, I see no reason the D7 release should be blocked on something which might theoretically affect that at an indeterminate point in the future. That doesn't stop it being fixed before the release comes out, but we are past the point of blocking the release on things we think are bad and might possibly cause problems in certain situations that aren't clearly defined.
We cleared up a lot of stupid mess in D7, we will have to clear up mess in D8 too, this should not come as a shock. I am losing all hope and sanity of there ever being a Drupal 7 release because of constant and perpetual scope creep. I suggest reading, then re-reading http://drupal.org/node/45111.
Comment #18
sun.core CreditAttribution: sun.core commented#11: drupal.field-config-data.11.patch queued for re-testing.
Comment #20
chx CreditAttribution: chx commentedIf this is as bad as users.data then it's certainly normal as we were doing that for an uncounted number of releases... that was quicksketch's take on this one and mine is similar -- I brought up with him because I failed to see the major-ity of this.
Comment #21
swentel CreditAttribution: swentel commentedField API is now in config, schema is not in there anymore. Moving to 7.x, although not sure whether we can easily (and want?) to fix that there ?
Comment #22
Dave ReidIndexes (which is schema) are still stored in the exported config, which causes errors. Still an issue with 8.x. See new related issues.
Comment #31
catchThis is still valid.
Things to check:
1. Do we actually read 'indexes' from field config anywhere? If not could we just delete/stop populating the key entirely?
2. If we do read from 'indexes' does this ever affect the actual field table schema?
3. Where else is the indexes information retrieved from (i.e. when a field is installed from config, where are the indexes for the schema definition pulled from).
Comment #34
larowlan1. yep see 2. And we have FieldSqlStorageTest for verifying you can modify the indexes
2,
\Drupal\field\Entity\FieldStorageConfig::getSchema
adds them to the DB schema3. I think 2 answers that 🤔