Splitted off from #937442: Field type modules cannot maintain their field schema (field type schema change C(R)UD is needed):

Relatively complex problem space; the below list of problems are depending on each other, so it does not make sense to split them into separate issues. The resolution needs to take all of them into account.

Problem

  • The serialized {field_config}.data column contains values it should not contain.
  • field_update_field() relies on mistakenly duplicated ("cached") field schema information in {field_config}.data.

Goal

  • Only store intended data in $field['data'].
  • Require to pass explicit field update information to field_update_field().

Details

  • Various functions in Field API contain an attempt to extract (arbitrary) field data that needs to be serialized into the {field_config}.data column:
      // The serialized 'data' column contains everything from $field that does not
      // have its own column and is not automatically populated when the field is
      // read.
      $data = $field;
      unset($data['columns'], $data['field_name'], $data['type'], $data['locked'], $data['module'], $data['cardinality'], $data['active'], $data['deleted']);
      // Additionally, do not save the 'bundles' property populated by
      // field_info_field().
      unset($data['bundles']);
    

    This code is repeated in various places throughout Field module. It does not properly remove

    • all {field_config} table column keys
    • field storage engine keys (settings)
    • the 'data' key itself
    • field schema information keys ('columns', 'primary key', 'unique keys', 'indexes', 'foreign keys')
    • field info properties

    Technically, the current code serializes anything that happens to be set on a $field and is not explicitly unset() into $field['data'].

    Most probably, we need to flip the logic around and only serialize certain keys (first and foremost, $field['settings']). Alternatively, we could introduce a special $field['data_serialize'] key that contains a list of $field properties that should be serialized. So in case a module needs its additional keys to be serialized into the data column, it has to add the corresponding $field keys to the list in $field['data_serialize']. (This reminds me a bit of the $user->data problem... the "proper" fix for that was to keep 'data' in 'data', instead of unserializing and tacking it onto the top-level $field array.)

  • field_update_field() currently has only one function argument, $field, which is taken as base to update a field. The code automatically retrieves the current field configuration and the currently defined field schema to build two variables out of that: $prior_field and $field. Those two variables are passed along to the responsible field storage engine to perform the required updates.

    Now it gets weird:

    • To update a field, you would have to pass the old field structure in $field. Because field_update_field() invokes hook_field_schema() to retrieve the current schema for $field. So passing the old structure is technically the only way to have a difference between $prior_field and $field.
    • But right in the first line, field_update_field() does $prior_field = field_read_field($field['field_name']);, so the passed $field is supposed to be the new field structure. $prior_field is effectively read from the current database information via field_read_field(). However, $prior_field therefore does not contain field schema information. So $prior_field and $field are always identical...
    • ...if there was not the circumstance that a field's schema information keys are not removed from the serialized {field_config}.data column. This means that the field schema is partially cached in the serialized data and therefore partially tacked back onto the $prior_field, because field_read_fields() only populates (overwrites) $field['columns'], but not $field['indexes'] or any other field schema key.

      This is the only reason why tests are passing currently. The current field_sql_storage tests assert that indexes can be updated. That only works, because $field['indexes'] is cached in $field['data'] and is not reset with current field schema information by field_read_fields(). In turn, $prior_field and $field are effectively different in field_update_field().

    I think there are only two valid ways out:

    1. Change field_update_field($field) into field_update_field($prior_field).

      So field_update_field() would automatically retrieve the current structure, but $prior_field must be properly and correctly prepared by the caller. While this proposal probably makes sense for module updates, it makes not really sense for runtime operations; i.e., modules passing a new, wanted $field structure, perhaps merely containing new $field['settings'], not really caring for the current and previous field's schema or whatnot. So I guess this proposal makes not much sense.

    2. Add a second argument; i.e., field_update_field($field, $prior_field = NULL).

      Only if $prior_field is not set, we automatically retrieve and set it (like now), because that means that only the field configuration needs to be updated. To perform schema updates, $prior_field must contain the previous field structure, so that storage engines can properly update it.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

alex_b’s picture

Priority: Critical » Normal
FileSize
2.75 KB

Attached patch is split out from sun's patch on #937442: Field type modules cannot maintain their field schema (field type schema change C(R)UD is needed).

It does a more comprehensive cleanup of keys before saving.

However, I don't follow where 'field_update_field() relies on bogus data values'. Specifically:

However, $prior_field therefore does not contain field schema information.

I believe you mean field storage information, and that information _is_ present in $prior_field (see field_read_fields()).

Downgrading this for now. Seems to me rather like a 'cleanup' issue than a critical bug. I may be missing sth here.

chx’s picture

Issue tags: +Needs tests
yched’s picture

subs for now

yched’s picture

- Agreed on the code structure - more maintainable.
- Agreed on the 'translatable' column, currently not removed from the 'data' blob

+++ modules/field/field.crud.inc	10 Oct 2010 17:50:12 -0000
@@ -465,12 +465,25 @@ function field_update_field($field) {
+    // Ignore {field_config} keys.
+    'id', 'field_name', 'type', 'module', 'active', 'locked', 'cardinality',
+    'translatable', 'deleted',
+    // Ignore all possible storage engine keys.
+    'storage', 'storage_type', 'storage_module', 'storage_active',
+    // Ignore the data key itself.
+    'data',

- 'storage_type', 'storage_module', 'storage_active' and 'data' are {field_config} columns too, so they fall under the 1st category ?
Note that I considered relying on {field_config} schema a couple times here, but that sounds a bit scary.

- About 'storage' : that's the only place where $field['storage']['settings'] are stored. We can't ditch them.

Powered by Dreditor.

yched’s picture

+++ modules/field/field.info.inc	10 Oct 2010 17:50:12 -0000
@@ -255,7 +255,9 @@ function _field_info_collate_fields($res
 function _field_info_prepare_field($field) {
...
+  $field += array('settings' => array());
   $field['settings'] += field_info_field_settings($field['type']);
+  $field['storage'] += array('settings' => array());

Hmm - we take care of adding those entries if missing at write time in field_create_field() (also true for 'settings' found in $instance arrays - instance settings, widget settings, formatter settings), so there should be no need to initialize them on read ?

Powered by Dreditor.

sun’s picture

Actually, I spent some time thinking through this in the past days, and I think we should at least *try* whether the following, reversed proposal is doable:

Upon read, unserialize {field_config}.data on $field, like it is now. But upon save, only move $field['settings'] into $field['data']['settings'], and only serialize $field['data']. See what breaks.

In fact, this is the identical problem we had with {users}.data, which led to similar problems and even security issues. Therefore, the only clean way to prevent bad things from happening would be to not unserialize onto $field at all, but always keep $field['data'] in $field['data']. However, I'm quite sure that many people will veto this change.

sun’s picture

So let's simply try + see what breaks.

I'm totally sure this breaks, but then again, that would be just another argument for doing a dedicated $field['data'] (like $user->data); i.e., without overloading on $field.

olobank’s picture

Title: field_update_field() relies on bogus {field_config}.data values » country select module
Version: 7.x-dev » 6.19

i get this error message:
Fatal error: Call to undefined function content_types() in /home/jobbergu/public_html/modules/country/country_select/country_select.module on line 53. what did i do wrong?

sun’s picture

Title: country select module » field_update_field() relies on bogus {field_config}.data values
Version: 6.19 » 7.x-dev
FileSize
4.05 KB

erm, ok, serializing twice definitely breaks more :P

Status: Needs review » Needs work

The last submitted patch, drupal.field-config-data.8.patch, failed testing.

sun’s picture

Alright, so while that approach would be doable, it would really be a good idea to keep $field['data'] simply in $field['data'].

Ugh. I just see that the same problem exists for field instances, widgets, and formatters.

Attached patch is merely to show what I mean.

sun’s picture

Priority: Normal » Major
Status: Needs work » Needs review

Forgot to change status. Also, this bug is major at minimum, given that we already serialize and store bogus data, and also, given that all of us already know the consequences of this code pattern from $user->data.

Status: Needs review » Needs work

The last submitted patch, drupal.field-config-data.11.patch, failed testing.

sun’s picture

Assigned: sun » Unassigned
Priority: Major » Critical
Issue tags: +Release blocker

The current code in HEAD

1) relies on certain but arbitrary circumstances to work.

2) uses a pattern that is commonly known to break and lead to security issues.

Therefore, bumping to critical and tagging as release blocker. Both points are sufficiently described in the opening issue description.

--
The fact that #11 failed probably means that further functionality, unrelated to plain field settings, currently relies on the behavior that everything and anything gets serialized into {field_config}.data.

catch’s picture

Priority: Critical » Major

The problem with users.data was that we serialized the password field which had previously been kept out of that column by hard-coded lists of things to exclude from it. That was specific to user module, field config doesn't have highly sensitive information that's hashed in one column and has to be manually kept out of another, so I don't see why the 'security' flag gets raised here at all.

In general, while this is a bit messy, I really can't see any reason to justify it being critical, so I'm bumping down to 'major'. I'd want to see how this renders Field API "unusable" before seeing this bumped back up, and there's no sign of that.

sun’s picture

As explained in the OP, field_update_field() relies on mistakenly serialized and therefore "cached" field schema information to do its job. This "cache" obviously fails too quickly and too easily.

Furthermore, the current logic of field_update_field() implies that schema updates would be as simple as comparing an old schema to a new schema. If schema updates would be that simple, we would not remotely consider to use hook_module_N() anywhere else.

Therefore, the current code is not "messy". It is fundamentally flawed.

Sure, it is not "unusable". But the result of any field_update_field() operation is arbitrary. The same applies to field instances, widgets, and formatters, which all happen to use the same serialization logic.

So if a crucial API that's responsible for data storage doing "whatever" is not critical, then I'm losing my last bits of hope for sanity with regard to our issue priorities.

catch’s picture

The field API doesn't support changes to field schema, see http://drupal.org/node/937442#comment-3576322 which is now marked as a documentation issue. Since the schema update you speak of will not be happening during Drupal 7, I see no reason the D7 release should be blocked on something which might theoretically affect that at an indeterminate point in the future. That doesn't stop it being fixed before the release comes out, but we are past the point of blocking the release on things we think are bad and might possibly cause problems in certain situations that aren't clearly defined.

We cleared up a lot of stupid mess in D7, we will have to clear up mess in D8 too, this should not come as a shock. I am losing all hope and sanity of there ever being a Drupal 7 release because of constant and perpetual scope creep. I suggest reading, then re-reading http://drupal.org/node/45111.

sun.core’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests, -Release blocker

#11: drupal.field-config-data.11.patch queued for re-testing.

Status: Needs review » Needs work
Issue tags: +Needs tests, +Release blocker

The last submitted patch, drupal.field-config-data.11.patch, failed testing.

chx’s picture

Version: 7.x-dev » 8.x-dev
Priority: Major » Normal
Issue tags: -Release blocker

If this is as bad as users.data then it's certainly normal as we were doing that for an uncounted number of releases... that was quicksketch's take on this one and mine is similar -- I brought up with him because I failed to see the major-ity of this.

swentel’s picture

Version: 8.x-dev » 7.x-dev

Field API is now in config, schema is not in there anymore. Moving to 7.x, although not sure whether we can easily (and want?) to fix that there ?

Dave Reid’s picture

Indexes (which is schema) are still stored in the exported config, which causes errors. Still an issue with 8.x. See new related issues.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

catch’s picture

Title: field_update_field() relies on bogus {field_config}.data values » Field storage config entities 'indexes' key
Version: 8.9.x-dev » 9.2.x-dev
Issue tags: +Bug Smash Initiative, +Needs issue summary update

This is still valid.

Things to check:

1. Do we actually read 'indexes' from field config anywhere? If not could we just delete/stop populating the key entirely?

2. If we do read from 'indexes' does this ever affect the actual field table schema?

3. Where else is the indexes information retrieved from (i.e. when a field is installed from config, where are the indexes for the schema definition pulled from).

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.1.10 (June 4, 2021) and Drupal 9.2.10 (November 24, 2021) were the last bugfix releases of those minor version series. Drupal 9 bug reports should be targeted for the 9.3.x-dev branch from now on, and new development or disruptive changes should be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.15 was released on June 1st, 2022 and is the final full bugfix release for the Drupal 9.3.x series. Drupal 9.3.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.4.x-dev branch from now on, and new development or disruptive changes should be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

larowlan’s picture

1. yep see 2. And we have FieldSqlStorageTest for verifying you can modify the indexes
2, \Drupal\field\Entity\FieldStorageConfig::getSchema adds them to the DB schema
3. I think 2 answers that 🤔

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.9 was released on December 7, 2022 and is the final full bugfix release for the Drupal 9.4.x series. Drupal 9.4.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.5.x-dev branch from now on, and new development or disruptive changes should be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.