Problem/Motivation

We use serialized columns in many places in core where JSON encoded arrays could be used. This can lead to security vulnerabilities in some contexts and we should move away from it if possible.

Where it is not possible, we should use the "allowed_classes" option for unserialize to reduce risk: https://www.php.net/manual/en/function.unserialize.php

Proposed resolution

Audit core to find instances of serialized columns, and determine where moving to JSON is possible.

In this issue or a follow up:

1. Deprecate SerializedColumnNormalizerTrait and friends in favor of killing unserializes
2. Write a PSA to tell people to fix their custom normalizers that use unserialize, and call out unserialize generally as probably a bad practice

Remaining tasks

See above.

User interface changes

None.

API changes

TBD.

Data model changes

TBD.

Release notes snippet

TBD.

Comments

samuel.mortenson created an issue. See original summary.

samuel.mortenson’s picture

Title: Move from serialized columns to JSON encoded data wherever possible » Move from serialized columns to JSON encoded data wherever possible, or use allowed_classes
Issue summary: View changes
berdir’s picture

One problem is that apparently people actually have classes stored in commerce order data for example, and apparently even link fields (language objects), no idea what to do about that.

Another is that the transition is obviously going to be challenging, can we just migrate away and then only dela with json, or do we still somehow need to accept the old data..

johnwebdev’s picture

Besides security, many database management systems allows you to query the JSON data type. That would be a pretty cool and useful feature, although that would involve a lot of more work :-)

mradcliffe’s picture

many database management systems allows you to query the JSON data type.

Agreed. I saw this and followed it to maybe get that supported as well.

At the moment blocking that would be officially supporting JSON data type in schema for postgresql: #2472709: PostgreSQL: Support JSON data type. I started moving that forward slightly today with a patch to up the versions for testing on drupalci #3019693: Upgrade test running to Postgres 9.5 since that blocks #2846994: Increase minimum version requirement for Postgres to 10 and require the pg_trgm extension and #2472709: PostgreSQL: Support JSON data type.

I think that this could move forward by setting it in a text data type, which would need to happen anyway for SQLite.

xjm’s picture

Promoting this since it would mitigate many potential RCE problems.

I'd propose splitting this into two issues, one for auditing unserialize() and adding allowed_classes wherever we can (which will be simple technically although we might run into issues like #3045173: MapItem unserialize function in setValue method should allow TranslatableMarkup class). Changing storages is a bigger change. I think we should pursue both strategies for defense-in-depth.

wim leers’s picture

Issue tags: +Entity Field API

One problem is that apparently people actually have classes stored in commerce order data for example, and apparently even link fields (language objects), no idea what to do about that.

This is why I don't see how we can remove serialized columns ('serialize' => TRUE) right away. I think we'd need to first deprecate that in 8.8 (and perhaps provide alternative infrastructure that uses json_encode() instead of serialize()), so we could remove it in 9.0. That gives contrib modules some time to update.

The most challenging update path will AFAICT be field data. Field types in core using 'serialize' => TRUE:

  • link
  • test_object_field (test-only)
  • layout_section
  • serialized_item_test (test-only)

There's more in contrib of course.

wim leers’s picture

samuel.mortenson’s picture

Issue summary: View changes
berdir’s picture

Note: We did such a switch in TMGMT for our data in #3065597: Convert data storage to JSON, fix PostgreSQL compatibility.

This is one of those weird cases where we use a regular string_long field and then store serialized data in there, without ever showing it in a widget or so.

One learning there was that at least for that use case, it wasn't really possibly to provide BC, specifically also because of the serialization tricks/features that we have in place to protect such fields now. Initial patches did that by checking the first character of the string on load to either call unserialize() or json_decode(). But we switched to an update function that automatically updated all data instead.

Then we could just drop that special serialize property definition and now if someone uses these entities in rest/jsonapi then they simply get a JSON-string as value in those fields.. not very pretty but perfectly safe. Otherwise it would have still been possible to submit serialized data, have it stored and then executing code when loading it again.

For us, that's fine because we never considered the raw stored data to be an API, but if we start to support json instead of serialized strings in our field types/storage, then that's going to be much more complicated as Wim said in #7. This goes in a similar direction as the timestamp year-2038 problem that we discussed in Slack some days ago.

xjm’s picture

Issue tags: +mwds2019

 

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

ghost of drupal past’s picture

Field types in core using 'serialize' => TRUE:

  1. link
  2. test_object_field (test-only)
  3. layout_section
  4. serialized_item_test (test-only)
  5. map

I added map so that it's together

geek-merlin’s picture

Mind that json_encode will break for binary (non-unicode) strings.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

andypost’s picture

The map sub-types should allow custom serialization (jsonb, msgpack, igbinary, toml) and probably at Schema API level custom fields need ability to control serialize() on columns

ghost of drupal past’s picture

Given my idea at https://www.drupal.org/files/issues/2020-07-11/3145563_48.patch I wonder whether this should be pushed to the database layer -- or is that too raw?

andypost’s picture

ghost of drupal past’s picture

I closed that issue -- it was filed in 2013 and at the time it was a decent idea (I can't remember whether it was me or Damien who came up with it but that's moot now) but since 2015 all three of our databases have JSON capabilities (OK, PostgreSQL had JSON earlier but JSONB only happened 2014 Dec) so it's moot.

It'd be fantastic to not only do this issue but also move field storage into JSON. One can dream.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

bradjones1’s picture

Coming here from #2232427: Allow field types to control how properties are mapped to and from storage as part of introducing JSON data storage. Also am abstracting out all the various calls to unserialize() in SqlContentEntityStorage, which makes it pretty easy to support an additional key/keys in the property definition alongside serialize to specify allowed_classes.

wim leers’s picture

catch’s picture

I think it would be good to provide an alternative to MapItem like JsonMapItem. Do we need #3343634: Add "json" as core data type to Schema and Database API for that?

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.