Problem/Motivation
We use serialized columns in many places in core where JSON encoded arrays could be used. This can lead to security vulnerabilities in some contexts and we should move away from it if possible.
Where it is not possible, we should use the "allowed_classes" option for unserialize to reduce risk: https://www.php.net/manual/en/function.unserialize.php
Proposed resolution
Audit core to find instances of serialized columns, and determine where moving to JSON is possible.
In this issue or a follow up:
1. Deprecate SerializedColumnNormalizerTrait and friends in favor of killing unserializes
2. Write a PSA to tell people to fix their custom normalizers that use unserialize, and call out unserialize generally as probably a bad practice
Remaining tasks
See above.
User interface changes
None.
API changes
TBD.
Data model changes
TBD.
Release notes snippet
TBD.
Comments
Comment #2
samuel.mortensonComment #3
berdirOne problem is that apparently people actually have classes stored in commerce order data for example, and apparently even link fields (language objects), no idea what to do about that.
Another is that the transition is obviously going to be challenging, can we just migrate away and then only dela with json, or do we still somehow need to accept the old data..
Comment #4
johnwebdev commentedBesides security, many database management systems allows you to query the JSON data type. That would be a pretty cool and useful feature, although that would involve a lot of more work :-)
Comment #5
mradcliffeAgreed. I saw this and followed it to maybe get that supported as well.
At the moment blocking that would be officially supporting JSON data type in schema for postgresql: #2472709: PostgreSQL: Support JSON data type. I started moving that forward slightly today with a patch to up the versions for testing on drupalci #3019693: Upgrade test running to Postgres 9.5 since that blocks #2846994: Increase minimum version requirement for Postgres to 10 and require the pg_trgm extension and #2472709: PostgreSQL: Support JSON data type.
I think that this could move forward by setting it in a text data type, which would need to happen anyway for SQLite.
Comment #6
xjmPromoting this since it would mitigate many potential RCE problems.
I'd propose splitting this into two issues, one for auditing
unserialize()and addingallowed_classeswherever we can (which will be simple technically although we might run into issues like #3045173: MapItem unserialize function in setValue method should allow TranslatableMarkup class). Changing storages is a bigger change. I think we should pursue both strategies for defense-in-depth.Comment #7
wim leersThis is why I don't see how we can remove serialized columns (
'serialize' => TRUE) right away. I think we'd need to first deprecate that in 8.8 (and perhaps provide alternative infrastructure that usesjson_encode()instead ofserialize()), so we could remove it in 9.0. That gives contrib modules some time to update.The most challenging update path will AFAICT be field data. Field types in core using
'serialize' => TRUE:linktest_object_field(test-only)layout_sectionserialized_item_test(test-only)There's more in contrib of course.
Comment #8
wim leersComment #9
samuel.mortensonComment #10
berdirNote: We did such a switch in TMGMT for our data in #3065597: Convert data storage to JSON, fix PostgreSQL compatibility.
This is one of those weird cases where we use a regular string_long field and then store serialized data in there, without ever showing it in a widget or so.
One learning there was that at least for that use case, it wasn't really possibly to provide BC, specifically also because of the serialization tricks/features that we have in place to protect such fields now. Initial patches did that by checking the first character of the string on load to either call unserialize() or json_decode(). But we switched to an update function that automatically updated all data instead.
Then we could just drop that special serialize property definition and now if someone uses these entities in rest/jsonapi then they simply get a JSON-string as value in those fields.. not very pretty but perfectly safe. Otherwise it would have still been possible to submit serialized data, have it stored and then executing code when loading it again.
For us, that's fine because we never considered the raw stored data to be an API, but if we start to support json instead of serialized strings in our field types/storage, then that's going to be much more complicated as Wim said in #7. This goes in a similar direction as the timestamp year-2038 problem that we discussed in Slack some days ago.
Comment #11
xjmComment #13
ghost of drupal pastField types in core using 'serialize' => TRUE:
I added map so that it's together
Comment #14
geek-merlinMind that json_encode will break for binary (non-unicode) strings.
Comment #17
andypostThe
mapsub-types should allow custom serialization (jsonb, msgpack, igbinary, toml) and probably at Schema API level custom fields need ability to controlserialize()on columnsComment #18
ghost of drupal pastGiven my idea at https://www.drupal.org/files/issues/2020-07-11/3145563_48.patch I wonder whether this should be pushed to the database layer -- or is that too raw?
Comment #19
andypostThere was XML suggestion earlier #2162161: Change DatabaseStorage to use XML instead of serialize
Comment #20
ghost of drupal pastI closed that issue -- it was filed in 2013 and at the time it was a decent idea (I can't remember whether it was me or Damien who came up with it but that's moot now) but since 2015 all three of our databases have JSON capabilities (OK, PostgreSQL had JSON earlier but JSONB only happened 2014 Dec) so it's moot.
It'd be fantastic to not only do this issue but also move field storage into JSON. One can dream.
Comment #26
bradjones1Coming here from #2232427: Allow field types to control how properties are mapped to and from storage as part of introducing JSON data storage. Also am abstracting out all the various calls to
unserialize()inSqlContentEntityStorage, which makes it pretty easy to support an additional key/keys in the property definition alongside serialize to specifyallowed_classes.Comment #27
wim leersComment #28
catchI think it would be good to provide an alternative to MapItem like JsonMapItem. Do we need #3343634: Add "json" as core data type to Schema and Database API for that?