Problem/Motivation
At present we store component inputs in a single JSON blob.
This means we cannot query components efficiently. #3521202: Store XB field type's "deps_*" columns in separate table to allow efficient querying does at least let us identify which entities use which plugins/components, but it provides no path for updates. For example, if a block changes its settings, we have to loop over every revision, search for the affected components, and then rewrite the whole blob.
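To make the update cost concrete, here is a minimal sketch of that loop in Python with SQLite. The table `component_tree`, the column `inputs`, the component IDs and the settings are all hypothetical stand-ins, not XB's actual schema; the point is only that every revision's blob has to be decoded, and every matching blob rewritten in full.

```python
import json
import sqlite3


def update_block_setting_in_blobs(conn, component_id, key, new_value):
    """Rewrite every revision blob that contains the given component."""
    rewritten = 0
    # No way to filter in SQL: every blob has to be loaded and decoded.
    rows = conn.execute("SELECT revision_id, inputs FROM component_tree").fetchall()
    for revision_id, blob in rows:
        tree = json.loads(blob)
        changed = False
        for instance in tree:
            if instance["component"] == component_id:
                instance["settings"][key] = new_value
                changed = True
        if changed:
            # The whole blob is written back, not just the changed setting.
            conn.execute(
                "UPDATE component_tree SET inputs = ? WHERE revision_id = ?",
                (json.dumps(tree), revision_id),
            )
            rewritten += 1
    return rewritten


def demo_blob_update():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE component_tree (revision_id INTEGER PRIMARY KEY, inputs TEXT)"
    )
    # Hypothetical sample data: three revisions, two of which use the block.
    blobs = {
        1: [{"component": "block.menu", "settings": {"depth": 1}}],
        2: [{"component": "sdc.hero", "settings": {"title": "Hi"}}],
        3: [
            {"component": "block.menu", "settings": {"depth": 1}},
            {"component": "sdc.hero", "settings": {"title": "Yo"}},
        ],
    }
    for revision_id, tree in blobs.items():
        conn.execute(
            "INSERT INTO component_tree VALUES (?, ?)", (revision_id, json.dumps(tree))
        )
    return update_block_setting_in_blobs(conn, "block.menu", "depth", 2)
```

In this sample, all three blobs are decoded even though only two contain the block.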
Proposed resolution
Store the data normalized, in the same way that we currently do for Field API fields in core. But instead of one table per field (prop), we would have one table per component version (set of fields).
I had wondered whether we actually need two config entity types at all - i.e. could field union directly use a component config entity type instead of using its own, or could XB directly use field unions without an extra entity type in-between, but... no idea whether that would even be desirable even if it's possible.
I think we don't need a lot of the complexity of the Field Union module; it would instead be better to borrow the concept of field-type derivatives from Field Union, i.e. we can derive one field-type plugin per component and version.
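As a rough illustration of "one table per component version", here is a Python/SQLite sketch. The naming scheme, the columns and the `menu_block` component are assumptions made up for this example, not anything XB or Field Union defines. Each prop becomes a column, and the table is shared across entity types via an `entity_type` column.

```python
import sqlite3


def table_name(component, version):
    # Hypothetical naming scheme, not taken from XB or Field Union.
    return f"component__{component}__v{version}"


def create_component_table(conn, component, version, props):
    """Create one table for a component version; props maps prop name to SQL type."""
    columns = ", ".join(f"{name} {sql_type}" for name, sql_type in props.items())
    # Table and column names come from trusted definitions here, never user input.
    conn.execute(
        f"CREATE TABLE {table_name(component, version)} ("
        "entity_type TEXT, entity_id INTEGER, revision_id INTEGER, delta INTEGER, "
        f"{columns})"
    )


def demo_normalized_update():
    conn = sqlite3.connect(":memory:")
    create_component_table(
        conn, "menu_block", 1, {"depth": "INTEGER", "label": "TEXT"}
    )
    table = table_name("menu_block", 1)
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("node", 1, 1, 0, 1, "Main"),
            ("node", 1, 2, 0, 1, "Main"),
            ("media", 7, 9, 0, 1, "Footer"),
        ],
    )
    # A settings change becomes a single UPDATE across all entity types.
    cursor = conn.execute(f"UPDATE {table} SET depth = 2 WHERE depth = 1")
    return cursor.rowcount
```

With this layout the update no longer needs to decode anything: one statement touches every row that uses the component version.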
Spike outcomes, some of these may be split into separate child stories:
- Evaluate if this is even feasible
- Try to do it in a storage layer that supports one table per component (version), not one per component version per entity type (as is the case with fields in core)
- Explore if we can do it without requiring field definitions for each component (field derivative). Requiring them would bloat the field map and lead to performance issues
- Explore the impact on the number of tables and joins this will entail - we can expect there might be up to 50 different component types in a given site, possibly more. We will likely also need to store versions of components in separate tables if new props are added or data-types change. So there might be as many as 100 tables. That's assuming we can reuse the same table across multiple entity-types. If we instead have one table per component version per entity-type, that number multiplies accordingly.
- Explore decorating SqlContentEntityStorage and the storage handlers that extend from it to support loading of this data in a single query during standard entity load even though we're not making use of standard fields here
- Explore what views integration would look like
- Explore nested field definitions for object and array shape data
- Explore making this something component source plugins control as it doesn't apply to all source plugins
- Explore what this would look like for e.g. Block settings that, whilst modeled using config schema (and therefore typed data), are arbitrary in shape and would traditionally be stored in a serialized column
- Consider the implications for API consumers like JSON:API
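The table-count estimate in the list above can be sketched as simple arithmetic. The figure of 50 component types comes from the summary; two versions per component is only an assumption that reproduces the "as many as 100 tables" estimate, and the three entity types in the usage note are purely illustrative.

```python
def estimate_tables(component_types, versions_per_component,
                    entity_types=1, shared_across_entity_types=True):
    """Estimate how many storage tables the normalized approach would need."""
    tables = component_types * versions_per_component
    if not shared_across_entity_types:
        # One table per component version per entity type, as with core fields.
        tables *= entity_types
    return tables
```

For example, `estimate_tables(50, 2)` gives 100 when tables are shared across entity types, while `estimate_tables(50, 2, entity_types=3, shared_across_entity_types=False)` gives 300.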
Comments
Comment #2
larowlan
Comment #3
larowlan
Comment #4
wim leers
This potential direction is why I prioritized #3467870: Support `{type: array, …}` prop shapes and got it finished, because I have a hard time seeing how this would work with multi-value fields, especially given #3052670: Support multi-valued "field union"s.
So, to avoid us adopting this and potentially losing multi-value support, I made sure #3467870: Support `{type: array, …}` prop shapes was working. It proves that multi-value scalars (`type: array, items: { type: integer }`; see the `sparkline` test SDC) and multi-value object shapes (see the `image-gallery` test SDC) can work in the current architecture.
(I'm not fundamentally opposed to this, just concerned we'd forget about that, and now we can't! 👍)
Related: I tried to push #3467890 forward and assigned it to you at #3467890-13: [later phase] Support `{type: object, …}` prop shapes with single level that require *multiple* field types: use `field_union`? — OUT OF SCOPE: nested components/component reuse for feedback, @larowlan 😄
Comment #5
wim leers
Indeed. And for that, we have #3501708: Prove that it *will* be possible to apply block settings update paths (assuming #3521221 in core) to stored XB component trees in config/content.
Comment #6
wim leers
🤯 That could easily be hundreds of DB tables: a site can easily have 100 components (the issue summary assumes 50), and multiple versions of each. (Note that "version" here is a massively overloaded term: there can be very different reasons for a new version. See #3523841-6: Versioned Component config entities (SDC, JS: prop_field_definitions, block: default_setting, all: slots for fallback) + component instances refer to versions ⇒ less data to store per XB field row.)
⚠️ Concern: this would make #3463996: [META] When the field type, storage/instance settings, widget, expression or requiredness for an SDC/code component prop changes, the Content Creator must be able to upgrade much harder. What if a site with a million existing revisions decides to implement `hook_storage_prop_shape_alter()` to change the field type for a prop of an SDC that is present in all of them (to improve the authoring experience, or to switch from plain images to Media Library, or $REASON)? This architecture would require rows to be removed from one table and moved into another!
Although I think it could be argued that that would be much clearer. It'd also allow dropping tables for older "component versions" that don't have any remaining rows anymore, and would also allow removing the corresponding entries in the `Component` config entity that #3523841: Versioned Component config entities (SDC, JS: prop_field_definitions, block: default_setting, all: slots for fallback) + component instances refer to versions ⇒ less data to store per XB field row would've added.
🤔 Not sure yet, but for sure interesting 😄
I don't see yet how that'd be meaningful. Views lists things of the same type in a single list/grid/table/…. But here those same things (instances of the same component version) are spread across many entities and bear no relation to one another. Unless you're thinking about listing all the different component instances of a single entity? Or something else still? But listing the first or fifth or Nth instance of some component still is not meaningful?
I struggle to follow your thinking here 😇
I'm really curious about this part 🧐
Comment #7
catch
@Wim #3468272: Store the ComponentTreeStructure field property one row per component instance is (I think, still catching up on the latest issues a bit) a row-per-component with a single JSON column for the values in a single table, so it would be mutually exclusive with this issue.
For me, having multiple tables, or multiple rows for a single delta, feels like it would be incredibly complex both from the point of view of having to adapt all SQL storage backends to support it, and also for views integration.
However, row-per-component with a JSON column would simplify dependency checking, updates, potentially things like revision compression etc., and might well be useful for #3462219: [META] Support alternative renderings of prop data added for the 'full' view mode such as for search indexing or newsletters too. Views integration feels like a very low priority because the data is arbitrary, as you say.
I have on occasion added listing filters with CONTAINS on the body field or similar on sites that otherwise don't use the search module, when the dataset is small enough that it won't kill the database. There might be the odd case like that, but I don't think there will be many.
I could see wanting to list entities that are using component x - that would be easy to do with row-per-component because it doesn't rely on the values. e.g. you could list all articles that have an image gallery in them, things like that.
A JSON column would make views integration (at least for the values if not other things like component) dependent on #3343634: Add "json" as core data type to Schema and Database API, but that feels like a reasonable limitation to me. No matter how complicated it might be, it is definitely going to be less complicated than views integration for the current JSON blob with everything in it, and it might be less complicated than supporting a fully relational schema here too.
So for me personally, I would postpone this issue on #3468272: Store the ComponentTreeStructure field property one row per component instance, and if that one works out, then this might not be very necessary to explore.
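To illustrate the "list entities that are using component x" case with row-per-component storage, here is a minimal Python/SQLite sketch. The table `component_instance`, its columns and the component IDs are hypothetical, not the schema from #3468272. The query filters on the component column only and never has to look inside the JSON values.

```python
import json
import sqlite3


def entities_using(conn, component):
    """Return the IDs of entities that contain at least one instance of component."""
    cursor = conn.execute(
        "SELECT DISTINCT entity_id FROM component_instance "
        "WHERE component = ? ORDER BY entity_id",
        (component,),
    )
    return [row[0] for row in cursor]


def demo_row_per_component():
    conn = sqlite3.connect(":memory:")
    # One row per component instance; the values live in a single JSON column.
    conn.execute(
        "CREATE TABLE component_instance ("
        "entity_id INTEGER, revision_id INTEGER, delta INTEGER, "
        "component TEXT, inputs TEXT)"
    )
    rows = [
        (1, 1, 0, "sdc.image-gallery", json.dumps({"images": [10, 11]})),
        (1, 1, 1, "sdc.hero", json.dumps({"title": "Hello"})),
        (2, 5, 0, "sdc.hero", json.dumps({"title": "World"})),
        (3, 8, 0, "sdc.image-gallery", json.dumps({"images": [12]})),
    ]
    conn.executemany("INSERT INTO component_instance VALUES (?, ?, ?, ?, ?)", rows)
    return entities_using(conn, "sdc.image-gallery")
```

Filtering on the values themselves would still need JSON support in the database layer, which is the dependency on #3343634 mentioned above.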
Comment #8
wim leers
Agreed! Reflecting that in the issue metadata 👍
Comment #9
wim leers
#3468272: Store the ComponentTreeStructure field property one row per component instance is in.
#3523841: Versioned Component config entities (SDC, JS: prop_field_definitions, block: default_setting, all: slots for fallback) + component instances refer to versions ⇒ less data to store per XB field row is actively being worked on and will pave the path for this.
So keeping the issue status the same. 👍
Comment #10
wim leers
I'm working to update #3520449: [META] Production-ready data storage to be comprehensive. But this isn't linked yet from there. So I needed to dig deeper than #9.
So on second thought, I wondered how this was still relevant after #3468272: Store the ComponentTreeStructure field property one row per component instance. @catch wrote in #7:
Beautiful. That's exactly what I think. And the "field union metadata" aspects of this proposed spike are actually covered by #3523841: Versioned Component config entities (SDC, JS: prop_field_definitions, block: default_setting, all: slots for fallback) + component instances refer to versions ⇒ less data to store per XB field row, as I wrote in #9.
So: closing :)