Closed (fixed)
Project:
Experience Builder
Version:
0.x-dev
Component:
Data model
Priority:
Critical
Category:
Task
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
15 Aug 2024 at 08:08 UTC
Updated:
10 Jun 2025 at 16:39 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #2
lauriiiWould it be possible to document what are the benefits and downsides of using a flat table over a JSON blob? Based on the issue summary, it isn't clear to me what are the trade-offs involved in this decision.
Comment #4
wim leersThoughts:
\Drupal\Core\TypedData\PrimitiveInterfaceif that data type is used for a field property — AFAICT)The only exception I can think of (right now, in the little time I have this morning while writing this up to ensure we follow through!): the
PathItemfield type, which essentially is a computed field type that stores its data in another database table and does the necessary additional DB queries.Maybe that can work? That maybe actually gets closer to @effulgentsia's original proposal at #3440578: [PP-2] JSON-based data storage proposal for component-based page building, where he proposed to do some deduplication stuff that most of us wanted to defer to later?
We could then literally have a single DB table for all XB uses, by expanding the columns from
(@longwave's comment)
to:
… which would not violate the
3.2.4 Facilitating `component props` changessection indocs/data-model.md.Comment #5
longwaveOne benefit is this becomes trivial:
@Wim Leers additional note: we also have to consider langcode for asymmetric translations
Comment #6
catch@lauriii One of the advantages here is it would (at least partially) solve the same problem of non-portable JSON queries (which are already in the XB code base) that #3457504: XB field type: calculate all dependencies, store them, surface in new Component "Audit" operation is also trying to remove - same general problem as #5.
Comment #7
catchI'm not sure exactly which data isn't fitting in a single column, but could only that column be JSON (storing a flat array of whatever doesn't fit) without undermining the goal of the issue?
Comment #8
wim leers#5++
#6++
#7: an XB field must store a tree of components + sources for their props values. This issue proposes to store that tree as a list, so not as a single column in a single row, but as multiple columns in multiple rows. That's what I was trying to convey in #4, and is also why I pointed to @effulgentsia's "deduplication" proposal (we could implement this by storing a single value for each XB field revision's tree, with that value pointing to a foreign key in another table, where @longwave's proposed columns could then be used).
Comment #9
effulgentsia commentedI think this proposal makes sense. My initial thinking behind JSON for this was thinking we wanted to store it as a tree. But since we've already changed that to a flat JSON representation, I don't think JSON for this part has any special value.
I think what we could do is store this as a multi-valued field. If we want to keep tree and props as a single field, we could do that by each item containing the following properties/columns:
component_instance_id, component, parent, slot, delta_in_slot, props, where that last one is the JSON props source/expression/values for that component instance.However, at some point, I think it will make sense for us to model this as two fields: in other words, pull the
propscolumn out of the above suggestion and make it its own separate field from the field representing the tree. Not sure if we want to jump to the two field implementation as part of this issue or continue to keep it as one field until we separately discuss the pros and cons of two fields vs one.Comment #10
johnwebdev commented#4
I'm not sure that would work with revisions and sync/async translations?
Comment #11
quietone commentedRemoved duplicate related item.
Comment #12
effulgentsia commentedI incorporated this into #3477428: [PP-1] Refactor (or decide not to) the XB field type to be multi-valued, to de-jsonify the tree, and to reference the field_union type of the prop values.
Comment #13
larowlanRepurposing this as a spike to explore storing the tree in a flat structure and unblocking #3523842: Spike: Explore storing a hash lookup of the ComponentInputs field property: one hash per component instance which would basically be the proposal in comment #9 here, but instead of
propswe'd store the lookup hashAdditionally there's scope to use CTEs here like we do in Entity hierarchy v5 to retrieve an ordered tree in a single SQL query should we need it. That module has views integration to let people do things like 'is child of' and 'is parent of' in a single query - which might be attractive for issues like #3462219: [META] Support alternative renderings of prop data added for the 'full' view mode such as for search indexing or newsletters
Comment #14
larowlanComment #15
larowlanComment #16
larowlanPicking this spike up
Comment #17
wim leers👏🙏
Complication not yet discussed here: #3519352: Content templates, part 3b: store exposed slot subtrees on individual entities. But perhaps it's worth first doing this issue without complicating it with that aspect?
Comment #18
wim leersNote: we could choose to tackle #3495625: Remove ComponentTreeItemList::ROOT_UUID from hydration and client-to-server conversion as part of this, by equating
\Drupal\experience_builder\Plugin\DataType\ComponentTreeStructure::ROOT_UUIDwithNULLfor theparentcolumn.I defer to @larowlan.
Comment #19
wim leersComment #20
wim leersComment #21
wim leersThis would also unblock #3520517: [PP-1] ComponentTreeItem should automatically purge subtrees for slots that are no longer exposed for
ContentTemplatesupport.Comment #22
catchI think it's worth doing this first, but having #3519352: Content templates, part 3b: store exposed slot subtrees on individual entities in mind while doing it.
There are probably two ways that issue could be done, assuming row-per-component is implemented here:
1. Keep a single field, but add a column for 'slot' (or whatever else is necessary), meaning deltas would span across multiple slots.
2. Add a separate field for each exposed slot.
If those are indeed the two options and there is not some other third option, then they're both compatible with this issue (I think), so it should be fine to do this first without having to pick one in advance.
Comment #23
wim leersI intend to get to the bottom of that one tomorrow, but it is reassuring to read you do not anticipate problems with this implementation order 😊
Comment #24
wim leersJust caught up on #3519352 — high-level responses at #3519352-38: Content templates, part 3b: store exposed slot subtrees on individual entities.
Makes me want to #22+++++++++++++ even more than I could've anticipated 😄 — as in: "do this first, but keep #3519352 in mind".
Comment #25
larowlanPushed up some work in progress. Will continue with the TranslationTest next as that's the big unknown
Comment #26
larowlancross post
Comment #28
larowlanDraft MR up, should be passing linting
I think now is good time to check in and decide if we want to get this mergeable
Referring back to the spike objectives
Assigning to Wim to get a +1 keep going before going any further.
Comment #29
wim leersAwesome! Thank you for summarizing the current status as well 🙏😊 Excited to review this!
Comment #30
lauriiiThese seem potential benefits of this approach rather than downsides 🤔 Have we considered what are the downsides with this approach? How does this impact performance, flexibility, storage requirements, and developer experience?
I'm also curious, how does having one row per instance help with prop translation? Might be worth explaining the change in more detail in the issue summary because I feel like I'm missing something.
Comment #31
wim leersThis bit is specifically what I wanted @larowlan and @effulgentsia to discuss. Per #28, they did discuss that, and @larowlan got the (a)symmetrical translation support working as he mentions in #28.1.
I'd be fine with explaining that in the updated
docs/data-model.mdrather than the issue summary. I think @larowlan just hasn't gotten to that point yet 😊Comment #32
wim leersLinking right parent.
Comment #33
catchOn performance, are already additional rows to write on entity saves due to #3521202: Store XB field type's "deps_*" columns in separate table to allow efficient querying, which denormalizes some of the existing information in the JSON field. I think this MR (or a follow-up) will make that completely unnecessary because there's already a column with the component in it. So it should actually be the same, or similar, amounts of database rows with less duplication overall. Even if there are more rows, writing additional rows in a transaction shouldn't make a lot of difference. Reading will get the rows in a single query so should be neutral. Parsing the JSON should be done behind the entity cache ideally, so that should also be neutral once the entity is cached. Or even if not it'll be behind render caching.
For storage requirements, it should help compatibility with #3469082: Add way to "intern" large field item values to reduce database size by 10x to 100x for sites with many entity revisions and/or languages because instead of a single large field that could have a single character change, it will be possible to intern per-component - e.g. when there's a long text input. Otherwise a change to any part of the tree would preclude interning. If/when we add revision pruning outside XB, it may be possible to do something like only prune revisions when they are not removing the last remaining instance of an 'interned' value, so that the full history of changes to individual fields can be preserved, just thought about this now and not sure if it will hold up, but it would be a nice bonus if so. That definitely wouldn't be possible when all the content is in the single JSON blob.
Any kind of auditing or bulk operation on components should be easier - e.g. to find all entities using a specific component, you can query the component column (including from custom views admin listings or similar), this means no bespoke integration with #3521202: Store XB field type's "deps_*" columns in separate table to allow efficient querying to worry about and no JSON queries if that issue hadn't been done.
For DX, if someone is looking around the database tables, it will be a more familiar structure than a JSON blob + separate denormalized dependency lookup table. Apart from looking in the database or working on XB itself, it should be transparent anyway.
Views queries on prop values will still be hard, because that will need to query the JSON column, which will run into all the same issues discussed on #3521202: Store XB field type's "deps_*" columns in separate table to allow efficient querying with regard to JSON indexes, cross database support etc.. However, it will be a simpler JSON structure to query than with the current schema, so can only make that better rather than worse. I don't think views queries on values for this field are a high or even medium priority, and approaches like #3523846: [PP-1] Spike: Explore storing component inputs in separate columns (aka field union), while they would avoid JSON, introduce a lot of cross-table JOINs which can have their own problems and complexity, and would require special handling in views too. So for me the single JSON column with values is the right balance.
It will make things easier for MongoDB, because MongoDB can just store this in jsonb (along with everything else for the entity/revision), and use it natively instead of having to deal with the duplication and custom schema in #3521202: Store XB field type's "deps_*" columns in separate table to allow efficient querying. So while it moves the main field away from a document model, the overall XB database schema moves closer to one again - back to only field storage without custom relational SQL layers on top.
For flexibility, the actual change should be neutral in terms of XB itself. However when it's combined with #3523841: Versioned Component config entities (SDC, JS: prop_field_definitions, block: default_setting, all: slots for fallback) + component instances refer to versions ⇒ less data to store per XB field row we are getting very close to a 'dynamic field union' or 'no entity paragraphs' data model which could potentially be re-usable or adaptable outside of XB itself - e.g. for structured, multi-column fields (think a CD content type with a track listing field, where the track listing has song title, duration, musician entity references etc., or IMDB-style TV show episode listings with duration and actors). Those structured fields can then be incorporated into XB layouts too. Without the dynamic field union, we will either see people continue to use paragraphs or similar models, or try to shoe-horn those kinds of things into dynamic slots even if they should be structured fields sometimes.
This is a long list of benefits or neutral effects, with no actual downsides, but I think that's right.
Comment #34
wim leersThe big bet we made with XB >1 year ago was that #3343634: Add "json" as core data type to Schema and Database API would get fixed, which with the last core minor release (11.2) prior to XB 1.0 in the final stages now, is guaranteed to not happen.
So the primary goal of this MR: decomposing the 2 big JSON blobs per revision (
treeandinputs) that we've been discussing for >1 year in many places, most recently in #3457504: XB field type: calculate all dependencies, store them, surface in new Component "Audit" operation. This MR must do it in a manner where:The approach @larowlan went with is option 1 that @catch listed in #22.
Reviewed this entire
71 files, +1908, -1680MR 😄Posted 67 remarks, of which 19 were trivial comment nits that I already applied. The remainder are either observations to help the next reviewer (ℹ️), excitement/admiration/applause (🤩) or questions 😄
Light bulb moments
While reviewing, these were the most important "aha!" moments:
require_all_groups_for_translation, which is used by the image field type already 👍ValidParentAndSlotConstraintValidatorallows trivially adding support forContentTemplates: there's already a@todo, the remaining work to make it fully validatable is trivial, and makes #3519352: Content templates, part 3b: store exposed slot subtrees on individual entities completely obsolete AFAICT! This MR's introduction of storing every component instance separately, with an optionalparent_uuidand optionalslotis what makes this easy (instead of very hard in HEAD) 👍IOW: this does address the meta (#3520449)'s .
ComponentInputsdata type) that would be trivial to implement would be sufficient to achieve JSON:API read/write support for content entities with XB fields! Which means XB would be able to leap ahead of LB on that front already, because #2942975: [PP-1] Expose Layout Builder data to REST and JSON:API remains unsolved. The difference is that XB started with strict schema + validation, so the big "ensure the normalization can be interpreted by front-end developers" problem is already solved. (It'd even comply with the recommended best practice at #2926507: Discourage @FieldType-level normalizers, encourage @DataType-level normalizers, to strengthen the API-First ecosystem.)EDIT: that's what @larowlan independently concluded in #28.2! 👍
Issues we could close if this lands
AFAICT this allows us to close, @larowlan should confirm:
Questions
AFAICT this MR does not make that neither more difficult nor easier to achieve. In HEAD, it would have required some extra per-component instance metadata. Here, it would, too. In HEAD, it'd probably have had to live in the data contained in the
ComponentInputsdata type. With this MR in, that would still have to be the case.So I think this MR is neutral for that.
EDIT: that seems to match what @larowlan independently wrote in #28.3 👍
Conclusion
A very strong "+1, yes please! 😄" to @larowlan's .
Comment #35
catchI think option #1 vs #2 from #22 is still to be determined in #3519352: Content templates, part 3b: store exposed slot subtrees on individual entities. I think there might be some confusion between 'slot' and 'exposed slot' in my original comment either between me and Wim in #34 or possibly just by me.
If we did option #1, I was thinking we would probably have to add an additional 'exposed slot' column to map the row to the exposed slot in #3519352: Content templates, part 3b: store exposed slot subtrees on individual entities (in that issue, not this one). e.g. all the data for all the exposed slots in one field instance.
If we did option #2, then the schema added here would be unchanged, instead each exposed slot would have a field instance.
Or am I misunderstanding and this already implements #1 because slot === exposed slot in this context? If so the confusion is all mine.
Comment #36
wim leers@catch in #33:
Good reassurance, this I'm less familiar with 🙏👍
Exactly!
Wow, great insight — that would be amazing! 🤩
Exactly! IOW: this would
basically solvemake #3522953: [later phase] Make Component `audit` operation performant/scalable simpler, and would allow us omittingconfig: experience_builder.component.*dependencies from #3521202: Store XB field type's "deps_*" columns in separate table to allow efficient querying'scomponent_tree_depenciestable. Because that's essentially duplicating that same bit of information now.💯 — this MR single-handedly makes XB much easier to wrap your head around. Much becomes deducable.
Indeed! To be more precise:
That's what I'm starting work on right now! 🤓🥳
💯
@catch in #35:
You're right that that choice is not finalized in this MR. But this bit in
ValidParentAndSlotConstraintValidatordefinitely lean towards option #22.1 aka storing an exposed slot name in theslotfield property/column of an XB field item:Which at least I interpreted as @larowlan intending to implement it this way in this MR, which would make almost true: it's not quite implemented yet, but AFAICT @larowlan intends it to be.
This is why I tried to deduce how @larowlan intends to implement this and the required code would indeed be trivial.
Perhaps we're both missing something? :D
Comment #37
catchThere's some background in #3217783: Configuration management performance regression - slow config:import. I don't think we were actively making use of this behaviour (or necessarily knew about it at all) until recently, for entities it's been like this since we added transaction support to core. Either way the write to one field table happens in a single query anyway (just double checked), so it may even behave exactly the same before/after in terms of write performance or at least very similarly.
Comment #38
wim leersComment #39
lauriiiComment #40
wim leersThe spike has proven itself, as I already said in my review in #34, but especially now that it's green!
Excited to review this
85 files, +3644, -3221diff in depth tomorrow! 😄@larowlan indicated he'd be doing a self-review first thing tomorrow, so leaving assigned to him for that. 👍
Comment #41
larowlanDid another pass and was able to
* Remove the new UniqueConstraint validator in favour of putting it into ComponentTreeStructure
* Noticed we weren't using ComponentTreeStructure on field default values, was tricky, but was able to add that
* Made validation property paths consistently use dot notations e.g.
something.1.uuidinstead of a mix of that and square brackets -something[1][uuid]* Another pass on docs
Added comments to the MR to help reviewers
This is ready for reviews now
Comment #42
wim leers🥳🤓
Comment #43
wim leersI think the true test of a well-designed system (besides adoption, vibrant ecosystem, etc. — all of which by definition cannot yet exist) is:
IOW: when it seems so evident that the data and logic is structured in this particular way, that it's hard to imagine any other way.
I think XB is now entering that territory with this MR. 🤓🤩
It's amazing to see how this single (massive!) MR completely transforms the most essential part of XB (the way it stores component trees in both content entities as fields and in config entities), and we can be confident that it'll A) work (thanks to tests), B) scale to many different uses (thanks to the 3 different config entity types that use it, and kinda 4).
If it were so evident that this was the right approach, we would've written it like this at the very start. But we didn't. It required iteration. And the uses, the tests, the validation logic all needed to be structured in a sufficiently sensible manner for this massive refactor (by a single (very capable) person over the course of a few days!!!) to be achievable.
IMHO this is the biggest achievement yet of the team working on XB's back end. And it's thanks to literally all of those people having contributed — including all the front end work that uses that back end and that end-to-end tests are exercising the back end through the front end.
See the remaining open threads — they highlight some of the crucial improvements.
Bonus delight: @larowlan's fine test string choices:
🤣
It also allowed me to:
stable blockertag for #3495625-14: Remove ComponentTreeItemList::ROOT_UUID from hydration and client-to-server conversionFinally, this is what a single simple component tree on a content entity now looks like in the DB:
… and that
component_idcolumn will evolve in the next issue @larowlan is working on: #3523841: Versioned Component config entities (SDC, JS: prop_field_definitions, block: default_setting, all: slots for fallback) + component instances refer to versions ⇒ less data to store per XB field row.Comment #45
wim leersComment #46
larowlanAwesome to see this land!
Full credit to Jim Morrison poetry for the test strings.
There's a minor issue here https://git.drupalcode.org/project/experience_builder/-/merge_requests/1... around DX consistency from a late change, I've reopened this to revert that (separate MR). Creating that now
Comment #48
larowlanOpened a follow up for a couple of things spotted in review
Comment #50
wim leersThanks for the posthumous review/confirmation of the tweaks I made to your epic MR, and merged the follow-up 👍 😊
Comment #51
wim leersThis somehow introduced a regression for
default_content's export functionality: #3526814: Default content exports of component trees are invalid and hence are not correct after importingWhich would've been far less painful for @justafish to run into if the validation this MR introduced caught the problematic bits, created #3526818: Tighten validation of `parent_uuid` and `slot` on XB fields to match the strictness of config for that.