The problem
Database performance degrades with the exponential increase in revisions generated by Entity Reference Revisions fields (commonly used for Paragraphs). Each new node revision means a new revision of each paragraph, which means a new revision of each field on the paragraph... Things can get out of hand pretty quickly on sites that aren't even ginormous.
Note that the general problem of revision bloat and the proposed resolution are not specific to ERR or Paragraphs. I have zero hard performance data on hand at the moment, but if that is needed I'm sure it could be gotten.
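For a rough feel of how the rows multiply, here is a back-of-the-envelope sketch. The function name and all the numbers are hypothetical, and it assumes the behaviour described above: every node revision re-revisions every paragraph, and every paragraph revision writes one row per (single-value) field.

```python
def field_revision_rows(node_revisions: int, paragraphs: int, fields_per_paragraph: int) -> int:
    """Estimate rows written to paragraph field revision tables for one node.

    Assumption: each node revision creates a new revision of every paragraph,
    and each paragraph revision writes one row per single-value field.
    """
    return node_revisions * paragraphs * fields_per_paragraph


# Hypothetical node: 100 revisions, 20 paragraphs, 5 fields on each paragraph.
rows = field_revision_rows(node_revisions=100, paragraphs=20, fields_per_paragraph=5)
print(rows)  # 10000
```

Nested paragraphs would add further factors to the product, so this is a lower bound for deeply nested content.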
Not a solution?
On sites with frequent, small changes to entities, we end up with tons of field revision table rows with identical values, their only difference being the revision ID. #2083451: Reconsider the separate field revision data tables aims to improve things by avoiding revisions entirely, which is not my goal.
I'm pretty sure this would be absurd and break some fundamental law of relational databasing, but would it be possible to somehow not duplicate field data when creating a new revision of an entity?
Let me 'splain: In #2297817: Do not attempt field storage write when field content did not change, we made it possible to prevent unnecessary db writes when updating an existing revision with field data that has not changed. But when creating a new revision, we, of course, need a new table row with that revision ID.
But what if we somehow said "Hey, I'm an entity revision collecting all my associated field data OH WAIT THAT FIELD TABLE DOESN'T HAVE AN ENTRY FOR MY REVISION WTF oh hmmm maybe that just means the field data hasn't changed since the last revision... Lemme just grab the data for that field from the last available revision."
E.g., say the current revision of node X is 99, but the latest row in node_revision__field_foo for node X is 95, so Entity API "magically" grabs that row when populating field data on node X.
Now, even if what I'm trying to explain makes sense, is there any sort of "magic" we could do in Entity API to make it (a) not suck from a performance standpoint and/or (b) not break lots of unintended things?
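A minimal sketch of the fallback idea, using SQLite from Python rather than Drupal's Entity API. The table is a heavily simplified, hypothetical stand-in for node_revision__field_foo (the real table has more columns), and load_field and save_revision are made-up helper names, not existing API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: one single-value field, no langcode/delta.
conn.execute(
    "CREATE TABLE node_revision__field_foo ("
    " entity_id INTEGER, revision_id INTEGER, field_foo_value TEXT,"
    " PRIMARY KEY (entity_id, revision_id))"
)


def load_field(entity_id, revision_id):
    """Return the value from the newest row at or before revision_id, or None."""
    row = conn.execute(
        "SELECT field_foo_value FROM node_revision__field_foo"
        " WHERE entity_id = ? AND revision_id <= ?"
        " ORDER BY revision_id DESC LIMIT 1",
        (entity_id, revision_id),
    ).fetchone()
    return row[0] if row else None


def save_revision(entity_id, revision_id, value):
    """Write a row only if the value differs from what the fallback would load.

    Returns True when a row was written, False when the write was skipped.
    """
    if load_field(entity_id, revision_id) == value:
        return False
    conn.execute(
        "INSERT INTO node_revision__field_foo VALUES (?, ?, ?)",
        (entity_id, revision_id, value),
    )
    return True


# Revision 95 changes the field; revisions 96..99 leave it untouched.
save_revision(1, 95, "hello")
for vid in range(96, 100):
    save_revision(1, vid, "hello")

print(load_field(1, 99))  # hello
```

This sketch assumes revision IDs are ordered so that "latest row at or below my revision" is meaningful, and it cannot tell "field unchanged" apart from "field emptied" without some extra marker row. Both would need answers before anything like this could work for real.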
Comments
Comment #4
giorgio79 commented
Probably a dupe of this old issue :) #2083451: Reconsider the separate field revision data tables
Comment #5
tim.plunkett
From the issue summary:
Comment #6
damienmckenna
Normalized the issue title.
Comment #7
imclean commented
A more active approach may be safer. Rather than assume the latest revision is the correct one, perhaps an index could be kept, at the expense of added complexity. For example, storing the corresponding ER revisions in a separate table.
I'm not sure of a use case, but it would also allow any revision to be linked to any ER revision. Probably not a lot of use with Paragraphs but it might be useful for other entity types.
For example, we've built a system which tracks components for manufacturing a certain product. Each component has sub-components, actions and other attributes which go into it. In addition to some quite deep nesting, when selecting a top level "product" to manufacture the product entity and all its sub-entities are cloned so they can be modified, if required, for a specific production run. Most of the time most of the products aren't modified so we don't really need to keep a complete copy of everything.
Edit: New revisions are created when the product is modified. A component may have a slight change depending on available parts but the sub-components (and all their sub-components) might not change at all.
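The index idea could be sketched roughly like this, again in Python with SQLite rather than Drupal code. The table names (paragraph_revision, host_revision_index), the columns, and both helper functions are hypothetical, and matching paragraphs by position is a simplification; a real implementation would match by paragraph entity ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paragraph_revision (revision_id INTEGER PRIMARY KEY, body TEXT);
-- Hypothetical index: which paragraph revision each host revision points at.
CREATE TABLE host_revision_index (
    host_revision_id INTEGER,
    delta INTEGER,
    target_revision_id INTEGER REFERENCES paragraph_revision(revision_id),
    PRIMARY KEY (host_revision_id, delta)
);
""")


def load_paragraphs(host_vid):
    """Return [(target_revision_id, body), ...] for a host revision, in delta order."""
    return conn.execute(
        "SELECT p.revision_id, p.body FROM host_revision_index i"
        " JOIN paragraph_revision p ON p.revision_id = i.target_revision_id"
        " WHERE i.host_revision_id = ? ORDER BY i.delta",
        (host_vid,),
    ).fetchall()


def save_host_revision(host_vid, bodies, previous_host_vid=None):
    """Index every paragraph for host_vid, reusing unchanged paragraph revisions."""
    previous = load_paragraphs(previous_host_vid) if previous_host_vid is not None else []
    for delta, body in enumerate(bodies):
        if delta < len(previous) and previous[delta][1] == body:
            target_vid = previous[delta][0]  # unchanged: point at the existing revision
        else:
            cur = conn.execute("INSERT INTO paragraph_revision (body) VALUES (?)", (body,))
            target_vid = cur.lastrowid  # changed or new: create a paragraph revision
        conn.execute(
            "INSERT INTO host_revision_index VALUES (?, ?, ?)",
            (host_vid, delta, target_vid),
        )


save_host_revision(1, ["intro", "details"])
save_host_revision(2, ["intro", "details v2"], previous_host_vid=1)

print(load_paragraphs(2))  # [(1, 'intro'), (3, 'details v2')]
```

Only the changed paragraph gets a new revision; the unchanged one is shared by both host revisions, at the cost of one extra join on every load.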
Comment #8
hawkeye.twolf
Hmm, that could work! So, @imclean, the revisions table would have columns for:
field_foo)
I like the sureness of it, though it does mean one more table, more joins, etc. It might be good to do some performance testing of the extra join vs. something like
(where 99 is the node revision id for which we want the field data)
Comment #10
geek-merlin
Interesting. Alas, we must not make any assumptions on revisions being sequential.
So in the end we want that multiple entity revisions *can* use a single field revision row.
This needs a field-item-vid, and an intermediate entity-vid:field-item-vid table, or even simpler, storing the field-vid reference with the entity revision.
Having pluggable entity and field storage, this can be developed as alternative storages.
But even more exciting: Once field item revisions get a vid, we can give field items an id, and both a uuid, and implement EntityInterface. Boom we have fieldable fields without the fieldcollection/paragraph hacks.
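The simpler variant from #10, storing the field-vid reference with the entity revision, might look something like the following sketch. It is Python with SQLite, not a Drupal field storage; the tables, the field_foo_vid column, the parent_vid argument, and the helper names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE field_foo_revision (field_vid INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE node_revision (
    vid INTEGER PRIMARY KEY,
    nid INTEGER,
    -- Hypothetical column: the field revision this entity revision uses.
    field_foo_vid INTEGER REFERENCES field_foo_revision(field_vid)
);
""")


def save_node_revision(vid, nid, value, parent_vid=None):
    """Create node revision vid; share the parent's field row if the value is unchanged."""
    field_vid = None
    if parent_vid is not None:
        row = conn.execute(
            "SELECT f.field_vid, f.value FROM node_revision n"
            " JOIN field_foo_revision f ON f.field_vid = n.field_foo_vid"
            " WHERE n.vid = ?",
            (parent_vid,),
        ).fetchone()
        if row and row[1] == value:
            field_vid = row[0]
    if field_vid is None:
        cur = conn.execute("INSERT INTO field_foo_revision (value) VALUES (?)", (value,))
        field_vid = cur.lastrowid
    conn.execute("INSERT INTO node_revision VALUES (?, ?, ?)", (vid, nid, field_vid))
    return field_vid


def load_field(vid):
    """Return the field value for one node revision via its explicit reference."""
    row = conn.execute(
        "SELECT f.value FROM node_revision n"
        " JOIN field_foo_revision f ON f.field_vid = n.field_foo_vid"
        " WHERE n.vid = ?",
        (vid,),
    ).fetchone()
    return row[0] if row else None


# Revision IDs are deliberately out of order: nothing here relies on sequence.
first = save_node_revision(40, 1, "a")
shared = save_node_revision(12, 1, "a", parent_vid=40)
changed = save_node_revision(41, 1, "b", parent_vid=40)
```

Because every entity revision names its field revision explicitly, several entity revisions can use a single field row without any assumption about the order of revision IDs.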
Comment #11
geek-merlin
Wow. What I proposed in #10 has already been built (with an intermediate table) in #2957425: Allow the inline creation of non-reusable Custom Blocks in the layout builder.
Comment #13
handkerchief
@axel.rutz, what exactly does this mean for further progress towards the goal of this issue?
Comment #14
geek-merlin
#13: Someone(tm) might want to pick up these ideas and implement a field storage as outlined. Unfortunately it does not look like I'll have the bandwidth to do so.
Comment #20
bbombachini
I'm having an issue migrating D7 paragraphs to D9. Because the node has a revision, the migration queries the revision table instead of the data table, and since the paragraphs haven't been updated since the node was created, there is no information in the revision table, so the paragraph fields are "empty" on my migration row. So I think that choosing not to write revisions when the field data doesn't change is tricky and can cause the kind of issue I'm having.