The problem

Database performance degrades with the multiplicative growth in revisions generated by Entity Reference Revisions fields (commonly used for Paragraphs). Each new node revision means a new revision of each paragraph, which means a new revision of each field on the paragraph... Things can get out of hand pretty quickly on sites that aren't even ginormous.

Note that the general problem of revision bloat and the proposed resolution are not specific to ERR or Paragraphs. I have zero hard performance data on hand at the moment, but if it is needed I'm sure it could be gotten.

Not a solution?

On sites with frequent, small changes to entities, we end up with tons of field revision table rows with identical values, their only difference being the revision ID. #2083451: Reconsider the separate field revision data tables aims to improve things by avoiding revisions entirely, which is not my goal.

I'm pretty sure this would be absurd and break some fundamental law of relational databasing, but would it be possible to somehow not duplicate field data when creating a new revision of an entity?

Let me 'splain: In #2297817: Do not attempt field storage write when field content did not change, we made it possible to prevent unnecessary db writes when updating an existing revision with field data that has not changed. But when creating a new revision, we, of course, need a new table row with that revision ID.

But what if we somehow said "Hey, I'm an entity revision collecting all my associated field data OH WAIT THAT FIELD TABLE DOESN'T HAVE AN ENTRY FOR MY REVISION WTF oh hmmm maybe that just means the field data hasn't changed since the last revision... Lemme just grab the data for that field from the last available revision."

E.g., say the current revision of node X is 99, but the latest row in node_revision__field_foo for Node X is 95, so Entity API "magically" grabs that row when populating field data on node X.
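To make the fallback idea concrete, here is a minimal in-memory sketch in Python. It is not Entity API code: the `field_foo_rows` dictionary and the `load_field_value` helper are hypothetical stand-ins for node_revision__field_foo and the field loading step, and the sketch assumes revision IDs for an entity increase over time.

```python
from bisect import bisect_right

# Hypothetical stand-in for node_revision__field_foo:
# entity id -> sorted list of (revision_id, value) rows that were actually written.
field_foo_rows = {
    "X": [(90, "old value"), (95, "current value")],
}


def load_field_value(entity_id, revision_id, rows=field_foo_rows):
    """Return the value from the newest stored row at or before revision_id.

    Assumes revision IDs grow over time for a given entity (an assumption,
    not something the storage layer guarantees).
    """
    entity_rows = rows.get(entity_id, [])
    revision_ids = [rid for rid, _ in entity_rows]
    # Index of the first stored revision greater than the requested one.
    position = bisect_right(revision_ids, revision_id)
    if position == 0:
        return None  # No row exists at or before this revision.
    return entity_rows[position - 1][1]


# Node X is at revision 99, but the newest field row is from revision 95.
print(load_field_value("X", 99))
```

A request for a revision older than any stored row returns `None`, which is one of the edge cases real storage would have to decide how to treat.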

Now, even if what I'm trying to explain makes sense, is there any sort of "magic" we could do in Entity API to make it (a) not suck from a performance standpoint and/or (b) not break lots of unintended things?

Comments

derek.deraps created an issue. See original summary.

tim.plunkett’s picture

From the issue summary:

#2083451: Reconsider the separate field revision data tables aims to improve things by avoiding revisions entirely, which is not my goal.

damienmckenna’s picture

Title: Do Not Create Field Revisions When Field Data Hasn't Changed » Do not create field revisions when field data hasn't changed

Normalized the issue title.

imclean’s picture

E.g., say the current revision of node X is 99, but the latest row in node_revision__field_foo for Node X is 95, so Entity API "magically" grabs that row when populating field data on node X.

A more active approach may be safer. Rather than assuming the latest revision is the correct one, perhaps an index could be kept, at the expense of added complexity. For example, storing the corresponding ER revisions in a separate table.

  1. View node X revision 99
  2. Look up the node_revision__field_foo revision corresponding to node X revision 99 in the revisions table
  3. Show node_revision__field_foo revision 95
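The three steps above could look roughly like this, using Python's sqlite3 module as a stand-in database. The `field_revision_map` table and its column names are assumptions made for illustration; only node_revision__field_foo and the revision numbers come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Field data is written only when the value changes.
    CREATE TABLE node_revision__field_foo (
        entity_id INTEGER, revision_id INTEGER, field_foo_value TEXT
    );
    -- Hypothetical index table: which stored row each entity revision uses.
    CREATE TABLE field_revision_map (
        entity_id INTEGER, revision_id INTEGER,
        field_name TEXT, data_revision_id INTEGER
    );
    INSERT INTO node_revision__field_foo VALUES (1, 95, 'unchanged since 95');
    INSERT INTO field_revision_map VALUES (1, 99, 'field_foo', 95);
""")


def load_field_foo(entity_id, revision_id):
    """Resolve the mapped field revision, then read that row (steps 2 and 3)."""
    return conn.execute(
        """
        SELECT f.revision_id, f.field_foo_value
        FROM field_revision_map AS m
        JOIN node_revision__field_foo AS f
          ON f.entity_id = m.entity_id
         AND f.revision_id = m.data_revision_id
        WHERE m.entity_id = ? AND m.revision_id = ?
          AND m.field_name = 'field_foo'
        """,
        (entity_id, revision_id),
    ).fetchone()


# Step 1: view node 1 at revision 99; the data comes from revision 95.
print(load_field_foo(1, 99))
```

Because the mapping is explicit, nothing here depends on revision IDs being ordered; the cost is the extra table and join.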

I'm not sure of a use case, but it would also allow any revision to be linked to any ER revision. Probably not a lot of use with Paragraphs but it might be useful for other entity types.

For example, we've built a system which tracks components for manufacturing a certain product. Each component has sub-components, actions and other attributes which go into it. In addition to some quite deep nesting, when a top-level "product" is selected for manufacture, the product entity and all its sub-entities are cloned so they can be modified, if required, for a specific production run. Most of the time, most of the products aren't modified, so we don't really need to keep a complete copy of everything.

Edit: New revisions are created when the product is modified. A component may have a slight change depending on available parts but the sub-components (and all their sub-components) might not change at all.

hawkeye.twolf’s picture

Hmm, that could work! So, @imclean, the revisions table would have columns for:

  1. entity revision id (99, in our example)
  2. field name (field_foo)
  3. entity revision id with the actual field data (95, in our example)

I like the sureness of it, though it does mean one more table, more joins, etc. It might be good to do some performance testing of the extra join vs. something like

SELECT MAX(revision_id) FROM `node_revision__field_foo` WHERE entity_id = X AND revision_id <= 99

(where 99 is the node revision id for which we want the field data)
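For anyone who wants to try the no-extra-table lookup, here is a small runnable sketch using Python's sqlite3 module. It assumes the intent is the newest stored field row at or before the requested node revision, so it uses MAX and filters by entity; the `latest_field_revision` helper and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node_revision__field_foo (
        entity_id INTEGER, revision_id INTEGER, field_foo_value TEXT
    );
    INSERT INTO node_revision__field_foo VALUES
        (1, 90, 'first'), (1, 95, 'second'), (2, 97, 'other node');
""")


def latest_field_revision(entity_id, revision_id):
    """Return the newest stored field revision ID at or before revision_id."""
    (found,) = conn.execute(
        """
        SELECT MAX(revision_id)
        FROM node_revision__field_foo
        WHERE entity_id = ? AND revision_id <= ?
        """,
        (entity_id, revision_id),
    ).fetchone()
    return found  # None when no row qualifies.


print(latest_field_revision(1, 99))
```

Without the entity filter, the row for node 2 at revision 97 would be picked up for node 1, so any real version of this query would need that condition as well.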

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

geek-merlin’s picture

Interesting. Alas, we must not make any assumptions about revisions being sequential.

So in the end we want multiple entity revisions to be *able* to use a single field revision row.
This needs a field-item-vid, and an intermediate entity-vid:field-item-vid table, or, even simpler, storing the field-vid reference with the entity revision.
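A rough Python sketch of that shape, with no database involved: field rows get their own version ID, and each entity revision stores a reference to the field row it uses. The `FieldStore` class and its method names are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class FieldStore:
    """Hypothetical storage where field rows carry their own version ID."""

    field_rows: dict = field(default_factory=dict)     # field vid -> value
    entity_fields: dict = field(default_factory=dict)  # (entity vid, field name) -> field vid
    _next_vid: int = 1

    def save_revision(self, entity_vid, field_name, value, previous_entity_vid=None):
        """Record an entity revision, reusing the field row if the value is unchanged."""
        previous = self.entity_fields.get((previous_entity_vid, field_name))
        if previous is not None and self.field_rows[previous] == value:
            field_vid = previous  # Unchanged: point at the existing row.
        else:
            field_vid = self._next_vid
            self._next_vid += 1
            self.field_rows[field_vid] = value
        self.entity_fields[(entity_vid, field_name)] = field_vid
        return field_vid

    def load(self, entity_vid, field_name):
        """Follow the stored reference; no ordering of entity vids is assumed."""
        return self.field_rows[self.entity_fields[(entity_vid, field_name)]]


store = FieldStore()
store.save_revision(95, "field_foo", "hello")
store.save_revision(99, "field_foo", "hello", previous_entity_vid=95)
# Entity revision 42 comes after 99 here, showing IDs need not be sequential.
store.save_revision(42, "field_foo", "changed", previous_entity_vid=99)
print(store.load(99, "field_foo"), len(store.field_rows))
```

Three entity revisions end up sharing two field rows, and lookups never compare revision IDs against each other.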

Having pluggable entity and field storage, this can be developed as alternative storages.

But even more exciting: Once field item revisions get a vid, we can give field items an id, and both a uuid, and implement EntityInterface. Boom we have fieldable fields without the fieldcollection/paragraph hacks.

geek-merlin’s picture

Wow. What I proposed in #10 has already been built (with an intermediate table) in #2957425: Allow the inline creation of non-reusable Custom Blocks in the layout builder.

matsbla’s picture

Issue tags: +Performance

handkerchief’s picture

@axel.rutz, what exactly does this mean for further progress towards the goal of this issue?

geek-merlin’s picture

#13: Someone(tm) might want to pick up these ideas and implement a field storage as outlined. Unfortunately it does not look like I'll have the bandwidth to do so.

Version: 8.7.x-dev » 8.8.x-dev

Version: 8.8.x-dev » 8.9.x-dev

Version: 8.9.x-dev » 9.1.x-dev

Version: 9.1.x-dev » 9.2.x-dev

Version: 9.2.x-dev » 9.3.x-dev

bbombachini’s picture

I'm having an issue migrating D7 paragraphs to D9. Because the node has a revision, the migration queries the revision table instead of the data table, and since the paragraphs haven't been updated since the node was created, there's no information in the revision table. That means the paragraph fields are "empty" on my migration row. So I think that choosing not to write revisions when the field data doesn't change is tricky and can cause the kind of issue I'm having.

Version: 9.3.x-dev » 9.4.x-dev

Version: 9.4.x-dev » 9.5.x-dev

Version: 9.5.x-dev » 10.1.x-dev

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.
