Do not create field revisions when field data hasn't changed [#2960887]

The problem

Database performance degrades as revisions multiply on sites using Entity Reference Revisions (ERR) fields (commonly used for Paragraphs). Each new node revision means a new revision of each paragraph, which in turn means a new row for each field on each paragraph... Things can get out of hand pretty quickly, even on sites that aren't ginormous.

Note that the general problem of revision bloat and the proposed resolution are not specific to ERR or Paragraphs. I have zero hard performance data on hand at the moment, but if that's needed I'm sure it could be gathered.

Not a solution?

On sites with frequent, small changes to entities, we end up with tons of field revision table rows with identical values, their only difference being the revision ID. #2083451: Reconsider the separate field revision data tables aims to improve things by avoiding revisions entirely, which is not my goal.

I'm pretty sure this would be absurd and break some fundamental law of relational databasing, but would it be possible to somehow not duplicate field data when creating a new revision of an entity?

Let me 'splain: In #2297817: Do not attempt field storage write when field content did not change, we made it possible to prevent unnecessary db writes when updating an existing revision with field data that has not changed. But when creating a new revision, we, of course, need a new table row with that revision ID.

But what if we somehow said "Hey, I'm an entity revision collecting all my associated field data OH WAIT THAT FIELD TABLE DOESN'T HAVE AN ENTRY FOR MY REVISION WTF oh hmmm maybe that just means the field data hasn't changed since the last revision... Lemme just grab the data for that field from the last available revision." E.g., say the current revision of node X is 99, but the latest row in node_revision__field_foo for Node X is 95, so Entity API "magically" grabs that row when populating field data on node X.
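To make the idea concrete, here is a rough sketch in plain Python rather than actual Entity API code. Everything in it is an assumption for illustration: the dict stands in for node_revision__field_foo, and load_field and save_field are hypothetical helpers, not existing Drupal functions. It also assumes that a higher revision ID always means a later state of the same entity.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical in-memory stand-in for node_revision__field_foo:
# maps (entity_id, revision_id) -> field value. Only revisions in which
# the field actually changed get a row.
FieldTable = Dict[Tuple[int, int], Any]


def load_field(table: FieldTable, entity_id: int, revision_id: int) -> Optional[Any]:
    """Return the value stored for this revision, or for the newest earlier one."""
    candidates = [
        rev for (eid, rev) in table
        if eid == entity_id and rev <= revision_id
    ]
    if not candidates:
        return None  # No row at or below this revision for this entity.
    return table[(entity_id, max(candidates))]


def save_field(table: FieldTable, entity_id: int, new_revision_id: int,
               previous_revision_id: Optional[int], value: Any) -> bool:
    """Write a row only when the value differs from the previous revision's.

    Returns True if a row was written, False if the write was skipped.
    """
    if previous_revision_id is not None:
        if load_field(table, entity_id, previous_revision_id) == value:
            return False  # Unchanged: reads of the new revision will fall back.
    table[(entity_id, new_revision_id)] = value
    return True


# Node 1: revision 95 sets the field, revision 99 leaves it untouched.
table: FieldTable = {}
save_field(table, 1, 95, None, "hello")
save_field(table, 1, 99, 95, "hello")   # skipped, no new row
value_at_99 = load_field(table, 1, 99)  # falls back to the row for revision 95
```

In this sketch revision 99 never gets its own row, and reading it returns the value written for revision 95.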

Now, even if what I'm trying to explain makes sense, is there any sort of "magic" we could do in Entity API to make it (a) not suck from a performance standpoint and/or (b) not break lots of unintended things?

Comments

Comment #1

14 April 2018 at 03:31

derek.deraps created an issue. See original summary.

Comment #2

hawkeye.twolf

they/them or he/him

English

ᏙᎩᏯᏍᏗ Unalatogiyasdi, Tsalaguwetiyi (Cherokee country)

commented 14 April 2018 at 03:35

Issue summary:

View changes

Comment #3

hawkeye.twolf

they/them or he/him

English

ᏙᎩᏯᏍᏗ Unalatogiyasdi, Tsalaguwetiyi (Cherokee country)

commented 14 April 2018 at 18:50

Comment #4

giorgio79 commented 18 April 2018 at 09:38

Probably dupe of this old issue :) #2083451: Reconsider the separate field revision data tables

Comment #5

tim.plunkett

he/him

English

Philadelphia

commented 18 April 2018 at 11:18

From the issue summary:

#2083451: Reconsider the separate field revision data tables aims to improve things by avoiding revisions entirely, which is not my goal.

Comment #6

damienmckenna

TN, USA

commented 15 May 2018 at 13:56

Title:

Do Not Create Field Revisions When Field Data Hasn't Changed

» Do not create field revisions when field data hasn't changed

Normalized the issue title.

Comment #7

imclean commented 15 May 2018 at 22:43

E.g., say the current revision of node X is 99, but the latest row in node_revision__field_foo for Node X is 95, so Entity API "magically" grabs that row when populating field data on node X.

A more active approach may be safer. Rather than assuming the latest revision is the correct one, perhaps an index could be kept, at the expense of added complexity: for example, by storing the corresponding ER revisions in a separate table.

1. View node X revision 99
2. Look up, in the revisions table, which node_revision__field_foo revision corresponds to node X revision 99
3. Show node_revision__field_foo revision 95

I'm not sure of a use case, but it would also allow any revision to be linked to any ER revision. Probably not a lot of use with Paragraphs but it might be useful for other entity types.

For example, we've built a system which tracks components for manufacturing a certain product. Each component has sub-components, actions and other attributes which go into it. In addition to some quite deep nesting, when selecting a top-level "product" to manufacture, the product entity and all its sub-entities are cloned so they can be modified, if required, for a specific production run. Most of the time most of the products aren't modified, so we don't really need to keep a complete copy of everything.

Edit: New revisions are created when the product is modified. A component may have a slight change depending on available parts but the sub-components (and all their sub-components) might not change at all.

Comment #8

hawkeye.twolf

they/them or he/him

English

ᏙᎩᏯᏍᏗ Unalatogiyasdi, Tsalaguwetiyi (Cherokee country)

commented 17 May 2018 at 00:54

Hmm, that could work! So, @imclean, the revisions table would have columns for:

entity revision id (99, in our example)
field name (field_foo)
entity revision id with the actual field data (95, in our example)

I like the sureness of it, though it does mean one more table, more joins, etc. It might be good to do some performance testing of the extra join vs. something like

SELECT MAX(revision_id) FROM `node_revision__field_foo` WHERE entity_id = X AND revision_id <= 99

(where X is the node id and 99 is the node revision id for which we want the field data)
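For anyone who wants to try the comparison, here is a small sketch of both lookups using Python's sqlite3 module. It is only a starting point for testing, not a benchmark. The field table is heavily simplified (the real one has more columns, such as langcode and delta), and the table name field_revision_map and the column name data_revision_id are made up for this example. The fallback query uses MAX because we want the highest revision ID at or below the target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in for a field revision table.
    CREATE TABLE node_revision__field_foo (
        entity_id INTEGER NOT NULL,
        revision_id INTEGER NOT NULL,
        field_foo_value TEXT,
        PRIMARY KEY (entity_id, revision_id)
    );
    -- Hypothetical lookup table with the three columns listed above.
    CREATE TABLE field_revision_map (
        revision_id INTEGER NOT NULL,
        field_name TEXT NOT NULL,
        data_revision_id INTEGER NOT NULL,
        PRIMARY KEY (revision_id, field_name)
    );
    INSERT INTO node_revision__field_foo VALUES (1, 90, 'old'), (1, 95, 'current');
    INSERT INTO field_revision_map VALUES
        (90, 'field_foo', 90), (95, 'field_foo', 95), (99, 'field_foo', 95);
""")


def load_via_map(revision_id):
    """Explicit lookup: follow the map row to the revision that holds the data."""
    row = conn.execute(
        """
        SELECT f.field_foo_value
        FROM field_revision_map m
        JOIN node_revision__field_foo f ON f.revision_id = m.data_revision_id
        WHERE m.revision_id = ? AND m.field_name = 'field_foo'
        """,
        (revision_id,),
    ).fetchone()
    return row[0] if row else None


def load_via_max(entity_id, revision_id):
    """Implicit lookup: newest field row at or below the requested revision."""
    row = conn.execute(
        """
        SELECT field_foo_value
        FROM node_revision__field_foo
        WHERE entity_id = ?
          AND revision_id = (
              SELECT MAX(revision_id)
              FROM node_revision__field_foo
              WHERE entity_id = ? AND revision_id <= ?
          )
        """,
        (entity_id, entity_id, revision_id),
    ).fetchone()
    return row[0] if row else None
```

With this sample data both load_via_map(99) and load_via_max(1, 99) return the row written for revision 95. Real performance testing would need realistic row counts and the actual database engine.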

Comment #9

17 May 2018 at 00:54

Version:

8.6.x-dev

» 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #10

geek-merlin

German

Freiburg, Germany

commented 6 September 2018 at 18:40

Interesting. Alas, we must not make any assumptions about revisions being sequential.

So in the end we want multiple entity revisions to be *able* to share a single field revision row.
This needs a field-item-vid, and an intermediate entity-vid:field-item-vid table, or, even simpler, storing the field-vid reference with the entity revision.
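A rough sketch of that structure in plain Python, to show why the explicit reference matters. The Storage class, its method names, and the parent_vid argument are all hypothetical, and the scenario (revision 99 being created from revision 90 while a newer revision 95 exists) is just one assumed example of non-sequential revisions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Storage:
    # field_item_vid -> value: one row per distinct field state.
    field_rows: Dict[int, str] = field(default_factory=dict)
    # (entity_vid, field_name) -> field_item_vid: the intermediate table.
    links: Dict[Tuple[int, str], int] = field(default_factory=dict)
    next_field_vid: int = 1

    def save_revision(self, entity_vid: int, field_name: str, value: str,
                      parent_vid: Optional[int] = None) -> int:
        """Reuse the parent's field row if the value is unchanged, else add a row."""
        parent_link = None
        if parent_vid is not None:
            parent_link = self.links.get((parent_vid, field_name))
        if parent_link is not None and self.field_rows[parent_link] == value:
            self.links[(entity_vid, field_name)] = parent_link
            return parent_link
        vid = self.next_field_vid
        self.next_field_vid += 1
        self.field_rows[vid] = value
        self.links[(entity_vid, field_name)] = vid
        return vid

    def load(self, entity_vid: int, field_name: str) -> Optional[str]:
        """Follow the explicit link; no guessing based on revision ID order."""
        vid = self.links.get((entity_vid, field_name))
        return None if vid is None else self.field_rows[vid]


storage = Storage()
storage.save_revision(90, "field_foo", "published text")
storage.save_revision(95, "field_foo", "draft edit", parent_vid=90)
# Revision 99 is based on 90, not on 95, and leaves the field unchanged.
storage.save_revision(99, "field_foo", "published text", parent_vid=90)
```

Here revisions 90 and 99 share one field row, and loading revision 99 gives "published text". A "latest row at or below 99" lookup would have picked up the row written for revision 95 instead.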

Since entity and field storage are pluggable, this can be developed as an alternative storage.

But even more exciting: Once field item revisions get a vid, we can give field items an id, give both a uuid, and implement EntityInterface. Boom, we have fieldable fields without the fieldcollection/paragraph hacks.

Comment #11

geek-merlin

German

Freiburg, Germany

commented 12 September 2018 at 21:42

Wow. What I proposed in #10 has already been built (with an intermediate table) in #2957425: Allow the inline creation of non-reusable Custom Blocks in the layout builder.

Comment #12
