Add a revision_parent field to revisionable entities [#2725523]

Problem/Motivation

This issue is part of the #2721129: Workflow Initiative. And #2786133: WI: Phase B: Extend the revision API with support for parents (the parent of this issue) is the first step towards #2867707: WI: Phase H: Conflict management and local workspace merging support.

The specific problem the Workflow Initiative is trying to solve here in this issue, is that of conflict management between revisions of the same entity that are created in different workspaces. When an entity can be edited in multiple workspaces, we start having a graph or tree (or branches) of revisions that needs to be tracked and managed.

In any scenario like this, there will never be more than two parents of a single revision; one parent revision, and optionally one revision that was merged into the current line. Very much like a Git tree with merge commits.

Proposed resolution

Introduce a new type of reference field, with an associated API, that keeps track of the aforementioned revision tree. This new field needs to be able to reference the two possible parents.

Add two new revision metadata keys: revision_parent and revision_merge_parent
Add two new base fields, revision_parent and revision_merge_parent to all revisionable entity types
Ensure that the value of the parent revision is populated automatically by default with the latest revision ID of the active branch from the revision graph
Write an update hook that back-fills data for the new field. The strategy that will be used is described in #13
Add a new revision_tree entity handler that is able to determine the Lowest Common Ancestor of two revisions from the graph, using an algorithm that is agnostic to the branching strategy that is being used

Remaining tasks

None.

The default branching strategy for the revision tree will be discussed and implemented in #3023194: [PP-2] Add parallel revisioning support.

User interface changes

None.

API changes

New methods available on \Drupal\Core\Entity\RevisionableInterface, with default implementations provided in \Drupal\Core\Entity\ContentEntityBase.:

getParentRevisionId()
setParentRevisionId($revision_id)
getMergeParentRevisionId()
setMergeParentRevisionId($revision_id)

A new revision_tree entity handler is added by default to all revisionable entity types, with a default implementation provided in \Drupal\Core\Entity\Sql\SqlRevisionTreeHandler.

A new revision_reference field type is added, not shown in the Field UI and without having a widget or a formatter.

Data model changes

All revisionable entities will have two additional revision metadata fields installed, revision_parent and revision_merge_parent.

Comment	File	Size	Author
#166	2725523-9.0.x-166.patch	71.22 KB	alexpott
#152	interdiff-152.txt	9.04 KB	amateescu
#152	2725523-152.patch	71.21 KB	amateescu
#148	interdiff-147.txt	23.86 KB	amateescu
#148	2725523-147.patch	67.93 KB	amateescu
#127	interdiff.txt	882 bytes	plach
#127	2725523-127.patch	65.34 KB	plach
#125	2725523-125.patch	65.34 KB	plach
#125	interdiff.txt	893 bytes	plach
#123	interdiff.txt	1.23 KB	plach
#123	2725523-123.patch	65.11 KB	plach
#119	interdiff-115-119.txt	13.37 KB	hchonov
#119	2725523-119.patch	65.25 KB	hchonov
#115	interdiff-103-115.txt	906 bytes	hchonov
#115	2725523-115.patch	64.19 KB	hchonov
#103	interdiff-1-vs-2-fields.txt	19.86 KB	hchonov
#103	2725523-103-two-fields-approach.patch	64.19 KB	hchonov
#97	interdiff-97.txt	8.82 KB	amateescu
#97	2725523-97.patch	61.72 KB	amateescu
#86	Screen Shot 2019-03-07 at 17.42.03.png	119.65 KB	dixon_
#84	2725523-79.patch	61.54 KB	amateescu
#83	2725523-82-FAIL.patch	62.5 KB	hchonov
#79	interdiff-79.txt	1.32 KB	amateescu
#79	2725523-79.patch	61.54 KB	amateescu
#78	interdiff-78.txt	8.7 KB	amateescu
#78	2725523-78.patch	62.09 KB	amateescu
#76	interdiff-76.txt	3.6 KB	amateescu
#76	2725523-76.patch	62.34 KB	amateescu
#74	2725523-74.patch	58.74 KB	amateescu
#72	interdiff-72.txt	6.21 KB	amateescu
#72	2725523-72.patch	58.76 KB	amateescu
#70	interdiff-70.txt	18.17 KB	amateescu
#70	2725523-70.patch	58.1 KB	amateescu
#67	2725523-67.patch	57.32 KB	amateescu
#66	interdiff-66.txt	6.3 KB	amateescu
#66	2725523-66-for-review.patch	57.32 KB	amateescu
#66	2725523-66-combined.patch	80.16 KB	amateescu
#61	interdiff-61.txt	26.88 KB	amateescu
#61	2725523-61-for-review.patch	57.24 KB	amateescu
#61	2725523-61-combined.patch	80.08 KB	amateescu
#56	interdiff-56.txt	7.61 KB	amateescu
#56	2725523-56-for-review.patch	56.24 KB	amateescu
#56	2725523-56-combined.patch	78.65 KB	amateescu
#54	interdiff-54.txt	6.41 KB	amateescu
#54	2725523-54-for-review.patch	51.76 KB	amateescu
#54	2725523-54-combined.patch	74.17 KB	amateescu
#51	interdiff-51.txt	29.99 KB	amateescu
#51	2725523-51.patch	52.16 KB	amateescu
#47	2725523-47.patch	52.53 KB	amateescu
#43	interdiff-43.txt	13.29 KB	amateescu
#43	2725523-43.patch	52.67 KB	amateescu
#41	interdiff-41.txt	29.19 KB	amateescu
#41	2725523-41.patch	53.4 KB	amateescu
#38	interdiff-38.txt	1.66 KB	amateescu
#38	2725523-38.patch	49.29 KB	amateescu
#35	interdiff-35.txt	5.39 KB	amateescu
#35	2725523-35.patch	49.17 KB	amateescu
#25	interdiff-25.txt	666 bytes	amateescu
#25	2725523-25.patch	49.43 KB	amateescu
#23	interdiff-23.txt	702 bytes	amateescu
#23	2725523-23.patch	48.78 KB	amateescu
#22	interdiff-22.txt	10.93 KB	amateescu
#22	2725523-22.patch	48.77 KB	amateescu
#20	interdiff-20.txt	11.12 KB	amateescu
#20	2725523-20.patch	38.12 KB	amateescu
#15	2725523.patch	30.04 KB	amateescu

Comments

Comment #1

14 May 2016 at 22:59

dixon_ created an issue. See original summary.

Comment #2

dixon_

he/him

English

France

commented 14 May 2016 at 23:01

Status:

Active

» Postponed

Postponing this as sign-offs are pending in: #2725433: WI: Phase A: Use the revision API in more places

Comment #3

dixon_

he/him

English

France

commented 14 May 2016 at 23:02

Issue summary:

View changes

Comment #4

dixon_

he/him

English

France

commented 14 May 2016 at 23:18

Issue tags:

+Workflow Initiative

Comment #5

14 May 2016 at 23:18

Version:

8.2.x-dev

» 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #6

14 May 2016 at 23:18

Version:

8.3.x-dev

» 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #7

14 May 2016 at 23:18

Version:

8.4.x-dev

» 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #8

14 May 2016 at 23:18

Version:

8.5.x-dev

» 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #9

14 May 2016 at 23:18

Version:

8.6.x-dev

» 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #10

wim leers

Ghent 🇧🇪🇪🇺

commented 27 August 2018 at 07:24

Issue tags:		+API-First Initiative
Related issues:		+#2992833: Add a version negotiation to revisionable resource types

It looks like this is probably a blocker for #2992833: Add a version negotiation to revisionable resource types — i.e. for JSON API (but also core REST and GraphQL) to implement revision support. For revision support to be implementable, we need to be able to reliably determine what to link to. We need to implement RFC5829: Link Relation Types for Simple Version Navigation between Web Resources. That includes for example a working-copy-of link relation type. That means we need to know, given a revision, which other revision it was based on. Which of course leads me to this issue.

Comment #11

wim leers

Ghent 🇧🇪🇪🇺

commented 27 August 2018 at 07:26

Parent issue:	#2725433: WI: Phase A: Use the revision API in more places	» #2786133: WI: Phase B: Extend the revision API with support for parents
Related issues:		+#2725433: WI: Phase A: Use the revision API in more places

Also, this is actually part of phase B of the Workflow Initiative, based on the sign-offs given in #2725433. See #2786133: WI: Phase B: Extend the revision API with support for parents. Updating issue metadata accordingly.

Comment #12

wim leers

Ghent 🇧🇪🇪🇺

commented 27 August 2018 at 08:59

It'd be very helpful if we could already start discussing what the upgrade path will look like. Because today, we do not have revision parent information (hence this issue), which means we'll have to construct a past.

Do we go with the simplest possible assumption? That'd be assuming that the previous revision by ID always is the revision parent. Which may be inaccurate. But it's a fair best-effort guess.

Comment #13

amateescu commented 27 August 2018 at 10:04

Do we go with the simplest possible assumption? That'd be assuming that the previous revision by ID always is the revision parent. Which may be inaccurate. But it's a fair best-effort guess.

The only time this assumption might be inaccurate is for revision reverts (i.e. when a new revision is being creating by "reverting" to the data of a past revision), but this case is impossible (read: complicated to the point that it's not even worth trying) to determine so, yes, we'll have to go with the assumption that the previous revision ID is the parent.

Comment #14

wim leers

Ghent 🇧🇪🇪🇺

commented 27 August 2018 at 10:08

Issue summary:

View changes

Perfect, thanks! Updated the IS.

Comment #15

amateescu commented 14 January 2019 at 09:05

Status:

Postponed

» Needs review

Status	File	Size
new	2725523.patch	30.04 KB

Here's a patch for adding the revision_parent field :)

Comment #16

14 January 2019 at 09:56

Status:

Needs review

» Needs work

The last submitted patch, 15: 2725523.patch, failed testing. View results

Comment #17

amateescu commented 14 January 2019 at 10:07

Status:

Needs work

» Needs review

Nevermind those test fails for now, the patch needs architectural review so putting back to NR.

Comment #18

wim leers

Ghent 🇧🇪🇪🇺

commented 14 January 2019 at 15:59

High-level review to kick off the review process :) Reviewing for high-level understandability. Reviewed the validation logic, but didn't review all test cases just yet.

+++ b/core/lib/Drupal/Core/Entity/EntityFieldManager.php
@@ -225,6 +225,14 @@ protected function buildBaseFieldDefinitions($entity_type_id) {
+      $field_name = $entity_type->getRevisionMetadataKeys(FALSE)['revision_parent'];
+      $base_field_definitions[$field_name] = BaseFieldDefinition::create('revision_tree')
+        ->setLabel($this->t('Revision parent'))

+++ b/core/lib/Drupal/Core/Entity/Plugin/Validation/Constraint/ValidRevisionTreeReferenceConstraint.php
@@ -0,0 +1,40 @@
+ *   id = "ValidRevisionTreeReference",
+ *   label = @Translation("Valid parent revision reference", context = "Validation")

🤔 parent s tree vs parent vs tree … I'm confused!

+++ b/core/lib/Drupal/Core/Entity/EntityFieldManager.php
@@ -225,6 +225,14 @@ protected function buildBaseFieldDefinitions($entity_type_id) {
+        ->setInternal(TRUE)

👍 Not 100% sure if this should be internal, but I think it makes sense to at least make it internal at first and then consider making it non-internal later. Perfect :)

+++ b/core/lib/Drupal/Core/Entity/Plugin/Validation/Constraint/ValidRevisionTreeReferenceConstraintValidator.php
@@ -0,0 +1,90 @@
+class ValidRevisionTreeReferenceConstraintValidator extends ConstraintValidator implements ContainerInjectionInterface {
...
+   * Constructs a ValidReferenceConstraintValidator object.

Nit: mismsatch.

+++ b/core/lib/Drupal/Core/Entity/Plugin/Validation/Constraint/ValidRevisionTreeReferenceConstraintValidator.php
@@ -0,0 +1,90 @@
+    /** @var \Drupal\Core\Field\FieldItemInterface $value */

Can we instead do an assert()?

+++ b/core/lib/Drupal/Core/Entity/Plugin/Validation/Constraint/ValidRevisionTreeReferenceConstraintValidator.php
@@ -0,0 +1,90 @@
+    /** @var \Drupal\Core\Entity\Plugin\Validation\Constraint\ValidRevisionTreeReferenceConstraint $constraint */

Same here. And several more.

+++ b/core/lib/Drupal/Core/Entity/Plugin/Validation/Constraint/ValidRevisionTreeReferenceConstraintValidator.php
@@ -0,0 +1,90 @@
+      if ($parent_revision_id == $merge_revision_id) {

🤔 Can we use a strict comparison?

+++ b/core/lib/Drupal/Core/Entity/Plugin/Validation/Constraint/ValidRevisionTreeReferenceConstraintValidator.php
@@ -0,0 +1,90 @@
+      // Third, check that the assigned parent and the merge parent actually

Nit: s/actually//

+++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/RevisionTreeItem.php
@@ -0,0 +1,148 @@
+    // When the parent revision ID is assigned manually, we have to skip the

👍

+++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/RevisionTreeItem.php
@@ -0,0 +1,148 @@
+    // auto-assign code from ::preSave().

Nit: s/from//in/

Comment #19

hchonov

Bulgarian

Burgas

commented 14 January 2019 at 18:41

Is the idea to have a field with two properties - one for the revision ID of the parent and one for the revision ID of an entity revision that has been merged into the current one?

Why do we need a field with two properties instead of two dedicated fields for each task?

When adding some reference by revision, then it would be better to keep the already established naming standard from entity_reference_revisions and refer to the revision property as target_revision_id and not target_id, because otherwise it will get confusing.

Also without having the entity ID of the parent as a dedicated property it becomes harder to prepare entity queries for searching for the children of a specific entity ID. Therefore it would be better to include this information as well.

I personally think that it would be great to include EntityReferenceRevisionsItem from the entity_reference_revisions module into core and use it instead. This is a desired feature for a long time now and this here would be a great opportunity for it.

But I guess that there are reasons for not taking this approach?

Comment #20

amateescu commented 17 January 2019 at 14:00

Title:	WI: Add revision_parents field to revisionable entities	» WI: Add a revision_parent field to revisionable entities
Issue summary:	View changes

Status	File	Size
new	2725523-20.patch	38.12 KB
new	interdiff-20.txt	11.12 KB

Thank you both for starting the review process :)

Re #18:

The field type (plugin) is called revision_tree, which has a "valid revision tree" constraint, and the name of the base field that's added to all revisionable entity types is revision_parent :) I changed the description of the constraint, maybe that was the confusing part?
Exactly my thought process.
Fixed.
We could, but since assert() has a runtime cost, however small that might be, and these comments are there purely for IDE autocompletion, I opted not to use it..
Same as above.
Same old Entity API story, we can't use strict comparison on field values :)
Fixed.
:)
Fixed.

Re #19:

Is the idea to have a field with two properties - one for the revision ID of the parent and one for the revision ID of an entity revision that has been merged into the current one?

Yup :)

Why do we need a field with two properties instead of two dedicated fields for each task?

Because conceptually this is a multi-value field, but, since we know that we can only store a maximum of two parents, having two columns for them is a performance optimization. It also allows us to store it in the shared tables, removing the need for joining a dedicated field table.

When adding some reference by revision, then it would be better to keep the already established naming standard from entity_reference_revisions and refer to the revision property as target_revision_id and not target_id, because otherwise it will get confusing.

Since this field type is not based on ERR on purpose (more on that below), I think it would be equally confusing to have the same property name as ERR. If we really want to differentiate somehow from both ER and ERR, how about using parent_revision_id and merge_revision_id for the property names?

Also without having the entity ID of the parent as a dedicated property it becomes harder to prepare entity queries for searching for the children of a specific entity ID. Therefore it would be better to include this information as well.

I don't get this point, you can add a condition on the entity ID field to the entity query just like on any other field.

I personally think that it would be great to include EntityReferenceRevisionsItem from the entity_reference_revisions module into core and use it instead. This is a desired feature for a long time now and this here would be a great opportunity for it.

But I guess that there are reasons for not taking this approach?

That's actually how I started to write this field type, but, after wasting quite some time fighting (overriding) the behavior of the parent class(es), I realized that we're much better off with a custom field type dedicated to the task at hand: building a tree/graph of revisions.

Here's a non-exhaustive list of reasons for not using ERR as a base class for this field type:

having an entity ID property would give the impression that this field type could reference the revision of an arbitrary entity, but we want to be able to reference revisions from a *single* entity
similar to above, having a target_type storage setting would give the impression that the target type is configurable, when in fact it is restricted (hardcoded) to the entity type that contains the field
we can't use *any* code related to ER: selection plugins, auto-create support, widgets, formatters, etc. because they are designed to work with entity IDs, not revision IDs
while writing the field type on top of ERR, I found that I had to override almost every method of the parent class, to the point where it wasn't using even 10% of its code

Comment #21

amateescu commented 17 January 2019 at 14:42

how about using parent_revision_id and merge_revision_id for the property names?

That would mean some funny column names though: revision_parent__parent_revision_id and revision_parent__merge_revision_id :/ IMO, the existing property names from the patch are clear enough.

Comment #22

amateescu commented 22 January 2019 at 14:08

Status	File	Size
new	2725523-22.patch	48.77 KB
new	interdiff-22.txt	10.93 KB

And here's one of the most useful functionalities enabled by revision parents: the ability to find the lowest common ancestor (LCA) of two revisions, which is the stepping stone for proper conflict management :)

If the full patch gets too hard to read with this addition, maybe it should be posted in a followup issue. I'll let that up to reviewers to decide.

Comment #23

amateescu commented 22 January 2019 at 14:12

Status	File	Size
new	2725523-23.patch	48.78 KB
new	interdiff-23.txt	702 bytes

Fixed a typo so I'm not sending this updated patch to the testbot.

Comment #24

22 January 2019 at 14:59

The last submitted patch, 22: 2725523-22.patch, failed testing. View results

Comment #25

amateescu commented 22 January 2019 at 15:06

Status	File	Size
new	2725523-25.patch	49.43 KB
new	interdiff-25.txt	666 bytes

Méh..

Comment #26

hchonov

Bulgarian

Burgas

commented 22 January 2019 at 15:43

having an entity ID property would give the impression that this field type could reference the revision of an arbitrary entity, but we want to be able to reference revisions from a *single* entity

Well, we already have exactly this use case, where we decouple an entity and keep a reference to its parent and the revision from which the entity was decoupled, which is a different entity. We use that to track changes on the parent and through conflict management update the child. Both entities are allowed to change.
I think this use case demonstrates, why it is important to not have a constraint on the parent. It sounds logical to derive an entity from another one and keep track of the origin through the parent field.
Currently we also update the parent revision reference when me merge a newer parent revision into the child. We don't use a merged revision field, because we don't see any reason for it. What would the core reason be for that? Our argument is that after a merge the entity has a new parent, because the parent changes have been merged into the derived entity and the initial parent revision doesn't anymore represent a real parent of the updated child. This is why if the initial parent revision was 5 and later we merged revision 7 into the child, then we update the parent field to point at revision 7.

And here's one of the most useful functionalities enabled by revision parents: the ability to find the lowest common ancestor (LCA) of two revisions, which is the stepping stone for proper conflict management :)

Hmmm, but we already have a proper and completely functioning conflict management provided by the 2.x branch of the conflict module as presented at Drupal Europe Darmstadt 2018. And here is the recorded session on YouTube - https://youtube.com/watch?v=ixlLskzT9c4

I am confused by your comment. Probably you weren't aware of my work on the conflict module? Should we arrange a call and talk about all this? Because at work we are in a pretty advanced state regarding parent tracking and conflict resolutions :).

Comment #27

hchonov

Bulgarian

Burgas

commented 22 January 2019 at 17:01

we can't use *any* code related to ER: selection plugins, auto-create support, widgets, formatters, etc. because they are designed to work with entity IDs, not revision IDs
while writing the field type on top of ERR, I found that I had to override almost every method of the parent class, to the point where it wasn't using even 10% of its code

Actually by saying

I personally think that it would be great to include EntityReferenceRevisionsItem from the entity_reference_revisions module into core and use it instead. This is a desired feature for a long time now and this here would be a great opportunity for it.

I've only meant that we need EntityReferenceRevisionsItem. We don't need any other logic. So we don't have to build on top of it, but we can copy the relevant parts - like the properties, the data type, the schema, the setter and getter methods. This would be enough for start.

Comment #28

amateescu commented 23 January 2019 at 15:38

It sounds logical to derive an entity from another one and keep track of the origin through the parent field.

I'm sorry but this actually sounds very project-specific and not a use case for core.

Currently we also update the parent revision reference when me merge a newer parent revision into the child. We don't use a merged revision field, because we don't see any reason for it. What would the core reason be for that?

Let me quote from https://en.wikipedia.org/wiki/Merge_(version_control):

In some cases, the merge can be performed automatically, because there is sufficient history information to reconstruct the changes, and the changes do not conflict. In other cases, a person must decide exactly what the resulting files should contain.

The field type introduced by this patch provides a way to efficiently store the Directed Acyclic Graph of an entity's revision history, as well as the capability to retrieve the Lowest Common Ancestor of two revisions, as needed by the three-way merge process. Having the ability to do a three-way merge during conflict resolution is very much needed because it maximizes the chances for automatic merging, thus providing a better UX for content editors.

I've only meant that we need EntityReferenceRevisionsItem. We don't need any other logic.

Even a "stripped down" EntityReferenceRevisionsItem would still have to extend core's EntityReferenceItem, which comes with the unnecessary baggage and downsides as pointed out in #20.

Comment #29

timmillwood

English

🏴󠁧󠁢󠁷󠁬󠁳󠁿 Wales, UK

commented 23 January 2019 at 17:54

Proper tracking of a revision's parent does not just allow for proper conflict management, but also for proper tracking and audit trails.

Tracking the parent revision also allows for full 3 way merges, and potentially automated conflict management. As well as revision graphs and the calculation of the lowest common ancestor. All of which makes Drupal increasingly powerful.

I fully support the current solution in this issue.

Comment #30

hchonov

Bulgarian

Burgas

commented 23 January 2019 at 20:06

I'm sorry but this actually sounds very project-specific and not a use case for core.

Could you state your arguments please? Drupal is a CMS, which has an API for duplicating an entity. The duplicated entity has all the data from entity it was duplicated from, so the intensions are to use it as a starting point. For having a complete history the CMS would have to track that information, but Drupal currently doesn't. Therefore I think, that this makes it a core feature.
If for example we take a food magazine, then we create a duplicate of some recipe and want to offer different extended variants by the users, which create a duplicate and extend/modify the new recipe, but a link between the recipes is automatically created. This actually sounds like a really great out of the box feature for Umami and for presenting Drupal.
I think that this feature might be used by a lot of projects and as such, it would be good to be part of the Drupal core.

In some cases, the merge can be performed automatically, because there is sufficient history information to reconstruct the changes, and the changes do not conflict. In other cases, a person must decide exactly what the resulting files should contain.

In my previous comment, I've made a reference to the conflict module and to the session about it. I am quoting the following from the module's page - https://www.drupal.org/project/conflict :

Conflicts which will be auto merged:
-changes in translatable fields in non-edited content entity translations
-changes in fields to which the user does not have access
--fields with no edit access
--fields not part of the entity form display
-changes in entity metadata
--revision ID
--changed timestamp

And we already do that and it works without any core additions.

Having the ability to do a three-way merge during conflict resolution is very much needed because it maximizes the chances for automatic merging, thus providing a better UX for content editors.

I know what a three-way merge is, because I've already implemented it in the the conflict module. There it is currently limited to the use case of concurrent editing. It keeps track of the initial entity used to build the form (yes, this is something that we need to solve more efficiently there). The second entity that is required is the current entity resulting from the form values and the third one is the newest entity from the storage.

-----

I guess, that you are talking about merging two entities from two different revision trees/branches? In that case which one would become the new parent? Realistically there would be two parent revisions then.
What happens if we merge rev 17 and rev 35 from two different revision branches and through this create rev 41. Now we want to merge rev 41 with a revision from a third revision branch, which has a common ancestor with either rev 17 or rev 35, but not with both of them.

Beside that question, I think that you could use the conflict module out of the box, if you provide it with the 3 required entities. In your case the entity from the storage matches the entity from the first revision branch, the edited entity the one from the second revision branch and the initial entity matches the common ancestor.

The field type introduced by this patch provides a way to efficiently store the Directed Acyclic Graph of an entity's revision history

Is it more efficient than an approach that would be using EntityReferenceRevisionsItem instead? If so, then why?

Even a "stripped down" EntityReferenceRevisionsItem would still have to extend core's EntityReferenceItem, which comes with the unnecessary baggage and downsides as pointed out in #20.

I don't think so, because we can at first create the new field type without UI, exactly like the one from your patch. What downsides exactly do you see in this case?
I haven't checked exactly, but probably it might be even possible to extend our current EntityReferenceItem to support revisions through a setting. Having the setting by default turned off would not be a BC break. Do you know if anybody has given this approach a try?

Comment #31

hchonov

Bulgarian

Burgas

commented 23 January 2019 at 20:12

@timmillwood, I am sorry, but I've somehow missed the intention of your comment. Nobody here is arguing against parent tracking and all of us want to have it. We are only discussing about the approach.

Comment #32

amateescu commented 24 January 2019 at 14:00

Could you state your arguments please?

That's my opinion, but it's up to framework and product managers to decide whether cross-entity revision graphs are a desirable feature for core.

I guess, that you are talking about merging two entities from two different revision trees/branches? In that case which one would become the new parent?

I am talking about merging two revisions of the same entity, for example edited by two users at the same time, possibly in different worskpaces as well. Both these cases mean that the revisions belong to different branches of the graph.

Realistically there would be two parent revisions then.

Not sure whether that's a question or not, but yes, that's why we need to track the "merge parent" in addition to the parent revision.

What happens if we merge rev 17 and rev 35 from two different revision branches and through this create rev 41. Now we want to merge rev 41 with a revision from a third revision branch, which has a common ancestor with either rev 17 or rev 35, but not with both of them.

I can answer that question only if you post the full revision graph.

Is it more efficient than an approach that would be using EntityReferenceRevisionsItem instead? If so, then why?

Because we store both parents in the same table (the entity type's revision table), instead of requiring a separate dedicated field table for each revisionable entity type.

I don't think so, because we can at first create the new field type without UI, exactly like the one from your patch. What downsides exactly do you see in this case?
I haven't checked exactly, but probably it might be even possible to extend our current EntityReferenceItem to support revisions through a setting. Having the setting by default turned off would not be a BC break. Do you know if anybody has given this approach a try?

I already said that this is how I started writing this field type, but realized that it's the wrong path to take and started with a fresh field type. Since you keep insisting on this point, here's the full list of methods from EntityReferenceItem that we would need to override:

defaultStorageSettings() because we already know the target type
defaultFieldSettings() because we can't use selection handlers
propertyDefinitions() because we need to add the merge_target_id property and remove the entity property
schema() same as above
setValue() because we don't need to keep target_id in sync with entity
getValue() because we don't support the auto-create feature (the hasNewEntity() call)
onChange() same as setValue()
isEmpty() because empty items are the roots of the tree, and we need to be able to store NULL values in the database for them
preSave() same as getValue()
generateSampleValue() because we need to generate sample parent IDs differently
storageSettingsForm() because we don't have the target_type storage setting
fieldSettingsForm() same as above and defaultFieldSettings()
hasNewEntity() because we don't have the auto-create feature
calculateDependencies() because we don't have default values or other dependencies
calculateStorageDependencies() because we don't have a configurable target type
onDependencyRemoval() same as above
getPossibleValues(), getPossibleOptions(), getSettableValues() and getSettableOptions() because we don't expose any kind of UI widgets so we don't need to implement OptionsProviderInterface
getPreconfiguredOptions() because we don't expose this field type in the UI

Which means that there are only two methods from EntityReferenceItem that we would use as-is: mainPropertyName() and getConstraints(), and the former might need to be overridden as well if we decide to not use target_id as the main property name.

It seems that my assessment from #20 was wrong, we wouldn't even use that 10% of the code from ER but more like 1%, and we would still have all the baggage that comes along with ER.

Comment #33

effulgentsia commented 30 January 2019 at 19:45

Issue tags:

+Needs framework manager review

Adding the "Needs framework manager review" tag since #2786133: WI: Phase B: Extend the revision API with support for parents still has it.

Raising this to Major, since it sounds like it could potentially help with a Critical (#2950869-25: Entity queries querying the latest revision very slow with lots of revisions) and with API-First (per #10, but I'm not sure if that's still accurate).

Comment #34

sam152 commented 31 January 2019 at 09:19

Kicking in a review of this, for what it's worth. Looks like really solid work 👍

+++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
@@ -398,6 +409,90 @@ public function getLatestTranslationAffectedRevisionId($entity_id, $langcode) {
+    $result = $this->getQuery()
+      ->condition($this->idKey, $entity->id(), '=')
+      ->sort($revision_key, 'ASC')
+      ->allRevisions()
+      ->execute();

Should this skip access checking?

+++ b/core/lib/Drupal/Core/Entity/EntityFieldManager.php
@@ -225,6 +225,14 @@ protected function buildBaseFieldDefinitions($entity_type_id) {
+      $base_field_definitions[$field_name] = BaseFieldDefinition::create('revision_tree')

Hm, the field isn't actually a whole tree is it? Just a reference to a single parent. Naming seems kind of confusing.

```
+++ b/core/lib/Drupal/Core/Entity/EntityFieldManager.php
@@ -225,6 +225,14 @@ protected function buildBaseFieldDefinitions($entity_type_id) {
+        ->setTranslatable(FALSE)
```
Have we validated that the parent field should be non-translatable with some tests that demonstrate how the field is populated when translations are added?

Just based on intuition, it feels like it should be translatable. Creating new revisions of a de translation, doesn't seem like it should change the parent of the en translation, if revision_translation_affected is FALSE.

+++ b/core/lib/Drupal/Core/Entity/EntityFieldManager.php
--- a/core/lib/Drupal/Core/Entity/KeyValueStore/KeyValueContentEntityStorage.php
+++ b/core/lib/Drupal/Core/Entity/KeyValueStore/KeyValueContentEntityStorage.php

We really should document somewhere that this entity storage is not really suitable for anything. I feel like I've seen it being used in the wild, where it really shouldn't be.

+++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/RevisionTreeItem.php
@@ -0,0 +1,148 @@
+/**
+ * Defines the 'revision_tree' field type.
+ *
+ * @FieldType(
+ *   id = "revision_tree",
+ *   label = @Translation("Revision tree"),
+ *   description = @Translation("An entity field containing a revision tree item."),
+ *   no_ui = TRUE,
+ *   cardinality = 1,
+ *   constraints = {"ValidRevisionTreeReference" = {}}
+ * )
+ */
+class RevisionTreeItem extends FieldItemBase {

Thinking about the naming a bit more, is this actually a node in the tree? That is, maybe "Revision tree node" is more correct and the tree is actually the whole collection of nodes.

+++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/RevisionTreeItem.php
@@ -0,0 +1,148 @@
+    $target_id_data_type = 'string';
+    if ($target_type->entityClassImplements(FieldableEntityInterface::class)) {
+      $id_definition = \Drupal::service('entity_field.manager')->getBaseFieldDefinitions($target_type->id())[$target_type->getKey('revision')];
+      if ($id_definition->getType() === 'integer') {
+        $target_id_data_type = 'integer';
+      }
+    }

~~Migrate also has \Drupal\migrate\Plugin\migrate\destination\EntityContentBase::getDefinitionFromEntity, I might be a useful utility for entity_field.manager itself.~~

Edit: looks like this does something else.

+++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/RevisionTreeItem.php
@@ -0,0 +1,148 @@
+          'type' => 'int',
...
+          'type' => 'int',
...
+          'type' => 'varchar_ascii',
...
+          'type' => 'varchar_ascii',

I wonder if you could use FieldStorageDefinitionInterface::getSchema, on the storage definition of the original ID field.

+++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/RevisionTreeItem.php
@@ -0,0 +1,148 @@
+  public function isEmpty() {
+    // An empty item is always a root.
+    return FALSE;
+  }

Hm, so wouldn't isEmpty be TRUE for roots? The comment doesn't explain the hard-coded return value IMO.

Comment #35

amateescu commented 1 February 2019 at 13:35

Status	File	Size
new	2725523-35.patch	49.17 KB
new	interdiff-35.txt	5.39 KB

@Sam152, thanks!

Re #34:

It sure does :)
Replied in 5. below
I discussed this point with @plach a while ago and he said that having it non-translatable is fine, mostly because we might be able to retire the whole "revision translation affected" stuff and provide a better replacement based on revision parents and conflict management.
Yeah, it gets in the way a lot :/
Wim raised a similar point in #18, so I guess we really need to do something about it. However, if we introduce the 'tree node' terminology, we might as well name it for what it really is, a graph :) So "Revision graph node" would be even more accurate. I'd like to get some more input on this point from other folks as well, maybe we can come up with something better.
That code is gone now, per the point below.
We definitely can, I cleaned up all that code which was mostly copied from ER.
I've expanded the comment, hope it's more clear now.

Comment #36

amateescu commented 1 February 2019 at 13:36

Priority:

Normal

» Major

Also raising to Major, per #33.

Comment #37

1 February 2019 at 14:30

Status:

Needs review

» Needs work

The last submitted patch, 35: 2725523-35.patch, failed testing. View results

Comment #38

amateescu commented 1 February 2019 at 14:56

Status:

Needs work

» Needs review

Status	File	Size
new	2725523-38.patch	49.29 KB
new	interdiff-38.txt	1.66 KB

Oops :)

Comment #39

sam152 commented 4 February 2019 at 00:49

Thanks for addressing each point :)

#35.3: If there was conflict management on revisions, wouldn't that allow us to essentially have different branches of the graph for each translation, and the revision_parent field would have to be translatable to support that? I understand if this is something that isn't really a concern right now. I suppose translation doesn't affect the schema so it's an easy change later.
#35.8: Ah, I see. By saying "this field is never empty", NULL becomes a valid value and gets stored for the root node of the graph.

Comment #40

plach

he/him

Italian

Venezia

commented 5 February 2019 at 11:15

I read the discussion above about the current approach: to be honest I can't say whether the use case @hchonov is describing about deriving entities from other entities should be supported natively. On one hand, given that this is the first time it is proposed AFAIK, despite entity cloning being a thing for quite a while, I'd tend to think that it must not be that common. OTOH it might be that once available to a widespread audience it could end up being a valuable solution.

If I understood correctly @hchonov's examples, it seems the main use case for this approach are entities composed by sub-entities, so I asked @Berdir to chime in here and provide us his view on how one approach or the other might affect Paragraph's ability to support conflict management.

I guess that implementing revision references via an EntityRevisionReferenceItem would support both use cases and thus would be more flexible, OTOH it would likely involve a more complex logic and might be harder to implement and support. In the latter case Conflict might be able to keep things working as they are now by just adding an ERR field, although integrating with core's built-in functionalities might require some workarounds at that point. Maybe the ability to support conflict management between different entities could be separated out to a submodule?

I wish we had had this discussion before the 8.x-2.x branch was opened, however luckily not many sites are using that yet.

Code review of the current patch, solid work indeed :)

            +++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
            @@ -44,6 +45,13 @@
            +  protected $ancestorIds = [];

I could not spot any invalidation logic for this: that's ok because we never update revision data, right?

            +++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
            @@ -321,6 +329,9 @@ public function createRevision(RevisionableInterface $entity, $default = TRUE, $
            +    $new_revision->revision_parent->target_id = $entity->getLoadedRevisionId();

What about adding accessor methods for parent / merge parent on RevisionableInterface?

            +++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
            @@ -398,6 +409,91 @@ public function getLatestTranslationAffectedRevisionId($entity_id, $langcode) {
            +  protected function getAllRevisionParents(RevisionableInterface $entity) {
            ...
            +    $result = $this->getQuery()
            ...
            +    foreach ($this->loadMultipleRevisions(array_keys($result)) as $revision_id => $revision) {

Technically, the base implementation and the SQL-specific (optimized?) version are not equivalent, because in the former we are invoking hooks. What about using an aggregate entity query to retrieve field data without having to load the entity objects in the base implementation?

            +++ b/core/lib/Drupal/Core/Entity/RevisionableStorageInterface.php
            @@ -65,4 +65,20 @@ public function deleteRevision($revision_id);
            +  public function getLowestCommonAncestor(RevisionableInterface $entity, $first_revision_id, $second_revision_id);

This is returning a revision ID, should we rename it to ::getLowestCommonAncestorId()? Also, do we actually need an entity object here? Could we pass an entity ID for consistency?

```
            +++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/RevisionTreeItem.php
            @@ -0,0 +1,122 @@
            + * Defines the 'revision_tree' field type.
        
```
My initial reaction based on #38.5 was to propose to name this revision_parent, since it implements a very specific logic that fits only this particular use case, I believe. However this led me to think that @yched always used to say that field items should only be data buckets and NOT wrap any logic, so maybe we should implement a RevisionReferenceItem instead? We could then add revision_parent and revision_merge_parent fields to revisionable entity types and move the logic to ContentEntityBase::preSaveRevision()?

            +++ b/core/modules/system/system.install
            @@ -2173,3 +2174,146 @@ function system_update_8501() {
            +    $parent_revision_field_storage_definition = $entity_definition_update_manager->getFieldStorageDefinition($entity_type->getRevisionMetadataKey('revision_parent', FALSE), $entity_type_id);

Extra param passed to getRevisionMetadataKey.

            +++ b/core/modules/system/system.install
            @@ -2173,3 +2174,146 @@ function system_update_8501() {
            +    $id_column = $table_mapping->getFieldColumnName($id_field_storage_definition, 'value');
            +    $revision_id_column = $table_mapping->getFieldColumnName($revision_id_field_storage_definition, 'value');
            +    $revision_parent_column = $table_mapping->getFieldColumnName($parent_revision_field_storage_definition, 'target_id');

Were the property names hardcoded on purpose?

            +++ b/core/modules/system/system.install
            @@ -2173,3 +2174,146 @@ function system_update_8501() {
            +      ->range($sandbox[$entity_type_id]['current'], $sandbox[$entity_type_id]['current'] + $steps)

To avoid issues with added/removed data, could we adopt an approach similar to SqlFieldableEntityTypeListenerTrait:227 here?
Or hardcoding assumptions about revision IDs being integers would not be ok?

            +++ b/core/tests/Drupal/KernelTests/Core/Entity/EntityRevisionTreeTest.php
            @@ -0,0 +1,124 @@
            +   * @var \Drupal\Core\Entity\EntityStorageInterface|\Drupal\Core\Entity\RevisionableStorageInterface

To be merged :)

            +++ b/core/tests/Drupal/KernelTests/Core/Entity/EntityRevisionTreeTest.php
            @@ -0,0 +1,124 @@
            +    // Create a complex revision graph:
            +    //
            +    //        3 -- 4 -- 5 -- 8
            +    //       /         /      \
            +    // 1 -- 2 -- 6 -- 7 ------ 9 --- 16
            +    //            \                 /
            +    //             10 -- 11 ------ 15
            +    //              \             /
            +    //               12 -- 13 -- 14
            +    //

            +++ b/core/tests/Drupal/KernelTests/Core/Entity/EntityRevisionTreeTest.php
            @@ -0,0 +1,124 @@
            +      5 => [4, 7],

This looks weird: how can a parent ID be higher than the child ID? Shouldn't that be 8 => [5, 7]?

Edit: fixed some formatting issues.

Comment #41

amateescu commented 12 February 2019 at 16:22

Status	File	Size
new	2725523-41.patch	53.4 KB
new	interdiff-41.txt	29.19 KB

Thanks for the review, @plach!

I've managed to rework the patch and make the current way of storing two revision parents in a single field item an implementation detail of the RevisionTree field type. Through the new EntityRevisionReferenceFieldItemListInterface added in this patch, the revision_parent field can now be swapped with an entirely different implementation like EntityReferenceRevision from contrib, assuming that it will implement this new interface.

Also extracted the code and logic for getting the lowest common ancestor into a service, so it's easier to swap with a custom implementation rather than replacing the entire entity storage class.

Re #40:

Yup, but this static cache is gone now because I think it was a premature optimization.
Done! Except that there's no distinction between the parent ID and the merge parent ID, they're both just parent IDs :)
Fixed because the new implementation only works with the SQL entity storage class.
Sure, renamed as suggested.
We discussed this a lot and I said I'll try to check if we can move the logic to ContentEntityBase::preSaveRevision(), but I don't think it's possible because we need the ability to set a default value for the parent ID only when it wasn't set manually, and we can't influence the onChange() method of the field type class from ContentEntityBase::preSaveRevision(). In any case, setting a default value is not what I would call logic, and ChangedItem does very similar things.
Fixed.
Nope, my bad :) Fixed.
We've tried very hard so far to not force revision IDs to be integers.. there's even an issue to codify the int|string expectation in the doxygen blocks of various methods that deal with revision IDs.
Done!
:D
You're right, since those are supposed to be revision IDs, the ancestry has to be sequential. Fixed :)

Comment #42

12 February 2019 at 17:39

Status:

Needs review

» Needs work

The last submitted patch, 41: 2725523-41.patch, failed testing. View results

Comment #43

amateescu commented 12 February 2019 at 20:03

Status:

Needs work

» Needs review

Status	File	Size
new	2725523-43.patch	52.67 KB
new	interdiff-43.txt	13.29 KB

Those test failures are just a copy/pasta mistake in the update path function.

Aside from that, I changed my mind about introducing a new EntityRevisionReferenceFieldItemListInterface. ERR in contrib would have to depend on core 8.7 to be able to use it, and that's not very realistic in the short term. So we should simply use the existing EntityReferenceFieldItemListInterface, which is also used by ERR right now :)

Comment #44

12 February 2019 at 21:50

Status:

Needs review

» Needs work

The last submitted patch, 43: 2725523-43.patch, failed testing. View results

Comment #45

amateescu commented 13 February 2019 at 08:50

Status:

Needs work

» Needs review

That test fail will be fixed by #2975081: UpdatePathTestBase fails to re-initialize the test site (rebuild container, clear caches) after running the database updates. In the meantime, the patch is still good to review.

Comment #46

pancho

UTC+2 🇪🇺 EU

commented 13 February 2019 at 17:03

Nice work!

Just a side note:

Aside from that, I changed my mind about introducing a new EntityRevisionReferenceFieldItemListInterface. ERR in contrib would have to depend on core 8.7 to be able to use it, and that's not very realistic in the short term.

True, however the 9.0-dev branch will be opening soon, and if D9 is expected to have ERR in core at some point, we should get prepared as early as possible, particularly help ERR integrate with Core as tightly as possible.

Comment #47

amateescu commented 14 February 2019 at 21:45

Status	File	Size
new	2725523-47.patch	52.53 KB

Rerolled the patch to chase latest HEAD.

@Pancho, the interface that I mentioned was entirely new code added in #41 that doesn't exist in the contrib ERR module, so this patch is not making it any easier or harder to get ERR in core at some point :)

Comment #48

pancho

UTC+2 🇪🇺 EU

commented 14 February 2019 at 22:36

so this patch is not making it any easier or harder to get ERR in core at some point :)

I know. But if we could make it easier, we probably should.

Comment #49

hchonov

Bulgarian

Burgas

commented 18 February 2019 at 20:50

Re #40:

I can't say whether the use case @hchonov is describing about deriving entities from other entities should be supported natively. On one hand, given that this is the first time it is proposed AFAIK, despite entity cloning being a thing for quite a while, I'd tend to think that it must not be that common. OTOH it might be that once available to a widespread audience it could end up being a valuable solution.

If I understood correctly @hchonov's examples, it seems the main use case for this approach are entities composed by sub-entities, so I asked @Berdir to chime in here and provide us his view on how one approach or the other might affect Paragraph's ability to support conflict management.

Actually not. What we do is to export our original content and then import it to another site. There we allow it to be modified - no matter if there are sub-entities involved or not. When a user wants to modify the content, then we create a duplicate of the entity and through a custom parent reference we make a link between both entities. Then we make it possible to merge changes between both entities, while they diverge. I do think that distributing content is a pretty common case. And allowing Drupal out of the box for tracking this would be super nice feature.

As you've said :

OTOH it might be that once available to a widespread audience it could end up being a valuable solution.

How can we make a decision on this?

I guess that implementing revision references via an EntityRevisionReferenceItem would support both use cases and thus would be more flexible, OTOH it would likely involve a more complex logic and might be harder to implement and support. In the latter case Conflict might be able to keep things working as they are now by just adding an ERR field, although integrating with core's built-in functionalities might require some workarounds at that point. Maybe the ability to support conflict management between different entities could be separated out to a submodule?

The request of the community for supporting reference by revision is there for quite some time as well and there is an issue for this - #2812595: Support revision tracking of entity references in core. I think that this will be amazing for the community, especially given the fact that we are doing more and more stuff with revisions. So many new features arise regarding revisions, so I think that it is time that we finally make it a core feature to reference by revision. If we are going to invest time and resources, then let us solve the problem in a much general form, where a bigger part of the community will benefit from. The effort will be higher, but the win will be even higher.

I wish we had had this discussion before the 8.x-2.x branch was opened, however luckily not many sites are using that yet.

What do you mean when you say "luckily"? I am completely open to talks about the further development of the branch and we still can adjust things to fit better for core. I've been developing the module with high quality and documentation as I can in order for it to become a proud core member some day :). I would be happy to cooperate with you on it, so that we make it better and suitable for a wide range of use cases before it enters core.
Actually it would have been great, if you would've had involved me in your conversations and discussions about the approach in this issue here, because I've already have some pretty good experiance in dealing with conflicts and providing conflict resolutions, because I've invested an enourmous amount of time and work on making conflict 2.x working out-of-the-box for all kind of content entities and I don't even require an entity to be revisionable in order to provide concurrent editing and conflict resolutions.

----------------------

Independently from the disccusion about the parent field and how we want to implement it, there is something I don't really understand about the merged tracking and why exactly it is considered when dealing with revisions. Could you please provide some more in-depth reasoning about this?

Consider the following graph:
     10 (A)
    /       \
20 (A)    30 (B)
   \         /    \
    21 (A)      31 (B)
    /
   22 (A)

Here we have the revision 10 from the revision branch 1.x. Revision 10 has the value "A" on some random field. From revision 10 we create two revision branches - 2.x and 3.x - with the respective revision IDs 20 and 30.

On the revision branch 2.x the field value remains intact. On the revision branch 3.x the value is changed to "B".

Then we merge revision 30 into revision 20 and save the result as a revision 21 in the 2.x branch - still without changing the value of the field.

From 21 create 22 in the revision 2.x branch.
From 30 create 31 in the revision 3.x branch.

The next step is to merge 31 from revision branch 3.x into 22 from revision branch 2.x.

Now according to the latest patch the common ancestor for this merge will be 30 from revision branch 3.x. However this is not really true, because only some part of 30 has been merged into 21, but 30 doesn't represent all the common values they both shared, which are only truly and completely represented by 10.

When you perform a 3-way merge, then yes you need a common ancestor, however you need the one from which the two branches separated apart. One should ask the question why is it actually needed - and the answer is to find out what really has changed since the entities drove apart.

Here I would argue that it is completely enough to define 10 as the common ancestor. Why? Because if we merge an entity from the 3.x branch into the 2.x branch without taking the changed value from the 3.x branch and then later we want to merge the 3.x branch into the 2.x branch again and if this value has not yet changed since the last merge, then we have to detect this. We have have to detect, that the value has not changed in 2.x but only in 3.x. If we use 10 as common ancestor then we can detect this. If we use 30 as a common ancestor we cannot detect it.

In this case it is even wrong to define 30 as a common ancestor. Why? Because then we would consider the value of the 2.x branch as the one that has changed compared to the common ancestor of the 3.x branch. This however is not true - the value has not changed since we've initialy created the 2.x branch. It is wrong to assume that the value has changed on the 2.x branch. Why is this is wrong? Because if you take as an example concurrent editing then you need to know if the value has been changed in the current session (revision branch 2.x) or on the server (revision branch 3.x). If you define 30 as the common ancestor, then you will assume that the value has been changed by the current user - which however is not true. Assuming this might lead to overwriting and reverting values if using automatic merges. Even if you provide a conflict resolution UI to the user then you would show the local value as changed compared to the common ancestor - and the user will be confused.

Also if the user has decided to merge values of the 3.x branch into the 2.x branch, then those will not be identified as conflicts at all and therefore there is actually no need for having access to the "merged" revision.

A good example is:
1. The user of the 2.x branch tries to save.
2. Conflicts with the 3.x branch are detected.
3. A conflict resolution UI is provided to the user.
4. The user accepts some of the changes on the 3.x branch, but some not - the "A" values still remains intact.
5. After the conflict resolution is finished, then the user continues editing the form, as after the conflict resolution some adjustments might be needed.
6. The user tries to save again and a new conflict is detected, because there was an update on the 3.x branch in the meanwhile.
7. Now we should not detect the value "A" as changed by the user of the 2.x branch, but we should be able to identify it as only changed in the 3.x branch.

Based on this I think that a core solution for finding the common ancestor should be implemented as easy and transparent as possible and be defined as the entity at which point the revision branches were split. I think that core should stick to а general solution and if there is a need for something so specific like the merged revision, then this can be implemented in contrib and we could offer the API for exchanging the common ancestor. This general approach is also much more understandable and completely enough.

Comment #50

hchonov

Bulgarian

Burgas

commented 17 February 2019 at 21:47

P.S. I'll be visiting Drupal Mountain Camp Davos 07.-10.03.2019 and there I will be working on #2812595: Support revision tracking of entity references in core.

@amateescu, @plach it would be great if we could meet there and talk further before this issue reaches 200+ comments, at it is already on that way :). Otherwise we could schedule a web conversation.

Comment #51

amateescu commented 19 February 2019 at 12:03

Status	File	Size
new	2725523-51.patch	52.16 KB
new	interdiff-51.txt	29.99 KB

@hchonov, I'm not going to quote all that wall of text, but I'll answer on each main "topic":

Why not use the ERR field type for tracking revision parents: because we need to track multiple parents, and using ERR for that means we would have to add a dedicated field table for each revisionable entity type. Since we know that we can only have a maximum of two parent revisions, the field type introduced by this patch provides an optimized storage by keeping them in two properties of the same field item so they can be stored in the shared entity revision table.

Why do we need to track multiple parents: that's the definition of a merge revision. A merge revision has all of the parents which are the last revision of each involved branch. Specifically, the parent on the current branch is the first parent. The other ("merge") parent is the last revision of the merged branch.

Regarding your example graph: that's a problem that can happen in git as well, and it's really well described in this answer on SO. Git doesn't prevent you from working further in the 3.x branch after it was merged into 2.x (i.e. creating revision 31 with 30 as the parent), but, in our case, we could decide that we don't want to allow users to get into that situation and forbid parents that are part of a "closed branch".

In an autosave scenario, where the user continues editing revision 30 without knowing that the 3.x branch has been merged into 2.x, we can write some code in preSaveRevision() which detects that the parent revision (which was assigned automatically when the editor started working on it) is now part of a closed branch and re-assign a proper parent, 21 in this case. This allows the 2.x and 3.x branches to live separately as they were before the merge, and a subsequent merge would have to resolve the (same) conflict, there's really no other way around that.

As for the proposal to "define the common ancestor as the entity at which point the revision branches were split", please take a look at the example revision graph from the issue summary of #3023194: [PP-2] Add parallel revisioning support and clarify how would your solution work exactly.

In any case, I've done a few updates to the patch:

added a getter and a setter for the parent revision IDs on RevisionableInterface, and a default implementation in ContentEntityBase which works regardless of the field type used to store the parents;
moved the validation to be on the entity level instead of the field type level, which makes it work with any kind of revision reference field as well;
removed the constraint for the parent revisions that needed to belong to the same entity, so you can now reference two arbitrary revisions of an entity type, which satisfies your use case for cross-entity parent tracking;
the logic for determining the LCA of two revisions now lives in a entity.revision_tree service and it doesn't require an $entity_id argument anymore, which makes it easy to swap with a custom implementation (this was done in #41).

All this means that it's really easy to swap the optimized field type provided by core with an ERR field if needed, and also that your specific use case is completely supported by the current approach.

Comment #52

19 February 2019 at 13:10

Status:

Needs review

» Needs work

The last submitted patch, 51: 2725523-51.patch, failed testing. View results

Comment #53

amateescu commented 19 February 2019 at 13:29

Status:

Needs work

» Needs review

That fail will be fixed by #2975081: UpdatePathTestBase fails to re-initialize the test site (rebuild container, clear caches) after running the database updates, keeping at NR to get more eyes on this until that issue gets in.

Comment #54

amateescu commented 20 February 2019 at 13:13

Status	File	Size
new	2725523-54-combined.patch	74.17 KB
new	2725523-54-for-review.patch	51.76 KB
new	interdiff-54.txt	6.41 KB

While working on #2950869: Entity queries querying the latest revision very slow with lots of revisions I realized that we must set the (first) parent automatically to always be the active revision of the branch. Here's an update for that which also fixes @plach's point from #40.5.

Also posting a patch combined with #2975081: UpdatePathTestBase fails to re-initialize the test site (rebuild container, clear caches) after running the database updates to check if we're good on all supported database drivers.

Comment #55

20 February 2019 at 14:23

The last submitted patch, 54: 2725523-54-combined.patch, failed testing. View results

Comment #56

amateescu commented 20 February 2019 at 16:05

Status	File	Size
new	2725523-56-combined.patch	78.65 KB
new	2725523-56-for-review.patch	56.24 KB
new	interdiff-56.txt	7.61 KB

Those test fails reminded me that we should keep the ability to set the revision parents manually if needed, so we need to do the same "enforce" dance as for the revision translation affected key. Also restored the test coverage for it.

Comment #57

berdir

German

Switzerland

commented 25 February 2019 at 22:39

```
+++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
@@ -322,6 +329,58 @@ public function updateLoadedRevisionId() {
+    }
+
+    $revision_parent_key = $entity_type->getRevisionMetadataKey('revision_parent');
+    $revision_parent_field = $this->get($revision_parent_key);
+
```
I'm just jumping in without having read much of this issue except the issue summary.

Thinking about paragraphs, I'm wondering if there should be a way for entity types to opt out of this. We already don't track our own changed date, uid as well as most of the revision metadata because each paragraph revision is part of the larger composite which is the parent and its referenced paragraphs.

When we merge two host revisions with paragraphs, we don't really care about the parent of the paragraphs revisions but the paragraph revisions referenced by the host entity.
```
+++ b/core/lib/Drupal/Core/Entity/EntityRevisionTree.php
@@ -0,0 +1,122 @@
+   */
+  protected function getAllRevisionParents($entity_type_id, $entity_id) {
+    $entity_type = $this->entityTypeManager->getDefinition($entity_type_id);
+    $storage = $this->entityTypeManager->getStorage($entity_type_id);
+    assert($storage instanceof SqlEntityStorageInterface);
```
This has probably been discussed, but I'm wondering what the reason for this being a separate service is. A site could theoretically have one entity type in the SQL and another in MongoDB. Or it could have a different implementation of an SQL backend that doesn't work with the same type of structure?

This being a dedicated service, even if we'd tag it with backend_overridable, would not allow to provide different implementations for different entity types?

+++ b/core/lib/Drupal/Core/Entity/Plugin/Validation/Constraint/ValidRevisionParentConstraintValidator.php
@@ -0,0 +1,87 @@
+  public function validate($entity, Constraint $constraint) {
+    /** @var \Drupal\Core\Entity\RevisionableInterface $entity */
+    if (!isset($entity)) {
+      return;
+    }

what about doing this as an actual $entity instanceof RevisionableInterface check, then you get code inspection too without a @var.

Also, if we'd make having a parent field optional then this validator would also need to be optional, as well as various places that define the field definition and so on.

+++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/RevisionTreeItem.php
@@ -0,0 +1,89 @@
+   */
+  public static function schema(FieldStorageDefinitionInterface $field_definition) {
+    $target_type = \Drupal::entityTypeManager()->getDefinition($field_definition->getTargetEntityTypeId());
+
+    /** @var \Drupal\Core\Field\BaseFieldDefinition $revision_field_definition */
+    $revision_field_definition = \Drupal::service('entity_field.manager')->getBaseFieldDefinitions($target_type->id())[$target_type->getKey('revision')];
+    $revision_column_schema = $revision_field_definition->getSchema()['columns'][$revision_field_definition->getMainPropertyName()];

a field schema that is dynamically depending on the schema of another field on the same entity type is slightly scary :)

+++ b/core/lib/Drupal/Core/Field/RevisionTreeItemList.php
@@ -0,0 +1,40 @@
+  /**
+   * {@inheritdoc}
+   */
+  public function set($offset, $value) {
+    parent::set($offset, $value);
+
+    // Store the second revision parent in its dedicated property.
+    if ($offset === 1) {
+      $this->list[0]->set('merge_target_id', $value, TRUE);
+    }
+
+    return $this;

Is there a specific reason (I actually wrote revision first, too much revision going on here) for this magic?

It does seem to assume that $value is a scalar, I'm also not quite sure why it calls the parent *and* this?

```
+++ b/core/modules/system/system.install
@@ -2238,3 +2239,146 @@ function system_update_8601() {
+/**
+ * Fill values for the 'revision_parent' field.
+ */
+function system_update_8702(&$sandbox = NULL) {
+  $entity_definition_update_manager = \Drupal::entityDefinitionUpdateManager();
+  $entity_type_manager = \Drupal::entityTypeManager();
```
A bit worried about the execution time of these update functions, not just this but others that 8.7 is adding. Didn't review in detail yet, but if I see this correctly we're doing a select + ûpdate for each revision, plus a select to get the next batch of revisions.

We are using the default entity_update_batch_size step size here, but there's a big difference between an entity save and a select + update, so possibly this could be considerably higher and a bit more efficient then. Not sure if it's worth it, it will be called for at least one second per run.

Would be great if we could test this on a database with a million nodes or so. I did a somewhat similar update on paragraphs and tested that with 800k paragraphs. It was surprisingly fast but did take a while. I know that my laptop is also 3x faster than e.g. a MySQL Galera cluster due far less disk/network overhead and my MySQL being configured for speed and not secure data.

I'm wondering if a Drupal site could technically work without all those revision parents being filled out and if we could do that in the background on cron/batch. But I suppose that's tricky as soon as we start to actually save a revisionable entity? And also other upcoming things depending on this.

I'll try to test this on a larger site but I imagine it is a bit tricky as I will need to first update our distribution to 8.7 and then update the real site with that. And this patch likely doesn't apply to 8.6.

Comment #58

berdir

German

Switzerland

commented 25 February 2019 at 23:36

Status:

Needs review

» Needs work

Wasn't that hard to get that working, but the update died :)

Executing system_update_8701 [15.36 sec, 35.56 MB]                                                                                                                                                                                                                               
Field storage definition for 'langcode' could not be found. [40.14 sec, 75.16 MB]

Seems to die on https://www.drupal.org/project/profile, which is revisionable but not translatable. Possibly caused by something else in 8.7, didn't debug further, but here's the exception backtrace:

Drupal\Core\Field\FieldException: Field storage definition for 'langcode' could not be found. in web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorageSchema.php:870
Stack trace:
#0 web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorageSchema.php(159): Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema->getEntitySchema(Object(Drupal\Core\Entity\ContentEntityType), true)
#1 web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorageSchema.php(344): Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema->requiresEntityStorageSchemaChanges(Object(Drupal\Core\Entity\ContentEntityType), Object(Drupal\Core\Entity\ContentEntityType))
#2 web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php(1466): Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema->onEntityTypeUpdate(Object(Drupal\Core\Entity\ContentEntityType), Object(Drupal\Core\Entity\ContentEntityType))
#3 web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php(1546): Drupal\Core\Entity\Sql\SqlContentEntityStorage->Drupal\Core\Entity\Sql\{closure}()
#4 web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php(1467): Drupal\Core\Entity\Sql\SqlContentEntityStorage->wrapSchemaException(Object(Closure))
#5 web/core/lib/Drupal/Core/Entity/EntityTypeListener.php(92): Drupal\Core\Entity\Sql\SqlContentEntityStorage->onEntityTypeUpdate(Object(Drupal\Core\Entity\ContentEntityType), Object(Drupal\Core\Entity\ContentEntityType))
#6 web/core/lib/Drupal/Core/Entity/EntityDefinitionUpdateManager.php(233): Drupal\Core\Entity\EntityTypeListener->onEntityTypeUpdate(Object(Drupal\Core\Entity\ContentEntityType), Object(Drupal\Core\Entity\ContentEntityType))
#7 web/core/modules/system/system.install(2280): Drupal\Core\Entity\EntityDefinitionUpdateManager->updateEntityType(Object(Drupal\Core\Entity\ContentEntityType))
#8 vendor/drush/drush/commands/core/drupal/update.inc(65): system_update_8701(Array)
#9 vendor/drush/drush/commands/core/drupal/batch.inc(163): drush_update_do_one('system', 8701, Array, Object(DrushBatchContext))
#10 vendor/drush/drush/commands/core/drupal/batch.inc(111): _drush_batch_worker()
#11 vendor/drush/drush/includes/batch.inc(98): _drush_batch_command('151137')
#12 vendor/drush/drush/commands/core/drupal/update.inc(235): drush_batch_command('151137')
#13 vendor/drush/drush/commands/core/core.drush.inc(1228): _update_batch_command('151137')
#14 vendor/drush/drush/includes/command.inc(422): drush_core_updatedb_batch_process('151137')
#15 vendor/drush/drush/includes/command.inc(231): _drush_invoke_hooks(Array, Array)
#16 vendor/drush/drush/includes/command.inc(199): drush_command('151137')
#17 vendor/drush/drush/lib/Drush/Boot/BaseBoot.php(67): drush_dispatch(Array)
#18 vendor/drush/drush/includes/preflight.inc(66): Drush\Boot\BaseBoot->bootstrap_and_dispatch()

Should be fairly easy to reproduce by installing profile.module, possibly create some data and then run the update...

Comment #59

berdir

German

Switzerland

commented 26 February 2019 at 07:24

Ok, just skipped profile. With 800k node revisions and a bit over 800k paragraph revisions, the update took about 780s, then went into some kind of loop and didn't finish. Seen that before when the sum query disagree over the amount of rows that are actually returned, could be some kind of cruft in the table or so. Usually I also add an exit condition if the last run returned less than 50 results.

And I'm sure this isn't the largest site by far :)

One thing I see that causes it to slow down is that LIMIT HUGE_NUMBER, 50 is inefficient:

SELECT t.revision_id AS revision_id FROM paragraphs_item_revision t WHERE (revision_id < '2822025') limit 797020, 50;
50 rows in set (0.15 sec)

It's just 0.15s at the end of the process, but still, that sums up quickly, 800k rows means we have to call the update function 16k times. It's much faster if you invert it and select the next 50 revision ids that are higher:

SELECT t.revision_id AS revision_id FROM paragraphs_item_revision t WHERE (revision_id > '2822025') limit 50;
50 rows in set (0.00 sec)

For this query, that would mean that you need to store the last updated revision_id in the sandbox, you don't even need the MAX because per above, you can simply stop if you get less than $batch_size results.

Comment #60

hchonov

Bulgarian

Burgas

commented 26 February 2019 at 13:50

Re #51:

@hchonov, I'm not going to quote all that wall of text, but I'll answer on each main "topic":

If I did not wanted/expected an extended answer then I would not have put that much effort into writing it all down. This is why I've asked for a call where we can clarify all the things and prevent misunderstandings. Once again I propose that we schedule a call. What do you think?

Why not use the ERR field type for tracking revision parents: because we need to track multiple parents, and using ERR for that means we would have to add a dedicated field table for each revisionable entity type. Since we know that we can only have a maximum of two parent revisions, the field type introduced by this patch provides an optimized storage by keeping them in two properties of the same field item so they can be stored in the shared entity revision table.

Once again you introduce a specific field and invest work into a specific solution instead into a general one where the whole Drupal community will benefit. Why? Why does this feature have higher priority than the community feature request for referencing revisions out of the box? Why don't we first implement a general solution and then if really needed re-use it here or adjust it for a specific use case here?

What is the problem of having a dedicated table? When #2620980: Add static and persistent caching to ContentEntityStorageBase::loadRevision() gets in, then we don't have to worry about any optimizations regarding entity revisions, as nothing can outperform a revision cache, where we should not constantly join all the tables just for loading an entity revision. Therefore I don't think that this argument really counts. And if it does not count as an argument, then we can separate the parent and the merged revision problems and have two dedicated issues instead of this big one. We can then focus on two different problems. What about this?

Regarding your example graph: that's a problem that can happen in git as well, and it's really well described in this answer on SO. Git doesn't prevent you from working further in the 3.x branch after it was merged into 2.x (i.e. creating revision 31 with 30 as the parent), but, in our case, we could decide that we don't want to allow users to get into that situation and forbid parents that are part of a "closed branch".
In an autosave scenario, where the user continues editing revision 30 without knowing that the 3.x branch has been merged into 2.x, we can write some code in preSaveRevision() which detects that the parent revision (which was assigned automatically when the editor started working on it) is now part of a closed branch and re-assign a proper parent, 21 in this case. This allows the 2.x and 3.x branches to live separately as they were before the merge, and a subsequent merge would have to resolve the (same) conflict, there's really no other way around that.

This is not possible. This kills concurrent editing. If 2.x is the current user session and 3.x the default revision branch then you cannot close either of them.

The user of the 2.x branch works and wants to save at some point in time, but the default revision branch was updated and therefore there is a conflict now.
Then the user gets a conflict resolution UI, where the user can merge all the changes or just pick a few of them from the default revision branch 3.x.
After the conflict resolution the user might need some more time to adjust things because of the merged 3.x branch.
In the meantime it should be possible for other users to save in the default revision.
After some time the user of the 2.x branch decides to save, but now there might be a new conflict.
A new conflict resolution should occur.

We cannot disable such use cases just to "announce" the merged revision as the common ancestor of two entities instead of the one entity where both branches have split.

As for the proposal to "define the common ancestor as the entity at which point the revision branches were split", please take a look at the example revision graph from the issue summary of #3023194: [PP-2] Add parallel revisioning support and clarify how would your solution work exactly.

I have asked for an in-depth reasoning about the merge revision tracking and using it as a common ancestor. Could you please provide this information, as I cannot find it, but it would be really useful here. Without it it is still hard for me to understand why/what for exactly do you need the merged revision beside "history" tracking. Also regarding the referenced issue please state me a specific question and what exactly would not work if we use the real parent of both revision branches instead of the merged revision?

I've also made a statement that it is wrong to assume the merged revision as a common ancestor, because the entity revision where both branches have split is the real and truly ancestor of both revision branches and not the merged one. Could you please comment on that?
Please consider that if you merge 3.x into 2.x and don't modify the merged fields and then later want to merge 3.x into 2.x again then there will be no conflicts when using the entity from the 1.x branch as the common ancestor because the values didn't change.

P.S.
Please note that we already have a "parent" field in core, which is allowed to reference a different entity. I am talking about the taxonomy term's parent field. The taxonomy term entity type should be converted to revisionable in #2880149: Convert taxonomy terms to be revisionable and I've just posted there the question what is going to happen with the parent field. Because there it might make sense to track changes in the vocabulary and for that we would need an entity revision reference field to reference the target revision of the parent in order to be able to fully reconstruct a vocabulary at any certain time point.

This and the use case of decoupling (creating duplicate) entities and tracking changes between the original and the decoupled entity I've given earlier in the current issue just show how useful an entity revision reference field will be. We are even already doing so much things in core with revisions and still don't offer the ability to reference by revision.

Comment #61

amateescu commented 27 February 2019 at 14:59

Status:

Needs work

» Needs review

Status	File	Size
new	2725523-61-combined.patch	80.08 KB
new	2725523-61-for-review.patch	57.24 KB
new	interdiff-61.txt	26.88 KB

@Berdir, thanks for the review and the thorough testing! That was very very useful :)

Re #57:

We discussed this further and agreed it's not really useful to have the ability to disable this functionality, mostly because a few other followups intend to use the revision tree introduced here for various purposes:
- fix the critical performance problem from #2950869: Entity queries querying the latest revision very slow with lots of revisions
- use the logic for finding the LCA of two revisions in #2867707: WI: Phase H: Conflict management and local workspace merging support
- possibly re-write the "latest affected" functionality based on revision (translation) parents
Very good point. Converted it to be an entity handler instead.
Done :)
And yet it works surprisingly well, even for the ER field type: #2680571-41: Cannot reference an entity with a big int (size setting of ID field not propagated to reference field) (don't believe those test fails, they're just about "unexpected" schema changes)
The reason for that whole item list class was my attempt to make it very easy to swap this field type with an ERR one, but it's not really working and it increases the complexity for no good reason, so I dropped that approach.
Implemented your suggestion from #59, which also makes the logic of this update function very similar to the one used by SqlFieldableEntityTypeListenerTrait::copyData().

I still have to look into #58.

Edit: I installed the Profile module on my local 8.7.x codebase without this patch, created a profile and then applied the patch and ran the update function and everything worked fine. I know that for you it failed before this step, but the profile that I created had two revisions and the second one had the proper parent assigned by system_update_8702().

Comment #62

27 February 2019 at 14:58

The last submitted patch, 61: 2725523-61-combined.patch, failed testing. View results

Comment #63

amateescu commented 27 February 2019 at 16:30

@hchonov:

Once again you introduce a specific field and invest work into a specific solution instead into a general one where the whole Drupal community will benefit. Why? Why does this feature have higher priority than the community feature request for referencing revisions out of the box? Why don't we first implement a general solution and then if really needed re-use it here or adjust it for a specific use case here?

Because I don't think this feature is a good use case for ERR, for the reasons explained in the part of my comment that you quoted just above these questions.

What is the problem of having a dedicated table? When #2620980: Add static and persistent caching to ContentEntityStorageBase::loadRevision() gets in, then we don't have to worry about any optimizations regarding entity revisions, as nothing can outperform a revision cache, where we should not constantly join all the tables just for loading an entity revision. Therefore I don't think that this argument really counts.

The optimizations provided by this field type are about storage: less tables, less data that needs to be stored, faster inserts, etc., it has nothing to do with loading revisions. Therefore I think the argument counts.

This is not possible. This kills concurrent editing. If 2.x is the current user session and 3.x the default revision branch then you cannot close either of them.
...
We cannot disable such use cases just to "announce" the merged revision as the common ancestor of two entities instead of the one entity where both branches have split.

I have to say that my initial reaction to "killing concurrent editing" was that it's just FUD, but I took the time to re-read everything a few times over and realized that this

3. After the conflict resolution the user might need some more time to adjust things because of the merged 3.x branch.

actually means that the user aborted the merge while keeping the local changes they made, including possible resolved conflicts. Which, in turn, means that revision 21 from your initial graph is not a merge revision after all, it only has 20 as a parent, so a subsequent merge attempt will still pick 10 as the common ancestor.

I have asked for an in-depth reasoning about the merge revision tracking and using it as a common ancestor. Could you please provide this information, as I cannot find it, but it would be really useful here. Without it it is still hard for me to understand why/what for exactly do you need the merged revision beside "history" tracking.

That's precisely the answer, the merged revision is used for history tracking. As I've said before, all this work is based on the git storage model, which tracks two parents for a commit that results from a non-fast-forward merge (the -no-ff option).

I've also made a statement that it is wrong to assume the merged revision as a common ancestor, because the entity revision where both branches have split is the real and truly ancestor of both revision branches and not the merged one. Could you please comment on that?

Per the explanation above, I don't think there's any disagreement here. A merge revision has no special meaning in the graph, it will only be a common ancestor when two branches are split from it, which is not the case in the example you gave earlier.

Comment #64

amateescu commented 27 February 2019 at 16:49

Also, I forgot to mention that the ability to specify a parent revision that belongs to a different entity was already implemented a while ago (in #41) by removing the validation constraint for this case, and by allowing $entity_id to be NULL in EntityRevisionTreeHandlerInterface::getLowestCommonAncestorId($revision_1, $revision_2, $entity_id = NULL). So your use case for cross-entity parent tracking is fully supported by this patch.

Comment #65

berdir

German

Switzerland

commented 28 February 2019 at 07:41

+++ b/core/modules/system/system.install
@@ -2371,10 +2373,18 @@ function system_update_8702(&$sandbox = NULL) {
 
-    $sandbox[$entity_type_id]['current'] += count($revisions);
-    $sandbox[$entity_type_id]['finished'] = ($sandbox[$entity_type_id]['current'] == $sandbox[$entity_type_id]['max']) || empty($revisions);
+    $missing = $database->select($revision_table, 't')
+      ->condition("t.$revision_id_column", $sandbox[$entity_type_id]['current_id'], '>')
+      ->orderBy($revision_id_column, 'ASC')
+      ->countQuery()
+      ->execute()
+      ->fetchField();
+    $sandbox[$entity_type_id]['finished'] = $missing ? $sandbox[$entity_type_id]['progress'] / ($sandbox[$entity_type_id]['progress'] + (int) $missing) : 1;

hm, when saying that we don't need the total, I forgot about progress tracking. It might be more efficient to still get the max revision id and use that to calculate a rough progress and then we can also use count($revisions) > $step_size as an indicator that we are done. That wouldn't be exact, but the progress is pretty random anyway here with multiple entity types (I'm not even sure how you currently calculate that into the progress).

About ERR, my thoughts as an entity system co-maintainer are that it does not seem like a good fit for this scenario. Especially with the current approach of storing possibly two revision parents, that would be very inefficient with ERR because you would need a multi-cardinality field, which means a dedicated field table for every revisionable entity type. ERR stores a reference to an entity id + entity revision, so even if we would only store one parent, we would have an extra column with useless data. That is only for this type of revision parent reference, the term parent field is quite a different use case, exactly because it is a reference to a different entity. I think there's an open issue already to make that revisionable.

And my thoughts about getting ERR into core as an ERR co-maintainer are that I'm currently not very fond of that idea, for a few reasons:

* Getting non-trivial stable functionality (this wouldn't be a module) into core is a very time consuming process, I think I can speak for Miro too when I say that neither of us could spend the necessary time & effort on that. And I think it would be important to have us on board, see following reasons.

* Only a relatively small part of the ERR module would be viable for core IMHO, a large part of the complexity is in the composite system that is heavily used by Paragraphs. But there are still too many moving parts there and it would not make sense to have this in core IMHO. That means the module would still need to provide that functionality, which would require a considerable amount of time for refactoring it. And further improvements would also be hindered by having parts of it in core, most of which we'd likely have to override again.

* ERR currently does not have a widget included that actually makes sense for it, the default ER widgets don't really work because there is nothing that allows to select a specific revision and the default widgets also update to the latest revision every time you save. There are also other incomplete parts, like views integration.

I can't really comment right now the other discussion points like merging strategies and if we need to track multiple parents, would need to spend more time thinking and reading through plans for that.

Comment #66

amateescu commented 28 February 2019 at 09:06

Title:

WI: Add a revision_parent field to revisionable entities

» Add a revision_parent field to revisionable entities

Status	File	Size
new	2725523-66-combined.patch	80.16 KB
new	2725523-66-for-review.patch	57.32 KB
new	interdiff-66.txt	6.3 KB

hm, when saying that we don't need the total, I forgot about progress tracking. It might be more efficient to still get the max revision id and use that to calculate a rough progress and then we can also use count($revisions) > $step_size as an indicator that we are done.

The problem with getting the max revision ID before applying the changes to the revisions processed by the batch run is that it doesn't know about revisions that might have been created in the meantime, that's why @plach and I opted for getting the "final" answer at the end of the batch run for both SqlFieldableEntityTypeListenerTrait::copyData() and the update function from this patch.

You raised this question in #57.6, and with a small addition of checking whether the parent ID is NULL, I think the update function will work fine in the scenario where revisions are created while the update runs.

That wouldn't be exact, but the progress is pretty random anyway here with multiple entity types (I'm not even sure how you currently calculate that into the progress).

We don't calculate individual progress for each revision, only for entity types :) That's what system_update_8400() did as well, when moving revision metadata fields to the revision table.

Comment #67

amateescu commented 28 February 2019 at 21:55

Status	File	Size
new	2725523-67.patch	57.32 KB

#2975081: UpdatePathTestBase fails to re-initialize the test site (rebuild container, clear caches) after running the database updates is in, so this is now unblocked! Re-uploading the 'for review' patch.

Comment #68

catch

he/him

English

commented 5 March 2019 at 15:19

Just to say I'm +1 to a custom field type with two values for the different revision parents here.

ERR has a very specific use case (like linking commerce orders to the product revision so that the price reflects what was paid at the time, not later changes) so keeping ERR in contrib, and this field being very clearly for the special case we're dealing with will make that clear.

Comment #69

plach

he/him

Italian

Venezia

commented 6 March 2019 at 01:12

This looks very good! Personally I'd still prefer a slightly different approach to track parents: a new simple revision reference type (not an ERR!) and two base fields to track parent and merge parent, since parent and merge parent have very defined roles and we don’t foresee to have more parents in this specific context. Having this generic reference field would allow us to implement generic DAGs supporting multiple parents, and any component working with graphs could be adapted to work with both generic DAGs and this specific implementation in the same way. However, given that most people here seem to be onboard with the current solution, I'm not going to argue about it :)

+++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
@@ -322,6 +329,58 @@ public function updateLoadedRevisionId() {
+  public function getParentRevisionIds() {

What about adding dedicated accessor methods for the two parents? This would make the semantics of the consumer code more explicit/readable.

+++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
@@ -702,6 +702,18 @@ protected function doPreSave(EntityInterface $entity) {
+        $parent_revision_ids[0] = $this->getLatestRevisionId($entity->id());
+        $entity->setParentRevisionIds($parent_revision_ids);

I'm wondering whether we could perform this assignment in ContentEntityBase::preSave() and whether that would allow us to avoid the need for the enforced methods.

+++ b/core/lib/Drupal/Core/Entity/ContentEntityType.php
@@ -37,6 +37,12 @@ public function __construct($definition) {
+        'revision_tree' => 'Drupal\Core\Entity\Sql\SqlRevisionTreeHandler',

Why "tree" and not "graph"?
(here and everywhere else :)

+++ b/core/lib/Drupal/Core/Entity/EntityRevisionTreeHandlerInterface.php
@@ -0,0 +1,29 @@
+   *   The lowest common ancestor of the two revisions, or NULL if it couldn't

Can we mention that we are returning a revision ID?

+++ b/core/lib/Drupal/Core/Entity/EntityRevisionTreeHandlerInterface.php
@@ -0,0 +1,29 @@
+  public function getLowestCommonAncestorId($revision_1, $revision_2, $entity_id = NULL);

Can we add _id to the first two parameters to match the third one?

+++ b/core/lib/Drupal/Core/Entity/Plugin/Validation/Constraint/ValidRevisionParentConstraintValidator.php
@@ -0,0 +1,85 @@
+    assert($entity instanceof RevisionableInterface);

Would it make sense to return early and skip validation if the parent IDs are not enforced?

+++ b/core/lib/Drupal/Core/Entity/Plugin/Validation/Constraint/ValidRevisionParentConstraintValidator.php
@@ -0,0 +1,85 @@
+      if (count($parent_revision_ids) === 2 && $parent_revision_ids[0] == $parent_revision_ids[1]) {

Could we ever have a revision with ID 0? If so we'd have problems here when the merge parent is NULL.

+++ b/core/lib/Drupal/Core/Entity/Sql/SqlRevisionTreeHandler.php
@@ -0,0 +1,175 @@
+  public function __construct(EntityTypeInterface $entity_type, EntityTypeManagerInterface $entity_type_manager, EntityLastInstalledSchemaRepositoryInterface $entity_last_installed_schema_repository, Connection $database) {

Missing PHP doc.

+++ b/core/lib/Drupal/Core/Entity/Sql/SqlRevisionTreeHandler.php
@@ -0,0 +1,175 @@
+  protected function getAllRevisionParents($entity_id) {

Could we add an Info suffix to the method name given that we are not returning revision objects?

+++ b/core/modules/system/system.install
@@ -2239,3 +2240,160 @@ function system_update_8601() {
+  \Drupal::service('plugin.manager.field.field_type')->clearCachedDefinitions();

Is it safe to do this? I vaguely recall seeing issues caused by clearing caches in updates, since this will trigger hooks. I don't have alternative ideas though.

```
+++ b/core/modules/system/system.install
@@ -2239,3 +2240,160 @@ function system_update_8601() {
+function system_update_8702(&$sandbox = NULL) {
```
I was wondering why this is not done in a post update function leveraging entity query + entity save. Is that because we would be forced to process each entity individually?

+++ b/core/modules/system/system.install
@@ -2239,3 +2240,160 @@ function system_update_8601() {
+    $step_size = Settings::get('entity_update_batch_size', 50) * 10;

This would make it impossible to test the update with batch size 1. What about using a different setting altogether?

+++ b/core/tests/Drupal/KernelTests/Core/Entity/EntityRevisionsTest.php
@@ -261,4 +269,110 @@ public function testIsLatestAffectedRevisionTranslation() {
+    $this->assertArrayHasKey('revision_parent', $base_field_definitions);

Can add an assertion that a non revisionable entity type doesn't have the key?

+++ b/core/tests/Drupal/KernelTests/Core/Entity/EntityRevisionsTest.php
@@ -261,4 +269,110 @@ public function testIsLatestAffectedRevisionTranslation() {
+    // Check that a new revision that doesn't start from the latest one gets the
+    // proper revision parent ID, which is the active revision of the branch.
+    $revision_4 = $this->storage->createRevision($revision_2, FALSE);
+    $revision_4->save();
+    $this->assertEquals([$revision_3->getRevisionId()], $revision_4->getParentRevisionIds());

I initially thought this would be the way we'd start a new branch, but I can see how this would not work with reverted revisions. Do we already have a strategy for branching? Maybe we consider a new branch to be created when the new revision contexts differ from the parent revision ones?

+++ b/core/tests/Drupal/KernelTests/Core/Entity/EntityRevisionsTest.php
@@ -261,4 +269,110 @@ public function testIsLatestAffectedRevisionTranslation() {
+    /** @var \Drupal\entity_test\Entity\EntityTestRev $revision_1 */
+    \Drupal::currentUser()->setAccount($this->createUser());

These lines should be swapped.

+++ b/core/tests/Drupal/KernelTests/Core/Entity/EntityRevisionsTest.php
@@ -261,4 +269,110 @@ public function testIsLatestAffectedRevisionTranslation() {
+    $violations = $revision_2->revision_parent->validate();
...
+    $violations = $revision_3->revision_parent->validate();
...
+    $violations = $revision_4->revision_parent->validate();

Why parent? Aren't we supposed to validate the entity object itself before saving it?

Comment #70

amateescu commented 6 March 2019 at 16:20

Issue tags:

-Needs framework manager review

Status	File	Size
new	2725523-70.patch	58.1 KB
new	interdiff-70.txt	18.17 KB

Thanks for the review, @plach!

Re #69:

Sure thing, done.
Very good point! We can do that and get rid of the public "enforce" methods, but we need to keep the internal flag because otherwise there's no way to differentiate between the cases when the parent revision has a value set by default (populated on entity load) and when a user sets the same value manually.
For the same reason we have a "taxonomy tree" and not a "taxonomy graph" :) I'm not really opposed to changing it if people feel strongly about it..
Fixed.
Fixed.
I'm not sure. I think it's better to validate at all times because these values could be set by an automated process (e.g. a migration) and ensuring the integrity of graph is very important.
I don't think we can have 0 as a revision ID, but fixed anyway.
Fixed.
Fixed.
Those problems were caused by clearing the entity type definitions cache, and I also can't think of another way to do it.
Yes, and that would be very slow. Another update function that's very similar is system_update_8400() and that's one of the few major updates that didn't cause problems.
I'm not sure about this, I think that a new setting would be very specific to this particular update function in which we are doing way less database operations than a regular entity (re)save.
Sure, added.
The strategy for branching is already included in the patch, where we set the parent revision by default to be the latest revision, and we just have to change from "latest" to "active" in #3023194: [PP-2] Add parallel revisioning support, after we figure out translation branching and whatnot :)
Fixed.
Good catch, that's a leftover from before moving the new constraint from the field type level to the entity level.

Regarding the generic revision reference field, the fact that we're not introducing it here is a good thing IMO. In addition to @Berdir's and @catch's objections for having something like that in core, at least one additional major problem with it is that it would be incompatible with ERR from the start, because it wouldn't have a reference to the entity ID (the target_id property which ERR inherits from ER).

Comment #71

6 March 2019 at 17:19

Status:

Needs review

» Needs work

The last submitted patch, 70: 2725523-70.patch, failed testing. View results

Comment #72

amateescu commented 6 March 2019 at 21:39

Status:

Needs work

» Needs review

Status	File	Size
new	2725523-72.patch	58.76 KB
new	interdiff-72.txt	6.21 KB

Forgot to remove a call to setParentRevisionEnforced(), which was removed in #70. Also cleaned up a few more things.

Comment #73

6 March 2019 at 21:42

Status:

Needs review

» Needs work

The last submitted patch, 72: 2725523-72.patch, failed testing. View results

Comment #74

amateescu commented 6 March 2019 at 21:46

Status:

Needs work

» Needs review

Status	File	Size
new	2725523-74.patch	58.74 KB

Rerolled for #3037823: The system.theme.data key remains corrupted in state causing performance issues and is not used in >= 8.7 which added system_update_8701() that was also used by this patch.

Comment #75

6 March 2019 at 22:38

Status:

Needs review

» Needs work

The last submitted patch, 74: 2725523-74.patch, failed testing. View results

Comment #76

amateescu commented 6 March 2019 at 23:08

Status:

Needs work

» Needs review

Status	File	Size
new	2725523-76.patch	62.34 KB
new	interdiff-76.txt	3.6 KB

Fixed those unit tests.

Comment #77

larowlan

🇦🇺🏝.au GMT+10

commented 7 March 2019 at 01:07

Initial review

+++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
@@ -322,6 +337,54 @@ public function updateLoadedRevisionId() {
+  public function setParentRevisionIds(array $revision_ids) {
+    // A revision can have a maximum of two parents.
+    assert(count($revision_ids) <= 2);
+
+    /** @var \Drupal\Core\Entity\ContentEntityTypeInterface $entity_type */
+    $entity_type = $this->getEntityType();
+    if ($entity_type->isRevisionable()) {
+      $revision_ids = array_values($revision_ids);
+      $revision_parents = [
+        'target_id' => isset($revision_ids[0]) ? $revision_ids[0] : NULL,
+        'merge_target_id' => isset($revision_ids[1]) ? $revision_ids[1] : NULL,
+      ];
+
+      $this->set($this->revisionParentKey, [$revision_parents]);
+    }
+

i think the API here would make more sense if it was explicit

public function setParentRevisionIds($target_id = NULL, $merge_target_id = NULL)

As it stands, its a bit of a mystery API

+++ b/core/lib/Drupal/Core/Entity/EntityRevisionTreeHandlerInterface.php
@@ -0,0 +1,29 @@
+   * @param int|string $entity_id
+   *   (optional) If the two revisions belong to the same entity, the entity ID
+   *   can be used for optimizing the process used to build the revision tree,
+   *   otherwise the entire revision history might be loaded into memory.
+   *   Defaults to NULL.

I'm interested to learn when two revisions might have a common ancestor but not be the same entity. Are we trying to support some sort of cloning here? Are we over complicating it by trying to support that this early?

+++ b/core/lib/Drupal/Core/Entity/Plugin/Validation/Constraint/ValidRevisionParentConstraintValidator.php
@@ -0,0 +1,86 @@
+    $parent_revision_ids = array_filter(array_values((array) $entity->get($field_name)->get(0)->getValue()));
...
+      $revision_ids_to_check = array_filter($parent_revision_ids);

any reason to array_filter twice?

+++ b/core/lib/Drupal/Core/Entity/Plugin/Validation/Constraint/ValidRevisionParentConstraintValidator.php
@@ -0,0 +1,86 @@
+      if (count($parent_revision_ids) === 2 && (string) $parent_revision_ids[0] == (string) $parent_revision_ids[1]) {

given the type coercion here, should we use ===?

```
+++ b/core/lib/Drupal/Core/Entity/RevisionableInterface.php
@@ -63,6 +63,33 @@ public function getRevisionId();
+  public function getParentRevisionId();
...
+  public function getMergeParentRevisionId();
...
+  public function setParentRevisionIds(array $revision_ids);
```
We are adding methods to an interface here, so that is an API change. As far as I can tell, it doesn't qualify under the 1:1 rule as there is no base class directly beneath this, instead there is one beneath a child interface (RevisionLogInterface) - is it feasible someone implemented RevisionableInterface but not RevisionLogInterface and hence did not use the RevisionablyContentEntityBase?

+++ b/core/lib/Drupal/Core/Entity/Sql/SqlRevisionTreeHandler.php
@@ -0,0 +1,187 @@
+    $query = $this->database->select($revision_tree_table, 't');
+    if ($entity_id) {
+      $query->condition($id_column, $entity_id, '=');
+    }
+    $query->addField('t', $revision_id_column, 'id');
+    $query->addField('t', $parent_revision_id_column, 'parent_id');
+    $query->addField('t', $merge_revision_id_column, 'merge_parent_id');
+    $query->orderBy($revision_id_column, 'ASC');

can you elaborate (perhaps in a comment) as to why this can't be done with an entity query if these are just entity fields?

+++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/RevisionTreeItem.php
@@ -0,0 +1,88 @@
+      'indexes' => [
+        'target_id' => ['target_id'],
+      ],

no index on merge_target_id? we use it in where clauses so I would expect to see one.

+++ b/core/modules/system/system.install
@@ -2247,3 +2248,169 @@ function system_update_8701() {
+function system_update_8702() {
+  $entity_definition_update_manager = \Drupal::entityDefinitionUpdateManager();
+
+  // Get a list of revisionable entity types.
+  /** @var \Drupal\Core\Entity\EntityTypeInterface[] $definitions */
+  $definitions = array_filter($entity_definition_update_manager->getEntityTypes(), function (EntityTypeInterface $entity_type) {
+    return $entity_type->isRevisionable();
+  });

should this be done in a batch using the sandbox?

Comment #78

amateescu commented 7 March 2019 at 11:20

Status	File	Size
new	2725523-78.patch	62.09 KB
new	interdiff-78.txt	8.7 KB

Thanks for the review!

Re #77:

Very good point and suggestion, done :)
Yes, cloning is exactly what we're supporting there, per @hchonov's (very insistent) arguments. IMO, it's not complicating things that much, we just need to allow the $entity_id parameter to be NULL even though core won't use this feature.
No reason at all, fixed.
Sure thing, done.
We added many new methods to RevisionableInterface, I think the most recent example is #2891215: Add a way to track whether a revision was default when originally created. The expectation here is that people should extend from ContentEntityBase.
That's because entity queries always return a set of revision and entity IDs, never field values. You could get field values from aggregate entity queries, but that's not useful in this instance.
We never use merge_target_id in WHERE clauses (just in SELECT ones), that's why it doesn't have an index.
All update functions from D8 that added a field to all content entity types were done without using the batch sandbox :)

Comment #79

amateescu commented 7 March 2019 at 16:03

Status	File	Size
new	2725523-79.patch	61.54 KB
new	interdiff-79.txt	1.32 KB

@Berdir did *a lot* of manual testing for the update function from this patch, and the conclusions are:

the 10x multiplier for entity_update_batch_size is definitely worth it
removing the count query at the end of each batch run provides a ~30% performance increase; the side effect is that revisions that might be written during the last batch run won't be updated, but on the other hand there might be other entity schema updates that need to run as well, so there's no guarantee that any entity save would work at all while running db updates
the batch logic was broken but it has now received a very rigorous real-world testing on a database with 800K node revisions and many paragraphs as well

Comment #80

amateescu commented 7 March 2019 at 16:05

Saving credits.

Comment #81

dixon_

he/him

English

France

commented 7 March 2019 at 16:20

Status:

Needs review

» Reviewed & tested by the community

I've reviewed the patch and don't have anything to add, apart from what's already been addressed by @plach, @larowlan, @Berdir et al. Thanks! I think this is good to go!

I've done a fair amount of manual testing of the patch itself, gone through a couple of scenarios. Most importantly, I ran the upgrade on an existing site with some custom bundles. Here's one the many scenarios I went through:

Installed a site and added a custom block bundle
Added some block content
Applied the patch and ran the database updates (see below screenshots)
Observed the difference in database storage before and after the upgrade (see below screenshots)
Continued creating content successfully

I've also taken this patch for a thorough ride in combination with Workspaces module + #2880152: Convert custom menu links to be revisionable + #2880149: Convert taxonomy terms to be revisionable and everything is working as expected!

Comment #82

dixon_

he/him

English

France

commented 7 March 2019 at 16:28

Cross posted with the last patch from @amateescu. The interdiff looks great as well!

Comment #83

hchonov

Bulgarian

Burgas

commented 7 March 2019 at 16:30

Status	File	Size
new	2725523-82-FAIL.patch	62.5 KB

Personally I'd still prefer a slightly different approach to track parents: a new simple revision reference type (not an ERR!) and two base fields to track parent and merge parent, since parent and merge parent have very defined roles and we don’t foresee to have more parents in this specific context. Having this generic reference field would allow us to implement generic DAGs supporting multiple parents, and any component working with graphs could be adapted to work with both generic DAGs and this specific implementation in the same way.

@plach++
I completely support this.

I am fine with having only two properties on a simplified ERR - entity (computed) and target_revision_id (not target_id because it is super confusing to refer to a revision id simply by ID). This gives the advantage of making it easy to load the parent revision and cache it on the field item.
Additionally the column in the shared table will have a clear naming - revision_parent, merged_revision vs revsion_parent__target_id, revsion_paret__merge_target_id. It also makes it possible to listen to changes on the fields in ContentEntityInterface::onChange, which is impossible to address for only one of the properties if it is a combined field. The revision parent and the merged revision are two units, completely independent of each other and putting them together disregards the separation of concerns design principle.

What problems do you see about a simplified ERR field type without widgets? Even if we don't want ERR then why not having 2 simple integer fields?

The current issue mixes 3 different concepts which could be perfectly discussed in isolation, especially given the fact that we don't reach full agreement on all the aspects - revision parent, merged revision and least common ancestor. This separation would make it easier to review and discuss instead of mixing it all together.

We've always needed the revision parent reference and it is not only needed because of the revision branches, while the merged revision is needed only because of the revision branches.

Per the explanation above, I don't think there's any disagreement here. A merge revision has no special meaning in the graph, it will only be a common ancestor when two branches are split from it, which is not the case in the example you gave earlier.

Then why is the attached test failing?

Comment #84

amateescu commented 7 March 2019 at 17:10

Status	File	Size
new	2725523-79.patch	61.54 KB

Then why is the attached test failing?

Because the expectation of the test case is wrong. I've reconstructed your graph, it looks like this:

    // 10 -- 20 -- 21 -- 22
    //   \       /
    //    \     /
    //     \   /
    //      30 -- 31

And the test expects the LCA of 22 and 31 to be 10, when it's 30 instead.

Re-uploading the patch from #79.

Comment #85

hchonov

Bulgarian

Burgas

commented 7 March 2019 at 17:15

No, the expectations are not wrong. You've said :

A merge revision has no special meaning in the graph, it will only be a common ancestor when two branches are split from it

10 should be the least common ancestor in that test. I've already explained pretty detailed why it is wrong to assume anything else than 10 to be the least common ancestor.

Comment #86

dixon_

he/him

English

France

commented 7 March 2019 at 17:45

Status	File	Size
new	Screen Shot 2019-03-07 at 17.42.03.png	119.65 KB

@hchonov I'm afraid your definition of Lowest Common Ancestor is wrong. In the graph above, node 30 is most definitely the LCA because it's the lowest ancestor, i.e. furthest from the root.

For anything useful relying on LCA you need the node furthest from the root for things like diff, merge etc. You want as much history as possible in the LCA.

LCA definition

Comment #87

7 March 2019 at 18:00

The last submitted patch, 79: 2725523-79.patch, failed testing. View results

Comment #88

7 March 2019 at 18:00

The last submitted patch, 83: 2725523-82-FAIL.patch, failed testing. View results

Comment #89

7 March 2019 at 18:00

Status:

Reviewed & tested by the community

» Needs work

The last submitted patch, 84: 2725523-79.patch, failed testing. View results

Comment #90

amateescu commented 7 March 2019 at 18:06

Status:

Needs work

» Reviewed & tested by the community

Testbot is finally back!

Comment #91

7 March 2019 at 18:40

Status:

Reviewed & tested by the community

» Needs work

The last submitted patch, 84: 2725523-79.patch, failed testing. View results

Comment #92

7 March 2019 at 19:46

dixon_ credited rakesh_verma.

Comment #93

dixon_

he/him

English

France

commented 7 March 2019 at 19:09

I'm adding @rakesh_verma to the credits for this issue. While he hasn't written any code directly for this issue, he was a student in the Google Summer of Code Project that was instrumental in informing the general direction that we've taken here.

Some background on Rakesh's work can be found here:

Comment #94

dixon_

he/him

English

France

commented 7 March 2019 at 21:05

Status:

Needs work

» Reviewed & tested by the community

Patch is green. Back to RTBC.

Comment #95

7 March 2019 at 22:11

The last submitted patch, 83: 2725523-82-FAIL.patch, failed testing. View results

Comment #96

plach

he/him

Italian

Venezia

commented 8 March 2019 at 00:40

+++ b/core/lib/Drupal/Core/Entity/RevisionableInterface.php
@@ -63,6 +63,34 @@ public function getRevisionId();
+  public function getParentRevisionId();
...
+  public function getMergeParentRevisionId();
...
+  public function setParentRevisionIds($target_id = NULL, $merge_target_id = NULL);

Sorry, I thought it was clear from my previous review, but I was suggesting to have two separate setters to match the two getters.

Comment #97

amateescu commented 8 March 2019 at 08:55

Status	File	Size
new	2725523-97.patch	61.72 KB
new	interdiff-97.txt	8.82 KB

@plach, works for me.

Comment #98

8 March 2019 at 09:50

Status:

Reviewed & tested by the community

» Needs work

The last submitted patch, 97: 2725523-97.patch, failed testing. View results

Comment #99

amateescu commented 8 March 2019 at 09:58

Status:

Needs work

» Reviewed & tested by the community

That was an unrelated random test failure in LayoutBuilderDisableInteractionsTest, queued a re-test.

Comment #100

amateescu commented 8 March 2019 at 10:54

@hchonov:

Additionally the column in the shared table will have a clear naming - revision_parent, merged_revision vs revsion_parent__target_id, revsion_paret__merge_target_id.

The name of the columns in shared tables have no impact on the functionality of this feature.

It also makes it possible to listen to changes on the fields in ContentEntityInterface::onChange, which is impossible to address for only one of the properties if it is a combined field.

It is possible to check which of the two properties of the field has been changed in ContentEntityInterface::onChange() by getting the current field item values with $this->get(...) and comparing them with what's stored in $this->values.

The revision parent and the merged revision are two units, completely independent of each other and putting them together disregards the separation of concerns design principle.

The revision parent and the merged revision are two edges that belong to a vertex of the graph, so it makes sense for them to be stored together, very much like the taxonomy DAG which uses a multi-valued field to accomplish the same goal. In the case of the revision graph, we know in advance that we can have a maximum of two edges for a vertex, so we can optimize its storage by grouping them in two properties/columns of a single field item. Therefore I don't think this disregards the separation of concerns design principle at all.

What problems do you see about a simplified ERR field type without widgets?

This was answered more than once by @Berdir, @catch and myself.

Even if we don't want ERR then why not having 2 simple integer fields?

See the answer above on why I think they should be stored together.

We've always needed the revision parent reference and it is not only needed because of the revision branches, while the merged revision is needed only because of the revision branches.

The ability to have revision branches is a desired feature for #3023194: [PP-2] Add parallel revisioning support, which means we need to track merged revisions.

Comment #101

tstoeckler

he/him

German

Essen, Germany

commented 8 March 2019 at 13:42

Status:	Reviewed & tested by the community	» Needs work
Issue tags:		+Needs issue summary update

After the discussion we had with @kattekrab, @catch, @plach, @dixon_, @amateescu, @Berdir, @hchonov @amateescu to elaborate a bit on why I prefer a different approach. Forming my thoughts still a bit, will write something up in a bit. In looking a bit at this issue, I did notice however, that the issue summary is very sparse and especially given the fact that we had so much discussion I think the current approach should be explained a bit more in detail. Marking needs work for that.

Comment #102

dixon_

he/him

English

France

commented 8 March 2019 at 15:25

Issue summary:

View changes

Comment #103

hchonov

Bulgarian

Burgas

commented 8 March 2019 at 15:55

Status:

Needs work

» Needs review

Status	File	Size
new	2725523-103-two-fields-approach.patch	64.19 KB
new	interdiff-1-vs-2-fields.txt	19.86 KB

And here is the alternative with 2 fields. I haven't added the computed entity property to the new RevisionReference field in order to keep the changes as minimal as possible. We could add it in a follow-up issue.

Comment #104

catch

he/him

English

commented 8 March 2019 at 16:12

OK we discussed this on a call this morning, trying to summarise:

1. The majority of the people on the call do not have a strong opinion between two fields vs. a two column field.

2. The advantage of a two column field is it keeps the DAG information in 'one place'.

3. The advantage of two fields is that the field type is much less customized.

Between 2 and 3, in both cases the revision is semantically different information (i.e not quite the same as taxonomy parents, but it is a multiple parent relationship), so most of the choices between them are subjective.

4. @hchonov brought up things like checking if the field has changed or is empty - to be honest I don't see much use case for that with metadata fields. Similarly swapping out the parent revision field with something else (like pointing to a different entity instead of a different revision) - to me this seems like a completely different use case that should use a different field altogether - entity revisions are a specific concept to 'parent revision' should point to one.

5. I was concerned that switching to a two field approach would invalidate the upgrade path and performance testing that's already been done on this issue, however from the interdiff above, that's not the case, the upgrade path has not changed very much (nor the patch more generally).

This is a very partial account from quite a long phone call.

One additional question that I have is given the large batch size on the update, do the upgrade path tests here definitely cover multiple batches - this is an issue we had with taxonomy parents. I did not try to verify this yet so apologies if it's obvious in the patch.

Comment #105

alexpott

he/they

English

🇪🇺🌍

commented 8 March 2019 at 16:17

One additional question that I have is given the large batch size on the update, do the upgrade path tests here definitely cover multiple batches - this is an issue we had with taxonomy parents. I did not try to verify this yet so apologies if it's obvious in the patch.

We committed #3007606: Force all update path tests to only run one entity per batch 9 hours ago to help this exact case :)

Comment #106

catch

he/him

English

commented 8 March 2019 at 16:20

+++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
@@ -508,10 +516,7 @@
 
     // Set the parent revision ID automatically when we create a new revision.
     if ($this->isNewRevision() && !$this->enforceRevisionParent) {
-      // By default, the parent revision is the active revision of the branch.
-      // @todo Use the active revision ID once it is available.
-      //   @see https://www.drupal.org/project/drupal/issues/3023194
-      $this->setParentRevisionId($storage->getLatestRevisionId($this->id()));
+      $this->setParentRevisionId($this->getLoadedRevisionId());
     }

Why is this change necessary?

Comment #107

hchonov

Bulgarian

Burgas

commented 8 March 2019 at 16:24

Because I think that the revision parent of a new revision is the old revision.

When we have revision ids 1,2,3,4 and 4 is the default and we load revision 2 and save it in a new default revision then the revision parent of the new revision is 2 and not 4.

Comment #108

amateescu commented 8 March 2019 at 16:29

Status:

Needs review

» Needs work

When we have revision ids 1,2,3,4 and 4 is the default and we load revision 2 and save it in a new default revision then the revision parent of the new revision is 2 and not 4.

This introduces a hardcoded way of branching that the previous patch did not, which I think needs to be properly discussed in #3023194: [PP-2] Add parallel revisioning support.

Comment #109

plach

he/him

Italian

Venezia

commented 8 March 2019 at 16:33

The interdiff is looking good to me (of course :), however #108 is exactly what I was about to post, that change is definitely follow-up material.

Comment #110

hchonov

Bulgarian

Burgas

commented 8 March 2019 at 16:35

This introduces a hardcoded way of branching that the previous patch did not, which I think needs to be properly discussed in #3023194: [PP-2] Add parallel revisioning support.

This introduces a way of setting the correct parent. You create a new revision out of a currently loaded entity object and this object represents the parent and not the newest entity revision from the storage.

Comment #111

catch

he/him

English

commented 8 March 2019 at 16:36

That decision has no relationship to whether we have one field or two fields, so I'd prefer to discuss it in a follow-up.

If revisions 1, 2, 3, and 4 are all default revisions, then even though revision #5 is using the exact same content as #2, the difference on the live site will be between #3 and #4 and #4 and #5. I could also create a new revision that happens to have exactly the same content as revision #2. Is this really semantically different?

Comment #112

dixon_

he/him

English

France

commented 8 March 2019 at 16:42

Issue summary:

View changes

I updated the issue summary to better describe the specific problem we are trying to solve in this issue. I also added references to the long standing plans of the initiative to make the context more explicit.

Comment #113

hchonov

Bulgarian

Burgas

commented 8 March 2019 at 16:44

If revisions 1, 2, 3, and 4 are all default revisions, then even though revision #5 is using the exact same content as #2, the difference on the live site will be between #3 and #4 and #4 and #5. I could also create a new revision that happens to have exactly the same content as revision #2. Is this really semantically different?

It is not about the diff between revisions, but about how the revision graph develops. In this case the revision has been created out of 2 and not out of 4.

I don't think that it is correct to have the inconsistency, which saves an incorrect information:

$entity_original = $storage->loadRevision(2);
$entity = $storage->loadRevision(2);
$entity->setNewRevision();
$entity->save();
$entity->revision_parent->target_revision_id != $entity_original->getRevisionId()

Comment #114

hchonov

Bulgarian

Burgas

commented 8 March 2019 at 16:52

Additionally if we save 2 as the parent then the following will function correctly, while it will not if we use 4 as parent:

$entity->setNewRevision();
$entity->save();
$entity->original = $entity->revision_parent->entity;
$entity->hasTranslationChanges() => should emit FALSE, because there are no changes.

Comment #115

hchonov

Bulgarian

Burgas

commented 8 March 2019 at 16:58

Status:

Needs work

» Needs review

Status	File	Size
new	2725523-115.patch	64.19 KB
new	interdiff-103-115.txt	906 bytes

This should be fixing the test fails.

Comment #116

hchonov

Bulgarian

Burgas

commented 8 March 2019 at 17:08

+++ b/core/lib/Drupal/Core/Entity/Plugin/Validation/Constraint/ValidRevisionParentConstraintValidator.php
@@ -0,0 +1,86 @@
+      $revisions = $storage->loadMultipleRevisions($revision_ids_to_check);

Would it be more efficient if we execute an entity query instead of loading the entities? Here we need to know only if a revision ID exists or not.

Comment #117

tstoeckler

he/him

German

Essen, Germany

commented 8 March 2019 at 17:17

Issue tags:

+Needs change record

So the reason why I preferred to have two separate fields is that I think tracking the revision parent is really important even without any merging of revisions. I.e. if every revision only has a single parent it is worthwhile to track each revision's parent.

The use-case is the current Revert UI we have in core and it is very much related to the discussion started in #106. When you revert to an old revision we take that old revision and save it as a new revision. It is currently not possible to reconstruct this history after the fact (unless you inspect the log message...). The revision parent field makes this possible.

Therefore I like having two separate fields because then I can remove the merged revision ID from my entity types if I do not need that feature. I am not proposing any facility for this removal to be added to core but I can still alter the field out if I want to. This would have been a bit trickier as a custom field type.

Additionally, I think having a revision reference field in core is very very useful in general. In fact (in a follow-up!!!) we can explore using that field (or possibly a more powerful subclass) for content moderation state's entity revision ID reference field.

Now addressing the discussion in #106 and following:
On process: I'm not sure how this can be discussed in a follow-up if changing the implemenation necessarily constitutes a behavior change at that point? I.e. if a different behavior occurs when reverting revisions in the UI how can we discuss and change that post-alpha?
On the issue itself: I was quite surprised to see #54 change from getLoadedRevisionId() to using the latest revision and then glad to see it changed back by @hchonov. Given the above use-case of reverting using the UI I really cannot understand how one can come to the conclusion that the revision I reverted to is not the parent of the newly created revision. I would really like to hear further reasoning on that.

I already stated my opinion on the process of this issue in the call so I will not repeat that here, but one addendum now that I have reviewed the patch: I really think it is unusual to have a patch with such a broad scope. As in fact @hchonov pointed out above we are tackling three (very closely related, albeit distinct) topics in one patch which is very unusual to say the least.

Thanks @dixon_ for the issue summary update. I think it still could into a bit more detail, but maybe others can chime in on that as well. Also, this will certainly need a change record.

Comment #118

plach

he/him

Italian

Venezia

commented 8 March 2019 at 17:44

@tstoeckler:

Thanks for the additional insights, I won't reply to what #117 brought up yet.

Regarding process:

I really think it is unusual to have a patch with such a broad scope. As in fact @hchonov pointed out above we are tackling three (very closely related, albeit distinct) topics in one patch which is very unusual to say the least.

Revision parent and merge revision parent were obviously tackled together because the fact that they could have been tackled separately has been a matter of discussion until now. @amateescu explicitly asked me whether the LCA stuff should have been a separate issue (don't remember whether that happened here or in Slack) and I told him that addressing that in this issue could have provided some validation and implicit test coverage around the API being introduced. I apologize if this introduced additional confusion and I hope this helps clarifying the scope of the patch, even if you may disagree with it. I would not be opposed to splitting the LCA part out, but in practical terms I don't think it would make a big difference now that the patch has been reviewed several times.

Comment #119

hchonov

Bulgarian

Burgas

commented 8 March 2019 at 17:52

Status	File	Size
new	2725523-119.patch	65.25 KB
new	interdiff-115-119.txt	13.37 KB

Some more test adjustments.

Comment #120

8 March 2019 at 17:54

The last submitted patch, 115: 2725523-115.patch, failed testing. View results

Comment #121

8 March 2019 at 18:47

Status:

Needs review

» Needs work

The last submitted patch, 119: 2725523-119.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Comment #122

8 March 2019 at 18:47

Version:

8.7.x-dev

» 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #123

plach

he/him

Italian

Venezia

commented 9 March 2019 at 01:00

Status	File	Size
new	2725523-123.patch	65.11 KB
new	interdiff.txt	1.23 KB

This should fix the failures.

Comment #124

plach

he/him

Italian

Venezia

commented 9 March 2019 at 01:00

Status:

Needs work

» Needs review

Comment #125

plach

he/him

Italian

Venezia

commented 9 March 2019 at 01:10

Status	File	Size
new	interdiff.txt	893 bytes
new	2725523-125.patch	65.34 KB

And this addresses #106 and #111, so now we have all the options on the table.

Comment #126

9 March 2019 at 02:02

Status:

Needs review

» Needs work

The last submitted patch, 125: 2725523-125.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Comment #127

plach

he/him

Italian

Venezia

commented 9 March 2019 at 02:08

Status	File	Size
new	2725523-127.patch	65.34 KB
new	interdiff.txt	882 bytes

55 files were hidden/shown/deleted

Status	File	Size
hidden	2725523.patch	30.04 KB
hidden	2725523-20.patch	38.12 KB
hidden	interdiff-20.txt	11.12 KB
hidden	2725523-22.patch	48.77 KB
hidden	interdiff-22.txt	10.93 KB
hidden	2725523-23.patch	48.78 KB
hidden	interdiff-23.txt	702 bytes
hidden	2725523-25.patch	49.43 KB
hidden	interdiff-25.txt	666 bytes
hidden	2725523-35.patch	49.17 KB
hidden	interdiff-35.txt	5.39 KB
hidden	2725523-38.patch	49.29 KB
hidden	interdiff-38.txt	1.66 KB
hidden	2725523-41.patch	53.4 KB
hidden	interdiff-41.txt	29.19 KB
hidden	2725523-43.patch	52.67 KB
hidden	interdiff-43.txt	13.29 KB
hidden	2725523-47.patch	52.53 KB
hidden	2725523-51.patch	52.16 KB
hidden	interdiff-51.txt	29.99 KB
hidden	2725523-54-combined.patch	74.17 KB
hidden	2725523-54-for-review.patch	51.76 KB
hidden	interdiff-54.txt	6.41 KB
hidden	2725523-56-combined.patch	78.65 KB
hidden	2725523-56-for-review.patch	56.24 KB
hidden	interdiff-56.txt	7.61 KB
hidden	2725523-61-combined.patch	80.08 KB
hidden	2725523-61-for-review.patch	57.24 KB
hidden	interdiff-61.txt	26.88 KB
hidden	2725523-66-combined.patch	80.16 KB
hidden	2725523-66-for-review.patch	57.32 KB
hidden	interdiff-66.txt	6.3 KB
hidden	2725523-67.patch	57.32 KB
hidden	2725523-70.patch	58.1 KB
hidden	interdiff-70.txt	18.17 KB
hidden	2725523-72.patch	58.76 KB
hidden	interdiff-72.txt	6.21 KB
hidden	2725523-74.patch	58.74 KB
hidden	2725523-76.patch	62.34 KB
hidden	interdiff-76.txt	3.6 KB
hidden	2725523-78.patch	62.09 KB
hidden	interdiff-78.txt	8.7 KB
hidden	2725523-79.patch	61.54 KB
hidden	interdiff-79.txt	1.32 KB
hidden	2725523-82-FAIL.patch	62.5 KB
hidden	2725523-79.patch	61.54 KB
hidden	Screen Shot 2019-03-07 at 17.42.03.png	119.65 KB
hidden	2725523-103-two-fields-approach.patch	64.19 KB
hidden	interdiff-1-vs-2-fields.txt	19.86 KB
hidden	2725523-115.patch	64.19 KB
hidden	interdiff-103-115.txt	906 bytes
hidden	2725523-119.patch	65.25 KB
hidden	interdiff-115-119.txt	13.37 KB
hidden	interdiff.txt	893 bytes
hidden	2725523-125.patch	65.34 KB

Of course test coverage needs to be updated as well.

Comment #128

plach

he/him

Italian

Venezia

commented 9 March 2019 at 02:09

Status:

Needs work

» Needs review

Comment #129

amateescu commented 9 March 2019 at 14:23

@tstoeckler, thanks for sharing your feedback, it is useful to have more points of view written down in a clear manner so we can have a healthy debate.

As far as I see, there are three major discussion topics:

Two competing architectural designs for this feature: one field vs. two fields
Branching strategy for the revision graph
The process of this issue, specifically the fact that it handles three seemingly distinct features in one patch

I'll try to address them in reverse order.

3. Process

@plach already replied to this in #118, but I'll provide a quick summary as well:

having both a parent revision and a merge revision is a central part of the design for the revision graph feature that we want to introduce here (see the issue summary of #3023194: [PP-2] Add parallel revisioning support for additional context), so it is natural for them to tackled together
including a generic algorithm for finding the LCA of two revisions was discussed with a framework and Entity API maintainer, and it was deemed useful for additional validation of the proposed solution of this issue
therefore I hope we can agree that there are only two distinct features, and the second one can be moved to a followup if it's really contentious, even though the LCA algorithm is generic and it doesn't make any assumption on the branching strategy of the graph
I'm not sure what was the reason for emphasizing "which is very unusual to say the least" for this point. There is no conspiracy here, just well-intentioned people trying to do their best. In any case, it was totally uncalled for.

2. Branching strategy

The reason for the change in #54 is that this patch is intentionally neutral in regards to branching strategies, and it has a very specific purpose: to keep the current behavior from HEAD which has a linear revision history (by not tracking the revision history at all).

Keeping this patch unopinionated allows us to properly discuss branching strategies and their merits or shortcomings in #3023194: [PP-2] Add parallel revisioning support.

1. One field vs two fields

I think this can only be addressed if we lay out the "pros" and "cons" lists for each of them.

One field using a custom field type

pros:

conceptually correct design for storing a DAG in a Drupal field (similar in nature to how we store the taxonomy graph): provides an optimized storage for a multi-cardinality field with known constraints (cardiniality <= 2)
lightweight and maintenance free: minimal code implementation that essentially provides two integer properties, with no other additional features or complexity planned
performance: it doesn't require two separate base field definitions

cons:

not reusable: this is by design, see the 'lightweight' point above
a bit harder (but very much possible, see #100) to see which of the two properties has been changed in ContentEntityInterface::onChange()
harder to remove the merge parent if it's not needed for a specific project: however, note that the API methods for interacting with the merge parent can not be removed from the interface, and as soon as some code calls them, they expects to have "something" to interact with

Two fields with a minimal implementation of a revision reference field type

pros:

reusable: I will point out below why this might not be true
lightweight but not necessarily maintenance free, also described below

cons:

it is incompatible with the ERR implementation from contrib, because it lacks the target_id property/column among other things
stores the two edges of a graph's node in two separate fields. This is specifically for @tstoeckler's point from the call that this issue does not strive for a proper architecture as we try to do in Drupal, while I very much think that it does, as per the first "pro" of the one field approach.
less performant: it requires two base field definitions
heavily debated feature set, see below

Now, let's talk about the first point from the "con" list of the revision reference field type because I believe that's the most important one. As it stands from the latest patch in #127, there are two proposals to alleviate this point:

- add an optional target_id property (which would be activated by a field storage setting) and a target_type setting, neither of them being required (nor desired) features for the revision graph that we are introducing here. If I remember correctly, @tstoeckler expressed concerns in the past about varying the columns (or properties?) of a field type based on a field setting. I can't recall the specific context nor the exact words from his comment on that issue, but I can try to look it up if needed.

- extend the minimal implementation of the field type from the patch in #127 with another class, therefore creating an additional field type that would provide the ability to reference revisions from a different entity type.

Both of these options have been opposed at length by @Berdir in #65 and @catch in #68. Also, both options will provide some functionality currently provided by ERR in contrib, but not all of it (the "compound entity" concept), and we will end up with either 2 or 3 different revision reference field types (1 or 2 in core and 1 in contrib), neither of them being feature-equivalent.

This is why I think that the minimal implementation of a generic revision reference field type from #127 introduces a maintenance risk at least, because the (future) feature set and complexity of this field type are largely unknown, heavily debated and potentially not even desired for Drupal core.

The third option would be to leave the minimal implementation as-is, which removes the reusability point from its "pro" list and leaves only the lightweight one, which is also provided by the custom field type approach.

Please feel free to correct me if I missed something.

Comment #130

hchonov

Bulgarian

Burgas

commented 9 March 2019 at 14:27

+++ b/core/lib/Drupal/Core/Field/Plugin/Field/FieldType/RevisionReferenceItem.php
@@ -19,21 +19,11 @@
-  public static function defaultStorageSettings() {
-    return [
-        'target_type' => NULL,
-      ] + parent::defaultStorageSettings();
-  }
...
-    $settings = $field_definition->getSettings();
-    $target_type = \Drupal::entityTypeManager()->getDefinition($settings['target_type']);
+    $target_type = \Drupal::entityTypeManager()->getDefinition($field_definition->getTargetEntityTypeId());

What is the reason for this change?

+++ b/core/lib/Drupal/Core/Entity/ContentEntityBase.php
@@ -516,7 +516,10 @@ public function preSave(EntityStorageInterface $storage) {
-      $this->setParentRevisionId($this->getLoadedRevisionId());
...
+      $this->setParentRevisionId($storage->getLatestRevisionId($this->id()));

This manipulates history information in a wrong way. Why do you want to always define the active revision as the parent revision? In this case when you don't have revision branches, but only the default revision then you will only have linear revisioning and you always can assume that parent revision = getLoadedRevsion() - 1. In this case this definition of parent revision makes no sense and is simply confusing. Why do we want to have it then at all?

If you are working only with a single branch, then the parent is the revision from which you've created the new one and not the last one, otherwise you lose the revision graph information and the tracking of what has exactly happened. When you have revisions 1,2,3,4 and then a user decides to revert to 2 and from there work further, then the reason is that the user decided to throw away the information/work from 3 and 4 and start a new work based on 2. This is very similar to a revision branch if you draw the graph out of it and look at it it becomes clear that the revision parent of 5 is 2 and not 4.

1
|
2
|  \
3  5
|
4

If you are working between different revision branches and merge an entity from branch B into Branch A and use the current active revision of branch A as a base, then the parent will be that very same active revision and the merged one will be the one from B. If I understand it correctly this is the behaviour you are trying to achieve.

The approach that we are proposing works for both cases.

Comment #131

dixon_

he/him

English

France

commented 10 March 2019 at 20:41

When faced with different architectural approaches like this, I like to take a step back and reflect on the what and why this issue exists.

What are we building? We're building a critical extension of our revision system. Drupal is a CMS, and there are few systems that are more important in a CMS than its revision system. We're building an explicit API that will enable features as per the initiative, see #2867707: WI: Phase H: Conflict management and local workspace merging support.

Why are we building it? Ultimately to improve the experience for editors and authors, through features such as; collaborative and parallel revisioning with workspaces, conflict management with more options for reverting and restoring revisions.

So, there are some clear reasons and goals for what we are building here. And it's a critical system, which means that we must look beyond just the current use cases in core (such as the Revert UI).

In a scenario like this, I will argue that the type of architecture to look for should be purpose built, lightweight and easy to maintain in order to maximise for robustness and performance. We should not optimise for the ability to remove or change the system.

When trying to objectively evaluate the architectural options myself, this sticks out to me from @amateescus evaluation:

Two fields with a minimal implementation of a revision reference field type

[...]

cons:

stores the two edges of a graph's node in two separate fields. [...]

less performant: it requires two base field definitions

heavily debated feature set, see below

IMO, the architecture with two fields does not optimise for robustness and performance, but for modularity instead. While that's generally a good goal to strive for, given the what and why of this particular issue I think it's the wrong trade off to make.

The patch from #97, implementing a single field, seems to best achieve the goals of this issue.

Comment #132

tstoeckler

he/him

German

Essen, Germany

commented 10 March 2019 at 15:57

Re #129:

(Only replying where I have anything to add. I have a different opinion than you on a bunch of things brought up, but I don't think it's good use of all of our time to rehash every little detail of every point.)

I'm not sure what was the reason for emphasizing "which is very unusual to say the least" for this point. There is no conspiracy here, just well-intentioned people trying to do their best. In any case, it was totally uncalled for.

I am a bit baffled by this point, to be honest. By saying this is unusual, I meant the literal meaning of that word, I don't know how else to explain that. The scope of issues has long been a very contentious issue in core development and it is my opinion that this issue was handled in a different way most other issues are, i.e. in a way that is not usual. You may of course disagree with my assessment but I don't know why stating that opinion was in any way "uncalled for". I did not in any way allude to any conspiracy or anything of the like (nor do I believe there is one, just to to make this 100% clear). I wouldn't go so far as to say it is "uncalled for" but I find the fact that you make these broad insinuations based (solely?) on my usage of emphasis is a bit unnecessary. I may have been able to express what I meant with better wording, but I would appreciate if we can focus on the points of discussion at hand and not nitpick wording or markup.

The reason for the change in #54 is that this patch is intentionally neutral in regards to branching strategies, and it has a very specific purpose: to keep the current behavior from HEAD which has a linear revision history (by not tracking the revision history at all).

Keeping this patch unopinionated allows us to properly discuss branching strategies and their merits or shortcomings in #3023194: [PP-2] Add parallel revisioning support

I don't understand what "neutral" means in this case, we have to fill the parent revision with something, and we need to decide on a strategy for doing that, I don't see how choosing one strategy over another is neural. Same goes for considering this approach "unopinionated".

I also genuinely don't understand how you can consider this as keeping the current behavior from HEAD as this is an entirely new feature. HEAD has absolutely no revision history at all, which is what we are fixing in this issue.

As stated above, I furthermore don't see how changing the strategy after we have introduced it here is acceptable in terms of BC. Once we have now decided what the parent is of a reverted revision, how can changing that behavior not be considered an API change. For that reason I don't think it's valid to defer this to a follow-up.

stores the two edges of a graph's node in two separate fields. This is specifically for @tstoeckler's point from the call that this issue does not strive for a proper architecture as we try to do in Drupal, while I very much think that it does, as per the first "pro" of the one field approach.

Why is this listed under cons? You are just describing how it works not judging it in any way here.

less performant: it requires two base field definitions

I understand the implications of (base) field definitions but stating in this broad sense that it is "less performant" is simply not correct. If you seriously consider this an impediment of the current patch you need to provide some more concrete reasoning.

heavily debated feature set, see below

Below you are just talking about the first point, so using that as another argument is somewhat circular argumentation.

Now, let's talk about the first point from the "con" list of the revision reference field type because I believe that's the most important one.

To be honest, I don't think not having this compatible with contrib ERR is a big issue. For the use case of Paragraphs / compose entities / ... it turned out to be very beneficial if not utterly necessary to store both the revision ID as well as the entity ID. But that is one use-case of many. In general being to able to reference a specific revision of an entity is a nice feature to have.

The third option would be to leave the minimal implementation as-is, which removes the reusability point from its "pro" list

As said above I completely disagree with this. In particular if we do add a target_type setting in the future (which would also not affect the storage) it would be a really nice addition to the toolkit of data modelling-

Comment #133

tstoeckler

he/him

German

Essen, Germany

commented 10 March 2019 at 16:38

Re #131:

IMO, the architecture with two fields does not optimise for robustness and performance, but for modularity instead.

See above for why I disagree with @amateescu's arguments which you base this on. But even if I stipulate @amateescu's arguments coming to this conclusion seems to be a bit far-fetched. Lacking robustness in the context of this issue implies there is an issue with data integrity which is not the case, so I want to point that out explicitly.

Comment #134

catch

he/him

English

commented 10 March 2019 at 21:35

As stated above, I furthermore don't see how changing the strategy after we have introduced it here is acceptable in terms of BC. Once we have now decided what the parent is of a reverted revision, how can changing that behavior not be considered an API change.

The specifics of how we manipulate entities aren't an API - reverting an entity is a UI operation, we don't even have an API for it.

For me, the parent revision of a default revision should always be the previous default revision. This is what allows you to trace a linear default revision history once you have forward revisions.

So:

Revision 1 default
Revision 2 draft
Revision 3 default - the parent should be revision 1, and revision 2 would be the merge reference.

When we save revision 3, $entity->original is revision 1 - this is what hooks compare against for example.

That revision 2 was an actual revision at all is an implementation detail - it could have been in temp store or whatever, but the merge reference could then point to that"

1 ----- 3 (default)
  \    /
    2 (draft)

Comment #135

catch

he/him

English

commented 10 March 2019 at 21:46

Just for reference, I think we can change the behaviour of the default parent revision during alpha (or probably afterwards too), but I don't think we can change between one fields/two fields.

Comment #136

catch

he/him

English

commented 10 March 2019 at 22:02

@hchonov:

1
|
2
|  \
3  5
|
4

With this version of the graph, I can't tell whether 3 and 4 were abandoned drafts, or were published versions that got reverted.

In git terms, 3 and 4 look like 'dangling commits' or an abandoned branch in that graph, when in fact they're commits in the 'master' commit history.

However if we keep linear default revision parents and use the merge ref to point to the revert source:

1
|
2
|  \
3  |
|  |
4  |
|  /
5

Now I can see that 4 and 3 followed 2 sequentially as default revisions, but also that 5 followed 4 and is based on 2.

However even this is not completely correct, because in Drupal a revert is not actually a revert (nor a rebase).

All 'reverting' a revision does, is allow you to create a new revision based on the content of an old one. In technical terms it's not different to if you open up the edit form, make the same change to the content, and save it again.

So in that case, the 'revert' history would be:

1
|
2
|
3
|
4
|
5

Because in fact this is the actual history of the revisions, and whether someone wants exactly the same content as 2, or just happens to edit content again to be the same as 2 (and both cases are possible, such as a status change in the issue queue or promoting/unpromoting from front page) is really incidental, we just provide the revert UI to make it easier.

Comment #137

tstoeckler

he/him

German

Essen, Germany

commented 11 March 2019 at 00:23

Thank you @catch for your detailed explanation. At least now I understand where you are coming from. I don't agree with the assessment but if you - as a release manager - say that it is fine to change this behavior after this issue has gone in, I'm fine with discussing this in a follow-up. Therefore I won't discuss this issue further here.

To be completely honest I am a bit wary of that because it will make it more of an uphill battle for me and anyone agreeing with me, as with this being a new feature all options for the possible behavior are on equal footing before it lands, but after it does I will have to make an active case to change it, but I guess that's just tough luck for me in this case.

I do feel it is worth noting, however, that even after we had a >1 hour call which included all Entity API maintainers, this rather fundamental discussion on how a revision history, which this issue enables, should work in the context of reverting only arose now while at the same time this is being considered for inclusion in 8.7.x on the premise that it was "done"/"ready"/... last Friday.

Comment #138

plach

he/him

Italian

Venezia

commented 11 March 2019 at 01:31

@tstoeckler, #137:

I do feel it is worth noting, however, that even after we had a >1 hour call which included all Entity API maintainers, this rather fundamental discussion on how a revision history, which this issue enables, should work in the context of reverting only arose now while at the same time this is being considered for inclusion in 8.7.x on the premise that it was "done"/"ready"/... last Friday.

To be fair I was surprised as well, I didn't realize we had disagreement also on this aspect until #103, so I thought the only issue to overcome was the 1 field vs 2 fields discussion. I fully agree that having all entity API maintainers on the same page on this matter is highly desirable and something I spent all the weekend mulling over (and that's the reason why I set up the call in the first place). I still think that there is nothing fundamentally flawed in any of the solutions we discussed during the call on Friday (#97 and #127), they just seemed two competing approaches with similar merits and both acceptable. After picking one we would have been ready for commit from my perspective.

Clearly, I have more concerns now that also #123 is on the table, but as long as we can still change the approach during alpha, I'm ok with discussing this in a follow-up. I guess we could even roll back the commit from 8.7.x and only leave it in 8.8.x, if it turns out there are serious flaws that we need to account for.

Personally I'm trying to do everything possible to make this land in 8.7.x, because I'm sensitive to product management goals and this issue was flagged as critical for the Workflow Initiative success, however I'm not willing to do that at the cost of introducing irreparable technical debt. I'm still not convinced this is what we are doing here though.

I have no problem to admit that, weren't we this close to alpha, I'd be ok with postponing commit to the resolution of this last debate, however committing now and possibly revisiting this change (or rolling it back) later still gives us a chance to make the 8.7 deadline. I don't think we would be bending any rule in doing so and I definitely don't think it would be unprecedented. I hope you'll agree with me at least on this point :)

Comment #139

tstoeckler

he/him

German

Essen, Germany

commented 11 March 2019 at 02:19

To be fair I was surprised as well, I didn't realize we had disagreement also on this aspect until #103

Just to be 100% clear: I did not mean to insinuate that this was in any way hidden by anyone or that there was any ill intent. I think that we were all quite convinced that what we had in mind was self-evidently correct, unfortunately we didn't all have the same thing in mind.

Comment #140

hchonov

Bulgarian

Burgas

commented 11 March 2019 at 04:53

Re #134:

For me, the parent revision of a default revision should always be the previous default revision. This is what allows you to trace a linear default revision history once you have forward revisions.

In this case what for do we need the field if there are no revision branches and no forward revisions? The parent is then always the previous revision ID. Talking in regards to performance it does not make sense to track this information when there are no revision branches and it only introduces a small but unusefull overhead. Also with linear revisioning I lose the information about what has happened.

Re #136:

@hchonov:
1
|
2
|  \
3  5
|
4
With this version of the graph, I can't tell whether 3 and 4 were abandoned drafts, or were published versions that got reverted.

In git terms, 3 and 4 look like 'dangling commits' or an abandoned branch in that graph, when in fact they're commits in the 'master' commit history.

3, 4 are part of the default revision branch and have their revision history and "was_default" revision field, which makes it possible to "tell whether 3 and 4 were abandoned drafts, or were published versions that got reverted.".

If you compare the tree to the tree with the revision branches, then you can notice that even without an explicit revision branch, we still create a revision branch - an implicit one.

However if we keep linear default revision parents and use the merge ref to point to the revert source:
1
|
2
|  \
3  |
|  |
4  |
|  /
5
Now I can see that 4 and 3 followed 2 sequentially as default revisions, but also that 5 followed 4 and is based on 2.

@catch, I see your point and now I better understand your thinking. Thank you for explaining it.
However I don't agree with it, because for me 2-5 and 2-3-4 are just different "implicit" revision branches and 2 is the parent of both.

I don't feel comfortable comparing this to git, because our API is not based on git and it allows handling things differently - for example loading an older revision and saving it as a new one. In git the reverting of commits results in commits itself, which creates the linear revisioning. In git you explicitly specify which commits to revert, but in Drupal you can skip these intermediate steps by loading a previous revision, which then results into a new revision branch 2-5, while the previous branch 2-3-4 gets abondend.

Reverting multiple commits in git is explained here pretty well - https://stackoverflow.com/a/1470452 .

If we want to do it like git does, then this means that consecutively we have to prevent saving an old revision as the new default revision. Doing this will be pretty bad BC break and as we cannot do it, we can't really be conform with git.

Re #138:

To be fair I was surprised as well, I didn't realize we had disagreement also on this aspect until #103

Because prior the call we weren't aware of that and it wasn't mentioned in the issue summary, which is why it came out later.

Clearly, I have more concerns now that also #123 is on the table, but as long as we can still change the approach during alpha, I'm ok with discussing this in a follow-up. I guess we could even roll back the commit from 8.7.x and only leave it in 8.8.x, if it turns out there are serious flaws that we need to account for.

Personally I'm trying to do everything possible to make this land in 8.7.x, because I'm sensitive to product management goals and this issue was flagged as critical for the Workflow Initiative success, however I'm not willing to do that at the cost of introducing irreparable technical debt. I'm still not convinced this is what we are doing here though.

I have no problem to admit that, weren't we this close to alpha, I'd be ok with postponing commit to the resolution of this last debate, however committing now and possibly revisiting this change (or rolling it back) later still gives us a chance to make the 8.7 deadline. I don't think we would be bending any rule in doing so and I definitely don't think it would be unprecedented. I hope you'll agree with me at least on this point :)

Full disclosure - neither me nor Tobias is paid by the mentioned sponsorship for the workflow initiative during the call. Insisting on commiting this issue in 8.7.x already has caused a pressure on us. Commiting it now will cause even more pressure on us as it will leave us only with a time frame during the alpha release. Considering this issue deals with 3 different topics at the same time, I don't think this is correct.
I don't know about the mentioned sponsorship, but the company I work for sponsors the workflow initiative as well - by having paid for 99 % of the current code of both autosave_form and conflict 2.x. I've visited Drupal Camp Mountain, Davos 2019 this weekend with the agenda of working towards a stable conflict 2.x release based on an agreement for an approach after talking to Philipp Melab. Reference - #2867707-27: WI: Phase H: Conflict management and local workspace merging support. But after all I've ended up with dedicating my whole time this weekend and my energy to this issue instead. Commiting this now having so much disagreement is not correct considering not everybody here can dedicate that time in this little time frame. Also there are no technical reasons for having it in 8.7.x, because everything which will be using it will be targeted against 8.8.x, meaning commiting it and using it in 8.8.x should be fine as well.

Comment #141

plach

he/him

Italian

Venezia

commented 11 March 2019 at 08:33

Full disclosure: I'm not paid by the aforementioned company as well, and I spent most Friday on this issue, despite none paying me for that at all, just because I think that the overall benefits of having this in 8.7 outweigh the drawbacks.

I fully understand your point but it seems me that if we can't even compromise on this priority, the only solution you're willing to accept is one that fully meets your criteria, despite there being quite a few people disagreeing with those.

We can try to get this matter sorted now. If we can't have this land in 8.7, I won't lose (more) sleep.

Comment #142

plach

he/him

Italian

Venezia

commented 11 March 2019 at 08:55

@tstoeckler, #140:

Just to be 100% clear: I did not mean to insinuate that this was in any way hidden by anyone or that there was any ill intent. I think that we were all quite convinced that what we had in mind was self-evidently correct, unfortunately we didn't all have the same thing in mind.

Same here, thanks for saying that, it's appreciated.

Comment #143

catch

he/him

English

commented 11 March 2019 at 09:00

In this case what for do we need the field if there are no revision branches and no forward revisions?

Forward revisions are supported by the API, whether or not there is content_moderation, workflow or another mechanism to create them. If we added the field when one of those modules is enabled, we'd have to populate it on install which feels painful. edit: Also uninstalling would mean removing the field and losing the history. So it makes sense to me to store the information all the time, even if the history technically mirrors the revision ID sequence.

I don't feel comfortable comparing this to git, because our API is not based on git and it allows handling things differently

To be honest I also think it's not useful comparing this to git.

- for example loading an older revision and saving it as a new one... in Drupal you can skip these intermediate steps by loading a previous revision, which then results into a new revision branch 2-5, while the previous branch 2-3-4 gets abondend.

Except that loading a previous revision, removing the revision ID, and saving it to create a new revision is not a 'revert API' - it's just how the entity system responds when it gets an entity without a revision ID. So the behaviour in the revert case vs. publishing a draft revision case don't need to be identical - what happens under the hood in the entity system might be the same but the intention is very different.

The last example in #136 (1 - 2- 3- 4- 5) shows that there is really no revision branching going on at all in the revert case, that we loaded revision 2 is incidental - an implementation detail of the user interface. Loading revision 4, copying the changed fields over from revision 2, then saving it would work just as well to 'revert'.

Once you get into actual revision branches (a series of draft revisions building on each other, or a fork from a draft to a second workspace etc.) then the parent field starts to show useful information, but I really think revert is a different case.

There's also the issue of translations, to be honest I can't remember how the revert UI works with translations, but for example:

r1 - English
r2 - English + French
r3 - English edit
r4 - English + French + Spanish
r5 - Revert English back to language in 1 (or 2), but preserve the French and Spanish translations.

In this case, the parent of r5 is not r1 or r2, but r4 - the content is a composite of the translations in r4, with the English content from r1/2. It's a completely new revision, that does not match the content of any of the previous revisions, it just happens to combine field values from previous revisions.

Comment #144

hchonov

Bulgarian

Burgas

commented 11 March 2019 at 14:29

Re #141:

I fully understand your point but it seems me that if we can't even compromise on this priority, the only solution you're willing to accept is one that fully meets your criteria, despite there being quite a few people disagreeing with those.

I am not saying "I want it to be like this", but I am supporting everything I say with arguments. I am sorry if in any way it sounds to you like this, but this is absolutely not my intention. This statement was TRUE also for #2880149: Convert taxonomy terms to be revisionable, where I've raised concerns about the parent field being revisionable and I had hard times defending this, however now it turned out, that my concerns were correct and the parent field should not be revisionable - please refer to the latest comments on the issue.
The same statement however is valid in the other direction, because there are people disagreeing with your criteria as well. Please let us not make any accusations and maintain a healthy discussion. The situation is already pretty intense and I will consider this statement only as caused by it and will ignore it for the sake of the progress instead of getting offended.

For some parts here there has been no discussion, which is healthy and prevents bugs in the future and makes the API better. I think that all we want to have a better Drupal and no one wants to slow down something just because of believes or anything else different than the desire to invest in the progress of Drupal. Regarding a healthy discussion please see my next reply to @catch's comment.

Re #143:

- for example loading an older revision and saving it as a new one... in Drupal you can skip these intermediate steps by loading a previous revision, which then results into a new revision branch 2-5, while the previous branch 2-3-4 gets abondend.

Except that loading a previous revision, removing the revision ID, and saving it to create a new revision is not a 'revert API' - it's just how the entity system responds when it gets an entity without a revision ID. So the behaviour in the revert case vs. publishing a draft revision case don't need to be identical - what happens under the hood in the entity system might be the same but the intention is very different.

The last example in #136 (1 - 2- 3- 4- 5) shows that there is really no revision branching going on at all in the revert case, that we loaded revision 2 is incidental - an implementation detail of the user interface. Loading revision 4, copying the changed fields over from revision 2, then saving it would work just as well to 'revert'.

Exactly because you might look at it not only as a 'revert API' is the reason an implicit revision branch is created. You can use the API this way simply in the entity API not in the revision revert UI and in this case the currently loaded entity revision is the parent.

This is why I've stated in my previous comment:

If we want to do it like git does, then this means that consecutively we have to prevent saving an old revision as the new default revision. Doing this will be pretty bad BC break and as we cannot do it, we can't really be conform with git.

If we are unable to save an old revision as a new one, then a revert would work only by copying over values and in this case we will correctly end up with revision parent being the latest revision and merged parent being the entity to which state we want to revert to.

There's also the issue of translations, to be honest I can't remember how the revert UI works with translations, but for example:

r1 - English
r2 - English + French
r3 - English edit
r4 - English + French + Spanish
r5 - Revert English back to language in 1 (or 2), but preserve the French and Spanish translations.

In this case, the parent of r5 is not r1 or r2, but r4 - the content is a composite of the translations in r4, with the English content from r1/2. It's a completely new revision, that does not match the content of any of the previous revisions, it just happens to combine field values from previous revisions.

@catch++. Thank you for bringing up this example. It actually shows that we have unfinished work in the patch. \Drupal\node\Form\NodeRevisionRevertForm::prepareRevertedRevision() loads an old revision and declares it as the new one, but \Drupal\node\Form\NodeRevisionRevertTranslationForm::prepareRevertedRevision() makes use of \Drupal\Core\Entity\TranslatableRevisionableStorageInterface::createRevision(RevisionableInterface $entity, $default = TRUE, $keep_untranslatable_fields = NULL) which is documented as follow:

* When dealing with a translatable entity, this will merge the default
* revision with the active translation of the passed entity.

This case that you describe and the usage of this API this way should indeed result into a linear revisioning and the passed entity is the merged revision, while the current default revision is the parent. In this case I completely agree with you and we should update \Drupal\Core\Entity\ContentEntityStorageBase::createRevision() to set the merged revision on the new entity object. However in this case we merge only a single translation but currently we are able to only flag the whole entity as merged. I was thinking that we might use the revision_translation_affected flag, but I am not sure if this is sufficient, because there might be no changes on a translation and then the flag will remain unset for that translation or if we merge non-translatable fields, then even if merging only one translation all the translations will have the revision_translation_affected flag set. Do we also need a boolean translatable merged_langcode field? Any thoughts on that? I think this issue is CNW because of this. Does this convince you that the patch is not ready for 8.7.x?

Summary:
There are two use cases mentioned here:
1. Loading an entity in an older revision and saving at as a new one, which I think should result into having as parent the old revision.
2. Reverting only a single translation, where the parent should be the current default revision and the entity revision used for the revert should be the merged revision.
--------------------------

A side note:
We've just found a bug in our project and a way to recover from it might be to create some intermediate revisions...

This would be easy if on save the entity receives as parent the currently loaded revision. Otherwise one has to explicitly set the parent, which requires a dedicated knowledge how the parent is being chosen - and I think this knowledge now is limited to only the people participating in the current issue and even with a CR I am not sure that this knowledge is able to propagate. Considering this fact in isolation regardless of everything else convinces me as well that it is much more secure that the parent is always the previously loaded entity object, which in the case of \Drupal\node\Form\NodeRevisionRevertForm::prepareRevertedRevision() will be the loaded revision and in the case of \Drupal\node\Form\NodeRevisionRevertTranslationForm::prepareRevertedRevision() will be the default revision.

Comment #145

hchonov

Bulgarian

Burgas

commented 11 March 2019 at 16:55

Actually instead of introducing one more field, we could make the merged_revision field translatable - something that we are able to do thanks to it being a separate field and not only a property of an another field.

In order to keep it a revision metadata key, we would have to adjust \Drupal\Core\Entity\Sql\DefaultTableMapping::create() to allow translatable revision metadata keys to be put into the data tables. I think that this change is fine, but it also might be considered a BC break if someone has defined a revision metadata field(s) to be translatable, because after such a change the generated schema will change. We could tackle this by explicitly checking only for that field and allowing only it to be translatable.

Comment #146

catch

he/him

English

commented 11 March 2019 at 17:18

While I still personally don't have a strong preference on two vs. one fields, I'm tempted to go with #127 - it does not actually introduce ERR, and the field is more agnostic.

I spoke to amateescu and plach separately about the revision parent strategy at the moment, because I had another alternative idea, and they both explained something I hadn't thought of before, which I think is a strong argument for keeping it as it is:

plach: those of us arguing for a linear parent history are thinking about the parent field as a change history for merging purposes. those who are arguing for setting parent to 'currently loaded revision ID' are talking about a process history for auditing_purposes. For me, if a content auditing trail is desired that could be an additional field that always tells you the loaded revision ID regardless of any other branching strategy. I'm not sure that both approaches can be reconciled in a single field, which might explain the deadlock. Or we might find out they can later on in which case it will get resolved by the follow-ups anyway.

The main reason to keep the linear revision history in the patch though, is that until we decide exactly how parallel revisions might work, it ensures the following:

1. Site updates from 8.6.8 to 8.7.0
2. Site updates form 8.6.8 to 8.7.8

Let's assume these are the same site, with identical content changes, and the only difference is when the code update happens, then the revision history they'll end up is identical with the current approach. This is a consistent basis to work out everything else from, it means that everything up until we have a fully defined parenting strategy will be the same on all sites (unless they start to set the parent manually themselves, in which case they made a decision).

The patch needs a re-roll after the menu link conversion so CNW for that.

Comment #147

hchonov

Bulgarian

Burgas

commented 11 March 2019 at 17:48

While I still personally don't have a strong preference on two vs. one fields, I'm tempted to go with #127 - it does not actually introduce ERR, and the field is more agnostic

Thank you!

plach: those of us arguing for a linear parent history are thinking about the parent field as a change history for merging purposes. those who are arguing for setting parent to 'currently loaded revision ID' are talking about a process history for auditing_purposes. For me, if a content auditing trail is desired that could be an additional field that always tells you the loaded revision ID regardless of any other branching strategy. I'm not sure that both approaches can be reconciled in a single field, which might explain the deadlock. Or we might find out they can later on in which case it will get resolved by the follow-ups anyway.

Could anyone please provide an example where using the loaded revision ID as the parent is wrong in terms of merging purposes? Because if I understand it correctly in your use cases you always work with the active version of an entity in which case the parent will be the same regardless of the approach. It will be different only if you use an older revision instead of the active one, which I don't think is the case anywhere beside a revert.

---------

I've said that I consider it CNW also because currently the merged_revision field is non-translatable indicating that a whole entity merge has occurred. However core already has an API for merging only a single translation like I've explained in #144. Therefore I suggest making the field translatable. This will be needed if you consider the merged revision for a parent when searching for a LCA, because if you merge only the german translation of an entity and you search for a LCA for the english translation then it is wrong to consider the merged entity as LCA.

Comment #148

amateescu commented 11 March 2019 at 17:51

Status	File	Size
new	2725523-147.patch	67.93 KB
new	interdiff-147.txt	23.86 KB

The patch in #127 needs to be updated after #2880152: Convert custom menu links to be revisionable. Also, renamed merged_revision to revision_merge_parent after discussing with @plach.

Comment #149

hchonov

Bulgarian

Burgas

commented 11 March 2019 at 18:00

Actually if talking about performance retrieving the active revision requires an entity query, right? So on each save there will be an entity query, while when using the loaded revision ID as parent this will not be the case. We have complex nested inline entity forms, where we save a lot of entities at once. Having 50+ entity queries on save will have a serious impact on our performance. I think this might be the case with Paragraphs too.

And we already have performance issues, so this change will be a performance regression for us.

Comment #150

11 March 2019 at 18:48

Status:

Needs review

» Needs work

The last submitted patch, 148: 2725523-147.patch, failed testing. View results

Comment #151

xjm

she/her

English

commented 11 March 2019 at 19:26

Thanks everyone for such careful consideration of the implications of this change.

I discussed this briefly with @catch and wanted to add that I'm comfortable deferring some decisions to followup issues here, so long as we preserve BC and upgrade path/data integrity. Iterative improvement would be great to see in 8.7.x before beta1, and then the followups can land in either 8.7.x or 8.8.x depending on how far we get on them in the next few weeks.

(Note that I haven't done a full review of the patch itself or the details of all the different scenarios; leaving signoff for that to the framework managers.)

Really great work here. Thank you!

Comment #152

amateescu commented 12 March 2019 at 10:03

Status:

Needs work

» Needs review

Status	File	Size
new	2725523-152.patch	71.21 KB
new	interdiff-152.txt	9.04 KB

Updated the patch to account for the taxonomy term conversion to revisionable and fixed the failing test.

@hchonov, you can use hook_entity_revision_create() to set a parent manually so the entity query won't run anymore.

Comment #153

hchonov

Bulgarian

Burgas

commented 12 March 2019 at 10:11

In #144, #145, #147 and #149 I've made some points and asked some questions. It would be great to get an reply on those from either @plach or @amateescu.

The merged_revision field might need to be translatable and we already have an API in core which is merging a single translation from a non default into a default revision. The patch here hasn't yet updated this API. Also I've explained that it needed to consider merged_revision per translation and not per entity when searching for LCA. If we haven't had the official API for merging only a single translation then we could have enforced merging all translations at once, but now we have to support the core API. And actually thinking about it it is impossible for a single editor to merge all translations unless the editor is polyglot. Therefore I think the field should be translatable.

And because there were so many points by @amateescu and @dixon_ about performance it would be great to explain how an entity query for retrieving the revision ID is more efficient than reading a property from an already loaded entity object? Why is it needed at all considering we've introduced an API for retrieving always the active revision for editing, which means there are no cases where the query will return anything different than the getLoadedRevsionId method on the loaded active revision, because we already load the active revision. This simply makes the query unnecessary.

I've also asked at least for an example in which cases it is wrong to assume the current loaded revision ID as parent.

I am not sure that those points can be addressed without BC break in a follow-up and they show that we haven't considered and talked about everything, but I think that we should when introducing such a complex new feature. After all we all want that everybody using it will be happy with it.

I have nothing against this going into 8.7.x, but having so much things not clarified and that there might be potential BC breaks required, it isn't really good. I think we need couple of days to think everything through.

I am raising concerns because we want to use this API in our project as well and I just see cases where it might not work properly. When developing a new API it is always better to consider the use cases of others and only the own ones. We all share Drupal like the air we breathe :).

----

@hchonov, you can use hook_entity_revision_create() to set a parent manually so the entity query won't run anymore.

@amateescu, this however is not supported by core and core doesn't run the tests with such a hook.

Comment #154

plach

he/him

Italian

Venezia

commented 12 March 2019 at 10:15

@hchonov:

I'm rather busy right now, but I'll make sure to reply to your latest comments today, worst case tonight!

Comment #155

amateescu commented 12 March 2019 at 10:37

it would be great to explain how an entity query for retrieving the revision ID is more efficient than reading a property from an already loaded entity object?

Did anyone say that an entity query is more efficient than reading a property from an entity object?

Why is it needed at all considering we've introduced an API for retrieving always the active revision for editing, which means there are no cases where the query will return anything different than the getLoadedRevsionId method on the loaded active revision, because we already load the active revision. This simply makes the query unnecessary.

The query is needed because loading the active revision also does translation negotiation, which might return a different revision than the latest one from the default branch, and since we want to keep the default revision history linear we need to ensure that the parent revision is _always_ the latest one from the default branch.

I've also asked at least for an example in which cases it is wrong to assume the current loaded revision ID as parent.

When a different revision has been added by another editor or by another (possibly automated) process, between the time the entity form was loaded and when it was submitted.

When developing a new API it is always better to consider the use cases of others and only the own ones.

Not sure what you mean by "own ones", but did you read #3023194: [PP-2] Add parallel revisioning support? We tried to incorporate as many use cases as we could think of in there, and I've pointed you towards that issue many many times already.

this however is not supported by core and core doesn't run the tests with such a hook.

I've no idea what "is not supported by core" means. If you mean that the createRevision() API is not *always* used in core tests, that's true, but I still don't get the point.

In any case, I gave you a solution for _your project_ (you're welcome, by the way), what do core tests have to do anything with that?

I can't speak for the argument about the translation stuff, hopefully @plach will be able to do that.

Comment #156

catch

he/him

English

commented 12 March 2019 at 11:57

Status:	Needs review	» Reviewed & tested by the community
Issue tags:		+Needs framework manager review

I've reviewed the latest interdiff and am happy with the patch as it stands now.

On the merge revision ref and translatability:

1. Once again I think this comes down to treating the parent/merge history as a content auditing field (i.e. tracking the content source for a particular revision) rather than change history (tracking the history of revision creation and the point that drafts are merged into master or other draft branches).

Where we combine multiple revisions into one to preserve translations, the reason we do so is to not change the content of translations that aren't being operated on in the UI. In other words it's mostly a feature of how the user interface works, not the entity API, similar to reverts.

2. In the same way that making the term parent revisionable can happen in a minor release, making the merge revision translatable also could (and during alpha too). So if we commit it not-translatable, we aren't closing off the possibility of making it translatable later. Going the other way though could potentially result in data loss going from a translatable to non-translatable field (if people use it) since you'd have to resolve multiple values down to one.

Having said all that, plach is much stronger on how exactly revisions and translations interact than me, so should confirm the above thinking with him too.

Moving this to RTBC since I think we have the best compromise on the two fields possible, and for me personally I feel strongly that keeping the revision history linear until we properly resolve the parallel revisioning issue is important. Tagging 'needs framework manager review' though so it's clear plach needs to take another look at it before it goes in.

Comment #157

hchonov

Bulgarian

Burgas

commented 12 March 2019 at 12:28

Re #155:

Did anyone say that an entity query is more efficient than reading a property from an entity object?

This was a question regarding your biggest arguments here have been about performance.

The query is needed because loading the active revision also does translation negotiation, which might return a different revision than the latest one from the default branch, and since we want to keep the default revision history linear we need to ensure that the parent revision is _always_ the latest one from the default branch.

Thank you for explaining this, now I understand where you are coming from. It would've been better if this was clear from the beginning. Can we please add this to the documentation? Also can we explain why do we need to keep this information when we are performing a query anyway for retrieving the LCA? Since we always have a linear revisioning then we could simply retrieve the revision before the current one, right?

What about if there is only a default branch and no forward revisions are involved? Is there a way that we do the query only when it really has to be performed? Flag the entity object somehow? It looks like we have been talking not only about different purposes, but also have had different usages in mind.

I've also asked at least for an example in which cases it is wrong to assume the current loaded revision ID as parent.

When a different revision has been added by another editor or by another (possibly automated) process, between the time the entity form was loaded and when it was submitted.

The EntityChangedConstraintValidator in core prevents this. This is a use case where a conflict will occur and the conflict 2.x module is already taking care of. If the API does something automatic it will prevent a proper conflict detection and resolution. This should not be automated, but it should be resolved by conflict management.
However given your previous reply this is only the case when only default revisions are involved.

We tried to incorporate as many use cases as we could think of in there, and I've pointed you towards that issue many many times already.

Regarding this I've already have given examples that the way you determine the common ancestor does not match our expectations.

I've no idea what "is not supported by core" means. If you mean that the createRevision() API is not *always* used in core tests, that's true, but I still don't get the point.

In any case, I gave you a solution for _your project_ (you're welcome, by the way), what do core tests have to do anything with that?

I've meant that when core tests run, only the query approach is used. And the core tests are important because they ensure that something is compatible with core.

Re #156:
I think we really need @plach on this, because I am pretty sure that you merge only a single translation at a time and knowing which one has been merged is important when resolving conflicts, because we might have used different revisions for the different translations giving the fact that we retrieve different revisions for different translations as said in the last comment from @amateescu.

Making the merged_revision translatable later would require a change in the way the LCA is determined. Wouldn't this be a BC break?

Moving this to RTBC

We haven't yet updated the core internal API which is merging revisions to set the merged revision. Don't we want to this here? I am talking about \Drupal\Core\Entity\ContentEntityStorageBase::createRevision().

Comment #158

amateescu commented 12 March 2019 at 13:59

This was a question regarding your biggest arguments here have been about performance.

That is completely false (it was even the last one in the "pros" list), and you didn't answer the question. I really wonder what are you trying to achieve with FUD like this.

Also can we explain why do we need to keep this information when we are performing a query anyway for retrieving the LCA?

Keep what information? Where does the current patch perform a query for retrieving the LCA outside of test code?

Since we always have a linear revisioning then we could simply retrieve the revision before the current one, right?

That's what the query which retrieves the latest revision ID does.

What about if there is only a default branch and no forward revisions are involved? Is there a way that we do the query only when it really has to be performed? Flag the entity object somehow?

How would we know that there are no pending revisions involved?

The EntityChangedConstraintValidator in core prevents this. This is a use case where a conflict will occur and the conflict 2.x module is already taking care of. If the API does something automatic it will prevent a proper conflict detection and resolution. This should not be automated, but it should be resolved by conflict management.

The EntityChanged constraint is only added by default to entity types that implement EntityChangedInterface, so we can not rely on it for every revisionable entity type.

Regarding this I've already have given examples that the way you determine the common ancestor does not match our expectations.

No, you gave examples of a different branching strategy than the one currently proposed in #3023194: [PP-2] Add parallel revisioning support, the algorithm for determining the LCA of two revisions from the graph is generic and works for any branching strategy. Also, it is completely fine for a project to use a different branching strategy than the one that will be provided by default in core.

I've meant that when core tests run, only the query approach is used. And the core tests are important because they ensure that something is compatible with core.

EntityRevisionsTest::testRevisionParents() has a test which sets the parent(s) manually, which means that the query is not used.

Comment #159

hchonov

Bulgarian

Burgas

commented 12 March 2019 at 14:36

That is completely false (it was even the last one in the "pros" list), and you didn't answer the question. I really wonder what are you trying to achieve with FUD like this.

You were talking about performance and I've simply said that when there are no forward revisions a query is more expensive.

Why do you wanna start a fight and conflicts from everything I say?

Keep what information? Where does the current patch perform a query for retrieving the LCA outside of test code?

Information about the parent. And I was talking about this code, which is not a test:

+++ b/core/lib/Drupal/Core/Entity/Sql/SqlRevisionTreeHandler.php
@@ -0,0 +1,199 @@
+  public function getLowestCommonAncestorId($revision_id_1, $revision_id_2, $entity_id = NULL) {

How would we know that there are no pending revisions involved?

I am not sure about this, this is why I am simply asking questions if it is possible to optimize it for this use cases.

Comment #160

plach

he/him

Italian

Venezia

commented 12 March 2019 at 16:06

Issue tags:

-Needs framework manager review

I have read all the latest comments and, as I feared, there are a lot of valuable ideas and little time to evaluate them, which is the main reason why I was pushing for addressing this topic in a follow-up. I'm not feeling comfortable with making a decision or indicating a direction in such a short timeframe, so I got in touch with @catch and we agreed to commit this only to 8.8.x for now. This still needs an IS update and a CR though.

Given that the potential for disruption is not that high, we are willing to consider this for backport to 8.7.x during the alpha window, but only if we are confident that all concerns have been addressed in the follow-up and that we have a reliable way forward.

I have taken note of several questions/remarks that would require a detailed reply, but I'll keep them for the follow-up issue, where we will be hopefully able to start this conversation over with an updated IS and without the pressure and the heat of the past days.

There's only one comment I'd like to address here because it definitely doesn't belong in the follow-up :)

@hchonov, #144:

I am not saying "I want it to be like this", but I am supporting everything I say with arguments.
The situation is already pretty intense and I will consider this statement only as caused by it and will ignore it for the sake of the progress instead of getting offended.

I'm sorry if you felt offended, it was not my intention to be offensive: I was not implying that you were wrong, just that, with meany people disagreeing with you, it was unlikely to find consensus on your positions in a short timeframe. I was (and partially still am) frustrated by the fact that I was trying in good faith to find a process allowing us to keep everyone's goals reachable (or at least most of them) and kept getting technical answers in reply, which I perceived as "no"s.

If I'm bringing something home from this issue, besides a lot of technical goodies, is that in the future we all need to be better at communicating our minds. This is a very complex topic and we cannot afford to leave anything implied, both on the technical and on the human side, there's too much at stake.

Peace ✌️

Comment #161

amateescu commented 12 March 2019 at 20:45

Issue summary:	View changes
Issue tags:	-Needs issue summary update, -Needs change record

Updated the IS and added a CR: https://www.drupal.org/node/3039579

Comment #162

larowlan

🇦🇺🏝.au GMT+10

commented 14 March 2019 at 21:53

Has thought been given to what happens when a user deletes a revision?

We also have #2925532: Provide bulk action to delete all old revisions of an entity

In that scenario, should the revision parent be updated to the next ancestor?

Comment #163

20 March 2019 at 19:58

Status:

Reviewed & tested by the community

» Needs work

The last submitted patch, 152: 2725523-152.patch, failed testing. View results

Comment #164

20 March 2019 at 19:58

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #165

plach

he/him

Italian

Venezia

commented 23 October 2019 at 15:04

Version:

8.9.x-dev

» 9.1.x-dev

Moving to 9.1.x per #3057445: [policy, no patch] freeze entity schemas in 8.9.x and 9.0.x.

Comment #166

alexpott

he/they

English

🇪🇺🌍

commented 9 December 2019 at 12:51

Status:

Needs work

» Needs review

Status	File	Size
new	2725523-9.0.x-166.patch	71.22 KB

Re-rolled the patch.

Comment #167

9 December 2019 at 14:13

Status:

Needs review

» Needs work

The last submitted patch, 166: 2725523-9.0.x-166.patch, failed testing. View results

Comment #168

9 December 2019 at 14:13

Version:

9.1.x-dev

» 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Comment #169

9 December 2019 at 14:13

Version:

9.2.x-dev

» 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #170

bbrala

Netherlands

commented 29 October 2021 at 08:56

It would be awesome to see this issue moving forward. There has been a lot of discussion, but it does seem this seemed to end in some consensus. Anyone interested in breathing new life into this one? :)

Comment #171

29 October 2021 at 08:56

Version:

9.3.x-dev

» 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #172

29 October 2021 at 08:56

Version:

9.4.x-dev

» 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #173

29 October 2021 at 08:56

Version:

9.5.x-dev

» 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #174

29 October 2021 at 08:56

Version:

10.1.x-dev

» 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #175

29 October 2021 at 08:56

Version:

11.x-dev

» main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.