This issue is part of #2721129: Workflow Initiative. It was originally proposed in #1812202-25: Add UUID support for entity revisions, but is now a separate issue.

Problem/Motivation

It should be possible to identify revisions across multiple environments, and the same change to an entity on multiple environments should result in the same ID. For this we need a revision hash field, computed with a deterministic hash algorithm.

A very important design decision in both Git and CouchDB is that if exactly the same change (time, content, etc.) is made to the same entity in two different environments, it should result in the same revision hash. The point is that this hash should NOT be unique in the case just described. It's better if we can identify exactly the same change in two environments as the same revision hash to avoid having the same change marked as conflicts (once these two changes make it onto the same environment).

Proposed resolution

Add a new base field for all revisionable entities. Stub code for the bit that generates the hash:

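  $array = $entity->toArray();
  // Don't include local IDs, to keep the hash consistent across multiple
  // environments. Look the key names up from the entity type, since they
  // differ per entity type (e.g. 'nid' and 'vid' for nodes).
  $entity_type = $entity->getEntityType();
  $local_keys = [$entity_type->getKey('id'), $entity_type->getKey('revision')];
  foreach ($local_keys as $key) {
    unset($array[$key]);
  }
  $entity->revision_hash->value = md5(serialize($array));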
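
To make the intent concrete, here is a minimal sketch in plain PHP, outside of Drupal. It assumes the entity values are already available as a nested array, uses the literal keys 'id' and 'revision_id' as placeholders for the local identifiers, and keeps md5() only because the stub does. The function name `example_revision_hash()` and all sample values are hypothetical.

```php
<?php

/**
 * Hashes entity values after dropping environment-local IDs.
 *
 * Hypothetical helper for illustration only; not part of any patch.
 */
function example_revision_hash(array $values): string {
  foreach (['id', 'revision_id'] as $key) {
    unset($values[$key]);
  }
  return md5(serialize($values));
}

// The same change saved on two environments: only the local IDs differ.
$staging = [
  'id' => [['value' => 12]],
  'revision_id' => [['value' => 40]],
  'uuid' => [['value' => 'a3c1f6d2-0000-4000-8000-000000000001']],
  'title' => [['value' => 'Hello world']],
  'changed' => [['value' => 1470182400]],
];
$production = $staging;
$production['id'] = [['value' => 507]];
$production['revision_id'] = [['value' => 1893]];

// A genuinely different change made on production.
$diverged = $production;
$diverged['title'] = [['value' => 'Hello there']];

$same = example_revision_hash($staging) === example_revision_hash($production);
$different = example_revision_hash($staging) !== example_revision_hash($diverged);
var_dump($same, $different);
```

Both comparisons should come out true: the identical change hashes the same regardless of local IDs, while a real content difference produces a different hash.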

Remaining tasks

Code.

User interface changes

None.

API changes

Only additions.

Data model changes

One additional base field for all revisionable entities. Needs an update hook to back-fill old revisions.


Comments

dixon_ created an issue. See original summary.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

dixon_ updated the issue summary.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

hchonov commented:

> It's better if we can identify exactly the same change in two environments as the same revision hash to avoid having the same change marked as conflicts (once these two changes make it onto the same environment).

I don't understand this completely. When the same change is made on both environments, then both entities will remain the same. There is no conflict regarding the change, as the entities, or at least the fields on which the change was performed, are identical.

It would be great to elaborate on this, because I might be missing something.

P.S.
The only thing I could think of is if you're comparing the entities only through their "change hash" to check if there are conflicts between the entities.
However, in this case a change across multiple fields might be made in a single revision on one environment and spread across multiple revisions on the other. The effective change is the same, but the hashes will differ, because each hash is based on the changes in a single revision only.

sime commented:

> I don't understand this completely. When the same change is made on both environments, then both entities will remain the same. There is no conflict regarding the change, as the entities, or at least the fields on which the change was performed, are identical.

The issue in this case would only be if the hash is different. A different hash would be the cause of the conflict.

hchonov commented:

I am sorry, I hadn't seen the initial patch in the issue, which gives more clarity than the issue summary. So the idea is to have a hash over the document's content.

I've just talked to @cspitzlay about this, and he mentioned that the hash will not necessarily match across environments because of timestamp fields (like the changed field). These will differ unless the change is made on both environments in the very same second, assuming the server clocks are synced. A solution might be to explicitly remove the revision_created field and the translation's created and changed fields. Note that we would exclude only those fields, not all timestamp fields.

Computed fields also play a role, because their value will be returned by calling \Drupal\Core\Entity\ContentEntityBase::toArray() and if the computed value is dependent on the environment or on the current date, then the hash will be different as well. As computed fields are not persisted it might be possible to exclude them from the hash computation.

The hash value itself should also be excluded from the computation.
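
The exclusions discussed above could be sketched as follows in plain PHP. This is only an illustration under stated assumptions: the field names 'revision_created', 'created', 'changed' and 'revision_hash' are taken from this thread and may not match real field names, the list of computed fields is passed in by the caller rather than read from field definitions, SHA-256 is an arbitrary choice of algorithm, and the top-level ksort() is an extra step not mentioned in the issue. The function name `example_portable_hash()` is hypothetical.

```php
<?php

/**
 * Builds a content hash that ignores environment-specific values.
 *
 * Hypothetical helper; $computed_fields stands in for whatever the entity's
 * field definitions would report as computed.
 */
function example_portable_hash(array $values, array $computed_fields = []): string {
  $excluded = array_merge(
    // Local IDs, as in the issue summary.
    ['id', 'revision_id'],
    // Timestamps named in this comment (the names are assumptions).
    ['revision_created', 'created', 'changed'],
    // The hash must not feed into itself.
    ['revision_hash'],
    $computed_fields
  );
  $values = array_diff_key($values, array_flip($excluded));
  // Sort by top-level field name so insertion order does not affect the hash.
  ksort($values);
  return hash('sha256', serialize($values));
}

$env_a = [
  'id' => 1, 'revision_id' => 3, 'title' => 'Hello', 'changed' => 1500000000,
  'revision_hash' => 'stale', 'age_in_days' => 12,
];
$env_b = [
  'title' => 'Hello', 'id' => 88, 'revision_id' => 412, 'changed' => 1500000042,
  'revision_hash' => 'other', 'age_in_days' => 13,
];

// 'age_in_days' plays the role of a computed, date-dependent field.
$match = example_portable_hash($env_a, ['age_in_days'])
  === example_portable_hash($env_b, ['age_in_days']);
var_dump($match);
```

If the computed field is not excluded, the two hashes differ even though the stored content is the same, which is the problem described above.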

And still I have one question - given the fact, that this is a computable information, why do we want to have a dedicated and persisted field for it?

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

geek-merlin commented:

This might make a lot of sense.

> given the fact, that this is a computable information, why do we want to have a dedicated and persisted field for it?

Yes, this should be a computed field.

> [exclude timestamps]

Yup.

> [exclude computed fields]

In spite of the name, computed fields are not necessarily read-only today. For lack of a better way, they are also used to model fields with custom storage. OG does this with the user->OG relation field, and Group does the same for the group-content->group relation. Just FTR.

And by the way:

+++ b/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php
@@ -1022,6 +1031,16 @@ protected function saveRevision(ContentEntityInterface $entity) {
+        $record->{$this->revisionHashKey} = md5(serialize($array));

According to https://www.drupal.org/node/2833433, we must not use MD5.
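
A possible replacement for the md5() call, sketched in plain PHP. SHA-256 is an assumption here; the thread does not settle on an algorithm. Inside Drupal, `\Drupal\Component\Utility\Crypt::hashBase64()` might be an alternative, but that has not been discussed in this issue.

```php
<?php

$array = ['title' => [['value' => 'Hello world']]];

// What the patch does today.
$md5 = md5(serialize($array));
// A possible replacement (assumption: SHA-256 is acceptable here).
$sha256 = hash('sha256', serialize($array));

// The hex digest lengths differ, which matters for the size of the column
// that stores the hash.
var_dump(strlen($md5), strlen($sha256));
```

The MD5 hex digest is 32 characters and the SHA-256 hex digest is 64, so the schema for the hash column would need to allow for the longer value.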

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.