Have you ever wanted to detect duplicate content or revisions for nodes, users, or other entities? Say you are loading data from a third-party source that either has no unique key, or has a primary key but no version identifier, and you want to be sure you are not saving an unnecessary revision on each data load.

This module calculates a cryptographic hash for all detectable entities and their revisions (if applicable) and saves the hash to their respective tables.

On the surface, that is it; however, developers can interface with the module in a few ways:

  • Changing the default hash type (currently md5)
  • Transforming the entity being hashed to normalize data (for example, if you do not want "Author" to be considered relevant when hashing)
  • Controlling when some or all hashes are rebuilt
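
The first two extension points can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the module's actual API: the function name entity_hash, the dict representation of an entity, and the ignore parameter are all assumptions made for this example.

```python
import hashlib
import json


def entity_hash(entity, algorithm="md5", ignore=("author",)):
    """Hash a dict-like entity after dropping fields that should not count.

    Hypothetical helper: `entity`, `algorithm`, and `ignore` are
    illustrative names, not part of the Entity Hash module.
    """
    # Normalize: remove fields that are irrelevant for duplicate detection.
    normalized = {key: value for key, value in entity.items() if key not in ignore}
    # Serialize with sorted keys so the result does not depend on key order.
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    # hashlib.new() accepts the algorithm by name, e.g. "md5" or "sha256".
    return hashlib.new(algorithm, payload.encode("utf-8")).hexdigest()
```

With this sketch, two entities that differ only in "author" (or in key order) produce the same hash, while any change to a field that is not ignored produces a different one. Passing algorithm="sha256" swaps the hash type without touching the normalization step.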

How this module detects entities (and whether they are revisionable) during the install process is based on the work by UUID, but that is where the similarities end. Where a UUID is just a random string, an Entity Hash is consistent given the same inputs. Where a UUID never changes once set, the Entity Hash is necessarily recalculated, and may change, every time an entity is re-saved.
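
The contrast with UUID can be made concrete with a small Python sketch. It is an illustration only, not the module's code; content_hash and needs_new_revision are hypothetical names, and an entity is reduced to a single string for brevity.

```python
import hashlib
import uuid


def content_hash(text):
    """Deterministic: the same text always yields the same digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def needs_new_revision(stored_hash, incoming_text):
    """Hypothetical check: save a revision only if the content changed."""
    return content_hash(incoming_text) != stored_hash


# Two random identifiers for identical content are still different.
id_a = uuid.uuid4()
id_b = uuid.uuid4()

# The hash stored on the last save, compared against a fresh data load.
stored = content_hash("Quarterly report, draft 1")
unchanged = needs_new_revision(stored, "Quarterly report, draft 1")
changed = needs_new_revision(stored, "Quarterly report, draft 2")
```

Here unchanged is False and changed is True, which is the property that makes a content hash usable for skipping redundant revisions during a data load; the random identifiers id_a and id_b carry no such information.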

To minimize the performance impact, this module uses a queue on install and for bulk rebuild operations, so it may take a while to calculate or recalculate hashes for all applicable entities.
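
The queue-based approach can be sketched as follows, assuming a simple in-memory queue and a per-run item limit. This is a Python illustration of the general technique, not the module's implementation; process_queue, the limit parameter, and the dict-based storage are all hypothetical.

```python
import hashlib
from collections import deque


def process_queue(queue, entities, hashes, limit=50):
    """Hash at most `limit` queued entities and return how many were done.

    Hypothetical worker: `entities` maps id -> content string and
    `hashes` maps id -> stored digest.
    """
    processed = 0
    while queue and processed < limit:
        entity_id = queue.popleft()
        content = entities[entity_id].encode("utf-8")
        hashes[entity_id] = hashlib.md5(content).hexdigest()
        processed += 1
    return processed


# Queue every entity once (for example on install), then drain in small runs.
entities = {entity_id: f"body {entity_id}" for entity_id in range(5)}
queue = deque(entities)
hashes = {}
first_run = process_queue(queue, entities, hashes, limit=2)
```

After the first run only two of the five entities have a hash and three remain queued, which is why hashes can be missing for a while after install or after a bulk rebuild is requested. Repeated runs eventually drain the queue.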

This project is sponsored by Switchback, LLC. We are available for customizations and other Drupal development projects.
