Problem/Motivation
Currently with Drupal 8 a UUID is useless. It may be universally unique, but it doesn't really identify anything on its own.
In the Multiversion module there are multiple entity indexes: UUID, revision hash, sequence, and revision tree. All but UUID are irrelevant for core at the moment.
In Multiversion all of the indexes are grouped by workspace, which is important for how Multiversion uses these indexes. If we put indexes into core before workspaces go into core, we will need to find a way for contrib modules like Multiversion to alter these indexes.
Proposed resolution
- Create an interface for all indexes
- Create a base class for all indexes
- Create an index of UUIDs with their entity type, entity id and revision id.
Remaining tasks
This is already implemented in the Multiversion module, so it mostly just needs porting over. However, Multiversion introduces workspaces, which would need to be removed from the UUID index code in a way that still allows Multiversion to add them.
Technical summary
- entity.index.uuid service
- getters and setters in service
- hook_install to add all existing entities to index
- hook_entity_insert to add new entities to index
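The pieces listed above can be sketched in a language-neutral way. The following is a minimal Python illustration, not Drupal or Multiversion code: `UuidIndex`, `IndexEntry`, `on_install` and `on_entity_insert` are hypothetical names standing in for the `entity.index.uuid` service, its getters and setters, and the two hooks, and the storage is a plain in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class IndexEntry:
    """What the index stores for one UUID."""
    entity_type: str
    entity_id: int
    revision_id: Optional[int] = None


class UuidIndex:
    """Hypothetical in-memory stand-in for the proposed UUID index service."""

    def __init__(self) -> None:
        self._entries: Dict[str, IndexEntry] = {}

    def add(self, uuid: str, entry: IndexEntry) -> None:
        # Normalise the key so lookups do not depend on letter case.
        self._entries[uuid.lower()] = entry

    def get(self, uuid: str) -> Optional[IndexEntry]:
        return self._entries.get(uuid.lower())

    def add_multiple(
        self, rows: Iterable[Tuple[str, str, int, Optional[int]]]
    ) -> None:
        for uuid, entity_type, entity_id, revision_id in rows:
            self.add(uuid, IndexEntry(entity_type, entity_id, revision_id))


def on_install(index: UuidIndex, existing_rows) -> None:
    """Backfill step: index every entity that already exists."""
    index.add_multiple(existing_rows)


def on_entity_insert(index, uuid, entity_type, entity_id, revision_id=None):
    """Insert step: index a newly created entity."""
    index.add(uuid, IndexEntry(entity_type, entity_id, revision_id))


index = UuidIndex()
on_install(index, [
    ("0b3c1a52-7c1e-4d6f-9a54-2f1e6d3b8a10", "node", 1, 1),
    ("5f0e8c44-1d2b-4a7e-8c3f-9b6a2d4e7f01", "taxonomy_term", 7, None),
])
on_entity_insert(index, "A1B2C3D4-0000-4000-8000-000000000001", "user", 42)

entry = index.get("a1b2c3d4-0000-4000-8000-000000000001")
print(entry.entity_type, entry.entity_id)
```

The lookup needs only the UUID; the entity type comes back as part of the result instead of being a required argument.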
How the indexes in Multiversion are used
The main use case is the RELAXed Web Services module, which implements the CouchDB API (http://docs.couchdb.org/en/stable/http-api.html). This API focuses 100% on UUIDs, so when GETting or POSTing a URI we need a way to know which entity it relates to without looping through every entity type.
Other use cases
#2353611: Make it possible to link to an entity by UUID - Wouldn't it be cool if we didn't need the entity type in the URI, and our UUIDs were universally identifiable?
#2577923: MenuLinkContent entities pointing to nodes are not deployable: LinkItem should have a "target_uuid" computed property - Wouldn't it be cool if menu links could link to a UUID and we can look up what entity type that relates to?
Comments
Comment #2
amateescu commented
I was thinking about this in the past few days and I couldn't quite figure out what the use case is for the UUID index as it is currently provided by the Multiversion module. Is it really useful to find an entity by UUID without knowing the entity type?
For core, I think we could do something more like the taxonomy index but for each entity type. It would be something like this:
- each content entity type that is referenced by an entity reference field will get its own index table (this is needed in order to be able to provide extra columns per entity type, e.g. 'status', 'sticky', 'created' from the current taxonomy index)
- we do not exclude "inaccessible" entities like the taxonomy index does
- we include UUIDs alongside numeric IDs for both referencing and referenced entity
What do you think?
Comment #3
timmillwood
Comment #4
catch
Yes, I'm also not clear on the use case here, tagging for issue summary update.
Comment #5
timmillwood
The biggest use case for this in Multiversion is the Relaxed module.
Relaxed is an implementation of the CouchDB API (http://docs.couchdb.org/en/stable/http-api.html). A CouchDB database relates to a Workspace, and a CouchDB docid relates to a Drupal entity UUID. Therefore we need to be able to GET URLs like `/{db}/{docid}` without knowing the entity type. We also need to POST to these types of URL; for this we have a bunch of normalizers in the Replication module which handle the normalization and denormalization of entities.
There are many, many other places in these modules, especially in the normalizers, where we use the indexes to get entity information from just the UUID, revision hash, sequence ID, etc.
Comment #6
larowlan
Is there any technical reason those document IDs can't contain the entity type as well?
I.e. do they need to be UUIDs in the strict form of the word, or can they be in the format {entity_type}:{uuid}?
Comment #7
timmillwood
I guess in theory they could be in the format {entity_type}:{uuid}.
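As a rough illustration of the {entity_type}:{uuid} format discussed in #6 and #7, here is a small Python sketch that splits such a document ID. `parse_doc_id` is a hypothetical helper, not part of Relaxed or Multiversion, and accepting a bare UUID as well is an assumption made for the example.

```python
import uuid
from typing import Optional, Tuple


def parse_doc_id(doc_id: str) -> Tuple[Optional[str], uuid.UUID]:
    """Split '{entity_type}:{uuid}' into its two parts.

    A bare UUID is also accepted; the entity type is then None and would
    have to be resolved some other way, for example through an index.
    """
    entity_type, separator, raw_uuid = doc_id.rpartition(":")
    if separator and not entity_type:
        raise ValueError("document ID has an empty entity type")
    parsed = uuid.UUID(raw_uuid)  # raises ValueError for malformed input
    return (entity_type if separator else None, parsed)


typed = parse_doc_id("node:0b3c1a52-7c1e-4d6f-9a54-2f1e6d3b8a10")
bare = parse_doc_id("0b3c1a52-7c1e-4d6f-9a54-2f1e6d3b8a10")
print(typed[0], bare[0])
```

With the prefixed form the entity type travels with the ID, so no index lookup is needed to find the right storage.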
Comment #16
sime
#2353611: Make it possible to link to an entity by UUID claims to be a use-case.
Comment #17
sime
This is where Drupal can better provide internal/external interfaces to its library of things, and provide a means to hide internal implementation details like entity types from external systems: general DX at the edge (or even "machine to machine experience"?).
Assumptions:
1. You would not want every entity's uuids in this index because that would include very low level entities. You would potentially compromise the value of the index on a site-by-site basis.
2. Following from that, the index is effectively a cache (could be rebuilt) and should not ever be some sort of primary key table thing.
3. While the index is by default in the database it should be readily put in memory or mongo or whatever.
4. Modules or site config may be able to define which entities are represented in the index, and may be able to extend the information in the index.
If this is written in an abstract way (think "uuid index api") then a module or subsystem could simply request its own index. The "uuid route" module (core or contrib, who cares) says "hey i want a uuid index of all these entity types and here is the storage class for it" and then define a /uuid/... route with the data storage looked after. Other modules could use it too, in the same way that modules share use of the default `cache` table, but a module could define its own for a specific purpose.
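A minimal sketch of that "uuid index api" idea, in Python for brevity. Every name here (`IndexStorage`, `MemoryStorage`, `ScopedUuidIndex`) is hypothetical; the point is only that a module declares which entity types it wants indexed and supplies the storage, and that the index can be rebuilt like a cache.

```python
from typing import Dict, Iterable, Optional, Protocol, Tuple

Entry = Tuple[str, int]  # (entity_type, entity_id)


class IndexStorage(Protocol):
    """Storage contract a module would implement (database, memory, ...)."""

    def write(self, uuid: str, entry: Entry) -> None: ...
    def read(self, uuid: str) -> Optional[Entry]: ...
    def clear(self) -> None: ...


class MemoryStorage:
    """Simplest possible storage: a dictionary."""

    def __init__(self) -> None:
        self._rows: Dict[str, Entry] = {}

    def write(self, uuid: str, entry: Entry) -> None:
        self._rows[uuid] = entry

    def read(self, uuid: str) -> Optional[Entry]:
        return self._rows.get(uuid)

    def clear(self) -> None:
        self._rows.clear()


class ScopedUuidIndex:
    """An index that only tracks the entity types its owner asked for."""

    def __init__(self, entity_types: Iterable[str], storage: IndexStorage):
        self._entity_types = frozenset(entity_types)
        self._storage = storage

    def track(self, uuid: str, entity_type: str, entity_id: int) -> bool:
        # Entity types outside the declared scope are silently skipped.
        if entity_type not in self._entity_types:
            return False
        self._storage.write(uuid, (entity_type, entity_id))
        return True

    def lookup(self, uuid: str) -> Optional[Entry]:
        return self._storage.read(uuid)

    def rebuild(self, rows: Iterable[Tuple[str, str, int]]) -> None:
        # The index is disposable: wipe it and refill from the source data.
        self._storage.clear()
        for uuid, entity_type, entity_id in rows:
            self.track(uuid, entity_type, entity_id)


route_index = ScopedUuidIndex({"node", "media"}, MemoryStorage())
route_index.rebuild([
    ("0b3c1a52-7c1e-4d6f-9a54-2f1e6d3b8a10", "node", 1),
    ("5f0e8c44-1d2b-4a7e-8c3f-9b6a2d4e7f01", "log_entry", 900),
])
print(route_index.lookup("0b3c1a52-7c1e-4d6f-9a54-2f1e6d3b8a10"))
```

Keeping the entity-type scope per index is what lets a site leave high-volume, low-level entities out without giving up the lookup for the types it cares about.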
Comment #18
sime
I'm interested if anyone has the view that Drupal should be centralising all entity uuids into a primary table: `uuid`, `entity_type`, `eid`, removing uuid completely from the entity tables, and whether that would be a good thing ™.
Comment #19
colan
There's some related discussion on this at #1637370-56: Add UUID support to core entity types. I feel like folks are afraid of using a single primary table because of the size, but I wouldn't mind hearing other opinions on that myself. Maybe this is less of an issue with the DBs now than it was 8 years ago.
What you've written above makes sense to me so far, but I'd feel better about it if we had an explicit reason to rule out the single-table approach. Maybe it's not such a terrible idea? I'm not sure.
Comment #20
berdir
I'm fine with a centralized single index table, but not as the only storage for it. We definitely still need the UUIDs to be stored as a field for content entities too.
Note that we already did a custom optimization for block content entities in \Drupal\block_content\BlockContentUuidLookup. That *might* not be required anymore then, but it's a cache collector, so stores all block content uuids in a single cache entry and looks them up if necessary.
Comment #21
sime
My instinct is that an all-in approach might not be practical, or might simply impair the performance wins we get from a central index. More and more we will see developers using entities for things, specifically things that get created without human intervention. I had an issue where the broken_link module filled up the db with 50k entities in a day. Or the site where a developer decided to use entities as log entries for chatty machine-to-machine communications. All these uuids would go into an "all in" index, I assume.
Comment #22
aaronmchale
Reading over the issue summary and comments I started to think that this could be a good idea, but (and #21 illustrates this perfectly) my gut feeling is that the code which acts on an entity create/save should live in one or more methods in the EntityBase class (or maybe ContentEntityBase? Do we really need this for config entities?), and be called at some point in the Entity save process. This would be instead of relying on hook_entity_insert, which is currently proposed in the issue summary. As said, comment #21 provides some rationale for this, and this approach means devs can effectively opt out entity types that would not be appropriate to include in the index. We could even go a step further and provide an optional entity annotation key to allow configuring this.
Comment #26
ghost of drupal past
To expand on #6, we could create UUIDs that contain the entity type. For example, the first 64 bits of our UUIDs could be the first 64 bits of `sha1("drupal:$entity_type")`, and then to find out the entity type one would need to run #(entity types) SHA1 operations, but that's very well cacheable as that table almost never changes.
Comment #27
bircher
RE #26: This is the second issue I am aware of which would benefit from predictable or semi-predictable UUIDs.
The other one is #3208766: Add UUID to sections. If there are more, then maybe we can create a new issue to do this in a reusable way.
Though as with the other issue, the upgrade path would be a bit difficult. Here I guess it is a new kind of uuid we would save?
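The scheme from #26 can be sketched in a few lines of Python. This only illustrates the idea as described there: the helper names are hypothetical, filling the remaining 64 bits with random bytes is an assumption, and the resulting 128-bit value does not set the RFC 4122 version and variant bits, so strict validators may not treat it as a standard UUID.

```python
import hashlib
import os
import uuid
from typing import Dict, Iterable, Optional


def type_prefix(entity_type: str) -> bytes:
    """First 64 bits of sha1("drupal:<entity_type>")."""
    digest = hashlib.sha1(f"drupal:{entity_type}".encode("utf-8")).digest()
    return digest[:8]


def make_typed_uuid(entity_type: str) -> uuid.UUID:
    """64-bit type prefix followed by 64 random bits (an assumption)."""
    return uuid.UUID(bytes=type_prefix(entity_type) + os.urandom(8))


def build_prefix_table(entity_types: Iterable[str]) -> Dict[bytes, str]:
    """One SHA-1 per entity type; cacheable because the list rarely changes."""
    return {type_prefix(name): name for name in entity_types}


def entity_type_of(value: uuid.UUID, table: Dict[bytes, str]) -> Optional[str]:
    """Recover the entity type from the first 64 bits, if it is known."""
    return table.get(value.bytes[:8])


table = build_prefix_table(["node", "user", "taxonomy_term"])
generated = make_typed_uuid("taxonomy_term")
print(generated, entity_type_of(generated, table))
```

Existing UUIDs carry no such prefix, which is the upgrade-path difficulty mentioned above: they would still need some other way to resolve their entity type.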
Comment #29
geek-merlin
#21 (Performance penalty from huge entity tables):
Yes it should be possible to exclude entity types from indexing.
Also there may be an alternative implementation that replaces the separate index with sql union magick:
I don't see a good use case for that though, but maybe others do.
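One way to read the UNION idea from #29, sketched with SQLite from Python. This is an illustration only, not the commenter's own snippet: the tables are simplified stand-ins rather than Drupal's real schema, and `uuid_lookup` and `find_by_uuid` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for per-entity-type base tables.
    CREATE TABLE node (nid INTEGER PRIMARY KEY, uuid TEXT UNIQUE);
    CREATE TABLE taxonomy_term (tid INTEGER PRIMARY KEY, uuid TEXT UNIQUE);

    INSERT INTO node VALUES (1, '0b3c1a52-7c1e-4d6f-9a54-2f1e6d3b8a10');
    INSERT INTO taxonomy_term VALUES (7, '5f0e8c44-1d2b-4a7e-8c3f-9b6a2d4e7f01');

    -- No separate index table: the lookup is a view over the base tables.
    CREATE VIEW uuid_lookup AS
        SELECT 'node' AS entity_type, nid AS entity_id, uuid FROM node
        UNION ALL
        SELECT 'taxonomy_term' AS entity_type, tid AS entity_id, uuid
        FROM taxonomy_term;
""")


def find_by_uuid(connection: sqlite3.Connection, value: str):
    """Return (entity_type, entity_id) for a UUID, or None if unknown."""
    return connection.execute(
        "SELECT entity_type, entity_id FROM uuid_lookup WHERE uuid = ?",
        (value,),
    ).fetchone()


print(find_by_uuid(conn, "5f0e8c44-1d2b-4a7e-8c3f-9b6a2d4e7f01"))
```

A view like this never goes stale and needs no insert hook, at the cost of querying every participating table on each lookup.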
Comment #30
c-logemann
Comment #33
claudiu.cristea
Also, `EntityRepository::loadEntityByUuid()` can be deprecated in favour of a method that only needs the UUID, without the entity type argument.
Comment #34
c-logemann
I completely disagree with the starting argument of 2016:
> Currently with Drupal 8 a UUID is useless.
There are so many things we can solve with UUIDs, like config entity conflicts and hiding serial IDs (see my module U3ID). And since 2016 we have many more entity types, including logs etc., and the number is still growing. On larger systems this index would be a very large table and should probably be handled like a cache table, which can be excluded from backups etc.
Personally I only see this as an interesting idea for finding an entity via UUID alone. But I hope we won't become dependent on this feature in future, because I see a lot of performance issues coming with it. So in my opinion this should be entirely optional, or at least optional per entity type (any entity type!) as an OPT IN setting.