Problem/Motivation
On an "API-first" site, most/all of our client side interactions with entities is over JSON:API. Entities exposed over JSON:API require (I think?!) a UUID field. We also take steps to hide serialized entity IDs, where Drupal is apt to send them, to prevent enumeration and make the API less "Drupally."
AFAIK there is no strict requirement that content entities must implement a serial integer ID; they simply need an ID because, well, that's what is used around Drupal's core entity API to load and, uh, identity entities.
I posed the question in Drupal Slack, what if we just used the UUID for both? (This is for a custom content entity, obviously.)
This seems to work in theory, however DefaultTableMapping.php assumes these entity keys are unique field definitions, so if they refer to the same field, you get a deeply merged combination of duplicate values, which screws up the initialization of the entity tables.
Steps to reproduce
N/A
Proposed resolution
in DefaultTableMapping make sure $key_fields contains unique values.
Remaining tasks
Review
User interface changes
N/A
API changes
N/A
Data model changes
N/A
Release notes snippet
Content entities may now declare UUIDs as a content entity's ID (though entity key mappings), removing the prior implicit requirement by core that all entities have separate UUIDs and IDs.
| Comment | File | Size | Author |
|---|---|---|---|
| #32 | 3310170-nr-bot.txt | 2.18 KB | needs-review-queue-bot |
Issue fork drupal-3310170
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
- 3310170-trait-method-typehint
changes, plain diff MR !10157
- 3310170-cannot-use-uuid
changes, plain diff MR !2766
Comments
Comment #3
bradjones1Let's see if
array_unique()breaks anything.Comment #4
bbralaTbh, it feels like that might work.
Comment #5
bbralaSo, if we have a failing test only using a uuid as id and then this + test green, I'd be happy to review and probably this.
Comment #6
bradjones1Here is the post from Percona regarding performance; honestly I'm not quite sure what this means, if anything, for this issue because it's not like we're saying you must or even should do this; in cases of relatively few entries or developer preference or... whatever, the performance question might not matter. I think this is only really about Drupal being able to support UUID-only identified entities. Whether to do so is up to you.
https://www.percona.com/blog/2019/11/22/uuids-are-popular-but-bad-for-pe...
Comment #7
wim leersHuh, I didn't realize this!
Very closely related: #2353611: Make it possible to link to an entity by UUID.
Comment #8
bbralaYour comment is a bit ambiguous, what didn't you realize? This issue or the performance 🤔
Comment #9
tstoecklerThanks for opening this!
Hit this as well before, but never got around to opening an issue. The fix is not quite complete, however: For translatable entities we explicitly remove the UUID field from the data table (because the UUID field comes with a unique key but the data table would have multiple identical entries (namely one for each language for a given entity)). If the UUID and ID keys are the same, however this removal does not work as then we also don't have an ID in the data table.
In case this is useful to anyone: We use the following storage class with our custom entities to handle this issue:
Also note that due to the unique key issue mentioned above, you cannot actually use a field of type
uuidfor the ID/UUID field, but you need to use a string field. We use the following snippet in our entity base class'sbaseFieldDefinitions(), which comes out effectively the same as auuidfield but for the unique key (and that we don't set it as read only because we want to use a widget for it, but that could/should be done depending on the use-case):And lastly linking some related issues where there has been an effort to avoid (some of) the performance hit of using UUIDs at least on PostgreSQL by using native UUID database columns.
Comment #10
thursday_bw commentedI just recently discovered this same issue when writing a unit test that relied on a contrib entity that was also patched to add the UUID key to work with JSON:API.
I dug in and applied the same solution as in this patch.
@tstockler Thanks for all those details about translatable entitie's and gotchas.
We are using postgres too, so that perfmormance information is relevant.
Comment #11
thursday_bw commentedThe following issue on on the apigee_edge contrib module requires the fix for UUIDs as entity ids too. Apigee edge issue: https://github.com/apigee/apigee-edge-drupal/issues/727
Comment #12
bradjones1Thanks for the notes regarding PgSQL as well as translations... I'm not sure I entirely follow the issue but I'll need to dig into the code more to understand.
Comment #13
tstoecklerSorry for being a bit brief @bradjones1. I'll take another stab at explaining this:
I'll leave both revisions and fields with dedicated tables (i.e. configurable fields) completely out of this for now and only consider a non-revisionable entity type with a bunch of base fields.
In the non-translatable case we just put all of the fields in the base table and call it a day. As you found there is a bug if the ID key and the UUID key are the same, but let's assume that had been fixed, i.e. the MR above had been merged.
If we make the entity type translatable the table layout changes a fair bit. We still have the base table to generate the serial ID for us and we have a few (other) key fields in the base table as well, but all of the fields are also in the data table which has ID and langcode as the primary key as it has a row for each entity translation. So given that a very simple entity table (where the ID and UUID fields are separate, however) would look like this (spoiler alert: it doesn't):
The UUID field, however, comes with a unique key in its schema definition, which makes sense for a UUID field ;-). Adding the field to the data table, however, would then also add that unique key to the data table. The example above is already proof that that would not be a great idea. That* is why the UUID field is in fact removed from the list of fields in the data table and it remains solely in the base table.
And the reason I explained this in such a convoluted way is because that is in fact how the code in
DefaultTableMapping::create()works:(This is the non-revisionable case, the revisionable case is different but similar enough.)
So with that in mind when you take a look at what then happens if the ID and UUID field are one and the same, you can see that there's a problem. Let's say the ID/UUID field is called
uuid, then:$all_fieldswill be['uuid', 'langcode', 'label', 'uid', 'changed']$uuid_keywill be'uuid'Therefore the end result of fields that will be added to the data table will be:
['langcode', 'label', 'uid', 'changed']notably without the ID/UUID field which is a problem.* At least I think that is why. I would be a bit surprised, but not really shocked to learn that that is not actually the reason. I have been wrong before with things about the entity system that I was more sure about than this...
Comment #15
tstoecklerComment #16
tstoecklerAfter talking to @rnsrk abou this in Burgas, spent some more time on this.
Fixed the issue mentioned in #9. Also added some inline comments about the changes and finally added some test coverage. I pushed those to individual commits in case there are any objections to any of those.
Note that I thought a bit about how to avoid all this array madness (i.e.
array_values(array_unique(array_filter(...)))and friends) and I think the solution would be to makeDefaultTableMappingprovide anaddFields()method, which would internally check for duplicate fields. Will open a follow-up about that, so we can avoid that discussion here (even though it does make the situation (inherently) slightly worse).Not exactly why this was tagged "Needs change record" or what the change record should be. I mean we could literally write a change notice with "Content entities may now use the same field for the ID and UUID keys" if people think that's useful. (I don't, in particular, but it also doesn't hurt, so I wouldn't mind it.)
Comment #17
tstoecklerNot sure why the testbot doesn't run for the updated commits, but I opened #3458145: Replace DefaultTableMapping::setFieldNames() with ::add*() and ::remove*() in the meantime.
Comment #18
tstoeckler...ahh maybe because the MR is targeting 10.1.x. @bradjones1 do you have access to change the target branch? Or should we create a new MR? Not sure...
Comment #19
bradjones1MR target changed.
Comment #20
bradjones1Needed a rebase and now GitLab CI is running.
Comment #21
bradjones1Added a basic CR. Tests green now.
Comment #22
tstoecklerAwesome, thanks. You dropped my commits from the MR, though, I'm guessing that was accidental? 🤔
Comment #23
bradjones1That is so strange, I think something weird happened with changing the merge target. I thought it was strange I didn't need to do a rebase --onto... and clearly that would be necessary. I've fixed this. Thanks for catching it.
Comment #24
bradjones1Comment #25
tstoecklerGreat, thanks! 👍
Comment #26
joachim commentedDocs will need changing.
For example, on EntityTypeInterface::getKeys()
Comment #27
tstoecklerThat documentation is actually already incorrect prior to this issue, so opened #3467540: Update EntityTypeInterface::getKeys() docs for string IDs about that. We are not changing anything about being able to use strings in the first place, this is just about using UUIDs, in particular, and to be more precise to be able to use the same field as UUID key and ID key.
Comment #28
wim leersComment #29
smustgrave commentedThere a recommended way for testing this one?
Comment #30
bradjones1Beyond the test that's in the MR? AFAIK this isn't test-able in the UI since you can't create a custom entity type in core via the UI.
Comment #31
smustgrave commentedIn that case I ran the test-only feature to show the test coverage.
Cleaned up the issue summary some, copied the CR to the release notes as seems like something that probably should be well announced.
Code review wise everything seems to have proper typehints and return hints. The comments added to DefaultTableMapping about why to get unique is a nice touch.
Going to go out on a limb and say this one is good.
Comment #32
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #33
tstoecklerFixed that, thanks!
Comment #37
longwaveA simple solution to the problem, I don't foresee any side effects but also a little bit wary of touching DefaultTableMapping; as this also has a change record I only committed this to 11.1.x and 10.4.x and higher so it will be available in the next minor releases.
Comment #40
godotislateLooks like after these changes on 11.x, PHPStan is failing the build: https://git.drupalcode.org/project/drupal/-/jobs/3330912.
Comment #43
nicxvan commentedI regenerated baseline.
Comment #46
longwaveThanks for the followup, not sure what happened there as PHPStan didn't complain locally on commit.
Committed and pushed b5c5a2a3ef8 to 11.x and 07eea76d204 to 11.1.x. Thanks!
Comment #48
wim leersSee y'all in #2353611: Make it possible to link to an entity by UUID next? 🤓
Comment #49
berdir@wim: The point of this issue is that the UUID can be the ID, at the same time. That means these entities do not need to link by UUID because it's literally the same thing as linking by ID.
This got it in just before #3396166: Convert entity type discovery to PHP attributes, so the new EntityTestUuidId entity type is still using annotation instead of attributes, should we open a follow-up issue or do a quick fix here?
Comment #50
longwave@berdir at some point we will deprecate annotations so can probably mop it up then?
Comment #52
c-logemannI just successfully created an "uuid only entity" with just using "entity_keys = { "id" = "uuid", ...". That's awesome and many thanks to everybody who made this possible. I will use this from now on many upcoming contrib and custom entities.
But when there is already content and other code handling with the serial ID? On some other uuid discussions I read many comments of people loving the old serial user and node IDs. For users I created the module "u3id" to handle a mixed usage. So I see a need of mixed usage and transformation of existing content etc. How to start in contrib/custom and how at core entities?
Comment #53
ressaThis is great! But how do you use it?
As I understand this change, content entities ("nodes") may now use UUID's instead of integers, so I was hoping to see a new option, like "Use UUID's in stead of classic node ID's" under
/admin/structure/types/add, when creating a new content type ...Do you need to create a custom module to use this new feature, and if yes, is there a documentation page?
Comment #54
berdirThis is an option for new entity types, it is not something that can be enabled for existing entity types such as nodes. It's a storage/base field decision, not something that can be configured. Existing core entity types such as nodes do not and will never* use this.
It's an interesting option for some use cases, but has its drawbacks, database indexes for v4 UUIDs perform much worse than integers for example.
* never say never, but I don't see how that's possible to migrate, all kinds of things reference nodes by ID and there are still things that do not support string entity IDs in core, such as comments.
Comment #55
ressaThanks for clarifying @berdir. I wonder if a regular old school nid-based node could reference content with a UUID-based entity type, or perhaps as @c-logemann mentions, he created "... the module "u3id" to handle a mixed usage.", and this is needed?
If the basics of setting up a basic entity type using UUID as identifier hasn't already been created, it would be fantastic if a basic documentation page was made, maybe also covering the subject of mixing nid's and UUID's?
Comment #56
berdirI think you still mix up entity types, content types and bundles a bit. Which is understandable, because the terminology is confusing.
This allows to set both the id and the uuid to the same field for an entity type. That's all. In practice, all the change really does is fix a problem that the storage got confused when the same field ended up multiple times in automatically created indexes in the table.
See the committed example test entity type: https://git.drupalcode.org/project/drupal/-/commit/9b1a9796b42d6bd581b5c..., I also mentioned it in the change record a bit.
There is no mixing of nid and uuid, because a custom entity type doesn't have a nid (because only nodes do) or any other integer id, both id() and uuid() return the UUID. They can be referenced by and reference other entity types because entity reference fields understand when the target entity type has a string ID.
Comment #57
ressaAh yes, I do have a hard time differentiating between them, thank you for your patience :)
Maybe it would have been easier if I had shared my vision: I was hoping to be able to create content, like we can with nodes now, but on top of NID's use UUID's as well (or solely UUID's) specifically to be able to manage most content as configuration, allowing some (or all) content to use UUID's as identifier. The point would be to not use the standard auto generated incrementing integer as unique identifier, as we do now with NID's. But maybe that's not possible ...
Comment #58
berdirAll nodes also have a UUID, always had since Drupal 8.0. You can rely on that.
The default_content project (including recipe default content in core which is the same format and implementation) uses the UUID as primary identifier for dependency resolution. Both REST and JSON:API modules use the UUID of entities the primary external identifier.
In both cases because the UUID is reliably unique and can be explicitly set, while it is not possible to rely on using a specific serial ID as that may already be in use in a given system.
Comment #59
ressaThat's awesome! I didn't realize that, thanks for making me aware, and of modules using UUID as identifier, I'll take a closer look at them. I am very grateful for your detailed answers, it helped me a lot. ... and now, I won't be adding more noise to this issue :)