Early Bird Registration for DrupalCon Portland 2024 is open! Register by 23:59 UTC on 18 March 2024, to get $100 off your ticket.
Problem
- The entity system supports arbitrary entity IDs, but Field API requires/assumes integer IDs.
- Thus, Field API breaks an Entity API contract.
Goal
- Allow support for remote entities and globally using UUIDs as entity IDs.
Proposed solution
- Change field_sql_storage module's
'entity_id'
column toVARCHAR(255)
.
Comment | File | Size | Author |
---|---|---|---|
#61 | entity-id-schema-1823494-61-interdiff.txt | 1.94 KB | Berdir |
#61 | entity-id-schema-1823494-61.patch | 14.84 KB | Berdir |
#59 | entity-id-schema-1823494-59-interdiff.txt | 699 bytes | Berdir |
#59 | entity-id-schema-1823494-59.patch | 14.73 KB | Berdir |
#57 | entity-id-schema-1823494-57.patch | 14.73 KB | Berdir |
Comments
Comment #1
sunLet's see what breaks.
Comment #2
yched CreditAttribution: yched commentedI'd be in favor of this, if we can prove that performance is not affected.
Comment #3
yched CreditAttribution: yched commentedPatch uses varchar(128) instead of the varchar(255) mentioned in the OP, though :-)
Comment #3.0
yched CreditAttribution: yched commentedUpdated issue summary.
Comment #4
sunyeah... I realized that we constrain UUIDs to 128 characters, too, and the only reason for why we're using VARCHAR for UUIDs is that we want to support "arbitrary strings as UUIDs" right now. (although that might change: #1805576: Add a 'uuid' database schema type)
Thus, I thought what's good enough for UUIDs must be good enough for entity IDs. Updated the summary accordingly.
Comment #5
sunApparently, the patch in #1 passed all tests. :)
So what's left? Upgrade path + Profit? :)
Comment #6
yched CreditAttribution: yched commentedBenchmark ? :-)
Comment #7
fagoIf we already change it to varchar, why not to varchar(255)? Another nice use-case would be to directly use URIs as IDs, as I implemented here: http://drupal.org/sandbox/fago/1493180
Comment #8
sunI don't know how to benchmark this. Field data is cached anyway, so the consequences of this change are barely noticeable.
Comment #9
chx CreditAttribution: chx commentedhttp://drupal.org/node/555536 you must cast to string when querying if you want to this.
Comment #10
sunNot sure I understand the data type problem. I think we're querying string columns with integer values elsewhere in core already (e.g., text format IDs).
If I get it right, then all you want is to add a type-cast to
(string)
for the entity ID in field_sql_storage, so all queries on field data tables use a string and not an integer for the entity ID?(I can't see why this would have to be a CAST() in SQL.)
That said, all entity IDs that have been queried from entity base tables should be strings already, since you get a
string '123'
instead ofint 123
as query result.Comment #12
chx CreditAttribution: chx commented> I think we're querying string columns with integer values elsewhere in core already (e.g., text format IDs).
It might need a review of what's happening there -- although if they are text solely then why would there be numbers?
> If I get it right, then all you want is to add a type-cast to (string) for the entity ID in field_sql_storage, so all queries on field data tables use a string and not an integer for the entity ID?
That's correct.
Comment #13
sunThe upgrade path from D6 to D7 converts existing text format serial IDs simply into strings. This means that upgraded text formats have IDs of "1", "2", "3", and so on, and only newly created formats on D7 have 'real' machine names as IDs (though there's technically nothing wrong with an all-numeric machine name). To my knowledge, Filter module does not type-cast the format IDs in any way before querying the database, and I'm not aware of a critical upgrade path bug or database engine incompatibility with those.
In turn, I still don't really understand the request for type-casting the entity IDs here. However, we can add the necessary tweaks to field_sql_storage module, if that is what it takes to get this issue done. ;)
Comment #14
pounardI'm really worried about changing the entity id to a varchar, the field API already is really slow, it will make it even slower, especially for INSERT/UPDATE, at least while using MySQL.
Keeping an internal entity id as a int32 integer for database stored entities sound wiser than making the API more flexible. We can still keep an index table somewhere else to store a potential varchar id if necessary, and the UUID of course.
Making the SQL field storage a bit more complex but allowing it to keep transparently entity id as numeric values would be a better choice, harder but better. Drupal is not using the database right, switching yet another primary field to varchar in those table will be hell. In fact, those tables should have only numeric index/primary keys because the way it exists actually is already slow with a few thousands of entities while it could be raging fast with millions of them using only integer indexes.
EDIT: Here is actually a proof of concept that it is not *that* complex to achieve (except for EFQ): https://drupal.org/sandbox/pounard/1760694
Another worrying thing is not only the performances, but both the index memory consumption and the on-disk table size.
Comment #15
Danny_Joris CreditAttribution: Danny_Joris commentedI work on a project called Islandora. Islandora is a set of drupal modules which provides a management and display environment for Fedora Commons. We've been pushing hard for tighter Drupal integration and serving Fedora Commons objects as entities is our next goal. One of our biggest hurdles currently is that the Fedora objects we're serving all have id's that are structured like this: "namespace:1234".
That's just our use case. I'm concerned about performance too. I'm not sure how much this would affect that.
Comment #15.0
Danny_Joris CreditAttribution: Danny_Joris commentedUpdated issue summary.
Comment #16
sunJust a plain re-roll against HEAD for now...
Comment #17
moshe weitzman CreditAttribution: moshe weitzman commentedIMO, allowing for non-integer IDs is an architectural improvement.
Comment #19
Jānis Bebrītis CreditAttribution: Jānis Bebrītis commentedI agree with this, we have limited time left to push this major change into core to get drupal8 UUID rolling.
recreated patch for 8.x branch
Comment #20
pounardThis will deteriorate performance with the database and fields, this is wrong to do this without refactoring the whole schema first. Field API is a bottleneck and will be even more.
Comment #21
PanchoMarked #1932612: PDOException: SQLSTATE[22P02]: Invalid text representation: 7 ERROR: invalid input syntax for integer (which tries to go the other route and contains a working D7 patch) a duplicate of this.
We'd need to settle which way to go and then enforce it through all subsystems.
Comment #22
PanchoLet the testbot do its job on #19.
Comment #24
chx CreditAttribution: chx commentedWe should look at the field and the instance and the entity controller of the instance and add a method to the entity storage controller which returns a type, for core that's sql or config and based on that just decide. Oh and throw an exception if you try to instantiate a field on an entity type where the backend type is not the same as an existing one. Most of this (especially the exception code) is in the entitystorage branch of my sandbox already http://drupal.org/node/1857558 just needs to be separated out.
Comment #24.0
chx CreditAttribution: chx commentedUpdated issue summary.
Comment #25
steinmb CreditAttribution: steinmb commentedAnother discussion about integer vs. the world goes on in #1003788: PostgreSQL: PDOException:Invalid text representation when attempting to load an entity with a string or non-scalar ID
Comment #26
yched CreditAttribution: yched commentedNow that field storage is per entity type, I guess it would be relatively easy to generate the table schema for the entity_id column as either int or varchar based on some entity type metadata - but that info would need to be available somewhere...
@fago, @Berdir: maybe this could be done by looking at the definition of the 'id' field ?
Comment #27
BerdirSomething like this maybe?
Comment #28
amateescu CreditAttribution: amateescu commentedShouldn't we check on IntegerInterface instead?
Are we sure we want to support primary ids and revision ids of different types?
Comment #29
BerdirOnly an actual field item value property instance would be of type IntegerInterface, not the definition...
I'm fine with only supporting the same type for id and revision...
Comment #30
sunWow, that looks surprisingly simple :)
I'd generally agree with @amateescu that the revision_id data type should follow the entity_id data type, but OTOH, that assumption is not even true if you look at 'nid' and 'vid'.
If you consider that you'd replace the entity ID key definition for Nodes to point to the UUID instead of the nid, then the vid may or may not remain to be a custom/serial ID. In case the vid remains as-is, $node->id() would be a string, and $node->revisionId() would still be an integer.
Please disregard the concrete example, I hope you get the point.
Consequently, separately checking for the definitions does appear to make sense.
What else is required for this patch? :)
Comment #32
amateescu CreditAttribution: amateescu commentedRight, my bad :/
Regarding supporting different types for entity_id and revision_id, I guess the use case from #30 is somewhat valid so we shouldn't disallow it :)
I'd like to see @yched's opinion here..
Comment #33
BerdirHuh, interesting fails. contact doesn't have an entity id...
Any interesting ideas? :)
We could easily add the same check as for the revision id, but in a way, that is reallly weird :) Should we check that before and don't even create those tables? We could have a null storage now, that just skip this...
Comment #34
yched CreditAttribution: yched commentedThe code looks good to me.
"contact doesn't have an entity id"
Right... Generally speaking, that sounds like a very weird thing that would break a lot of assumptions in many places...
As to our issue here, isn't it wrong for contact entities to use FieldableDatabaseStorageController as a storage controller if they don't intend to be stored in the first place ?
Comment #35
BerdirYes it is, and it's probably going to break horribly if you try to save it :) Agree that the cleanest solution is to have a FieldableNullStorageController or something like that, that doesn't do anything.
Comment #36
fago>FieldableNullStorageController
Agreed, but how is the null storage fieldable? I'd assume it's just a NullStorageController ?
>Wow, that looks surprisingly simple :)
Indeed!
That's not very solid, but getting more metadata is quite cumbersome right now, we'd have to instantiate empty field item objects, derive the main property name + check the property -- uh.
It becomes a bit easier with #2002134: Move TypedData metadata introspection from data objects to definition objects so we can save the empty object. Anyway, I think it's totally fine to get started like that and improving it with a better check later on.
Comment #37
sunContact Message entities are not saved to the database (by core; contrib might add a storage), their entity forms are submitted only. One major design goal was to allow Message entity bundles ("contact categories") to be fieldable, so you can have multiple contact forms with different fields.
On the assumption that FieldableFooStorageController implies built-in Field API integration (as opposed to FooStorageController), the suggested FieldableNullStorageController appears to make sense?
Comment #38
yched CreditAttribution: yched commentedyes, i might be overlooking smthg, but not sure why FieldableNull instead of Null. When it comes to storage, "not stored" is "not stored", whether runtime entities have fields or not ?
The ViewBuilder and FormControllers do need to be aware of fields, but storage ?
(validation needs to run on form submits too, but that's taken care of by the from controller, right ?)
Comment #39
BerdirLet's try this. The Null implementation actually needs a create() implementation, but we can probably move that into the base class, I don't want to touch that just yet, though.
Comment #40
yched CreditAttribution: yched commentedThat's 40 lines strictly copy/pasted from FieldableDatabaseStorageController, it's a bit sad, plus could easily get out of sync.
Currently there's no existing class higher up where we could share this code.
I'm a little lost in the various issues that reshuffle our storage class hierarchy, but does one of them introduce such a base class ?
ContentEntityStorageController or something ?
Comment #41
yched CreditAttribution: yched commentedEr, actually, why can't this create() implementation be moved up to FieldableEntityStorageControllerBase ?
Comment #42
BerdirFrom #39:
Yes it can, but I just created the Null implementation by copying the database implementation and dropping everything that was not necessary, didn't want to bother with changing the base/parent class yet. Will do so on the next update.
Comment #43
fagohm, true. I was confused by fieldable vs extensible here, but I see that the NULL-storage should act as extensible null storage - thus, yeah it makes sense.
Comment #44
yched CreditAttribution: yched commented@Berdir: doh, sorry, I looked at patch #39 but skipped the attached comment. Self wrist-slap.
No biggie, but I still don't fully understand why we we need a "null but with fileds" flavor of "null". No storage means no storage, whatever is in the thing to "not store" ?
Is it because storage controllers currently have the create() method, which needs to do stuff on fields ? If so, isn't there an existing issue that tries to refactor that, and if so should we include a @todo on that issue to turn FieldableNull into Null ?
Comment #45
Berdircreate() is one reason yes, but not just that. It's not temporary, Null will continue to be FieldableNull.
contact_message is a fieldable/extensible entity so it has to have a fieldable/extensible storage, even if it's a null implementation.
Both conceptually and due to technical necessity. Field::preSaveNew() calls $entity_manager->getStorageController($this->entity_type)->onFieldCreate($this);, the same goes for all those other methods, bundles are created and so on. That explodes if the storage controller for contact_message doesn't provide the corresponding methods.
Comment #46
chx CreditAttribution: chx commentedThis is a problem; this is not valid. I would very strongly recommend throwing an exception that querying these are not valid otherwise you need to write a null query object implementing the querry interface, doing nothing and register it as a service and I doubt we want that.
Comment #47
yched CreditAttribution: yched commented@berdir: of course, makes sense. Thanks.
Comment #48
BerdirMade the null storage controller nuller and added the exception.
This should be ready apart from tests. Wondering if it's enough to just have some unit tests for the schema generation code and if it's possible.
Comment #49
yched CreditAttribution: yched commentedYay. Then it can be removed from the FieldableDatabaseStorageController subclass ? :-)
Tests: per se, I guess unit tests should be enough here.
Then again, all our current test coverage for FieldableDatabaseStorageController is inDrupal\system\Tests\Entity\FieldSqlStorageTest, which is a DUTB
Comment #50
BerdirRight, but we don't have an entity with string identifier, so we'd have to define that. I think that's easier to provide in a unit test.
The attached patch does that, just adds test coverage for this specific change, but I think it would be nice to convert more of our test coverage to unit tests later.
Comment #51
dawehnerLet's not lie here.
We could use String::format here.
We use @inheritdoc on getInfo as well now.
Let's add @group Drupal here
this function name feels wrong.
Should we also test the other case of using a string?
Comment #52
BerdirRemoved the create() method in the fieldable database storage controller.
@dawehner: Thanks for the review!
Fixed 1-5
6: We have a lot of test coverage in DUBT and functional tests, I'd prefer to limit unit test coverage here to that specific use case. Otherwise we'll never stop.
Anything left to do here? :)
Comment #53
yched CreditAttribution: yched commentedOther than the extraneous empty line at the beginning of FieldableDatabaseStorageControllerTest::testFieldSqlSchemaForEntityWithStringIdentifier() ;-), this looks ready if green.
Comment #54
xjm52: entity-id-schema-1823494-52.patch queued for re-testing.
Comment #55
Berdir52: entity-id-schema-1823494-52.patch queued for re-testing.
Comment #57
BerdirRe-roll.
Comment #59
BerdirAnother try.
Comment #61
BerdirThis fixes the unit test and the incorrect revision_id check.
Comment #62
yched CreditAttribution: yched commentedBack to RTBC
Comment #63
alexpottCommitted 2a839b7 and pushed to 8.x. Thanks!
Minor fix during commit :)
Comment #64
gaele CreditAttribution: gaele commentedAccording to #21
Comment #65
David_Rothstein CreditAttribution: David_Rothstein commentedFor Drupal 7, is there anything that can be done here that isn't handled by the backport of #1003788: PostgreSQL: PDOException:Invalid text representation when attempting to load an entity with a string or non-scalar ID instead?
Comment #66
BerdirNo, those two issues are not really related. That one is about not having a database exception on PostgreSQL, when you e.g. go to node/bla.
This is about actually having entity types that do not have integer as their ID (but e.g. a UUID) and making them fieldable. Which is currently not possible, because field storage tables are always created with entity_id as integer.
Comment #67
David_Rothstein CreditAttribution: David_Rothstein commentedRight, but is it possible to safely backport that? (The suggested "backport" for this issue in #21 is basically a version of #1003788: PostgreSQL: PDOException:Invalid text representation when attempting to load an entity with a string or non-scalar ID instead.)
Comment #68
BerdirThat doesn't seem like a fix for this issue, just enforcing that IDs must be numeric, in which case this would no longer be a problem in the first place :)
An actual fix would need to be something like my patches in e.g. #27. Except we don't have that field definitions to know if the id /revision id fields are integers or not. I guess we could work around that by introducing a new flag in hook_entity_info(), something like 'entity_key_type' => array('id => 'string/integer').
I guess that would be safe to backport, because nothing would change unless you have an entity type that explicitly specifies that? If someone actually wants to work on it seems another question, I am not interested, really :)
That said, such a flag might also be a way for a 7.x fix for #1003788: PostgreSQL: PDOException:Invalid text representation when attempting to load an entity with a string or non-scalar ID, although *that* would break existing types that are strings :(
Comment #71
steinmb CreditAttribution: steinmb as a volunteer commentedRef. older issue
Comment #74
plachA D7 issue was opened to take care of the port (#3212398: [D7] Field API assumes serial/integer entity IDs, but the entity system does not ), I think we can close this.