Problem/Motivation

The dynamically generated schemas for entity and field storage are currently locked behind protected methods, with no supported way to access them. One reason to access those schemas is illustrated in #2529554: Upgrade path for 1923406 (in the Head 2 Head module). The current workaround is to hack core to make the methods public. If they remain protected, anybody needing to write similar updates in the future will have to resort to the same acrobatics.

Proposed resolution

Make the following methods public (a usage sketch follows the list):

  • SqlContentEntityStorage::getEntitySchema()
  • SqlContentEntityStorage::getDedicatedTableSchema()
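
For illustration, here is a hedged sketch (not the actual patch) of how an update function could use these methods once public. The entity type ('node'), the field name ('body'), and the function name are placeholders, and the method signatures are assumed to match the protected ones in core:

<?php
// Hedged sketch only: assumes the two methods above have been made public
// with their existing signatures. Entity type, field, and function names
// are placeholders.
function mymodule_update_8001() {
  $entity_type_manager = \Drupal::entityTypeManager();
  /** @var \Drupal\Core\Entity\Sql\SqlContentEntityStorage $storage */
  $storage = $entity_type_manager->getStorage('node');
  $entity_type = $entity_type_manager->getDefinition('node');

  // Full generated schema for the entity type's shared tables.
  $entity_schema = $storage->getEntitySchema($entity_type);

  // Generated schema for one field's dedicated table, assuming 'body' is a
  // field using dedicated table storage on this site.
  $definitions = \Drupal::service('entity_field.manager')
    ->getFieldStorageDefinitions('node');
  $table_schema = $storage->getDedicatedTableSchema($definitions['body']);
}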

Remaining tasks

User interface changes

API changes

Yes: two protected methods on SqlContentEntityStorage become public. There are no interface changes.

Data model changes


Comments

jhedstrom’s picture

Issue summary: View changes
Status: Active » Needs review
FileSize
4.87 KB

Status: Needs review » Needs work

The last submitted patch, 1: entity-schema-visibility-2533282-01.patch, failed testing.

jhedstrom’s picture

Status: Needs work » Needs review
FileSize
801 bytes
5.65 KB

This should address the failing tests.

Berdir’s picture

Assigned: Unassigned » plach

Making them public for update functions makes sense I think.

Assigning to @plach for his opinion.

Do we need some docblock updates to clarify when they should be used and when not?

Update functions are DB-specific anyway; there's nothing we can do about that. However, we should probably make sure that we don't fail hard if the storage class in use isn't the one we expect. Then e.g. MongoDB can provide its own update functions.
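
A minimal sketch of that guard, assuming a hypothetical mymodule_update_8002() in an .install file:

<?php
// Hedged sketch of the guard described above: bail out gracefully when the
// entity type is not backed by the SQL storage class, so that e.g. a
// MongoDB backend can ship its own update functions instead.
use Drupal\Core\Entity\Sql\SqlContentEntityStorage;

function mymodule_update_8002() {
  $storage = \Drupal::entityTypeManager()->getStorage('node');
  if (!$storage instanceof SqlContentEntityStorage) {
    // Another storage backend is in charge; nothing to do here.
    return;
  }
  // SQL-specific schema work can safely happen below this point.
}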

plach’s picture

Assigned: plach » Unassigned

Not really a fan of this: the idea is that the storage handler is the only one manipulating the schema. Making the handler return it would allow anyone to do things we do not want to happen, like manually manipulating the schema.

Can you foresee any legitimate usage aside from Head 2 Head?

Berdir’s picture

Well, it's not just head2head but also possibly every other update function that is going to be written.

That said, thinking about it, this might actually be a very bad idea.

It's the classic flaw in update functions: you must not rely on the current state of Drupal, since you might be an update function from beta 7 to 8, but the code/schema you are changing might have changed again in beta 11 to 12, and the site is doing a direct update to beta 12.

Given that, we should possibly do the same as we've always done in update functions: hardcode the target schema changes.

And yes, that is a big conceptual problem with entity updates: they don't know about versions or dependencies or anything. They just update from old state to new state.
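
In concrete terms, hardcoding might look like this hedged sketch (table, column, and spec are all made up; the point is that the target spec is a literal snapshot, not derived from current code):

<?php
// Hedged sketch of "hardcode the target schema changes": the spec below is
// written out literally as the intended end state rather than computed from
// the (possibly newer) entity schema. All names are illustrative.
function mymodule_update_8003() {
  $spec = [
    'type' => 'varchar',
    'length' => 255,
    'not null' => TRUE,
    'default' => '',
  ];
  \Drupal::database()->schema()
    ->changeField('mymodule_data', 'label', 'label', $spec);
}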

plach’s picture

Well, it's not just head2head but also possibly every other update function that is going to be written.

Honestly I was hoping the Entity update system + Migrate would be enough, but I guess that's because I'm not living in the real world yet :)

Given that, we should possibly do the same as we've always done in update functions: hardcode the target schema changes.

Yep, I'd say this is the correct approach exactly for the reason you stated.

And yes, that is a big conceptual problem with entity updates: they don't know about versions or dependencies or anything. They just update from old state to new state.

Initially we were talking about introducing versioning to definitions added to state, but that was deemed unnecessarily complex. Maybe we should think about that in the future...

Berdir’s picture

Yeah. You might be interested in #2532356: Provide a wrapper to core entity.definition_update_manager to temporarily disable updates as well, which is how head2head is trying to work around the problem that nothing can run before automated entity updates.

As I wrote there, I'm wondering if we could improve it a lot simply by doing the automated entity updates after manually defined update functions. Then they can do whatever they want to fix the things that automated updates can't do (yet) like renaming fields, changing schema and so on.

plach’s picture

As I wrote there, I'm wondering if we could improve it a lot simply by doing the automated entity updates after manually defined update functions. Then they can do whatever they want to fix the things that automated updates can't do (yet) like renaming fields, changing schema and so on.

That's tricky; besides, the current order has legitimate use cases: for instance, you may want to pre-fill a base field column after creating it. I'm wondering whether there's any way to integrate with the update dependency system so that updates can choose whether to run before or after entity updates.
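
For example (a hedged sketch of the pre-fill case; table and column names are hypothetical):

<?php
// Hedged illustration of the pre-fill use case: an update function running
// after automated entity updates back-fills a base field column that the
// entity update just created. Table and column names are hypothetical.
function mymodule_update_8004() {
  \Drupal::database()->update('mymodule_entity')
    ->fields(['new_base_field' => 'initial value'])
    ->isNull('new_base_field')
    ->execute();
}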

plach’s picture

Status: Needs review » Postponed

jhedstrom’s picture

Status: Postponed » Needs review

#2535082: Allow hook_update_N() implementations to run before the automated entity updates is in now, so un-postponing for further discussion.

Being able to access schema definitions would have drastically simplified both #2507201: Upgrade path for MySQL switch to multibyte UTF8 (which is still a work in progress) and #2529554: Upgrade path for 1923406 (which is committed, but since it focuses on hand-picked columns, I don't have confidence that the fix is complete).

Berdir’s picture

It would have, but the problem is that those functions will return the schema for the final state.

Let's assume that Node adds another base field, renames one, or makes some similar change (Node won't do that, but contrib definitely will).

getSchema() will then already contain that field, but the table you're attempting to update might not yet. This is the same problem as with hook_schema(): an update function must never rely on it.

I think my idea in #2535082: Allow hook_update_N() implementations to run before the automated entity updates about having callbacks that are invoked if an automated schema change is not possible would allow for a more reliable implementation. You'd then simply get the old field schema/definition and the new one and can implement the code to go from old to new. head2head shows that maybe this needs to be a global event and not just a callback on entities/field types, but still I think that could work.
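
No such callback exists in core; purely as a sketch of the shape described here, it might look like this hypothetical hook:

<?php
// Purely hypothetical: this hook does not exist. It only sketches the idea
// of a callback invoked when an automated schema change cannot be applied,
// receiving the old and new stored field definitions.
use Drupal\Core\Field\FieldStorageDefinitionInterface;

function hook_entity_schema_update_unsupported(FieldStorageDefinitionInterface $old_definition, FieldStorageDefinitionInterface $new_definition) {
  // Module-specific logic to migrate data and schema from $old_definition
  // to $new_definition.
}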

jhedstrom’s picture

@berdir, in the callback idea, would this account for schema changes to tables/fields that have data? Or would the callbacks be generic and then fire an event that modules could listen for?

I've changed the approach for #2535082: Allow hook_update_N() implementations to run before the automated entity updates over in #2540990: Need to update stored schema data, since without that, core thinks the old schemas are still installed and repeatedly tries to run failing schema updates (they fail due to existing data). The approach there could be easier :) Given that approach, I'm not sure how useful this change would be.

jhedstrom’s picture

Status: Needs review » Closed (duplicate)