Updated: Comment #169

This is being split into separate issues to make changes easier to review.

Problem/Motivation

In #2183231: Make ContentEntityDatabaseStorage generate static database schemas for content entities we introduced the capability for the entity storage to automatically generate the SQL schema for any entity type. This is currently limited to static generation at install time and concerns only the tables described in the official entity storage SQL table layout standard, that is, the ones storing values for base fields. Dynamically handling changes in the entity type and field definitions, and reflecting them in the SQL table layout, is not yet supported.

This is a critical missing piece towards achieving a whole set of improvements:

  • Modules are no longer required to inject arbitrary data into the $entity object during the load phase, since all the data retrieval is handled by the storage itself. This will motivate module authors to define their data through field definitions, instead of using arbitrary properties, with all the related benefits, such as automatic Rules integration, typed-data validation and so on.
  • Any entity type leveraging the core entity storage API natively supports revisions and translations, without requiring any additional effort from module authors.
  • Translation and revision support can be independently enabled/disabled by just altering the entity type definition. This makes it easy for site developers to adopt the most performant table layout for their business requirements.
  • Additional base fields can easily be added to the native entity table layout, making it straightforward to implement highly-efficient denormalization strategies.
  • The API is designed without having a particular storage backend in mind. This means module authors can leverage a storage-agnostic API without any additional effort required.

Proposed resolution

Part 1: unified API

  • Extend the content entity schema handler implemented in #2183231: Make ContentEntityDatabaseStorage generate static database schemas for content entities to be able to handle the addition/removal/update of field storage definitions, transparently handling differences between single-valued base fields, which are stored in the entity tables, and bundle fields or multiple-value base fields, which instead are stored in dedicated tables.
  • Make the content entity schema handler able to handle changes in the entity type definition for the revisionable and translatable properties.
  • Move the Field SQL Storage (private) API to the (default) table mapping implementation, which is the SQL specific part of the core entity storage API, responsible for representing how fields are mapped to tables.
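
As a rough illustration of the field-to-table split described above, a table mapping could decide a field's table like this (a sketch only; the function and parameter names are hypothetical, not the actual core API):

```php
<?php
// Illustrative sketch only: route a field to a table under the standard
// core SQL table layout. Names are hypothetical, not the real core API.

/**
 * Returns the name of the table a field's values would be stored in.
 *
 * Single-valued base fields live in the shared entity table, while bundle
 * fields and multiple-value base fields each get a dedicated table.
 */
function sketch_get_field_table($entity_type_id, $field_name, $is_base_field, $is_multiple) {
  if (!$is_base_field || $is_multiple) {
    // Dedicated table, e.g. 'node__field_tags'.
    return $entity_type_id . '__' . $field_name;
  }
  // Shared entity table, e.g. 'node'.
  return $entity_type_id;
}
```

For example, a single-valued base field like node.title would map to the shared 'node' table, while a configurable field would map to a dedicated 'node__field_tags' table.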

Part 2: UI and data handling

  • Implement a new EntitySchemaManager service responsible for spotting differences between the installed schema and the one generated from the current versions of the entity type and field storage definitions, probably leveraging state to track changes.
  • Given the current Field Storage API limitations, implement a set of validation rules so that, when data exists, only changes that do not imply a data migration are applied to the schema, that is:
    • adding a field table
    • adding a custom field column to a base/data/revision table
    • switching from revisionable to non-revisionable
    • switching from translatable to non-translatable
  • Any change not (yet) reflected in the installed schema is reported as an error in the status report, which offers a link to the update.php script to fix the database schema. The update.php script calls the entity schema manager to trigger any allowed schema change.
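
As a sketch of how such a service could spot differences cheaply, it might store a hash of the generated schema definition in state and compare it on rebuild (illustrative only; the function names and the state record shape are assumptions):

```php
<?php
// Illustrative sketch only: detect entity schema changes by comparing a
// hash of the freshly generated schema definition with one kept in state.

function sketch_schema_hash(array $schema) {
  // Serialize and hash so any structural change is cheap to detect.
  return hash('sha256', serialize($schema));
}

function sketch_needs_schema_update(array $current_schema, array $state) {
  // A mismatch means the installed schema lags behind the definitions.
  return sketch_schema_hash($current_schema) !== ($state['hash'] ?? NULL);
}
```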

See #48 for a detailed description of the implementation plan: it's a bit outdated but many details are still valid.

Remaining tasks

Follow ups

User interface changes

  • Entries added to the status report page.
  • Minor tweaks to the update.php UI.



API changes

  • The entire Entity Field CRUD API has been moved to the entity schema handler.
  • The entire Field SQL storage (private) API has been moved to the default table mapping.

Comment | File | Size | Author
#169 | et-entity_schema_handling-1498720-169-review-do-not-test.patch | 232.57 KB | effulgentsia
#164 | et-entity_schema_handling-1498720-164.patch | 319.9 KB | plach
#164 | et-entity_schema_handling-1498720-164.interdiff.txt | 9.76 KB | plach
#149 | et-entity_schema_handling-1498720-149.patch | 256.46 KB | plach
#149 | et-entity_schema_handling-1498720-149.interdiff.txt | 3.63 KB | plach
#149 | et-entity_schema_handling-1498720-149.interdiff-142_145.txt | 16.9 KB | plach
#145 | Drupal_module_updates___Drupal_8_x_-_DEV.png | 188.43 KB | plach
#145 | Drupal_entity_schema_updates___Drupal_8_x_-_DEV.png | 224.87 KB | plach
#145 | Status_report___Drupal_8_x_-_DEV.png | 90.94 KB | plach
#145 | et-entity_schema_handling-1498720-145-part2.txt | 46.29 KB | plach
#145 | et-entity_schema_handling-1498720-145-part1.txt | 255.37 KB | plach
#142 | d8_storage.patch | 248.01 KB | fago
#142 | d8_storage.interdiff.txt | 10.23 KB | fago
#113 | et-entity_schema_handling-1498720-113.patch | 248.26 KB | plach
#107 | 1498720.107.patch | 141.33 KB | alexpott
#107 | 103-107-interdiff.txt | 4.08 KB | alexpott
#103 | 1498720.103.patch | 139.84 KB | alexpott
#103 | 101-103-interdiff.txt | 13.82 KB | alexpott
#101 | et-entity_schema_handling-1498720-101.patch | 136.81 KB | plach
#100 | et-entity_schema_handling-1498720-100.patch | 197.83 KB | plach
#100 | et-entity_schema_handling-1498720-100.interdiff.txt | 884 bytes | plach
#98 | et-entity_schema_handling-1498720-98.patch | 197.6 KB | plach
#98 | et-entity_schema_handling-1498720-98.interdiff.txt | 1.48 KB | plach
#96 | et-entity_schema_handling-1498720-96.interdiff.txt | 19.84 KB | plach
#96 | et-entity_schema_handling-1498720-96.patch | 196.93 KB | plach
#96 | et-entity_schema_handling-1498720-96-review.do_not_test.patch | 134.13 KB | plach
#94 | et-entity_schema_handling-1498720-94.patch | 186.04 KB | plach
#94 | et-entity_schema_handling-1498720-94.interdiff.txt | 3.1 KB | plach
#92 | et-entity_schema_handling-1498720-92.patch | 186.33 KB | plach
#92 | et-entity_schema_handling-1498720-92.interdiff.txt | 16.21 KB | plach
#90 | et-entity_schema_handling-1498720-90.interdiff.txt | 3.91 KB | plach
#90 | et-entity_schema_handling-1498720-90.patch | 179.24 KB | plach
#88 | et-entity_schema_handling-1498720-88.patch | 178.37 KB | plach
#6 | entity-schema-1498720-6-do-not-test.patch | 18.49 KB | das-peter
#4 | entity-schema-1498720-4-do-not-test.patch | 13.41 KB | das-peter
#151 | et-entity_schema_handling-1498720-151-part2.txt | 46.55 KB | plach
#151 | et-entity_schema_handling-1498720-151.patch | 299.37 KB | plach

Comments

plach’s picture

Status: Active » Postponed

We need at least one translation schema created to go on with this.

plach’s picture

Title: Introduce automatic translation schema creation for the default SQL controller » Make the default SQL storage controller automatically generate tables for every defined entity type
catch’s picture

Status: Postponed » Active

Un-postponing.

plach’s picture

Issue summary: View changes

Updated issue summary.

plach’s picture

Issue summary: View changes

Updated issue summary.

plach’s picture

Issue summary: View changes

Updated issue summary.

das-peter’s picture

Here's a first POC of how I imagine this could be done.

das-peter’s picture

Summary of discussion in IRC with fago:

  • Use the generated schema in hook_schema().
  • Replace the schema_settings by information from constraints and new data types where necessary.
das-peter’s picture

Here's a next attempt.

Use the generated schema in hook_schema().

This is quite hard because at the point when hook_schema() is invoked the entity information isn't yet updated, so we can't access the storage handler.
I don't know if we can or should change the installation order.

Replace the schema_settings by information from constraints and new data types where necessary.

Looks doable, however there are some pain-points:

  • Where do default values come from? There's no such constraint, and it makes no sense to have one.
  • How can the default constraints of a field type be changed/extended? Currently I use the same approach as before for schema_settings. The downside is that this has no effect on the validation stuff.
Damien Tournoud’s picture

Indexes are going to be a whole world of pain.

This is quite hard because at the point when hook_schema() is invoked the entity information isn't yet updated, so we can't access the storage handler.

Isn't this basically just #1416558: hook_entity_info(), hook_schema(), and the field system are strongly bound to each other? One way of solving this as part of this patch is to remove the whole 'schema_fields_sql' thingie. We don't need that anymore if we enforce that the schema is generated directly from the field definitions.

yched’s picture

Just FYI :

Yeah, exposing dynamic schemas for field_sql_storage tables in its hook_schema(), based on field definitions stored in ConfigEntities, has been an absolute can of worms in #1735118: Convert Field API to CMI.
The current patch does it by reading directly from the underlying config() files, skipping entity_load() - entity_load() within hook_schema() is a road to hell currently.

But: there have been discussions as to whether field_sql_storage really needs to expose its field data tables to hook_schema() to begin with. That's a *lot* of information that isn't used anywhere in practice.
The people I asked so far (@chx, @fago, @effulgentsia) were all of the opinion that we probably shouldn't bother.

The pros & cons might be different when it comes to entity base tables, dunno for sure, but the two (base tables, field data tables) should probably be treated the same?

Damien Tournoud’s picture

This information has actually been very important in the past (in Drupal 7, for example, ctools is able to create relationship contexts automatically from the foreign keys). If tomorrow the SQL schema doesn't bring more information than the TypedData schema, of course we don't necessarily need to bother.

Berdir’s picture

Issue tags: +Entity Field API

In regards to default values, see #1777956: Provide a way to define default values for entity fields, but I'm not sure if we actually want to define that on the schema because the defaults could be dynamic I guess and you're not supposed to create database records yourself anyway.

Damien Tournoud’s picture

Yes, I think it's fine not to define defaults on the SQL layer.

yched’s picture

This information has actually been very important in the past (in Drupal 7, for example, ctools is able to create relationship context automatically from the foreign keys)

Right - though double checking, ctools does use drupal_schema() to get the foreign keys on entity types base tables, but reads foreign keys on field data tables through hook_field_schema(), not drupal_schema() / hook_schema().

Asked for feedback on "do we want to expose schemas of dynamic field data tables in hook_schema() ?" in #1735118-181: Convert Field API to CMI.

yched’s picture

Issue summary: View changes

Updated issue summary.

plach’s picture

Title: Make the default SQL storage controller automatically generate tables for every defined entity type » Make DatabaseStorageController automatically generate tables for every defined entity type
plach’s picture

Issue summary: View changes

Update issue summary

YesCT’s picture

Issue tags: +d8dx

While discussing #2057401: Make the node entity database schema sensible in IRC, I was pointed here while figuring out how the baseFieldDefinitions() in Node relate to node_schema() in node.install, the DatabaseStorageController and the EntityManager. (Tagging d8dx since this might help with that confusion; maybe another tag would be better.)

updated the issue summary.

YesCT’s picture

Issue summary: View changes

summarized, and used template.

YesCT’s picture

Issue summary: View changes

oops. space.

amateescu’s picture

Issue summary: View changes
Status: Active » Postponed
effulgentsia’s picture

What I like about this issue is that if you, say, swap in a MongoDb controller, then you don't need pointless tables in the MySQL db. However:

Moreover any entity can natively have multilingual properties and fields.

If I'm reading that right, then that means a different MySQL controller (say, one that wants to make menu link titles multilingual) would generate a different table. So, then, what happens when a table is made with one controller, and then a different controller is swapped in? Is the new controller responsible for altering the tables from any arbitrary schema into the one that it needs?

amateescu’s picture

My initial reaction would be to say that we can't (or shouldn't?) support swapping controllers if the current tables are not empty.

plach’s picture

Yep, a contrib module switching in a different storage controller might try to write a migration to move data around, but I think that in the core context we should assume an empty storage. I was planning to write all the code here on the assumption that there is no entity data yet. We can try to write a migration to switch between the table layouts supported by core, but that would totally be a follow-up issue (and might also be contrib material).

plach’s picture

Priority: Normal » Major
Related issues: +#2144327: Make all field types provide a schema()

This is at least major as (together with #2144263: Decouple entity field storage from configurable fields) it will allow us to implement the unified storage discussed in Prague.

Gábor Hojtsy’s picture

effulgentsia’s picture

Issue tags: +beta blocker

Given that catch approved this issue being critical in the issue linked in #20, and that this is clearly data model/schema related, I'm tagging it "beta blocker" despite not having explicit approval to do so. Someone can correct me if they disagree.

plach’s picture

chx’s picture

> since all the data retrieval is handled by the controller itself. This may allow us to kill hook_entity_load() and hook_ENTITY_TYPE_load() altogether.

Erm. Nope? What if two modules want to add data? Both can't replace the controller.

plach’s picture

Well, the original idea was that everything would be a field, so any addition would be loaded by the storage controller, which would allow fields to specify a custom storage engine if needed (see the "Unify Field Storage" section of the related Prague notes). Computed data would just be computed fields. That said, I am no longer sure this would eliminate the need for hook_entity_load() (or an equivalent event), so removing it at this stage of D8 development would probably be a bad idea.

plach’s picture

To clarify: we were not proposing to support per-field storage again, just the ability to specify a different table layout than the default one.

Sylvain Lecoy’s picture

Doctrine #1817778: Consider using Doctrine ORM for Entities has this feature already, plus it supports schema version (e.g. adding or removing fields, changing the nature of the relationship, etc.).

plach’s picture

That's a D9 issue, isn't it?

Sylvain Lecoy’s picture

Yes it is; for the record, it was originally against Drupal 8, then postponed.

As I see this issue is postponed too now, it might be interesting not to re-invent the wheel if Doctrine (which the Symfony2 folks use) is chosen.

I am pretty sure Drupal will use an ORM in the end and provide tools to integrate these ORMs, just like it has become the Java standard with Spring/Hibernate.

Just look at what the Symfony guys did; it will become the common standard in PHP in a few years.

Sylvain Lecoy’s picture

Then everybody will be amazed by ORMs and data model visualisation and modeling through graphical IDEs and @nnotations, but this is a feature Java has had for 10 years.

We'll also be able to remove the need for 'EntityInterface', and every developer will thank us for this: the ORM takes responsibility for creating tables, but also for maintaining them, deleting, updating and creating relationships (through foreign keys or not) as the conceptual data model evolves in the code through annotations.

We were not ready for this when I pledged in that issue that this would be our future, but one day we'll have no choice. We don't want to rewrite code that has been tested for years, is performant, and is widely accepted as a common pattern in other technologies, such as persistent web servers like J2EE.

plach’s picture

I agree this option should be seriously taken into consideration in the future, but honestly I fail to see how we could make such a big change at this phase of the D8 life-cycle.

Sylvain Lecoy’s picture

Sorry for my intrusion into the conversation; I just wanted to tell you I am really excited about this field, and to point you to some work in progress for Drupal 9. I also have a working copy (it generates entities on Drupal 7 from the Schema API, not yet the Field API) here: https://drupal.org/project/doctrine.

Please continue with your original idea; for now it's almost impossible to introduce such a big change. That's what I was told by people in 2012 for D8, so I'm not surprised to hear the same thing now in 2014 :D You've got a point.

plach’s picture

Well, it's great that you are already exploring this idea: by the time it can be seriously considered for core inclusion, we will have some real-life data to evaluate :)

andypost’s picture

Status: Postponed » Active

There are no issues blocking this.

plach’s picture

Actually, #2068325: [META] Convert entity SQL queries to the Entity Query API is more or less a prerequisite. However, I am planning to start working on this in a few days.

plach’s picture

plach’s picture

plach’s picture

tstoeckler’s picture

Re #34: Hey there, I recently started some initial work which I just posted in #2183231: Make ContentEntityDatabaseStorage generate static database schemas for content entities. Maybe we can coordinate efforts.

plach’s picture

It looks like a duplicate to me :(

tstoeckler’s picture

We can certainly continue here, but the patch over there is not really in a reviewable state. I posted it only because I wanted to *avoid* any duplication if you wanted to get started on this soon per #34. The only patch in this issue was very much outdated, so I don't think I duplicated much work?!

plach’s picture

No problem regarding the patch, but I think keeping these two issues open would be misleading. Also, it would be good to discuss the approach before diving into coding: there are some non-trivial aspects to figure out.

Btw, did you have a look at the Prague notes ("Unify Field storage" section)?

andypost’s picture

This needs to unify field and data type plugins in #2150511: [meta] Deduplicate the set of available field types

For example, node.title defines itself as 'text' but has no 'format' associated.

sun’s picture

tstoeckler’s picture

Posted a new, now working patch in #2183231: Make ContentEntityDatabaseStorage generate static database schemas for content entities. I kept this as a separate issue for now, as in theory we could handle the actual schema generation there, but still install the schema manually. This has turned out to be a pretty massive issue on its own. Then once that is in place, at least partly, we can figure out the dynamic creation/deletion here. Do you think that makes sense @plach?

tstoeckler’s picture

Title: Make DatabaseStorageController automatically generate tables for every defined entity type » Make FieldableDatabaseStorageController automatically generate tables for every defined entity type

Also, it's now called FieldableDatabaseStorageController.

DatabaseStorageController is the legacy one that's only being used by menu links now.

plach’s picture

I've been sketching an implementation plan in the last couple of days. I should be able to post it soon for review...

plach’s picture

Aside from the aspects already laid out in Prague, below is a description of how I'd implement this.

Goals

  1. Supporting changes to the following entity type properties: revisionable, translatable
  2. Supporting schema changes to store additional custom fields
  3. Supporting schema changes to drop storage for removed custom fields
  4. Avoiding the need to load the full schema at runtime
  5. Providing a standardized lightweight field -> table.column mapping for modules needing to perform raw SQL queries (-> Views)
  6. Supporting schema versions and tracking changes
  7. Supporting schema regeneration without a UI (ideally just a cache clear)
  8. Supporting UI-driven schema regeneration (see next)
  9. Supporting data migration between the previous schema version and the current one

Implementation overview

In the current proposal there is a new entity controller/handler (an instance of EntitySchemaHandlerInterface, actual names TBD) that is responsible for performing all the schema-related operations. The schema handler:

  • (re)builds the entity schema definition based on entity type/field definitions
  • stores the schema definition
  • tracks changes and schema versions
  • (re)installs the schema
  • detects and reports conflicts (typical example: a change in the definition that cannot be performed when data is already stored)
  • handles field purging

Additionally, a service (implementing EntitySchemaBuilderInterface) is used to trigger the actual actions performed by the schema handler. The core implementation is owned by the Entity module, which implements hook_rebuild() to react properly when entity and field info are regenerated. This is the main (only?) core way to rebuild/reinstall the entity schema definition. Contrib modules could swap in a more advanced schema builder that could provide an actual UI to trigger schema changes, show schema version diffs and start data migrations. The core schema builder refuses to install a new schema version if data exists and the required action cannot be performed. Instead, it reports the inconsistency in Drupal's status report as an error.

The entity schema is initially generated and installed during the Drupal installation process. We need to check whether switching the storage controller is possible during installation, so that we do not need to always install the default SQL schema before swapping it out. Always installing it is definitely something we should try to avoid: users are created during the installation process, so any alternative storage controller would need to provide a migration and do some magic to move users off their SQL table.

The entity schema definition is stored through the State service. The stored data includes:

  1. the schema version, an integer incremented at each change;
  2. the hash of the current schema definition, used to quickly determine whether there are schema changes;
  3. the handler class, which might be useful if the storage controller is switched on an existing installation that already has a schema change history;
  4. the schema definition data structure (the schema API array for core storage).
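
A minimal sketch of how such a state record could be produced (illustrative only; the function name and the exact record shape are assumptions, not actual core API):

```php
<?php
// Illustrative sketch only: build the state record described above.

function sketch_record_schema(array $previous, array $schema, $handler_class) {
  return [
    // 1. Integer schema version, incremented at each change.
    'version' => ($previous['version'] ?? 0) + 1,
    // 2. Hash of the definition, for quick change detection.
    'hash' => hash('sha256', serialize($schema)),
    // 3. Handler class, relevant when the storage is later switched.
    'handler_class' => $handler_class,
    // 4. The schema definition itself (schema API array for core storage).
    'schema' => $schema,
  ];
}
```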

Default SQL implementation details

The core SqlSchemaHandler (as well as the related storage controller) supports 4 table layouts, depending on whether the entity type supports revisions, translations, neither, or both. When building the schema definition it loops through the field definitions and assigns each field to a "base" table ({entity}, {entity_revision}, {entity_field_data}, {entity_field_revision}) or a field table. The actual column schema definitions are picked from the field schema. By default, indexes are generated based on entity keys and field schema definitions, but every entity type can provide its own specialized version of the schema handler to optimize index definitions (or any other part of the schema). I think we should add a new revisionable property on field definitions: this would allow the schema handler to avoid storing unnecessary/unwanted data in the revision tables.
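
The four layouts could be derived from the two entity type flags roughly like this (a sketch under the table-naming conventions used above; not actual core code):

```php
<?php
// Illustrative sketch only: list the "base" tables for each of the four
// layouts, following the naming conventions used in this summary.

function sketch_table_layout($entity_type_id, $revisionable, $translatable) {
  // The simplest layout is a single entity table.
  $tables = [$entity_type_id];
  if ($revisionable) {
    $tables[] = $entity_type_id . '_revision';
  }
  if ($translatable) {
    $tables[] = $entity_type_id . '_field_data';
  }
  if ($revisionable && $translatable) {
    $tables[] = $entity_type_id . '_field_revision';
  }
  return $tables;
}
```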

The storage controller applies the same logic to tell in which table to store each field, so it does not need to rely on the fully generated schema. This is not necessarily a big gain as field schema definitions might be needed to perform the actual queries. However one of the goals is trying to get rid of this requirement.

Entity data handling

As I said above, the core schema handler supports performing only "data-safe" changes. This means that when data is available only the following schema changes are allowed:

  • Adding/dropping a field table
  • Adding/dropping a custom field column to/from a base/data/revision table
  • Switching from revisionable to non-revisionable
  • Switching from translatable to non-translatable

We probably need some kind of confirmation step before performing an operation that would imply data loss (dropping columns or tables). For instance this status might trigger a warning in the status report with a link to a confirmation form.

Any other change, mainly switching from a simpler table layout to a more complex one (e.g. from non-revisionable to revisionable), requires a data migration and as such is not supported. Any attempt to trigger such an action causes an exception to be thrown.
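
The guard described in the last two paragraphs could look roughly like this (a sketch; the change identifiers and function name are made up for illustration):

```php
<?php
// Illustrative sketch only: allow "data-safe" changes, reject the rest
// with an exception when entity data already exists.

function sketch_validate_schema_change($change, $has_data) {
  $data_safe = [
    'add_field_table', 'drop_field_table',
    'add_field_column', 'drop_field_column',
    'make_non_revisionable', 'make_non_translatable',
  ];
  if ($has_data && !in_array($change, $data_safe, TRUE)) {
    throw new \RuntimeException("The schema change '$change' requires a data migration and is not supported.");
  }
  return TRUE;
}
```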

Views support and (legal) raw SQL access

Views is the most obvious example of a module legitimately needing to perform SQL queries directly. In D8 this is discouraged, but Views offers a vast set of SQL-specific features that would be impossible to implement otherwise. The point is that Views' flexible architecture allows for pluggable query engines: there is nothing wrong with having a SQL-specific engine, as long as it is possible to provide other ones for alternative storages. However, the fact that we support swappable entity storages implies that Views cannot rely on a fixed table layout (not even assuming the core entity storage), hence we need a way for it to generate entity (views) data automatically based on entity/field definitions (see #1740492: Implement a default entity views data handler). The ways this issue could help with that are:

  • Providing a lightweight field.property -> table.column mapping that Views could use to generate its data. This would be implemented by a method on the storage controller. This is an example of the hypothetical return values of SqlStorageInterface::getTableMapping($field_name) (see also #2079019: Make Views use SqlEntityStorageInterface):
    // Hypothetical return value for getTableMapping('foo'):
    array(
      'table' => 'my_entity',
      'revision_table' => 'my_entity_revision',
      'columns' => array('foo'),
    )

    // For getTableMapping('bar'):
    array(
      'table' => 'my_entity_field_data',
      'revision_table' => 'my_entity_field_revision',
      'columns' => array('bar__column1', 'bar__column2'),
    )

    // For getTableMapping('field_baz'):
    array(
      'table' => 'my_entity__field_baz',
      'revision_table' => 'my_entity_revision__field_baz',
      'columns' => array('baz_column1', 'baz_column2'),
    )
    
  • Allowing the schema handler to return the full schema definition.

The data above, plus the entity field definitions, should allow us to implement a base entity views data controller.

Comments welcome :)

andypost’s picture

Also, this code could be re-used to generate "table-per-entity-type" storage for the comment, comment statistics and history tables, which all have an integer entity_id, while fieldable entities could have string IDs; this approach was suggested by @catch in #2081585-79: [META] Modernize the History module

plach’s picture

If I understand correctly, those could be defined as custom fields. In that case, yes, they would get dedicated per-entity-type tables.

Berdir’s picture

The problem with that is also what @catch commented in #2205215: {comment} and {comment_entity_statistics} only support integer entity ids. We have no API to interact with single fields, only entities. So if we made them field storage, any change would require saving the whole entity.

The only thing we could do is optimize internally so that only field tables that have changed are updated, but that's it.

fago’s picture

In the current proposal there is a new entity controller/handler (an instance of EntitySchemaHandlerInterface, actual names TBD), that is responsible for performing all the schema-related operations.

I'm not sure it makes sense to have a separate controller/handler taking care of schema changes. How or whether you have to deal with schema changes is highly storage-dependent, so I'd see this as the job of the entity storage. If it's about pluggability, I'd agree that it makes sense to separate that out, either into a service injected into the storage controller, or similarly using a composite pattern, as I've already suggested while moving field storage to entity storage.

That said, shouldn't FieldableEntityStorageControllerInterface be enough of an interface for handling schema changes? It needs to be generalized to work with entity field interfaces (for which we need to create a FieldStorageInterface, as part of #2116363: Unified repository of field definitions (cache + API) or as a separate issue); see #2144263: Decouple entity field storage from configurable fields. As we can now generate our schema based on the field definitions, having the changed field definitions should be enough to allow any storage to perform any necessary change actions.

That said, imo the API should not be about schema changes, but about field definition changes.

For that, I think a good first step would be to get going with adding necessary information to our metadata, i.e. #2143069: Add getProvider() and hasCustomStorage to FieldDefinitionInterface and the revisionable flag, this should be the easy/quick wins. Then make sure we've FieldStorageInterface and generalize FieldableEntityStorageControllerInterface. Then, make sure stuff depending on the tables has a way to lookup tablenames (= add an extended interface for SQL storage engines: #2079019: Make Views use SqlEntityStorageInterface). Then finally we need to test and verify that stuff does not break.

One issue we need to solve is how we'd generate field definition changes for module-provided fields. Either we store previous definition versions and do it automatically, or we'd have to require developers to write suitable update functions (should be OK I think, and is less auto-magic).

Finally, one of the bigger tasks would be to make the schema generation for base tables the job of the storage controllers. A quick first step could be moving code and responsibility, while basing the base table schema on the field schema is probably more tricky; issue: #2183231: Make ContentEntityDatabaseStorage generate static database schemas for content entities.

That's how I'd see it, but let's discuss asap, nail down the steps and get the ball rolling. Maybe let's discuss in Szeged.

fago’s picture

plach’s picture

Given #52 I guess it's probably better to discuss this once again in Szeged before starting the actual work.

Just a quick answer: the main reason why I'd go for a separate class to handle the entity schema is that it may involve quite an amount of code that we don't need to load on most requests. Obviously it would make little sense to swap the storage controller without also swapping the schema handler, so yes, I see them as strictly coupled in terms of actual usefulness. They would have two quite distinct responsibilities though, so having two classes would be correct IMHO.

Crell’s picture

Just because schema changes live in a separate object doesn't mean they have to be part of a global subsystem that all storage drivers have to implement. This is another reason all of these objects should just be in the container, not coupled together via the annotation by class name.

It's completely legit for the entity storage handler for SQL to depend on 4 other objects, including the database service, in order to do what it needs to do. Meanwhile, the MongoDB handler can only depend on 2, neither of which map conceptually to any of the 4 sub-objects that the SQL handler uses. That's not only OK, that's a good architecture if that detail is hidden behind the storage handler's interface in ways that the calling code doesn't give a damn about.

Architect globally, implement locally. :-)

tstoeckler’s picture

I agree with everything that's been said so far. I also agree it makes sense to put this logic into a separate class and inject that as a service, I will refactor #2183231: Make ContentEntityDatabaseStorage generate static database schemas for content entities accordingly.

For now I will keep that issue in its limited scope, i.e. I will not account for any schema changes. I will also not create the schema only on demand but will hardcode the schema creation into ModuleHandlerInterface::install(). We will need to work out how we store the information about which tables have been created in which state in a separate issue. I think that is a non-trivial problem on its own.

Something like a field-definition-update API that #52 talks about sounds pretty sweet. I was not aware of #2116363: Unified repository of field definitions (cache + API), I will read up on that.

plach’s picture

Discussed this again with @fago, @tstoeckler and @berdir. We decided to split this into smaller tasks. We are currently working on:

#2183231: Make ContentEntityDatabaseStorage generate static database schemas for content entities
#2143069: Add getProvider() and hasCustomStorage to FieldDefinitionInterface

@Crell:

We agreed to introduce the schema builder as a service: my only concern was the inability to have a different class per entity-type, if needed. @fago and @berdir pointed out that we can simply inject a different service, so that works for me :)

plach’s picture

Status: Active » Postponed
fago’s picture

Crell’s picture

Make it a service, or make it a required service that every entity type or storage handler must define?

The former is what I was saying we should do. :-) The latter I'm saying is unnecessary, or if it's necessary then it's a sign of a deeper design flaw.

plach’s picture

I'd say the former: every storage controller can define its own list of dependencies, and I don't think we are hardcoding the need for the schema builder anywhere in the public API...

Dave Reid’s picture

/me has no idea how this would affect or work for File Entity in contrib which "enhances" the file entity type with bundle information, which core doesn't even have the schema to hold.

plach’s picture

@Dave Reid:

I think this should let you obtain the same result by just defining a new bundle field, which would then be automatically added to the schema. It may also let you add the other stuff that is currently defined in file_entity_schema(), if you define it as additional custom fields. Not sure whether that would actually make sense, I didn't study the code closely.

plach’s picture

For now we are planning to do some of the work described in #48 directly here.

jessebeach’s picture

Issue summary: View changes
fago’s picture

Opened an issue for the field storage interface dependency: #2226197: Introduce FieldStorageDefinitionInterface in the Entity Field API.

plach’s picture

Issue summary: View changes
plach’s picture

Issue summary: View changes
plach’s picture

Issue summary: View changes
plach’s picture

jessebeach’s picture

Title: Make FieldableDatabaseStorageController automatically generate tables for every defined entity type » [PP-2] Make FieldableDatabaseStorageController automatically generate tables for every defined entity type
Anonymous’s picture

I just found this issue. OMG I will be so happy when this is in core!!!!

plach’s picture

Title: [PP-2] Make FieldableDatabaseStorageController automatically generate tables for every defined entity type » [PP-2] Make ContentEntityDatabaseStorage handle changes in the entity schema definition

The actual schema generation is addressed in #2183231: Make ContentEntityDatabaseStorage generate static database schemas for content entities. We will provide a unified API to track schema changes in #2144263: Decouple entity field storage from configurable fields. Here we will have to react to any change in the entity schema definition and handle it properly.

Sylvain Lecoy’s picture

You might be interested in looking at the reverse procedure: e.g. building an entity object from Schema API: http://drupalcode.org/project/doctrine.git/blob/48b52566638b3e31e3e0ebff...

If you have any questions i'll be glad to help :)

Dave Reid’s picture

tstoeckler’s picture

Status: Postponed » Active
xjm’s picture

Thanks @davereid; that's important to know.

@tstoeckler: I think this issue was also still postponed on #2144263: Decouple entity field storage from configurable fields? Or is it possible to work on them at the same time?

tstoeckler’s picture

Title: [PP-2] Make ContentEntityDatabaseStorage handle changes in the entity schema definition » [PP-1] Make ContentEntityDatabaseStorage handle changes in the entity schema definition
Status: Active » Postponed

Hmm... I would have thought "Yes", but let's let @plach decide, as he has concrete plans for this issue, I think. Marking back to postponed for now.

effulgentsia’s picture

Just chatted with @fago and @plach, and they say that while this issue won't be finishable/committable until after #2144263: Decouple entity field storage from configurable fields is done, it can still be started in parallel. Leaving postponed for now, until someone begins that work.

plach’s picture

Assigned: Unassigned » plach

I will do that tomorrow

fago’s picture

Issue summary: View changes
fago’s picture

updated the issue summary based on a call with plach and timplunkett

plach’s picture

Issue summary: View changes
plach’s picture

Issue summary: View changes
fago’s picture

As discussed with yched, alexpott, mtift, plach and xjm (notes here) we'll have to move the field purging mechanism to the entity field API and keep deleted field definitions in state to be able to properly solve this. However, we can postpone that to a follow-up and, for now, just forbid uninstalling modules providing fields as long as the fields have data.
-> Thus, we'll have to forbid uninstalling modules providing fields as long as the fields have data as part of this issue.

fago’s picture

plach’s picture

Status: Postponed » Needs review
FileSize
178.37 KB

A first patch, not really ready for review yet, to see how many failures we have. This includes #2144263: Decouple entity field storage from configurable fields as it depends on it.

Status: Needs review » Needs work

The last submitted patch, 88: et-entity_schema_handling-1498720-88.patch, failed testing.

plach’s picture

Status: Needs work » Needs review
FileSize
179.24 KB
3.91 KB

Some fixes

Status: Needs review » Needs work

The last submitted patch, 90: et-entity_schema_handling-1498720-90.patch, failed testing.

plach’s picture

Status: Needs work » Needs review
FileSize
16.21 KB
186.33 KB

This should fix quite a few test failures.

Status: Needs review » Needs work

The last submitted patch, 92: et-entity_schema_handling-1498720-92.patch, failed testing.

plach’s picture

Status: Needs work » Needs review
FileSize
3.1 KB
186.04 KB

More fixes

plach’s picture

Status: Needs review » Needs work

Green, cool.

As I was saying, this is not ready for review yet, unless you just want to have a look at how the new API to handle field schema changes looks. Still a lot to do, but we have a good foundation now.

plach’s picture

Ok, this should be more serious: it implements schema handling for shared table fields.

There is still room for a lot of clean-up and field data purging is not handled (it will be in #2282119: Make the Entity Field API handle field purging), but aside from additional test coverage this first step should not be too far from completion.

Next step is automatically detecting and applying changes. I will start working in a separate branch based on the current one, so we can decide whether to split that part off into a separate non-beta-blocking issue or whether it makes more sense to keep them together.

The full patch is for the bot and includes #2144263: Decouple entity field storage from configurable fields, the one to review is, well, the .review one :)

Status: Needs review » Needs work

The last submitted patch, 96: et-entity_schema_handling-1498720-96.patch, failed testing.

plach’s picture

Status: Needs work » Needs review
FileSize
1.48 KB
197.6 KB

Oops

Status: Needs review » Needs work

The last submitted patch, 98: et-entity_schema_handling-1498720-98.patch, failed testing.

plach’s picture

Status: Needs work » Needs review
FileSize
884 bytes
197.83 KB

The last changes exposed this nice bug

plach’s picture

plach’s picture

Title: [PP-1] Make ContentEntityDatabaseStorage handle changes in the entity schema definition » Make ContentEntityDatabaseStorage handle changes in the entity schema definition

And no longer postponed!

alexpott’s picture

FileSize
13.82 KB
139.84 KB

Read through the patch - just to get to grips with what is going on - the patch attached has some minor clean-up and shows that we have a lack of test coverage around ContentEntityDatabaseStorage::doDeleteFieldItems(), ContentEntitySchemaHandler::getSharedTableFieldSchema() and ContentEntitySchemaHandler::getDedicatedTableSchema(). The last two are using FieldException - the patch in 101 didn't include a use statement for this class. And ContentEntityDatabaseStorage::doDeleteFieldItems() was not touching the revision table, which looks necessary.

Status: Needs review » Needs work

The last submitted patch, 103: 1498720.103.patch, failed testing.

Anonymous’s picture

This may allow us to kill hook_entity_load() and hook_ENTITY_TYPE_load() altogether.

Nope nope nope nope nope nope nope . Why would you do that? That has nothing to do with schema. It's an event trigger and what modules do with it is not Drupal's concern.

plach’s picture

@alexpott:

Sorry, I was going on with my work in the sandbox (see also #2298525: Test issue for Make ContentEntityDatabaseStorage handle changes in the entity schema definition), I hope I will be able to include your interdiff without merge issues :)

@ivanjaros:

That's no longer on the table (see #24), I will update the issue summary when I have a more stable patch.

alexpott’s picture

Status: Needs work » Needs review
FileSize
4.08 KB
141.33 KB

Fix tests. I got getEntityTypeId and getTargetEntityTypeId mixed up!

dawehner’s picture

Someone will have to update the issue summary, it is kinda outdated.

One general question: people did not want to expose the table name / column name as API, but I guess it is unavoidable here.

plach’s picture

Someone will have to update the issue summary, it is kinda outdated.

I will soon, I'm almost done with the first part (at least we should be close to a reviewable patch), see #2298525: Test issue for Make ContentEntityDatabaseStorage handle changes in the entity schema definition.

One general question: people did not want to expose the table name / column name as API, but I guess it is unavoidable here.

Yep, I am afraid it is. The route I took here was "hiding" it in the table mapping, so it's available only if you have a storage class that implements SqlEntityStorageInterface.

fago’s picture

I shortly reviewed the patch, without going into all the details for now. Overall I like how code has been refactored and moved around!

Here some remarks:

  1. +++ b/core/lib/Drupal/Core/Entity/Schema/EntitySchemaHandlerInterface.php
    @@ -7,8 +7,48 @@
    +  public function markFieldSchemaAsDeleted(FieldStorageDefinitionInterface $storage_definition);
    

    This naming and description confused me a bit - what does it mean to mark a schema as deleted? Schema does not support something like that ;)

    Maybe, it should be just prepareFieldSchemaDeletion()?

  2. +++ b/core/lib/Drupal/Core/Entity/Sql/DefaultTableMapping.php
    @@ -177,4 +197,105 @@ public function setExtraColumns($table_name, array $column_names) {
    +  function allowsSharedTableStorage(FieldStorageDefinitionInterface $storage_definition) {
    ...
    +  function requiresDedicatedTableStorage(FieldStorageDefinitionInterface $storage_definition) {
    

    Why does one allow it while the other requires it? Couldn't it just be "hasXTableStorage()" or "usesXTableStorage()"?

    Also, the logic seems a bit duplicated; maybe one could check the other, i.e. !customStorage + !shared table -> dedicated table?

  3. +++ b/core/modules/file/file.views.inc
    @@ -520,9 +519,12 @@ function file_field_views_data(FieldConfigInterface $field) {
    +  $table_mapping = $entity_manager->getStorage($entity_type_id)->getTableMapping();
    

    Can it rely on having SQL storage here? Also on various other places - it seems to be a pre-existing assumption though. I guess we should ensure it does at least not FATAL on non-sql storages, besides providing a neat way to provide views integration still. Stuff for another issue.

plach’s picture

Thanks @fago, but that patch is very outdated, I am working in #2298525: Test issue for Make ContentEntityDatabaseStorage handle changes in the entity schema definition. I will see if I can incorporate your feedback there.

YesCT’s picture

adding the more widely used tag.

plach’s picture

FileSize
248.26 KB

This implements the first part of the work, see the (upcoming) updated issue summary. I am now going to start phase 2 ;)

plach’s picture

Issue summary: View changes
plach’s picture

plach’s picture

@fago:

The latest patch addresses most of your review. Some replies:

  1. Done, but I thought the previous name made sense as field tables are renamed (marked) as deleted :)
  2. I discussed those names extensively with @eff in Austin, and those were the ones that seemed to be more meaningful to him. No problem in renaming them, but I guess we should involve Alex in the bikeshed ;)
  3. Stuff for #1740492: Implement a default entity views data handler
plach’s picture

Title: Make ContentEntityDatabaseStorage handle changes in the entity schema definition » Make ContentEntityDatabaseStorage handle changes in the entity and field schema definitions

Better title

plach’s picture

Title: Make ContentEntityDatabaseStorage handle changes in the entity and field schema definitions » Make the entity storage system handle changes in the entity and field schema definitions

Even better

plach’s picture

Issue summary: View changes
sun’s picture

Ugh. I wasn't aware that the result of the Entity Schema effort was going to be a retry at reinventing the Data API, ad hoc and five minutes to midnight, even though there's plenty of past evidence proving that the approach simply doesn't work out for more complex data types in SQL storages.

Likewise, for NoSQL storages, we're talking about either (1) excessive mass-updates of all stored items, or (2) application code that has backwards-compatibility layers baked in, so as to be able to understand data/properties/values in different formats.

What exactly did we sign up for by committing the entity/field schema abstraction?

xjm’s picture

FYI, "Five minutes to midnight" is an unreasonable characterization as these issues have been beta-blocking since December (and #2183231: Make ContentEntityDatabaseStorage generate static database schemas for content entities was committed on June 1).

Berdir’s picture

Thanks @xjm :)

I experimented a bit with this with file_entity and managed to use it to add the type/bundle field: https://github.com/md-systems/file_entity/commit/8bd8e27f2cd5e488edba4bf...

I had one problem and that is that the bundle field is NOT NULL and I need an 'initial' value. I experimented with supporting that automatically, it's very simple but a bit random, see 8.x-et-support-initial-1498720-berdir. Not saying you should include it, just putting it out as an idea. Would certainly need tests and documentation. As an alternative, I could override the schema handler through a custom storage class and add the initial there.

chx’s picture

I thought massive changes are going to be handled by the migrate system; just add D8 source classes to taste?

plach’s picture

Version: 8.0.x-dev » 8.x-dev

@sun:

TBH I am a bit surprised by your comment: as usual, I am completely open to discuss technical stuff, but your comment seems to imply I am trying to sneak in this huge change last-minute without prior discussions. Actually it's the exact opposite:

  • I have been talking about this stuff to everyone looking even barely interested in it at each Drupal event I attended since Denver
  • I posted my implementation plan 5 months ago (see #48), after discussing the big picture in Prague. I announced that post on the g.d.o. core group and on Twitter.
  • I discussed it again in Szeged and Austin

To sum up, I think I did everything I could to advertise my goals in general and this issue's goals in particular. If that was not enough, I am sorry, but it certainly wasn't due to bad will. The only reason I started working on this so late is that there was an almost infinite list of prerequisite issues to address before it.

That said, Thursday 6pm CEST we are going to have the usual #drupal-entity meeting with most key people in this area: you'd be welcome to join us so we can discuss your concerns.

@Berdir:

Your code looks certainly promising, if there are no major objections I'd see it as a valid solution.

@chx:

Not sure whether you are referring to @Berdir's comment or this issue in general, I will assume the latter: the plan for core is limiting schema changes to those not implying data migrations, e.g. the usual field table addition. The service responsible for managing changes could be swapped in contrib with an implementation relying on the Migrate API to deal also with changes that involve data migrations.

plach’s picture

Restoring new version

yched’s picture

The case of "initial" is a bit tricky.
- it only applies to the case where you're adding a new field stored in the base table, where there is already one row per existing entity. When used on a field stored in a dedicated table, supporting an initial value would mean inserting potentially thousands of new rows.
- the "initial value" is not inherent to the field type, but very much case-by-case (it's the code / person that adds the new field that knows the correct initial value for that new field specifically).

To me, the above says "not part of the field type schema, not handled by the field storage layer, belongs to migrations".
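To make the "initial value" workaround concrete, here is a hypothetical sketch (not core code) of what the storage layer would have to emit when adding a NOT NULL column to a base table that already contains rows: add the column with a temporary DEFAULT so existing rows are backfilled, then drop the default again, since the initial value is a one-off and not part of the field type schema. The table/column names and the MySQL-flavored SQL are made up for illustration.

```php
<?php

// Hypothetical helper: build the statement sequence for adding a NOT NULL
// column with a one-off "initial" value to a populated base table.
function build_add_field_statements(string $table, string $column, string $initial): array {
  return [
    // Existing rows receive $initial via the temporary DEFAULT.
    sprintf("ALTER TABLE %s ADD COLUMN %s VARCHAR(32) NOT NULL DEFAULT '%s'", $table, $column, $initial),
    // Drop the default again: it is a backfill device, not field metadata.
    sprintf("ALTER TABLE %s ALTER COLUMN %s DROP DEFAULT", $table, $column),
  ];
}
```

This mirrors @Berdir's file_entity experiment above (a NOT NULL bundle column needing an 'initial' value), while keeping the value out of the field type schema as @yched suggests.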

fago’s picture

I discussed those names extensively with @eff in Austin, and those were the ones that seemed to be more meaningful to him. No problem in renaming them, but I guess we should involve Alex in the bikeshed ;)

I'm not into bikeshedding, but I'd like to understand why one allows and the other requires the storage "approach". Is requiresDedicatedStorage() stronger? (Its docs seem to be wrong as they are copied from shared storage.)

Also, the logic seems a bit duplicated, maybe one could check the other, i.e. !customSTorage + !shared table -> dedicated table?

Looks like this has been addressed, thanks :)

xjm’s picture

We discussed this issue in the IRC meeting yesterday (including @fago, @berdir, @effulgentsia, @plach, @yched, @swentel, and @alexpott). Conclusions:

  1. We will file a followup issue for the "initial value" problem space.
  2. plach will ping joachim to see if there's any concerns he would raise following experience with the Data API, though we agreed this issue is much narrower in scope than that. (I left him an IRC tell.)
  3. fago and effulgentsia will provide some code reviews of Part 1.
  4. plach will also begin work on part 2 and explore what options/libraries might be available for the schema diff.
joachim’s picture

It's a long time since I've worked on Data module...

What I do remember is that it has a hideous circularity problem that I never properly managed to fix. I don't recall any of the details, but the comments I put in the code at the time can do that for me ;)

/**
 * Implements hook_schema_alter().
 *
 * This is a central piece of data module:
 * Here we tack schema information that has been defined through the API in data_tables
 * or by hook_data_default onto the $schema array.
 *
 * We do not use hook_schema() for exposing schema information as this would cause a race
 * condition: ctools/exports looks for data module's data_tables at the same time when
 * we are actually rebuilding it - follow path through
 * data_get_all_tables() ... _data_load_table() ... ctools_export_load_object().
 *
 * TODO: This is still rather hairy, and needs more work.
 * In the meantime, it's probably best to enable CTools first, and then Data
 * rather than both together.
 */
function data_schema_alter(&$schema) {
  // Sidestep this during installation, as otherwise this is circular:
  // data_get_all_tables() calls ctools stuff, which calls the schema, and
  // gets us right back here.
  if (drupal_get_installed_schema_version('data') != SCHEMA_UNINSTALLED) {

So it looks like the big problem was that Data table definitions are exportable with CTools. Also, Data needs to use hook_schema_*alter*() because it's drawing from data that's in a table. With the D8 config system that won't be a problem at least.

Berdir’s picture

entity tables aren't exposed to drupal_get_schema() at all (nor are field tables), so anything related to that can't be an issue :)

chx’s picture

I still don't get it. Now I've read the IS. Why are we not telling people to just add configurable fields? Why put all this effort into storing things in the base table? Between that and migrate, I really don't get why this is necessary.

joachim’s picture

Would be a good idea to also ping the maintainer of ECK, which deals with an almost identical situation to Data.

Anonymous’s picture

@chx IMHO a) to save unnecessary joins to additional field tables, b) most base fields of an entity are single-valued so it's easy to just open the entity field data table and immediately see all values instead of "hunting" the values in different field tables (DX).

Point b) is very important to me, but I guess moving entirely to Fields would save us quite a bit of coding (base field definition + display settings + form settings + all the code in core that handles base fields), no field would be outside of configuration, etc... so IMHO I'd agree with such an approach.

plach’s picture

@joachim:

Thanks for your feedback! As @Berdir pointed out, we are no longer exposing the schema array through hook_schema(), so that issue should not affect us. Moreover, at the moment I am trying to spot differences that might trigger schema changes by just inspecting entity type/field schema definitions, so the schema array should not be a concern, at least in this basic implementation. I am envisioning a contrib module that could expand on the core behavior and provide schema diffs, change previews and similar cool stuff, but that would totally rely on existing code/libraries. Anyway, I pinged @fmizzell as you suggested :)

@chx:

Well, actually denormalization in particular and performance in general are some of the biggest reasons I have in mind. Certainly we are still supporting the ability to add configurable fields via code, actually it's even possible to define bundle fields via code and they would get a dedicated field table, exactly as configurable fields. Adding stuff to the base table(s) is just another option that I think may be quite useful in certain situations. I must say that I'd very much prefer to use base fields defined in code (instead of bundle ones), though. I've always been disturbed by the fact that a configurable field, even one added in code and locked, is somehow "unreliable" and basing coded business logic on it feels just wrong to me. A possible option to make it easier to add base fields in scenarios where data is already available, could be to expose a way to specify that a certain field should be stored in a dedicated table even if it's a single-value base field.
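The shared-vs-dedicated split being discussed here can be boiled down to a simple rule. The following is an illustrative plain-PHP sketch (function names and parameters are made up, not the actual DefaultTableMapping API): single-valued base fields without custom storage can live in the shared entity tables, while everything else without custom storage gets a dedicated table.

```php
<?php

// Illustrative rule: a field is a candidate for shared (base) table storage
// only when it is a single-valued base field whose storage is not custom.
function uses_shared_table(bool $isBaseField, int $cardinality, bool $hasCustomStorage): bool {
  return $isBaseField && $cardinality === 1 && !$hasCustomStorage;
}

// Anything the storage layer handles that does not fit in a shared table
// (bundle fields, multi-valued base fields) gets a dedicated table.
function uses_dedicated_table(bool $isBaseField, int $cardinality, bool $hasCustomStorage): bool {
  return !$hasCustomStorage && !uses_shared_table($isBaseField, $cardinality, $hasCustomStorage);
}
```

This also shows why the two checks in the patch need not duplicate logic: one can be defined in terms of the other, as @fago suggested in his review.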

fmizzell’s picture

I haven't looked at D8's entity stuff since the time when typed data was being discussed, so I am not sure that looking at the patch and trying to give feedback would be that helpful, but it does seem that @plach envisions a contrib module that could offer a UI to trigger creation and deletion of base fields, and that is exactly what ECK is doing in D7. So as long as that is possible ECK will be able to work with the core system.

plach’s picture

@fmizzell:

Thanks! Is there any experience you'd like to share (tricky stuff, good solutions) that could be useful to design the D8 system?

fmizzell’s picture

In ECK I am lucky to have almost full control of how base fields' metadata gets manipulated (a user could modify the serialized array in the db, but why would you when there is an API?). Adding a new property/base field looks something like this:

$entity_type->addProperty(...);
$entity_type->save();

So I can record manipulations that need to be performed on existing db tables and perform them during save. I am guessing that in D8 a lot of this metadata is in code or in the config system, so assuming that people are going through an API might be out of the question.

Knowing that users will go through an API eliminates the need for schema versions and other historical data (which makes everything much simpler), but on the other hand, this issue is more ambitious than just adding/deleting base fields, and saving historical information like diffs will be necessary anyway to help out with auto-migration features, etc.

plach’s picture

@fmizzell:

#137 makes a lot of sense to me. As you correctly point out, the main difference in D8 is that entity field CRUD operations are not observable, at least at the moment, so what we need to do to detect changes is store a copy of the entity type and field storage definitions and compare them with the ones currently available in the system. This has one advantage though: such an operation can be performed whenever we want and allows multiple changes to be applied in a single run.

Anyway, as discussed with @yched elsewhere, we should probably ensure the Entity Manager handles all this stuff and the storage is just an event subscriber or something like that. However we can probably implement that approach as an API addition, so no need to tackle that here.
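The change-detection approach described above can be simulated in a few lines of plain PHP. This is a rough, self-contained sketch (not core code; the function name and the definition arrays are hypothetical): keep a copy of the last installed field storage definitions, diff them against the current ones, and classify each difference as a create, update or delete.

```php
<?php

// Hypothetical diff of stored vs. current field storage definitions,
// where each definition is represented as a plain array keyed by field name.
function diff_definitions(array $installed, array $current): array {
  $changes = ['create' => [], 'update' => [], 'delete' => []];
  foreach ($current as $name => $definition) {
    if (!isset($installed[$name])) {
      // Field exists now but was never installed: schema must be created.
      $changes['create'][] = $name;
    }
    elseif ($installed[$name] != $definition) {
      // Definition drifted from the installed copy: schema needs an update.
      $changes['update'][] = $name;
    }
  }
  foreach (array_keys($installed) as $name) {
    if (!isset($current[$name])) {
      // Installed field no longer defined: schema must be dropped.
      $changes['delete'][] = $name;
    }
  }
  return $changes;
}
```

Because the comparison runs against a stored snapshot rather than observing CRUD events, it can indeed be performed at any time and will pick up several accumulated changes in one pass, as noted above.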

chx’s picture

You need to realize that I have no bones in this -- MongoDB will store whatever the current field schema is. As there's no database schema, for all it cares every document in a collection can be completely different. So from that perspective, I shouldn't care about this issue.

My concern is timing and resources and maintainability of the entity storage code. Please think carefully: is this really a release blocking task? Isn't it merely a feature? Shipping a field config with a module should be quite reliable AFAIK; it's already built and working.

fago’s picture

@chx:
Well, actually denormalization in particular and performance in general are some of the biggest reasons I have in mind.

While performance benefits are great to have, I think the main reason for us having to do this is something else: so far entity types have been able to do updates to their entity base tables by writing the schema update operations themselves. With the schema being generated, this is not possible anymore. Instead we need this API to continue to give entity-type-providing modules the possibility to make changes to their base fields / base table schema.

Gábor Hojtsy’s picture

Issue tags: +Drupalaton 2014
fago’s picture

FileSize
10.23 KB
248.01 KB

OK, I looked through the latest version of the patch (branch 8.x-et-entity_schema_handling-1498720-plach) in detail now. In general, this looks great. Adding an updated version with some small documentation fixes and glitches fixed.

Here some remarks/questions:

      // There is just one field for each dedicated storage table, thus
      // $field_name can only refer to it.
      if (isset($field_name) && $this->requiresDedicatedTableStorage($this->fieldStorageDefinitions[$field_name])) {
        $this->allColumns[$table_name] = array_merge($this->getExtraColumns($table_name), $this->allColumns[$table_name]);
      }
      else {
        $this->allColumns[$table_name] = array_merge($this->allColumns[$table_name], $this->getExtraColumns($table_name));
      }

I do not understand why the array_merge() order differs here. So this could use a comment / some explanation.

/**
 * Defines a schema handler that supports revisionable, translatable entities.
 */
class ContentEntitySchemaHandler implements ContentEntitySchemaHandlerInterface {

Given #2275659: Separate FieldableEntityInterface out of ContentEntityInterface it's actually not bound to content entities, but fieldable entities. I'm not sure whether it should be FieldableEntitySchemaHandlerInterface or better just "EntitySchemaHandlerInterface" + type hint on fieldable entity interface later on and ContentEntityInterface for now.

    // If we are adding a field stored in a shared table we need to recompute
    // the table mapping.
    if ($this->getTableMapping()->allowsSharedTableStorage($storage_definition)) {
      $this->tableMapping = NULL;
    }

Yep, but we do not generally know which changes influence the table mappings and which don't. So shouldn't we just re-compute the mapping on every change?

    if ($table_mapping->requiresDedicatedTableStorage($storage_definition)) {
      // Mark all data associated with the field for deletion.
      $table = $table_mapping->getDedicatedDataTableName($storage_definition);
      $revision_table = $table_mapping->getDedicatedRevisionTableName($storage_definition);
      $this->database->update($table)
        ->fields(array('deleted' => 1))
        ->execute();
      $this->database->update($revision_table)
        ->fields(array('deleted' => 1))
        ->execute();
    }

Shouldn't those changes be in the SchemaHandler as well? The name and description of the schema handler imply that it is used for dedicated fields as well. If we want to keep the implementations separated, I'd suggest moving it to a trait and using the trait in the schema handler?

CommentSchemaHandler
This one seems to be unused?

    $storage = $entity_manager->getStorage($field_storage->getTargetEntityTypeId());
    $result = $storage instanceof ContentEntityDatabaseStorage ? $storage : FALSE;

Is there a reason it checks the implementation and not the SQL storage interface?

      ->setSetting('unsigned', TRUE)
      ->addConstraint('TermParent', array())
      ->setCustomStorage(TRUE);

oh, yes :)

fago’s picture

        $this->dropEntitySchema($original);
        $this->storage->setEntityType($entity_type);
        unset($this->schema[$entity_type->id()]);
        $this->createEntitySchema($entity_type);

Does it check whether the entity has data somewhere? I'm missing it. Also, it would make sense to handle simple cases e.g. column additions and removals without migration / data loss I think? -> We do :)

Status: Needs review » Needs work

The last submitted patch, 142: d8_storage.patch, failed testing.

plach’s picture

Here's my current work: part 2 still needs work, and I realized part 1 also needs a bit of additional work to handle index/key changes, but this is definitely looking more like it. Not sending to the bot yet, but manual testing here looks promising (see https://twitter.com/plach__/status/497797333015601153). Adding some screenshots of the UI tweaks.

(this does not include @fago's changes, just saw them)



plach’s picture

Status: Needs review » Needs work
plach’s picture

Issue summary: View changes
fago’s picture

Thanks!

  1. +++ b/core/lib/Drupal/Core/Entity/Schema/ContentEntitySchemaHandler.php
    @@ -95,6 +95,60 @@ public function __construct(EntityManagerInterface $entity_manager, ContentEntit
    +    // A change in the storage class may or may not imply a data migration. We
    

    This misses the storage class check although commented - it should probably just call the requiresChanges() method for now, as it's the same in the case of entity data? Or do we handle going from revisionable to non-revisionable without migration?

  2. +++ b/core/lib/Drupal/Core/Extension/ModuleHandler.php
    @@ -868,11 +869,12 @@ public function install(array $module_list, $enable_dependencies = TRUE) {
    -            $entity_manager->getStorage($entity_type->id())->onEntityTypeDefinitionCreate();
    ...
    +            $entity_schema_manager->onEntityTypeDefinitionCreate($entity_type);
    

    So this makes the base system support schema changes directly instead of letting the entity storage handle them. As update.php lists schema changes separately, it makes sense - but it still hard-codes entity storage changes to be schema changes only.
    Not sure whether that's a use case, but should we continue routing change notification calls through the storage, so that a non-SQL storage could implement its own storage adaptations as needed?
    Then, the SQL storage could forward calls to the schema manager service internally. Thoughts?

    Other storage engines might not have to do schema changes, but things like data cleanup or changing indexes might still be interesting. And who knows what sort of storage implementations people will come up with?

    If we do this, the question is how we deal with the summary in update.php. I could see us just continuing to hard-code the summary to schema changes. Even if other storages cannot provide a summary, avoiding locking them out of storage changes seems reasonable.

plach’s picture

This should address @fago's reviews (thanks :). Attached you can also find the interdiff between #142 and #145, which @fago probably missed.

Given #2275659: Separate FieldableEntityInterface out of ContentEntityInterface it's actually not bound to content entities, but fieldable entities. I'm not sure whether it should be FieldableEntitySchemaHandlerInterface or better just "EntitySchemaHandlerInterface" + type hint on fieldable entity interface later on and ContentEntityInterface for now.

I've been asking myself more or less the same questions (here and in other places): I think for now it is safer to target the most specific scenario (just content entity types). We can discuss the details and "widen" the target safely later; the opposite strategy would imply API breaks.

Yep, but we do not generally know which changes influence the table mappings and which don't. So shouldn't we just re-compute the mapping on every change?

Fair point. But we can take it even further and say that the storage should make no assumptions about what is needed to reflect the change in the schema, so ideally it should be the responsibility of the schema handler to instantiate a fresh table mapping. For now I just added a todo about removing those lines when we are done with #2274017: Make SqlContentEntityStorage table mapping agnostic.

Shouldn't those changes be in the SchemaHandler as well?

I don't think that code belongs in the schema handler: the storage handles field data, while the schema handler deals with schema. IMHO this way the overall (CR)AP strategy is correctly split into its two respective areas of responsibility.

CommentSchemaHandler
This one seems to be unused?

What do you mean? :)
It's overriding the getEntitySchema() method as usual.

Is there a reason it checks the implementation and not on the sql storage interface?

Well, the current implementation hardcodes assumptions on the table layout that are specific to the core implementation provided by ContentEntityDatabaseStorage. I think we can get rid of this entirely in #1740492: Implement a default entity views data handler.

Does it check whether the entity has data somewhere? I'm missing it.

Validation is being implemented in part2 atm. My current approach is leaving the possibility to override data handling policies in the schema manager. The storage layer just exposes new methods to describe how/if the change affects data. This part could use some IRC discussion :)

it should probably just call the requiresChanges method for now as it's the same in case of entity data? Or do we handle going from revisionable to not revisionable without migration?

Yep, we just need to drop revision tables. Added a comment to clarify that.

[...] Not sure whether that's a use case, but should we continue routing change notification calls through the storage, so that a non-SQL storage could implement its own storage adaptations as needed? [...]

This is an area I spent quite some time thinking around: IMO ideally all these notifications should be routed through the entity manager, which in turn would be responsible for notifying interested entity handlers (probably only storage) and for proxying the notifications to other interested business objects, such as the schema manager. Among the rest this would allow the EM to clear its own caches as discussed with @yched in #2144263-87: Decouple entity field storage from configurable fields.

I agree the current approach is not the cleanest possible, but it is still an improvement wrt the HEAD code (although the interdiff looks worse) and it ensures that any change to definitions is persisted in state after being reflected in the storage. In fact every call is initially proxied to the storage, so we are losing nothing in terms of the use cases we are able to support.

I think we should clean this up in a dedicated issue: we can definitely implement the plan above without introducing relevant API changes.

plach’s picture

plach’s picture

Sorry, the patch in #149 contains only part 1 while it was meant to contain both. Here is the complete one. The interdiffs above are correct. Attached is the difference between #149 and #151.

xjm’s picture

Looks like the summary could use an update. :) Is that the full implementation of part 2?

plach’s picture

I am not sure what to update; I think I followed strictly what is proposed in the summary.

Gábor Hojtsy’s picture

I think the biggest question is: are phase 1 and phase 2 both intended to be committed as part of this issue? What are the missing pieces?

plach’s picture

Part 1 and part 2 are split just for reviewer convenience. We agreed they should move forward together. Maybe we can commit them separately if that's the easier way to get this in, but they should be set to RTBC together.

I am still coding a few bits, but we should be close. Namely:

  • Detect changes in entity schema indexes/keys.
  • Block module uninstallation if data is available, since we have the purge issue still open: #2282119: Make the Entity Field API handle field purging.
  • Block entity schema changes. We need this temporarily, as the various entity-type-specific storage classes do not support dynamic table layouts yet, so switching the entity schema causes queries to break. Moreover, we need Views integration fixes (#1740492: Implement a default entity views data handler and friends) before allowing those, for the very same reason. However this can be done in a follow-up, as unblocking these changes (simply removing a throw) is a totally non-API-breaking change.
  • Improve test coverage for the schema manager.

Additionally, I am wondering whether we should add a method to reconcile the definitions stored in state with the ones available in code. This would allow people to apply changes manually and just notify the schema manager that everything is fine again.

fago’s picture

Great work, we've come a long way already! Here is another review; I've not really gone into details, and tried to stay more on the bigger picture for now.

  1. +++ b/core/lib/Drupal/Core/Entity/Schema/ContentEntitySchemaHandler.php
    @@ -95,6 +95,65 @@ public function __construct(EntityManagerInterface $entity_manager, ContentEntit
    +    // A change in the storage class may or may not imply a data migration. We
    +    // assume it does. This method should be overridden otherwise. Basically the
    

    hm, reading this again I'm wondering what an example of a change in the storage class would be which would lead to a different schema being generated?
    I guess it boils down to table mapping changes, so maybe we can add a todo to compare the generated table mappings? I guess changing the class would be something you'd like to be able to do without being forced into some migrations ;-)

  2. +++ b/core/lib/Drupal/Core/Entity/Schema/ContentEntitySchemaHandler.php
    @@ -95,6 +95,65 @@ public function __construct(EntityManagerInterface $entity_manager, ContentEntit
    +    // only schema change that does not imply a data migration is from
    +    // revisionable to non revisionable, as in that case we just need to drop
    +    // revision tables.
    

    Yes, but updateEntitySchema() seems to drop the schema and re-add it on the entity level - so there is still entity data migration required? It just handles dedicated revision tables better?

  3. +++ b/core/lib/Drupal/Core/Extension/ModuleHandler.php
    @@ -868,11 +869,12 @@ public function install(array $module_list, $enable_dependencies = TRUE) {
    +          if ($entity_type instanceof ContentEntityTypeInterface && $entity_type->getProvider() == $module) {
    +            $entity_schema_manager->onEntityTypeDefinitionCreate($entity_type);
    

    hm, yet another object with those event listeners. Should we start proxying all of them via the manager already as part of this issue?
    It would make more sense to me to have the storage notified from the manager, and not from the schema manager as now, as there might be storages completely unrelated to schema. E.g. a site running mongodb might want something like a null schema handler (or even remove the service, if possible, later on).

    Related, I'm not sure the SchemaManager should be the one who stores the definition changes into state. That does not seem directly related to schema management, as it will be required for figuring out definition changes in general, to which storages should be able to react - however they need to do that. Maybe mongo just needs to drop/clean some data?
    So what about moving this to the EntityManager?

  4. I've been asking myself more or less the same questions (here and in other places): I think for now it safer to target the most specific scenario (just content entity types). We can discuss the details and "widen" the target safely, the opposite strategy would imply API breaks.

    I see, good argument - but isn't it actually the other way round? E.g. changing an argument from FieldableInterface to ContentEntityInterface would not break things, but the other way round would be broken, as you narrow down the interface?

  5. Fair point. But we can bring it even further and say that the storage should make no assumption on what is needed to reflect the change in the schema, so ideally it should be responsibility of the schema handler to instantiate a fresh table mapping. For now I just added a todo about removing those lines when we are done with #2274017: Make SqlContentEntityStorage table mapping agnostic .

    True, ok.

  6. I don't think that code belongs to the schema handler: the storage handles field data, while the schema handler deals with schema. IMHO this way the overall (CR)AP strategy is correctly split into its two respective areas of responsibility.

    I'd not say this is traditional data that you saved, as it's a consequence of the field definition change; i.e. it's a metadata/schema change, not a data change. But as the storage class needs to have knowledge about the 'deleted' column anyway, it doesn't seem to matter much where it lives.

  7. CommentSchemaHandler
    This one seems to be unused?

    What do you mean? :)
    It's overriding the getEntitySchema() method as usual.

    Nope? Not in the previous version I reviewed, nor in the latest one. Other storages override schemaHandler() to register it, but CommentStorage doesn't?

  8. Well, the current implementation hardcodes assumptions on the table layout that are specific to the core implementation provided by ContentEntityDatabaseStorage. I think we can get rid of this entirely in #1740492: Implement an entity views data controller.

    I see, yeah let's deal with that over there.

  9. This is an area I spent quite some time thinking around: IMO ideally all these notifications should be routed through the entity manager, which in turn would be responsible for notifying interested entity handlers (probably only storage) and for proxying the notifications to other interested business objects, such as the schema manager.

    Yep, I've been thinking about that as well. Given all the various reactions in different classes I've been wondering whether it makes sense to start making use of symfony events for that also. It should be possible to introduce without an API break though if we continue to call pre-existing listener methods. Proxying them through the storage for now is fine imo.

  10. * - entity_type: a scalar having only the ENTITY_TYPE_UPDATED value.

    If there is only one possible value, having the constant seems superfluous? Why not just set it to TRUE? Or better, just have general constants for CREATED, UPDATED and DELETED which we can re-use independent of what has been created/updated/deleted?

  11. * Only changes that do not imply a data migration are applied when data is
    * available for a certain entity type. If any change fails to comply with
    * this policy the operation is aborted.

    How would I notice an aborted operation? Does it throw an exception? If so, that is missing from the docs.
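To illustrate point 10, the general constants could look something like this (just a sketch; the interface name here is illustrative, not code from the patch):

```php
// Sketch only: generic change-type constants that could be re-used for
// entity type and field storage definition changes alike.
interface EntityDefinitionChangeInterface {
  const DEFINITION_CREATED = 1;
  const DEFINITION_UPDATED = 2;
  const DEFINITION_DELETED = 3;
}
```

That way a consumer could record array('entity_type' => EntityDefinitionChangeInterface::DEFINITION_UPDATED) instead of relying on a one-off ENTITY_TYPE_UPDATED constant.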

effulgentsia’s picture

Sorry to be joining the review party so late (17 days since #128.3 was written). I started trying to absorb the patch, and the first thing that I got stuck on is:

+++ b/core/modules/aggregator/src/FeedStorage.php
@@ -21,24 +21,11 @@ class FeedStorage extends ContentEntityDatabaseStorage implements FeedStorageInt
-  public function getSchema() {
-    $schema = parent::getSchema();
-
-    // Marking the respective fields as NOT NULL makes the indexes more
-    // performant.
-    $schema['aggregator_feed']['fields']['url']['not null'] = TRUE;
-    $schema['aggregator_feed']['fields']['queued']['not null'] = TRUE;
-    $schema['aggregator_feed']['fields']['title']['not null'] = TRUE;
-
-    $schema['aggregator_feed']['indexes'] += array(
-      'aggregator_feed__url'  => array(array('url', 255)),
-      'aggregator_feed__queued' => array('queued'),
-    );
-    $schema['aggregator_feed']['unique keys'] += array(
-      'aggregator_feed__title' => array('title'),
-    );
-
-    return $schema;
+  protected function schemaHandler() {
+    if (!isset($this->schemaHandler)) {
+      $this->schemaHandler = new FeedSchemaHandler($this->entityManager, $this->entityType, $this, $this->database);
+    }
+    return $this->schemaHandler;

Looks like this patch splits every content entity storage handler into a custom storage handler + a custom schema handler, but are we sure we want to require that split in every case? I support the split at the base class level (ContentEntityDatabaseStorage + ContentEntitySchemaHandler) and making it possible to subclass each one separately, but as a default case, would it be better DX for ContentEntityDatabaseStorage::getEntitySchema() to call back into $this->storage->adjustSchema() (pending better name), so that most content entity types can get away with only implementing the one storage class unless they need a custom schema handler for more exotic reasons?
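For concreteness, the callback idea could look roughly like this (a sketch only; adjustSchema() and the method bodies are hypothetical, not code from the patch):

```php
// In ContentEntitySchemaHandler (sketch): build the generic schema, then
// give the storage class a chance to apply entity-type-specific tweaks.
protected function getEntitySchema(ContentEntityTypeInterface $entity_type) {
  $schema = $this->buildBaseSchema($entity_type);
  // Hypothetical hook back into the storage handler.
  $this->storage->adjustSchema($schema);
  return $schema;
}

// In FeedStorage (sketch): no dedicated FeedSchemaHandler class needed.
public function adjustSchema(array &$schema) {
  $schema['aggregator_feed']['fields']['url']['not null'] = TRUE;
  $schema['aggregator_feed']['indexes']['aggregator_feed__url'] = array(array('url', 255));
}
```

This would keep the common case down to a single storage class, while still allowing a custom schema handler for more exotic reasons.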

effulgentsia’s picture

+++ b/core/lib/Drupal/Core/Entity/Sql/DefaultTableMappingInterface.php
@@ -0,0 +1,118 @@
+interface DefaultTableMappingInterface extends TableMappingInterface {

Why does this need to be its own interface? Why not add the new methods to TableMappingInterface? I don't see where there's code that benefits from a narrower TableMappingInterface that lacks the new methods.

call back into $this->storage->adjustSchema() (pending better name)

At least based on the examples in core, would optimizeSchema() be a good name for this? If I'm reading it right, then the base class (ContentEntitySchemaHandler) generates a functionally correct schema, but what's left as a per-entity-type responsibility is adding indexes and setting not-null constraints (primarily to benefit indexes), so I think "optimize" would be a decent name for that, but open to other suggestions.

Finally, neither #151 nor #2298525-48: Test issue for Make ContentEntityDatabaseStorage handle changes in the entity schema definition apply cleanly to HEAD anymore. Would it be possible to get an updated patch posted once HEAD changes are merged back into the sandbox?

plach’s picture

@fago:

1: One example is the contact_message entity: it currently has null storage, but we have tests where a regular storage class is swapped in. We need to generate the schema in that case.
2: Yep, still working on that, as part of the entity keys/indexes stuff.
3: Unless every (content) entity type is stored in Mongo, I don't think a null schema manager would make sense. Anyway, I think the switch to the EntityManager as the class responsible for notifying (actually proxying notifications) about entity/field definition changes could deserve its own issue, as I think there are a few aspects to figure out that are not completely trivial. Personally I'd avoid rushing it in here, but if there's consensus we should be addressing that now, fine by me. What we could end up with is a scenario where the EntityManager actually stores data in state and tracks changes in definitions, and the schema manager receives notifications about those and interacts with the status report and the storage class to actually apply them. The only reason why I am currently routing calls through the schema manager is to ensure definitions are properly stored in state.
4: If we restrict the API to ContentEntityInterface and then we decide to "relax" it to support EntityInterface we break no existing code. I am not sure where a FieldableInterface would fit in a hypothetical hierarchy, but I'd guess ContentEntityInterface would extend it, so the same reasoning would apply.
6: I agree it's metadata. Currently what determines the decision of putting some code in the storage or in the schema handler is: if code touches table records it goes in the storage, otherwise in the schema handler.
7: Sorry, I missed what you meant.
9: Yep, this looks very event-ish to me too :)
10: Good point.
11: It is supposed to throw an exception, if not I will add it and document it :)

@effulgentsia:

This has a double reason:

  • The schema handler needs to retrieve the full entity schema (entity-type specific bits included) in various parts of its code, in the current form it can do that without needing to rely on the storage class.
  • The current way allows all the schema handling to be internal, there is no public method returning the schema array anymore. This allowed me to write all the consuming code in a cleaner way, without introducing assumptions that make sense only for our core storage.

Looking at core implementations, with the only exception of block content, every entity type that needs a specific schema handler also needs a specific storage, so I don't think this is a big DX burden. If it turns out I am wrong, we can try to refactor things so that the schema handler is injected into the storage class, but that would prevent lazy loading, which would be bad since the schema handler is a big class that is used very rarely.

plach’s picture

@effulgentsia:

Why does this need to be its own interface?

Because this is the specific implementation for core storage: other storage classes might want to provide completely different ways to deal with SQL tables and implement completely different table layouts (see also #2274017: Make SqlContentEntityStorage table mapping agnostic). In that case the concepts of dedicated/shared tables would be meaningless.

I am completing a couple of changes and then I will merge head. Hopefully it won't be too painful :)

effulgentsia’s picture

The schema handler needs to retrieve the full entity schema (entity-type specific bits included) in various parts of its code, in the current form it can do that without needing to rely on the storage class.

How? $storage is a constructor dependency of ContentEntitySchemaHandler, and is used already within ContentEntitySchemaHandler::getEntitySchema(). All I'm saying is that at the end of that implementation, call $this->storage->optimizeSchema($schema). That doesn't add any additional object dependency that isn't already there.

The current way allows all the schema handling to be internal, there is no public method returning the schema array anymore.

Is that really a benefit? If a particular entity type's storage class extends ContentEntityDatabaseStorage, then it's already bound to the concept that some kind of schema exists, so outside code knowing that an optimizeSchema() method can be called on it isn't very harmful, is it? I actually think that would be a legitimate method to define in SqlEntityStorageInterface. Note, outside code couldn't get a schema from it, it would need to pass one in and let the method modify it (i.e., add indexes).

This allowed me to write all the consuming code in a cleaner way, without introducing assumptions that make sense only for our core storage.

I don't think this would change that. The consuming code you're talking about would still purely act on ContentEntitySchemaHandlerInterface exactly the same as now. It's only that the ContentEntitySchemaHandler implementation of the protected getEntitySchema() method would interact with the storage object it already has access to.

other storage classes might want to provide completely different ways to deal with SQL tables and implement completely different table layouts...In that case the concepts of dedicated/shared tables would be meaningless.

Ah, thanks for that explanation. In that case, should getReservedColumns() and getFieldColumnName() move to TableMappingInterface? Those seem generic to me rather than bound to any particular layout (dedicated/shared) strategy.
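The move being suggested would amount to roughly this (a sketch; whether these signatures belong on TableMappingInterface is exactly the open question):

```php
// Sketch: layout-agnostic methods promoted to the generic interface.
interface TableMappingInterface {
  // ...existing methods...

  /**
   * Returns column names reserved for bookkeeping purposes (e.g.
   * 'deleted'), regardless of the table layout strategy.
   */
  public function getReservedColumns();

  /**
   * Returns the database column name for a given field property.
   */
  public function getFieldColumnName(FieldStorageDefinitionInterface $storage_definition, $property_name);
}
```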

plach’s picture

How? $storage is a constructor dependency of ContentEntitySchemaHandler

Well, the plan for #2274017: Make SqlContentEntityStorage table mapping agnostic is to make the schema handler depend only on the table mapping class, by making the latter encapsulate the logic that is currently located in ContentEntityDatabaseStorage::getTableMapping(). This would allow us to instantiate a table mapping class wherever needed. However, I just realized we are currently using the hasData() and countFieldData() methods from the storage, so I guess we won't be able to make the schema handler completely independent from the storage.

Anyway, personally I find handling schema only in the schema handler cleaner, but if there's consensus that the current DX is too bad (I definitely don't think so) your proposal is probably the best way forward.

plach’s picture

Issue summary: View changes

Actually any table layout change implies a data migration... updated summary.

plach’s picture

This addresses bullets 2, 7, 10, 11 of #156. The interdiff does not include 2, as I did quite a few merges in the meantime. I am awaiting feedback on the other bullets. If we decide to address #156.3 in a separate issue, then I think we are ready to split this into sub-issues as @eff proposed. I still have a couple of issues on my todo list and there's still #161 to be discussed, but I think none of them would imply cross-issue changes, so we should be fine.

@effulgentsia:

If you want to proceed with the split, please use my sandbox and create a separate branch for each sub-issue, branching the dependent ones off the independent ones, so I can perform cross-branch changes if needed.

plach’s picture

Issue summary: View changes
plach’s picture

Sorry, I missed this one:

Ah, thanks for that explanation. In that case, should getReservedColumns() and getFieldColumnName() move to TableMappingInterface? Those seem generic to me rather than bound to any particular layout (dedicated/shared) strategy.

Well, I'm not sure whether getReservedColumns() is still needed, but it's tied to an implementation detail of field tables. However, I can see the value of moving those into the base interface. I will do it on the next reroll.

plach’s picture

Issue tags: +sprint

Tagging for the rocketship.

effulgentsia’s picture

@effulgentsia: If you want to proceed with the split, please use my sandbox and create a separate branch for each sub-issue, branching the dependent ones off the independent ones, so I can perform cross-branch changes if needed.

Ok, will do when I get a chance (probably Monday). For now, just testing out the first part in #2326719: Move pseudo-private table mapping functions from ContentEntityDatabaseStorage to public API of DefaultTableMapping.

effulgentsia’s picture

The issue in #168 is green, if anyone wants to review/rtbc that :)

Additionally, I opened #2326949: Move entity-type-specific schema information from the storage class to a schema handler, which is now also green, so let's make the decision there regarding #157/#162. I'm ok with that getting RTBC'd and committed if no one else objects to the DX of needing an extra class.

And, I opened #2326981: Move \Drupal\field\FieldException to \Drupal\Core\Field\FieldException, which is super simple, so can hopefully land very fast.

None of those 3 overlap at all, so they can land in any order. So I'm attaching a patch here that is the same as #164, but rebased on top of those 3, in case it helps other reviewers focus on what's not covered by those issues.

Per #168, I'll commit all these as branches to the sandbox tomorrow, if no one beats me to it.

plach’s picture

Issue summary: View changes

I updated the Remaining tasks section of the issue summary with the new issues. Please add the new ones there too.

effulgentsia’s picture

I setup the sandbox branches:

effulgentsia’s picture

xjm’s picture

Title: Make the entity storage system handle changes in the entity and field schema definitions » [meta] Make the entity storage system handle changes in the entity and field schema definitions

Thanks for the great work separating it out into scoped steps.

plach’s picture

Issue summary: View changes
plach’s picture

plach’s picture

effulgentsia’s picture

Issue tags: -beta blocker

#2337927: SqlContentEntityStorage::onFieldStorageDefinition(Create|Update|Delete)() is broken for base fields is now the final beta blocking child of this meta issue, so moved the "beta blocker" tag to there.

plach’s picture

Status: Needs review » Fixed

#2337927: SqlContentEntityStorage::onFieldStorageDefinition(Create|Update|Delete)() is broken for base fields is in, so I guess we can call this fixed now, thanks everyone for the amazing work!

dawehner’s picture

plach’s picture

Issue tags: +beta blocker

Re-adding tag

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Gábor Hojtsy’s picture

Issue tags: -sprint
xjm’s picture

The change record was not completely updated for these changes. I opened #2507879: [no patch] Change record for automatic entity schema updates needs updating.