Problem/Motivation

  • In HEAD, we have ContentEntityDatabaseStorage and ContentEntitySchemaHandler as separate classes, the former implementing entity CRUD operations (which happen on common requests) itself while delegating schema-generation logic (which only happens on module install or similarly rare circumstances) to the latter. This both saves on code that needs to be loaded for typical requests, and makes each class easier to understand, since both are quite large.
  • Currently, ContentEntitySchemaHandler's only public method is getSchema(), leaving it to the caller to figure out what to do with that information (e.g., tell the database to create the corresponding tables).
  • Since ContentEntityDatabaseStorage::schemaHandler() is by design a protected method, outside code can't get at it, so ContentEntityDatabaseStorage also implements a getSchema() method that delegates to the schema handler. As a result of this, HEAD also has every content entity type's storage class able to override getSchema() with its specific customizations (e.g., adding indexes and NOT NULL constraints that aren't automatically added by the base class). (There's an argument to be made that the base ContentEntitySchemaHandler class should be smarter and better able to figure out some appropriate constraints and indexes to add on its own from the entity type's base field definitions, but that is not the scope of this issue.)
  • In #1498720: [meta] Make the entity storage system handle changes in the entity and field schema definitions, we are trying to make ContentEntitySchemaHandler completely responsible for create, update, and delete operations of the schema. In other words, instead of code like ModuleHandler::install() calling $storage->getSchema() and then telling the database to create those tables, we want it to be able to do something more like $storage->onEntityTypeDefinitionCreate(); and have the storage class forward that to the schema handler, and then let the schema handler communicate with the database itself. This encapsulation of the schema handler's responsibilities is logical on its own merits, but is also necessary for it to be able to manage updating the schema in response to code changes (i.e., during update.php), which is the main goal of that issue.
  • To enforce the above encapsulation, we want to remove ContentEntityDatabaseStorage::getSchema(), and make ContentEntitySchemaHandler::getSchema() protected (and renamed to getEntitySchema()).
  • Which leaves a question: where to put all of the entity-type-specific overrides of ContentEntityDatabaseStorage::getSchema()?

Proposed resolution

Option 1: patch in #24: Move them into subclasses of ContentEntitySchemaHandler.

Pros:

  • Direct: you want to extend what the base class does, so do that.
  • Runtime efficient: all the schema customization code doesn't need to be loaded for most requests (probably only a very tiny memory savings though).

Cons:

  • Conceptually, when creating a custom content entity type, whatever schema customizations you want are tightly related to what you're doing in the storage handler, so having to split that off into a separate class/file could be considered by some to be worse DX. OTOH the split between storage and schema handler already exists in the base classes so this would be consistent.

Option 2: patch in #17: Keep them in the entity type's storage class, as an anonymous function passed to ContentEntitySchemaHandler's constructor.

Pros:

  • Solves option 1's "con".
  • Doesn't expose the schema customizer as a public method, which is good, because no code other than the schema handler has any business interacting with that.

Cons:

  • Anonymous functions aren't very commonly used in Drupal, and some people might not like them.

Option 3: patch in #18: Keep them in the entity type's storage class, as a new public method, customizeSchema().

Pros:

  • Solves option 1's and option 2's "cons".

Cons:

  • Clutters SqlEntityStorageInterface with a public method that isn't required by the other options.
  • Adds an additional dependency of ContentEntitySchemaHandler on the storage object. This object dependency already exists, but if we're ever able to refactor the other reasons for that, then this new one will be a barrier to removing the dependency entirely.

Remaining tasks

Decide on one of those options or some other one.

User interface changes

None.

API changes

In this patch, none, since ContentEntityDatabaseStorage::getSchema() isn't being removed yet, but it will be in #1498720: [meta] Make the entity storage system handle changes in the entity and field schema definitions.

Files: 
CommentFileSizeAuthor
#47 2326949-47.interdiff.txt10.45 KBplach
#47 2326949-47.patch45.55 KBplach
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,611 pass(es).
[ View ]
#42 2326949-42.patch39.48 KBeffulgentsia
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,609 pass(es).
[ View ]
#34 entity_schema-part1b-custom-schema-handlers-18.patch9.28 KBeffulgentsia
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,611 pass(es).
[ View ]
#24 entity_schema-part1b-custom-schema-handlers.patch34.12 KBeffulgentsia
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,564 pass(es).
[ View ]
#18 entity_schema-part1b-custom-schema-handlers-18.patch9.28 KBeffulgentsia
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,266 pass(es).
[ View ]
#17 interdiff.txt1.94 KBeffulgentsia
#17 entity_schema-part1b-custom-schema-handlers-17.patch32.98 KBeffulgentsia
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,264 pass(es).
[ View ]
#12 interdiff.txt35.31 KBeffulgentsia
#12 entity_schema-part1b-custom-schema-handlers-12.patch32.96 KBeffulgentsia
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,231 pass(es), 4 fail(s), and 0 exception(s).
[ View ]
entity_schema-part1b-custom-schema-handlers.patch34.12 KBeffulgentsia
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,234 pass(es).
[ View ]

Comments

plach’s picture

The patch looks ready to me if we want to keep the current approach. Reviews/feedback welcome.

effulgentsia’s picture

Status:Needs review» Reviewed & tested by the community

Ok. 8 days since #1498720-157: [meta] Make the entity storage system handle changes in the entity and field schema definitions with no one else agreeing with my concern, so RTBC. While I'm not crazy about needing yet another class for every content entity type that needs indexes, I actually think 80% of the DX pain is mucking with $schema to begin with (not changed by this patch), and only 20% is where that lives. Maybe some day, after #1498720: [meta] Make the entity storage system handle changes in the entity and field schema definitions, we can keep improving the base ContentEntitySchemaHandler class to also add indexes and NOT NULL constraints intelligently, and then remove most of these custom classes, but until then, I think the split is fine.

yched’s picture

A bit above my head at the moment, but requiring yet another extra class for each entity type is ideed not too great :-/

#2316171: [meta] Improve DX of entity defining (if you want a UI) already points that defining a new entity type is already quite mind bending in D8...

effulgentsia’s picture

Status:Reviewed & tested by the community» Needs review

Thanks for reminding me of that meta. I left a comment there, so setting this back to "needs review" for a day or so to allow people following that to comment.

I will point out though, that implementing this extra class is not required to get your stuff to work. It's just for optimizing the schema (e.g., adding indexes) that's autogenerated by the base class. So I think that makes it less of a burden than the multiple classes that are needed just to get something working from which you can then iterate.

sun’s picture

+  protected function schemaHandler() {
+    if (!isset($this->schemaHandler)) {
+      $this->schemaHandler = new FeedSchemaHandler($this->entityManager, $this->entityType, $this, $this->database);

+  protected function schemaHandler() {
+    if (!isset($this->schemaHandler)) {
+      $this->schemaHandler = new ItemSchemaHandler($this->entityManager, $this->entityType, $this, $this->database);

This looks a bit weird/inconsistent with the rest of the Entity system acrchitecture…

Why don't we declare/define/register the entity schema handler class like any other entity handler (formerly controller) in entity type plugin annotations?

effulgentsia’s picture

Because it's only needed if you are using a SQL-based storage handler, so it should be the storage handler that brings the concept into existence. Unless we're ok with adding annotation keys for concepts that are handler-specific.

plach’s picture

Why don't we declare/define/register the entity schema handler class like any other entity handler (formerly controller) in entity type plugin annotations?

We discussed that approach in the initial phase of #2183231: Make ContentEntityDatabaseStorage generate static database schemas for content entities. Basically the main concerns (that I didn't share :) were around polluting entity handler keys with something that is specific to our SQL storage implementation (somehow related to #2232465: Remove table names from entity annotations). Since handler keys are optional and can be extended by any module (Content Translation does it), I don't see any architectural problem with going the sun's way, which in fact is more consistent.

sun’s picture

Is the schema really guaranteed to be SQL-specific? Are we sure that NoSQL implementations won't require it too; e.g., in order to introspect (arbitrary) actual data/columns against expected data/columns? Or do we assume that to be covered by (base) field definitions already?

plach’s picture

Well, also an XML storage could exploit a schema handler or our config entity storage could use it to deal with config schema. Let's say that at least it shouldn't be a required part of the API. That was the main concern, I think.

Or do we assume that to be covered by (base) field definitions already?

This is not exactly an area where I am particularly knowledgeable, but I'd say that an entity object should be prevented from being populated with invalid values by typed-data checks / entity validation constraints. And yes, I'd expect a NoSQL implementation to rely on field definitions to decide what values should be actually stored.

jhodgdon’s picture

Coming here from #2316171: [meta] Improve DX of entity defining (if you want a UI)... When I look at this code, it just makes me cringe.

So here's my perspective.

On the main Entity class, I define the base fields for the entity in baseFieldDefinitions(). It seems like I should be able to say "Oh, and this field is never NULL" there, and cut out these types of lines:

+      // Marking the respective fields as NOT NULL makes the indexes more
+      // performant.
+      $schema['taxonomy_term_field_data']['fields']['weight']['not null'] = TRUE;
+      $schema['taxonomy_term_field_data']['fields']['name']['not null'] = TRUE;

Would that be hard to add to the field definition, and thereby pick up in the storage class?

Providing a way to define what the indexes are on the base class would also seem to be desirable. Yes, it is SQL-specific, but if the method for doing that was optional (with an empty return on the base Content Entity class or something), someone who's building a custom entity for a site that is using some other entity storage could leave it out, and someone building a generic entity for a contrib module could include it, right? Forcing developers to implement an extra class, just so that they can have database storage that is efficiently indexed, is not great for DX.

And this move -- splitting up the custom storage classes into two classes -- seems like it makes the DX even worse for entity developers, although it may make the back-end guts of the entity system cleaner (if you could explain better this in the issue summary instead of referring to several other issues, that would be nice!).

plach’s picture

Since this issue is part of a beta-blocking meta, we should stay strictly on topic (also my fault, sorry). The only argument of discussion here should be whether it's ok to move entity schema customization to a dedicated class. Everything else can be discussed in a separate non-critical issue.

About

And this move -- splitting up the custom storage classes into two classes -- seems like it makes the DX even worse for entity developers, although it may make the back-end guts of the entity system cleaner

Personally I feel the opposite: as @sun points out these classes are entity handlers, at least in theory. We have a consistent pattern of implementing generic behavior in a base class and extending it whenever needed with entity-type-specific code (and the more we can do in the base class the better, FWIW): if we go another way here, we are introducing an inconsistency that IMO makes DX worse. I really cannot see how an additional class is a DX burden: code has to be located somewhere and if my entity type needs some custom code, I see very little benefit in sacrificing consistency and cleanly separated code just to save one file. I thought we already made this decision when we decided to embrace PSR-0, after all.

effulgentsia’s picture

StatusFileSize
new32.96 KB
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,231 pass(es), 4 fail(s), and 0 exception(s).
[ View ]
new35.31 KB

Offering this up as an alternative. Will follow up with a comment as well, but wanted to get bot started on this in the meantime.

Status:Needs review» Needs work

The last submitted patch, 12: entity_schema-part1b-custom-schema-handlers-12.patch, failed testing.

effulgentsia’s picture

Issue summary:View changes

Updated the issue summary per #10's request.

effulgentsia’s picture

Issue summary:View changes
plach’s picture

FWIW (since I am the proponent of option 1) I find the DX implied by #12 worse precisely because it introduces a pattern we use nowhere else AFAIK. I'd rather go with a customizer method on the storage class as proposed by @eff in the parent issue (let's call it option 3).

effulgentsia’s picture

Status:Needs work» Needs review
Issue tags:-Needs issue summary update
StatusFileSize
new32.98 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,264 pass(es).
[ View ]
new1.94 KB

Fix for the failures in #12. Next I'll post an option 3 to clarify #16.

effulgentsia’s picture

StatusFileSize
new9.28 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,266 pass(es).
[ View ]

Here's option 3. Will update the issue summary to mention it and list pros/cons of each option shortly.

effulgentsia’s picture

Issue summary:View changes
effulgentsia’s picture

Issue summary:View changes
effulgentsia’s picture

I find the DX implied by #12 worse precisely because it introduces a pattern we use nowhere else AFAIK

We use it in Drupal\Component\Plugin\Discovery\StaticDiscoveryDecorator and possibly other places. But you're right that it's not common, so I added that as a "con" of option 2.

I might have been biased in writing the pros/cons of the options, so please feel free to edit as needed.

plach’s picture

Issue summary:View changes

Thanks! I did only a small addition.

effulgentsia’s picture

FWIW, I don't have a strong preference for which option to go with. Although I flagged option 1 as a concern in #1498720-157: [meta] Make the entity storage system handle changes in the entity and field schema definitions, I did so more as a question that I thought needed discussion rather than being strongly unhappy with it myself. So, I'm eager to hear from people following this what you think, so we can reach a decision quickly.

effulgentsia’s picture

Issue summary:View changes
StatusFileSize
new34.12 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,564 pass(es).
[ View ]

Reuploading the OP (option 1), and referencing it via the issue summary, since otherwise, it's out of view.

yched’s picture

I kind of like option 3...
I totally see the pro's, and the listed con's do not strike me as really problematic ?

Clutters SqlEntityStorageInterface with a public method that isn't required by the other options

But the code that would live in this method has to live somewhere, so other options add that same clutter in different places :-)
if the method makes sense as part of "being an SqlEntityStorage" (and "altering the schema for the entity type" makes sense to me), then adding it to the interface hardly qualifies as clutter ?

Adds an additional dependency of ContentEntitySchemaHandler on the storage object. This object dependency already exists, but if we're ever able to refactor the other reasons for that, then this new one will be a barrier to removing the dependency entirely

Not sure I get this one. Feels ok for an SQLStorage to depend on a SchemaHandler ?

Side note : is there an issue somewhere where we rename ContentEntityDatabaseStorage to ContentEntitySqlStorage ? Coz it is really about Sql (proof : it implements SqlEntityStorageInterface *and* it has a SchemaHandler), not about "other databases like Mongo"

jhodgdon’s picture

Thanks for the very clear issue summary! Personally, given that explanation, I think we could go with any of the options.

effulgentsia’s picture

Not sure I get this one. Feels ok for an SQLStorage to depend on a SchemaHandler ?

Sorry if the summary is confusing on that. Yes, of course, storage depends on schema handling. What I meant to say is that option 3 adds another way in which $storage is passed as a dependency into $schema_handler, so that both depend on each other. Per the summary, that's already the case in HEAD, might not be that bad, and might not be fixable regardless of option chosen here, but it is something that tightens the dependency in that reverse direction.

effulgentsia’s picture

But the code that would live in this method has to live somewhere

Well, in option 1, it lives in a protected method, and in option 2 it lives in an anonymous function passed only to the schema handler.

the listed con's do not strike me as really problematic

Despite the reply above and in #27, I agree that neither of the two cons is very problematic.

Wim Leers’s picture

effulgentsia asked me to read this and post my opinion. Please note that I'm not an Entity API expert.

Having read all of this, and all three patches, I'm missing a fourth option. This is essentially about "entity schema optimization/customization when using a SQL storage back-end". All three options propose to somehow separate the schema customization to prevent "clutter" when not using an SQL-based storage back-end. AFAICT all 3 options make the DX of creating an entity type worse.

Why wouldn't a storage back-end be able to consider these optimizations/customizations as hints? If an index or a NOT NULL is meaningless in a storage back-end, surely it can choose to ignore it?

So I agree with sun in #5 and #8:

This looks a bit weird/inconsistent with the rest of the Entity system acrchitecture…

Why don't we declare/define/register the entity schema handler class like any other entity handler (formerly controller) in entity type plugin annotations?

Is the schema really guaranteed to be SQL-specific? Are we sure that NoSQL implementations won't require it too; e.g., in order to introspect (arbitrary) actual data/columns against expected data/columns? Or do we assume that to be covered by (base) field definitions already?

Therefore, I think an option 4 could be:

  1. don't add another class, add an optional method (perhaps via an optional interface, something like EntitySchemaHandlerOptimizationHintsInterface) to specify optimization hints
  2. let the SQL storage back-end use them, let other storage back-ends decide for themselves
  3. this does not cause worse DX
yched’s picture

All three options propose to somehow separate the schema customization to prevent "clutter" when not using an SQL-based storage back-end

From what I understand, no, the Schemas we're dealing with here are SQL schemas, and are about SQL-based storages.

Therefore, I think an option 4 could be: don't add another class, add an optional method [to the SchemaHandler used by the entity type]

Well, that's option 1 ? With the DX downside that I need to add yet another cutsom class to code my CustomEntityType ?

webchick’s picture

Issue summary:View changes

Adding some links for convenience.

plach’s picture

All three options propose to somehow separate the schema customization to prevent "clutter" when not using an SQL-based storage back-end.

Well, not exactly: the main goal is making the schema handler encapsulate entity schema handling. AAMOF non-SQL storages would not inherit from ContentEntityDatabaseStorage, thus they would not be affected by the decisions we are making here.

Option 4 is not that distant from Option 3, so my concerns remain the same :)

@yched:

Side note : is there an issue somewhere where we rename ContentEntityDatabaseStorage to ContentEntitySqlStorage ? Coz it is really about Sql (proof : it implements SqlEntityStorageInterface *and* it has a SchemaHandler), not about "other databases like Mongo"

We talked about that earlier but there was no issue, I created one: #2330091: Rename ContentEntityDatabaseStorage to SqlContentEntityStorage.

webchick’s picture

Thanks to Alex who walked me through this issue and the various choices. Of those, I prefer the patch at #18 (option 3). Rationale:

- It's the closest to what's already in HEAD, so not as invasive of a change.
- It's also an *improvement* over what's in HEAD. I would never have guessed that a function called "getSchema" let me customize schema. :) customizeSchema(), on the other hand, is extremely clear.
- I sympathize with the DX concerns yched raised about forcing entity authors to add Yet Another Freaking Class in option #1. And that's in the best-case. Worst-case, people forget that this is here and *don't* implement the various optimizations to make their entities more performant, which we don't want.
- So in that respect, I think the trade-off of more visibility is worth the extra public method that most callers won't need. Especially given that we've really not been meticulous about correct function visibility throughout the code base, instead defaulting to public everywhere "just in case." So this isn't really introducing a "new" problem. The existing function is public as well.
- Option #2 both doesn't resolve the confusing naming problem, and also requires 5 lines of boilerplate code around every schema handler, making it not my first choice either.

So given that, #3 is what I would go with, personally.

effulgentsia’s picture

Status:Needs review» Needs work
StatusFileSize
new9.28 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,611 pass(es).
[ View ]

Thanks, webchick! I think that makes option 3 the leading contender here. So, reuploading it to make it the most recent.

+++ b/core/lib/Drupal/Core/Entity/Sql/SqlEntityStorageInterface.php
@@ -23,4 +23,9 @@
+  /**
+   * @todo Document.
+   */
+  public function customizeSchema(&$schema);

Setting to "needs work" to resolve that todo.

Shortly, I'll open the next sub-issue that builds from this. Since this is only 9k, that issue will be able to proceed for a few days with those hunks included in it until this lands, so no need to stress about getting this in today, but some time soon would still be nice.

plach’s picture

@webchick:

- It's the closest to what's already in HEAD, so not as invasive of a change.

What we have in HEAD is not the final solution, not even close, so the amount of changes needed to get there IMHO should not affect the decision, given we already have a working patch for Option 1.

- It's also an *improvement* over what's in HEAD. I would never have guessed that a function called "getSchema" let me customize schema. :) customizeSchema(), on the other hand, is extremely clear.

This argument I don't understand: by the same rationale we would need a customizeForm() method on our entity form classes, instead of just extending them to provide entity-type specific behaviors. Same reasoning with any other entity handler. Option 1 is just consistent with rest of the entity system. An entity type may need to extend the base schema handler for other reasons, and so it would have schema customization in a method of the storage class, where it would be completely out of scope as the storage responsibility is not dealing with the schema, and the rest of the customization code in the entity-type specific child class. The customizeSchema() method approach does not scale and would be confusing in such a scenario.

- I sympathize with the DX concerns yched raised about forcing entity authors to add Yet Another Freaking Class in option #1. And that's in the best-case. Worst-case, people forget that this is here and *don't* implement the various optimizations to make their entities more performant, which we don't want.

I think the first sentence is the real argument, the (valid) concern that people may not realize that they need to customize their schema can be applied exactly in the same way to Option 3.

- So in that respect, I think the trade-off of more visibility is worth the extra public method that most callers won't need.

I am not really disturbed by it being public per se. What concerns me is that it's another blocker towards eliminating a circular dependency between the storage class and the schema handler. Circular dependencies usually are indicators of bad architecture, and bad architecture will bite us sooner or later.

To sum up I think what people brought up here are mainly three different DX concerns:

A) The need to create a new class (file?).
B) The need to customize entity schema.
C) The inconsistency between the base schema handler instantiation, which is explicitly performed in the base storage, and the other entity handlers, which can just be retrieved from the entity manager.

Concern A and B are really not overlapping, as the need for B is already in HEAD and is not solved (nor is meant to be) by any option we are discussing here. Concern C is valid but is slightly off-topic here; however I am mentioning it as it's an example of architectural inconsistency that is causing DX troubles.

That said, the focus for this issue should be around Concern A, i. e. introducing a new class. I kindly ask everyone here to drop for a moment the concern about the perceived burden of creating a new class and look at the big picture: we have a nice and simple entity system architecture, that is allowing us to customize every single aspect of entity handling. And that granularity is important because it allows us to override these behaviors selectively: each handler has its own area of responsibility, which makes understanding its code, maintaining and customizing it easier (DX++). Moreover modules can provide replacements just for the handlers required to implement their business logic, without needing to touch the rest. An alternative approach could be having a single monolithic class, dealing with every aspect of entity handling: that would save a lot of files, but would it really be an improvement over the current system? I don't think so.

But here we want to go a different route because otherwise we need Yet Another Freaking Class. Well, personally I don't care at all about that: creating a new class requires 30 seconds to write the boilerplate code (5 if you use an IDE) and 1 minute to document it. The hard part is understanding how to properly implement methods to achieve one's goals, and that's Concern B. We have multiple concrete drawbacks in not going with Option 1 (inconsistency, tighter coupling, lack of "architectural scalability"), while the only one we have with going with Option 1 is "I don't feel like creating a new class". Good OOP is all about isolating well-focused code in single classes with clearly defined relationships, my feeling is that Concern A is only caused by the fact that in Drupal we are still getting used to it.

Berdir’s picture

I agree with @plach. Option #1 feels like the correct one, that class is responsible for generating the schema, so alterations should be done there.

I do understand why people prefer option 3 when looking at the patch and this issue. One reason is that all those cases already have that method on the storage handler, so it seems easy to just rename it and slightly change how it works. However, having a custom storage handler will probably actually not be that common in contrib. I am working on a considerable amount of contrib projects with content entities, and so far, only few have the need for custom storage handlers (Part of that might be that they did not yet move custom queries to storage methods). For an entity type that wants indexes on it's fields* or other schema changes and does not yet have a custom storage, option #3 will not make much sense IMHO and #2 even less.

However, option #1 then actually has an even bigger flaw. And that is that you actually need *two* custom classes to define it, the storage which then returns a custom handler class. Which looks very convoluted right now with the lazy loading logic and the arguments we have to pass in.

I think we can find a solution to solve that, the easiest one going from the current code would be a property on the storage with the class name. One that I'm more and more considering is to just use the system we have for this: making it a handler. I was initially against that because the handler is more or less an implementation detail of the database/sql backend, but our entity handler system has two purposes. On one side, it is part of our public API (storage, view builder, forms) but the second part is being able to use it as a common API for entity type specific custom things. We've had the content translation handler as an example of that for a long time, and since recently, we have a second example and that's views data generation. And what's funny is that especially with #1740492: Implement a default entity views data handler, that will be *very very* similar to the schema handler. Both of them are using the primary definition (the fields and their properties) to generate a secondary definition (views data/schema).

So if we go with a handler named "sql_schema" or so, then it makes perfect sense to me for that to live next to a views_data handler. And others might follow, for example, we could still move the token definitions for entities to the same pattern.

We don't have to do this here already, but it might not be too hard to switch (we can keep getSchema() and route that through the entity manager just like current patches are doing) and would simplify the things an entity type has to do.

* It would be great to find a non-sql specific way of defining "I want to query this field (or this group of fields), so do whatever you have to do make this performant), then most current customizations would go away.

plach’s picture

What @berdir said

fago’s picture

Shortly summarized I agree with plach and berdir here.

So here the long variant:

A bit above my head at the moment, but requiring yet another extra class for each entity type is ideed not too great :-/

I know we have been working with the assumptions of "multiple classes = bad DX" for quite a while, but I've been realizing: it's wrong. Having multiple, clearly separated classes are perceived as *better* DX than a single class that mixes a lots of different concerns. Yes, I do think we think we have baken too much in single classes in the past sometimes (e.g. as xano once pointed out to me: baking field settings and default values form into field type classes :/). We - as Drupal community are not used to creating lots of classes - but developers in general are not scared of creating classes at all.

In that concrete case, it might look simpler when everything is a single class from a first glance. However, when developers are digging into it for whatever reason (it doesn't not work as expected?), the complexity and confusing code-flow of two classes playing ping-pong won't be perceived as simple or good DX. On the other side, a simple class inheriting the default and customizing a certain aspect of the system (schema), is rather simple and it's pretty clear how it works and plays together for every OO-aware dev (=every d8 dev).

- So in that respect, I think the trade-off of more visibility is worth the extra public method that most callers won't need.

As plach, I'd disagree that the overridden method is more visible than the overridden class.

- It's also an *improvement* over what's in HEAD. I would never have guessed that a function called "getSchema" let me customize schema. :) customizeSchema(), on the other hand, is extremely clear.

I'd disagree with that as well. While customizeSchema() is a good method name, in an OO world things are customized by extending a class and overriding a method. Imo, good code is embracing that instead of cluttering it with shiny names. With good code, I also mean simple and understandable code as there is no additional clutter for handling shiny names involved.

Why wouldn't a storage back-end be able to consider these optimizations/customizations as hints? If an index or a NOT NULL is meaningless in a storage back-end, surely it can choose to ignore it?

Problem is that the optimizations are specific to the schema produced by certain handler, thus might be invalid / not applicable in other scenarios.

* It would be great to find a non-sql specific way of defining "I want to query this field (or this group of fields), so do whatever you have to do make this performant), then most current customizations would go away.

Yeah, interesting idea. Sounds like an abstraction of indexes to the entity field level...

As plach pointed out already, option 3 mixes schema handler business into the storage class, making the whole separation quite pointless. What do we have it for then? We could achieve code-reuse and separation by a trait as well, while people could easily override things in their custom storage class. (Option 5, yeah)
- I doubt the tiny amount of memory saved when not loading that code *once* for all storage handlers makes a big difference. By using the trait we'd save per entity custom handler classes and instantiating handlers at all.
- Code can be re-used via the trait easily
- The trait is not as flexible/easily exchangeable as e.g. the handler class as proposed by berdir.
- The trait mixes schema-handler specific customizations into storage handlers, what makes switching to different schema handler approaches impossible given a pre-existing custom storage handler class (more on that below)

On one side, it is part of our public API (storage, view builder, forms) but the second part is being able to use it as a common API for entity type specific custom things. We've had the content translation handler as an example of that for a long time, and since recently, we have a second example and that's views data generation.

True, handlers are not per-se public API, but are a general way of doing per entity-type customizations of certain aspects. While it seemed unnecessary to expose storage internals as a separate handler initially, I must agree that doing it is quite a DX boost while I do not see any cons given it's clearly sql specific, thus +1 on that!
I was also thinking that the DX part that needs work here is *how* you specify the schema handler class, not having the class per-se. The schema handler as handler/controller (whaah) solves that just fine (and resolves the whaah :D).

Summing up, I think we should either do
- Option 1 and having it as separate handler/controller
- Option 5, the trait if we do not think we can separate things properly anyway

We've already started with the separation and it makes sense to me. Thinking a bit more about it though, it only makes sense when custom storage methods like NodeStorage::revisionIds() are written table mapping agnostic as well. This would allow you to keep the NodeStorage while switching out NodeSchemaHandler. Given people will/are experimenting with stuff like using doctrine orm, etc. with Drupal, this seems to be worthwhile - even if not done in core and left for entities in contrib to experiment with. Thus, the handler/controller approach with Option 1 seems to be the best way forward to me.

effulgentsia’s picture

Ok, the recent comments here helped convince me to prefer option 1 as well. But:

Having multiple, clearly separated classes are perceived as *better* DX than a single class that mixes a lots of different concerns.

The trick is identifying what a "different concern" is. We have form classes that combine the buildForm() and submitForm() methods into one class, and I think that's correct, because at the level of specific forms, I think building and submitting are part of the same concern, despite FormBuilder and FormSubmitter being separate classes, because at their level, they're separate concerns. Similarly, I also think it's good that entity type classes have a baseFieldDefinitions() method on the same class that implements the methods that make use of those fields.

Initially, I perceived the storage/schema split along similar lines: that even if it makes sense for ContentEntityDatabaseStorage and ContentEntitySchemaHandler to be separate classes, that at the entity-type-specific logic level, storage and schema are the same concern.

However, I changed my mind on that because unlike the case of specific-form classes and specific-entity-type classes, I now think that the per-entity-type storage/schema handling isn't really about providing new entity-type-specific information, but rather compensating for shortcomings of the base implementations. As already discussed, IMO, on the schema front, ContentEntitySchemaHandler should be able to add NOT NULL constraints to base fields that are single column and isRequired(), and field definitions should allow for some kind of storage-agnostic indexing hint. Meanwhile, on the storage front, anything that can be implemented with straight EntityQuery and entity field CRUD (e.g., FeedStorage::getFeedDuplicates()) should be moved to the entity class, leaving the storage handler for only those things that can't use those generic APIs for some reason. So, once our base implementations become more capable, needing to override the storage and schema handlers will become rarer, and per #39, those overrides in many cases can even be somewhat independent. Plus, since we already have "views_data" as a separate handler from "storage", and since both can depend on schema, that gives yet more reason to separate out schema as its own thing that can drive both. So, on balance, even though based on current code in HEAD, I think it's unfortunate for the majority of core's content entity types to require both a custom storage handler and a custom schema handler, I think that it makes sense to proceed with that anyway, and then improve our base implementations to not require so much overriding for common needs.

In other words, +1 for option 1 with the change to make the schema one an annotation-registered handler.

So if we go with a handler named "sql_schema"

How about "storage_schema" instead? In #2330121: Replace ContentEntityDatabaseStorage::getSchema() with a new EntityTypeListenerInterface implemented by SqlContentEntityStorageSchema, the public API of a schema handler is not SQL-specific, only its protected implementation is.

Also, how should we name the classes? I propose *StorageSchema to match the handler name and not have a "Handler" suffix to be consistent with *Storage classes.

@webchick, @yched: are you ok with option 1 based on the reasoning from #36 on?

Berdir’s picture

Discussed the name with @plach in IRC and came up with storage_schema too, we were both OK with that. Forgot to post an update about that.

effulgentsia’s picture

Status:Needs work» Needs review
StatusFileSize
new39.48 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,609 pass(es).
[ View ]

Implements option 1 with name in #41.

Berdir’s picture

+++ b/core/lib/Drupal/Core/Entity/ContentEntityDatabaseStorage.php
@@ -234,7 +232,8 @@ public function getSchema() {
     if (!isset($this->schemaHandler)) {
-      $this->schemaHandler = new ContentEntitySchemaHandler($this->entityManager, $this->entityType, $this);
+      $schema_handler_class = $this->entityType->getHandlerClass('storage_schema') ?: 'Drupal\Core\Entity\Schema\ContentEntitySchemaHandler';
+      $this->schemaHandler = new $schema_handler_class($this->entityManager, $this->entityType, $this);

What about putting the default in ContentEntityType, next to the storage default for content entities?

Also, the constructor/create() would be a bit more complex, but what about using $entityManager->getHander('storage_schema') and rely on EntityHandlerInterface to pass in the field definitions/storage?

Re few vs. many classes. I think it is not the number of classes/methods that you have to implement that is a problem, it's the discoverability of those classes and know what class(es) you have to implement to do something specific. And I think this is a good direction as it makes those classes visible in the annotation.

effulgentsia’s picture

Also, the constructor/create() would be a bit more complex, but what about using $entityManager->getHander('storage_schema') and rely on EntityHandlerInterface to pass in the field definitions/storage?

This gets into an interesting question about the nature of "subhandlers": i.e., CEDS::schemaHandler() has arguments it wants to pass to the CESH constructor, and do we want to require/assume that those are always identical to what can be retrieved from $container?

effulgentsia’s picture

What about putting the default in ContentEntityType, next to the storage default for content entities?

If someone decides to use a different 'storage' handler, then shouldn't that storage handler get to decide on a different 'storage_schema' default? This is another aspect of the "subhandler" dilemma.

plach’s picture

#40 makes total sense to me, thanks for the new patch :)

Just one doubt: since we are renaming child classes wouldn't it make sense to rename the base one to SqlContentEntityStorageSchema (for consistency with #2330091: Rename ContentEntityDatabaseStorage to SqlContentEntityStorage)?

plach’s picture

StatusFileSize
new45.55 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 75,611 pass(es).
[ View ]
new10.45 KB

Went ahead and performed the renaming. Feel free to revert if that's not ok.

effulgentsia’s picture

#47 looks good to me. @Berdir: given #44 and #45, would you be ok with #43 being a followup? Anyone care to RTBC?

Berdir’s picture

Status:Needs review» Reviewed & tested by the community

Yes, works for me, we can revisit that later.

I think this is RTBC then.

webchick’s picture

Status:Reviewed & tested by the community» Fixed

Ok, sounds like we've come to consensus. The option that's here is better than the original option #1 in that the fact that this class exists is now documented in the annotation, so module developers will be less likely to accidentally skip it. It doesn't resolve either mine nor yched's concerns about Yet Another Freaking Class, but yched's not back for another couple of weeks and I don't really care about that enough to block this.

Committed and pushed to 8.x. Thanks!

  • webchick committed 5289c3e on 8.0.x
    Issue #2326949 by plach, effulgentsia: Move entity-type-specific schema...
yched’s picture

Convinced by #37 & #40, all's good :-)

Just not fully sure about "a non-sql-specific notion of schema". I have no idea what that would be or what use you'd make of that. The current thing we have is very much an SQL schema used strictly for SQL things. I'm a bit wary of making it look like a generic concept when it's not.

Also, very interested by @Berdir's suggestion of astract indexes in #37.

plach’s picture

Thanks!

effulgentsia’s picture

Just not fully sure about "a non-sql-specific notion of schema".

getSchema() returns a Schema API array that is bound to SQL, but getSchema() is being removed in #2330121: Replace ContentEntityDatabaseStorage::getSchema() with a new EntityTypeListenerInterface implemented by SqlContentEntityStorageSchema. At which point "storage schema handlers" become just something that can get notifications of entity type and field storage definition lifecycle events. In other words, "schema" in the more abstract sense of "whatever preparation is needed by the storage backend to be capable of storing entities with a given set of field storage definitions". Maybe for all current noSql databases, that preparation will always be "nothing", but who knows.

effulgentsia’s picture

Or, put more succinctly, the "schema" (abstractly) of a content entity type is its field storage definitions and the "storage schema handler" is about making and keeping the storage backend capable of and optimized for storing entities with that schema. For SQL backends, that involves mapping the abstract entity schema to a SQL schema.

xjm’s picture

effulgentsia’s picture

Status:Fixed» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.