Attempting to split #1498720: [meta] Make the entity storage system handle changes in the entity and field schema definitions into some logical chunks. Here's part 1.

Problem/Motivation

The Field SQL Storage API is currently marked as private. The rationale is that in D7 fields are storage-agnostic so using that API in contrib modules introduces storage-specific code that is going to break as soon as a different storage is used. However this does not account for cases where assuming a SQL storage is legitimate for various reasons. The most obvious example is Views, which has an architecture that supports alternative storages, but which is heavily "biased" toward SQL. Another typical situation is custom code perfectly knowing which storage is being used, and relying on that information to achieve better performance in certain cases by querying directly the database. Both of these cases rely on an API that is unlikely to change but isn't guaranteed not to.

Proposed resolution

Make the Field SQL Storage API a public API, that is confined to the SQL-specific part of the core entity storage API.
This moves the following methods to the (default) table mapping implementation, which is responsible for representing how fields are mapped to tables:

ContentEntityDatabaseStroage::_fieldColumnName()
  -> TableMappingInterface::getFieldColumnName()
FieldStorageConfig::getReservedColumns()
  -> TableMappingInterface::getReservedColumns()
ContentEntityDatabaseStorage::_fieldTableName()
  -> DefaultTableMappingInterface::getDedicatedDataTableName()
ContentEntityDatabaseStroage::_fieldRevisionTableName()
  -> DefaultTableMappingInterface::getDedicatedRevisionTableName()

A new DefaultTableMappingInterface is introduced to deal with the specific implementation for core storage: other storage classes might want to provide completely different ways to deal with SQL tables and implement completely different table layouts (see #2274017: Make SqlContentEntityStorage table mapping agnostic ). In that case the concepts of dedicated tables would be meaningless.

Remaining tasks

Reviews

User interface changes

None

API changes

None, just additions.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Status: Needs review » Needs work

The last submitted patch, entity_schema-part1-table-mapping.patch, failed testing.

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
72.16 KB
24.97 KB

Reverted some changes that are out of scope for this issue.

effulgentsia’s picture

And a couple more.

effulgentsia’s picture

Sorry for the noise. I think these are the final reversions. The rest of the patch looks completely in scope for this issue title to me.

effulgentsia’s picture

+++ b/core/lib/Drupal/Core/Entity/Sql/DefaultTableMappingInterface.php
@@ -0,0 +1,88 @@
+   * @private Calling this function circumvents the entity system and is
+   * strongly discouraged. This function is not considered part of the public
+   * API
+  public function getDedicatedDataTableName(FieldStorageDefinitionInterface $storage_definition, $is_deleted = FALSE);
+  public function getDedicatedRevisionTableName(FieldStorageDefinitionInterface $storage_definition, $is_deleted = FALSE);
+  public function getFieldColumnName(FieldStorageDefinitionInterface $storage_definition, $column);

That @private is on all 3 functions. It just got copy-pasted as part of the move, but if we're putting these methods into an interface, does that mean we should also remove the @private? If not, should we remove them from the interface? Tricky, since hook_views_data() implementations need to call these, but the comment is correct that most other code should use the entity API and not deal with these directly.

Status: Needs review » Needs work

The last submitted patch, 4: entity_schema-part1-table-mapping-4.patch, failed testing.

effulgentsia’s picture

Status: Needs work » Needs review
FileSize
69.69 KB
544 bytes
yched’s picture

Separating those changes out of #2274017: Make SqlContentEntityStorage table mapping agnostic makes a lot of sense IMO.

#2183231: Make ContentEntityDatabaseStorage generate static database schemas for content entities comments #365 - #368 and #1498720: [meta] Make the entity storage system handle changes in the entity and field schema definitions had discussions about refactorings in the EntityStorage / SchemaHandler / TableMapping area, that back then were deferred to #2274017: Make SqlContentEntityStorage table mapping agnostic .

Since this issue here is now where the refactoring happens, it would be worth rechecking the items in the issues / comments mentioned above ?

yched’s picture

+++ b/core/lib/Drupal/Core/Entity/Query/Sql/Tables.php
@@ -220,11 +221,13 @@ protected function ensureEntityTable($index_prefix, $property, $type, $langcode,
+      /** @var \Drupal\Core\Entity\Sql\DefaultTableMappingInterface $table_mapping */
+      $table_mapping = $this->entityManager->getStorage($entity_type_id)->getTableMapping();
+      $table = $this->sqlQuery->getMetaData('age') == EntityStorageInterface::FIELD_LOAD_CURRENT ? $table_mapping->getDedicatedDataTableName($field) : $table_mapping->getDedicatedRevisionTableName($field);

Not sure I get the relation / differences between TableMappingInterfaces and DefaultTableMappingInterface, and what makes us sure that the mapping we get here (and in other places in the patch that force-cast to DefaultTableMappingInterface) really is a "Default" one on which we can safely call the "Default" specific methods ?

More generally, this split issue could use a summary / overview of its own ;-)

effulgentsia’s picture

Not sure I get the relation / differences between TableMappingInterfaces and DefaultTableMappingInterface

From plach in #1498720-160: [meta] Make the entity storage system handle changes in the entity and field schema definitions answering that question when I asked it there:

Because this is the specific implementation for core storage: other storage classes might want to provide completely different ways to deal with SQL tables and implement completely different table layouts (see also #2274017: Make SqlContentEntityStorage table mapping agnostic ). In that case the concepts of dedicated/shared tables would be meaningless.

And my response in #1498720-161: [meta] Make the entity storage system handle changes in the entity and field schema definitions:

Ah, thanks for that explanation. In that case, should getReservedColumns() and getFieldColumnName() move to TableMappingInterface? Those seem generic to me rather than bound to any particular layout (dedicated/shared) strategy.

Now, I did not carry over into this issue the allowsSharedTableStorage() and requiresDedicatedTableStorage() methods from that issue, because those are new and should be evaluated there along with what needs to call them. I only carried into this issue: getDedicatedDataTableName(), getDedicatedRevisionTableName(), and the ones in the above block quote. With hook_views_data() implementations being the most common caller of those methods outside of the entity component itself. We have #2079019: Make Views use SqlEntityStorageInterface open for that, but AFAICT, not a whole lot of traction there. However, does Views need to care whether these tables are "dedicated" or not? Or does it only need to know the table that a field is in? In which case, should we move these methods from DefaultTableMappingInterface into TableMappingInterface, and rename them to get(Field?)DataTableName() and get(Field?)RevisionTableName()? In which case, we wouldn't be introducing a DefaultTableMappingInterface in this issue, but may in the parent issue for code that needs to call allowsSharedTableStorage()/requiresDedicatedTableStorage(), at which point we could also come up with a better name than DefaultTableMappingInterface to convey that particular layout strategy concept.

@plach: thoughts?

More generally, this split issue could use a summary / overview of its own ;-)

I can take a stab at that tomorrow, but wasn't involved in all of the history that led to this, so it would be great if someone who has can contribute to that as well.

plach’s picture

However, does Views need to care whether these tables are "dedicated" or not? Or does it only need to know the table that a field is in?

The optimal scenario would be Views being able to rely on the table mapping, without making assumptions on the implemented table layout. This is what I originally envisioned and what made me initially think to the table mapping data structure. And, yes, in this case Views would only need to know in which table a particular field is stored, as join information can probably be derived by PK/FK data, which for entities is known in advance.

In which case, should we move these methods from DefaultTableMappingInterface into TableMappingInterface, and rename them to get(Field?)DataTableName() and get(Field?)RevisionTableName()?

We already have a method in the table mapping which lists all the fields stored in a particular table TMI::getFieldNames(). I guess this could be enough, although an inverse method could be useful too. The two above, even renamed, still assume we have dedicated tables for fields, which might not be the case for table layouts different from the one adopted by core. So I wouldn't move them in the base interface.

[...] at which point we could also come up with a better name than DefaultTableMappingInterface to convey that particular layout strategy concept.

Better names for that interface (and class) are totally welcome, I just hadn't the time/will to think to something better. The idea is that those describe the default (core) table layout. There is nothing special/characteristic about it aside from the fact that it's our usual (default?) choice for SQL storage.

plach’s picture

Btw, the issue where Views integration will be rewritten to exploit table mappings is probably #1740492: Implement a default entity views data handler.

plach’s picture

plach’s picture

Issue summary: View changes

I will perform these changes as soon as the new branches are set-up:

  1. +++ b/core/lib/Drupal/Core/Entity/Sql/DefaultTableMappingInterface.php
    @@ -0,0 +1,88 @@
    +  public function getReservedColumns();
    ...
    +  public function getFieldColumnName(FieldStorageDefinitionInterface $storage_definition, $column);
    

    As @eff was pointing out, it makes sense to move these two to the parent class as they provide sufficiently general concepts.

  2. +++ b/core/lib/Drupal/Core/Entity/Sql/DefaultTableMappingInterface.php
    @@ -0,0 +1,88 @@
    +   * @private Calling this function circumvents the entity system and is
    

    As @eff was pointing out, we should remove the @private tags, but we should keep the warnings about this being SQL-specific, although text should be slightly adjusted to be less intimidating :)

plach’s picture

Issue summary: View changes
plach’s picture

Issue summary: View changes
plach’s picture

Issue summary: View changes
effulgentsia’s picture

Thanks for the issue summary!

The two above, even renamed, still assume we have dedicated tables for fields, which might not be the case for table layouts different from the one adopted by core. So I wouldn't move them in the base interface.

Why would they assume that? If the method was named TMI::getFieldDataTableName(), and it's passed a FSDI, can't the DTM implementation of the method figure out if it's dedicated or shared, and return the correct name accordingly, while some other implementation can do whatever is appropriate based on its something-unrelated-to-dedicated-vs-shared strategy? Why are we making the caller have to know the layout strategy?

plach’s picture

Oh, sorry, now I get what you mean. The Field part in the method name, combined with Data/Revision, somehow misleaded me. What you are proposing makes sense to me: although there might be table layouts where only revisions are stored, so we would have no distinction between data and revision tables. However the concept of revision is baked deep in enough in our API, that I guess it shouldn't be harmful if it went in the base interface: we could always return the same value for both TMI::getField*TableName() methods.

yched’s picture

Why are we making the caller have to know the layout strategy?

+1 on #18, that looks flawed.

[edit: crosspost with @plach - does #19 mean DefaultMappingInterface goes away ? Yay if so]

plach’s picture

does #19 mean DefaultMappingInterface goes away ? Yay if so

We could be able to get rid of it here, but as @eff has mentioned, more methods exist on it in the parent issue, so I am not completely sure we will be able to generalize them.

An addition to #19:

An alternative could be having a single method returning an array of tables where a field is stored (nid is stored in 4 tables by default) and leaving more specialized methods to sub-interfaces.

plach’s picture

What is concerning me about baking in the base interface concepts that are too close to our specific implementation, is that API consumers (including us) might forget there might be alternative implementations and write code that is not generic enough. This is why I initially felt the need for a sub-interface for our logic.

For example, if we implement #19 then someone might retrieve the two table names and not account for the fact that they might be equal, possibly causing all sorts of breakage in that scenario.

yched’s picture

API consumers (including us) might forget there might be alternative implementations and write code that is not generic enough

Hmm - isn't that exactly the problem with the current patch ? See #9.

effulgentsia’s picture

The Field part in the method name, combined with Data/Revision, somehow misleaded me.

An alternative could be having a single method returning an array of tables where a field is stored (nid is stored in 4 tables by default) and leaving more specialized methods to sub-interfaces.

What if we address both of those by naming the methods getPrimaryDataTableName() and getPrimaryRevisionTableName()? Those are the ones I think we should have on the main interface since it's what Views needs. As to additional methods to return all the tables for a field, we already have in HEAD TMI::getTableNames() and TMI::getFieldNames($table_name), so edge case callers can do their own iteration if the really need to, so I suggest we hold off on adding getAllTableNames(FSDI) until we have a solid use case for it.

For example, if we implement #19 then someone might retrieve the two table names and not account for the fact that they might be equal, possibly causing all sorts of breakage in that scenario.

That would be a bug with the caller and should be fixed there.

We could be able to get rid of it here, but as @eff has mentioned, more methods exist on it in the parent issue, so I am not completely sure we will be able to generalize them.

I think that's fine, because once we're clear on what method names really are strategy-specific and what example callers of them are, we can design and name accordingly.

plach’s picture

Disclaimer: I am not trying to keep my implementation at all costs, I am just trying to make sure I express all the thoughts I put behind it. I am totally fine with going with @eff's proposal if there's consensus that it's better.

Hmm - isn't that exactly the problem with the current patch ? See #9.

Well, all the current Field-Views integration code is currently assuming our core SQL storage is in place, so it's perfectly fine to assume the default table mapping is returned. This is going to change in #1740492: Implement a default entity views data handler, and probably in that class and sons we shouldn't rely on methods belonging to DTMI.

What if we address both of those by naming the methods getPrimaryDataTableName() and getPrimaryRevisionTableName()? Those are the ones I think we should have on the main interface since it's what Views needs.

I am not sure the final version of the Entity–Views integration code will need even those methods. What about leaving them where they are for now, that is leaving the core-specific version on DTMI and see how needs evolve there? If it turns out we need them, moving them in the base interface can be done without API breaks with minimal BC code, but if instead we don't actually need them, we could avoid exposing the concept of revision table in the base interface, which still feels a bit wrong to me. AAMOF:

That would be a bug with the caller and should be fixed there.

I agree that it would be the caller's fault, but IMHO we wouldn't be providing the most intuitive DX possible here.

effulgentsia’s picture

Disclaimer: I wrote this up before reading #25, but am posting it as-is, and we can reconcile in additional comments.

What is concerning me about baking in the base interface concepts that are too close to our specific implementation

Ok, so let's maybe think through what kinds of implementation choices we want to allow vs. what we want callers to be able to rely on. Here's some of my initial thinking on that:

  1. It is already part of the Entity Field API that field values can vary by revision and language, and nothing else, so I think it's okay to bake the concepts of revisions and language into the table mapping API.
  2. It is already part of the Entity Field API that field items can consist of multiple columns, as defined by FieldItemInterface::schema() (maybe in the future, columns will be determinable entirely from FieldItemInterface::propertyDefinitions(), but I think this issue can ignore that for now).
  3. I suggest that for any given field, all table mapping implementations MUST provide one table that contains all of that field's values for a given field (i.e., for all revisions and all languages for which a value exists). In other words, it can't solely provide different tables per-revision or per-language without also providing one table with all of the data. I think we need this requirement to support Views.
  4. I suggest that for any given field, table mapping implementations MAY provide additional tables for either the default revision only, or the default language only, or both. In HEAD, we do this for default revision, but not for default language, but alternate implementations should be able to explore other permutations, and be able to make different choices on a per-field basis. Note that any data in these tables is a duplicate of what's also available in the full table, but these smaller tables allow for more efficient JOIN queries.
  5. We might want to consider also allowing table mapping implementations to provide additional tables for a specific revision or a specific language (i.e., just for revision 2 or language 'en', as opposed to just the default ones).
  6. I suggest that table mapping implementations MAY combine multiple fields into a single table, however they want. For example, fields a and b into table x, field c into table y, and fields d, e, and f into table z, etc. And that the combination MAY vary per-table (i.e., fields can be in separate tables for the cross-revision table, fields a and b combined in the default-revision-only table, and fields a and c combined in the default-language-only table).
  7. I suggest that if a field appears in multiple tables, then all of its columns MUST exist in each of those tables, and be named the same in each of those tables (I think that would make the API simpler, and that there's no good use case for deviating from that).

I'll post another comment with some method suggestions to represent the above, but posting this first in case anyone wants to give feedback on it in the meantime.

plach’s picture

#26 looks promising :)

Posting a new patch to address #14 meanwhile.

effulgentsia’s picture

Ignoring #26.5, here's what I think TMI could look like:

getAllTableNames() // rename HEAD's getTableNames()

getAllColumnNames($table_name) // rename HEAD's getAllColumns()

getTableNames(FSDI)

getColumnNames(FSDI) // no change from HEAD

getColumnName(FSDI, $property_name) // according to the docs of FieldItemInterface::schema(), we can assume that everything it returns as a "column" is also a property name

getPrimaryTableName(FSDI, $default_revision_only = FALSE, $default_language_only = FALSE)

isTableSingleRevision($table_name)

isTableSingleLanguage($table_name)

getFieldNames($table_name) // no change from HEAD

And if we want to support #26.5, then we would need to change getPrimaryTableName() to something like:

// $revision and $langcode can each be one of:
// - ALL
// - DEFAULT
// - a specific revision ID or language code
getPrimaryTableName(FSDI, $revision = ALL, $langcode = ALL)
effulgentsia’s picture

Re #25, I'd also be okay with punting #28 to a separate, post-beta issue, but if we think something like that is at least on the right track, then perhaps for this issue, we could ditch DefaultTableMappingInterface, and leave specific strategy methods on DefaultTableMapping, without interface definitions, and let callers continue to assume they have that specific implementation class to interact with, until we get TMI finished.

plach’s picture

#28 looks correct and comprehensive to me, my only doubt is whether we would actually need all those methods.

#29 works for me :)

plach’s picture

effulgentsia’s picture

Status: Needs review » Reviewed & tested by the community

#31 looks great to me.

  • webchick committed dcc880b on 8.0.x
    Issue #2326719 by plach, effulgentsia: Move pseudo-private table mapping...
webchick’s picture

Status: Reviewed & tested by the community » Fixed

Looks straight-forward to me. Lots of the same changes made lots of times. :)

Committed and pushed to 8.x. Thanks!

xjm’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

plach’s picture