Problem/Motivation

In Drupal 8, entity IDs can be strings.

taxonomy_entity_index table assumes they're all integers which results in an SQL exception if this entity-type is enabled

Proposed resolution

Add a second table taxonomy_entity_index_string for entities with string IDs.
Alter the views integration to join this table in that scenario.
Test coverage using EntityTestStringId in core.

See \Drupal\Core\Field\Plugin\Field\FieldType\EntityReferenceItem::propertyDefinitions for sample code to work out what sort of column the ID should be.

Remaining tasks

  • Write tests
  • Decide how - if - to support MySQL with ROW_FORMAT=compact (< 5.7) . See #8 and #15. Consider a popular provider like Acquia uses 5.6, on D9 too :( (update: Acquia is upgrading to 5.7)

User interface changes

API changes

Data model changes

Release notes snippet

Comments

larowlan created an issue. See original summary.

gambry’s picture

Assigned: Unassigned » gambry

working on this

gambry’s picture

Status: Active » Needs work
StatusFileSize
new11.8 KB

This is a draft of work, against 1.0-alpha01 version.
I haven't fully tested everything, when done I'll provide a full patch against 8.x-1.x-dev.

In the meanwhile few info about the work:

  • As pretty much everything is procedural, and inside the .module file, I preferred to be consistent and create a `taxonomy_entity_index_get_entity_index_table()` function in there, which return the right table according to the provided entity type machine name
  • I had to refactor a bit taxonomy_entity_index_reindex_entity_type() because it didn't support bundle-less entity types, as well as the selection of the bundle field was a bit of hacky. Now it uses Drupal API.

An early quick review would be good, but I hope to submit the final patch within the weekend.

gambry’s picture

StatusFileSize
new1.29 KB
new11.8 KB

Ops, sorry. small mistake already found.

larowlan’s picture

  1. +++ b/taxonomy_entity_index.module
    @@ -184,35 +189,30 @@ function taxonomy_entity_index_get_taxonomy_field_names($entity_type_id) {
    +    if ($entity_type->getBundleEntityType() && !empty($context['sandbox']['bundles'])) {
    
    @@ -222,13 +222,8 @@ function taxonomy_entity_index_reindex_entity_type($entity_type_id, &$context) {
    +  if ($entity_type->getBundleEntityType() && !empty($context['sandbox']['bundles'])) {
    

    entities can have bundles without a bundle entity, I think this should use ->hasKey('bundle') instead 🔧

  2. +++ b/taxonomy_entity_index.module
    @@ -288,3 +283,37 @@ function taxonomy_entity_index_entity_views_integrable() {
    + * with either 'integer' or 'string' IDs. Use this function as helper to select the
    

    nit: > 80 🧐

  3. +++ b/taxonomy_entity_index.module
    @@ -288,3 +283,37 @@ function taxonomy_entity_index_entity_views_integrable() {
    + *   The entity type machine name to be used in the taxonomy entity index query.
    

    nit: we call them `entity type IDs` not machine names 🧐

  4. +++ b/taxonomy_entity_index.module
    @@ -288,3 +283,37 @@ function taxonomy_entity_index_entity_views_integrable() {
    +      $target_id_data_type_suffix = '';
    

    nit: you could just return here and avoid the need for the variable $target_id_data_type_suffix 🧐

  5. We really should be adding tests for this, but unfortunately the issue to add tests is not in yet - perhaps a reciprocal review over there would help? see #3104318: Create tests and a config schema for taxonomy_entity_index.settings
larowlan’s picture

oh and thanks for working on this, looks great

gambry’s picture

Title: Entity IDs can be strings in Drupal 8 » [PP-1] Entity IDs can be strings in Drupal 8

1. Good to know. TBH I was looking for a ->hasBundles() in the API, so thanks for this.

2.,3.,4. 👍

5. Ok. With tests in place then it is much easier to work on this. I'll have a look at #3104318: Create tests and a config schema for taxonomy_entity_index.settings and do the work in here + tests on top of work in there.

gambry’s picture

Work on #4 generates the following error on MySQL <= 5.6:

SQLSTATE[HY000]: General error: 1709 Index column size too large. The maximum column size is 767 bytes.

That's due the primary key being > 767 bytes and even though the server may have the correct setup with:

innodb_large_prefix            = 1
innodb_file_format             = Barracuda

The problem is the default row format which in MySQL 5.6 is hardcoded to COMPACT. Starting from MySQL 5.7 you can set default row format by changing global variable innodb_default_row_format, which by default is already DYNAMIC.

Considerations:
- we could just set the ROW_FORMAT=DYNAMIC for this table, even though I'm not sure this will be possible through Schema API
- however as Drupal supports from 5.5.* we can't just simply change the ROW_FORMAT=DYNAMIC on creating the table
- but 5.5 is not supported any more since 2018
- but that doesn't mean anything, as users may still be used despite being unsupported as Drupal still allows it.
- the only alternative I can think of is removing the primary key from the schema ☹.

Thoughts?

colan’s picture

Maybe I'm a bit late to the party, but why not have a single table? Can we simply alter the existing schema?

gambry’s picture

I don't think it's wise to use a varchar both for string and integer entity IDs, as I suppose this is your suggestion @colan?
We will lose all the optimisation against querying integer IDs, which is the majority of the use cases.
And also I'm not sure sqllit or postgresql typecast easily like MySQL does? I know PostgreSQL is strict with typing, but I may be wrong on this.

To be added to #8 consideration: #3109534: Raise the minimum MySQL version to 5.7.8 and MariaDB version to 10.2.7 in Drupal 9.

colan’s picture

I don't think it's wise to use a varchar both for string and integer entity IDs, as I suppose this is your suggestion @colan?

Indeed.

We will lose all the optimisation against querying integer IDs, which is the majority of the use cases.

Fair enough.

And also I'm not sure sqllit or postgresql typecast easily like MySQL does? I know PostgreSQL is strict with typing, but I may be wrong on this.

No idea. I'm in the MySQL camp.

gambry’s picture

Issue tags: +Needs tests
StatusFileSize
new12.58 KB

Attached patch against 8.x-1.x (and also applies nicely to 8.x-1.1 version).
Feedbacks from #5 have been addressed, as well as removed the primary_key from _string table as per considerations in #8.

I can't create a clean interdiff due re-rolling original work against few commits on -dev branch. However the main - and probably only change is removing the primary key on taxonomy_entity_index.install:

  // Unset the primary key for _string table as the index is too big for MySQL
  // version < 5.6 and for 5.6 where `innodb_default_row_format` is set to
  // COMPACT (default).
  unset($schema['taxonomy_entity_index_string']['primary key']);

This still Needs Work as it would be ideal to have test coverage, but currently blocked by #3104318: Create tests and a config schema for taxonomy_entity_index.settings.

gambry’s picture

Assigned: gambry » Unassigned

For some reason this is still assigned to me, sorry about that. Now that D9 has been released and the MySQL minimum version requirement has been raised we probably have more space to tackle this properly.

gambry’s picture

Title: [PP-1] Entity IDs can be strings in Drupal 8 » Entity IDs can be strings in Drupal 8
gambry’s picture

Worth mentioning an other option could be to look if the database/table already use ROW_FORMAT=DYNAMIC and if it doesn't we can check if the DB supports it but it's not the default format (MySQL 5.6) by making sure GLOBAL innodb_file_format=Barracuda, GLOBAL innodb_file_per_table=1 and GLOBAL innodb_large_prefix=1. In this two scenario we can leave the primary key in, and if any of this fails we unset it.

Of course this whole conversation applies to MySQL-like DBs. It could be SQLite, PostreSQL, etc. may require their own tweaks and so increase the complexity.

Thoughts?

gambry’s picture

StatusFileSize
new12.78 KB

This is a re-roll against 8.x-1.x (applies to 1.3 nicely).

gambry’s picture

Issue summary: View changes

Adding the remaining tasks to IS.

gambry’s picture

Issue summary: View changes

Rephrasing hot to support MySQL <= 5.7

gambry’s picture

From @larowlan :

You can limit the length of the index. http://code-epicenter.com/prefixed-index-in-mysql-database/

Indeed from Schema API.

A key column specifier is either a string naming a column or an array of two elements, column name and length, specifying a prefix of the named column.

Let's give it a try!

gambry’s picture

Issue summary: View changes

Acquia is updating to 5.7. I'm not sure if this changes the thinking, but updating the IS as it mentioned Acquia still being in 5.6.

-J-o-n-n-y-’s picture

Re-rolling patch from #16 as we still need string support for taxonomy IDs with this module.

el7cosmos’s picture

rerolling patch against latest head

sarsion’s picture

Status: Needs work » Needs review
StatusFileSize
new14.76 KB

Rerolling patch against latest head (8.x-1.20 as of now).

We were missing a condition set to catch the new strict integer matching for updates and deletions. In our use case, reindexing meant that the taxonomy_entity_index_string would be completely wiped of the entity type. It also meant any rows that were deleted along with entity revisions would not be possible to regenerate. Updated this to check which table is being processed and strict compare type based on that.

sarsion’s picture

StatusFileSize
new14.4 KB

Making a quick adjustment to my last patch, as I had left a return commented out for taxonomy_entity_index_entity_update.

Without it, this would throw errors like:

Getting the base fields is not supported for entity type View.
Getting the base fields is not supported for entity type display.
damienmckenna’s picture

Should this be a feature request?