Problem/Motivation

Database identifiers resolution has grown recently, with escaping, caching and introduction of the square brackets syntax to resolve the problem of generic SQL statements to be ported across platforms using different quoting characters. This on top of the concept of prefixing tables.

All this has been added on the Connection class that is IMHO a bit overdoing.

Also, some platforms are hitting identifier length issues like e.g. #571548: Identifiers longer than 63 characters are truncated, causing Views to break on Postgres.

Developers should not be worried by exceeding identifiers length or using wrong identifiers; the database API should be self-sufficient in receiving input and cleansing/transforming it into machine usable information; failing as a last resort.

Proposed resolution

Decouple identifier management, including prefix management, in a class of its own, that platforms can override to manage their own restrictions.

Therefore:

  1. Table names, database name, column names, index names in a db are 'identifiers'.
  2. 'Identifiers' are subject to db-specific limitations (length, chars allowed, etc)
  3. In Drupal, 'Identifiers' are value objects. Each identifier has:
    • a type (e.g. table, column, alias, etc)
    • a raw value that include non SQL allowed characters and/or the quote characters, also in a <database>.<schema>.<table> format (e.g. config, key_value, mydb."key_value", with$$invalidchar, etc.)
    • a canonical value, i.e. the raw value cleansed of unallowed characters (from the example above: config, key_value, mydb.key_value, withinvalidchar, etc.)
    • a machine-usable value, i.e. the canonical value transformed to be used on the db-specific SQL query - including applying quote chars, prefixing, and shortening if necessary (from the example above, and assuming a prefix 'abba': "abbaconfig", "abbakey_value", "mydb"."key_value", "abbawithinvalidchar", etc.)
  4. each db-driver implements an 'IdentifierHandling' class that parses/transforms a raw value into the relevant value object, including a local cache to prevent processing again and again each raw value.

In this issue, only focus on table identifiers, which also require database and schema identifiers to be fleshed out. In follow ups, this approach could be extended from tables to columns, aliases, indexes etc.

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Issue fork drupal-3200743

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

mondrake created an issue. See original summary.

mondrake’s picture

StatusFileSize
new20.98 KB

A first patch, splitting the identifiers' resolution in a class of its own. I'd like to move table prefix management here too.

mondrake’s picture

Status: Active » Needs work
StatusFileSize
new20.98 KB
mondrake’s picture

StatusFileSize
new36.38 KB

Progress with SQLIte, MySql and PgSql will fail. Mainly trying to move prefixes to the new class, deprecating Connection::tablePrefix. Ideally we could get to deprecating also ::setPrefix.

mondrake’s picture

StatusFileSize
new38.38 KB
mondrake’s picture

StatusFileSize
new38.46 KB
mondrake’s picture

StatusFileSize
new40.01 KB

mondrake’s picture

mondrake’s picture

Assigned: mondrake » Unassigned

Will come back on this later.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

mondrake’s picture

Version: 9.3.x-dev » 10.0.x-dev
Status: Needs work » Postponed

I think this is D10 material, actually.

andypost’s picture

Status: Postponed » Needs work

Soon it will be open

andypost’s picture

mondrake’s picture

Status: Needs work » Postponed
Related issues: +#3124382: Remove per-table prefixing

Needs to do #3124382: Remove per-table prefixing first, anyway.

mondrake’s picture

Version: 10.0.x-dev » 11.x-dev

Refreshed a bit.

mondrake’s picture

Status: Postponed » Active
mondrake’s picture

Title: [WIP] Decouple identifier management from database Connection, introduce an IdentifierHandler class » Decouple identifier management from database Connection, introduce an IdentifierHandler class
Issue tags: +DX (Developer Experience)

So, here is the idea.

Developers should not be worried by exceeding identifiers length or using wrong identifiers; the database API should be self-sufficient in receiving input and cleansing/transforming it in machine usable information; failing as a last resort.

Therefore:

  1. Table names, database name, column names, index names in a db are 'identifiers'.
  2. 'Identifiers' are subject to db-specific limitations (length, chars allowed, etc)
  3. In Drupal, 'Identifiers' are value objects. Each identifier has:
    • a type (e.g. table, column, alias, etc)
    • a raw value (e.g. 'config', 'key_value', 'mydb."key_value"', 'with$$invalidchar', etc)
    • a canonical value, i.e. the raw value cleansed of unallowed characters (e.g. from the above 'config', 'key_value', 'mydb.key_value', 'withinvalidchar', etc)
    • a machine value, i.e. the canonical value transformed to be used on the db-specific SQL query - including applying quote chars, prefixing, and shortening if necessary (e.g. from the above, assuming a prefix 'abba': '"abbaconfig"', '"abbakey_value"', '"mydb"."key_value"', '"abbawithinvalidchar"', etc)
  4. each db-driver implements an 'IdentifierHandling' class that parses/transforms a raw value into the relevant value object, including a local cache to prevent processing again and again each raw value.

This works now for MySql and SQLite; the MySql implementation shortens table names that are longer than 64 characters using hashing techniques.

PgSql will follow with its own 63 chars max limitation.

In follow ups, this approach could be extended from tables to columns, aliases, indexes etc.

mondrake’s picture

Added pgsql conversion. Tests are now green on all 3 core drivers. The concept is in and implemented; now to cleaning, BC and documenting.

mondrake’s picture

Issue summary: View changes
Status: Active » Needs review

Ready for review now.

mondrake’s picture

Issue summary: View changes
mondrake’s picture

Assigned: Unassigned » mondrake
Status: Needs review » Needs work

The machineName in the value objects should be stored unquoted. On that.

mondrake’s picture

Assigned: mondrake » Unassigned
Status: Needs work » Needs review

Done #22, back to NR.

daffie’s picture

Issue summary: View changes
Status: Needs review » Active

I support the basic solution. Great work!

  1. The DX can be improved. Do we really need to deprecate methods like Connection::escapeTable() and Connection::getPrefix(). Can we keep them? Changes like:
    - $this->connection->getPrefix()
    + $this->connection->identifiers->tablePrefix
    

    or

    - $this->connection->escapeTable($this->table)
    + $this->connection->identifiers->table($this->table)->forMachine()
    

    Those changes do not make for an improved developer experience. Lets keep those methods as a wrapper for the new implementation. Maybe we need to add some more wrapper methods to improve the DX. It is great that we have an improved identifier management solution. Now lets hide it from them as much as possible. It works just a little bit better then before. For most of them that is all they need to know and maybe they do not need to know at all.

  2. The default database driver for Drupal is MySQL/MariaDB. Let the default implementation in the namespace Drupal\Core\Database\Identifier be for just MySQL/MariaDB. The specific changes that are needed for PostgreSQL and SQLite need to be done in their database driver modules. Like no "schema" stuff in the namespace Drupal\Core\Database\Identifier. Other database drivers that are not supported by Drupal core, have their own specific stuff they need to override. Let use the database drivers for PostgreSQL and SQLite as an example for those contrib database drivers. The current solution is limited to making it work for the by Drupal core supported database drivers.
mondrake’s picture

#24 thanks Daffie for your review!

1) for sure we can undeprecate those, or postpone removal to a later major (D13+). But why would that be desirable? Besides that those 'wrappers' would be redundant, what would someone call those for, given that the value objects encapsulate the result of those wrappers? Besides, once we move to value objects then we could pass them as arguments for methods instead of strings and the reason for such wrappers would simply not hold anymore - see for example #3514117: [PP-2] Use Table identifier value objects instead of table name strings in database Schema classes and how sqlite's Schema class would benefit.

Supposing a prefix 'foo' is required, given that the Table identifier implements \Stringable and it returns the machine name identifier:

$table = $this->connection->identifiers->table('try_me');
echo "SELECT * FROM $table";

would return, without any further processing:

for SQLite:
SELECT * FROM "foo"."try_me"

for MySQL
SELECT * FROM "footry_me"

for PostgreSql (supposing a 'bar' schema is in use)
SELECT * FROM "bar"."footry_me"

2) that's exactly what the MR is doing no? Identifier/Database, Identifier/Schema, Identifier/Table are value objects, defined final - they only carry the values. The values themselves are determined by each driver's IdentifierHandler class, that parses the 'raw value' string identifier and instantiates the value objects with the needed values. Each driver's IdentifierHandler extends from the abstract IdentifierHandlerBase which has the 'regular' implementation - and all drivers need diversions one way or another, say for the different max lenght, or for how tables are prefixed, or for adding schemas etc.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new89 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

mondrake’s picture

Status: Needs work » Needs review

Updated fork.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new89 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

liam morland made their first commit to this issue’s fork.

liam morland’s picture

Status: Needs work » Needs review

Re-rolled.

mradcliffe’s picture

Status: Needs review » Needs work

Switching to Needs work for phpunit and code quality failures.

mondrake’s picture

Status: Needs work » Needs review
joachim’s picture

I feel this might clash a bit with #3486125: add long table name abbreviation to Database API.

Is the MR here just rejecting names which are too long? Could it leave space for that issue to handle them?

mondrake’s picture

#35: this is a totally alternative approach, that removes the need for the other issue.

Too long names are automatically shortened here to match the db limitations.

mradcliffe’s picture

I came up with a rough outline for the change record after reviewing the changes.

  • Adds a new nullable argument to the base Connection __construct method that takes an IdentifierHandler.
  • Database drivers should implement IdentifierHandler (details) and pass this into Connection::parent::__construct().
    • Implement at least getMaxLength if extending IdentifierHandlerBase.
  • Method calls and properties that need to be changed:
    • getPrefix() should use identifiers->tablePrefix
    • $identifierQuotes should use identifiers->identifierQuotes
    • $prefix should use identiferies->tablePrefix
    • $escapedTables is no longer used
    • $tablePlaceholderReplacements should use ????
    • escapeDatabase() should use identifiers->database()
    • Any time need to use a table or prefix in code this should be identifiers->table
    • escapeTable() should be identifiers->table. escapeTable() is no longer needed when using connection->query()? A database drive may need to use schema()->quotedMachineName . $tableIdentifier->quotedMachineName?

It looks like this also applies to any usages of query that use escapeTable manually in addition to database driver modules.

Finally a question. If a site provided a prefix and used a non-core database driver would that site stop working because $prefix isn't set anywhere by that database driver (if it relied on the protected property)?

mondrake’s picture

Thanks @mradcliffe for your review! The bigger picture is better understandable IMHO if this is reviewed together with #3514261: [PP-1] Allow using Table identifier value objects in database Query classes and #3514117: [PP-2] Use Table identifier value objects instead of table name strings in database Schema classes that will show how this will improve readibility of code once we start using the identifiers as parameters in methods.

Re the BC concern:

If a site provided a prefix and used a non-core database driver would that site stop working because $prefix isn't set anywhere by that database driver (if it relied on the protected property)?

I added here a more robust handling of the deprecated $prefix variable through the magic methods.

Definitely review by maintainers of non-core drivers would help; however I am not sure how many D11 non-core db drivers are out there; MSSQL and Oracle seem stuck at earlier Drupal versions.

finalist’s picture

Status: Needs review » Needs work

The remarks of this user belong to user @daffie.

mondrake’s picture

@daffie thanks for the detailed review, it will take some time before I can get to it. In the meantime, please consider this is not the endgame, and consider reviewing this in combination with #3514261: [PP-1] Allow using Table identifier value objects in database Query classes and #3514117: [PP-2] Use Table identifier value objects instead of table name strings in database Schema classes that would further implement the changes occurring here.

For instance, when you talk about DX, the endgame would be using the new identifier object with its automatic conversion to string: for example,

$query = $comments . 'DELETE FROM {' . $this->connection->escapeTable($this->table) . '} ';

would become

$query = $comments . "DELETE FROM $this->tableIdentifier";

That's quite a DX improvement IMO... just we cannot do all of it toghether in a single MR otherwise it will become too big.

mondrake’s picture

Status: Needs work » Needs review
mondrake’s picture

smustgrave’s picture

Recommended way to test this?

mondrake’s picture

#43 you mean manually? An idea could be to create via createTable a PostgreSQL table with a very long name. Then inspecting the database you would find a table with an abbreviated name 63 chars long. Any Database API select/insert/update/query etc with the long name in Drupal would return data from the real table with abbreviated name.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new89 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

mondrake’s picture

Status: Needs work » Needs review

fork update

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new89 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

mondrake’s picture

Status: Needs work » Needs review

rebased

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new89 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

mondrake’s picture

Status: Needs work » Needs review
daffie’s picture

Status: Needs review » Needs work

Back to NW for the open remarks on the MR.

mondrake’s picture

Status: Needs work » Needs review

Sorry @daffie for some reason I cannot see your remarks on the GitLab UI, they show as collapsed and when I try to open them nothing shows up - I see what you wrote in the issue here, but the problem is I cannot see 'where' they are positioned. Can you try again or give some more details in comments here?

daffie’s picture

Status: Needs review » Needs work
StatusFileSize
new97.48 KB
new84.47 KB

Sorry @daffie for some reason I cannot see your remarks on the GitLab UI, they show as collapsed and when I try to open them nothing shows up

The mistake I made is that I mixed up by using 2 different accounts.

The MR looks very good, just a couple of nitpicks:
- Can we in the Unit IdentifierTests add testing for prefix not duplicated and only a table name.
- See the added images.

harivansh’s picture

Assigned: Unassigned » harivansh
mondrake’s picture

Status: Needs work » Postponed

I think we should postpone this for the moment, and let #3411490: Replace array-based DB Schema API with a value object structure land first. That one may have implication/connections with this one, so let's pause for a moment.

harivansh’s picture

Assigned: harivansh » Unassigned

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.