Problem/Motivation
The PostgreSQL database has indexes types that are only on PostgreSQL. Those indexes are called GIN and GIST. They add query improvements that are needed by Drupal in #3343634: Add "json" as core data type to Schema and Database API and #2988018: [PP-1] Performance issues with path alias generated queries on PostgreSQL .
Proposed resolution
The comment at #2988018-97: [PP-1] Performance issues with path alias generated queries on PostgreSQL proposes an interesting solution:
Yet another way would be to use an object with array access to denote it is a gist index - but this is a way bigger API change. That said at some point table specifications should move away from ArrayPI and this at least gives us a requirement.
This is what we ended up doing.
This issue is split out to address these competing options on how to allow indexes to still be specified in a driver-agnostic manner, with as little BC-break and as much forward-compatibility as possible.
There has been some bikeshedding in how those PostgreSQL specific indexes should be created, dropped and check if they exist.
Specifically, the core schema API currently provides no way for ancillary data to be communicated about an index, e.g. what Postgres index type should be used.
In #2988018-93: [PP-1] Performance issues with path alias generated queries on PostgreSQL there is a patch that makes the (very reasonable) decision to indicate the index type through a "magic" index suffix, which Postgres would use to determine the index type. Subsequent discussion centered on the fact this is frail, not easily enforced in code and easy to forget, plus you never know if someone will name an index with _gin or _gist as the suffix not intending any specific meaning.
An alternative proposal is to provide for an additional DB schema spec key, e.g. gists. This would maybe be workable if we were only talking about GIST types, however if you also need GIN then you'd need to support a second key, and it gets rather messy. It also injects Postgres-specific new keys into the spec array shape.
Remaining tasks
Reach a consensus on how those indexes should be created, dropped and check if they exist.
Most (all?) of the decision-making centers around the above questions on the method of specifying the type of the index to be created.
Done
User interface changes
None
API changes
New value object to wrap existing index definitions and also convey additional information for db drivers (e.g., Postgres index type.)
Data model changes
This doesn't change the data model but the new config object is backwards-compatible with ArrayPI.
Release notes snippet
Draft release note done.
Issue fork drupal-3397622
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
daffie commentedFor the bikeshedding part see #2988018: [PP-1] Performance issues with path alias generated queries on PostgreSQL
Comment #3
bradjones1Minor typo fix in IS and title.
Comment #4
bradjones1Comment #5
bradjones1A bit of rubber-ducking on the possibility of using an object implementing
ArrayAccessfor a value object that would allow for additional data on the index (beyond just an array of keys.) As proposed by @alexpott.If I understand this properly, the idea is to be able to specify indexes something like (pseudocode):
If
DbIndexSpecimplements bothArrayAccessandTraversable, then it could be crafted in a BC-compatible way for any instances where it is foreach()'ed over currently. There is an instance of the Postgres driver doing a typecheck with is_array(), which would not be compatible here, but that code would be changed anyway because this is the whole reason for this conversation.Setting the
is_array()issue aside, the value object would need to be BC-compatible for nested foreach loops such as found in the mysql driver:That said, since there are contrib DB drivers out there, is this just "too much" to try and attempt?
Comment #6
bradjones1Got some feedback and background in Slack on "ArrayPI" - the name appears to be a Drupalism (and a fun one at that). There are similar discussions around trying to chip away at particularly complex or DX-unfriendly structures, e.g. #3380145-19: ViewsData should not cache by language. Seems @catch is optimistic about being able to refactor some of these with limited BC impact.
While poking around the related code I also discovered that an index's "key column specifier" can either be an array of column names OR arrays which contain a column name and prefix length. Database drivers can elect to respect the prefix or just use the full column value. (IOW, they must support the syntax but not necessarily create a functional index as a result.) I _think_ this potentially opens us up to using this array syntax to also express driver-specific options, e.g. index type. I don't love it, but it is potentially better than the BC concerns around a value object.
So we could rephrase that docblock to say something like:
This shouldn't BC-break any code that accesses the numeric-indexed members.
Comment #7
bradjones1Comment #9
bradjones1So I _think_ I might have found a way to implement a value object for the index specification that will allow for this kind of configuration to be conveyed. The only issue would be if a non-core database driver does an explicit
is_array()check on the index definitions. This would only pass that kind of very strict (and not very useful) type checking if it changed tois_iterable(), which the value object would pass.Given the talk in Slack and on related issues on this subject, I think that should be OK? And we really, really don't seem to have any other way to include this outside of sticking it in a very awkward new schema definition key like
indexes_configor something, which is not very ergonomic and shards this information from the index definitions.Comment #10
bradjones1Comment #11
bradjones1I've added a draft CR to help explain how this works and start the conversation on how to message this, since it's the first time we're doing a value object inside ArrayPI.
I think it could be interesting to encapsulate this into the coding standards re:
is_array()isis_iterable()on type checks for ArrayPI members.Comment #12
mondrakeA comment on the general approach here if I may.
I think the idea of having value objects to represent DB objects is a great step forward. Doctrine DBAL has done something similar and dropped usage of arrays to describe DB structures.
That said, and given this will set a precedent in Drupal DB API, I would suggest:
1. To have a
Schemasubfolder where to store both the value object and the enum.2. With that, the value object name could easily be
Indexinstead ofIndexSpecification.3. Why final? The value object could be extended by drivers exactly to cover DB specific features like here.
4. Testing of DB specifics should be done in the driver extensions of DriverSpecificSchemaTestBase, not in the base class.
Comment #13
bradjones1Thanks for the vote of confidence and feedback. On points 1-3 that's the kind of feedback I needed because there is no existing pattern to follow so I was just winging it.
As for #4, I do think that we need to have testing of this in the base class, to detect compatibility breaks. Basically, we need to ensure that all drivers work when they don't necessarily support the extra data and successfully just treat it as an array. Perhaps annotating the test to state as such could make it clearer why it's there. The alternative would be having a base test that has just the object and the pgsql version has the pgsql-specific semantics.
Comment #14
bradjones1I realize I missed addressing the question about making the value object
final. My justification is that the index definition object itself is driver-agnostic; it's the configuration that is specific to each flavor. So there is no legitimate reason to fork this off asPostgresIndexor whatever because it's not a postgres index - it's an index definition with some postgres metadata attached.Comment #15
mondrakeSee #3410312-14: Flood database backend ::isAllowed() should call ::ensureTableExists()
Comment #16
daffie commented@bradjones1: The solution looks good to me.
We are missing some CRUD operations with the GIN/GIST indexes. We should also add functionality and testing to:
- to check that the GIN/GIST index exists;
- to add GIN/GIST index is created on an existing field;
- to remove a GIN/GIST index;
- when we change an existing field we can also add/remove a GIN/GIST index.
Add some testing that when we create a new table with a GIN/GIST index that the index exists and is a GIN/GIST index. Please create a query with the explain to check that we have created a GIN/GIST index and that the query is using it.
Comment #17
bradjones1Rebased.
Re: The requests in #16 -
GIN/GIST indexes are cataloged by Postgres similar to other indexes, so checking they exist is already possible. To determine their type, you must dig a little deeper, so there is now an additional column of information about indexes (using
pg_get_indexdef()) which we're also using in the new tests.Test coverage added.
I'm not sure we need to do this, and if we do, this amounts to a new feature for the driver/DBAL overall and nothing specific to GIN/GIST. There is a lot of logic already in
::changeField()but none of it currently touches indexes. In that respect GIN/GiST aren't special, and I think it's incumbent upon the caller to determine if the existing indexes will be compatible. I suggest addressing this in a separate issue. (GIN/GIST do introduce some interesting new considerations around index type support for changing field types, but I still don't think that means we must build a new feature into::changeField()to get support in.)This is in the test coverage. Note that we are in a cart-and-horse situation with respect to testing GIN - its test is almost identical to GIST, we just need a supported field type to test with. So I put in a skipped test as a placeholder and once we commit this, we can enable that test coverage with/after JSON data types go in.
Comment #18
daffie commentedThe testbot is failing on PostgreSQL on your added code.
The CR needs examples on how the use the added GIN and GIST indexes.
Comment #19
bradjones1Ah, how embarassing, I forget you need to manually trigger the PgSQL tests. Fixed with some updated coverage of the introspection method.
Updating the CR now.
Comment #20
bradjones1Missed that there are also new MR comments.
Comment #21
poker10 commentedThanks for great work @bradjones1! I have added some other comments to the MR.
Comment #22
mondrakeSome comments inline.
Comment #23
bradjones1Replied to MR comments and made a few changes as a result. Back to NR.
Comment #24
mradcliffeI think this pretty good to commit. Everything makes sense based on the changes from daffie's review.
I updated the change record.
I think we can update the issue summary with the proposed resolution now. Maybe moving some of the initial resolution into the problem/motivation?
Comment #25
bradjones1Comment #26
bradjones1Comment #27
bradjones1@mradcliffe would you mind RTBC'ing per your review?
Comment #28
mradcliffeAgreed. We'll see if we need to do anything else :-)
Comment #29
quietone commentedI read the Issue Summary, the CR and then the MR. I reviewed the comments and the MR. I left comments in the MR and suggestions. At least one of the suggestions is incomplete, that is it still needs work.
The change record reads well but I was surprised to find that new enum is only mentioned at the end. I think that is important information and should be at the start. There is a code example for the 'after' situation. Can one be added for the 'before' case? And I don't follow this sentence, "The only backwards-compatibility break would be if a driver is explicitly checking the index definition with is_array(), which is currently unnecessary.' It is not clear to me when this is unnecessary, is it before this change or after. Maybe it is better to change this to something like this, "Sites that are using a driver that is explicitly checking the index definition with is_array() will need to ....'..
Comment #30
larowlanComment #31
larowlanComment #34
daffie commentedI have reviewed the MR and it looks great.
@steinmb Could you address the open remarks on the MR?
Added #3411490: Replace array-based DB Schema API with a value object structure as a related issue. Should not block this issue.
Comment #37
mondrakeNew MR!13850 is built on top of #3411490: Replace array-based DB Schema API with a value object structure, showing how we could address this issue with the feature contained there.
Comment #38
mondrakeThe new MR is green both on MySql and PgSql.
Comment #39
smustgrave commentedNot sure I'm a good person to review just bumping priority so fingers crossed 11.4!
Comment #41
smustgrave commentedAppears to need a rebase now. Going to post in core-development to try and help get eyes