Problem/Motivation
Queries generated by AliasStorage::preloadPathAlias() taking huge amount of time to execute. A good example is the /admin/content page which takes 10 seconds and more to load for around 75k nodes.
SQL-Query example below:
2018-07-24 11:02:11 NZST [4923-1] drupal-site@drupal-site LOG: duration: 10990.918 ms statement: SELECT url_alias.source AS source, url_alias.alias AS alias, langcode AS langcode, pid AS pid
FROM
url_alias url_alias
WHERE ((source::text ILIKE '/node/28') OR (source::text ILIKE '/node/29') OR (source::text ILIKE '/node/30') OR (source::text ILIKE '/node/31') OR (source::text ILIKE '/node/24') OR (source::text ILIKE '/node/25') OR (source::text ILIKE '/node/26') OR (source::text ILIKE '/node/27') OR (source::text ILIKE '/node/23') OR (source::text ILIKE '/node/211996') OR (source::text ILIKE '/user/13') OR (source::text ILIKE '/node/87382') OR (source::text ILIKE '/node/134983') OR (source::text ILIKE '/user/25222') OR (source::text ILIKE '/node/216888') OR (source::text ILIKE '/node/149705') OR (source::text ILIKE '/node/216524') OR (source::text ILIKE '/node/209767') OR (source::text ILIKE '/user/24341') OR (source::text ILIKE '/node/216536') OR (source::text ILIKE '/node/135065') OR (source::text ILIKE '/user/53277') OR (source::text ILIKE '/node/173142') OR (source::text ILIKE '/user/30085') OR (source::text ILIKE '/node/188349') OR (source::text ILIKE '/user/69670') OR (source::text ILIKE '/node/147160') OR (source::text ILIKE '/node/147245') OR (source::text ILIKE '/node/174863') OR (source::text ILIKE '/user/59491') OR (source::text ILIKE '/node/216527') OR (source::text ILIKE '/node/174530') OR (source::text ILIKE '/node/178984') OR (source::text ILIKE '/node/154121') OR (source::text ILIKE '/node/172192') OR (source::text ILIKE '/node/216869') OR (source::text ILIKE '/user/93926') OR (source::text ILIKE '/node/216876') OR (source::text ILIKE '/node/216881') OR (source::text ILIKE '/user/93929') OR (source::text ILIKE '/node/215571') OR (source::text ILIKE '/user/58627') OR (source::text ILIKE '/taxonomy/term/50463') OR (source::text ILIKE '/node/202776') OR (source::text ILIKE '/user/47599') OR (source::text ILIKE '/node/210401') OR (source::text ILIKE '/node/215338') OR (source::text ILIKE '/node/87208') OR (source::text ILIKE '/node/190312') OR (source::text ILIKE '/node') OR (source::text ILIKE '/user/1')) AND (langcode IN ('en', 'und'))
ORDER BY langcode ASC NULLS FIRST, pid ASC NULLS FIRST
Not sure what part of Drupal this issue is actually coming from path alias or PostgreSQL driver.
Proposed resolution
Add an index on the field "path" and add an index on the field "alias". The index should be created with the option "USING GIN" or "USING GIST". The "pg_trgm" extension should be installed and created, before the indexes can be created.
For more info over "GIN" vs "GIST" see: https://www.postgresql.org/docs/10/textsearch-indexes.html
For more info over this solution see: https://ferfebles.github.io/2018/04/16/Improving-large-Drupal-Postgres-p...
Remaining tasks
User interface changes
None
API changes
None
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #112 | 2988018-112--postgresql-path-alias-queries.patch | 16.15 KB | mparker17 |
| #98 | 2988018-nr-bot.txt | 85 bytes | needs-review-queue-bot |
| #93 | 2988018-93.patch | 15.18 KB | daffie |
Issue fork drupal-2988018
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
xurizaemonThis issue appears both from site traffic and in other contexts (eg Drupal migrate scripts can trigger preloading of aliases per entity). The performance impact can be significant if Drupal is trying to look up aliases on a large-content, Postgres-backed site.
It is possible in PostgreSQL to add a pg_trgm index that will support Drupal's use of ILIKE to match case-insensitively - but this makes Drupal depend on DB configuration, and shouldn't really be necessary IMO. See #1518506-24: Normalize how case sensitivity is handled across database engines also re the
pg_trgmapproach.I think relying on presence of a case-insensitive index is not the ideal solution because:
If we ensure that aliases are stored lowercase, we can dispose of the additional case-insensitive index required to scan against.
Since Drupal currently handles aliases case-insensitively (#2075889: Make Drupal handle incoming paths in a case-insensitive fashion for routing), lowercasing the data will not change behaviour in locating aliased content via aliases. It would mean that uppercased characters in aliases can't be retrieved from that table though.
Attached is a patch which replaces
alias LIKE '$input'withalias = strtolower($input), which I'm trialling locally.There will be further work to do before this can be considerd (at least, lowercasing any existing aliases).
Comment #3
xurizaemonThere are multiple methods which can land Drupal at this case-insensitive table scan, so removing
AliasStorage::preloadPathAlias()from issue title.Comment #4
xurizaemonComment #5
jweowu commentedComment #6
bzrudi71 commentedComment #7
andypostAlias could be utf-8 so better to use mb_strtolower()
Also new comment should explain about supposing case insensitive behavior
Comment #8
bzrudi71 commentedNot bad... A quick testing shows some really, really major speed improvements. On a virtual testing instance we go down from 13sec! to 2sec when opening /admin/content. This is with only 78k url_aliases in use. Nice find @RoSk0
Comment #9
xurizaemonUses mb_strtolower() now.
@bzrudi17 yeah the contexts where Drupal can hit that ILIKE are more than I would have thought - eg I observed this when testing a user migration from D6, and loading 140K user accounts in Drupal 8 seemed to be taking many minutes to preload aliases for all of them ...
Comment #10
xurizaemonHopefully these updated comments will help explain the intent @andypost.
Also remembered to mb_strtolower() on save this time.
Comment #11
rosk0Feels like should not escape here.
Lets test
Comment #12
xurizaemonAgree on the escapeLike, thought I'd caught them all.
TBH I don't think this is ready for needs review yet, it needs an update hook to lowercase all existing aliases yet.
EDIT: And likely test coverage too. Test failure below should be expected at this stage.
Comment #15
bzrudi71 commentedAdded upgrade path hook to lowercase all aliases.
Comment #16
bzrudi71 commentedComment #17
bzrudi71 commentedFor the failing tests I think we should:
1. Extend randomMachineName() to accept a second parameter to generate lowercase only strings.
2. Then make use of it where needed
Adding mb_strtolower() in tests looks dirty ;)
Thoughts ?
Comment #18
rosk0Lowercase behaviour is only required for failing tests so I don't feel like extending randomMachineName() is a good idea for this issue. However from my point of view randomMachineName() should produce only lower case in the first place so maybe it's a topic for another issue?
Let's lowercase generated aliases in failing tests.
Also added lowercasing for source in AliasStorage::save().
Comment #20
rosk0We need to test update path. See \Drupal\FunctionalTests\Update\UpdatePathTestBase.
Comment #21
jweowu commentedSome questions to check on edge cases...
Without the patch, if you create an upper/mixed-case alias in D8 and ask for the url() for that path, does it currently produce the original case, or does it get down-cased? If it produces the original case, the changes could cause some unhappiness from site owners who were intentionally producing mixed-case friendly URLs for perceived extra readability, and would now get the lower-cased versions?
What about registering an upper/mixed-case internal path? Is that allowed? If so, can the proposed changes break things? (e.g. if you down-case the source in the alias table via an update hook, returning that lower-case source might be wrong; if you don't down-case the source in the alias table, then testing it with '=' against a lower-case value might be wrong.)
Is any of this a concern?
If so, should we instead be indexing the lower-case values, but retaining the original case in the column?
Comment #22
xurizaemonYes, storing paths lowercased means that we lose the case. I guess we could store the preferred version separately if we wanted to retain case *and* avoid relying on DB indexes for case insensitive behaviour.
The two issues here (see comment 2 for shm's relayed comment on it) is that even with pg_trgm, a significant performance issue was still observed with case-insensitive alias indexes on Postgres. Currently Drupal 8 / Postgres ships with a performance issue out of the box because of this issue, and even with adding the index per-site it seems the performance doesn't compete with other engines, hence this proposed patch.
I wonder from your question if you had a different method of indexing lowercase values in mind though?
Comment #23
jweowu commentedI was thinking about explicitly indexing the lower-case value, as an expression index.
Querying with the same expression then uses the index.
Refer to https://www.postgresql.org/docs/9.1/indexes-expressional.html (which happens to provide the very example in question).
n.b. Offhand I do not know whether this approach is portable across other databases that Drupal cares about, nor how much of a hassle it would be to use this approach for some databases but not others.
Comment #24
xurizaemonOK, that's another approach we could take then - it's mentioned in passing in ferfeble's article on pg_trgm (linked in comment 2 here).
Seems worthwhile comparing the performance of the three methods (expression index, lowercase, pg_trgm) side-by-side to see if expression indexes hold up better than pg_trgm (or if pg_trgm performs well), that method would be a lesser change than the currently proposed patch, we could add a requirements check that encourages / supports making the required changes at a DB level.
Comment #25
bzrudi71 commentedJust noticed #2233595: Deprecate the custom path alias storage :) Path alias service ist going to be replaced by the entity system...
Comment #26
pounardOh god, why...
Comment #27
amateescu commented@pounard, see #3007661: Modernize the path alias system.
Comment #28
catchForcing path aliases to lower case is a small feature regression that I'm not at all sure we can introduce in 8.x even if we wanted to - while I personally don't like them, there are sites out there using case-aware paths. Additionally since similar queries are done for usernames, I'm sure we wouldn't take the same approach of forcing all usernames to lower case.
Back in c.2006-7 before we had cross-database LIKE/ILIKE support, another option discussed was adding an additional column just for database lookups (so when rendering an alias, use the user-submitted data, but an additional lowercase-only column for queries).
irt adding this index, is this a case where a driver override could help? - see https://www.drupal.org/node/2306083
https://ferfebles.github.io/2018/04/16/Improving-large-Drupal-Postgres-p...
Comment #29
bzrudi71 commentedAfter days of local testing we rolled out the pg_trgm index workaround to live system today. It's like an all new Drupal 8 experience, no more wait for admin content pages, all and everything is twice as fast (at least). As a side effect CPU usage dropped by 70% and server load goes down by 40-50%. I think we really need a solution here.
I like the idea of having an additional query only lowercased alias column as @catch mentioned in #28. Maybe this could already be done together with #3007661: Modernize the path alias system. @amateescu?
Comment #30
xurizaemonpg_trgm didn't give as effective performance fix for us, but I'm glad it helped you @bzrudi71!
What advantages / differences would there be between an expression index (comment 23 above) and an additional lowercased column (28 above)?
Comment #31
jweowu commentedAn additional column requires more storage, as it would itself need to be indexed; so as well as that index using approximately the same storage space which would have been needed to provide an expression index, you additionally have the storage for the column itself.
An extra column would work with database types which don't implement expression indexes.
My expectation is that, in Postgres, performance would be more or less equivalent; but not having tested I can't say for sure.
(In either case it's possible that the original column would no longer need to be indexed for core usage; but it's almost certainly unsafe to assume that contrib/custom code isn't using it, so I would expect that index does need to be retained.)
Comment #34
TomGould01 commentedHi, I'm using this patch as the LIKE queries on large vocabs are causing issues with my database writer.
The comments say that they only use LIKE queries to make them case insensitive but MySQL is not case sensitive so this works just fine for me so far
Can anyone see any issue with this aproach for a one off install using composer to manage the patching on deployments ?
Comment #35
rob.barnett commentedI've been using the patch in comment #18 for some time, however, today when upgrading core to 8.8.0, the patch was not able to be applied. This makes sense due to the changes to url aliases in latest core minor release.
I created a new patch on the file AliasRepository.php which greatly reduces the load time of pages. I found that AliasStorage.php was never getting hit with breakpoints. This still needs some work as I did not make any changes to tests. So additional work is welcome.
Comment #36
rob.barnett commentedI realized that, while my patch speeds up page loads, with the Redirect module in place, it simply redirects from path aliases back to the system path so I need to look into this. I removed the patch and url aliases loaded again. After re-applying the patch though, everything seems to be working. This definitely needs some more eyes.
Comment #37
rob.barnett commentedAfter a lot more testing, my patch is breaking pages loaded by url alias. System paths are loaded instead.
I've opted to use the pg_trgm PostgreSQL module by doing the following:
Comment #38
TomGould01 commented@RobBarnett
I'm using this one now as the Drupal upgrade broke the last one
I'm not expecting it to pass all tests as it's just for non case sensitive databases
Hope this helps
PS tests failed due to not being able to apply patch, not sure why, it works fine on my composer update/install
15:35:18 fatal: corrupt patch at line 96
The very last line which is empty failed the patch :(
Comment #39
rob.barnett commentedThanks @TomGould01. I'll checkout your patch. It would be nice to have a drupal-only solution. In addition to the PostgreSQL pg_trgm module, I also tweaked my patch, and I'm successfully using both. I'm attaching the drupal patch here.
Comment #40
TomGould01 commentedThat's cool, A combination of both the strtolower and my one would work on all the databases and remove all those unnecessary LIKE operators :+1
Comment #41
daffie commentedLets see what the testbot thinks about the latest patch.
Comment #42
daffie commented@rob.barnett: Your patch fails the
Drupal\FunctionalTests\Routing\PathEncodedTestand theDrupal\Tests\path_alias\Kernel\AliasTest.Comment #43
daffie commentedBack to needs work.
Comment #44
daffie commentedAn approach that works is to try to create the extension pg_trgm. If we can then we shall use it. If not then we use the old code.
Todo:
- try to enable the extension as early as possible in the PotgreSQL Connection class. We like to use the extension also in other places;
- create a method in the PostgreSQL Connection class that tests if the extension is enabled (See: https://www.postgresql.org/docs/9.1/view-pg-available-extensions.html);
- create the 2 indexes if the extension is enabled on the alias and source columns.
- test if
ANALIZE url_aliasneeds to run.- we need to write a change record with the advice that (larger) PostgreSQL sites will be faster with the extension pg_trgm.
Maybe we need to make sure that we have a testbot with and one without the pg_trgm extension.
See: https://ferfebles.github.io/2018/04/16/Improving-large-Drupal-Postgres-p...
Edit: The other thing we can do is to test the value for having SQL wildcard-characters. If they are there then use LIKE-operator, and if they are not then use the equal-operator. The last option is the one that we have most of the time and is a lot quicker then the one with the LIKE-operator.
Both solutions combined should fix the problem.
Comment #45
catchIf we're not changing functionality, this doesn't need product manager review.
There was an issue I saw recently talking about revisiting the LIKE/ILIKE case insensitivity implementation, but I can't immediately find it.
Comment #46
daffie commentedTo be able to test a patch for this issue a new testbot with the extension
pg_trgmis needed. Postponing untill there is such a testbot. See: #3104007: Postgresql 12 DrupalCI environment needed for core testing.Comment #47
andypostComment #48
daffie commentedWe are still waiting on a testbot for this issue. See: #3104007: Postgresql 12 DrupalCI environment needed for core testing.
Comment #49
daffie commentedComment #50
catchIt's worth noting the prehistory of this issue.
Before the current LIKE/ILIKE implementation of this query, Drupal used to run LOWER() on path alias queries instead. We change from LOWER() to LIKE() so that MySQL would use an index, while postgres would still correctly handle case insensitivity.
However before we settled on that approach, there were multiple issues trying to solve it - most linked from here #279851: Replace LOWER() with db_select() and LIKE() where possible.
One approach we tried to take, but was rejected at the time, was adding an additional case-insensitive column to the table, so that queries could use a regular = comparison. It might still be worth revisiting that.
For example #83738: LOWER(name) queries perform badly; add column name_lower and index. for usernames.
Obviously this wouldn't help with queries that actually need LIKE.
Comment #51
daffie commentedWe have now a testbot with PostgreSQL 12. See #3104007: Postgresql 12 DrupalCI environment needed for core testing.
Comment #52
catchExplicitly postponing this on DrupalCI support for contributed database drivers. Or possibly #3128699: Testing issue for pgsql_fallback or regular travisci testing.
Comment #54
daffie commentedComment #56
daffie commentedTalked to @catch on Slack and the testbot run in #3128699: Testing issue for pgsql_fallback with the contrib database driver from https://www.drupal.org/project/pgsql_fallback is good enough for him. Therefore changing the status for this issue to "needs work".
Comment #57
daffie commentedThe problem is that the extension pg_trgm needed to be created and for that the database user has to have at least superuser privileges. The database user in the testbot does not have those privileges and therefore the pg_trgm extension needs to already be created on the database testbot envirionment. See: #3186676: Create the pg_trgm extension on PostgreSQL 10.12 and 12.1 database environments
Comment #58
sylus commentedHere is a patch against D9 9.0.x
Comment #59
anmolgoyal74 commentedRe-rolled for 9.2.x
Comment #60
daffie commentedBefore we can continue with this issue the testbot needs to be updated. See #3186676: Create the pg_trgm extension on PostgreSQL 10.12 and 12.1 database environments.
Comment #61
bappa.sarkar commentedMore performant patch
Comment #63
daffie commented#3186676: Create the pg_trgm extension on PostgreSQL 10.12 and 12.1 database environments has landed, therefore back to needs work.
Comment #64
daffie commentedThis will have to wait until D10, because in D10 will the creation of the pg_trgm extension be required.
Comment #65
ramadan35 commentedComment #66
jweowu commentedThis note is just to encourage everyone to add the pg_trgm indexes to their database if possible, regardless of patches.
I've been using the patch from #61 for a while, but noticed a pathological case which definitely wasn't being helped (this was the alternative View-based handler for /admin/reports/dblog which displays 50 log entries per page, and issues 50 extremely slow path_alias queries in the process of rendering such a page). Disabling the View is one solution (the default non-View handler is efficient), but I added the pg_trgm indexes to test the performance of that View (which improved by at least an order of magnitude).
Having done that, I tested the rest of the site and observed that uncached page loads in general were about twice as fast as before! That's huge.
So if you're only using the patch and you can add pg_trgm indexes, I strongly suspect you should do so.
Edit: This was with a path_alias table of around 1.3M rows, which was heavily-bloated due to the pathauto bug (see next comment) in conjunction with a nightly content synchronisation feature (a large number of nodes had been unexpectedly getting a new duplicate path_alias row every night over a long period of time).
Comment #67
jweowu commentedFor pathauto users, #3107144: Duplicate alias entities created with 'Create a new alias. Leave the existing alias functioning' setting may be bloating the path_alias table and exacerbating the performance issues.
Comment #68
daffie commentedHi @jweowu and others, great to see that you are all interested in making Drupal on PostgreSQL better. I am trying to fix this issue and for that we will make the pg_trgm extension in Drupal 10 required. To make that happen we need to create a warning on Drupal 9 sites on PostgreSQL that in Drupal 10 the extension will be required. The issue for this is #3214921: Add a requirements warning in Drupal 9 when PostgreSQL is used and the pg_trgm extension is not created. Could somebody do a review for that issue. Getting a review for it is a bit of a problem as most Drupal core developers are not using PostgreSQL. All that is needed is to apply the patch and to test that without the extension enabled you get a warning on the page "admin/reports/status" and when the extension is enabled you do not get the warning. I shall do my best to get a Drupal core commit credit for the person who does the testing and changes the status to RTBC.
What I also would like to know if there are other places in core where we should also add trgm indexes.
Comment #69
daffie commented#3214922: Add a requirements error in Drupal 10 when PostgreSQL is used and the pg_trgm extension is not installed or created has landed. Therefor in D10 we can fix this issue.
Added GIST indexes the tables "path_alias" and "path_alias_revision" have on the columns "path" and "alias" a GIST index.
Added a hook_update_N function to update existing sites.
Added the possibility to create and drop GIST indexes on PostgreSQL databases. The Schema API does not allow to specify what kind of index it is. Therefor I made the decision that when the name of an index ends with
_gistit is a GIST index. Better or other proposals to solve this are welcome.I added testing for all of the above.
Comment #70
jweowu commentedThanks daffie!
I have two suggestions wrt this update hook:
1. Make the update hook check for the existence of these indexes, and not attempt to add them when they already exist. Many sites will have already added them on account of this issue.
2. Related to 1, consider using the naming scheme that has been seen previously in this issue, to increase the chances of detecting pre-existing indexes.
Previously people will have seen:
* https://ferfebles.github.io/2018/04/16/Improving-large-Drupal-Postgres-p... (back when the table was "url_alias")
* https://www.drupal.org/project/drupal/issues/2988018#comment-13389583 (for the "path_alias" table)
In both cases the suggested names were
TABLE__alias_trgm_gist_idxandTABLE__path_trgm_gist_idxso I suspect many of us followed suit and used those same names.Your names are missing the
_trgmcomponent, so won't match anyone who did that.Arguably most of the people in that situation will be following this issue, however, so if the revised name is preferred then we could always suggest that everyone 'renames' their indexes (delete + add, IIRC) to match what's been arrived at for D10.
Regardless of 2, I think 1 is important, otherwise
addIndex()is going to throwDatabaseSchemaObjectExistsException, so I'm setting this to Needs Work on that basis.Comment #71
daffie commented@jweowu: Thank you for your review.
For #70.1: Fixed.
For #70.2: About the naming of the GIST indexes. The original index names end with "_idx" and that does not work with my new convention that GIST indexes need to end with "_gist".
When you have existing GIST indexes on the "path_alias" table, you will now get them added again, only with a different name. When you have used the old naming system it will be easy to see which are the new ones and which are the old ones. That will make it easy to see which ones need to be dropped. Therefor to me it is better that GIST indexes names are different. But I have to admit that is not an ideal solution.
@jweowu: Do you know if there are any other places in core where it would be great to add a GIST index?
Comment #73
jweowu commentedNot offhand, but I've just grepped core for
'LIKE'and am attaching that file. (Edit: I did that in Drupal 9.2.13, rather than 10.0.x.)Oops. When I looked at that code I was assuming (badly) that
addIndex()was adding/enforcing the normal_idxsuffix, so I misunderstood the "name of an index ends with _gist" part.I'm wondering whether we can query the existing indexes by definition irrespective of name, and automatically remove anything which is identical to the new ones. I believe that Postgres doesn't provide any way of explicitly using an index by name, so I don't think automatically deleting the old ones could lead to problems for anyone, given that we've added a replacement.
Proof of concept:
And if we had the names, we could even rename rather than remove:
ALTER INDEX [ IF EXISTS ] name RENAME TO new_nameWhich also means that a cheap-but-fairly-effective alternative to querying index names would be to simply
ALTER INDEX IF EXISTS path_alias__path_trgm_gist_idx RENAME TO path_alias__path_gistas a first step, and then we're at least catching the cases where people had followed this issue and used the same names that had appeared here.(I guess all of this stuff falls under "extra credit", but it'd be a nice-to-have.)
We don't need to ANALYZE the tables after adding the new indexes, do we? I was going to suggest it, but on review I don't believe it makes a difference for this update scenario (i.e. when the actual table contents haven't changed). The postgres docs for ANALYZE don't mention indexes, and https://andreigridnev.com/blog/2016-04-01-analyze-reindex-vacuum-in-post... says directly that it isn't interested in them. I suppose it might help if someone hadn't been running an autovacuum, but that seems very much out of scope. (I did analyze the tables when I did this, but I'd also been purging a huge number of duplicate alias rows, so it made good sense on that occasion.)
Comment #74
jweowu commentedAttaching the same grep in the current 10.0.x codebase, FWIW.
Comment #75
catchIf someone's added indexes to the table manually, they can also remove them manually.
It would be worth adding a check for indexes with exactly the same name though, but I don't think we need to go looking.
Comment #76
daffie commentedTherefor #70.2 is now a will not be fixed. Moving it back to NR.
Comment #77
daffie commentedI have rerolled the patch.
Comment #78
daffie commentedComment #79
andypostit went wrong
Comment #80
daffie commentedSecond try.
Comment #81
daffie commentedFixed the style guide errors.
Comment #82
andypostLooks mostly ready, just nits
$schema could be moved to inside of condition
Please use PathAliasStorageSchema::class to easy to find
There's https://www.php.net/manual/en/function.str-ends-with.php in PHP 8+
Comment #83
andypostFix #82
Comment #84
daffie commented@andypost: All the changes you made in the patch from comment #83 are for me RTBC. If you can review the rest of the patch, you are free to change the status to RTBC.
Comment #85
andypostI gonna test it tonight on custom site as it has lots of aliases
Comment #86
andypostUpgrade works great!
I used
composer install drupal/webprofilerand now (for/admin/contentwith 50 nodes) query (151 times) looks the sameexplain is
indexes
Comment #87
andypostTime: 22.34 ms Caller: D\p\AliasRepository::preloadPathAlias Database: default Target: default remains mostly the same for gist index type (the query is generated from there
SELECT "base_table"."path" AS "path", "base_table"."alias" AS "alias" FROM "path_alias" "base_table" WHERE ("base_table"."status" = :db_condition_placeholder_0) AND (("base_table"."path"::text ILIKE :db_condition_placeholder_1) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_2) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_3) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_4) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_5) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_6) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_7) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_8) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_9) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_10) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_11) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_12) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_13) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_14) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_15) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_16) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_17) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_18) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_19) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_20) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_21) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_22) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_23) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_24) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_25) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_26) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_27) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_28) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_29) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_30) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_31) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_32) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_33) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_34) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_35) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_36) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_37) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_38) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_39) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_40) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_41) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_42) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_43) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_44) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_45) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_46) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_47) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_48) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_49) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_50) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_51) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_52) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_53) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_54) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_55) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_56) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_57) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_58) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_59) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_60) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_61) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_62) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_63) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_64) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_65) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_66) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_67) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_68) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_69) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_70) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_71) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_72) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_73) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_74) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_75) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_76) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_77) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_78) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_79) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_80) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_81) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_82) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_83) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_84) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_85) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_86) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_87) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_88) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_89) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_90) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_91) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_92) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_93) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_94) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_95) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_96) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_97) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_98) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_99) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_100) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_101) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_102) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_103) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_104) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_105) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_106) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_107) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_108) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_109) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_110) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_111) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_112) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_113) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_114) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_115) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_116) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_117) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_118) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_119) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_120) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_121) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_122) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_123) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_124) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_125) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_126) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_127) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_128) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_129) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_130) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_131) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_132) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_133) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_134) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_135) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_136) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_137) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_138) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_139) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_140) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_141) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_142) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_143) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_144) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_145) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_146) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_147) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_148) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_149) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_150) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_151)) AND ("base_table"."langcode" IN (:db_condition_placeholder_152, :db_condition_placeholder_153)) ORDER BY "base_table"."langcode" ASC NULLS FIRST, "base_table"."id" ASC NULLS FIRSTComment #88
andypostIn median the query became faster from 24ms to 22ms
Comment #89
samlerner commentedI would love for this this to get committed. I was having performance issues on a site with 260,000+ path aliases, and using 2.0@beta caused a huge performance improvement. Then, I upgraded to the 2.0 stable version, not realizing this hadn't been committed, and I immediately started seeing a huge increase in the time spent on queries to path_alias.
I'm going back to 2.0@beta for now, but I can't wait until this can be merged!
Comment #90
andypost@SamLerner which module you mean pathauto?
Comment #91
samlerner commented@andypost apologies, I conflated this issue with one about path.module - https://www.drupal.org/node/3164889
Please ignore my earlier comment. Thanks!
Comment #92
larowlanLooks to be failing to apply, and there's some fails on postgres, haven't reviewed the patch yet
Comment #93
daffie commentedRerolled the patch.
Comment #94
poker10 commentedThanks for the great work here @daffie, @andypost and others! We are also testing the patch from #83 on a few sites with lot of content and there is a visible performance gain for us.
Noticed a small comment issue, but probably can be fixed on commit (if it is not an intentional change). Each PostgreSQL docs link is referencing to the PostgreSQL 12, but this one was changed to version 10:
Comment #95
alexpottWhy did we chose GIST over GIN when the docs say that GIN is preferred? I read all the comments from when the GSIT functionality was introduced and I can't see it.
I wish there was a way to implement this without using magic index names. Magic like this always results in surprises. Can't we add something like another key to the schema that has the same structure as
'indexes'but is'gists'?Regardless of whether we do the magic index naming or add the new key to the schema we need to document this somewhere.
Actually I have a really good reason to add a new key to schema for this. If you create an index in a module that expects all databases to perform the same and it has _gist it will do different things if we make the change as is in #93. Whereas if we add a special 'gist' key that is only supported by postgres then a module can add a gist that will, as expected, only affect postgres and does not have to have a special postgres adapter module to override it's own storage implementation. It could just have something like:
Comment #96
daffie commented@alexpott: Moving the GIST/GIN indexes to their own key in the schema API, will result in PostgreSQL to have to override the class Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema. As that class creates the entity tables and indexes. If we do that we will break contrib module on PostgreSQL that have content entities that override the base storage schema class.
What we can do is create a new service called Drupal\Core\Entity\Sql\SqlContentEntityStorageSchemaService that do everything that the class Drupal\Core\Entity\Sql\SqlContentEntityStorageSchem is now doing. The old class will than become a thin wrapper on top of the service. Database driver modules can than override the service and do what they need to do. Every contrib/custom module that changes content entity storage schema class will keeps working and there will be no BC break.
Edit: With #3257405: Move the MySQL exception in UserStorage to the mysql module do we have the same problem, only than with the Drupal\Core\Entity\Sql\SqlContentEntityStorage.
Comment #97
alexpottI think we might not need to override SqlContentEntityStorageSchema. Looking at addIndex we have the full table spec there - so maybe
And we can use this info in the addIndex() in postrgres. At least it is explicit and not magical. In order to make this less painful we could implement a helper static method in Postgres's schema to add a gist index. So the code becomes something like:
Yet another way would be to use an object with array access to denote it is a gist index - but this is a way bigger API change. That said at some point table specifications should move away from ArrayPI and this at least gives us a requirement.
Comment #98
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #100
kopeboyIs this only related to Postgres? I see the same type of (long list of) path queries in Views with MariaDB: even if there are not aliases for the displayed entity fields, just activate the link to the entity for the node title (or term name) or the entity operations links and violà.
Comment #101
daffie commented@kopeboy: To test out if you are using an index, you should add "EXPLAIN " in front of the query and run it in something like "phpMyAdmin". For more info: https://mariadb.com/kb/en/explain/
Comment #102
kopeboyNot sure.. are you saying I should add an index to taxonomy term tables that is not already in core?!
Comment #103
daffie commented@kopeboy Something like:
Comment #104
bradjones1Dropping in here from #3343634: Add "json" as core data type to Schema and Database API, where we need GIN index creation support for JSON path query-based indexes.
I hope this helps sway the discussion here regarding whether to use GIN or GiST; the former also allows for JSON data and would enable broader functionality in the DB API.
Updating the issue title to better reflect that this is an index-creation issue first, and also includes a specific performance improvement in path aliasing.
For what it's worth, GIN support for JSON is limited to certain operators, mostly when it comes to object/array/scalar "containment" and key-exists type queries. But, if we're going to support these types of indexes in Postgres, it would be great to also leverage GIN for power users/if we end up supporting containment queries at some point in the future.
Comment #105
daffie commentedAs the GIN indexes do to working for path aliases, the solution will be adding a GIST index. As the best solution for JSON fields is a GIN index, we should add them both to Drupal. Each having its own use case.
Comment #106
bradjones1Re: #105 yes that's a good plan, however I had renamed this issue with the intent that a similar index-creation logic could be extended to GIN as well. I don't think it makes sense to have a separate issue that addresses pgsql index creation if the approach (special case by name) is going to be the same?
Comment #107
koolaidguy commentedI've attached a patch re-rolled for Drupal 10.1.5 as the other patches on this thread weren't applying.
Comment #108
poker10 commented@KoolAidGuy The patch #107 is missing important parts from the last working patch #93 and it does not seems to be based on the patch #93 either. When doing rerolls please be carefull that the new patch does not loose any code, otherwise such rerolls cause only distraction. Also it is preffered to use merge requests these days (which could be a good idea also here, as it seems that this issue will require some additional changes based on #97). Thanks!
I am hiding the patch #107.
Comment #109
bradjones1Index creation (for both GIN and GIST, as there is now a concrete use case for each) is broken out into #3397622: Adding GIN and GIST indexes to PostgreSQL databases.
Comment #110
bradjones1I have an MR for handling creation of both GIN and GIST indexes over at #3397622: Adding GIN and GIST indexes to PostgreSQL databases - please take a look.
Comment #111
bradjones1#3397622: Adding GIN and GIST indexes to PostgreSQL databases has test coverage and is in Needs Review if anyone here wants to move that along to unblock this.
Comment #112
mparker17I've attempted to re-roll the patch from #93 onto
11.xat commitf560c83a1c(from 2024-05-15).Comment #113
mparker17@bradjones1 would it be helpful to update this patch to use the API in #3397622: Adding GIN and GIST indexes to PostgreSQL databases, given that it is RTBC?
Any suggestions for how else I could help move this issue forward?
Comment #114
mradcliffe@mparker17 thanks for asking. I think the best thing is to start an issue fork and apply the patch you re-rolled into the issue fork branch and start a merge request.
Merge requests will soon be the only way to test and accept changes to Drupal core and contributed projects on drupal.org
Comment #115
bradjones1Re: #13 you could apply the patch from that other issue to core using composer-patches and then use it in this issue, yes.
Comment #117
mparker17> Re: #113 you could apply the patch from that other issue to core using composer-patches and then use it in this issue, yes.
I merged in the commits, but that idea sounds be a lot easier to review, so I'm going to try that shortly.
Comment #118
mparker17@bradjones1 Hmm... could you explain how to do this, or point me towards some documentation?
Aside: in order to change this for !8101, I'd have to drop all the commits in the #3397622: Adding GIN and GIST indexes to PostgreSQL databases branch (or re-create the branch), update composer.json to apply the patch, then re-apply the commits specific to this branch... which is fine, but I don't want to force-push until I know it's going to work, and local testing doesn't seem to work. For simplicity below, the logs will assume I'm re-creating the branch.
I've set up a dev environment using justafish/ddev-drupal-core-dev...
... as I expected, it didn't seem like cweagans/composer-patches was required by anything...
... so I tried installing cweagans/composer-patches, but that failed because, I guess, core references itself in its own composer.json?
So I tried simply adding the patch to composer.json...
...seems fine so far, so lets apply the patch...
... same as before.
What about just updating the lock file?
... apparently not!
What if I try running GitLab CI tests locally?
... so I'm not sure what to do next.
Comment #119
mparker17On the plus side, though, it seems like tests are passing in the merge request as it is now.