Problem/Motivation

Queries generated by AliasStorage::preloadPathAlias() taking huge amount of time to execute. A good example is the /admin/content page which takes 10 seconds and more to load for around 75k nodes.

SQL-Query example below:

2018-07-24 11:02:11 NZST [4923-1] drupal-site@drupal-site LOG:  duration: 10990.918 ms  statement: SELECT url_alias.source AS source, url_alias.alias AS alias, langcode AS langcode, pid AS pid
        FROM 
        url_alias url_alias
        WHERE ((source::text ILIKE '/node/28') OR (source::text ILIKE '/node/29') OR (source::text ILIKE '/node/30') OR (source::text ILIKE '/node/31') OR (source::text ILIKE '/node/24') OR (source::text ILIKE '/node/25') OR (source::text ILIKE '/node/26') OR (source::text ILIKE '/node/27') OR (source::text ILIKE '/node/23') OR (source::text ILIKE '/node/211996') OR (source::text ILIKE '/user/13') OR (source::text ILIKE '/node/87382') OR (source::text ILIKE '/node/134983') OR (source::text ILIKE '/user/25222') OR (source::text ILIKE '/node/216888') OR (source::text ILIKE '/node/149705') OR (source::text ILIKE '/node/216524') OR (source::text ILIKE '/node/209767') OR (source::text ILIKE '/user/24341') OR (source::text ILIKE '/node/216536') OR (source::text ILIKE '/node/135065') OR (source::text ILIKE '/user/53277') OR (source::text ILIKE '/node/173142') OR (source::text ILIKE '/user/30085') OR (source::text ILIKE '/node/188349') OR (source::text ILIKE '/user/69670') OR (source::text ILIKE '/node/147160') OR (source::text ILIKE '/node/147245') OR (source::text ILIKE '/node/174863') OR (source::text ILIKE '/user/59491') OR (source::text ILIKE '/node/216527') OR (source::text ILIKE '/node/174530') OR (source::text ILIKE '/node/178984') OR (source::text ILIKE '/node/154121') OR (source::text ILIKE '/node/172192') OR (source::text ILIKE '/node/216869') OR (source::text ILIKE '/user/93926') OR (source::text ILIKE '/node/216876') OR (source::text ILIKE '/node/216881') OR (source::text ILIKE '/user/93929') OR (source::text ILIKE '/node/215571') OR (source::text ILIKE '/user/58627') OR (source::text ILIKE '/taxonomy/term/50463') OR (source::text ILIKE '/node/202776') OR (source::text ILIKE '/user/47599') OR (source::text ILIKE '/node/210401') OR (source::text ILIKE '/node/215338') OR (source::text ILIKE '/node/87208') OR (source::text ILIKE '/node/190312') OR (source::text ILIKE '/node') OR (source::text ILIKE '/user/1')) AND (langcode IN ('en', 'und'))
        ORDER BY langcode ASC NULLS FIRST, pid ASC NULLS FIRST

Not sure what part of Drupal this issue is actually coming from path alias or PostgreSQL driver.

Proposed resolution

Add an index on the field "path" and add an index on the field "alias". The index should be created with the option "USING GIN" or "USING GIST". The "pg_trgm" extension should be installed and created, before the indexes can be created.

For more info over "GIN" vs "GIST" see: https://www.postgresql.org/docs/10/textsearch-indexes.html
For more info over this solution see: https://ferfebles.github.io/2018/04/16/Improving-large-Drupal-Postgres-p...

Remaining tasks

User interface changes

None

API changes

None

Data model changes

CommentFileSizeAuthor
#112 2988018-112--postgresql-path-alias-queries.patch16.15 KBmparker17
#107 2988018-107.patch2.2 KBkoolaidguy
#102 EXPLAIN taxonomy_term_field_data MariaDB.png55.4 KBkopeboy
#98 2988018-nr-bot.txt85 bytesneeds-review-queue-bot
#93 2988018-93.patch15.18 KBdaffie
#87 pgsql-2022-12-14 00-19-52.png74.5 KBandypost
#83 2988018-83.patch15.2 KBandypost
#83 interdiff.txt3.64 KBandypost
#81 interdiff-2988018-79-81.txt908 bytesdaffie
#81 2988018-81.patch15.07 KBdaffie
#80 2988018-79.patch15.06 KBdaffie
#77 2988018-77.patch15.59 KBdaffie
#74 like.10.0.x.grep_.txt6.85 KBjweowu
#73 like.grep_.txt6.3 KBjweowu
#71 2988018-71.patch14.08 KBdaffie
#71 interdiff-2988018-69-71.txt1.54 KBdaffie
#69 2988018-69.patch13.72 KBdaffie
#61 2988018-61.patch1.29 KBbappa.sarkar
#59 2988018-59.patch1.15 KBanmolgoyal74
#58 2988018-by-xurizaemon-rob.barnett-bzrudi71-RoS.patch1.54 KBsylus
#39 pathalias_postgres_ilike_performance.patch1.13 KBrob.barnett
#38 coreLibPath.patch4.08 KBTomGould01
#35 postgres_ilike_performance.patch1.96 KBrob.barnett
#18 2988018-15-18-interdiff.txt10.08 KBrosk0
#18 2988018-18.patch13.96 KBrosk0
#15 interdiff.txt1.32 KBbzrudi71
#15 drupal-2988018-aliases_use_strtolower_not_DB_LIKE-15.patch5.77 KBbzrudi71
#12 drupal-2988018-aliases_use_strtolower_not_DB_LIKE-12.patch4.75 KBxurizaemon
#9 drupal-2988018-aliases_use_strtolower_not_DB_LIKE-9.patch4.22 KBxurizaemon
#10 drupal-2988018-aliases_use_strtolower_not_DB_LIKE-10.patch4.22 KBxurizaemon
#2 drupal-2988018-aliases_use_strtolower_not_DB_LIKE.patch4.2 KBxurizaemon

Issue fork drupal-2988018

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

RoSk0 created an issue. See original summary.

xurizaemon’s picture

This issue appears both from site traffic and in other contexts (eg Drupal migrate scripts can trigger preloading of aliases per entity). The performance impact can be significant if Drupal is trying to look up aliases on a large-content, Postgres-backed site.

It is possible in PostgreSQL to add a pg_trgm index that will support Drupal's use of ILIKE to match case-insensitively - but this makes Drupal depend on DB configuration, and shouldn't really be necessary IMO. See #1518506-24: Normalize how case sensitivity is handled across database engines also re the pg_trgm approach.

I think relying on presence of a case-insensitive index is not the ideal solution because:

  • It means that Drupal/PgSQL is less performant out of the box / in default config, and
  • pg_trgm approach is still less performant than equality (sysadmin from another project affected by same: "From memory adding trigrams went from 40+ seconds to 13 or so, changing like to = went to sub-second")
  • additional work is repeated many times over the life of the content instead of only when aliases are created/updated, more CPU work is bad for the planet :)

If we ensure that aliases are stored lowercase, we can dispose of the additional case-insensitive index required to scan against.

Since Drupal currently handles aliases case-insensitively (#2075889: Make Drupal handle incoming paths in a case-insensitive fashion for routing), lowercasing the data will not change behaviour in locating aliased content via aliases. It would mean that uppercased characters in aliases can't be retrieved from that table though.

Attached is a patch which replaces alias LIKE '$input' with alias = strtolower($input), which I'm trialling locally.

There will be further work to do before this can be considerd (at least, lowercasing any existing aliases).

xurizaemon’s picture

Title: Performance issues with path alias AliasStorage::preloadPathAlias() generated queries on PostgreSQL » Performance issues with path alias generated queries on PostgreSQL

There are multiple methods which can land Drupal at this case-insensitive table scan, so removing AliasStorage::preloadPathAlias() from issue title.

jweowu’s picture

Issue tags: +Performance
bzrudi71’s picture

andypost’s picture

+++ b/core/lib/Drupal/Core/Path/AliasStorage.php
@@ -261,10 +262,10 @@ public function lookupPathSource($path, $langcode) {
-      ->condition('alias', $alias, 'LIKE');
+      ->condition('alias', strtolower($alias), '=');

@@ -290,12 +291,12 @@ public function lookupPathSource($path, $langcode) {
-    // Use LIKE and NOT LIKE for case-insensitive matching.
+    // Case-insensitive match by lowercasing input.
...
-      ->condition('alias', $this->connection->escapeLike($alias), 'LIKE')
+      ->condition('alias', strtolower($alias), '=')
...
-      $query->condition('source', $this->connection->escapeLike($source), 'NOT LIKE');
+      $query->condition('source', strtolower($source), '!=');

Alias could be utf-8 so better to use mb_strtolower()
Also new comment should explain about supposing case insensitive behavior

bzrudi71’s picture

Not bad... A quick testing shows some really, really major speed improvements. On a virtual testing instance we go down from 13sec! to 2sec when opening /admin/content. This is with only 78k url_aliases in use. Nice find @RoSk0

xurizaemon’s picture

Uses mb_strtolower() now.

@bzrudi17 yeah the contexts where Drupal can hit that ILIKE are more than I would have thought - eg I observed this when testing a user migration from D6, and loading 140K user accounts in Drupal 8 seemed to be taking many minutes to preload aliases for all of them ...

xurizaemon’s picture

Hopefully these updated comments will help explain the intent @andypost.

Also remembered to mb_strtolower() on save this time.

rosk0’s picture

Status: Active » Needs review
+++ b/core/lib/Drupal/Core/Path/AliasStorage.php
@@ -130,8 +130,8 @@ public function load($conditions) {
+        $select->condition($field, $this->connection->escapeLike(mb_strtolower($value)), '=');

Feels like should not escape here.

Lets test

xurizaemon’s picture

StatusFileSize
new4.75 KB

Agree on the escapeLike, thought I'd caught them all.

TBH I don't think this is ready for needs review yet, it needs an update hook to lowercase all existing aliases yet.

EDIT: And likely test coverage too. Test failure below should be expected at this stage.

Status: Needs review » Needs work
bzrudi71’s picture

Added upgrade path hook to lowercase all aliases.

bzrudi71’s picture

Issue summary: View changes
bzrudi71’s picture

For the failing tests I think we should:

1. Extend randomMachineName() to accept a second parameter to generate lowercase only strings.

  /**
   * Generates a unique random string containing letters and numbers.
   *
   * Do not use this method when testing unvalidated user input. Instead, use
   * \Drupal\simpletest\TestBase::randomString().
   *
   * @param int $length
   *   Length of random string to generate.
   *
   * @param bool $lowercase
   *   Generate lowercase string.
   *
   * @return string
   *   Randomly generated unique string.
   *
   * @see \Drupal\Component\Utility\Random::name()
   */
  protected function randomMachineName($length = 8, $lowercase = FALSE) {
    $randomString = $this->getRandomGenerator()->name($length, TRUE);
    return (!$lowercase) ? $randomString : mb_strtolower($randomString);
  }

2. Then make use of it where needed

Adding mb_strtolower() in tests looks dirty ;)

Thoughts ?

rosk0’s picture

Status: Needs work » Needs review
StatusFileSize
new13.96 KB
new10.08 KB

Lowercase behaviour is only required for failing tests so I don't feel like extending randomMachineName() is a good idea for this issue. However from my point of view randomMachineName() should produce only lower case in the first place so maybe it's a topic for another issue?

Let's lowercase generated aliases in failing tests.

Also added lowercasing for source in AliasStorage::save().

Status: Needs review » Needs work

The last submitted patch, 18: 2988018-18.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

rosk0’s picture

Issue tags: +Needs tests

We need to test update path. See \Drupal\FunctionalTests\Update\UpdatePathTestBase.

jweowu’s picture

Some questions to check on edge cases...

Without the patch, if you create an upper/mixed-case alias in D8 and ask for the url() for that path, does it currently produce the original case, or does it get down-cased? If it produces the original case, the changes could cause some unhappiness from site owners who were intentionally producing mixed-case friendly URLs for perceived extra readability, and would now get the lower-cased versions?

What about registering an upper/mixed-case internal path? Is that allowed? If so, can the proposed changes break things? (e.g. if you down-case the source in the alias table via an update hook, returning that lower-case source might be wrong; if you don't down-case the source in the alias table, then testing it with '=' against a lower-case value might be wrong.)

Is any of this a concern?

If so, should we instead be indexing the lower-case values, but retaining the original case in the column?

xurizaemon’s picture

Without the patch, if you create an upper/mixed-case alias in D8 and ask for the url() for that path, does it currently produce the original case, or does it get down-cased?

Yes, storing paths lowercased means that we lose the case. I guess we could store the preferred version separately if we wanted to retain case *and* avoid relying on DB indexes for case insensitive behaviour.

If so, should we instead be indexing the lower-case values, but retaining the original case in the column?

The two issues here (see comment 2 for shm's relayed comment on it) is that even with pg_trgm, a significant performance issue was still observed with case-insensitive alias indexes on Postgres. Currently Drupal 8 / Postgres ships with a performance issue out of the box because of this issue, and even with adding the index per-site it seems the performance doesn't compete with other engines, hence this proposed patch.

I wonder from your question if you had a different method of indexing lowercase values in mind though?

jweowu’s picture

I was thinking about explicitly indexing the lower-case value, as an expression index.

Querying with the same expression then uses the index.

Refer to https://www.postgresql.org/docs/9.1/indexes-expressional.html (which happens to provide the very example in question).

n.b. Offhand I do not know whether this approach is portable across other databases that Drupal cares about, nor how much of a hassle it would be to use this approach for some databases but not others.

xurizaemon’s picture

OK, that's another approach we could take then - it's mentioned in passing in ferfeble's article on pg_trgm (linked in comment 2 here).

Seems worthwhile comparing the performance of the three methods (expression index, lowercase, pg_trgm) side-by-side to see if expression indexes hold up better than pg_trgm (or if pg_trgm performs well), that method would be a lesser change than the currently proposed patch, we could add a requirements check that encourages / supports making the required changes at a DB level.

bzrudi71’s picture

Just noticed #2233595: Deprecate the custom path alias storage :) Path alias service ist going to be replaced by the entity system...

pounard’s picture

Oh god, why...

amateescu’s picture

catch’s picture

Forcing path aliases to lower case is a small feature regression that I'm not at all sure we can introduce in 8.x even if we wanted to - while I personally don't like them, there are sites out there using case-aware paths. Additionally since similar queries are done for usernames, I'm sure we wouldn't take the same approach of forcing all usernames to lower case.

Back in c.2006-7 before we had cross-database LIKE/ILIKE support, another option discussed was adding an additional column just for database lookups (so when rendering an alias, use the user-submitted data, but an additional lowercase-only column for queries).

irt adding this index, is this a case where a driver override could help? - see https://www.drupal.org/node/2306083
https://ferfebles.github.io/2018/04/16/Improving-large-Drupal-Postgres-p...

bzrudi71’s picture

After days of local testing we rolled out the pg_trgm index workaround to live system today. It's like an all new Drupal 8 experience, no more wait for admin content pages, all and everything is twice as fast (at least). As a side effect CPU usage dropped by 70% and server load goes down by 40-50%. I think we really need a solution here.
I like the idea of having an additional query only lowercased alias column as @catch mentioned in #28. Maybe this could already be done together with #3007661: Modernize the path alias system. @amateescu?

xurizaemon’s picture

pg_trgm didn't give as effective performance fix for us, but I'm glad it helped you @bzrudi71!

What advantages / differences would there be between an expression index (comment 23 above) and an additional lowercased column (28 above)?

jweowu’s picture

An additional column requires more storage, as it would itself need to be indexed; so as well as that index using approximately the same storage space which would have been needed to provide an expression index, you additionally have the storage for the column itself.

An extra column would work with database types which don't implement expression indexes.

My expectation is that, in Postgres, performance would be more or less equivalent; but not having tested I can't say for sure.

(In either case it's possible that the original column would no longer need to be indexed for core usage; but it's almost certainly unsafe to assume that contrib/custom code isn't using it, so I would expect that index does need to be retained.)

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

TomGould01’s picture

Hi, I'm using this patch as the LIKE queries on large vocabs are causing issues with my database writer.

The comments say that they only use LIKE queries to make them case insensitive but MySQL is not case sensitive so this works just fine for me so far

Can anyone see any issue with this aproach for a one off install using composer to manage the patching on deployments ?

diff --git a/core/lib/Drupal/Core/Path/AliasStorage.php b/core/lib/Drupal/Core/Path/AliasStorage.php
--- a/core/lib/Drupal/Core/Path/AliasStorage.php
+++ b/core/lib/Drupal/Core/Path/AliasStorage.php
@@ -129,13 +129,7 @@ class AliasStorage implements AliasStorageInterface {
   public function load($conditions) {
     $select = $this->connection->select(static::TABLE);
     foreach ($conditions as $field => $value) {
-      if ($field == 'source' || $field == 'alias') {
-        // Use LIKE for case-insensitive matching.
-        $select->condition($field, $this->connection->escapeLike($value), 'LIKE');
-      }
-      else {
-        $select->condition($field, $value);
-      }
+      $select->condition($field, $value);
     }
     try {
       return $select
@@ -158,13 +152,7 @@ class AliasStorage implements AliasStorageInterface {
     $path = $this->load($conditions);
     $query = $this->connection->delete(static::TABLE);
     foreach ($conditions as $field => $value) {
-      if ($field == 'source' || $field == 'alias') {
-        // Use LIKE for case-insensitive matching.
-        $query->condition($field, $this->connection->escapeLike($value), 'LIKE');
-      }
-      else {
-        $query->condition($field, $value);
-      }
+      $query->condition($field, $value);
     }
     try {
       $deleted = $query->execute();
@@ -188,11 +176,7 @@ class AliasStorage implements AliasStorageInterface {
       ->fields(static::TABLE, ['source', 'alias']);

     if (!empty($preloaded)) {
-      $conditions = new Condition('OR');
-      foreach ($preloaded as $preloaded_item) {
-        $conditions->condition('source', $this->connection->escapeLike($preloaded_item), 'LIKE');
-      }
-      $select->condition($conditions);
+        $select->condition('source', $preloaded, 'IN');
     }

     // Always get the language-specific alias before the language-neutral one.
@@ -232,7 +216,7 @@ class AliasStorage implements AliasStorageInterface {
     // See the queries above. Use LIKE for case-insensitive matching.
     $select = $this->connection->select(static::TABLE)
       ->fields(static::TABLE, ['alias'])
-      ->condition('source', $source, 'LIKE');
+      ->condition('source', $source);
     if ($langcode == LanguageInterface::LANGCODE_NOT_SPECIFIED) {
       array_pop($langcode_list);
     }
@@ -261,10 +245,9 @@ class AliasStorage implements AliasStorageInterface {
     $alias = $this->connection->escapeLike($path);
     $langcode_list = [$langcode, LanguageInterface::LANGCODE_NOT_SPECIFIED];

-    // See the queries above. Use LIKE for case-insensitive matching.
     $select = $this->connection->select(static::TABLE)
       ->fields(static::TABLE, ['source'])
-      ->condition('alias', $alias, 'LIKE');
+      ->condition('alias', $alias);
     if ($langcode == LanguageInterface::LANGCODE_NOT_SPECIFIED) {
       array_pop($langcode_list);
     }
@@ -290,12 +273,11 @@ class AliasStorage implements AliasStorageInterface {
    * {@inheritdoc}
    */
   public function aliasExists($alias, $langcode, $source = NULL) {
-    // Use LIKE and NOT LIKE for case-insensitive matching.
     $query = $this->connection->select(static::TABLE)
-      ->condition('alias', $this->connection->escapeLike($alias), 'LIKE')
+      ->condition('alias',$alias)
       ->condition('langcode', $langcode);
     if (!empty($source)) {
-      $query->condition('source', $this->connection->escapeLike($source), 'NOT LIKE');
+        $query->condition('source', $source, '!=');
     }
     $query->addExpression('1');
     $query->range(0, 1);
rob.barnett’s picture

StatusFileSize
new1.96 KB

I've been using the patch in comment #18 for some time, however, today when upgrading core to 8.8.0, the patch was not able to be applied. This makes sense due to the changes to url aliases in latest core minor release.
I created a new patch on the file AliasRepository.php which greatly reduces the load time of pages. I found that AliasStorage.php was never getting hit with breakpoints. This still needs some work as I did not make any changes to tests. So additional work is welcome.

rob.barnett’s picture

I realized that, while my patch speeds up page loads, with the Redirect module in place, it simply redirects from path aliases back to the system path so I need to look into this. I removed the patch and url aliases loaded again. After re-applying the patch though, everything seems to be working. This definitely needs some more eyes.

rob.barnett’s picture

After a lot more testing, my patch is breaking pages loaded by url alias. System paths are loaded instead.
I've opted to use the pg_trgm PostgreSQL module by doing the following:

CREATE INDEX path_alias__alias_trgm_gist_idx ON path_alias USING gist (alias gist_trgm_ops);
CREATE INDEX path_alias__path_trgm_gist_idx ON path_alias USING gist (path gist_trgm_ops);
ANALYZE path_alias;
TomGould01’s picture

StatusFileSize
new4.08 KB

@RobBarnett

I'm using this one now as the Drupal upgrade broke the last one

I'm not expecting it to pass all tests as it's just for non case sensitive databases

Hope this helps

PS tests failed due to not being able to apply patch, not sure why, it works fine on my composer update/install
15:35:18 fatal: corrupt patch at line 96
The very last line which is empty failed the patch :(

rob.barnett’s picture

StatusFileSize
new1.13 KB

Thanks @TomGould01. I'll checkout your patch. It would be nice to have a drupal-only solution. In addition to the PostgreSQL pg_trgm module, I also tweaked my patch, and I'm successfully using both. I'm attaching the drupal patch here.

TomGould01’s picture

That's cool, A combination of both the strtolower and my one would work on all the databases and remove all those unnecessary LIKE operators :+1

daffie’s picture

Status: Needs work » Needs review

Lets see what the testbot thinks about the latest patch.

daffie’s picture

@rob.barnett: Your patch fails the Drupal\FunctionalTests\Routing\PathEncodedTest and the Drupal\Tests\path_alias\Kernel\AliasTest.

daffie’s picture

Status: Needs review » Needs work

Back to needs work.

daffie’s picture

An approach that works is to try to create the extension pg_trgm. If we can then we shall use it. If not then we use the old code.

Todo:
- try to enable the extension as early as possible in the PotgreSQL Connection class. We like to use the extension also in other places;
- create a method in the PostgreSQL Connection class that tests if the extension is enabled (See: https://www.postgresql.org/docs/9.1/view-pg-available-extensions.html);
- create the 2 indexes if the extension is enabled on the alias and source columns.
- test if ANALIZE url_alias needs to run.
- we need to write a change record with the advice that (larger) PostgreSQL sites will be faster with the extension pg_trgm.

Maybe we need to make sure that we have a testbot with and one without the pg_trgm extension.

See: https://ferfebles.github.io/2018/04/16/Improving-large-Drupal-Postgres-p...

Edit: The other thing we can do is to test the value for having SQL wildcard-characters. If they are there then use LIKE-operator, and if they are not then use the equal-operator. The last option is the one that we have most of the time and is a lot quicker then the one with the LIKE-operator.

Both solutions combined should fix the problem.

catch’s picture

If we're not changing functionality, this doesn't need product manager review.

There was an issue I saw recently talking about revisiting the LIKE/ILIKE case insensitivity implementation, but I can't immediately find it.

daffie’s picture

Status: Needs work » Postponed
Related issues: +#3104007: Postgresql 12 DrupalCI environment needed for core testing

To be able to test a patch for this issue a new testbot with the extension pg_trgm is needed. Postponing untill there is such a testbot. See: #3104007: Postgresql 12 DrupalCI environment needed for core testing.

andypost’s picture

Status: Postponed » Needs work
daffie’s picture

Status: Needs work » Postponed

We are still waiting on a testbot for this issue. See: #3104007: Postgresql 12 DrupalCI environment needed for core testing.

daffie’s picture

Component: base system » postgresql db driver
catch’s picture

It's worth noting the prehistory of this issue.

Before the current LIKE/ILIKE implementation of this query, Drupal used to run LOWER() on path alias queries instead. We change from LOWER() to LIKE() so that MySQL would use an index, while postgres would still correctly handle case insensitivity.

However before we settled on that approach, there were multiple issues trying to solve it - most linked from here #279851: Replace LOWER() with db_select() and LIKE() where possible.
One approach we tried to take, but was rejected at the time, was adding an additional case-insensitive column to the table, so that queries could use a regular = comparison. It might still be worth revisiting that.

For example #83738: LOWER(name) queries perform badly; add column name_lower and index. for usernames.

Obviously this wouldn't help with queries that actually need LIKE.

daffie’s picture

Status: Postponed » Needs review

We have now a testbot with PostgreSQL 12. See #3104007: Postgresql 12 DrupalCI environment needed for core testing.

catch’s picture

Status: Needs review » Postponed

Explicitly postponing this on DrupalCI support for contributed database drivers. Or possibly #3128699: Testing issue for pgsql_fallback or regular travisci testing.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

daffie’s picture

Issue tags: +Bug Smash Initiative

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

daffie’s picture

Status: Postponed » Needs work

Talked to @catch on Slack and the testbot run in #3128699: Testing issue for pgsql_fallback with the contrib database driver from https://www.drupal.org/project/pgsql_fallback is good enough for him. Therefore changing the status for this issue to "needs work".

daffie’s picture

The problem is that the extension pg_trgm needed to be created and for that the database user has to have at least superuser privileges. The database user in the testbot does not have those privileges and therefore the pg_trgm extension needs to already be created on the database testbot envirionment. See: #3186676: Create the pg_trgm extension on PostgreSQL 10.12 and 12.1 database environments

sylus’s picture

Here is a patch against D9 9.0.x

anmolgoyal74’s picture

Status: Needs work » Needs review
StatusFileSize
new1.15 KB

Re-rolled for 9.2.x

daffie’s picture

Issue summary: View changes
Status: Needs review » Postponed

Before we can continue with this issue the testbot needs to be updated. See #3186676: Create the pg_trgm extension on PostgreSQL 10.12 and 12.1 database environments.

bappa.sarkar’s picture

StatusFileSize
new1.29 KB

More performant patch

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

daffie’s picture

Status: Postponed » Needs work
daffie’s picture

Version: 9.3.x-dev » 10.0.x-dev

This will have to wait until D10, because in D10 will the creation of the pg_trgm extension be required.

ramadan35’s picture

jweowu’s picture

This note is just to encourage everyone to add the pg_trgm indexes to their database if possible, regardless of patches.

I've been using the patch from #61 for a while, but noticed a pathological case which definitely wasn't being helped (this was the alternative View-based handler for /admin/reports/dblog which displays 50 log entries per page, and issues 50 extremely slow path_alias queries in the process of rendering such a page). Disabling the View is one solution (the default non-View handler is efficient), but I added the pg_trgm indexes to test the performance of that View (which improved by at least an order of magnitude).

Having done that, I tested the rest of the site and observed that uncached page loads in general were about twice as fast as before! That's huge.

So if you're only using the patch and you can add pg_trgm indexes, I strongly suspect you should do so.

Edit: This was with a path_alias table of around 1.3M rows, which was heavily-bloated due to the pathauto bug (see next comment) in conjunction with a nightly content synchronisation feature (a large number of nodes had been unexpectedly getting a new duplicate path_alias row every night over a long period of time).

jweowu’s picture

For pathauto users, #3107144: Duplicate alias entities created with 'Create a new alias. Leave the existing alias functioning' setting may be bloating the path_alias table and exacerbating the performance issues.

daffie’s picture

Hi @jweowu and others, great to see that you are all interested in making Drupal on PostgreSQL better. I am trying to fix this issue and for that we will make the pg_trgm extension in Drupal 10 required. To make that happen we need to create a warning on Drupal 9 sites on PostgreSQL that in Drupal 10 the extension will be required. The issue for this is #3214921: Add a requirements warning in Drupal 9 when PostgreSQL is used and the pg_trgm extension is not created. Could somebody do a review for that issue. Getting a review for it is a bit of a problem as most Drupal core developers are not using PostgreSQL. All that is needed is to apply the patch and to test that without the extension enabled you get a warning on the page "admin/reports/status" and when the extension is enabled you do not get the warning. I shall do my best to get a Drupal core commit credit for the person who does the testing and changes the status to RTBC.

What I also would like to know if there are other places in core where we should also add trgm indexes.

daffie’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
Related issues: +#3214922: Add a requirements error in Drupal 10 when PostgreSQL is used and the pg_trgm extension is not installed or created
StatusFileSize
new13.72 KB

#3214922: Add a requirements error in Drupal 10 when PostgreSQL is used and the pg_trgm extension is not installed or created has landed. Therefor in D10 we can fix this issue.

Added GIST indexes the tables "path_alias" and "path_alias_revision" have on the columns "path" and "alias" a GIST index.
Added a hook_update_N function to update existing sites.
Added the possibility to create and drop GIST indexes on PostgreSQL databases. The Schema API does not allow to specify what kind of index it is. Therefor I made the decision that when the name of an index ends with _gist it is a GIST index. Better or other proposals to solve this are welcome.
I added testing for all of the above.

jweowu’s picture

Status: Needs review » Needs work

Thanks daffie!

I have two suggestions wrt this update hook:

/**
 * Change the path_alias and path_alias_revision tables to add GIST indexes.
 */
function pgsql_update_10001(&$sandbox) {
  $connection = \Drupal::database();
  $schema = $connection->schema();
  if ($connection->databaseType() === 'pgsql') {
    // Add GIST indexes to make queries on the columns 'path' and 'alias'
    // faster.
    // @see https://www.postgresql.org/docs/10/textsearch-indexes.html
    if ($schema->tableExists('path_alias')) {
      $schema->addIndex('path_alias', 'path_alias__path_gist', ['path'], []);
      $schema->addIndex('path_alias', 'path_alias__alias_gist', ['alias'], []);
    }
    if ($schema->tableExists('path_alias_revision')) {
      $schema->addIndex('path_alias_revision', 'path_alias_revision__path_gist', ['path'], []);
      $schema->addIndex('path_alias_revision', 'path_alias_revision__alias_gist', ['alias'], []);
    }
  }
}

1. Make the update hook check for the existence of these indexes, and not attempt to add them when they already exist. Many sites will have already added them on account of this issue.

2. Related to 1, consider using the naming scheme that has been seen previously in this issue, to increase the chances of detecting pre-existing indexes.

Previously people will have seen:

* https://ferfebles.github.io/2018/04/16/Improving-large-Drupal-Postgres-p... (back when the table was "url_alias")
* https://www.drupal.org/project/drupal/issues/2988018#comment-13389583 (for the "path_alias" table)

In both cases the suggested names were TABLE__alias_trgm_gist_idx and TABLE__path_trgm_gist_idx so I suspect many of us followed suit and used those same names.

Your names are missing the _trgm component, so won't match anyone who did that.

Arguably most of the people in that situation will be following this issue, however, so if the revised name is preferred then we could always suggest that everyone 'renames' their indexes (delete + add, IIRC) to match what's been arrived at for D10.

Regardless of 2, I think 1 is important, otherwise addIndex() is going to throw DatabaseSchemaObjectExistsException, so I'm setting this to Needs Work on that basis.

daffie’s picture

Status: Needs work » Needs review
StatusFileSize
new1.54 KB
new14.08 KB

@jweowu: Thank you for your review.

For #70.1: Fixed.

For #70.2: About the naming of the GIST indexes. The original index names end with "_idx" and that does not work with my new convention that GIST indexes need to end with "_gist".
When you have existing GIST indexes on the "path_alias" table, you will now get them added again, only with a different name. When you have used the old naming system it will be easy to see which are the new ones and which are the old ones. That will make it easy to see which ones need to be dropped. Therefor to me it is better that GIST indexes names are different. But I have to admit that is not an ideal solution.

@jweowu: Do you know if there are any other places in core where it would be great to add a GIST index?

Status: Needs review » Needs work

The last submitted patch, 71: 2988018-71.patch, failed testing. View results

jweowu’s picture

StatusFileSize
new6.3 KB

Do you know if there are any other places in core where it would be great to add a GIST index?

Not offhand, but I've just grepped core for 'LIKE' and am attaching that file. (Edit: I did that in Drupal 9.2.13, rather than 10.0.x.)

The original index names end with "_idx" and that does not work with my new convention that GIST indexes need to end with "_gist".

Oops. When I looked at that code I was assuming (badly) that addIndex() was adding/enforcing the normal _idx suffix, so I misunderstood the "name of an index ends with _gist" part.

When you have used the old naming system it will be easy to see which are the new ones and which are the old ones. That will make it easy to see which ones need to be dropped. Therefor to me it is better that GIST indexes names are different. But I have to admit that is not an ideal solution.

I'm wondering whether we can query the existing indexes by definition irrespective of name, and automatically remove anything which is identical to the new ones. I believe that Postgres doesn't provide any way of explicitly using an index by name, so I don't think automatically deleting the old ones could lead to problems for anyone, given that we've added a replacement.

Proof of concept:

SELECT indexname
FROM pg_indexes
WHERE schemaname = 'public'
AND tablename = 'path_alias'
AND indexdef LIKE 'CREATE INDEX % ON public.path_alias USING gist (path gist_trgm_ops)'

          indexname            
--------------------------------
 path_alias__path_trgm_gist_idx
(1 row)

And if we had the names, we could even rename rather than remove:

ALTER INDEX [ IF EXISTS ] name RENAME TO new_name

Which also means that a cheap-but-fairly-effective alternative to querying index names would be to simply ALTER INDEX IF EXISTS path_alias__path_trgm_gist_idx RENAME TO path_alias__path_gist as a first step, and then we're at least catching the cases where people had followed this issue and used the same names that had appeared here.

(I guess all of this stuff falls under "extra credit", but it'd be a nice-to-have.)


We don't need to ANALYZE the tables after adding the new indexes, do we? I was going to suggest it, but on review I don't believe it makes a difference for this update scenario (i.e. when the actual table contents haven't changed). The postgres docs for ANALYZE don't mention indexes, and https://andreigridnev.com/blog/2016-04-01-analyze-reindex-vacuum-in-post... says directly that it isn't interested in them. I suppose it might help if someone hadn't been running an autovacuum, but that seems very much out of scope. (I did analyze the tables when I did this, but I'd also been purging a huge number of duplicate alias rows, so it made good sense on that occasion.)

jweowu’s picture

StatusFileSize
new6.85 KB

Attaching the same grep in the current 10.0.x codebase, FWIW.

catch’s picture

If someone's added indexes to the table manually, they can also remove them manually.

It would be worth adding a check for indexes with exactly the same name though, but I don't think we need to go looking.

daffie’s picture

Status: Needs work » Needs review

If someone's added indexes to the table manually, they can also remove them manually.

Therefor #70.2 is now a will not be fixed. Moving it back to NR.

daffie’s picture

StatusFileSize
new15.59 KB

I have rerolled the patch.

daffie’s picture

Version: 10.0.x-dev » 10.1.x-dev
andypost’s picture

Status: Needs review » Needs work
Issue tags: +Needs reroll

it went wrong

daffie’s picture

Status: Needs work » Needs review
Issue tags: -Needs reroll
StatusFileSize
new15.06 KB

Second try.

daffie’s picture

StatusFileSize
new15.07 KB
new908 bytes

Fixed the style guide errors.

andypost’s picture

Looks mostly ready, just nits

  1. +++ b/core/modules/pgsql/pgsql.install
    @@ -0,0 +1,31 @@
    +function pgsql_update_10101(&$sandbox) {
    +  $connection = \Drupal::database();
    +  $schema = $connection->schema();
    +  if ($connection->databaseType() === 'pgsql') {
    

    $schema could be moved to inside of condition

  2. +++ b/core/modules/pgsql/pgsql.module
    @@ -20,3 +20,13 @@ function pgsql_help($route_name, RouteMatchInterface $route_match) {
    +    $entity_types['path_alias']->setHandlerClass('storage_schema', 'Drupal\pgsql\PathAliasStorageSchema');
    

    Please use PathAliasStorageSchema::class to easy to find

  3. +++ b/core/modules/pgsql/src/Driver/Database/pgsql/Schema.php
    @@ -301,7 +302,10 @@ protected function createTableSql($name, $table) {
    +        $type = (substr($key_name, -5) == '_gist') ? 'gist' : 'idx';
    
    @@ -706,8 +710,13 @@ public function fieldExists($table, $column) {
    +    $type = (substr($name, -5) == '_gist') ? 'gist' : 'idx';
    
    @@ -823,7 +832,10 @@ public function addIndex($table, $name, $fields, array $spec) {
    +    $type = (substr($name, -5) == '_gist') ? 'gist' : 'idx';
    
    @@ -835,7 +847,10 @@ public function dropIndex($table, $name) {
    +    $type = (substr($name, -5) == '_gist') ? 'gist' : 'idx';
    

    There's https://www.php.net/manual/en/function.str-ends-with.php in PHP 8+

andypost’s picture

StatusFileSize
new3.64 KB
new15.2 KB

Fix #82

daffie’s picture

@andypost: All the changes you made in the patch from comment #83 are for me RTBC. If you can review the rest of the patch, you are free to change the status to RTBC.

andypost’s picture

I gonna test it tonight on custom site as it has lots of aliases

andypost’s picture

Status: Needs review » Reviewed & tested by the community

Upgrade works great!

I used composer install drupal/webprofiler and now (for /admin/content with 50 nodes) query (151 times) looks the same

 SELECT "base_table"."id" AS "id", "base_table"."path" AS "path", "base_table"."alias" AS "alias", "base_table"."langcode" AS "langcode" FROM "path_alias" "base_table" WHERE ("base_table"."status" = :db_condition_placeholder_0) AND ("base_table"."alias"::text ILIKE :db_condition_placeholder_1) AND ("base_table"."langcode" IN (:db_condition_placeholder_2, :db_condition_placeholder_3)) ORDER BY "base_table"."langcode" ASC NULLS FIRST, "base_table"."id" DESC NULLS LAST 

explain is

db=# explain SELECT "base_table"."id" AS "id", "base_table"."path" AS "path", "base_table"."alias" AS "alias", "base_table"."langcode" AS "langcode" FROM "path_alias" "base_table" WHERE ("base_table"."status" = 1) AND ("base_table"."alias"::text ILIKE '/node/168') AND ("base_table"."langcode" IN ('en', 'und')) ORDER BY "base_table"."langcode" ASC NULLS FIRST, "base_table"."id" DESC NULLS LAST ;
Sort  (cost=8.21..8.21 rows=1 width=43)
  Sort Key: langcode NULLS FIRST, id DESC NULLS LAST
  ->  Seq Scan on path_alias base_table  (cost=0.00..8.20 rows=1 width=43)
        Filter: (((alias)::text ~~* '/node/168'::text) AND ((langcode)::text = ANY ('{en,und}'::text[])) AND (status = 1))

indexes

public  path_alias____pkey      index   db      path_alias
public  path_alias__path_alias__alias_gist__gist        index   db      path_alias
public  path_alias__path_alias__alias_langcode_id_status__idx   index   db      path_alias
public  path_alias__path_alias__path_gist__gist index   db      path_alias
public  path_alias__path_alias__path_langcode_id_status__idx    index   db      path_alias
public  path_alias__path_alias__revision_id__key        index   db      path_alias
public  path_alias__path_alias__status__idx     index   db      path_alias
public  path_alias__path_alias_field__uuid__value__key  index   db      path_alias
public  path_alias_revision____pkey     index   db      path_alias_revision
public  path_alias_revision__path_alias__id__idx        index   db      path_alias_revision
public  path_alias_revision__path_alias_revision__alias_gist__gist      index   db      path_alias_revision
public  path_alias_revision__path_alias_revision__path_gist__gist       index   db      path_alias_revision
andypost’s picture

StatusFileSize
new74.5 KB

Time: 22.34 ms Caller: D\p\AliasRepository::preloadPathAlias Database: default Target: default remains mostly the same for gist index type (the query is generated from there

SELECT "base_table"."path" AS "path", "base_table"."alias" AS "alias" FROM "path_alias" "base_table" WHERE ("base_table"."status" = :db_condition_placeholder_0) AND (("base_table"."path"::text ILIKE :db_condition_placeholder_1) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_2) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_3) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_4) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_5) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_6) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_7) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_8) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_9) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_10) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_11) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_12) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_13) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_14) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_15) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_16) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_17) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_18) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_19) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_20) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_21) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_22) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_23) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_24) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_25) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_26) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_27) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_28) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_29) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_30) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_31) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_32) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_33) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_34) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_35) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_36) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_37) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_38) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_39) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_40) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_41) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_42) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_43) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_44) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_45) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_46) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_47) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_48) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_49) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_50) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_51) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_52) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_53) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_54) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_55) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_56) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_57) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_58) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_59) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_60) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_61) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_62) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_63) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_64) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_65) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_66) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_67) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_68) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_69) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_70) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_71) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_72) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_73) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_74) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_75) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_76) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_77) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_78) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_79) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_80) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_81) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_82) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_83) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_84) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_85) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_86) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_87) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_88) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_89) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_90) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_91) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_92) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_93) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_94) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_95) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_96) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_97) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_98) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_99) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_100) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_101) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_102) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_103) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_104) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_105) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_106) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_107) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_108) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_109) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_110) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_111) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_112) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_113) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_114) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_115) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_116) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_117) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_118) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_119) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_120) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_121) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_122) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_123) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_124) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_125) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_126) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_127) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_128) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_129) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_130) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_131) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_132) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_133) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_134) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_135) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_136) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_137) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_138) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_139) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_140) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_141) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_142) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_143) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_144) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_145) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_146) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_147) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_148) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_149) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_150) OR ("base_table"."path"::text ILIKE :db_condition_placeholder_151)) AND ("base_table"."langcode" IN (:db_condition_placeholder_152, :db_condition_placeholder_153)) ORDER BY "base_table"."langcode" ASC NULLS FIRST, "base_table"."id" ASC NULLS FIRST

andypost’s picture

In median the query became faster from 24ms to 22ms

samlerner’s picture

I would love for this this to get committed. I was having performance issues on a site with 260,000+ path aliases, and using 2.0@beta caused a huge performance improvement. Then, I upgraded to the 2.0 stable version, not realizing this hadn't been committed, and I immediately started seeing a huge increase in the time spent on queries to path_alias.

I'm going back to 2.0@beta for now, but I can't wait until this can be merged!

andypost’s picture

@SamLerner which module you mean pathauto?

samlerner’s picture

@andypost apologies, I conflated this issue with one about path.module - https://www.drupal.org/node/3164889

Please ignore my earlier comment. Thanks!

larowlan’s picture

Status: Reviewed & tested by the community » Needs work

Looks to be failing to apply, and there's some fails on postgres, haven't reviewed the patch yet

daffie’s picture

Status: Needs work » Reviewed & tested by the community
StatusFileSize
new15.18 KB

Rerolled the patch.

poker10’s picture

Thanks for the great work here @daffie, @andypost and others! We are also testing the patch from #83 on a few sites with lot of content and there is a visible performance gain for us.

Noticed a small comment issue, but probably can be fixed on commit (if it is not an intentional change). Each PostgreSQL docs link is referencing to the PostgreSQL 12, but this one was changed to version 10:

@@ -706,8 +710,13 @@ public function fieldExists($table, $column) {
   public function indexExists($table, $name) {
-    // Details https://www.postgresql.org/docs/12/view-pg-indexes.html
     ...
+    // Details https://www.postgresql.org/docs/10/view-pg-indexes.html
alexpott’s picture

Status: Reviewed & tested by the community » Needs review

Why did we chose GIST over GIN when the docs say that GIN is preferred? I read all the comments from when the GSIT functionality was introduced and I can't see it.

I wish there was a way to implement this without using magic index names. Magic like this always results in surprises. Can't we add something like another key to the schema that has the same structure as 'indexes' but is 'gists'?

Regardless of whether we do the magic index naming or add the new key to the schema we need to document this somewhere.

Actually I have a really good reason to add a new key to schema for this. If you create an index in a module that expects all databases to perform the same and it has _gist it will do different things if we make the change as is in #93. Whereas if we add a special 'gist' key that is only supported by postgres then a module can add a gist that will, as expected, only affect postgres and does not have to have a special postgres adapter module to override it's own storage implementation. It could just have something like:

    // Add GIST indexes to make queries on the columns 'alias' and 'source'
    // faster on PostgreSQL.
    // @see https://www.postgresql.org/docs/12/textsearch-indexes.html
    $schema[$this->storage->getBaseTable()]['gists'] += [
      'path_alias__alias' => ['alias'],
      'path_alias__path' => ['path'],
    ];
    if ($revision_table = $this->storage->getRevisionTable()) {
      $schema[$revision_table]['gists'] += [
        'path_alias_revision__alias' => ['alias'],
        'path_alias_revision__path' => ['path'],
      ];
    }
daffie’s picture

@alexpott: Moving the GIST/GIN indexes to their own key in the schema API, will result in PostgreSQL to have to override the class Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema. As that class creates the entity tables and indexes. If we do that we will break contrib module on PostgreSQL that have content entities that override the base storage schema class.
What we can do is create a new service called Drupal\Core\Entity\Sql\SqlContentEntityStorageSchemaService that do everything that the class Drupal\Core\Entity\Sql\SqlContentEntityStorageSchem is now doing. The old class will than become a thin wrapper on top of the service. Database driver modules can than override the service and do what they need to do. Every contrib/custom module that changes content entity storage schema class will keeps working and there will be no BC break.

Edit: With #3257405: Move the MySQL exception in UserStorage to the mysql module do we have the same problem, only than with the Drupal\Core\Entity\Sql\SqlContentEntityStorage.

alexpott’s picture

I think we might not need to override SqlContentEntityStorageSchema. Looking at addIndex we have the full table spec there - so maybe

    // Add GIST indexes to make queries on the columns 'alias' and 'source'
    // faster on PostgreSQL.
    // @see https://www.postgresql.org/docs/12/textsearch-indexes.html
    $schema[$this->storage->getBaseTable()]['indexes'] += [
      'path_alias__alias' => ['alias'],
      'path_alias__path' => ['path'],
    ];
    $schema[$this->storage->getBaseTable()]['gists'] = array_merge($schema[$this->storage->getBaseTable()]['gists'] ?? [], ['path_alias__alias', 'path_alias__path']);
    if ($revision_table = $this->storage->getRevisionTable()) {
      $schema[$revision_table]['indexes'] += [
        'path_alias_revision__alias' => ['alias'],
        'path_alias_revision__path' => ['path'],
      ];
      $schema[$revision_table]['gists'] = array_merge($schema[$revision_table]['gists'] ?? [], ['path_alias_revision__alias', 'path_alias_revision__path'];
    }

And we can use this info in the addIndex() in postrgres. At least it is explicit and not magical. In order to make this less painful we could implement a helper static method in Postgres's schema to add a gist index. So the code becomes something like:

$schema[$this->storage->getBaseTable()] = Schema::addGistsToSchema([
      'path_alias__alias' => ['alias'],
      'path_alias__path' => ['path'],
    ], $schema[$this->storage->getBaseTable()]);

// Add then in the \Drupal\pgsql\Driver\Database\pgsql\Schema class...

public static function addGistsToSchema(array $gists, array $table_spec) {
  $table_spec['indexes'] += $gists;
  $table_spec['gists'] = array_merge($table_spec['gists'] ?? [], array_keys($gists));
  return $table_spec;
}

Yet another way would be to use an object with array access to denote it is a gist index - but this is a way bigger API change. That said at some point table specifications should move away from ArrayPI and this at least gives us a requirement.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new85 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

kopeboy’s picture

Is this only related to Postgres? I see the same type of (long list of) path queries in Views with MariaDB: even if there are not aliases for the displayed entity fields, just activate the link to the entity for the node title (or term name) or the entity operations links and violà.

SELECT "base_table"."path" AS "path", "base_table"."alias" AS "alias"
FROM
"path_alias" "base_table"
WHERE ("base_table"."status" = 1) AND (("base_table"."path" LIKE /taxonomy/term/1423 ESCAPE '\\') OR ("base_table"."path" LIKE /taxonomy/term/1423/edit ESCAPE '\\') OR ("base_table"."path" LIKE /taxonomy/term/1423/delete ESCAPE '\\') OR ("base_table"."path" LIKE /taxonomy/term/1429 ESCAPE '\\') OR ("base_table"."path" LIKE /taxonomy/term/1429/edit ESCAPE '\\') OR ("base_table"."path" LIKE /taxonomy/term/1429/delete ESCAPE '\\') OR ("base_table"."path" LIKE /taxonomy/term/160 ESCAPE '\\') OR ("base_table"."path" LIKE /taxonomy/term/2052 ESCAPE '\\') OR ("base_table"."path" LIKE /taxonomy/term/2052/edit ESCAPE '\\') OR ("base_table"."path" LIKE /taxonomy/term/2052/delete ESCAPE '\\') OR...
daffie’s picture

@kopeboy: To test out if you are using an index, you should add "EXPLAIN " in front of the query and run it in something like "phpMyAdmin". For more info: https://mariadb.com/kb/en/explain/

kopeboy’s picture

Not sure.. are you saying I should add an index to taxonomy term tables that is not already in core?!

EXPLAIN taxonomy_term_field_data on MariaDB

daffie’s picture

@kopeboy Something like:

EXPLAIN SELECT "base_table"."path" AS "path", "base_table"."alias" AS "alias"
FROM path_alias base_table
WHERE (base_table.status = 1) AND (base_table.path LIKE '/taxonomy/term/1423' ESCAPE '\')
bradjones1’s picture

Title: Performance issues with path alias generated queries on PostgreSQL » Support GIN/GIST index creation for PostgreSQL and use it on path alias queries
Related issues: +#3343634: Add "json" as core data type to Schema and Database API

Dropping in here from #3343634: Add "json" as core data type to Schema and Database API, where we need GIN index creation support for JSON path query-based indexes.

I hope this helps sway the discussion here regarding whether to use GIN or GiST; the former also allows for JSON data and would enable broader functionality in the DB API.

Updating the issue title to better reflect that this is an index-creation issue first, and also includes a specific performance improvement in path aliasing.

For what it's worth, GIN support for JSON is limited to certain operators, mostly when it comes to object/array/scalar "containment" and key-exists type queries. But, if we're going to support these types of indexes in Postgres, it would be great to also leverage GIN for power users/if we end up supporting containment queries at some point in the future.

daffie’s picture

Title: Support GIN/GIST index creation for PostgreSQL and use it on path alias queries » Performance issues with path alias generated queries on PostgreSQL

As the GIN indexes do to working for path aliases, the solution will be adding a GIST index. As the best solution for JSON fields is a GIN index, we should add them both to Drupal. Each having its own use case.

bradjones1’s picture

Re: #105 yes that's a good plan, however I had renamed this issue with the intent that a similar index-creation logic could be extended to GIN as well. I don't think it makes sense to have a separate issue that addresses pgsql index creation if the approach (special case by name) is going to be the same?

koolaidguy’s picture

StatusFileSize
new2.2 KB

I've attached a patch re-rolled for Drupal 10.1.5 as the other patches on this thread weren't applying.

poker10’s picture

@KoolAidGuy The patch #107 is missing important parts from the last working patch #93 and it does not seems to be based on the patch #93 either. When doing rerolls please be carefull that the new patch does not loose any code, otherwise such rerolls cause only distraction. Also it is preffered to use merge requests these days (which could be a good idea also here, as it seems that this issue will require some additional changes based on #97). Thanks!

I am hiding the patch #107.

bradjones1’s picture

Title: Performance issues with path alias generated queries on PostgreSQL » [PP-1] Performance issues with path alias generated queries on PostgreSQL
Status: Needs work » Postponed

Index creation (for both GIN and GIST, as there is now a concrete use case for each) is broken out into #3397622: Adding GIN and GIST indexes to PostgreSQL databases.

bradjones1’s picture

I have an MR for handling creation of both GIN and GIST indexes over at #3397622: Adding GIN and GIST indexes to PostgreSQL databases - please take a look.

bradjones1’s picture

#3397622: Adding GIN and GIST indexes to PostgreSQL databases has test coverage and is in Needs Review if anyone here wants to move that along to unblock this.

mparker17’s picture

I've attempted to re-roll the patch from #93 onto 11.x at commit f560c83a1c (from 2024-05-15).

mparker17’s picture

@bradjones1 would it be helpful to update this patch to use the API in #3397622: Adding GIN and GIST indexes to PostgreSQL databases, given that it is RTBC?

Any suggestions for how else I could help move this issue forward?

mradcliffe’s picture

@mparker17 thanks for asking. I think the best thing is to start an issue fork and apply the patch you re-rolled into the issue fork branch and start a merge request.

Merge requests will soon be the only way to test and accept changes to Drupal core and contributed projects on drupal.org

bradjones1’s picture

Re: #13 you could apply the patch from that other issue to core using composer-patches and then use it in this issue, yes.

mparker17’s picture

> Re: #113 you could apply the patch from that other issue to core using composer-patches and then use it in this issue, yes.

I merged in the commits, but that idea sounds be a lot easier to review, so I'm going to try that shortly.

mparker17’s picture

Re: #113 you could apply the patch from that other issue to core using composer-patches and then use it in this issue, yes.

@bradjones1 Hmm... could you explain how to do this, or point me towards some documentation?


Aside: in order to change this for !8101, I'd have to drop all the commits in the #3397622: Adding GIN and GIST indexes to PostgreSQL databases branch (or re-create the branch), update composer.json to apply the patch, then re-apply the commits specific to this branch... which is fine, but I don't want to force-push until I know it's going to work, and local testing doesn't seem to work. For simplicity below, the logs will assume I'm re-creating the branch.


I've set up a dev environment using justafish/ddev-drupal-core-dev...

$ git clone --branch 11.x https://git.drupalcode.org/project/drupal.git d10-core
$ cd d10-core
$ ddev config --project-type=drupal10
$ ddev start
$ ddev corepack enable
$ ddev get justafish/ddev-drupal-core-dev
$ ddev restart
$ ddev composer install

... as I expected, it didn't seem like cweagans/composer-patches was required by anything...

$ ddev composer depends 'cweagans/composer-patches'

In BaseDependencyCommand.php line 100:

  Could not find package "cweagans/composer-patches" in your project  

... so I tried installing cweagans/composer-patches, but that failed because, I guess, core references itself in its own composer.json?

$ ddev composer require 'cweagans/composer-patches:^1'
./composer.json has been updated
Running composer update cweagans/composer-patches
> Drupal\Composer\Composer::ensureComposerVersion
Loading composer repositories with package information
Updating dependencies
Your requirements could not be resolved to an installable set of packages.

  Problem 1
    - Root composer.json requires drupal/core 1.0.0 (exact version match: 1.0.0 or 1.0.0.0), found drupal/core[dev-main] but it does not match the constraint.
  Problem 2
    - Root composer.json requires drupal/core-project-message 1.0.0 (exact version match: 1.0.0 or 1.0.0.0), found drupal/core-project-message[dev-main] but it does not match the constraint.
  Problem 3
    - Root composer.json requires drupal/core-vendor-hardening 1.0.0 (exact version match: 1.0.0 or 1.0.0.0), found drupal/core-vendor-hardening[dev-main] but it does not match the constraint.

Use the option --with-all-dependencies (-W) to allow upgrades, downgrades and removals for packages currently locked to specific versions.

Installation failed, reverting ./composer.json and ./composer.lock to their original content.
Composer [require cweagans/composer-patches:^1] failed, composer command failed: exit status 2. stderr=

So I tried simply adding the patch to composer.json...

$ ddev composer config --json extra.patches.'drupal/core' '{"#3397622, !5839 Adding GIN and GIST indexes to PostgreSQL databases": "https://git.drupalcode.org/project/drupal/-/merge_requests/5839.diff"}'
$ git diff
diff --git a/composer.json b/composer.json
index b56c8cec3e..e30ad42bdf 100644
--- a/composer.json
+++ b/composer.json
@@ -96,6 +96,11 @@
                 "               and not intended to be used for production sites.",
                 "               See: https://www.drupal.org/node/3082474"
             ]
+        },
+        "patches": {
+            "drupal/core": {
+                "#3397622, !5839 Adding GIN and GIST indexes to PostgreSQL databases": "https://git.drupalcode.org/project/drupal/-/merge_requests/5839.diff"
+            }
         }
     },
     "autoload": {

...seems fine so far, so lets apply the patch...

$ ddev composer update drupal/core
> Drupal\Composer\Composer::ensureComposerVersion
Loading composer repositories with package information
Updating dependencies
Your requirements could not be resolved to an installable set of packages.

  Problem 1
    - Root composer.json requires drupal/core 1.0.0 (exact version match: 1.0.0 or 1.0.0.0), found drupal/core[dev-main] but it does not match the constraint.
  Problem 2
    - Root composer.json requires drupal/core-project-message 1.0.0 (exact version match: 1.0.0 or 1.0.0.0), found drupal/core-project-message[dev-main] but it does not match the constraint.
  Problem 3
    - Root composer.json requires drupal/core-vendor-hardening 1.0.0 (exact version match: 1.0.0 or 1.0.0.0), found drupal/core-vendor-hardening[dev-main] but it does not match the constraint.

Use the option --with-all-dependencies (-W) to allow upgrades, downgrades and removals for packages currently locked to specific versions.
Composer [update drupal/core] failed, composer command failed: exit status 2. stderr=

... same as before.

What about just updating the lock file?

$ ddev composer update --lock            
> Drupal\Composer\Composer::ensureComposerVersion
Loading composer repositories with package information
Updating dependencies
Your requirements could not be resolved to an installable set of packages.

  Problem 1
    - Root composer.json requires drupal/core == 11.9999999.9999999.9999999-dev (exact version match), it is satisfiable by drupal/core[11.x-dev] from composer repo (https://repo.packagist.org) but drupal/core[dev-main] from path repo (core) has higher repository priority. The packages from the higher priority repository do not match your constraint and are therefore not installable. That repository is canonical so the lower priority repo's packages are not installable. See https://getcomposer.org/repoprio for details and assistance.
  Problem 2
    - Root composer.json requires drupal/core-project-message == 11.9999999.9999999.9999999-dev (exact version match), it is satisfiable by drupal/core-project-message[11.x-dev] from composer repo (https://repo.packagist.org) but drupal/core-project-message[dev-main] from path repo (composer/Plugin/ProjectMessage) has higher repository priority. The packages from the higher priority repository do not match your constraint and are therefore not installable. That repository is canonical so the lower priority repo's packages are not installable. See https://getcomposer.org/repoprio for details and assistance.
  Problem 3
    - Root composer.json requires drupal/core-vendor-hardening == 11.9999999.9999999.9999999-dev (exact version match), it is satisfiable by drupal/core-vendor-hardening[11.x-dev] from composer repo (https://repo.packagist.org) but drupal/core-vendor-hardening[dev-main] from path repo (composer/Plugin/VendorHardening) has higher repository priority. The packages from the higher priority repository do not match your constraint and are therefore not installable. That repository is canonical so the lower priority repo's packages are not installable. See https://getcomposer.org/repoprio for details and assistance.

Composer [update --lock] failed, composer command failed: exit status 2. stderr=

... apparently not!

What if I try running GitLab CI tests locally?

$ alias drupal-ci-local='gitlab-ci-local --remote-variables git@git.drupal.org:project/gitlab_templates=includes/include.drupalci.variables.yml=main --variable="_GITLAB_TEMPLATES_REPO=project/gitlab_templates"'
$ drupal-ci-local
...
 FAIL  📦️ Composer                                   
  >   Invalid version string "HEAD-dev"  
  >                                      
  > validate [--no-check-all] [--check-lock] [--no-check-lock] [--no-check-publish] [--no-check-version] [-A|--with-dependencies] [--strict] [--] [<file>]

... so I'm not sure what to do next.

mparker17’s picture

On the plus side, though, it seems like tests are passing in the merge request as it is now.

emircan erkul made their first commit to this issue’s fork.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.

dydave made their first commit to this issue’s fork.