Problem/Motivation
Looking at the PostgreSQL logs, these SELECT ... FROM information_schema.columns ...
queries are consistently slow, e.g. over 5 ms (often 10-20 ms), while almost everything else is under 2 ms on my system. This information should be cached in some way.
Proposed resolution
Change to querying the pg_attribute and pg_attrdef tables directly for the bytea and sequence data for a table.
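As a minimal sketch of the proposed approach (not the exact SQL in the committed patch; 'public.node' is only an example table name), the column type and default information can be read straight from the system catalogs:

```sql
-- Hedged sketch: fetch column types and defaults for one table directly
-- from the system catalogs instead of the information_schema.columns view.
SELECT
  pg_attribute.attname AS column_name,
  format_type(pg_attribute.atttypid, pg_attribute.atttypmod) AS data_type,
  pg_get_expr(pg_attrdef.adbin, pg_attrdef.adrelid) AS column_default
FROM pg_attribute
LEFT JOIN pg_attrdef
  ON pg_attrdef.adrelid = pg_attribute.attrelid
 AND pg_attrdef.adnum = pg_attribute.attnum
WHERE pg_attribute.attrelid = 'public.node'::regclass
  AND pg_attribute.attnum > 0
  AND NOT pg_attribute.attisdropped;
```

Because pg_attribute and pg_attrdef are real tables rather than a view layered over several catalog joins, this form avoids the per-call overhead of information_schema.columns.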
Currently this change has decreased the testbot run time by about 10 minutes, from 1 hour 2 minutes to around 51 minutes, which will save on server time. Additional PHP optimizations reduce the testbot time on PHP 7 further, to around 38 minutes.
Remaining tasks
- Do research (bzrudi71) - DONE
- Write a patch (mradcliffe, alexpott) - DONE
- Review patch (daffie, bzrudi71) -
- Performance review - DONE
User interface changes
None
API changes
Table information is gathered in a different way, with potentially some differences for PostgreSQL. This does not seem to have any effect on tests.
Data model changes
None.
For the committer
Can bzrudi71 also get commit credit for being the one who came up with the solution to the problem?
Comment | File | Size | Author |
---|---|---|---|
#40 | 1079762-40.patch | 3.83 KB | mradcliffe |
#31 | 1079762-31.patch | 4.64 KB | alexpott |
#31 | 29-31-interdiff.txt | 5.47 KB | alexpott |
Comments
Comment #1
ogi CreditAttribution: ogi commented
This is not a feature, it's a bug. It eats a lot of total query time because this query is used often. columns is a view and is intended for admin applications, not for all insert/update queries. The Drupal query should be rewritten so that it's faster. I'm considering this a major performance bug. Here is the definition of this view:
Comment #2
catch
Moving to the PostgreSQL driver. I'm not convinced this is a major bug in Drupal: MySQL used to have very slow information_schema performance, which was fixed in more recent releases, so optimizing it in Drupal itself comes under 'normal task' for me.
Also, it needs to go into 8.x first and then be backported.
Comment #3
bzrudi71 CreditAttribution: bzrudi71 commented
Closed #2451723: Database::queryTableInformation optimization for PostgreSQL as a duplicate and added the information here.
Database::queryTableInformation can be optimized. Here is an example of the SQL generated by Database::queryTableInformation:
compared with an equivalent (I think) query against the PostgreSQL system catalog tables:
The problem was originally reported by nathanweeks.
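The queries themselves were not captured above; purely for illustration (a hypothetical reconstruction, not the SQL from the original report), the slow information_schema form looks roughly like this:

```sql
-- Illustrative only: the kind of information_schema query Drupal issues
-- per table to find bytea columns and serial/sequence defaults.
SELECT column_name, data_type, column_default
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'node'
  AND (data_type = 'bytea' OR column_default LIKE '%nextval%');
```

The catalog-table equivalent touches pg_attribute and pg_attrdef directly, which is what the comparison above is measuring.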
Comment #4
bzrudi71 CreditAttribution: bzrudi71 commented
Comment #5
bzrudi71 CreditAttribution: bzrudi71 commented
Comment #6
bzrudi71 CreditAttribution: bzrudi71 commented
Even though the results in #3 seem a bit outdated (both queries now finish within 1 ms on a recent machine), the query against the PostgreSQL system catalog tables is still around 8x faster than the current implementation. So it seems worth thinking about, as we call queryTableInformation() very often!
Comment #7
bzrudi71 CreditAttribution: bzrudi71 commented
Unfortunately the query against the system catalogue doesn't return anything ;)
While this works:
The catalogue query doesn't work:
Guess we need to have a second look at this...
BTW, it doesn't even work with '%nextval%', and the check for views in
AND pg_class.relkind IN ('r', 'v')
seems obsolete.
Comment #8
bzrudi71 CreditAttribution: bzrudi71 commented
Tried a handful of alternative queries, but always failed to gather *all* the required information from the catalogues ;)
References:
pg-type
pg-class
pg-attribute
pg-attrdef
Comment #9
bzrudi71 CreditAttribution: bzrudi71 commented
Well, at least I got a working alternative here that gives some results. However, sadly this is only 3-4 times faster than the information schema query ;)
Any ideas to speed it up further? Guess we should make a patch to actually see what happens :)
Comment #12
mradcliffe
Putting @bzrudi71's SQL into a patch to see what happens. I thought about whether it would be possible to split it into 2 distinct queries (bytea columns and sequences) to maybe improve the performance, based on #9.
Comment #13
mradcliffe
Maybe add an actual patch.
Comment #14
daffie CreditAttribution: daffie commented
@mradcliffe: You should not remove the quotes around the :table variable.
Comment #15
mradcliffe
@daffie: I don't think I am removing the quotes, unless you mean from @bzrudi71's query. The placeholder should add the quotes from the bound parameter, right? If I add quotes around the placeholder, it doesn't work.
In any case, typrelid is going to be 0 for any non-composite data type, and thus the query is not returning any bytea columns. Maybe exclude the pg_type table join? This seems to work locally.
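As a sketch of why the pg_type join can go (illustrative, with an example table name): format_type() already resolves the type name from pg_attribute, so bytea columns can be found without touching pg_type at all.

```sql
-- Illustrative: bytea columns can be detected from pg_attribute alone,
-- because format_type() resolves the type name; no pg_type join needed.
SELECT pg_attribute.attname
FROM pg_attribute
WHERE pg_attribute.attrelid = 'public.node'::regclass
  AND pg_attribute.attnum > 0
  AND NOT pg_attribute.attisdropped
  AND format_type(pg_attribute.atttypid, pg_attribute.atttypmod) = 'bytea';
```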
Comment #16
daffie CreditAttribution: daffie commented
@mradcliffe: If you take a look in the database, the value is something like 'public.node'::regclass, and not public.node::regclass.
Comment #17
mradcliffe
But when I do this, or add quotes in the parameters array, it gives the following error:
Comment #18
mradcliffe
Flipping back to Needs Work.
pg_attribute is not updated yet? Maybe there is a commit/transaction issue? Or a caching issue?
Comment #19
mradcliffe
pg_attrdef.adsrc is a historical record, so it won't be updated when the default value for a column is changed. The documentation says as much: https://www.postgresql.org/docs/9.1/static/catalog-pg-attrdef.html
Here's a patch that explains the query and makes that change.
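For illustration of the fix described here (a sketch under the assumption that the patch switched to pg_get_expr(); 'public.node' is only an example table): pg_get_expr() decompiles the current default expression from pg_attrdef.adbin, so it stays correct when a column default is changed, unlike the stale adsrc snapshot.

```sql
-- pg_attrdef.adsrc is a historical snapshot; pg_get_expr() decompiles
-- the *current* default expression from adbin, so altered defaults are
-- reported correctly.
SELECT pg_attribute.attname,
       pg_get_expr(pg_attrdef.adbin, pg_attrdef.adrelid) AS column_default
FROM pg_attrdef
JOIN pg_attribute
  ON pg_attribute.attrelid = pg_attrdef.adrelid
 AND pg_attribute.attnum = pg_attrdef.adnum
WHERE pg_attrdef.adrelid = 'public.node'::regclass;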
Comment #20
mradcliffe
Let's write an issue summary too.
Comment #21
mradcliffe
Extra space here:
pg_attribute.attrelid AND pg_attrdef.adnum
Comment #22
daffie CreditAttribution: daffie commented
The select statement looks great. Good find that you could remove the join to the pg_type table. I have a few small points:
- Can we add the constant value directly in the statement?
- Can we rename :table to :key, with key being the combined schema.table variable?
Comment #23
mradcliffe
+1 for consistency in variables and placeholders.
I think we still need to keep the LIKE, but we can anchor the search. It's possible a sequence's next value could be generated from other functions, though I honestly doubt anyone wraps nextval() in any math functions.
Edit: The reason we might not be able to do exact matching on the nextval is that the sequence name may need to be hashed based on the length of the string.
Comment #24
daffie CreditAttribution: daffie commented
@mradcliffe: If you think that it is that important to keep "%nextval%", then let's keep it. You have made this change:
Is this change a mistake? Or not?
Comment #25
mradcliffe
No, it is not a mistake. I meant to keep that change: all of Drupal's sequence entries begin with nextval(, so we do not need to search for nextval in the middle of the string.
Comment #26
daffie CreditAttribution: daffie commented
The patch looks great now.
The testbot with this patch is 10 minutes faster than without it. So the problem of this issue is addressed.
No extra tests are needed for this issue.
It gets an RTBC from me.
Great work @mradcliffe.
Comment #27
bzrudi71 CreditAttribution: bzrudi71 commented
Nice research @mradcliffe, well done. I did a short review and all seems good to me, including the change for nextval%.
Thanks a ton and RTBC from my side too :)
Comment #28
alexpott
Wow, the performance improvement due to this patch is impressive.
Why do we bother with value replacement when the value is hardcoded?
Comment #29
bzrudi71 CreditAttribution: bzrudi71 commented
Re #28: Good point, let's do that... I also tested at least the query itself on PostgreSQL 9.4 and 9.5, just to make sure it works even with the latest versions.
Comment #30
daffie CreditAttribution: daffie commented
I asked for the same change in comment #22. So, for me this is RTBC again.
Comment #31
alexpott
It is great to see some effort going into making PostgreSQL faster by default, since there is no reason it should be this much slower than MySQL. If this is a performance-sensitive function, then we should do more to avoid unnecessary queries. The attached patch does this, which I think is all in scope because we're making the change for performance reasons.
Let's use nowdoc syntax: all this concatenation is unnecessary, and ending each line with an empty space is a bit ugly.
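For illustration, a PHP nowdoc keeps the whole SQL statement as one verbatim literal (a sketch, not the exact patch hunk; the :key placeholder and the column list here are only assumptions):

```php
<?php
// Sketch: a nowdoc (<<<'EOD' with single quotes) holds the SQL verbatim,
// with no variable interpolation and no per-line string concatenation.
$query = <<<'EOD'
SELECT pg_attribute.attname AS column_name,
       format_type(pg_attribute.atttypid, pg_attribute.atttypmod) AS data_type
FROM pg_attribute
WHERE pg_attribute.attrelid = :key::regclass
  AND pg_attribute.attnum > 0
EOD;
```

Unlike a heredoc, a nowdoc performs no interpolation, so literal $ characters and placeholders survive untouched.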
Comment #32
mradcliffe
Interested to see how this will work out on the tests.
Nice find.
Nice. I think this should work out, and it should clean up disk usage, since we are probably creating lots of temporary schemas with each connection.
Comment #33
mradcliffe
1 minute off the PHP 5.5 run, and 5 minutes off the PHP 7 run. That's fairly significant. That probably means that the PHP changes in the driver are more optimized with the patch in #31, and the temporary schema change in #31 is consistent.
RTBC for me. I'll let someone else confirm and change it to get more eyes on the patch.
Comment #34
mradcliffe
Updated issue summary.
Comment #35
bzrudi71 CreditAttribution: bzrudi71 commented
Wow! Another 5 minute speed improvement for the PHP 7 testbot, impressive :) I don't find anything wrong with the patch from alexpott, and it's in scope for this issue, so let's get it in. RTBC.
Thanks @alexpott for the nice patch improvements!
Comment #36
daffie CreditAttribution: daffie commented
+1 for RTBC. Thanks @alexpott for all your help!
Comment #38
catch
Committed/pushed to 8.3.x, thanks!
Comment #40
mradcliffe
Here's a backport patch for Drupal 7.
It looks like everything is supported for those old versions of PostgreSQL.
Comment #42
mradcliffe
Bad test bot.
Comment #43
alexpott
@mradcliffe: the new backport policy is to open a new issue and relate it to this one. See https://www.drupal.org/core/backport-policy
Comment #44
mradcliffe
Oh, I didn't realize the policy had changed. Thank you, @alexpott.
Follow-up issue created for Drupal 7: #2823488: Backport DatabaseSchema_pgsql::queryTableInformation() improvements.