Status: Needs work
Project:
Drupal core
Version:
main
Component:
entity system
Priority:
Major
Category:
Task
Assigned:
Unassigned
Reporter:
Created:
28 Dec 2018 at 16:12 UTC
Updated:
22 Oct 2025 at 13:41 UTC
Comments
Comment #2
gabesullice commented
Any profiling statistics from XHProf or services like New Relic or Blackfire.io would be really helpful here.
I don't think this is the primary source of the problem, but I'm curious: why are you using the IN operator instead of =?
Comment #3
gabesullice commented
What version of Drupal core are you running?
Comment #4
wim leers commented
Comment #5
pixelwhip commented
Here's an example set of query params and the slow query logged to New Relic.
This site is currently on 8.6.2
Comment #6
gabesullice commented
@pixelwhip and I got onto a Zoom together to try to isolate the problem and uncover more detail. The slow_query.txt and the query_params.txt that @pixelwhip added may not actually be for the same request, but the query parameters are sufficiently representative of the kinds of queries that are being bogged down that I think it gives us enough to go on.
Examining the slow query, I can see that the SQL query has 3 duplicated WHERE conditions on taxonomy_term_field_data_?.status and another 3 duplicated conditions on node_field_data_?.status.
I confirmed that the issue was replicable and directly associated with 8.x-1.24, and that the performance regression was mitigated by adding the administer taxonomy permission. This tells me that the slowdown is related to the additional conditions added by TemporaryQueryGuard, because no conditions (on terms) would be added if the user had this permission. The bypass node access permission did not seem to have a major impact.
Observationally, it seems that performance degrades further with each additional filter on taxonomy terms.
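The duplication described here can be pictured with a reduced, hypothetical form of such a query (aliases and field names are illustrative, not taken from the actual slow-query log):

```sql
-- Hypothetical sketch: each extra alias repeats the same published-status check.
SELECT base.nid
FROM node_field_data AS base
INNER JOIN node__field_tags AS t ON t.entity_id = base.nid
INNER JOIN taxonomy_term_field_data AS taxonomy_term_field_data_1
        ON taxonomy_term_field_data_1.tid = t.field_tags_target_id
INNER JOIN taxonomy_term_field_data AS taxonomy_term_field_data_2
        ON taxonomy_term_field_data_2.tid = t.field_tags_target_id
WHERE base.status = '1'
  AND taxonomy_term_field_data_1.status = '1'  -- duplicated condition
  AND taxonomy_term_field_data_2.status = '1'; -- duplicated condition
```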
Comment #7
pixelwhip commented
I'm using IN instead of = because I thought IN meant "one of" and = meant "exactly". Is this not the case?
Thanks for the quick response time, BTW. I really appreciate it.
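For reference, with a single value the two operators produce equivalent SQL, so either works; a hypothetical sketch of the condition each filter compiles down to:

```sql
-- filter[...][operator]==    with one value:
WHERE node.uuid = 'aaaa-1111'
-- filter[...][operator]=IN   with one value (equivalent result):
WHERE node.uuid IN ('aaaa-1111')
-- IN becomes necessary once several values are allowed:
WHERE node.uuid IN ('aaaa-1111', 'bbbb-2222')
```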
Comment #8
gabesullice commented
You're correct. Each filter in the sample query on the issue summary only has one UUID given, which made me wonder why you wanted it to be "one of" only one UUID. But in the screen share that we had, I saw that in your real use case it's possible to have numerous UUIDs, so IN makes perfect sense.
Comment #9
gabesullice commented
I tried to replicate this locally but did not have any success.
To replicate it:
- installed the jsonapi and devel_generate modules
- created a base node type and a targets vocabulary
- added field_foo_terms to the base node type, referencing targets
- added field_bar_terms to the base node type, referencing targets
- generated targets terms and base nodes
I tried a variety of requests (I've exported and uploaded a Postman collection of these requests) filtering by:
- field_foo_terms.name
- field_foo_terms.name AND field_bar_terms.name
- field_foo_terms.name AND field_foo_terms.parent.name
I experimented with IN vs. = on these requests and discovered no real difference (IN was faster, but only by a negligible margin).
Finally, to verify the performance impact of 1.24, I ran all these requests against 1.23 of JSON:API too.
Doing this, I only observed up to a ~3% increase in response times (for example, the times went from about 290ms to 300ms). These results did not replicate the severe degradation that I observed while screensharing with @pixelwhip. For what it's worth, I only observed the degradation on @pixelwhip's staging environment. We were not able to replicate it locally on his machine either.
In order to proceed, I think we need to be able to replicate this at will and get live results where we can make changes in code.
I also tried this with JSON:API 2.x. The performance enhancements I was able to see when using the include and fields parameters far outweighed any slowdowns I observed. Looking at the query_params.txt file that @pixelwhip uploaded, I see both of these parameters (a nested include too!). I would recommend trying to upgrade to 2.x first. It may mitigate the performance impacts enough to get you onto a secure release while we track down the root cause.
Comment #10
gabesullice commented
This is the Postman collection I mentioned in #9.
Edit: D'oh! I can't upload JSON, apparently.
Comment #11
gabesullice commented
Comment #12
e0ipso commented
Regardless of the actual query, I'm worried that the same request is significantly slower. The delta between 1.23 and 1.24 is the same in 2.x, so I suspect that the degradation may also exist there. It's entirely possible that we cannot do anything about it but, as Gabe says, we need more data points.
Comment #13
gabesullice commented
I've spoken with a colleague of @pixelwhip and she is unable to reproduce this locally as well (only on their staging environment).
Because of that and #12, I'm postponing until we get more info. I don't want to scare people away from a security release if this is actually an issue with a hosting environment.
Comment #14
wim leers commented
Thanks for digging so deep. It’s great to see all the cross-company and cross-time zone collaboration here!
Comment #15
pixelwhip commented
Thanks for the help, all. I'm back from holiday and have been trying to reproduce this locally with no luck. I thought it might have something to do with my local database being significantly out of date, or a difference between my local PHP memory limit and our hosting environment's. But after ensuring those matched, I get no significant difference in performance between 1.23 and 1.24. Blackfire shows a ~3% increase in overall response time (~360ms), which aligns with what gabesullice posted.
There is definitely still an issue in our staging environment but now I'm not sure if the upgrade to 1.24 is a red herring or not. I'm wondering if there is something depleting server resources and the long call to /Drupal\jsonapi\Controller\RequestHandler->handle and the subsequent slow query are what New Relic surfaces as a result. Unfortunately we do not have Blackfire running on our staging environment and New Relic does not seem to post any information from requests that time out.
Our next steps are to work with our hosting provider to try to diagnose the issue and further assess what we need to do to upgrade to 2.x.
Comment #16
wim leers commented
Thanks for digging deeper and keeping us in the loop, @pixelwhip! 🙏
Comment #17
wim leers commented
Demoting from to for now, since all research points to this not being a bug, and not even being a regression.
We treat support requests very seriously, so don't worry, we'll still be watching and responding :)
Comment #18
pixelwhip commented
Ok, we've just finished refactoring in order to upgrade to 2.x. We're now experiencing similar performance regressions after upgrading, which are bringing down our server with slow queries. Unfortunately we're still unable to reproduce this locally; it only happens reliably on our hosting provider's server. We've been banging our heads against the wall trying to pinpoint an exact cause. Perhaps someone can help make some sense out of this.
Attached are our slow query logs, run through pt-query-digest.
My next step is to compare before and after queries for the same endpoint to see if that points to any major differences.
Comment #19
pixelwhip commented
Comment #20
wim leers commented
Darn :( That's really too bad.
That sounds perfect! Thank you so much 🙏
On a related note: #2819335: Resource (entity) normalization should use partial caching makes responses 4x faster if the cache is warm. I'd love to solve the problem you reported at the root though. But I'd feel irresponsible if I didn't at least make you aware of that other issue. If your suspicions are correct, then that patch shouldn't make a material difference anyway: that is only accelerating the normalization of the results of the query.
Comment #21
pixelwhip commented
Thanks, Wim. I'm pretty sure the culprit is the extra JOINs and search predicates added to check the published status of the referenced entities.
I'm attaching example before (v1.23) and after (v2.1) SQL statements for very similar (not 100% identical) requests. The v1.23 one is actually filtering on more entities; if these were exactly the same requests, the difference in MySQL queries would be even more exaggerated.
The difference is that the 2.1 version has 27 more INNER JOIN/LEFT OUTER JOIN statements and 15 more search predicates (all published status checks). gabesullice had actually pointed at this as a potential issue when we first chatted about this.
Running the SQL query in the 2.1 example directly against the DB actually causes MySQL to seize up.
Comment #22
wim leers commented
So then we're back to @gabesullice's conclusion in #6:
A lot of extra joins, many of which are AFAICT unnecessary. I don't know MySQL's optimization capabilities well enough to know whether this would make a material difference.
Similarly, the vast majority of the new WHERE clauses are essentially saying the same thing: "give me only published nodes". I don't know if MySQL optimizes these away.
This is definitely a cost introduced by the security fix in 1.24/2.0-rc4.
Perhaps we will need to make EntityFieldQuery smarter, so it does not perform unnecessary joins. OTOH, that's what a DB's query planner is supposed to do…
Could you run these queries with EXPLAIN? That will give us valuable information about how MySQL processes them.
Comment #23
pixelwhip commented
Sure! Attached are the EXPLAIN results. Here's a link to a spreadsheet as well.
https://docs.google.com/spreadsheets/d/1VX9buQ4LdbrY--ZNiamjA86qcm0e049Z...
Comment #24
wim leers commented
Thanks for the super fast response! 🙏
I've pinged a colleague who has more SQL query optimization experience. I think we have all the information we need for now!
Comment #25
josh waihi commented
There is something really off about the SQL query. For example, track the table alias node_field_data_3. It appears only twice. Once on the join:
And once in the WHERE clause
This makes absolutely no sense to me. We're not trying to acquire a field from this table, nor join it to a particular type of field. It is abusive for an application to expect the database to plan nonsense like this away (let's not forget Drupal expects this of more than just MySQL).
I'd be interested to know how much this query is slowed down every time a fruitless join like this is added. Could you capture timings for various SQL queries removing a join each time? E.g.
I'm keen to know if the slow down is linear or parabolic (are we multiplying time by the number of joins or is it to the power of the number of joins?)
It's also interesting to point out that node_field_data_2 is the only node_field_data_* join with a proper condition on it (title LIKE '%sew%') and DOESN'T contain a status = 1 condition...
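The timing experiment suggested above could be run as a series of hand-edited variants of the captured query, dropping one join each time (table names are illustrative, not the captured query itself):

```sql
-- Enable per-query timing for the session.
SET profiling = 1;

-- Variant 0: the captured query, unchanged.
-- Variant 1: one redundant self-join commented out, e.g.:
SELECT base.nid
FROM node_field_data AS base
INNER JOIN node_field_data AS node_field_data_2 ON node_field_data_2.nid = base.nid
-- INNER JOIN node_field_data AS node_field_data_3 ON node_field_data_3.nid = base.nid
WHERE base.status = '1';

-- Compare durations across variants to see linear vs. super-linear growth.
SHOW PROFILES;
```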
Comment #26
wim leers commented
Thank you so much, @Josh Waihi! 🙏
Comment #27
pixelwhip commented
Here are some differences running these queries locally. It's worth noting that my local environment is on MySQL 5.7 and doesn't seem to have issues even with the most complicated query. However, our hosting environment is on MariaDB 5.5 and doesn't handle the joins very well.
Joins and Status predicates removed Timing (Seconds)
0 0.83
-1 0.67
-2 0.61
-3 0.6
-4 0.55
-5 0.48
-6 0.44
-7 0.64
-8 0.64
-9 0.47
-10 0.42
-11 0.37
-12 0.33
-15 0.06
Comment #28
wim leers commented
>3.5 months have passed. Is this still a problem?
Comment #29
santerref commented
Same for me as in #27!
MySQL 5.7: 1.30s
MariaDB 10.4: 13.4s
Query:
filter[field_time_end][condition][path]: field_time_end
filter[field_time_end][condition][operator]: IS NULL
filter[field_user.uid]: 3
filter[field_project.id]: fd24d778-bdf3-4aee-8e34-66f0f55bf720
I only have 10 nodes in my Drupal website...
I hope we will find a solution to this issue!
Thanks,
santerref
Comment #30
Anonymous (not verified) commented
I can confirm this behavior as well. At the moment I don't have great logs, but my setup is:
Drupal 8.7.5 using core JSON API module
My hosting provider is Pantheon, and I'm on their basic plan.
I see significant delays on queries that mirror the one posted in the original post: very long response times on queries filtering on an entity field that allows multiple values. I'm using subrequests alongside it, and it was crashing my server. I pulled out the 4 queries in the subrequest and isolated the culprit, which was a query as described.
I'll monitor this issue, and would love to help solve it, as it creates poor UX in my application while it waits ~15-30 seconds for a fetch to finish.
Edit: I have my `mysqld-slow-query.log` file if that's helpful. And it appears Pantheon uses MariaDB v5.5.
Comment #31
wim leers#29 + #30: It'd be very helpful to see both the JSON:API URL and the resulting SQL query, just like @pixelwhip provided earlier in the issue.
Comment #32
pixelwhip commented
Sorry for the long delay on this issue. We had postponed the upgrade for a while as other priorities were taking precedence, but I'm coming back to it now. We've made some changes to how our search interface generates filter query params in hopes that it would simplify the resulting SQL query. Unfortunately that didn't seem to have any impact. We are still seeing MariaDB queries on Pantheon that take 600-950 seconds, which obviously times out the request.
Here is an example JSON API request URL for 10 published future events that happen at one of 3 locations (node ref) and are applicable to at least one of 3 audiences (term ref), in ascending date order. This is a pretty typical search request for this application.
This particular query took ~697s to complete.
Comment #33
gabesullice commented
@pixelwhip, I think I remember that we came to the conclusion that it might be MariaDB vs MySQL. #29 seems to add additional evidence for that theory.
Since we were having trouble replicating the issue locally, did you ever try to install MariaDB locally?
I found this StackOverflow question, which sounds very similar: https://stackoverflow.com/questions/35889706/mariadb-running-a-left-join...
@santerref, maybe you can try this ^
Comment #34
wim leers commented
🤯
Comment #35
cachesclay commented
I'm not sure if we were having this problem prior to upgrading to the versions in the original post, but we're seeing a pretty severe timeout issue on requests with multiple filters, where all single filters work correctly. Please let me know if I can provide any further information. This issue has been taking down our production server when the DB fills up with hung processes, so any tips/ideas would be amazing.
(edited: note, this query works fine on my local, using 10.1.31-MariaDB)
/jsonapi/node/organization?page[limit]=20&page[offset]=0&sort=title&include=image,organization_type,programs,programs.timing&filter[q0][condition][path]=program&filter[q0][condition][operator]=%3D&filter[q0][condition][value][0]=1&filter[q1][condition][path]=location.administrative_area&filter[q1][condition][operator]=%3D&filter[q1][condition][value][0]=TX&filter[q2][condition][path]=regions.tid&filter[q2][condition][operator]=IN&filter[q2][condition][value][0]=603&filter[q3][condition][path]=organization_type.tid&filter[q3][condition][operator]=IN&filter[q3][condition][value][0]=119&filter[q3][condition][value][1]=120&filter[q3][condition][value][2]=485&filter[q3][condition][value][3]=16&filter[q3][condition][value][4]=125&filter[q4][condition][path]=programs.certification_area.tid&filter[q4][condition][operator]=IN&filter[q4][condition][value][0]=201&filter[q4][condition][value][1]=205&filter[q4][condition][value][2]=236&filter[q4][condition][value][3]=231&filter[q4][condition][value][4]=200&filter[q4][condition][value][5]=199&filter[q5][condition][path]=programs.tuition_out_state&filter[q5][condition][operator]=BETWEEN&filter[q5][condition][value][0]=250&filter[q5][condition][value][1]=20050
Comment #36
joelsteidl commented
I can definitely confirm that this is a MariaDB issue (10.0, 10.1 & 10.3... I didn't test on 10.2). I get all kinds of timeouts.
I pointed a Pantheon multidev instance to a SERIOUSLY slow database on Dreamhost shared hosting running MySQL 5.7 and it works fine.
Can I provide other details that would be helpful?
Comment #37
e0ipsoThanks for providing more information! I'm sure this will be helpful to find a root cause.
I do not think there is an active push on this particular bug; we are waiting for a patch so we can review it. This is not a JSON:API-specific issue but an Entity Query API one. However, JSON:API is more affected by it because of its intense use of Entity Query API.
Comment #38
e0ipso commented
Comment #39
gabesullice commented
Thanks for updating the issue, @e0ipso! I agree with all of that.
Comment #40
Anonymous (not verified) commented
FWIW, I migrated away from Pantheon as my provider because of this issue. MySQL 5.7 seems to hum through the extra joins for sure. At this point I just wouldn't recommend Pantheon as a hosting provider for a headless setup. I ended up spinning up a droplet, which also gave me more fine-grained control over Varnish and Redis at a better price point.
Comment #41
david strauss commented
A difference between MySQL and MariaDB here is unlikely to be the root cause, as MySQL and MariaDB have similar query planners and storage engines. What would indicate a difference unique to MariaDB would be differing results from EXPLAIN. It's possible, but I think it's unlikely. (Please post your EXPLAIN results on both MariaDB and MySQL if you do believe this is the case.)
What you are more likely seeing is something like the query cache being enabled on MySQL 5.7 but not MariaDB. Many platforms enable the query cache on MySQL, even on 5.7. This creates a superficial experience of better performance, but it's a bit of a Faustian bargain.
We disable the query cache on Pantheon for several reasons:
The side effect of disabling the query cache is that some queries will appear to run slowly with MariaDB even though MySQL (with the query cache enabled) may seem to "hum" through them. However, such setups are not sustainable with MySQL because of reliability issues that occur with the query cache under heavy loads (which is, ultimately, why the feature was deprecated and removed from MySQL).
Based on the multiple AND conditions that put constraints on the far side of JOINs, it's unlikely that either MySQL or MariaDB can run this query efficiently on a real-time basis. No plan over table-level indexes can exist, even in relational algebra theory, to execute such a query efficiently.
Assuming optimizing the query isn't an option, the best choice is probably caching the results (or something generated from the results) in the Drupal object cache. That way, the module won't rely on deprecated MySQL behavior to maintain decent performance.
Comment #42
joelsteidl commented
@David Strauss thanks for jumping in.
Trying to troubleshoot the queries JSON:API is creating via Entity Query API on Pantheon, I actually spun up a MySQL 8.0 instance (and 5.7) and it works fine on both. I'm not saying the query cache isn't a factor, but no matter the MariaDB version I tried (Pantheon/Non-Pantheon) MariaDB times out with the query.
Comment #43
gabesullice commented
Thanks for chiming in, @David Strauss! Your comment has me a bit confused though:
If the default/common configurations of these two databases differ, then I think it's fair to say that the difference between them is a contributing factor in the "extremely poor performance" our users are experiencing. Please note the wording of the issue title; it's expressing a correlation, not a causation.
🤔
I think this is jumping to conclusions without sufficient data. What we know is that there is a difference between default MySQL and default MariaDB. What we don't know is that it's a result of a query cache. @joelsteidl points out that he has tried many versions of both databases and consistently finds a difference between them.
When I was trying to replicate this with @pixelwhip some months ago, we did not observe extremely poor performance on the first execution of the query using MySQL and then "fast" performance on subsequent iterations. I am not a database expert, but I think this is some evidence that the query cache is unlikely to be the sole reason for the difference.
This is already the case. Responses are cached by the Dynamic Page Cache module, the Page Cache module and subsequently Varnish. This is an ineffective solution in @pixelwhip and @joelsteidl's case because the URL query string is generated by a user-filterable listing (think Views exposed filters, but using React + JSON:API).
I agree that this is the best next step.
Comment #44
joelsteidl commented
I'd be happy to do this, but I honestly don't know the best way to capture that information. Any direction would be super helpful.
Joel
Comment #45
david strauss commented
Happy to help! Here's how I run the analysis:
First, optimize the tables involved in the query. If you skip this, it's likely that the database server that had its data more recently "restored" or copied will have better statistics for the database's query cost planner, which is what chooses the strategy for running the query. Running the optimizer on the tables will ensure we're comparing apples-to-apples. (Also, sometimes this fixes the performance issue entirely.)
To optimize tables, one option is to run OPTIMIZE TABLE tbl_name for each involved table. Alternatively, you can use mysqlcheck to easily run it on all tables in the database. The mysqlcheck utility supports similar options to the normal mysql CLI.
Second, run an example of the slow query with the prefix "EXPLAIN" from the CLI. I usually like to end the command with "\G" instead of a semicolon to get nicer formatting for pasting here:
I made that example query intentionally bad -- it has to examine each of the 2621 rows of "users" -- to show a bit of interesting data on the example report above.
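Putting the two steps together, the workflow might look like this from the MySQL CLI (table names here are placeholders for the ones involved in the actual slow query):

```sql
-- 1. Refresh table statistics so both servers plan from comparable data.
OPTIMIZE TABLE node_field_data;
OPTIMIZE TABLE taxonomy_term_field_data;
-- (or, from the shell: mysqlcheck -o my_database)

-- 2. Plan the slow query without executing it; \G prints each row vertically.
EXPLAIN SELECT base.nid
FROM node_field_data AS base
WHERE base.status = '1'\G
```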
Comment #46
joelsteidl commented
Thanks, David!
Results attached. I did not run against MySQL 8.0 (only 5.7), but can if need be.
Comment #47
david strauss commented
The EXPLAIN results are enlightening -- and not for the reason I anticipated. The execution plans themselves are pretty similar between MariaDB and MySQL. The surprising part is that the EXPLAIN itself took over a minute on MariaDB. That isn't typical; EXPLAIN should run nearly instantaneously. (EXPLAIN doesn't do the work of running the query, only the work of creating a plan for doing so.) The only data I could find on slow EXPLAIN results was an old bug in MySQL for EXPLAIN on queries with subqueries, but you don't have any subqueries (and neither database you're testing should have that bug, anyway).
So, I experimented with the query to see if I could determine where things are tipping over. This module's machine-generated query has a number of peculiar attributes.
First, it JOINs to the "node_field_data" table eight times using the same ON condition (node_field_data_X.nid = base_table.nid, either directly or transitively) as well as the same WHERE conditions (node_field_data_X.status = '1'). The exception is that the eighth JOIN has a WHERE condition on the type (node_field_data_8.type = 'event').
If I switch the WHERE conditions on the redundant node_field_data JOINs to all place the conditions on the first node_field_data table, EXPLAIN drops from a minute to 1-3 seconds. If I drop the redundant JOINs as well, then the EXPLAIN time drops to milliseconds on MariaDB.
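A minimal sketch of that rewrite, reduced from the eight-join shape of the real query (hypothetical aliases):

```sql
-- Before: redundant self-joins, each carrying its own copy of the status check.
SELECT base.nid
FROM node_field_data AS base
INNER JOIN node_field_data AS node_field_data_1 ON node_field_data_1.nid = base.nid
INNER JOIN node_field_data AS node_field_data_2 ON node_field_data_2.nid = base.nid
WHERE node_field_data_1.status = '1'
  AND node_field_data_2.status = '1';

-- After: the same rows, with the conditions collapsed onto the base table.
SELECT base.nid
FROM node_field_data AS base
WHERE base.status = '1';
```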
Other weirdness with the query includes:
I'm going to take a look at why this module is building queries this way. I believe MySQL 5.7+ may realize the funkiness of what's happening and "fix" it before running the query plan, but we ought to produce a better query in the first place.
Comment #48
david strauss commented
I fear this issue may be misfiled. I went to investigate the underlying query generation code on the site, and it's using the contrib JSON:API module, not core's. While the 2.x version of contrib JSON:API tries to match core's code (at least for new changes), this query seems to be generated by the 1.x version of contrib JSON:API. Somehow, I missed the reference in the initial issue description saying "after upgrading to 1.24" and its connection to the contrib module's version. I see references to similar issues on contrib 2.x, but that's still contrib's implementation, which may differ somewhat.
I'll keep digging in the hope that this issue seen with the contrib module corresponds to an issue in core.
Comment #49
joelsteidl commented
Someone on the JSON:API team can correct me, but I think the performance issue was introduced in v1.24 and made it into 2.x and into core.
See https://www.drupal.org/sa-contrib-2018-081
The fix introduced additional queries to do proper access checking.
Comment #50
gabesullice commented
@David Strauss, @joelsteidl is correct.
This issue is present in both the 1.x and 2.x versions in contrib and in the core version of JSON:API.
The highly irregular JOINs are added by the class Drupal\jsonapi\Access\TemporaryQueryGuard. If you'd like me to explain the reasoning for that code in private, I'd be happy to do that. It was added as part of the security release to which @joelsteidl provided a link.
That security release was committed to both the 1.x and 2.x versions of contrib. Later, the 2.x version of contrib JSON:API was committed in its entirety to Drupal core in 8.7 with only slight modifications (none of which should have affected this issue).
In effect, all versions of JSON:API in core or contrib risk suffering from this issue.
Comment #51
alex dicianu commented
We did some further digging, using the method described in this article (https://pantheon.io/blog/profiling-mysql-queries-better-performance) to profile the query from #46. As expected, the majority of the time (~162 seconds, or ~100% of the total execution time) was spent in the statistics phase -- usually an indicator that the server is spending a long time in the optimization phase, where it tries to choose the right execution plan.
Keeping in mind that a join over n tables has n-factorial possible join orders to investigate, our query with 42 tables has a … wait for it … 1.4e51 combinations!! (With 10 tables to join, the number of combinations would be ~3.6M; 42! is orders of magnitude higher.)
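The growth can be checked directly:

```latex
10! = 3\,628\,800 \approx 3.6 \times 10^{6}
\qquad\text{vs.}\qquad
42! \approx 1.4 \times 10^{51}
```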
Fortunately, MariaDB (and MySQL) offer us a couple of options to control the query plan execution, as per https://dev.mysql.com/doc/refman/5.5/en/controlling-query-plan-evaluation.html. I’ve chosen to play with optimizer_search_depth which tells the server how deep into the execution plan the optimizer should look.
On Pantheon, it’s currently set to 62 (the maximum number of joins allowed, which is also the default). In this particular case, that may be too high. Setting it to zero and running the query returned a result in 0.05 seconds (quite fast).
Profiling it again shows that the majority of the time is still spent in the statistics phase (about 95%), but that runs in only 0.049 seconds this time.
Increasing the optimizer_search_depth to 7 makes no difference in the query response time. Increasing it even more to 20 does affect performance and the query finishes in 4 min 26.25 sec.
It seems that other people have run into this issue before.
https://mariadb.com/resources/blog/setting-optimizer-search-depth-in-mysql/
https://www.percona.com/blog/2012/04/20/joining-many-tables-in-mysql-optimizer_search_depth/
I don’t think there’s an inherent problem with MariaDB in this case. I'm guessing newer versions of MySQL may come with different default settings, which might be better in this specific case. As far as I’m aware, the query optimizer is pretty much the same between MySQL and MariaDB. I do think that Drupal/jsonapi are abusing MariaDB with this many joins. The "real" fix here, as I see it, is probably a combination of lowering the optimizer_search_depth and Drupal/jsonapi improving how that query is built.
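The knob itself can be inspected and adjusted per session; the statements below reflect the values tried in this comment:

```sql
SHOW VARIABLES LIKE 'optimizer_search_depth';  -- 62 on this host (the default)
SET SESSION optimizer_search_depth = 0;   -- 0 lets the server pick a depth itself
-- re-running the captured query now returned in ~0.05s
SET SESSION optimizer_search_depth = 20;  -- back up to ~4.5 minutes
```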
Comment #52
david strauss commented
From @gabesullice:
I'd like to take a look, but the modifications to the JOINs I've suggested don't affect the query results. So, they shouldn't undermine any security hardening or permissions checking that's in place. It's fairly common for query builder systems to de-duplicate equivalent JOINs (or not create redundant ones in the first place).
In the meantime, the advice from @dicix is the most actionable path to reducing the excessive time spent in query execution planning caused by the excessive/redundant JOINs.
From @dicix:
I looked into this, but it doesn't appear to be the case. The difference in behavior seems to come from elsewhere, but the other query plan optimizer settings don't make it obvious.
From @dicix:
This is true, and the query plans we saw from both MySQL and MariaDB are both very similar (and have indistinguishable execution time differences when run).
Comment #53
wim leers commented
Comment #54
david strauss commented
I haven't written or posted such a change yet, and I think such changes need to go into the Entity Query API anyway.
I noticed as much when diving into the code, but there's quite a bit of abstraction around the process. So far, I'm looking at \Drupal\Core\Entity\Query\Sql\Tables::ensureEntityTable(). Is that a sensible place to add protection against redundant JOINs?
Comment #55
joelsteidl commented
@Alex Dicianu, thanks for all the helpful info in #51!
Based on your feedback, I ran SET GLOBAL optimizer_search_depth=7; and the query is super fast now on MariaDB/Pantheon.
For anyone else on Pantheon or a host using MariaDB who has poor performance with JSON:API, this seems like a reasonable workaround:
SET GLOBAL optimizer_search_depth=7; (or whatever value you think is appropriate)
Comment #56
pixelwhip commented
Thanks, everyone! After applying that setting from #51, our application seems to be running faster than it was pre-upgrade. This made for an awesome start to my day!
Comment #57
ghost of drupal past commented
*sheepishly* Yeah, Tables is not exactly optimized for performance. To be fair, it was hard enough to write; I had some hopes it would get optimized for performance later, and I am real happy David is on it. Go David!
Comment #58
wim leers commented
#57: Thanks for confirming my suspicion that it was never really optimized for performance. It's totally understandable though :)
Thanks @David Strauss for diving deeper into this! 🥳🙏
Comment #59
joelsteidl commented
I hope this is not hijacking this thread too much, but I'd like to close the loop on the workaround I posted yesterday. A Pantheon engineer mentioned that SET GLOBAL optimizer_search_depth=7; has the potential to not persist across certain server events. It seems like something like this could work, but I wanted to get some input.
In settings.php:
$databases['default']['default']['init_commands']['optimizer_search_depth'] = 'SET SESSION optimizer_search_depth = 7';
Is setting SESSION there what I really want to do?
I REALLY appreciate all the help on this issue.
Comment #60
david strauss commented
They are correct, and the workaround you specified for settings.php appears to be the right way to ensure it's always set for JSON:API's database access. Given the performance difference you're seeing, it should be clear whether the settings.php change is in effect if you run SET GLOBAL to reset the global setting back to near the default (approximately 50).
It should be just fine, as the code snippet will cause Drupal to request that setting for every connection it establishes. The places it won't be in effect are the MySQL CLI and other non-Drupal connections to the database. Given that the problem query is coming from Drupal, that should be fine.
For the longer term, we're going to look into rolling a similar configuration change platform-wide on Pantheon. I just need to understand more about why the optimizer search depth is set so high by default (and whether setting it lower has consequences we need to consider).
Comment #61
david strauss commented
Hi ChX! I'll start a fresh comment to talk about optimizing the generated query.
The easiest way to optimize JOINs is if we know we don't need to JOIN to the same foreign table more than once per query (at least per unique ON condition). If that's the case, then it's usually possible to avoid creating the redundant JOINs in the first place.

However, this could get complicated if there's a need to JOIN with the same foreign table more than once with the same ON condition but different WHERE conditions. If that's necessary, it's impossible to know from the JOIN data alone whether the JOINs are redundant. So, if we need to JOIN to the same foreign table multiple times with the same ON condition but different WHERE conditions, we would need to prune only the JOINs with identical ON and WHERE conditions.

Technically, the query we're discussing falls under the latter category, but I'd like to check that it's actually necessary to support that.
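To illustrate the distinction with a hypothetical sketch (table and column names are made up in the Drupal style, not taken from the actual generated query):

```sql
-- Redundant: identical ON and identical WHERE conditions; the second
-- join adds no constraint the first does not, so it can be pruned.
SELECT DISTINCT n.nid
FROM node_field_data n
JOIN node__field_tags ft1 ON ft1.entity_id = n.nid
JOIN node__field_tags ft2 ON ft2.entity_id = n.nid
WHERE ft1.field_tags_target_id = 5
  AND ft2.field_tags_target_id = 5;

-- Not redundant: identical ON but different WHERE conditions; the node
-- must carry BOTH term 5 AND term 7, which a single join cannot express.
SELECT DISTINCT n.nid
FROM node_field_data n
JOIN node__field_tags ft1 ON ft1.entity_id = n.nid
JOIN node__field_tags ft2 ON ft2.entity_id = n.nid
WHERE ft1.field_tags_target_id = 5
  AND ft2.field_tags_target_id = 7;
```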
Comment #62
cachesclay commented

So many thank-yous to everyone in this thread. This has been such a stressful ongoing issue, and to know what it is and have a fix is incredible.
Comment #63
joelsteidl commented

Thanks @David Strauss!
Confirming that the settings.php fix from #59 works well.
I'll be interested to hear if lowering that setting in general has any positive performance impact.
Comment #65
weri commented

We are experiencing the same problem as soon as we add filters on multiple terms related to a node and call the endpoint as an anonymous user.

After some debugging, and before I found this issue, I saw the multiple (unnecessary) joins in the database query. Most of them come from applying TemporaryQueryGuard::applyAccessControls().

The main problem with the many joins, especially when you have a lot of translations and revisions, is that the ungrouped result in our case returns over 1 million rows, while the grouped result is just 30. The database produces a Cartesian product across multiple tables.

When I set optimizer_search_depth to 3, performance was quite good, but not in all cases.

I hope there is a way to create only one join per table. When I optimize the SQL query by hand so that there is only one join to the node_field_data table, the query is blazing fast.
Comment #66
fakir22 commented

@joelsteidl Thanks for the solution in #59; it worked for us too.

Regarding your question about lowering optimizer_search_depth: we didn't set the value to 7; instead we set it to 0, which, as the MariaDB documentation states, makes the server pick a search depth automatically.

After running some tests, setting optimizer_search_depth to 0 seems to offer the best performance for us. But as stated here, a setting between 7 and 8 fits 99% of all cases.
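For anyone experimenting with values, the setting can be inspected and changed per session with standard MariaDB/MySQL statements (0 tells the server to choose a depth automatically):

```sql
-- Inspect the current session value.
SELECT @@SESSION.optimizer_search_depth;

-- 0 = let the server choose a reasonable search depth automatically.
SET SESSION optimizer_search_depth = 0;
```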
Comment #67
weri commented

I think this is related: #2875033: Optimize joins and table selection in SQL entity query implementation
Comment #69
nils.destoop commented

We are also experiencing similar issues on one of our sites. The issue is related to the TemporaryQueryGuard: it iterates over all the properties and all possible relations, but it does not take into account that the status condition may already have been set by a previous iteration.

Attached is a patch that already fixes a lot for us (Drupal 8.9), but it's still not perfect. We still have duplicate joins, because the API already filters on status TRUE by default. With the patch, our load time decreased from 1.2 minutes to 7 seconds on a cold cache.

When I remove the extra filter on status, the load time drops below 2 seconds.
Comment #70
wim leers

@nils.destoop That makes a ton of sense, thank you! 🙏 Can't wait for @gabesullice's review of this! 🤓
Comment #71
nils.destoop commented

The same patch on another project lowered the response time from 6–9 seconds to 125–300 ms. We are using a custom version that already marks node as secured, because the status condition is added by default.
Comment #72
skyredwang

@nils.destoop I applied the #69 patch on core 8.9.6 and profiled the /jsonapi/commerce_order/default endpoint with a few filters on MariaDB 10.3.13 and MariaDB 10.4.10, but I still see the crazy joins below:

Please let me know if there is additional debugging information that might help with this issue.
Comment #74
nils.destoop commented

@skyredwang The patch does indeed not remove all the crazy joins; it only limits their number. My patch fixes multiple joins that use the same status condition (published terms/nodes/...). Your remaining joins seem to be unrelated to the TemporaryQueryGuard.

The patch will also not prevent a duplicate join if you are already filtering on the status via a filter in the query string. For our projects, we use a custom patch that already marks node as 'guarded'.

The real issue is probably #2875033: Optimize joins and table selection in SQL entity query implementation
Comment #75
cracu commented

I used the optimizer_search_depth approach with values between 0 and 7. It solved our issues initially, but once the editors entered more content, even this approach was not enough.

Additionally, I managed to get rid of some joins related to taxonomy by implementing hook_jsonapi_entity_filter_access(). In our case, we assumed that all taxonomy terms are published.
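An implementation along those lines might look like the following (a hypothetical sketch: "mymodule" is a placeholder, and blanket-allowing filtering among all terms is only safe when, as above, every term on the site is published):

```php
use Drupal\Core\Access\AccessResult;
use Drupal\Core\Entity\EntityTypeInterface;
use Drupal\Core\Session\AccountInterface;

/**
 * Implements hook_jsonapi_entity_filter_access().
 */
function mymodule_jsonapi_entity_filter_access(EntityTypeInterface $entity_type, AccountInterface $account) {
  if ($entity_type->id() === 'taxonomy_term') {
    // Allow filtering among all terms unconditionally, so the access layer
    // does not need to add per-field status joins for term access checks.
    return [
      JSONAPI_FILTER_AMONG_ALL => AccessResult::allowed(),
    ];
  }
  return [];
}
```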
And still, if we wanted to do a semi-complex filter like:
Select nodes of type document
where
(filter_taxonomy_field1 IN (value1, value2, value3) OR filter_taxonomy_field1 IS NULL) AND
(filter_taxonomy_field2 = value_filter2 OR field_taxonomy_field2 IS NULL) AND
(filter_taxonomy_field3 = value_filter3 OR field_taxonomy_field3 IS NULL)
Then, for every filter (node field) in place, we get an additional join to node_field_data, and status is checked multiple times. The patch posted above might help with these joins; still, this is not the main issue.

Needless to say, if we have 200 possible terms for filter_taxonomy_field1, 30 terms for filter_taxonomy_field2, and 20 terms for filter_taxonomy_field3, the query either breaks MySQL or hangs.

Before the GROUP BY vid, nid is applied, it has to scan 1–2 million rows to end up with just 4 possible combinations.

GROUP BY is clearly not the fastest solution here, and I wonder why we don't have a SELECT DISTINCT in place, which would greatly improve the speed.
so... from an initial query like:
we applied hook_jsonapi_entity_filter_access()
we got:
and the last bit was altering the query and removing group by and using distinct and the query became:
Once we got to this query, SET SESSION optimizer_search_depth = 7 was not needed anymore, and the query took around 0.2 seconds to execute.

Altering the query was quite straightforward. We wanted to target only JSON:API queries, but I don't really know whether that is possible other than looking for the pager_size metadata.
This might be handy for those who lost many hours on this:
!!! The DISTINCT approach will not work when SORT is involved.
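A query alteration along those lines might be sketched as follows (a hypothetical sketch: the module name, and using the pager_size metadata to detect JSON:API queries, are assumptions from this comment rather than an official mechanism):

```php
use Drupal\Core\Database\Query\AlterableInterface;
use Drupal\Core\Database\Query\SelectInterface;

/**
 * Implements hook_query_alter().
 *
 * Replaces GROUP BY with SELECT DISTINCT on queries carrying the
 * 'pager_size' metadata (a heuristic for JSON:API-generated queries).
 */
function mymodule_query_alter(AlterableInterface $query) {
  if ($query instanceof SelectInterface && $query->getMetaData('pager_size') !== NULL) {
    // Empty the GROUP BY clause (getGroupBy() returns it by reference)
    // and de-duplicate rows with DISTINCT instead.
    $group_by = &$query->getGroupBy();
    $group_by = [];
    $query->distinct();
  }
}
```

As noted, this trade-off breaks down once sorting is involved.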
Comment #82
marysmech commented

#59 seems to fix the problem. I use optimizer_search_depth = 0 and queries take only a few ms (previously they timed out after 30 s).

Comment #83
attheshow commented

It looks like the patch in #69 no longer applies to Drupal 10.3.
Comment #84
eliaspapa commented

I also get an issue with applying the patch after updating to 10.3. Any updated patches available?
Comment #85
eliaspapa commented

After carefully investigating the changes made to the file in the update, it seems the patch has been merged into it, so there is seemingly no need for it anymore.
Comment #86
attheshow commented

@EliasPapa Where are you seeing that the issue has been addressed? We're not seeing that anywhere in the code for 10.3.
Comment #87
attheshow commented

This is a re-roll of patch #69 for Drupal 10.3.
Comment #88
andypost

MR for 11.x.
Comment #89
driskell commented

The patch in #87 / #69 looks like it would disable access checks on a node if there were an entity reference to another node, as it only applies the access check once per entity type. That could be a security issue. It might need some work.