After upgrading to 1.24 we've experienced server-crashing performance issues with filters that include multiple entity references. I've reduced an example request down to what seems to be a minimal set of filters that leads to a timed-out response.

  ?filter[status][value]=1
  &filter[node--location][condition][path]=field_location.uuid
  &filter[node--location][condition][value][]=1c28abaa-f088-404f-925f-263838332601
  &filter[node--location][condition][operator]=IN
  &filter[taxonomy_term--audience][condition][path]=field_event_audience.uuid
  &filter[taxonomy_term--audience][condition][value][]=9422db2a-0071-4d3e-8b33-f3762c26fc19
  &filter[taxonomy_term--audience][condition][operator]=IN
  &filter[taxonomy_term--audience][condition][memberOf]=audience
  &filter[audience][group][conjunction]=OR
  &filter[parent][condition][path]=field_event_audience.parent.uuid
  &filter[parent][condition][value][]=9422db2a-0071-4d3e-8b33-f3762c26fc19
  &filter[parent][condition][operator]=IN
  &filter[parent][condition][memberOf]=audience
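For reference, this nested bracket syntax is just a flattened tree. A small Python sketch (a hypothetical helper, not part of JSON:API or Drupal) of how such a filter tree maps onto the bracketed query keys above:

```python
from urllib.parse import urlencode

def flatten(params, prefix=""):
    """Flatten nested dicts/lists into PHP/JSON:API-style bracketed keys."""
    if isinstance(params, dict):
        pairs = params.items()
    else:  # list values use empty brackets, e.g. ...[value][]=
        pairs = (("", v) for v in params)
    items = []
    for key, value in pairs:
        k = f"{prefix}[{key}]" if prefix else str(key)
        if isinstance(value, (dict, list)):
            items.extend(flatten(value, k))
        else:
            items.append((k, value))
    return items

# A subset of the filters from the request above.
filters = {"filter": {
    "status": {"value": "1"},
    "node--location": {"condition": {
        "path": "field_location.uuid",
        "value": ["1c28abaa-f088-404f-925f-263838332601"],
        "operator": "IN",
    }},
}}

pairs = flatten(filters)
query_string = urlencode(pairs)  # brackets get percent-encoded on the wire
```

Each leaf becomes one `key=value` pair, which is why a single conceptual filter fans out into several query-string parameters.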

I haven't been able to test this with 2.x yet, as we're relying on consumer_image_styles at the moment and it does not appear to be compatible with 2.x. That's the next step, but I wanted to report this in case anyone has other ideas in the meantime.

Issue fork drupal-3022864


Comments

pixelwhip created an issue. See original summary.

gabesullice’s picture

Any profiling statistics from XHProf or services like New Relic or Blackfire.io would be really helpful here.

I don't think this is the primary source of the problem, but I'm curious, why are you using the IN operator instead of =?

gabesullice’s picture

What version of Drupal core are you running?

wim leers’s picture

pixelwhip’s picture

Attachments: 3.83 KB (new), 2.62 KB (new)

Here's an example set of query params and the slow query logged to New Relic.

This site is currently on 8.6.2

gabesullice’s picture

@pixelwhip and I got on a Zoom call to try to isolate the problem and uncover more detail. The slow_query.txt and the query_params.txt that @pixelwhip added may not actually be from the same request, but the query parameters are sufficiently representative of the kinds of queries that are being bogged down that I think they give us enough to go on.

Examining the slow query, I can see that the SQL query has 3 duplicated WHERE conditions on taxonomy_term_field_data_?.status and another 3 duplicated conditions on node_field_data_?.status.

I confirmed that the issue was replicable and directly associated with 8.x-1.24, and that the performance regression was mitigated by adding the administer taxonomy permission. This tells me that the slowdown is related to the additional conditions added by TemporaryQueryGuard, because no conditions (on terms) would be added if the user had this permission. The bypass node access permission did not seem to have a major impact.

Observationally, performance seems to degrade further as more filters on taxonomy terms are added.

pixelwhip’s picture

I'm using IN instead of = because I thought IN meant "one of" and = meant "exactly". Is this not the case?

Thanks for the quick response time, BTW. I really appreciate it.

gabesullice’s picture

I'm using IN instead of = because I thought IN meant "one of" and = meant "exactly". Is this not the case?

You're correct. Each filter in the sample query on the issue summary only has one UUID given, which made me wonder why you wanted it to be "one of" only one UUID. But in the screen share that we had, I saw that in your real use case, it's possible to have numerous UUIDs, so IN makes perfect sense.

gabesullice’s picture

I tried to replicate this locally but did not have any success.

To replicate it:

  1. I created a new installation of Drupal core 8.6.5
  2. I installed jsonapi and devel_generate
  3. I created a node type named base
  4. I created a vocabulary named targets
  5. I added a term reference field named field_foo_terms to the base node type referencing targets
  6. I added a term reference field named field_bar_terms to the base node type referencing targets
  7. I generated 100 targets terms
  8. I generated 10,000 base nodes

I tried a variety of requests (I've exported and uploaded a Postman collection of these requests) filtering by:

  1. Just field_foo_terms.name
  2. field_foo_terms.name AND field_bar_terms.name
  3. field_foo_terms.name AND field_foo_terms.parent.name

I experimented with IN vs. = on these requests and discovered no real difference (IN was faster, but only by a negligible margin).

Finally, to verify the performance impact of 1.24, I ran all of these requests against 1.23 of JSON:API too.

Doing this, I only observed up to a ~3% increase in response times (for example, the times went from about 290ms to 300ms). These results did not replicate the severe degradation that I observed while screensharing with @pixelwhip. For what it's worth, I only observed the degradation on @pixelwhip's staging environment. We were not able to replicate it locally on his machine either.

In order to proceed, I think we need to be able to replicate this at will and get live results where we can make changes in code.


I also tried this with JSON:API 2.x. The performance enhancements I was able to see when using the include and fields parameters far outweighed any slowdowns I observed. Looking at the query_params.txt file that @pixelwhip uploaded, I see both of these parameters (a nested include too!). I would recommend trying to upgrade to 2.x first. It may mitigate the performance impact enough to get you onto a secure release while we track down the root cause.

gabesullice’s picture

This is the Postman collection I mentioned in #9.

Edit: D'oh! I can't upload JSON apparently

gabesullice’s picture

Attachments: 5.33 KB (new)
e0ipso’s picture

Regardless of the actual query, I'm worried that the same request is significantly slower. The delta between 1.23 and 1.24 is the same in 2.x, so I suspect that the degradation also exists there. It's entirely possible that we cannot do anything about it, but as Gabe says, we need more data points.

gabesullice’s picture

Status: Active » Postponed (maintainer needs more info)

In order to proceed, I think we need to be able to replicate this at will and get live results where we can make changes in code.

I've spoken with a colleague of @pixelwhip and she is unable to reproduce this locally as well (only on their staging environment).

Because of that and #12, I'm postponing until we get more info. I don't want to scare people away from a security release if this is actually an issue with a hosting environment.

wim leers’s picture

Thanks for digging so deep. It’s great to see all the cross-company and cross-time zone collaboration here!

pixelwhip’s picture

Thanks for the help, all. I'm back from holiday and have been trying to reproduce this locally with no luck. I thought it might have something to do with my local database being significantly out of date, or a difference between my local PHP memory limit and our hosting environment's. But after ensuring those matched, I get no significant difference in performance between 1.23 and 1.24. Blackfire shows a ~3% increase in overall response time (~360ms), which aligns with what @gabesullice posted.

There is definitely still an issue in our staging environment, but now I'm not sure whether the upgrade to 1.24 is a red herring. I'm wondering if there is something depleting server resources, and the long call to \Drupal\jsonapi\Controller\RequestHandler->handle and the subsequent slow query are what New Relic surfaces as a result. Unfortunately we do not have Blackfire running on our staging environment, and New Relic does not seem to post any information from requests that time out.

Our next steps are to work with our hosting provider to try to diagnose the issue and further assess what we need to do to upgrade to 2.x.

wim leers’s picture

Thanks for digging deeper and keeping us in the loop, @pixelwhip! 🙏

wim leers’s picture

Category: Bug report » Support request

Demoting from Bug report to Support request for now, since all research points to this not being a bug, and it not even being a regression.

We treat support requests very seriously, so don't worry, we'll still be watching and responding :)

pixelwhip’s picture

Version: 8.x-1.24 » 8.x-2.1
Status: Postponed (maintainer needs more info) » Active
Attachments: 100.38 KB (new)

Ok, we've just finished the refactoring needed to upgrade to 2.x. After upgrading, we're experiencing similar performance regressions that are bringing down our server with slow queries. Unfortunately we're still unable to reproduce this locally; it only happens reliably on our hosting provider's server. We've been banging our heads against the wall trying to pinpoint an exact cause. Perhaps someone can help make sense of this.

Attached are our slow query logs run through pt-query-digest.

My next step is to compare before and after queries for the same endpoint to see if that points to any major differences.

pixelwhip’s picture

Title: Severe performance degradation after upgrading to 1.24 when filtering on multiple relationships » Severe performance degradation after upgrading to 1.24 or 2.x when filtering on multiple relationships
wim leers’s picture

Darn :( That's really too bad.

My next step is to compare before and after queries for the same endpoint to see if that points to any major differences.

That sounds perfect! Thank you so much 🙏


On a related note: #2819335: Resource (entity) normalization should use partial caching makes responses 4x faster if the cache is warm. I'd love to solve the problem you reported at the root though. But I'd feel irresponsible if I didn't at least make you aware of that other issue. If your suspicions are correct, then that patch shouldn't make a material difference anyway: that is only accelerating the normalization of the results of the query.

pixelwhip’s picture

Attachments: 7.35 KB (new), 3.99 KB (new)

Thanks, Wim. I'm pretty sure the culprit is the extra JOINs and search predicates added to check the published status of the referenced entities.

I'm attaching example before (v1.23) and after (v2.1) SQL statements for very similar (not 100% identical) requests. The v1.23 request actually filters on more entities; if these were exactly the same requests, the difference in the MySQL queries would be even more pronounced.

The difference is that the 2.1 version has 27 more INNER JOIN/LEFT OUTER JOIN statements and 15 more search predicates (all published-status checks). @gabesullice had actually pointed at this as a potential issue when we first chatted about this.

Running the SQL query from the 2.1 example directly against the DB actually causes MySQL to seize up.

wim leers’s picture

So then we're back to @gabesullice's conclusion in #6:

I confirmed that the issue was replicable and directly associated with 8.x-1.24, and that the performance regression was mitigated by adding the administer taxonomy permission. This tells me that the slowdown is related to the additional conditions added by TemporaryQueryGuard, because no conditions (on terms) would be added if the user had this permission. The bypass node access permission did not seem to have a major impact.

A lot of extra joins, many of which are AFAICT unnecessary. I don't know MySQL's optimization capabilities well enough to know whether these would make a material difference.

Similarly, the vast majority of the new WHERE clauses are essentially saying the same thing: "give me only published nodes":

AND (node_field_data_3.status = '1')
AND (node_field_data_4.status = '1')
AND (node_field_data_5.status = '1')
AND (node_field_data_6.status = '1')
AND (node_field_data_7.status = '1')
AND (node_field_data_8.status = '1')

I don't know if MySQL optimizes these away.
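To illustrate why these are redundant: each of those aliases joins the same table on the same key and applies the same predicate, so collapsing identical join definitions leaves a single condition. A rough Python sketch of that dedup idea (the tuples are simplified stand-ins for the query builder's internal structures, not real Drupal API):

```python
# A join is identified by (table, join_type, on_clause); a condition by
# (alias, predicate). These stand-ins mirror the duplicated joins above.
joins = {
    "node_field_data_3": ("node_field_data", "INNER", "nid = base_table.nid"),
    "node_field_data_4": ("node_field_data", "INNER", "nid = base_table.nid"),
    "node_field_data_5": ("node_field_data", "INNER", "nid = base_table.nid"),
}
conditions = [
    ("node_field_data_3", "status = '1'"),
    ("node_field_data_4", "status = '1'"),
    ("node_field_data_5", "status = '1'"),
]

# Map each alias to the first alias that has an identical join definition.
canonical = {}
seen = {}
for alias, definition in joins.items():
    canonical[alias] = seen.setdefault(definition, alias)

# Rewrite conditions against the canonical aliases and drop duplicates:
# three identical join+predicate pairs collapse into one.
deduped = sorted({(canonical[a], pred) for a, pred in conditions})
```

Whether it is the query builder's job or the planner's to perform this collapse is exactly the open question in this issue.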

This is definitely a cost introduced by the security fix in 1.24/2.0-rc4.


Perhaps we will need to make EntityFieldQuery smarter, to not perform unnecessary joins. OTOH, that's what a DB's query planner is supposed to do…


Could you run these queries with EXPLAIN? That will give us valuable information about how MySQL processes these.

pixelwhip’s picture

Attachments: 7.62 KB (new), 3.29 KB (new)

Sure! Attached are the EXPLAIN results. Here's a link to a spreadsheet as well.

https://docs.google.com/spreadsheets/d/1VX9buQ4LdbrY--ZNiamjA86qcm0e049Z...

wim leers’s picture

Thanks for the super fast response! 🙏

I've pinged a colleague who has more SQL query optimization experience. I think we have all the information we need for now!

josh waihi’s picture

There is something really off about the SQL query. For example, track the table alias node_field_data_3. It appears only twice. Once on the join:

INNER JOIN node_field_data node_field_data_3 ON node_field_data_3.nid = base_table.nid

And once in the WHERE clause

AND (node_field_data_3.status = '1')

This makes absolutely no sense to me. We're not trying to acquire a field from this table, nor join it to a particular type of field. It is abusive for an application to expect the database to plan nonsense like this away (let's not forget Drupal has to expect this of more than just MySQL).

I'd be interested to know how much this query is slowed down every time a fruitless join like this is added. Could you capture timings for various SQL queries removing a join each time? E.g.

  1. Time existing v2.1 query
  2. remove node_field_data_3 join and condition and time query
  3. remove node_field_data_4 join and condition and time query
  4. remove node_field_data_5 join and condition and time query
  5. etc

I'm keen to know whether the slowdown is linear or superlinear (are we multiplying time by the number of joins, or is it raised to the power of the number of joins?).

It's also interesting to point out that node_field_data_2 is the only node_field_data_* join with a proper condition on it (title LIKE '%sew%'), and it DOESN'T contain a status = 1 condition...

wim leers’s picture

Thank you so much, @Josh Waihi! 🙏

pixelwhip’s picture

Here are some differences running these queries locally. It's worth noting that my local environment is on MySQL 5.7 and doesn't seem to have issues even with the most complicated query. However, our hosting environment is on MariaDB 5.5 and doesn't handle the joins very well.

Joins and status predicates removed    Timing (seconds)
  0                                    0.83
 -1                                    0.67
 -2                                    0.61
 -3                                    0.60
 -4                                    0.55
 -5                                    0.48
 -6                                    0.44
 -7                                    0.64
 -8                                    0.64
 -9                                    0.47
-10                                    0.42
-11                                    0.37
-12                                    0.33
-15                                    0.06
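A quick sanity check on whether those timings scale linearly with join count — a least-squares slope over the figures above (these are local MySQL 5.7 numbers, so this says nothing about the MariaDB staging pathology):

```python
# (joins removed, seconds) pairs transcribed from the table above, with
# "joins removed" as a positive count; the -7/-8 outliers are kept.
data = [(0, 0.83), (1, 0.67), (2, 0.61), (3, 0.60), (4, 0.55), (5, 0.48),
        (6, 0.44), (7, 0.64), (8, 0.64), (9, 0.47), (10, 0.42), (11, 0.37),
        (12, 0.33), (15, 0.06)]

n = len(data)
sx = sum(x for x, _ in data)
sy = sum(y for _, y in data)
sxx = sum(x * x for x, _ in data)
sxy = sum(x * y for x, y in data)

# A negative, roughly constant per-join slope suggests the local cost is
# approximately linear in the number of joins, not exponential.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
```

On these numbers the slope comes out negative and small, i.e. each removed join/predicate saves a roughly constant few tens of milliseconds locally.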

wim leers’s picture

>3.5 months have passed. Is this still a problem?

santerref’s picture

Same for me as in #27!

MySQL 5.7: 1.30s
MariaDB 10.4: 13.4s

Query:

filter[field_time_end][condition][path]: field_time_end
filter[field_time_end][condition][operator]: IS NULL
filter[field_user.uid]: 3
filter[field_project.id]: fd24d778-bdf3-4aee-8e34-66f0f55bf720

I only have 10 nodes in my Drupal website...

I hope we will find a solution to this issue!

Thanks,
santerref

Anonymous’s picture

I can confirm this behavior as well. At the moment I don't have great logs, but my setup is:

Drupal 8.7.5 using core JSON API module

My hosting provider is Pantheon, and I'm on their basic plan.

I see significant delays on queries that mirror the one posted in the original post: very long response times on queries filtering on an entity field that allows multiple values. I'm using subrequests alongside it, and it was crashing my server. I pulled out the 4 queries in the subrequest and isolated the culprit, which was a query as described above.

I'll monitor this issue and would love to help solve it, as it creates a poor UX in my application while it waits ~15-30 seconds for a fetch to finish.

Edit: I have my `mysqld-slow-query.log` file if that's helpful. And it appears Pantheon uses MariaDB v5.5.

wim leers’s picture

#29 + #30: It'd be very helpful to see both the JSON:API URL and the resulting SQL query, just like @pixelwhip provided earlier in the issue.

pixelwhip’s picture

Sorry for the long delay on this issue. We had postponed the upgrade for a while as other priorities were taking precedence. I'm coming back to this now. We've made some changes to how our search interface generates filter query params in hopes that it would simplify the resulting SQL query. Unfortunately that didn't seem to have any impact. We are still seeing MariaDB queries on Pantheon that are taking 600-950 seconds, which obviously times out the request.

Here is an example JSON API request URL for 10 published future events that happen at one of 3 locations (node ref) and are applicable to at least one of 3 audiences (term ref), ordered by ascending date. This is a pretty typical search request for this application.

/jsonapi/node/event?filter%5Bstatus%5D%5Bvalue%5D=1&filter%5Bfield_event_designation%5D%5Bvalue%5D=events&filter%5Bdata%5D%5Bcondition%5D%5Bpath%5D=field_date_time.value&filter%5Bdata%5D%5Bcondition%5D%5Bvalue%5D=2019-08-06T03%3A59%3A59.999Z&filter%5Bdata%5D%5Bcondition%5D%5Boperator%5D=%3E&filter%5Bnode--location%5D%5Bcondition%5D%5Bpath%5D=field_location.id&filter%5Bnode--location%5D%5Bcondition%5D%5Bvalue%5D%5B%5D=4ced26ce-f316-40be-8988-5a0776fce65e&filter%5Bnode--location%5D%5Bcondition%5D%5Bvalue%5D%5B%5D=ce465283-9203-49da-a342-5ff62af65c31&filter%5Bnode--location%5D%5Bcondition%5D%5Bvalue%5D%5B%5D=2f900166-5d6e-4f46-a87a-916b4eacff16&filter%5Bnode--location%5D%5Bcondition%5D%5Boperator%5D=IN&filter%5Btaxonomy_term--audience%5D%5Bcondition%5D%5Bpath%5D=field_event_audience.id&filter%5Btaxonomy_term--audience%5D%5Bcondition%5D%5Bvalue%5D%5B%5D=96887d69-800d-4e91-b434-d3d5167985d3&filter%5Btaxonomy_term--audience%5D%5Bcondition%5D%5Bvalue%5D%5B%5D=002a9b40-2c35-4c47-b0bc-f75c9b873cd4&filter%5Btaxonomy_term--audience%5D%5Bcondition%5D%5Bvalue%5D%5B%5D=354e80c8-6902-4d0f-99f4-dbca85205b25&filter%5Btaxonomy_term--audience%5D%5Bcondition%5D%5Boperator%5D=IN&fields%5Bnode--event%5D=drupal_internal__nid%2Cstatus%2Ctitle%2Cpath%2Cfield_date_time%2Cfield_must_register%2Cfield_text_teaser%2Cregistration%2Cfield_event_audience%2Cfield_event_register_period%2Cfield_event_type%2Cfield_event_tags%2Cfield_location%2Cfield_event_designation%2Cfield_room%2Cevent_thumbnail&fields%5Bnode--room%5D=drupal_internal__nid%2Ctitle%2Cfield_location&page%5Blimit%5D=10&include=field_room&sort=field_date_time.value
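For readability, the encoded URL decodes with standard-library tools; a sketch using a shortened stand-in of the URL above (the full URL decodes the same way):

```python
from urllib.parse import parse_qsl, urlsplit

# A truncated stand-in for the full request URL above.
url = ("/jsonapi/node/event?filter%5Bstatus%5D%5Bvalue%5D=1"
       "&filter%5Bnode--location%5D%5Bcondition%5D%5Boperator%5D=IN"
       "&page%5Blimit%5D=10&sort=field_date_time.value")

# parse_qsl percent-decodes both keys and values, restoring the brackets.
params = parse_qsl(urlsplit(url).query)
for key, value in params:
    print(f"{key} = {value}")
```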

This particular query took ~ 697s to complete

# Time: 190806 23:00:41
# User@Host: pantheon[pantheon] @  [##.###.#.#]
# Thread_id: 12  Schema: pantheon  QC_hit: No
# Query_time: 696.996844  Lock_time: 0.000723  Rows_sent: 11  Rows_examined: 3855
SET timestamp=1565132441;
SELECT base_table.vid AS vid, base_table.nid AS nid, min(node__field_date_time_2.field_date_time_value) AS expression
FROM
node base_table
INNER JOIN node_field_data node_field_data ON node_field_data.nid = base_table.nid
INNER JOIN node__field_event_designation node__field_event_designation ON node__field_event_designation.entity_id = base_table.nid
INNER JOIN node__field_date_time node__field_date_time ON node__field_date_time.entity_id = base_table.nid
INNER JOIN node__field_location node__field_location ON node__field_location.entity_id = base_table.nid
LEFT OUTER JOIN node node ON node.nid = node__field_location.field_location_target_id
INNER JOIN node node_2 ON node_2.nid = node.nid
INNER JOIN node__field_event_audience node__field_event_audience ON node__field_event_audience.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data ON taxonomy_term_data.tid = node__field_event_audience.field_event_audience_target_id
INNER JOIN taxonomy_term_data taxonomy_term_data_2 ON taxonomy_term_data_2.tid = taxonomy_term_data.tid
INNER JOIN node_field_data node_field_data_2 ON node_field_data_2.nid = base_table.nid
INNER JOIN node_field_data node_field_data_3 ON node_field_data_3.nid = base_table.nid
INNER JOIN node_field_data node_field_data_4 ON node_field_data_4.nid = base_table.nid
INNER JOIN node__field_location node__field_location_2 ON node__field_location_2.entity_id = base_table.nid
LEFT OUTER JOIN node node_3 ON node_3.nid = node__field_location_2.field_location_target_id
INNER JOIN node_field_data node_field_data_5 ON node_field_data_5.nid = node_3.nid
INNER JOIN node_field_data node_field_data_6 ON node_field_data_6.nid = base_table.nid
INNER JOIN node__field_event_audience node__field_event_audience_2 ON node__field_event_audience_2.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_3 ON taxonomy_term_data_3.tid = node__field_event_audience_2.field_event_audience_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data ON taxonomy_term_field_data.tid = taxonomy_term_data_3.tid
INNER JOIN node_field_data node_field_data_7 ON node_field_data_7.nid = base_table.nid
INNER JOIN node_field_data node_field_data_8 ON node_field_data_8.nid = base_table.nid
LEFT JOIN node__field_date_time node__field_date_time_2 ON node__field_date_time_2.entity_id = base_table.nid
WHERE ((node_field_data.status = '1') and (node__field_event_designation.field_event_designation_value = 'events') and (node__field_date_time.field_date_time_value > '2019-08-06T03:59:59.999Z') and (node_2.uuid IN ('4ced26ce-f316-40be-8988-5a0776fce65e', 'ce465283-9203-49da-a342-5ff62af65c31', '2f900166-5d6e-4f46-a87a-916b4eacff16')) and (taxonomy_term_data_2.uuid IN ('96887d69-800d-4e91-b434-d3d5167985d3', '002a9b40-2c35-4c47-b0bc-f75c9b873cd4', '354e80c8-6902-4d0f-99f4-dbca85205b25'))) AND (node_field_data_2.status = '1') AND (node_field_data_3.status = '1') AND (node_field_data_4.status = '1') AND (node_field_data_5.status = '1') AND (node_field_data_6.status = '1') AND (taxonomy_term_field_data.status = '1') AND (node_field_data_7.status = '1') AND (node_field_data_8.type = 'event')
GROUP BY base_table.vid, base_table.nid
ORDER BY expression ASC
LIMIT 11 OFFSET 0;
gabesullice’s picture

@pixelwhip, I think I remember that we came to the conclusion that it might be MariaDB vs MySQL. #29 seems to add additional evidence for that theory.

Since we were having trouble replicating the issue locally, did you ever try to install MariaDB locally?

I found this StackOverflow issue which sounds very similar: https://stackoverflow.com/questions/35889706/mariadb-running-a-left-join...

@santerref, maybe you can try this ^

wim leers’s picture

🤯

cachesclay’s picture

I'm not sure whether we were having this problem prior to upgrading to the versions in the original post, but we're seeing a pretty severe timeout issue on requests with multiple filters, while all single filters work correctly. Please let me know if I can provide any further information. This issue has been taking down our production server when the DB fills up with hung processes, so any tips/ideas would be amazing.

(edited: note, this query works fine on my local, using 10.1.31-MariaDB)

| innodb_version          | 5.7.27                               |
| protocol_version        | 10                                   |
| slave_type_conversions  |                                      |
| version                 | 10.2.26-MariaDB-10.2.26+maria~jessie |

/jsonapi/node/organization?page[limit]=20&page[offset]=0&sort=title&include=image,organization_type,programs,programs.timing&filter[q0][condition][path]=program&filter[q0][condition][operator]=%3D&filter[q0][condition][value][0]=1&filter[q1][condition][path]=location.administrative_area&filter[q1][condition][operator]=%3D&filter[q1][condition][value][0]=TX&filter[q2][condition][path]=regions.tid&filter[q2][condition][operator]=IN&filter[q2][condition][value][0]=603&filter[q3][condition][path]=organization_type.tid&filter[q3][condition][operator]=IN&filter[q3][condition][value][0]=119&filter[q3][condition][value][1]=120&filter[q3][condition][value][2]=485&filter[q3][condition][value][3]=16&filter[q3][condition][value][4]=125&filter[q4][condition][path]=programs.certification_area.tid&filter[q4][condition][operator]=IN&filter[q4][condition][value][0]=201&filter[q4][condition][value][1]=205&filter[q4][condition][value][2]=236&filter[q4][condition][value][3]=231&filter[q4][condition][value][4]=200&filter[q4][condition][value][5]=199&filter[q5][condition][path]=programs.tuition_out_state&filter[q5][condition][operator]=BETWEEN&filter[q5][condition][value][0]=250&filter[q5][condition][value][1]=20050

SELECT base_table.vid AS vid, base_table.nid AS nid, min(node_field_data_7.title) AS expression
FROM
node base_table
INNER JOIN node__program node__program ON node__program.entity_id = base_table.nid
INNER JOIN node__location node__location ON node__location.entity_id = base_table.nid
INNER JOIN node__regions node__regions ON node__regions.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data ON taxonomy_term_data.tid = node__regions.regions_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data ON taxonomy_term_field_data.tid = taxonomy_term_data.tid
INNER JOIN node__organization_type node__organization_type ON node__organization_type.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_2 ON taxonomy_term_data_2.tid = node__organization_type.organization_type_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data_2 ON taxonomy_term_field_data_2.tid = taxonomy_term_data_2.tid
INNER JOIN node__programs node__programs ON node__programs.entity_id = base_table.nid
LEFT OUTER JOIN field_collection_item field_collection_item ON field_collection_item.item_id = node__programs.programs_target_id
INNER JOIN field_collection_item__certification_area field_collection_item__certification_area ON field_collection_item__certification_area.entity_id = field_collection_item.item_id
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_3 ON taxonomy_term_data_3.tid = field_collection_item__certification_area.certification_area_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data_3 ON taxonomy_term_field_data_3.tid = taxonomy_term_data_3.tid
LEFT OUTER JOIN field_collection_item field_collection_item_2 ON field_collection_item_2.item_id = node__programs.programs_target_id
INNER JOIN field_collection_item__tuition_out_state field_collection_item__tuition_out_state ON field_collection_item__tuition_out_state.entity_id = field_collection_item_2.item_id
INNER JOIN node_field_data node_field_data ON node_field_data.nid = base_table.nid
INNER JOIN node_field_data node_field_data_2 ON node_field_data_2.nid = base_table.nid
INNER JOIN node__regions node__regions_2 ON node__regions_2.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_4 ON taxonomy_term_data_4.tid = node__regions_2.regions_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data_4 ON taxonomy_term_field_data_4.tid = taxonomy_term_data_4.tid
INNER JOIN node_field_data node_field_data_3 ON node_field_data_3.nid = base_table.nid
INNER JOIN node__organization_type node__organization_type_2 ON node__organization_type_2.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_5 ON taxonomy_term_data_5.tid = node__organization_type_2.organization_type_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data_5 ON taxonomy_term_field_data_5.tid = taxonomy_term_data_5.tid
INNER JOIN node_field_data node_field_data_4 ON node_field_data_4.nid = base_table.nid
INNER JOIN node__programs node__programs_2 ON node__programs_2.entity_id = base_table.nid
LEFT OUTER JOIN field_collection_item field_collection_item_3 ON field_collection_item_3.item_id = node__programs_2.programs_target_id
INNER JOIN field_collection_item__certification_area field_collection_item__certification_area_2 ON field_collection_item__certification_area_2.entity_id = field_collection_item_3.item_id
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_6 ON taxonomy_term_data_6.tid = field_collection_item__certification_area_2.certification_area_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data_6 ON taxonomy_term_field_data_6.tid = taxonomy_term_data_6.tid
INNER JOIN node_field_data node_field_data_5 ON node_field_data_5.nid = base_table.nid
INNER JOIN node_field_data node_field_data_6 ON node_field_data_6.nid = base_table.nid
LEFT JOIN node_field_data node_field_data_7 ON node_field_data_7.nid = base_table.nid
WHERE ((node__program.program_value = '1') and (node__location.location_administrative_area = 'TX') and (taxonomy_term_field_data.tid IN ('603')) and (taxonomy_term_field_data_2.tid IN ('119', '120', '485', '16', '125')) and (taxonomy_term_field_data_3.tid IN ('201', '205', '236', '231', '200', '199')) and (field_collection_item__tuition_out_state.tuition_out_state_value BETWEEN '250' AND '20050')) AND (node_field_data.status = '1') AND (node_field_data_2.status = '1') AND (taxonomy_term_field_data_4.status = '1') AND (node_field_data_3.status = '1') AND (taxonomy_term_field_data_5.status = '1') AND (node_field_data_4.status = '1') AND (taxonomy_term_field_data_6.status = '1') AND (node_field_data_5.status = '1') AND (node_field_data_6.type = 'organization')
GROUP BY base_table.vid, base_table.nid
ORDER BY expression ASC
joelsteidl’s picture

I can definitely confirm that this is a MariaDB issue (10.0, 10.1 & 10.3...I didn't test on 10.2). I get all kinds of timeouts.

I pointed a Pantheon multidev instance to a SERIOUSLY slow database on Dreamhost shared hosting running MySQL 5.7 and it works fine.

Can I provide other details that would be helpful?

e0ipso’s picture

Thanks for providing more information! I'm sure this will be helpful to find a root cause.

I do not think there is an active push on this particular bug; we are waiting for a patch so we can review it. This is not a JSON:API-specific issue but an Entity Query API one. However, JSON:API is more affected by it because of its intense use of Entity Query API.

e0ipso’s picture

Project: JSON:API » Drupal core
Version: 8.x-2.1 » 8.8.x-dev
Component: Code » jsonapi.module
gabesullice’s picture

Title: Severe performance degradation after upgrading to 1.24 or 2.x when filtering on multiple relationships » Extremely poor performance when using MariaDB and filtering on multiple relationships
Priority: Normal » Major
Issue tags: +mariadb

Thanks for updating the issue @e0ipso! I agree with all of that.

Anonymous’s picture

FWIW, I migrated away from Pantheon as my provider because of this issue. MySQL 5.7 seems to hum through the extra joins for sure. At this point I just wouldn't recommend Pantheon as a hosting provider for a headless setup. I ended up spinning up a droplet, which also gave me more fine-grained control over Varnish and Redis at a better price point.

david strauss’s picture

FWIW, I migrated away from Pantheon as my provider because of this issue. MySQL 5.7 seems to hum through the extra joins for sure.

A difference between MySQL and MariaDB here is unlikely as the root cause, as MySQL and MariaDB have similar query planners and storage engines. What would indicate a difference unique to MariaDB would be differing results from EXPLAIN. It's possible, but I think it's unlikely. (Please post your EXPLAIN results on both MariaDB and MySQL if you do believe this is the case.)

What you are more likely seeing is something like the query cache being enabled on MySQL 5.7 but not MariaDB. Many platforms enable the query cache on MySQL, even on 5.7. This creates a superficial experience of better performance, but it's a bit of a Faustian bargain.

We disable the query cache on Pantheon for several reasons:

  • The query cache is deprecated in MySQL 5.7 and removed from MySQL 8.0.
  • The query cache is disabled by default in MariaDB 10.1.7 and later (and disabled in practice in deployments of some earlier releases, though some installations enable it even for current releases).
  • While the query cache appears to help under development conditions, it creates substantial risk in production.
  • Drupal 8 and later are designed to not lean on the query cache (in the sense of relying on slow queries with underlying tables that rarely change).

The side effect of disabling the query cache is that some queries will appear to run slowly with MariaDB even though MySQL (with the query cache enabled) may seem to "hum" through them. However, such setups are not sustainable with MySQL because of reliability issues that occur with the query cache under heavy loads (which is, ultimately, why the feature was deprecated and removed from MySQL).

Based on the multiple AND conditions that put constraints on the far side of JOINs, it's unlikely that either MySQL or MariaDB can run this query efficiently on a real-time basis. No plan over table-level indexes can exist, even in relational algebra theory, to efficiently execute such a query.

Assuming optimizing the query isn't an option, the best choice is probably caching the results (or something generated from the results) in the Drupal object cache. That way, the module won't rely on deprecated MySQL behavior to maintain decent performance.

joelsteidl’s picture

@David Strauss thanks for jumping in.

Trying to troubleshoot the queries JSON:API is creating via Entity Query API on Pantheon, I actually spun up a MySQL 8.0 instance (and 5.7) and it works fine on both. I'm not saying the query cache isn't a factor, but no matter the MariaDB version I tried (Pantheon/Non-Pantheon) MariaDB times out with the query.

gabesullice’s picture

Thanks for chiming in @David Strauss! Your comment has me a bit confused though:

A difference between MySQL and MariaDB here is unlikely as the root cause... What you are more likely seeing is something like the query cache being enabled on MySQL 5.7 but not MariaDB.

If the default/common configuration of these two databases differ, then I think it's fair to say that the difference between them is a contributing factor of the "extremely poor performance" our users are experiencing. Please note the wording of the issue title; it's expressing a correlation not a causation.

This creates a superficial experience of better performance

🤔

The side effect of disabling the query cache is that some queries will appear to run slowly with MariaDB even though MySQL (with the query cache enabled) may seem to "hum" through them.

I think this is jumping to conclusions without sufficient data. What we know is that there is a difference between default MySQL and default MariaDB. What we don't know is that it's a result of a query cache. @joelsteidl points out that he has tried many versions of both databases and consistently finds a difference between them.

When I was trying to replicate this with @pixelwhip some months ago, we did not observe extremely poor performance on the first execution of the query using MySQL and then "fast" performance on subsequent iterations. I am not a database expert, but I think this is some evidence that the query cache is unlikely to be the sole reason for the difference.

Assuming optimizing the query isn't an option, the best choice is probably caching the results (or something generated from the results) in the Drupal object cache. That way, the module won't rely on deprecated MySQL behavior to maintain decent performance.

This is already the case. Responses are cached by the Dynamic Page Cache module, the Page Cache module, and subsequently Varnish. This is an ineffective solution in @pixelwhip and @joelsteidl's case because the URL query string is generated by a user-filterable listing (think Views exposed filters, but using React + JSON:API).


Please post your EXPLAIN results on both MariaDB and MySQL if you do believe this is the case

I agree that this is the best, next step.

joelsteidl’s picture

Please post your EXPLAIN results on both MariaDB and MySQL if you do believe this is the case.

I'd be happy to do this, but I honestly don't know the best way to capture that information. Any direction would be super helpful.

Joel

david strauss’s picture

I'd be happy to do this, but I honestly don't know the best way to capture that information. Any direction would be super helpful.

Happy to help! Here's how I run the analysis:

First, optimize the tables involved in the query. If you skip this, it's likely that the database server that had its data more recently "restored" or copied will have better statistics for the database's query cost planner, which is what chooses the strategy for running the query. Running the optimizer on the tables will ensure we're comparing apples-to-apples. (Also, sometimes this fixes the performance issue entirely.)

To optimize tables, one option is to run OPTIMIZE TABLE tbl_name for each involved table. Alternatively, you can use mysqlcheck to easily run it on all tables in the database. The mysqlcheck utility supports similar options to the normal mysql CLI.
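As a minimal sketch of the first step (the table names below are illustrative; substitute the tables that actually appear in your slow query):

```sql
-- Rebuild table data and index statistics so both servers' query
-- planners are working from comparable cost estimates.
OPTIMIZE TABLE node_field_data;
OPTIMIZE TABLE taxonomy_term_field_data;
OPTIMIZE TABLE node__field_event_audience;
```

On InnoDB, OPTIMIZE TABLE is mapped to a table rebuild plus ANALYZE, which is what refreshes the statistics the planner uses.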

Second, run an example of the slow query with the prefix "EXPLAIN" from the CLI. I usually like to end the command with "\G" instead of a semicolon to get nicer formatting for pasting here:

MariaDB [pantheon]> EXPLAIN SELECT u.*, COUNT(n.nid) FROM users u INNER JOIN node n ON n.uid = u.uid WHERE LENGTH(data) > 10 GROUP BY n.nid\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2621
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: n
         type: ref
possible_keys: uid
          key: uid
      key_len: 4
          ref: pantheon.u.uid
         rows: 16
        Extra: Using where; Using index
2 rows in set (0.041 sec)

I made that example query intentionally bad -- it has to examine each of the 2621 rows of "users" -- to show a bit of interesting data on the example report above.

joelsteidl’s picture

Thanks David!

Results attached. I did not run against MySQL 8.0 (only 5.7), but can if need be.

david strauss’s picture

The EXPLAIN results are enlightening -- and not for the reason I anticipated. The execution plans themselves are pretty similar between MariaDB and MySQL. The surprising part is that the EXPLAIN itself took over a minute on MariaDB. That isn't typical; EXPLAIN should run nearly instantaneously. (EXPLAIN doesn't do the work of running the query, only the work of creating a plan for doing so.) The only data I could find on slow EXPLAIN results was an old bug in MySQL for EXPLAIN on queries with subqueries, but you don't have any subqueries (and neither database you're testing should have the bug, anyway).

So, I experimented with the query to see if I could determine where things are tipping over. This module's machine-generated query has a number of peculiar attributes.

First, it JOINs to the "node_field_data" table eight times using the same ON condition (node_field_data_X.nid = base_table.nid, either directly or transitively) as well as the same WHERE conditions (node_field_data_X.status = '1'). The exception is that the eighth JOIN has a WHERE condition on the type (node_field_data_8.type = 'event').

If I switch the WHERE conditions on the redundant node_field_data JOINs to all place the conditions on the first node_field_data table, EXPLAIN drops from a minute to 1-3 seconds. If I drop the redundant JOINs as well, then the EXPLAIN time drops to milliseconds on MariaDB.

Other weirdness with the query includes:

  • A MIN() condition on a single field. The database may optimize this out, but I fear that it harms index use.
  • OUTER JOINs with cascading INNER JOINs that depend on them. That causes the OUTER JOINs to function as INNER JOINs, removing any performance benefit of OUTER JOINs not constraining the result set.
  • A number of tables get aliased to the same name, which is confusing to read when other JOINs create additional aliases from the same table. This has no performance impact, though.

I'm going to take a look at why this module is building queries this way. I believe MySQL 5.7+ may realize the funkiness of what's happening and "fix" it before running the query plan, but we ought to produce a better query in the first place.

david strauss’s picture

I fear this issue may be misfiled. I went to investigate the underlying query generation code on the site, and it's using the contrib JSON:API module, not core's. While the 2.x version of contrib JSON:API tries to match core's code (at least for new changes), this query seems to be generated by the 1.x version of contrib JSON:API. Somehow, I missed the reference in the initial issue description saying "after upgrading to 1.24" and its connection to the contrib module's version. I see references to similar issues on contrib 2.x, but that's still contrib's implementation, which may differ somewhat.

I'll keep digging in the hope that this issue seen with the contrib module corresponds to an issue in core.

joelsteidl’s picture

Someone on the JSON:API team can correct me, but I think the performance issue was introduced in v1.24, and it made it into 2.x and into core.

See https://www.drupal.org/sa-contrib-2018-081

The fix introduced additional queries to do proper access checking.

gabesullice’s picture

@David Strauss, @joelsteidl is correct.

This issue is present in both the 1.x and 2.x versions in contrib and in the core version of JSON:API.

The highly irregular JOINs are added by the class Drupal\jsonapi\Access\TemporaryQueryGuard. If you'd like me to explain the reasoning for that code in private, I'd be happy to do that. It was added as part of the security release to which @joelsteidl provided a link.

That security release was committed to both the 1.x and 2.x versions of contrib. Later, the 2.x version of contrib JSON:API was committed in its entirety to Drupal core in 8.7 with only slight modifications (none of which should have affected this issue).

In effect, all versions of JSON:API in core or contrib risk suffering from this issue.

alex dicianu’s picture

We did some further digging using the method described in this article (https://pantheon.io/blog/profiling-mysql-queries-better-performance) to profile the query from #46, and as expected, the majority of the time (~162 seconds, or ~100% of the total execution time) was spent in the statistics phase. That is usually an indicator that the server is spending a long time in the optimization phase, where it tries to choose the right execution plan.

MariaDB [pantheon]> SELECT base_table.vid AS vid ....
+-------+------+---------------------+
| vid   | nid  | expression          |
+-------+------+---------------------+
| 11785 | 8142 | 2019-09-05T19:30:00 |
| 11786 | 8143 | 2019-10-03T19:30:00 |
+-------+------+---------------------+
2 rows in set (2 min 42.54 sec)

Query Profile:
+----------------------+------------+--------+-------+----------------+
| STATE                | Total_R    | Pct_R  | Calls | R/Call         |
+----------------------+------------+--------+-------+----------------+
| statistics           | 162.533811 | 100.00 |     1 | 162.5338110000 |
| Copying to tmp table |   0.000694 |   0.00 |     2 |   0.0003470000 |
| preparing            |   0.000267 |   0.00 |     1 |   0.0002670000 |
...

Keeping in mind that a join over n tables will have n-factorial combinations of join order to investigate, our query with 42 tables will have a … wait for it … 1.4e51 combinations!! (with 10 tables to join, the number of combinations would be ~3.6M; 42! is orders of magnitude higher).

Fortunately, MariaDB (and MySQL) offer us a couple of options to control the query plan execution, as per https://dev.mysql.com/doc/refman/5.5/en/controlling-query-plan-evaluation.html. I’ve chosen to play with optimizer_search_depth which tells the server how deep into the execution plan the optimizer should look.

On Pantheon, it’s currently set to 62 (the maximum number of joins allowed, which is also the default). In this particular case, this may be too high. Setting it to zero and running the query returned a result in 0.05 seconds (quite fast).

MariaDB [pantheon]> SET SESSION optimizer_search_depth = 0;
Query OK, 0 rows affected (0.00 sec)

MariaDB [pantheon]> SELECT SQL_NO_CACHE base_table.vid ...
+-------+------+---------------------+
| vid   | nid  | expression          |
+-------+------+---------------------+
| 11785 | 8142 | 2019-09-05T19:30:00 |
| 11786 | 8143 | 2019-10-03T19:30:00 |
+-------+------+---------------------+
2 rows in set (0.05 sec)

Profiling it again shows that the majority of the time is still spent in the statistics phase (about 95%), but that runs in only 0.049 seconds this time.

Query Profile:
+----------------------+----------+-------+-------+--------------+
| STATE                | Total_R  | Pct_R | Calls | R/Call       |
+----------------------+----------+-------+-------+--------------+
| statistics           | 0.049202 | 95.60 |     1 | 0.0492020000 |
| Copying to tmp table | 0.001269 |  2.47 |     2 | 0.0006345000 |
| optimizing           | 0.000193 |  0.38 |     1 | 0.0001930000 |
| starting             | 0.000184 |  0.36 |     1 | 0.0001840000 |

Increasing the optimizer_search_depth to 7 makes no difference in the query response time. Increasing it even more to 20 does affect performance and the query finishes in 4 min 26.25 sec.

It seems that other people have run into this issue before.

https://mariadb.com/resources/blog/setting-optimizer-search-depth-in-mysql/
https://www.percona.com/blog/2012/04/20/joining-many-tables-in-mysql-optimizer_search_depth/

I don’t think there’s an inherent problem with MariaDB in this case. I'm guessing newer versions of MySQL may come with different default settings which might be better in this specific case. As far as I’m aware the query optimizer is pretty much the same between MySQL and MariaDB. I do think that Drupal/jsonapi are abusing MariaDB with this many joins. The “real“ fix here as I see it is probably a combination of lowering the optimizer_search_depth and Drupal/jsonapi improving how that query is built.

david strauss’s picture

From @gabesullice:

If you'd like me to explain the reasoning for that code in private, I'd be happy to do that. It was added as part of the security release to which @joelsteidl provided a link.

I'd like to take a look, but the modifications to the JOINs I've suggested don't affect the query results. So, they shouldn't undermine any security hardening or permissions checking that's in place. It's fairly common for query builder systems to de-duplicate equivalent JOINs (or not create redundant ones in the first place).

In the meantime, the advice from @dicix is the most actionable path to reducing the excessive time spent in query execution planning caused by the excessive/redundant JOINs.

From @dicix:

I'm guessing newer versions of MySQL may come with different default settings which might be better in this specific case.

I looked into this, but it doesn't appear to be the case. The difference in behavior seems to come from elsewhere, but the other query plan optimizer settings don't make it obvious.

From @dicix:

As far as I’m aware the query optimizer is pretty much the same between MySQL and MariaDB.

This is true, and the query plans we saw from both MySQL and MariaDB are both very similar (and have indistinguishable execution time differences when run).

wim leers’s picture

  • #45: Thanks, @David Strauss, that's very helpful!
  • #52: Where did you post those suggestions for JSON:API code changes? I don't see them. I think the most effective way forward here is for you to provide a patch that does what you are recommending. (Note that JSON:API is not constructing those unnecessary JOINs; the Entity Query API is.)
david strauss’s picture

Where did you post those suggestions for JSON:API code changes?

I haven't written or posted such a change yet, and I think such changes need to go into the Entity Query API, anyway.

Note that JSON:API is not constructing those unnecessary JOINS, the Entity Query API is.

I noticed as much when diving into the code, but there's quite a bit of abstraction around the process. So far, I'm looking at \Drupal\Core\Entity\Query\Sql\Tables::ensureEntityTable(). Is that a sensible place to add protection against redundant JOINs?

joelsteidl’s picture

@Alex Dicianu thanks for all the helpful info in #51!

Based on your feedback, I ran SET GLOBAL optimizer_search_depth=7; and the query is super fast now on MariaDB/Pantheon.

For anyone else on Pantheon or another host using MariaDB who has poor performance with JSON:API, this seems like a reasonable workaround.

  1. Log in to MySQL via the CLI
  2. Run SET GLOBAL optimizer_search_depth=7; or whatever value you think is appropriate
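A minimal session sketch of the steps above (the value 7 is the heuristic from #51; adjust to taste):

```sql
-- Applies server-wide until the server restarts or the value is reset.
SET GLOBAL optimizer_search_depth = 7;

-- Confirm the new value took effect:
SHOW GLOBAL VARIABLES LIKE 'optimizer_search_depth';
```

Note that a GLOBAL change does not survive a server restart; for persistence, see the settings.php approach discussed later in this thread.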
pixelwhip’s picture

Thanks, everyone! After applying that setting from #51, our application seems to be running faster than it was pre-upgrade. This made for an awesome start to my day!

ghost of drupal past’s picture

*sheepishly* yeah Tables is not exactly optimized for performance. To be fair, it was hard enough to write it, I had some hopes it would get optimized for performance and I am real happy David is on it. Go David!

wim leers’s picture

Title: Extremely poor performance when using MariaDB and filtering on multiple relationships » \Drupal\Core\Entity\Query\Sql\Tables causes extremely poor performance when using MariaDB and filtering on multiple relationships in JSON:API
Component: jsonapi.module » entity system
Category: Support request » Task
Issue tags: +Entity Field API

#57: Thanks for confirming my suspicion that it was never really optimized for performance. It's totally understandable though :)

Thanks @David Strauss for diving deeper into this! 🥳🙏

joelsteidl’s picture

I hope this is not hi-jacking this thread too much, but I'd like to close the loop on the workaround I posted yesterday. A Pantheon engineer mentioned that doing SET GLOBAL optimizer_search_depth=7; has the potential to not persist across certain server events. It seems like something like this could work, but I wanted to get some input.

In settings.php

$databases['default']['default']['init_commands']['optimizer_search_depth'] = 'SET SESSION optimizer_search_depth = 7';

Is setting SESSION there what I really want to do?

I REALLY appreciate all the help on this issue.

david strauss’s picture

A Pantheon engineer mentioned that doing SET GLOBAL optimizer_search_depth=7; has the potential to not persist across certain server events. It seems like something like this could work, but I wanted to get some input.

They are correct, and the workaround you specified for settings.php appears to be the right way to ensure it's always set for JSON:API's database access. Given the performance difference you're seeing, it should be clear whether the settings.php change is in effect if you run SET GLOBAL to reset the global setting back to near the default (approximately 50).

Is setting SESSION there what I really want to do?

It should be just fine, as the code snippet will cause Drupal to request that setting for every connection it establishes. The places it won't be in effect are the MySQL CLI and other non-Drupal connections to the database. Given that the problem query is coming from Drupal, that should be fine.

For the longer term, we're going to look into rolling a similar configuration change platform-wide on Pantheon. I just need to understand more about why the optimizer search depth is set so high by default (and whether setting it lower has consequences we need to consider).

david strauss’s picture

Hi ChX! I'll start a fresh comment to talk about optimizing the generated query.

The easiest way to optimize JOINs is if we know we don't need to JOIN to the same foreign table more than once per query (at least per unique ON condition). If that's the case, then it's usually possible to avoid creating the redundant JOINs in the first place.

However, this could get complicated if there's a need to JOIN with the same foreign table more than once with the same ON condition but different WHERE conditions. If that's necessary, it's impossible to know with just the JOIN data whether the JOINs are redundant. So, if we need to JOIN to the same foreign table multiple times with the same ON condition but different WHERE conditions, we would need to prune the JOINs with identical ON and WHERE conditions.

Technically, the query we're discussing falls under the latter category, but I'd like to check that it's actually necessary to support that.
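To illustrate the redundancy in a minimal, hand-written form (the real generated query uses different aliases; this sketch only shows the pattern):

```sql
-- Redundant form: two JOINs share the same ON condition, with the
-- WHERE conditions spread across the aliases.
SELECT base_table.nid
FROM node base_table
INNER JOIN node_field_data nfd_1 ON nfd_1.nid = base_table.nid
INNER JOIN node_field_data nfd_2 ON nfd_2.nid = base_table.nid
WHERE nfd_1.status = '1' AND nfd_2.type = 'event';

-- Pruned form: JOINs with identical ON conditions collapse into one
-- JOIN that carries both WHERE conditions. For INNER JOINs on the same
-- key, the result set is unchanged.
SELECT base_table.nid
FROM node base_table
INNER JOIN node_field_data nfd ON nfd.nid = base_table.nid
WHERE nfd.status = '1' AND nfd.type = 'event';
```

The pruning is only safe when, as here, the duplicate JOINs are INNER JOINs against the same table on the same key; that is the case David is asking whether the query builder actually needs to distinguish.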

cachesclay’s picture

So many thank you's to everyone in this thread, this has been such a stressful ongoing issue and to know what it is and have a fix is incredible.

joelsteidl’s picture

Thanks @David Strauss!

Confirming that the settings.php fix from #59 works well.

I'll be interested to hear if lowering that setting in general has any positive performance impact.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

weri’s picture

We are experiencing the same problem as soon as we add filters on multiple terms related to a node and call the endpoint as an anonymous user.

?filter[langcode]=en
&filter[field_industries.field_industry_code]=mem
&filter[field_countries.field_country_iso_code]=USA

After some debugging, and before I found this issue, I saw the many (unnecessary) joins in the database query. Most of them come from applying TemporaryQueryGuard::applyAccessControls().

SELECT base_table.vid AS vid, base_table.nid AS nid
FROM
node base_table
INNER JOIN node_field_data node_field_data ON node_field_data.nid = base_table.nid
INNER JOIN node__field_industries node__field_industries ON node__field_industries.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data ON taxonomy_term_data.tid = node__field_industries.field_industries_target_id
INNER JOIN taxonomy_term__field_industry_code taxonomy_term__field_industry_code ON taxonomy_term__field_industry_code.entity_id = taxonomy_term_data.tid
INNER JOIN node__field_countries node__field_countries ON node__field_countries.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_2 ON taxonomy_term_data_2.tid = node__field_countries.field_countries_target_id
INNER JOIN taxonomy_term__field_country_iso_code taxonomy_term__field_country_iso_code ON taxonomy_term__field_country_iso_code.entity_id = taxonomy_term_data_2.tid
INNER JOIN node_field_data node_field_data_2 ON node_field_data_2.nid = base_table.nid
INNER JOIN node_field_data node_field_data_3 ON node_field_data_3.nid = base_table.nid
INNER JOIN node__field_industries node__field_industries_2 ON node__field_industries_2.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_3 ON taxonomy_term_data_3.tid = node__field_industries_2.field_industries_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data ON taxonomy_term_field_data.tid = taxonomy_term_data_3.tid
INNER JOIN node_field_data node_field_data_4 ON node_field_data_4.nid = base_table.nid
INNER JOIN node__field_countries node__field_countries_2 ON node__field_countries_2.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_4 ON taxonomy_term_data_4.tid = node__field_countries_2.field_countries_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data_2 ON taxonomy_term_field_data_2.tid = taxonomy_term_data_4.tid
INNER JOIN node_field_data node_field_data_5 ON node_field_data_5.nid = base_table.nid
INNER JOIN node_field_data node_field_data_6 ON node_field_data_6.nid = base_table.nid
WHERE ((node_field_data.status = '1') and (node_field_data.langcode = 'en') and (taxonomy_term__field_industry_code.field_industry_code_value LIKE 'mem' ESCAPE '\\') and (taxonomy_term__field_country_iso_code.field_country_iso_code_value LIKE 'USA' ESCAPE '\\')) AND (node_field_data_2.status = '1') AND (node_field_data_3.status = '1') AND (taxonomy_term_field_data.status = '1') AND (node_field_data_4.status = '1') AND (taxonomy_term_field_data_2.status = '1') AND (node_field_data_5.status = '1') AND (node_field_data_6.type = 'article') AND (EXISTS (SELECT na.nid AS nid
FROM
node_access na
WHERE (((gid IN ('0')) AND (realm = 'all')) OR ((gid IN ('1')) AND (realm = 'view_unpublished_published_content'))) AND (na.grant_view >= '1') AND (na.fallback = '1') AND ((base_table.nid = na.nid))))
GROUP BY base_table.vid, base_table.nid

The main problem with the many joins, especially when you have a lot of translations and revisions, is that the un-grouped result returns (in our case) over 1 million rows, while the grouped result is just 30. The database produces a cartesian product over multiple tables.

When I set optimizer_search_depth to 3, performance was quite good, but not in all cases.

I hope there is a way to create only one join per table. When I optimize the SQL query by hand so that there is only one join to the node_field_data table, the query is blazing fast.

fakir22’s picture

Thanks @David Strauss!

Confirming that the settings.php fix from #59 works well.

I'll be interested to hear if lowering that setting in general has any positive performance impact.

@joelsteidl Thanks for the solution in #59 , it worked for us too.

Regarding your question about lowering the setting for optimizer_search_depth: we didn't set the value to 7; instead we set it to 0, as the MariaDB documentation states:

If set to 0, the system will automatically pick a reasonable value.

After running some tests, setting optimizer_search_depth to 0 seems to offer the best performance for us.
But as stated here, a setting between 7 and 8 fits 99% of all cases.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

nils.destoop’s picture

StatusFileSize
new4.76 KB

We are also experiencing similar issues with one of our sites. The issue is related to the TemporaryQueryGuard: it iterates over all the properties and all possible relations, but it does not take into account that the status condition may already have been set by previous iterations.

Attached is a patch that already fixes a lot for us (Drupal 8.9), but it's still not perfect. We still have duplicate joins, because the API already filters on status TRUE by default. With the patch, our load time decreased from 1.2 min to 7 s on a cold cache.
When I remove the extra filter on status, the load time drops below 2 seconds.

wim leers’s picture

@nils.destoop That makes a ton of sense, thank you! 🙏 Can't wait for @gabesullice's review of this! 🤓

nils.destoop’s picture

The same patch on another project lowered the response time from 6-9 seconds to 125-300 ms. We are using a custom version that already marks node as secured, because the status condition is added by default.

skyredwang’s picture

@nils.destoop I applied the #69 patch on Core 8.9.6, and profiled /jsonapi/commerce_order/default endpoint with a few filters on MariaDB 10.3.13 and MariaDB 10.4.10, but I still see the crazy joins below:

SELECT base_table.order_id AS order_id, base_table.order_id AS base_table_order_id, commerce_order_?.placed AS placed
FROM
commerce_order base_table
INNER JOIN commerce_order commerce_order ON commerce_order.order_id = base_table.order_id
INNER JOIN commerce_order commerce_order_? ON commerce_order_?.order_id = base_table.order_id
LEFT JOIN commerce_order commerce_order_? ON commerce_order_?.order_id = base_table.order_id
LEFT JOIN commerce_order commerce_order_? ON commerce_order_?.order_id = base_table.order_id
WHERE (commerce_order.state NOT IN (:db_condition_placeholder_?, :db_condition_placeholder_?)) AND (commerce_order_?.type = :db_condition_placeholder_?) AND (commerce_order_?.uid LIKE :db_condition_placeholder_? ESCAPE ?)
ORDER BY commerce_order_?.placed DESC
LIMIT ? OFFSET ?

Please let me know if there is additional debugging information that might help this issue.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

nils.destoop’s picture

@skyredwang The patch indeed does not remove all the crazy joins; it only limits the number of joins. My patch fixes multiple joins that use the same status condition (published terms / nodes / ...). Your remaining joins seem to be unrelated to the TemporaryQueryGuard.
The patch will also not prevent a duplicate join if you are already filtering on the status via a filter in the query string. For our projects, we are using a custom patch that already marks node as 'guarded'.

The real issue is probably #2875033: Optimize joins and table selection in SQL entity query implementation

cracu’s picture

I used the optimizer_search_depth approach with values between 0 and 7.
It solved our issues initially, but once the editors entered more content, even this approach was not enough.
Additionally, I managed to get rid of some joins related to taxonomy by implementing:

hook_jsonapi_entity_filter_access()
e.g.

use Drupal\Core\Access\AccessResult;

function my_module_jsonapi_entity_filter_access(\Drupal\Core\Entity\EntityTypeInterface $entity_type, \Drupal\Core\Session\AccountInterface $account) {
  if ($entity_type->id() === 'taxonomy_term') {
    return [
      JSONAPI_FILTER_AMONG_ALL => AccessResult::allowedIfHasPermission($account, 'access content'),
    ];
  }
}

In our case... we assumed that all taxonomies are published.

And still, if we wanted to do a semi-complex filter like:

Select nodes of type document
where
(filter_taxonomy_field1 IN (value1, value2, value3) OR filter_taxonomy_field1 IS NULL) AND
(filter_taxonomy_field2 = value_filter2 OR field_taxonomy_field2 IS NULL) AND
(filter_taxonomy_field3 = value_filter3 OR field_taxonomy_field3 IS NULL)

then, for every filter (node field) in place, we get an additional join to node_field_data, and status is checked multiple times. The patch posted above might help with these joins; still, this is not the main issue.

Needless to say, if we have 200 possible terms for filter_taxonomy_field1, 30 terms for filter_taxonomy_field2, and 20 terms for filter_taxonomy_field3, the query itself breaks MySQL or remains hanging.

Before the GROUP BY vid, nid is applied, it has to parse 1-2 million rows to end up with just 4 possible combinations.
GROUP BY is clearly not the fastest solution here, and I wonder why we don't have a SELECT DISTINCT in place, which would greatly improve the speed.

so... from an initial query like:

SELECT base_table.vid AS vid, base_table.nid AS nid
FROM
node base_table
INNER JOIN node__field_color node__field_color ON node__field_color.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data ON taxonomy_term_data.tid = node__field_color.field_color_target_id
INNER JOIN taxonomy_term__field_color taxonomy_term__field_color ON taxonomy_term__field_color.entity_id = taxonomy_term_data.tid
INNER JOIN node__field_type node__field_type ON node__field_type.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_2 ON taxonomy_term_data_2.tid = node__field_type.field_type_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data ON taxonomy_term_field_data.tid = taxonomy_term_data_2.tid
INNER JOIN node__field_category node__field_category ON node__field_category.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_3 ON taxonomy_term_data_3.tid = node__field_category.field_category_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data_2 ON taxonomy_term_field_data_2.tid = taxonomy_term_data_3.tid
INNER JOIN node__field_color node__field_color_2 ON node__field_color_2.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_4 ON taxonomy_term_data_4.tid = node__field_color_2.field_color_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data_3 ON taxonomy_term_field_data_3.tid = taxonomy_term_data_4.tid
INNER JOIN node_field_data node_field_data ON node_field_data.nid = base_table.nid
INNER JOIN node__field_type node__field_type_2 ON node__field_type_2.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_5 ON taxonomy_term_data_5.tid = node__field_type_2.field_type_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data_4 ON taxonomy_term_field_data_4.tid = taxonomy_term_data_5.tid
INNER JOIN node_field_data node_field_data_2 ON node_field_data_2.nid = base_table.nid
INNER JOIN node__field_category node__field_category_2 ON node__field_category_2.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_6 ON taxonomy_term_data_6.tid = node__field_category_2.field_category_target_id
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data_5 ON taxonomy_term_field_data_5.tid = taxonomy_term_data_6.tid
INNER JOIN node_field_data node_field_data_3 ON node_field_data_3.nid = base_table.nid
INNER JOIN node_field_data node_field_data_4 ON node_field_data_4.nid = base_table.nid
WHERE ((taxonomy_term__field_color.field_color_value LIKE 'RED' ESCAPE '\\') and (taxonomy_term_field_data.name LIKE 'TYPE A' ESCAPE '\\') and (taxonomy_term_field_data_2.name IN ('CATEGORY 1', 'CATEGORY 2'))) AND (taxonomy_term_field_data_3.status = 1) AND (node_field_data.status = 1) AND (taxonomy_term_field_data_4.status = 1) AND (node_field_data_2.status = 1) AND (taxonomy_term_field_data_5.status = 1) AND (node_field_data_3.status = 1) AND (node_field_data_4.type = 'content_type_machine_name')
GROUP BY base_table.vid, base_table.nid
LIMIT 51 OFFSET 0

after applying hook_jsonapi_entity_filter_access() we got:

SELECT base_table.vid AS vid, base_table.nid AS nid
FROM
node base_table
LEFT JOIN node__field_color node__field_color ON node__field_color.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data ON taxonomy_term_data.tid = node__field_color.field_color_target_id
LEFT JOIN taxonomy_term__field_color taxonomy_term__field_color ON taxonomy_term__field_color.entity_id = taxonomy_term_data.tid
LEFT JOIN node__field_type node__field_type ON node__field_type.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_2 ON taxonomy_term_data_2.tid = node__field_type.field_type_target_id
LEFT JOIN taxonomy_term_field_data taxonomy_term_field_data ON taxonomy_term_field_data.tid = taxonomy_term_data_2.tid
LEFT JOIN node__field_category node__field_category ON node__field_category.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_3 ON taxonomy_term_data_3.tid = node__field_category.field_category_target_id
LEFT JOIN taxonomy_term_field_data taxonomy_term_field_data_2 ON taxonomy_term_field_data_2.tid = taxonomy_term_data_3.tid
INNER JOIN node_field_data node_field_data ON node_field_data.nid = base_table.nid
INNER JOIN node_field_data node_field_data_2 ON node_field_data_2.nid = base_table.nid
INNER JOIN node_field_data node_field_data_3 ON node_field_data_3.nid = base_table.nid
INNER JOIN node_field_data node_field_data_4 ON node_field_data_4.nid = base_table.nid
WHERE (((taxonomy_term__field_color.field_color_value LIKE 'RED' ESCAPE '\\') or (node__field_color.field_color_target_id IS NULL)) and ((taxonomy_term_field_data.name LIKE 'TYPE A' ESCAPE '\\') or (node__field_type.field_type_target_id IS NULL)) and ((taxonomy_term_field_data_2.name IN ('CATEGORY 1', 'CATEGORY 2')) or (node__field_category.field_category_target_id IS NULL))) AND (node_field_data.status = '1') AND (node_field_data_2.status = '1') AND (node_field_data_3.status = '1') AND (node_field_data_4.type = 'content_type_machine_name')
GROUP BY base_table.vid, base_table.nid
LIMIT 51 OFFSET 0

The last bit was altering the query to remove the GROUP BY and use DISTINCT, so the query became:

SELECT DISTINCT base_table.vid AS vid, base_table.nid AS nid
FROM
node base_table
LEFT JOIN node__field_color node__field_color ON node__field_color.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data ON taxonomy_term_data.tid = node__field_color.field_color_target_id
LEFT JOIN taxonomy_term__field_color taxonomy_term__field_color ON taxonomy_term__field_color.entity_id = taxonomy_term_data.tid
LEFT JOIN node__field_type node__field_type ON node__field_type.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_2 ON taxonomy_term_data_2.tid = node__field_type.field_type_target_id
LEFT JOIN taxonomy_term_field_data taxonomy_term_field_data ON taxonomy_term_field_data.tid = taxonomy_term_data_2.tid
LEFT JOIN node__field_category node__field_category ON node__field_category.entity_id = base_table.nid
LEFT OUTER JOIN taxonomy_term_data taxonomy_term_data_3 ON taxonomy_term_data_3.tid = node__field_category.field_category_target_id
LEFT JOIN taxonomy_term_field_data taxonomy_term_field_data_2 ON taxonomy_term_field_data_2.tid = taxonomy_term_data_3.tid
INNER JOIN node_field_data node_field_data ON node_field_data.nid = base_table.nid
INNER JOIN node_field_data node_field_data_2 ON node_field_data_2.nid = base_table.nid
INNER JOIN node_field_data node_field_data_3 ON node_field_data_3.nid = base_table.nid
INNER JOIN node_field_data node_field_data_4 ON node_field_data_4.nid = base_table.nid
WHERE (((taxonomy_term__field_color.field_color_value LIKE 'RED' ESCAPE '\\') or (node__field_color.field_color_target_id IS NULL)) and ((taxonomy_term_field_data.name LIKE 'TYPE A' ESCAPE '\\') or (node__field_type.field_type_target_id IS NULL)) and ((taxonomy_term_field_data_2.name IN ('CATEGORY 1', 'CATEGORY 2')) or (node__field_category.field_category_target_id IS NULL))) AND (node_field_data.status = '1') AND (node_field_data.type = 'content_type_machine_name')
LIMIT 51 OFFSET 0

Once we got to this query, SET SESSION optimizer_search_depth = 7 was no longer needed, and it took around 0.2 s to execute.

Altering the query was quite direct. We wanted to target only JSON:API queries, but I don't know whether that is possible other than by looking for the pager_size metadata.

This might be handy for those who lost many hours on this:

/**
 * Implements hook_query_alter().
 */
function my_module_query_alter(\Drupal\Core\Database\Query\AlterableInterface $query) {
  // JSON:API collection queries carry the 'pager_size' metadata.
  $pager_size = $query->getMetaData('pager_size');
  if ($pager_size && $query instanceof \Drupal\Core\Database\Query\SelectInterface) {
    // Replace the GROUP BY on (vid, nid) with a SELECT DISTINCT.
    $query->distinct();
    $group_by = &$query->getGroupBy();
    $group_by = [];
  }
}

!!! The DISTINCT approach will not work when sorting is involved.
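To illustrate why (a sketch, assuming MySQL's default behavior with hypothetical aliases): with DISTINCT, MySQL rejects an ORDER BY on a column that is not in the select list, which is exactly what JSON:API sorts on joined field tables produce:

```
-- Incompatible with DISTINCT under MySQL:
SELECT DISTINCT base_table.vid, base_table.nid
FROM node base_table
INNER JOIN node_field_data nfd ON nfd.nid = base_table.nid
ORDER BY nfd.created DESC;
-- ERROR 3065: Expression #1 of ORDER BY clause is not in SELECT list,
-- references column 'nfd.created' which is not in SELECT list;
-- this is incompatible with DISTINCT
```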

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

marysmech made their first commit to this issue’s fork.

marysmech’s picture

#59 seems to fix the problem. I use optimizer_search_depth = 0 and queries take only a few ms (previously they timed out after 30 s).
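For anyone wanting to apply that setting site-wide without a patch, one way (an untested sketch, assuming MySQL) is via the connection's init_commands in settings.php, which run on every new database connection:

```php
// settings.php: applied to each new MySQL connection.
$databases['default']['default']['init_commands'] = [
  'optimizer_search_depth' => 'SET SESSION optimizer_search_depth = 0',
];
```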

attheshow’s picture

It looks like the patch in #69 no longer applies to Drupal 10.3.

eliaspapa’s picture

I also have an issue applying the patch after updating to 10.3. Are any updated patches available?

eliaspapa’s picture

After carefully investigating the changes the update made to the file, it seems the patch's changes have been incorporated into it, so the patch is seemingly no longer needed.

attheshow’s picture

@EliasPapa Where are you seeing that the issue has been addressed? We're not seeing that anywhere in the code for 10.3.

attheshow’s picture

StatusFileSize
new4.37 KB

This is a re-roll of patch #69 for Drupal 10.3.

andypost’s picture

Status: Active » Needs work

For MR to 11.x

driskell’s picture

The patch in #87 / #69 looks like it would disable access checks on a node if there was an entity reference to another node, as it would only apply the access check once per entity type. That could be a security issue; it might need some work.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.