Fix IndexTidDepth views argument plugin and TaxonomyIndexTidDepth views filter plugin performance [#2133215]

Problem/Motivation

This issue is to propose two significant optimizations to \Drupal\taxonomy\Plugin\views\argument\IndexTidDepth and \Drupal\taxonomy\Plugin\views\filter\TaxonomyIndexTidDepth.

the subselect the code generates is a multi-level left-join which is inefficient on MySQL because it cannot make use of indexes
in particular, the left join generates many duplicates
it generates a WHERE ... IN ( subselect ) clause, which is inefficient on MySQL if the subselect returns anything but a trivially small result set

Here is the complete query HEAD generates for a single TID query with a depth of 3.

SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.nid = node__field_promote_states.entity_id AND node__field_promote_states.field_promote_states_value = 'landing' WHERE (((node__field_promote_states.field_promote_states_value IS NULL)) AND (node_field_data.nid IN (SELECT tn.nid AS nid FROM taxonomy_index tn LEFT OUTER JOIN taxonomy_term__parent th ON th.entity_id = tn.tid LEFT OUTER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE (tn.tid = '353') OR (th1.entity_id = '353')))) AND ((taxonomy_index.status = '1') AND (node_field_data.status = '1') AND (node_field_data.type IN ('article', 'recipe'))) ORDER BY node_field_data_created DESC LIMIT 6 OFFSET 5;
+-------------------------+--------+
| node_field_data_created | nid    |
+-------------------------+--------+
|              1594378800 | 110686 |
|              1594360800 | 110446 |
|              1593946800 | 110321 |
|              1593342000 | 109946 |
|              1593241200 | 109896 |
|              1592823600 | 109591 |
+-------------------------+--------+
6 rows in set (0.49 sec)

Proposed resolution

The patch changes the code in two aspects:

taxonomy hierarchy subselect with union of specific inner joins instead of left join

Existing code

The existing code generates subselect following this pattern (tid 353, +3 levels):

explain SELECT tn.nid AS nid FROM taxonomy_index tn LEFT OUTER JOIN taxonomy_term__parent th ON th.entity_id = tn.tid LEFT OUTER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id LEFT OUTER JOIN taxonomy_term__parent th2 ON th1.parent_target_id = th2.entity_id WHERE (tn.tid = '353') OR (th1.entity_id = '353') OR (th2.entity_id = '353');
+----+-------------+-------+------------+-------+---------------+-----------+---------+------------------------------+--------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref                          | rows   | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------------------------------+--------+----------+--------------------------+
|  1 | SIMPLE      | tn    | NULL       | index | term_node     | term_node | 14      | NULL                         | 124367 |   100.00 | Using index              |
|  1 | SIMPLE      | th    | NULL       | ref   | PRIMARY       | PRIMARY   | 4       | instyle.tn.tid               |      1 |   100.00 | NULL                     |
|  1 | SIMPLE      | th1   | NULL       | ref   | PRIMARY       | PRIMARY   | 4       | instyle.th.parent_target_id  |      1 |   100.00 | NULL                     |
|  1 | SIMPLE      | th2   | NULL       | ref   | PRIMARY       | PRIMARY   | 4       | instyle.th1.parent_target_id |      1 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------------------------------+--------+----------+--------------------------+

The code in head has to scan all of the taxonomy_index rows.

New code

The patch generates much more efficient code using a UNION of specific inner joins, the replacement for the example above is:

explain SELECT tn.nid AS nid FROM taxonomy_index tn WHERE tn.tid = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE th1.entity_id = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id INNER JOIN taxonomy_term__parent th2 ON th1.parent_target_id = th2.entity_id WHERE th2.entity_id = '353';
+----+--------------+--------------+------------+------+--------------------------+------------------+---------+-----------------------+------+----------+--------------------------+
| id | select_type  | table        | partitions | type | possible_keys            | key              | key_len | ref                   | rows | filtered | Extra                    |
+----+--------------+--------------+------------+------+--------------------------+------------------+---------+-----------------------+------+----------+--------------------------+
|  1 | PRIMARY      | tn           | NULL       | ref  | PRIMARY,term_node,bundle | term_node        | 4       | const                 |  102 |   100.00 | Using index              |
|  2 | UNION        | th1          | NULL       | ref  | PRIMARY                  | PRIMARY          | 4       | const                 |    1 |   100.00 | Using index              |
|  2 | UNION        | th           | NULL       | ref  | PRIMARY,parent_target_id | parent_target_id | 4       | const                 |    5 |   100.00 | Using where; Using index |
|  2 | UNION        | tn           | NULL       | ref  | term_node                | term_node        | 4       | instyle.th.entity_id  |   17 |   100.00 | Using index              |
|  3 | UNION        | th2          | NULL       | ref  | PRIMARY                  | PRIMARY          | 4       | const                 |    1 |   100.00 | Using index              |
|  3 | UNION        | th1          | NULL       | ref  | PRIMARY,parent_target_id | parent_target_id | 4       | const                 |    5 |   100.00 | Using where; Using index |
|  3 | UNION        | th           | NULL       | ref  | PRIMARY,parent_target_id | parent_target_id | 4       | instyle.th1.entity_id |  223 |   100.00 | Using index              |
|  3 | UNION        | tn           | NULL       | ref  | term_node                | term_node        | 4       | instyle.th.entity_id  |   17 |   100.00 | Using index              |
| NULL | UNION RESULT | <union1,2,3> | NULL       | ALL  | NULL                     | NULL             | NULL    | NULL                  | NULL |     NULL | Using temporary          |
+----+--------------+--------------+------------+------+--------------------------+------------------+---------+-----------------------+------+----------+--------------------------+

Far fewer rows are scanned and indexes are used.

using an INNER join instead of a WHERE IN (...) for the taxonomy filter

if the outer query constructed by views is inefficient (as can easily be the case with any non-trivial LEFT-join), the query as a whole will still remain inefficient because the efficient subselect is used in a WHERE IN (...) which MySQL only uses as a filter on the outer query result.

This can be made much more efficient by using an INNER join. Here's an example for how the patch will change the SQL (reformatted for readability again)

Existing code + the above fix

SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.nid = node__field_promote_states.entity_id AND node__field_promote_states.field_promote_states_value = 'landing' WHERE (((node__field_promote_states.field_promote_states_value IS NULL)) AND (node_field_data.nid IN (SELECT tn.nid AS nid FROM taxonomy_index tn WHERE tn.tid = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE th1.entity_id = '353'))) AND ((taxonomy_index.status = '1') AND (node_field_data.status = '1') AND (node_field_data.type IN ('article', 'recipe'))) ORDER BY node_field_data_created DESC LIMIT 6 OFFSET 5;
+-------------------------+--------+
| node_field_data_created | nid    |
+-------------------------+--------+
|              1594378800 | 110686 |
|              1594360800 | 110446 |
|              1593946800 | 110321 |
|              1593342000 | 109946 |
|              1593241200 | 109896 |
|              1592823600 | 109591 |
+-------------------------+--------+
6 rows in set (0.08 sec)

New code

SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.nid = node__field_promote_states.entity_id AND node__field_promote_states.field_promote_states_value = 'landing' INNER JOIN (SELECT tn.nid AS nid FROM taxonomy_index tn WHERE tn.tid = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE th1.entity_id = '353') node_taxonomy_depth ON node_taxonomy_depth.nid = node_field_data.nid WHERE ((node__field_promote_states.field_promote_states_value IS NULL)) AND ((taxonomy_index.status = '1') AND (node_field_data.status = '1') AND (node_field_data.type IN ('article', 'recipe'))) ORDER BY node_field_data_created DESC LIMIT 6 OFFSET 5;
+-------------------------+--------+
| node_field_data_created | nid    |
+-------------------------+--------+
|              1594378800 | 110686 |
|              1594360800 | 110446 |
|              1593946800 | 110321 |
|              1593342000 | 109946 |
|              1593241200 | 109896 |
|              1592823600 | 109591 |
+-------------------------+--------+
6 rows in set (0.01 sec)

EXPLAIN is not so useful for this one because the join via query makes for complicated reading - but you can clearly see the performance improvement when running the query.

Overall gains

Use MySQL profiling we can compare the original query versus the final query...

+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                                                                                                                                                                                                                                        |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|        1 | 0.49241525 | SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.n |
|        2 | 0.00789575 | SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid
FROM
node_field_data node_field_data
LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid
LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.n |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

This shows that original query took 0.49241525 seconds to execute and the new query takes 0.00789575 seconds. This is 98% decrease in the time take to run the query!

Performance testing notes

node_field_data contains 34299 rows
taxonomy_index contains 129814 rows
taxonomy_term__parent contains 6712 rows
node__field_promote_states contains 96159 rows

Remaining tasks

User interface changes

N/a

API changes

N/a

Data model changes

N/a

Release notes snippet

Views that use either the taxonomy depth filter or taxonomy depth argument are massively more performant.

Comment	File	Size	Author
#54	2133215.53_54.interdiff.txt	7.37 KB	dww
#54	2133215-54.patch	23.9 KB	dww
#54	9.3.x: PHP 8.0 & MySQL 5.7 28,844 pass
#53	2133215-53.patch	24.05 KB	alexpott
#53	9.3.x: PHP 8.0 & MySQL 5.7 28,788 pass
#53	50-53-interdiff.txt	1.39 KB	alexpott
#50	2133215-50.patch	23.92 KB	alexpott
#50	9.3.x: PHP 8.0 & MySQL 5.7 28,779 pass
#50	47-50-interdiff.txt	5.21 KB	alexpott
#47	2133215-47.patch	23.92 KB	alexpott
#47	9.3.x: PHP 8.0 & MySQL 5.7 28,765 pass
#47	42-47-interdiff.txt	3.33 KB	alexpott
#42	2133215-42.patch	22.78 KB	alexpott
#42	9.3.x: PHP 8.0 & MySQL 5.7 28,729 pass PHP 7.3 & MariaDB 10.3.22 28,735 pass PHP 7.3 & PostgreSQL 10.12 28,729 pass PHP 7.3 & SQLite 3.28 28,729 pass
#42	2133215-8.9.x-42.patch	25.42 KB	alexpott
#42	8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass
#42	36-42-interdiff.txt	4.77 KB	alexpott
#40	small-site-graph.png	383.59 KB	alexpott
#39	taxo_plugin_perf.php_.txt	4.61 KB	alexpott
#39	head_performance graph.png	414.41 KB	alexpott
#39	performance graph.png	370.03 KB	alexpott
#36	2133215-36.patch	26.46 KB	alexpott
#36	9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass
#36	31-36-interdiff.txt	1.34 KB	alexpott
#31	2133215-23.patch	25.83 KB	alexpott
#31	9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass
#26	2133215-8.9.x-26.patch	28.48 KB	alexpott
#26	8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass
#26	25-26-interdiff.txt	645 bytes	alexpott
#25	2133215-8.9.x-25.patch	28.49 KB	alexpott
#25	8.9.x: PHP 7.3 & MySQL 5.7 28,635 pass, 10 fail
#23	2133215-23.patch	25.83 KB	alexpott
#23	9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass PHP 7.3 & PostgreSQL 10.12 28,725 pass PHP 7.3 & SQLite 3.28 28,725 pass
#23	18-23-interdiff.txt	2.48 KB	alexpott
#18	2133215-18.patch	25.84 KB	alexpott
#18	9.3.x: PHP 8.0 & MySQL 5.7 28,682 pass
#18	16-18-interdiff.txt	4.78 KB	alexpott
#16	2133215-16.patch	25.86 KB	alexpott
#16	9.3.x: PHP 8.0 & MySQL 5.7 28,682 pass
#16	13-16-interdiff.txt	3.79 KB	alexpott
#14	2133215-8.9.x-14.patch	26.64 KB	alexpott
#14	8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass
#13	2133215-13.patch	23.85 KB	alexpott
#13	9.3.x: PHP 8.0 & MySQL 5.7 28,678 pass, 1 fail
#13	9-13-interdiff.txt	8.74 KB	alexpott
#12	11-12-interdiff.txt	1.27 KB	alexpott
#12	2133215-8.9.x-12.patch	23.33 KB	alexpott
#12	8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass
#11	2133215-8.9.x-11.patch	22.07 KB	alexpott
#11	8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass
#9	2133215-9.patch	20.54 KB	alexpott
#9	9.3.x: PHP 8.0 & MySQL 5.7 28,641 pass
#9	8-9-interdiff.txt	3.12 KB	alexpott
#8	2133215-8.patch	20.85 KB	alexpott
#8	9.3.x: PHP 8.0 & MySQL 5.7 28,641 pass
#8	6-8-interdiff.txt	8.63 KB	alexpott
#6	2133215-6.patch	16.94 KB	alexpott
#6	9.3.x: PHP 8.0 & MySQL 5.7 28,639 pass, 1 fail
#2	term-hierarchy.php_.txt	1.66 KB	dawehner
#1	0001-Issue-2133215-generate-efficient-SQL.patch	3.56 KB	uplex_slink
#1

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Comment #1

uplex_slink CreditAttribution: uplex_slink commented 11 November 2013 at 14:16

Status:

Active

» Needs review

File	Size
0001-Issue-2133215-generate-efficient-SQL.patch	3.56 KB

Comment #2

dawehner

German

CreditAttribution: dawehner commented 16 November 2014 at 12:48

File	Size
term-hierarchy.php_.txt	1.66 KB

I totally believe that these kind of queries could get quite slow, but the question is whether its worth to change
the behaviour by default.

I tried to do some basic benchmarking setting up the content using the following script, which was still pretty fast.
How many entries do you have in the taxonom_index and term tables?

+++ b/modules/taxonomy/views_handler_argument_term_node_tid_depth.inc
@@ -107,31 +107,56 @@ class views_handler_argument_term_node_tid_depth extends views_handler_argument
+     *
+     * TODO: is there a cleaner way to do this?
+     */

Regarding this, I don't really know a much cleaner way to do it, table_queue is considered actually to be a internal data structure, so there was never really a reason to change it.

Comment #3

chr.fritsch

he/him

German

🇩🇪🇪🇺🌍

CreditAttribution: chr.fritsch commented 19 November 2014 at 15:49

mysql> select count(*) from taxonomy_term_data;
+----------+
| count(*) |
+----------+
| 109702 |
+----------+
1 row in set (0.36 sec)

mysql> select count(*) from taxonomy_term_hierarchy;
+----------+
| count(*) |
+----------+
| 109429 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from taxonomy_index;
+----------+
| count(*) |
+----------+
| 4204213 |
+----------+
1 row in set (0.00 sec)

Comment #4

MustangGB CreditAttribution: MustangGB commented 2 March 2019 at 15:30

Category:	Task	» Feature request
Issue tags:		+Performance

Comment #5

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 22 June 2021 at 19:31

Project:	Views (for Drupal 7)	» Drupal core
Version:	7.x-3.x-dev	» 9.3.x-dev
Component:	Code	» taxonomy.module

This is definitely an issue - #1358412: Content: Has taxonomy term ID (with depth) query performance landed in Views 3 and it's a popular core patch. The problem with it is that it slower for MySQL only backends - although it might also be more performant when the db server is under load. Given we're unsure of the performance characteristics I think we should revisit the approach here - or at least something similar.

Moving this issue to the core queue as this is where the latest version of views live and this code is now part of the taxonomy module.

Comment #6

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 22 June 2021 at 19:59

File	Size
2133215-6.patch	16.94 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,639 pass, 1 fail

Here's a patch that has quite a big impact on a large site with lots of content and plenty of rows in taxonomy_index - it's the scanning of this table that make the query slow. Also need to fix up TaxonomyIndexTidDepth as it uss the same logic. Perhaps we can move the logic into a trait to make it easy to share.

Comment #7

22 June 2021 at 20:54

Status:

Needs review

» Needs work

The last submitted patch, 6: 2133215-6.patch, failed testing. View results

Comment #8

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 22 June 2021 at 21:22

Status:

Needs work

» Needs review

File	Size
6-8-interdiff.txt	8.63 KB
2133215-8.patch	20.85 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,641 pass

Refactored the common stuff into a trait so \Drupal\taxonomy\Plugin\views\argument\IndexTidDepth and \Drupal\taxonomy\Plugin\views\filter\TaxonomyIndexTidDepth can share the common functionality.

Comment #9

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 23 June 2021 at 00:44

File	Size
8-9-interdiff.txt	3.12 KB
2133215-9.patch	20.54 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,641 pass

Hmmm... okay the approach in #8 doesn't result in any performance improvements. So I've moved to the union approach in the issue summary.

This results in a test site view going from 537.49 ms to under 200ms so that's quite the performance improvement.

Looking at the queries involved...

HEAD

explain SELECT tn.nid AS nid FROM taxonomy_index tn LEFT OUTER JOIN taxonomy_term__parent th ON th.entity_id = tn.tid LEFT OUTER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id LEFT OUTER JOIN taxonomy_term__parent th2 ON th1.parent_target_id = th2.entity_id WHERE (tn.tid = '353') OR (th1.entity_id = '353') OR (th2.entity_id = '353');
+----+-------------+-------+------------+-------+---------------+-----------+---------+------------------------------+--------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref                          | rows   | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------------------------------+--------+----------+--------------------------+
|  1 | SIMPLE      | tn    | NULL       | index | term_node     | term_node | 14      | NULL                         | 124367 |   100.00 | Using index              |
|  1 | SIMPLE      | th    | NULL       | ref   | PRIMARY       | PRIMARY   | 4       | instyle.tn.tid               |      1 |   100.00 | NULL                     |
|  1 | SIMPLE      | th1   | NULL       | ref   | PRIMARY       | PRIMARY   | 4       | instyle.th.parent_target_id  |      1 |   100.00 | NULL                     |
|  1 | SIMPLE      | th2   | NULL       | ref   | PRIMARY       | PRIMARY   | 4       | instyle.th1.parent_target_id |      1 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------------------------------+--------+----------+--------------------------+

PATCH

explain SELECT tn.nid AS nid FROM taxonomy_index tn WHERE tn.tid = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE th1.entity_id = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id INNER JOIN taxonomy_term__parent th2 ON th1.parent_target_id = th2.entity_id WHERE th2.entity_id = '353';
+----+--------------+--------------+------------+------+--------------------------+------------------+---------+-----------------------+------+----------+--------------------------+
| id | select_type  | table        | partitions | type | possible_keys            | key              | key_len | ref                   | rows | filtered | Extra                    |
+----+--------------+--------------+------------+------+--------------------------+------------------+---------+-----------------------+------+----------+--------------------------+
|  1 | PRIMARY      | tn           | NULL       | ref  | PRIMARY,term_node,bundle | term_node        | 4       | const                 |  102 |   100.00 | Using index              |
|  2 | UNION        | th1          | NULL       | ref  | PRIMARY                  | PRIMARY          | 4       | const                 |    1 |   100.00 | Using index              |
|  2 | UNION        | th           | NULL       | ref  | PRIMARY,parent_target_id | parent_target_id | 4       | const                 |    5 |   100.00 | Using where; Using index |
|  2 | UNION        | tn           | NULL       | ref  | term_node                | term_node        | 4       | instyle.th.entity_id  |   17 |   100.00 | Using index              |
|  3 | UNION        | th2          | NULL       | ref  | PRIMARY                  | PRIMARY          | 4       | const                 |    1 |   100.00 | Using index              |
|  3 | UNION        | th1          | NULL       | ref  | PRIMARY,parent_target_id | parent_target_id | 4       | const                 |    5 |   100.00 | Using where; Using index |
|  3 | UNION        | th           | NULL       | ref  | PRIMARY,parent_target_id | parent_target_id | 4       | instyle.th1.entity_id |  223 |   100.00 | Using index              |
|  3 | UNION        | tn           | NULL       | ref  | term_node                | term_node        | 4       | instyle.th.entity_id  |   17 |   100.00 | Using index              |
| NULL | UNION RESULT | <union1,2,3> | NULL       | ALL  | NULL                     | NULL             | NULL    | NULL                  | NULL |     NULL | Using temporary          |
+----+--------------+--------------+------------+------+--------------------------+------------------+---------+-----------------------+------+----------+--------------------------+

Comment #10

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 23 June 2021 at 01:21

Another advantage of the union approach is that the sub query returns less rows. On the site I'm testing on the sub query in HEAD returns 462 rows whereas the patch with unions returns 219. If I add a distinct the HEAD approach returns 219 and I've checked they contain existing the same node IDs.

Comment #11

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 28 June 2021 at 10:53

File	Size
2133215-8.9.x-11.patch	22.07 KB
8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass

Here's a patch for 8.9.x to enable testing on sites that are using #1358412: Content: Has taxonomy term ID (with depth) query performance - this will help us test this patch on production sites that are using that patch.

Comment #12

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 28 June 2021 at 11:56

File	Size
2133215-8.9.x-12.patch	23.33 KB
8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass
11-12-interdiff.txt	1.27 KB

Whoops forgot to add the testing plugins into views data.

Comment #13

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 28 June 2021 at 22:19

File	Size
9-13-interdiff.txt	8.74 KB
2133215-13.patch	23.85 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,678 pass, 1 fail

Back to implementing everything in the issue summary - this does part 2:

using an INNER join instead of a WHERE IN (...) for the taxonomy filter

Comment #14

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 28 June 2021 at 22:23

File	Size
2133215-8.9.x-14.patch	26.64 KB
8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass

Here's the patch for 8.9.x

Comment #15

28 June 2021 at 23:22

The last submitted patch, 13: 2133215-13.patch, failed testing. View results

Comment #16

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 29 June 2021 at 09:04

File	Size
13-16-interdiff.txt	3.79 KB
2133215-16.patch	25.86 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,682 pass

Fixing the random fail during to ordering. Currently all nodes have the same created time because REQUEST_TIME doesn't change in the test runner. Added a sort and fixed up the created times so that we testing what we think we're testing rather than the somewhat arbitrary DBs emit data when the sort is sorting on data that is equal.

Comment #17

daffie CreditAttribution: daffie commented 29 June 2021 at 09:35

Status:

Needs review

» Needs work

First quick review:

+++ b/core/modules/taxonomy/src/Plugin/views/argument/IndexTidDepth.php
@@ -101,40 +103,13 @@ public function query($group_by = FALSE) {
+    $this->addJoin(Database::getConnection(), $tids, /*$this->options['depth'], $this->tableAlias*/);

+++ b/core/modules/taxonomy/src/Plugin/views/filter/TaxonomyIndexTidDepth.php
@@ -70,32 +68,7 @@ public function query() {
+    $this->addJoin(Database::getConnection(), $this->value);

Can we replace Database::getConnection() with $this->query->getConnection(). Maybe not needed as a parameter.

+++ b/core/modules/taxonomy/tests/src/Kernel/Views/TaxonomyTermArgumentDepthTest.php
@@ -0,0 +1,201 @@
+    // Create a hierarchy 3 deep. Note the parent setup function creates two
+    // top-level terms w/o children.

I do not think that this comment is correct. The hierarchy is 5 levels deep.

```
+++ b/core/modules/views/src/Plugin/views/join/InSubquery.php
@@ -0,0 +1,86 @@
+ *   'left_table_query' => $query,
```
In the view subquery plugin we have the parameter "left_query" and here we add the parameter "left_table_query". Maybe better for consistency to name it similar?

Comment #18

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 29 June 2021 at 10:59

Category:	Feature request	» Bug report
Priority:	Normal	» Major
Status:	Needs work	» Needs review

File	Size
16-18-interdiff.txt	4.78 KB
2133215-18.patch	25.84 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,682 pass

Thanks for the review @daffie.
Re:
1. Sure - fixed.
2. Fixed
3. The "left_query" in \Drupal\views\Plugin\views\join\Subquery is badly named. It's actually being as the left field in a join. What we need here is a subquery being used as the left table. The join plugin added here is badly named... I'm changing it to LeftTableSubquery as that is accurate.

Here are some performance stats from a real site on 8.9.x - 34299 nodes, 6712 terms and 129814 rows in the taxonomy index table.

HEAD

Run 1:
Query build time 10.39 ms
Query execute time 836.03 ms

Run 2:
Query build time 7.07 ms
Query execute time 520.63 ms

#1358412: Content: Has taxonomy term ID (with depth) query performance

Run 1:
Query build time 18.42 ms
Query execute time 13.16 ms

Run 2:
Query build time 41.45 ms
Query execute time 13.9 ms

THIS PATCH

Run 1:
Query build time 14.6 ms
Query execute time 19.3 ms

Run 2:
Query build time 10.14 ms
Query execute time 14.23 ms

Given that views using this filter or argument can cause sites to become very unresponsive I'm changing this to be a major bug with the views system. The fact that this issue exists and that #1358412: Content: Has taxonomy term ID (with depth) query performance has been committed to Views 3 for Drupal 7 shows this is a real problem that people are hitting.

I think the solution on this issue is superior to #1358412: Content: Has taxonomy term ID (with depth) query performance because it results in less queries or accesses to cache - since that one has to load all the taxonomy terms involved and it is obviously a massive improvement on HEAD. Additionally I think solutions like the new module linked in #1358412-84: Content: Has taxonomy term ID (with depth) query performance because maintaining more indexes and tables to speed up taxonomy hierarchy has additional costs at taxonomy and node save time,

Comment #19

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 5 July 2021 at 10:32

Issue summary:

View changes

Fixing issue summary up.

Comment #20

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 5 July 2021 at 10:35

Issue summary:

View changes

More issue summary fixes.

Comment #21

daffie CreditAttribution: daffie commented 5 July 2021 at 11:08

Status:

Needs review

» Needs work

15 files were hidden/shown/deleted

File	Size
0001-Issue-2133215-generate-efficient-SQL.patch	3.56 KB

term-hierarchy.php_.txt	1.66 KB
2133215-6.patch	16.94 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,639 pass, 1 fail
6-8-interdiff.txt	8.63 KB
2133215-8.patch	20.85 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,641 pass
8-9-interdiff.txt	3.12 KB
2133215-9.patch	20.54 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,641 pass
2133215-8.9.x-11.patch	22.07 KB
8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass
2133215-8.9.x-12.patch	23.33 KB
8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass
11-12-interdiff.txt	1.27 KB
9-13-interdiff.txt	8.74 KB
2133215-13.patch	23.85 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,678 pass, 1 fail
2133215-8.9.x-14.patch	26.64 KB
8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass
13-16-interdiff.txt	3.79 KB
2133215-16.patch	25.86 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,682 pass

Patch looks good!

+++ b/core/modules/taxonomy/src/NodeDepthQueryTrait.php
@@ -0,0 +1,64 @@
+    $definition['left_table'] = 'node_taxonomy_depth';

Why do we have this line? It is not used in the plugin. Can it be removed?

+++ b/core/modules/taxonomy/src/NodeDepthQueryTrait.php
@@ -0,0 +1,64 @@
+    $definition['left_table_query'] = $subquery;
+    $join = Views::pluginManager('join')->createInstance('left_table_subquery', $definition);

+++ b/core/modules/views/src/Plugin/views/join/LeftTableSubquery.php
@@ -0,0 +1,86 @@
+ *   'left_table_query' => $query,
...
+ * - left_table_query: The subquery to use as the left table of the join clause.
...
+ * @ViewsJoin("left_table_subquery")
...
+    assert($this->configuration['left_table_query'] instanceof SelectInterface);
+    $this->leftTableQuery = $this->configuration['left_table_query'];

Can we choose either for "left_table_query" or for "left_table_subquery"? We have have just one then there is just one mistake less to make. Either one is fine for me.

Comment #22

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 5 July 2021 at 11:28

Issue summary:

View changes

Updated the first set of queries for D9. Working on changing the second part.

Comment #23

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 5 July 2021 at 11:47

Status:

Needs work

» Needs review

File	Size
18-23-interdiff.txt	2.48 KB
2133215-23.patch	25.83 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass PHP 7.3 & PostgreSQL 10.12 28,725 pass PHP 7.3 & SQLite 3.28 28,725 pass

@daffie nice review - great points. Patch attached addresses both of them and fleshes out the documentation on LeftQueryJoin.

Comment #24

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 5 July 2021 at 11:49

Title:

make views_handler_argument_term_node_tid_depth efficient

» Fix IndexTidDepth views argument plugin and TaxonomyIndexTidDepth views filter plugin performance

Comment #25

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 5 July 2021 at 12:02

File	Size
2133215-8.9.x-25.patch	28.49 KB
8.9.x: PHP 7.3 & MySQL 5.7 28,635 pass, 10 fail

Here's the patch for 8.9.x that also works with the new plugins created by the alternative approach patch... I need it for testing on real sites...

Comment #26

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 5 July 2021 at 12:11

File	Size
25-26-interdiff.txt	645 bytes
2133215-8.9.x-26.patch	28.48 KB
8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass

Hmmm #25 doesn't work on D8 because it's getConnection() doesn't exist... ho hum.

Comment #27

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 5 July 2021 at 12:22

Issue summary:

View changes

Updated the remaining queries in the issue summary to D9.

Comment #28

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce commented 5 July 2021 at 12:41

Issue summary:

View changes

Comment #29

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 5 July 2021 at 12:42

Issue summary:

View changes

Comment #30

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 5 July 2021 at 12:58

Issue summary:

View changes

Comment #31

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 5 July 2021 at 13:07

File	Size
2133215-23.patch	25.83 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass

Re-upping the D9 patch so it's the latest one on the issue so people know what to review.

Comment #32

5 July 2021 at 13:15

The last submitted patch, 25: 2133215-8.9.x-25.patch, failed testing. View results

Comment #33

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 5 July 2021 at 13:22

Issue summary:

View changes

Adding CR for the new join plugin to the issue summary.

Comment #34

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 5 July 2021 at 13:26

Issue summary:

View changes

Comment #35

daffie CreditAttribution: daffie commented 5 July 2021 at 14:36

Status:

Needs review

» Reviewed & tested by the community

7 files were hidden/shown/deleted

File	Size
16-18-interdiff.txt	4.78 KB
2133215-18.patch	25.84 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,682 pass
18-23-interdiff.txt	2.48 KB
2133215-23.patch	25.83 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass PHP 7.3 & PostgreSQL 10.12 28,725 pass PHP 7.3 & SQLite 3.28 28,725 pass
2133215-8.9.x-25.patch	28.49 KB
8.9.x: PHP 7.3 & MySQL 5.7 28,635 pass, 10 fail
25-26-interdiff.txt	645 bytes
2133215-8.9.x-26.patch	28.48 KB
8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass

All the code changes look good to me.
There is testing added test the code changes.
The IS and the CR are in order.
For me it is RTBC.

Comment #36

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 5 July 2021 at 15:24

File	Size
31-36-interdiff.txt	1.34 KB
2133215-36.patch	26.46 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass

Thanks for the reviews @daffie!

We're missing an empty post update to ensure the new views join plugin can be found and we should also address the @todo in the new join plugin. I sat in xdebug to see what is going on with the adjusted flag and reading \Drupal\views\Plugin\views\query\Sql::adjustJoin() makes it clear that setting this to TRUE is a good idea since in this case the left table is a query and the alias is decided by the call to addRelationship(). Maybe this will need adjusting it we start allowing such joins to be configured via hook_views_data() but that feels like a very odd use-case. This type of join needs to be added programatically via some other sort of plugin. Like the existing \Drupal\views\Plugin\views\join\Subquery plugin.

Comment #37

daniel.bosen

he/him

CreditAttribution: daniel.bosen at Thunder commented 5 July 2021 at 16:05

Status:

Reviewed & tested by the community

» Needs work

I tested the new approach with different terms on our projects. The results were very situation-dependent. When I choose a term, that is the root of a bigger tree, the subselect of the WHERE ... IN() returns a lot of rows, and we actually see a performance regression.

The results for HEAD query:

SELECT DISTINCT node_field_data.created AS node_field_data_created,
                node_field_data.nid     AS nid
FROM   node_field_data node_field_data
           LEFT JOIN taxonomy_index taxonomy_index
                     ON node_field_data.nid = taxonomy_index.nid
           LEFT JOIN node__field_promote_states node__field_promote_states
                     ON node_field_data.nid = node__field_promote_states.entity_id
                         AND node__field_promote_states.field_promote_states_value =
                             'landing'
WHERE  ( (( node__field_promote_states.field_promote_states_value IS NULL ))
    AND ( node_field_data.nid IN (SELECT tn.nid AS nid
                                  FROM   taxonomy_index tn
                                             LEFT OUTER JOIN taxonomy_term__parent th
                                                             ON th.entity_id = tn.tid
                                             LEFT OUTER JOIN taxonomy_term__parent th1
                                                             ON th.parent_target_id =
                                                                th1.entity_id
                                  WHERE  ( tn.tid = '718' )
                                     OR ( th1.entity_id = '718' )) ) )
  AND ( ( taxonomy_index.status = '1' )
    AND ( node_field_data.status = '1' )
    AND ( node_field_data.type IN ( 'article', 'recipe' ) ) )
ORDER  BY node_field_data_created DESC
    LIMIT  6 offset 5;

+-------------------------+--------+
| node_field_data_created | nid    |
+-------------------------+--------+
|              1625284800 | 118893 |
|              1625207400 | 118916 |
|              1625126400 | 118902 |
|              1625122800 | 118394 |
|              1624968000 | 118888 |
|              1624867200 | 118756 |
+-------------------------+--------+
6 rows in set (0.014 sec)

With new code:

SELECT DISTINCT node_field_data.created AS node_field_data_created,
                node_field_data.nid     AS nid
FROM   node_field_data node_field_data
           LEFT JOIN taxonomy_index taxonomy_index
                     ON node_field_data.nid = taxonomy_index.nid
           LEFT JOIN node__field_promote_states node__field_promote_states
                     ON node_field_data.nid = node__field_promote_states.entity_id
                         AND node__field_promote_states.field_promote_states_value =
                             'landing'
           INNER JOIN (SELECT tn.nid AS nid
                       FROM   taxonomy_index tn
                       WHERE  tn.tid = '718'
                       UNION
                       SELECT tn.nid AS nid
                       FROM   taxonomy_index tn
                                  INNER JOIN taxonomy_term__parent th
                                             ON tn.tid = th.entity_id
                                  INNER JOIN taxonomy_term__parent th1
                                             ON th.parent_target_id = th1.entity_id
                       WHERE  th1.entity_id = '718') node_taxonomy_depth
                      ON node_taxonomy_depth.nid = node_field_data.nid
WHERE  (( node__field_promote_states.field_promote_states_value IS NULL ))
  AND ( ( taxonomy_index.status = '1' )
    AND ( node_field_data.status = '1' )
    AND ( node_field_data.type IN ( 'article', 'recipe' ) ) )
ORDER  BY node_field_data_created DESC
    LIMIT  6 offset 5;

+-------------------------+--------+
| node_field_data_created | nid    |
+-------------------------+--------+
|              1625284800 | 118893 |
|              1625207400 | 118916 |
|              1625126400 | 118902 |
|              1625122800 | 118394 |
|              1624968000 | 118888 |
|              1624867200 | 118756 |
+-------------------------+--------+
6 rows in set (0.124 sec)

The queries are the same as in the issue description, just different tids

Comment #38

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 5 July 2021 at 16:19

Thanks for the review @daniel.bosen. It seems that the MySQL is very sensitive to this part...

it generates a WHERE ... IN ( subselect ) clause, which is inefficient on MySQL if the subselect returns anything but a trivially small result set

I think we can remove this part of patch and discuss when and how to add that. Without the left join using a query I see a speed up for both 353 and 718 with the dataset we're looking at.

Comment #39

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 6 July 2021 at 12:22

Issue summary:	View changes
Status:	Needs work	» Needs review

File	Size
performance graph.png	370.03 KB
head_performance graph.png	414.41 KB
taxo_plugin_perf.php_.txt	4.61 KB

I've some extensive performance testing of this patch on every term with results on the production site I have access to. The difference in the performance of JOIN (subselect) ON and the WHERE ... IN ( subselect ) is determined by the number of terms returned by the sub-select.

`JOIN (subselect) ON` vs `WHERE ... IN ( subselect )`

Graph showing the performance of 'WHERE ... IN ( subselect )' in blue pluses and 'JOIN (subselect) ON' in green circles. The x axis is the number of rows returned by the subselect. The y axis is the time in milliseconds for the query.
EDIT: the x-axis is supposed to be labelled "Number of node IDs returned by the subquery" as that's what the subquery lists.

This shows that the variation in the performance of JOIN (subselect) ON is a linear growth compared to the number of terms. This is very different for the WHERE ... IN ( subselect ) which shows an exponential decrease in time taken as the number of terms returned by the subquery increases.

Therefore we have a tradeoff to make. I think the fact that linear growth in slow and even when there a over 10000 nids return by the subquery the query still only takes 50ms means that we should continue with the UNIONS subquery + JOIN (subselect) ON approach (i.e. the current patch in #36).

HEAD vs `JOIN (subselect) ON`

Just to show how HEAD compares to the new code...
Graph showing the performance of 'HEAD' in magenta squares and 'JOIN (subselect) ON' in green circles. The x axis is the number of rows returned by the subselect. The y axis is the time in milliseconds for the query.
EDIT: the x-axis is supposed to be labelled "Number of node IDs returned by the subquery" as that's what the subquery lists.

The performance of HEAD when the subsquery only returns a few rows is dire.

Notes

The script to generate the numbers is attached.
When analysing the query it was found that the production site is unnecessary adding a filter on $data['taxonomy_index']['status'] - this results in an additional join to taxonomy_index to filter by taxonomy_index.status - but this is the node's status and doesn't need this join because we have a filter on node_field_data.status already. This is a problem that can be fixed on the production site I'm performance testing. Removing this unnecessary filter narrows the gap between JOIN (subselect) ON and the WHERE ... IN ( subselect ) when dealing with subquery that return large numbers of node IDs.

Comment #40

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 6 July 2021 at 14:30

File	Size
small-site-graph.png	383.59 KB

I've also looked at a much smaller production site with only 1874 nodes and 14925 rows in taxonomy term index and 2227 terms. The performance problems are not as significant but there are still benefits of moving to this approach.

For HEAD the slowest TID was 92.7ms and the fastest was 1.8ms
For WHERE ... IN ( subselect ) the slowest was 56.7ms and the fastest was 1.9ms
For JOIN (subselect) ON the slowest was 5.2ms and the fastest was 1.6ms

The graph shows a very similar distribution of times...

Graph showing query performance. X-axis is the number node IDs returned by the sub query. Y-axis is the time for the query in milliseconds. The green circle is for the query generated by HEAD. The blue triangle is for the query generated by the patch in comment #11. The grey cross is for the query generated by the patch in comment #36.

Comment #41

daniel.bosen

he/him

CreditAttribution: daniel.bosen at Thunder commented 6 July 2021 at 14:35

Status:

Needs review

» Reviewed & tested by the community

@alexpott Thanks for all the nice graphs and performance investigations! I can confirm, that the example I had, was an outlier and that most of the other terms are quicker with the new code.
So back to RTBC.

Comment #42

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 6 July 2021 at 15:03

Issue summary:	View changes
Status:	Reviewed & tested by the community	» Needs review

File	Size
36-42-interdiff.txt	4.77 KB
2133215-8.9.x-42.patch	25.42 KB
8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass
2133215-42.patch	22.78 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,729 pass PHP 7.3 & MariaDB 10.3.22 28,735 pass PHP 7.3 & PostgreSQL 10.12 28,729 pass PHP 7.3 & SQLite 3.28 28,729 pass

I was just reviewing #3221933: PHP Notice when using "left_formula" in views join and through that I learnt about \Drupal\taxonomy\Plugin\views\relationship\NodeTermData::query() and in there we do a subquery join! Using the standard views join! So there's no need to add the new plugin. Which means no update function and this could potentially go in a patch release.

Comment #43

daffie CreditAttribution: daffie commented 6 July 2021 at 18:50

Status:

Needs review

» Reviewed & tested by the community

7 files were hidden/shown/deleted

File	Size
2133215-23.patch	25.83 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass
31-36-interdiff.txt	1.34 KB
2133215-36.patch	26.46 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass
performance graph.png	370.03 KB
head_performance graph.png	414.41 KB
taxo_plugin_perf.php_.txt	4.61 KB
small-site-graph.png	383.59 KB

The previous added new plugin has been removed.
Also from the IS and the CR.
All code changes look good to me.
The added testing is not removed.
Back to RTBC.

Comment #44

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 6 July 2021 at 21:43

7 files were hidden/shown/deleted

File	Size
2133215-23.patch	25.83 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass
31-36-interdiff.txt	1.34 KB
2133215-36.patch	26.46 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass
performance graph.png	370.03 KB
head_performance graph.png	414.41 KB
taxo_plugin_perf.php_.txt	4.61 KB
small-site-graph.png	383.59 KB

I've been investigate why we see the patch in #42 slow down whereas HEAD and the WHERE ... IN ( subselect ) gets quicker. It turns out that this is due to how MySQL's query planner is working. It chooses a less optimal query plan for JOIN (subselect) ON when the number of nodes returned by the subquery increases. However it possible to use MySQL query hints to improve this. Against my test site:

SELECT
DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid
FROM
node_field_data node_field_data
LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid
LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.nid = node__field_promote_states.entity_id AND node__field_promote_states.field_promote_states_value = 'landing'
INNER JOIN (SELECT tn.nid AS nid
FROM
taxonomy_index tn
WHERE tn.tid = '718' UNION SELECT tn.nid AS nid
FROM
taxonomy_index tn
INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id
INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id
WHERE th1.entity_id = '718' UNION SELECT tn.nid AS nid
FROM
taxonomy_index tn
INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id
INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id
INNER JOIN taxonomy_term__parent th2 ON th1.parent_target_id = th2.entity_id
WHERE th2.entity_id = '718') node_taxonomy_depth ON node_field_data.nid = node_taxonomy_depth.nid
WHERE ((node__field_promote_states.field_promote_states_value IS NULL)) AND ((taxonomy_index.status = '1') AND (node_field_data.status = '1') AND (node_field_data.type IN ('article', 'recipe')))
ORDER BY node_field_data_created DESC
LIMIT 11 OFFSET 0

Takes 442 ms.

If you start the query with...

SELECT
/*+ JOIN_PREFIX(node_field_data) */

it takes 28 ms!

Unfortunately core does not support adding query optimisation hints at all so I'm going to create a follow-up to discuss adding support so we can leverage them here to massively improve performance even when there are loads and loads of nodes that have the terms. I don't think any of this revokes the rtbc as the new query is much much more performant and consistently performant.

Comment #45

daniel.bosen

he/him

CreditAttribution: daniel.bosen at Thunder commented 7 July 2021 at 06:17

I am not so sure about the query hints, they tend to be very use-case specific. But yes, let's discuss this separately. This patch is a huge performance improvement RTBC +1

Comment #46

Lendude

Dutch

Amsterdam

CreditAttribution: Lendude at iO commented 7 July 2021 at 14:09

This looks really great, and LOVE the graphs.

But for all the things that the new method does here, it feels a little light on documentation, a little bit like the code in HEAD ;)

Just some things I think would benefit from some extra explanation:

+++ b/core/modules/taxonomy/src/NodeDepthQueryTrait.php
@@ -0,0 +1,66 @@
+   * Builds a performant depth subquery and adds it as a join to the query.
+   *
+   * @param string|array $tids
+   *   The terms IDs to do a depth search for.

We have some great examples of the structure of the query that is being build here, maybe add that to the doc block?

+++ b/core/modules/taxonomy/src/NodeDepthQueryTrait.php
@@ -0,0 +1,66 @@
+  protected function addJoin($tids): void {

Seems like a pretty minimalist name for all the work it is doing, feels like it deserves something better ;)

```
+++ b/core/modules/taxonomy/src/NodeDepthQueryTrait.php
@@ -0,0 +1,66 @@
+    if ($this->options['depth'] !== 0) {
+      // Create the queries to build the union.
```
'the union' is the first reference to the fact that we will be using a UNION here, so it feels like this could use more introduction, but maybe adding the structure to the main docblock would suffice?

+++ b/core/modules/taxonomy/src/NodeDepthQueryTrait.php
@@ -0,0 +1,66 @@
+      if ($this->options['depth'] > 0) {
...
+      foreach (range(1, abs($this->options['depth'])) as $count) {
...
+        if ($this->options['depth'] > 0) {

We probably could use some explanation as to what is going on with the juggling with depth and negative depth.

Comment #47

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 9 July 2021 at 11:03

File	Size
42-47-interdiff.txt	3.33 KB
2133215-47.patch	23.92 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,765 pass

Thanks for the review @Lendude the patch attached addresses #46 - good points!

Comment #48

Lendude

Dutch

Amsterdam

CreditAttribution: Lendude at iO commented 9 July 2021 at 14:30

@alexpott, thanks!

#47 totally satisfies my craving for docs, leaving RTBC

Comment #49

larowlan

🇦🇺🏝.au GMT+10

CreditAttribution: larowlan at PreviousNext commented 12 July 2021 at 06:58

This is some really great work folks. I've got one question about whether we can adapt this to make it more useful for contrib, and another that' s just a code-style nit.

```
+++ b/core/modules/taxonomy/src/NodeDepthQueryTrait.php
@@ -0,0 +1,92 @@
+trait NodeDepthQueryTrait {
...
+    $subquery = $connection->select('taxonomy_index', 'tn');
+    $subquery->addField('tn', 'nid');
...
+    $definition['left_field'] = 'nid';
+    $definition['field'] = 'nid';
...
+    $this->query->addRelationship('node_taxonomy_depth', $join, 'node', $this->relationship);
```
As a maintainer of taxonomy_entity_index, it would be good for us to replicate this logic over there to improve performance of querying other entity-types.

To that effect, is there any merit in making this not specific to nodes?

I guess it would be tricky because the base table would need to change to taxonomy_entity_index and then we'd need to also have a condition for entity-type. So all of these parts would need to be customisable, plus additional conditions in the join.

+++ b/core/modules/taxonomy/tests/src/Kernel/Views/TaxonomyTermArgumentDepthTest.php
@@ -0,0 +1,201 @@
+    $this->terms[0] = $first;
+    $this->terms[1] = $second;
+    $this->terms[2] = $third;
+    $this->terms[3] = $fourth;
+    $this->terms[4] = $fifth;

Nit: Could any reason not to write this as

$this->terms = [$first, $second, $third, $fourth, $fifth];

Comment #50

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 12 July 2021 at 11:11

File	Size
47-50-interdiff.txt	5.21 KB
2133215-50.patch	23.92 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,779 pass

Thanks for the review @larowlan.

1. I had a go at this... I changed the signature to protected function addSubQueryJoin($tids, string $index_table = 'taxonomy_index', string $index_join_entity_field = 'nid', string $index_taxonomy_field = 'tid'): void {. But then I realised that the taxonomy_entity_index module also adds revisions into the mix. I certainly think there might be scope for that module to leverage similar logic but making this re-usable for that will add much complexity and we'd basically have to add taxonomy_entity_index to core to test it.

2. Fixed - it's kinda from another test in core - \Drupal\Tests\taxonomy\Functional\Views\TaxonomyTermFilterDepthTest but we don't change that here so leaving that one alone.

Inspired by 1 I've removed a few of the hardcoded things that can be replaced by information on the class and I've renamed the trait to be about the taxonomy_index table because that's the table it is joining in.

Comment #51

dww

we/he/they

CreditAttribution: dww at TEN7 commented 16 July 2021 at 02:50

This all looks really great. Thanks to everyone who's worked on it, especially @alexpott!

Initial review. So far, only trivial doc nits. I've only skimmed the test coverage so far. I'll try to give that a closer look in the next few days...

+++ b/core/modules/taxonomy/src/TaxonomyIndexDepthQueryTrait.php
@@ -0,0 +1,93 @@
+ * - it uses inner joins and unions to select all the node IDs with the terms in

Given the next line, s/inner joins/INNER JOIN/ and s/unions/UNION/?

+++ b/core/modules/taxonomy/src/TaxonomyIndexDepthQueryTrait.php
@@ -0,0 +1,93 @@
+ * - it uses INNER JOIN (subquery) rather instead of a WHERE IN (subquery).

"rather instead" seem redundant. I'd remove "rather" and keep "instead of".

+++ b/core/modules/taxonomy/src/TaxonomyIndexDepthQueryTrait.php
@@ -0,0 +1,93 @@
+ * resulting join to node_field_data will be:

JOIN?
'node_field_data' (with single quotes)?

+++ b/core/modules/taxonomy/src/TaxonomyIndexDepthQueryTrait.php
@@ -0,0 +1,93 @@
+   * Builds a performant depth subquery and adds it as a join to the query.
...
+    // Add the subquery as a join.

JOIN?

+++ b/core/modules/taxonomy/src/TaxonomyIndexDepthQueryTrait.php
@@ -0,0 +1,93 @@
+   * @param string|array $tids
+   *   The terms IDs to do a depth search for.

Do we need to explain how the string option works here?

Comment #52

dww

we/he/they

CreditAttribution: dww at TEN7 commented 16 July 2021 at 02:51

Issue tags:

+Bug Smash Initiative

Comment #53

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 16 July 2021 at 10:02

File	Size
50-53-interdiff.txt	1.39 KB
2133215-53.patch	24.05 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,788 pass

Fixed #51 - not in the way asked for - I rewrote the docs to include more detail.

Comment #54

dww

we/he/they

CreditAttribution: dww at TEN7 commented 16 July 2021 at 18:19

Status:

Reviewed & tested by the community

» Needs review

File	Size
2133215-54.patch	23.9 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,844 pass
2133215.53_54.interdiff.txt	7.37 KB

15 files were hidden/shown/deleted

File	Size
2133215-23.patch	25.83 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass
31-36-interdiff.txt	1.34 KB
2133215-36.patch	26.46 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,725 pass
performance graph.png	370.03 KB
head_performance graph.png	414.41 KB
taxo_plugin_perf.php_.txt	4.61 KB
small-site-graph.png	383.59 KB
36-42-interdiff.txt	4.77 KB
2133215-8.9.x-42.patch	25.42 KB
8.9.x: PHP 7.3 & MySQL 5.7 28,640 pass
2133215-42.patch	22.78 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,729 pass PHP 7.3 & MariaDB 10.3.22 28,735 pass PHP 7.3 & PostgreSQL 10.12 28,729 pass PHP 7.3 & SQLite 3.28 28,729 pass
42-47-interdiff.txt	3.33 KB
2133215-47.patch	23.92 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,765 pass
47-50-interdiff.txt	5.21 KB
2133215-50.patch	23.92 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,779 pass
50-53-interdiff.txt	1.39 KB

@alexpott re: #53: Of course you did. ;) That's even better. Thanks!

I finished my pedantic review of the tests, too. Had a brief head-scratch over point 1, then realized it's string view ID vs. ViewExecutable. I think I cleaned this up well in the attached patch:

+++ b/core/modules/taxonomy/tests/src/Kernel/Views/TaxonomyTermArgumentDepthTest.php
@@ -0,0 +1,197 @@
+   * Views used by this test.
...
+   * Stores the view used in the test.

Huh? Oh, types. ;)

+++ b/core/modules/taxonomy/tests/src/Kernel/Views/TaxonomyTermArgumentDepthTest.php
@@ -0,0 +1,197 @@
+    // Third-level and second-level term, depth -1, using a comma. Note that due
+    // to performance the "and" meaning of comma is not supported.

🤔 I don't see this limitation mentioned anywhere else in the patch. Not clear what it means here. Maybe this needs more explanation?

Would love to hear about #2 before this is truly RTBC.

Instead of commenting on everything else and having Alex re-roll it, rest of my review in the form of a new patch an interdiff. Basically:
A. Minor copy editing to comments.
B. Move a few things around to put related things closer together (e.g. at first I didn't notice we initialized $this->terms).
C. Add :void to all the new methods in the new test classes.
D. Remove $column_map and $assert_method local variables (which are only used once) and pass the right values directly to assertIdenticalResultsetHelper() (just like assertTermWithDepthResult() was already doing).

Comment #55

daffie CreditAttribution: daffie commented 16 July 2021 at 22:03

Status:

Needs review

» Reviewed & tested by the community

2 files were hidden/shown/deleted

File	Size
2133215-53.patch	24.05 KB
9.3.x: PHP 8.0 & MySQL 5.7 28,788 pass
2133215.53_54.interdiff.txt	7.37 KB

All the changes look good to me.
Back to RTBC.

Comment #56

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 17 July 2021 at 01:14

Re #54.2 This is a restriction based on the HEAD.


    $form['break_phrase'] = [
      '#type' => 'checkbox',
      '#title' => $this->t('Allow multiple values'),
      '#description' => $this->t('If selected, users can enter multiple values in the form of 1+2+3. Due to the number of JOINs it would require, AND will be treated as OR with this filter.'),
      '#default_value' => !empty($this->options['break_phrase']),
    ];

Here's the documentation in HEAD already about this.

Comment #57

dww

we/he/they

CreditAttribution: dww at TEN7 commented 17 July 2021 at 03:28

Re: #55: Thanks!
Re: #56: Ahh, right. Makes sense. Thanks again!

$RTBC++;

As the maintainer of views_taxonomy_term_name_into_id I'm extremely in favor of this improvement to these views plugins!

Thanks,
-Derek

Comment #58

19 August 2021 at 12:02

catch committed 923b2e6 on 9.3.x

Issue #2133215 by alexpott, dww, uplex_slink, dawehner, daffie, daniel....

Comment #59

catch

he/him

English

CreditAttribution: catch at Third and Grove commented 19 August 2021 at 12:03

Status:

Reviewed & tested by the community

» Fixed

Committed 923b2e6 and pushed to 9.3.x. Thanks!

This looks great, couldn't find anything to complain about and thanks for all the detailed performance investigation.

Seems like a small chance this could break hook_views_query_alter() implementations, so not backported to 9.2.x for now.

Comment #60

alexpott

he/they

English

🇪🇺🌍

CreditAttribution: alexpott at Acro Commerce, Thunder commented 24 August 2021 at 17:06

Filed a fix for the unfortunate views random sorting fail this issue causes.... #3229665: \Drupal\Tests\taxonomy\Functional\Views\TaxonomyTermFilterDepthTest can randomly fail due to views sorts

Comment #61

7 September 2021 at 17:09

Status:

Fixed

» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Fix IndexTidDepth views argument plugin and TaxonomyIndexTidDepth views filter plugin performance

Problem/Motivation

Proposed resolution

taxonomy hierarchy subselect with union of specific inner joins instead of left join

Existing code

New code

using an INNER join instead of a WHERE IN (...) for the taxonomy filter

Existing code + the above fix

New code

Overall gains

Performance testing notes

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Comments

HEAD

PATCH

HEAD

THIS PATCH

JOIN (subselect) ON vs WHERE ... IN ( subselect )

HEAD vs JOIN (subselect) ON

Notes

Referenced by

`JOIN (subselect) ON` vs `WHERE ... IN ( subselect )`

HEAD vs `JOIN (subselect) ON`