Problem/Motivation
This issue is to propose two significant optimizations to \Drupal\taxonomy\Plugin\views\argument\IndexTidDepth
and \Drupal\taxonomy\Plugin\views\filter\TaxonomyIndexTidDepth
.
- the subselect the code generates is a multi-level left-join which is inefficient on MySQL because it cannot make use of indexes
- in particular, the left join generates many duplicates
- it generates a
WHERE ... IN ( subselect )
clause, which is inefficient on MySQL if the subselect returns anything but a trivially small result set
Here is the complete query HEAD generates for a single TID query with a depth of 3.
SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.nid = node__field_promote_states.entity_id AND node__field_promote_states.field_promote_states_value = 'landing' WHERE (((node__field_promote_states.field_promote_states_value IS NULL)) AND (node_field_data.nid IN (SELECT tn.nid AS nid FROM taxonomy_index tn LEFT OUTER JOIN taxonomy_term__parent th ON th.entity_id = tn.tid LEFT OUTER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE (tn.tid = '353') OR (th1.entity_id = '353')))) AND ((taxonomy_index.status = '1') AND (node_field_data.status = '1') AND (node_field_data.type IN ('article', 'recipe'))) ORDER BY node_field_data_created DESC LIMIT 6 OFFSET 5;
+-------------------------+--------+
| node_field_data_created | nid |
+-------------------------+--------+
| 1594378800 | 110686 |
| 1594360800 | 110446 |
| 1593946800 | 110321 |
| 1593342000 | 109946 |
| 1593241200 | 109896 |
| 1592823600 | 109591 |
+-------------------------+--------+
6 rows in set (0.49 sec)
Proposed resolution
The patch changes the code in two aspects:
taxonomy hierarchy subselect with union of specific inner joins instead of left join
Existing code
The existing code generates subselect following this pattern (tid 353, +3 levels):
explain SELECT tn.nid AS nid FROM taxonomy_index tn LEFT OUTER JOIN taxonomy_term__parent th ON th.entity_id = tn.tid LEFT OUTER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id LEFT OUTER JOIN taxonomy_term__parent th2 ON th1.parent_target_id = th2.entity_id WHERE (tn.tid = '353') OR (th1.entity_id = '353') OR (th2.entity_id = '353');
+----+-------------+-------+------------+-------+---------------+-----------+---------+------------------------------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------------------------------+--------+----------+--------------------------+
| 1 | SIMPLE | tn | NULL | index | term_node | term_node | 14 | NULL | 124367 | 100.00 | Using index |
| 1 | SIMPLE | th | NULL | ref | PRIMARY | PRIMARY | 4 | instyle.tn.tid | 1 | 100.00 | NULL |
| 1 | SIMPLE | th1 | NULL | ref | PRIMARY | PRIMARY | 4 | instyle.th.parent_target_id | 1 | 100.00 | NULL |
| 1 | SIMPLE | th2 | NULL | ref | PRIMARY | PRIMARY | 4 | instyle.th1.parent_target_id | 1 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------------------------------+--------+----------+--------------------------+
The code in head has to scan all of the taxonomy_index rows.
New code
The patch generates much more efficient code using a UNION of specific inner joins, the replacement for the example above is:
explain SELECT tn.nid AS nid FROM taxonomy_index tn WHERE tn.tid = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE th1.entity_id = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id INNER JOIN taxonomy_term__parent th2 ON th1.parent_target_id = th2.entity_id WHERE th2.entity_id = '353';
+----+--------------+--------------+------------+------+--------------------------+------------------+---------+-----------------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+------------+------+--------------------------+------------------+---------+-----------------------+------+----------+--------------------------+
| 1 | PRIMARY | tn | NULL | ref | PRIMARY,term_node,bundle | term_node | 4 | const | 102 | 100.00 | Using index |
| 2 | UNION | th1 | NULL | ref | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | Using index |
| 2 | UNION | th | NULL | ref | PRIMARY,parent_target_id | parent_target_id | 4 | const | 5 | 100.00 | Using where; Using index |
| 2 | UNION | tn | NULL | ref | term_node | term_node | 4 | instyle.th.entity_id | 17 | 100.00 | Using index |
| 3 | UNION | th2 | NULL | ref | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | Using index |
| 3 | UNION | th1 | NULL | ref | PRIMARY,parent_target_id | parent_target_id | 4 | const | 5 | 100.00 | Using where; Using index |
| 3 | UNION | th | NULL | ref | PRIMARY,parent_target_id | parent_target_id | 4 | instyle.th1.entity_id | 223 | 100.00 | Using index |
| 3 | UNION | tn | NULL | ref | term_node | term_node | 4 | instyle.th.entity_id | 17 | 100.00 | Using index |
| NULL | UNION RESULT | <union1,2,3> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+--------------+------------+------+--------------------------+------------------+---------+-----------------------+------+----------+--------------------------+
Far fewer rows are scanned and indexes are used.
using an INNER join instead of a WHERE IN (...) for the taxonomy filter
if the outer query constructed by views is inefficient (as can easily be the case with any non-trivial LEFT-join), the query as a whole will still remain inefficient because the efficient subselect is used in a WHERE IN (...) which MySQL only uses as a filter on the outer query result.
This can be made much more efficient by using an INNER join. Here's an example for how the patch will change the SQL (reformatted for readability again)
Existing code + the above fix
SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.nid = node__field_promote_states.entity_id AND node__field_promote_states.field_promote_states_value = 'landing' WHERE (((node__field_promote_states.field_promote_states_value IS NULL)) AND (node_field_data.nid IN (SELECT tn.nid AS nid FROM taxonomy_index tn WHERE tn.tid = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE th1.entity_id = '353'))) AND ((taxonomy_index.status = '1') AND (node_field_data.status = '1') AND (node_field_data.type IN ('article', 'recipe'))) ORDER BY node_field_data_created DESC LIMIT 6 OFFSET 5;
+-------------------------+--------+
| node_field_data_created | nid |
+-------------------------+--------+
| 1594378800 | 110686 |
| 1594360800 | 110446 |
| 1593946800 | 110321 |
| 1593342000 | 109946 |
| 1593241200 | 109896 |
| 1592823600 | 109591 |
+-------------------------+--------+
6 rows in set (0.08 sec)
New code
SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.nid = node__field_promote_states.entity_id AND node__field_promote_states.field_promote_states_value = 'landing' INNER JOIN (SELECT tn.nid AS nid FROM taxonomy_index tn WHERE tn.tid = '353' UNION SELECT tn.nid AS nid FROM taxonomy_index tn INNER JOIN taxonomy_term__parent th ON tn.tid = th.entity_id INNER JOIN taxonomy_term__parent th1 ON th.parent_target_id = th1.entity_id WHERE th1.entity_id = '353') node_taxonomy_depth ON node_taxonomy_depth.nid = node_field_data.nid WHERE ((node__field_promote_states.field_promote_states_value IS NULL)) AND ((taxonomy_index.status = '1') AND (node_field_data.status = '1') AND (node_field_data.type IN ('article', 'recipe'))) ORDER BY node_field_data_created DESC LIMIT 6 OFFSET 5;
+-------------------------+--------+
| node_field_data_created | nid |
+-------------------------+--------+
| 1594378800 | 110686 |
| 1594360800 | 110446 |
| 1593946800 | 110321 |
| 1593342000 | 109946 |
| 1593241200 | 109896 |
| 1592823600 | 109591 |
+-------------------------+--------+
6 rows in set (0.01 sec)
EXPLAIN is not so useful for this one because the join via query makes for complicated reading - but you can clearly see the performance improvement when running the query.
Overall gains
Use MySQL profiling we can compare the original query versus the final query...
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | 0.49241525 | SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid FROM node_field_data node_field_data LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.n |
| 2 | 0.00789575 | SELECT DISTINCT node_field_data.created AS node_field_data_created, node_field_data.nid AS nid
FROM
node_field_data node_field_data
LEFT JOIN taxonomy_index taxonomy_index ON node_field_data.nid = taxonomy_index.nid
LEFT JOIN node__field_promote_states node__field_promote_states ON node_field_data.n |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
This shows that original query took 0.49241525 seconds to execute and the new query takes 0.00789575 seconds. This is 98% decrease in the time take to run the query!
Performance testing notes
node_field_data contains 34299 rows
taxonomy_index contains 129814 rows
taxonomy_term__parent contains 6712 rows
node__field_promote_states contains 96159 rows
Remaining tasks
User interface changes
N/a
API changes
N/a
Data model changes
N/a
Release notes snippet
Views that use either the taxonomy depth filter or taxonomy depth argument are massively more performant.
Comment | File | Size | Author |
---|---|---|---|
#54 | 2133215.53_54.interdiff.txt | 7.37 KB | dww |
#54 | 2133215-54.patch | 23.9 KB | dww |
Comments
Comment #1
uplex_slink CreditAttribution: uplex_slink commentedComment #2
dawehnerI totally believe that these kind of queries could get quite slow, but the question is whether its worth to change
the behaviour by default.
I tried to do some basic benchmarking setting up the content using the following script, which was still pretty fast.
How many entries do you have in the taxonom_index and term tables?
Regarding this, I don't really know a much cleaner way to do it, table_queue is considered actually to be a internal data structure, so there was never really a reason to change it.
Comment #3
chr.fritschmysql> select count(*) from taxonomy_term_data;
+----------+
| count(*) |
+----------+
| 109702 |
+----------+
1 row in set (0.36 sec)
mysql> select count(*) from taxonomy_term_hierarchy;
+----------+
| count(*) |
+----------+
| 109429 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from taxonomy_index;
+----------+
| count(*) |
+----------+
| 4204213 |
+----------+
1 row in set (0.00 sec)
Comment #4
MustangGB CreditAttribution: MustangGB commentedComment #5
alexpottThis is definitely an issue - #1358412: Content: Has taxonomy term ID (with depth) query performance landed in Views 3 and it's a popular core patch. The problem with it is that it slower for MySQL only backends - although it might also be more performant when the db server is under load. Given we're unsure of the performance characteristics I think we should revisit the approach here - or at least something similar.
Moving this issue to the core queue as this is where the latest version of views live and this code is now part of the taxonomy module.
Comment #6
alexpottHere's a patch that has quite a big impact on a large site with lots of content and plenty of rows in taxonomy_index - it's the scanning of this table that make the query slow. Also need to fix up TaxonomyIndexTidDepth as it uss the same logic. Perhaps we can move the logic into a trait to make it easy to share.
Comment #8
alexpottRefactored the common stuff into a trait so \Drupal\taxonomy\Plugin\views\argument\IndexTidDepth and \Drupal\taxonomy\Plugin\views\filter\TaxonomyIndexTidDepth can share the common functionality.
Comment #9
alexpottHmmm... okay the approach in #8 doesn't result in any performance improvements. So I've moved to the union approach in the issue summary.
This results in a test site view going from 537.49 ms to under 200ms so that's quite the performance improvement.
Looking at the queries involved...
HEAD
PATCH
Comment #10
alexpottAnother advantage of the union approach is that the sub query returns less rows. On the site I'm testing on the sub query in HEAD returns 462 rows whereas the patch with unions returns 219. If I add a distinct the HEAD approach returns 219 and I've checked they contain existing the same node IDs.
Comment #11
alexpottHere's a patch for 8.9.x to enable testing on sites that are using #1358412: Content: Has taxonomy term ID (with depth) query performance - this will help us test this patch on production sites that are using that patch.
Comment #12
alexpottWhoops forgot to add the testing plugins into views data.
Comment #13
alexpottBack to implementing everything in the issue summary - this does part 2:
Comment #14
alexpottHere's the patch for 8.9.x
Comment #16
alexpottFixing the random fail during to ordering. Currently all nodes have the same created time because REQUEST_TIME doesn't change in the test runner. Added a sort and fixed up the created times so that we testing what we think we're testing rather than the somewhat arbitrary DBs emit data when the sort is sorting on data that is equal.
Comment #17
daffie CreditAttribution: daffie commentedFirst quick review:
Can we replace
Database::getConnection()
with$this->query->getConnection()
. Maybe not needed as a parameter.I do not think that this comment is correct. The hierarchy is 5 levels deep.
In the view subquery plugin we have the parameter "left_query" and here we add the parameter "left_table_query". Maybe better for consistency to name it similar?
Comment #18
alexpottThanks for the review @daffie.
Re:
1. Sure - fixed.
2. Fixed
3. The "left_query" in \Drupal\views\Plugin\views\join\Subquery is badly named. It's actually being as the left field in a join. What we need here is a subquery being used as the left table. The join plugin added here is badly named... I'm changing it to LeftTableSubquery as that is accurate.
Here are some performance stats from a real site on 8.9.x - 34299 nodes, 6712 terms and 129814 rows in the taxonomy index table.
HEAD
Run 1:
Query build time 10.39 ms
Query execute time 836.03 ms
Run 2:
Query build time 7.07 ms
Query execute time 520.63 ms
#1358412: Content: Has taxonomy term ID (with depth) query performance
Run 1:
Query build time 18.42 ms
Query execute time 13.16 ms
Run 2:
Query build time 41.45 ms
Query execute time 13.9 ms
THIS PATCH
Run 1:
Query build time 14.6 ms
Query execute time 19.3 ms
Run 2:
Query build time 10.14 ms
Query execute time 14.23 ms
Given that views using this filter or argument can cause sites to become very unresponsive I'm changing this to be a major bug with the views system. The fact that this issue exists and that #1358412: Content: Has taxonomy term ID (with depth) query performance has been committed to Views 3 for Drupal 7 shows this is a real problem that people are hitting.
I think the solution on this issue is superior to #1358412: Content: Has taxonomy term ID (with depth) query performance because it results in less queries or accesses to cache - since that one has to load all the taxonomy terms involved and it is obviously a massive improvement on HEAD. Additionally I think solutions like the new module linked in #1358412-84: Content: Has taxonomy term ID (with depth) query performance because maintaining more indexes and tables to speed up taxonomy hierarchy has additional costs at taxonomy and node save time,
Comment #19
alexpottFixing issue summary up.
Comment #20
alexpottMore issue summary fixes.
Comment #21
daffie CreditAttribution: daffie commentedPatch looks good!
Why do we have this line? It is not used in the plugin. Can it be removed?
Can we choose either for "left_table_query" or for "left_table_subquery"? We have have just one then there is just one mistake less to make. Either one is fine for me.
Comment #22
alexpottUpdated the first set of queries for D9. Working on changing the second part.
Comment #23
alexpott@daffie nice review - great points. Patch attached addresses both of them and fleshes out the documentation on LeftQueryJoin.
Comment #24
alexpottComment #25
alexpottHere's the patch for 8.9.x that also works with the new plugins created by the alternative approach patch... I need it for testing on real sites...
Comment #26
alexpottHmmm #25 doesn't work on D8 because it's getConnection() doesn't exist... ho hum.
Comment #27
alexpottUpdated the remaining queries in the issue summary to D9.
Comment #28
alexpottComment #29
alexpott.
Comment #30
alexpottComment #31
alexpottRe-upping the D9 patch so it's the latest one on the issue so people know what to review.
Comment #33
alexpottAdding CR for the new join plugin to the issue summary.
Comment #34
alexpottComment #35
daffie CreditAttribution: daffie commentedAll the code changes look good to me.
There is testing added test the code changes.
The IS and the CR are in order.
For me it is RTBC.
Comment #36
alexpottThanks for the reviews @daffie!
We're missing an empty post update to ensure the new views join plugin can be found and we should also address the @todo in the new join plugin. I sat in xdebug to see what is going on with the adjusted flag and reading
\Drupal\views\Plugin\views\query\Sql::adjustJoin()
makes it clear that setting this to TRUE is a good idea since in this case the left table is a query and the alias is decided by the call to addRelationship(). Maybe this will need adjusting it we start allowing such joins to be configured via hook_views_data() but that feels like a very odd use-case. This type of join needs to be added programatically via some other sort of plugin. Like the existing \Drupal\views\Plugin\views\join\Subquery plugin.Comment #37
daniel.bosenI tested the new approach with different terms on our projects. The results were very situation-dependent. When I choose a term, that is the root of a bigger tree, the subselect of the WHERE ... IN() returns a lot of rows, and we actually see a performance regression.
The results for HEAD query:
With new code:
The queries are the same as in the issue description, just different tids
Comment #38
alexpottThanks for the review @daniel.bosen. It seems that the MySQL is very sensitive to this part...
I think we can remove this part of patch and discuss when and how to add that. Without the left join using a query I see a speed up for both 353 and 718 with the dataset we're looking at.
Comment #39
alexpottI've some extensive performance testing of this patch on every term with results on the production site I have access to. The difference in the performance of
JOIN (subselect) ON
and theWHERE ... IN ( subselect )
is determined by the number of terms returned by the sub-select.JOIN (subselect) ON
vsWHERE ... IN ( subselect )
EDIT: the x-axis is supposed to be labelled "Number of node IDs returned by the subquery" as that's what the subquery lists.
This shows that the variation in the performance of
JOIN (subselect) ON
is a linear growth compared to the number of terms. This is very different for theWHERE ... IN ( subselect )
which shows an exponential decrease in time taken as the number of terms returned by the subquery increases.Therefore we have a tradeoff to make. I think the fact that linear growth in slow and even when there a over 10000 nids return by the subquery the query still only takes 50ms means that we should continue with the UNIONS subquery +
JOIN (subselect) ON
approach (i.e. the current patch in #36).HEAD vs
JOIN (subselect) ON
Just to show how HEAD compares to the new code...
EDIT: the x-axis is supposed to be labelled "Number of node IDs returned by the subquery" as that's what the subquery lists.
The performance of HEAD when the subsquery only returns a few rows is dire.
Notes
$data['taxonomy_index']['status']
- this results in an additional join to taxonomy_index to filter by taxonomy_index.status - but this is the node's status and doesn't need this join because we have a filter on node_field_data.status already. This is a problem that can be fixed on the production site I'm performance testing. Removing this unnecessary filter narrows the gap betweenJOIN (subselect) ON
and theWHERE ... IN ( subselect )
when dealing with subquery that return large numbers of node IDs.Comment #40
alexpottI've also looked at a much smaller production site with only 1874 nodes and 14925 rows in taxonomy term index and 2227 terms. The performance problems are not as significant but there are still benefits of moving to this approach.
WHERE ... IN ( subselect )
the slowest was 56.7ms and the fastest was 1.9msJOIN (subselect) ON
the slowest was 5.2ms and the fastest was 1.6msThe graph shows a very similar distribution of times...
Comment #41
daniel.bosen@alexpott Thanks for all the nice graphs and performance investigations! I can confirm, that the example I had, was an outlier and that most of the other terms are quicker with the new code.
So back to RTBC.
Comment #42
alexpottI was just reviewing #3221933: PHP Notice when using "left_formula" in views join and through that I learnt about \Drupal\taxonomy\Plugin\views\relationship\NodeTermData::query() and in there we do a subquery join! Using the standard views join! So there's no need to add the new plugin. Which means no update function and this could potentially go in a patch release.
Comment #43
daffie CreditAttribution: daffie commentedThe previous added new plugin has been removed.
Also from the IS and the CR.
All code changes look good to me.
The added testing is not removed.
Back to RTBC.
Comment #44
alexpottI've been investigate why we see the patch in #42 slow down whereas HEAD and the
WHERE ... IN ( subselect )
gets quicker. It turns out that this is due to how MySQL's query planner is working. It chooses a less optimal query plan forJOIN (subselect) ON
when the number of nodes returned by the subquery increases. However it possible to use MySQL query hints to improve this. Against my test site:Takes 442 ms.
If you start the query with...
it takes 28 ms!
Unfortunately core does not support adding query optimisation hints at all so I'm going to create a follow-up to discuss adding support so we can leverage them here to massively improve performance even when there are loads and loads of nodes that have the terms. I don't think any of this revokes the rtbc as the new query is much much more performant and consistently performant.
Comment #45
daniel.bosenI am not so sure about the query hints, they tend to be very use-case specific. But yes, let's discuss this separately. This patch is a huge performance improvement RTBC +1
Comment #46
LendudeThis looks really great, and LOVE the graphs.
But for all the things that the new method does here, it feels a little light on documentation, a little bit like the code in HEAD ;)
Just some things I think would benefit from some extra explanation:
We have some great examples of the structure of the query that is being build here, maybe add that to the doc block?
Seems like a pretty minimalist name for all the work it is doing, feels like it deserves something better ;)
'the union' is the first reference to the fact that we will be using a UNION here, so it feels like this could use more introduction, but maybe adding the structure to the main docblock would suffice?
We probably could use some explanation as to what is going on with the juggling with depth and negative depth.
Comment #47
alexpottThanks for the review @Lendude the patch attached addresses #46 - good points!
Comment #48
Lendude@alexpott, thanks!
#47 totally satisfies my craving for docs, leaving RTBC
Comment #49
larowlanThis is some really great work folks. I've got one question about whether we can adapt this to make it more useful for contrib, and another that' s just a code-style nit.
As a maintainer of taxonomy_entity_index, it would be good for us to replicate this logic over there to improve performance of querying other entity-types.
To that effect, is there any merit in making this not specific to nodes?
I guess it would be tricky because the base table would need to change to taxonomy_entity_index and then we'd need to also have a condition for entity-type. So all of these parts would need to be customisable, plus additional conditions in the join.
Nit: Could any reason not to write this as
Comment #50
alexpottThanks for the review @larowlan.
1. I had a go at this... I changed the signature to
protected function addSubQueryJoin($tids, string $index_table = 'taxonomy_index', string $index_join_entity_field = 'nid', string $index_taxonomy_field = 'tid'): void {
. But then I realised that the taxonomy_entity_index module also adds revisions into the mix. I certainly think there might be scope for that module to leverage similar logic but making this re-usable for that will add much complexity and we'd basically have to add taxonomy_entity_index to core to test it.2. Fixed - it's kinda from another test in core -
\Drupal\Tests\taxonomy\Functional\Views\TaxonomyTermFilterDepthTest
but we don't change that here so leaving that one alone.Inspired by 1 I've removed a few of the hardcoded things that can be replaced by information on the class and I've renamed the trait to be about the taxonomy_index table because that's the table it is joining in.
Comment #51
dwwThis all looks really great. Thanks to everyone who's worked on it, especially @alexpott!
Initial review. So far, only trivial doc nits. I've only skimmed the test coverage so far. I'll try to give that a closer look in the next few days...
Given the next line, s/inner joins/INNER JOIN/ and s/unions/UNION/?
"rather instead" seem redundant. I'd remove "rather" and keep "instead of".
JOIN?
'node_field_data' (with single quotes)?
JOIN?
Do we need to explain how the string option works here?
Comment #52
dwwComment #53
alexpottFixed #51 - not in the way asked for - I rewrote the docs to include more detail.
Comment #54
dww@alexpott re: #53: Of course you did. ;) That's even better. Thanks!
I finished my pedantic review of the tests, too. Had a brief head-scratch over point 1, then realized it's string view ID vs. ViewExecutable. I think I cleaned this up well in the attached patch:
Huh? Oh, types. ;)
🤔 I don't see this limitation mentioned anywhere else in the patch. Not clear what it means here. Maybe this needs more explanation?
Would love to hear about #2 before this is truly RTBC.
Instead of commenting on everything else and having Alex re-roll it, rest of my review in the form of a new patch an interdiff. Basically:
A. Minor copy editing to comments.
B. Move a few things around to put related things closer together (e.g. at first I didn't notice we initialized
$this->terms
).C. Add
:void
to all the new methods in the new test classes.D. Remove
$column_map
and$assert_method
local variables (which are only used once) and pass the right values directly toassertIdenticalResultsetHelper()
(just likeassertTermWithDepthResult()
was already doing).Comment #55
daffie CreditAttribution: daffie commentedAll the changes look good to me.
Back to RTBC.
Comment #56
alexpottRe #54.2 This is a restriction based on the HEAD.
Here's the documentation in HEAD already about this.
Comment #57
dwwRe: #55: Thanks!
Re: #56: Ahh, right. Makes sense. Thanks again!
$RTBC++;
As the maintainer of views_taxonomy_term_name_into_id I'm extremely in favor of this improvement to these views plugins!
Thanks,
-Derek
Comment #59
catchCommitted 923b2e6 and pushed to 9.3.x. Thanks!
This looks great, couldn't find anything to complain about and thanks for all the detailed performance investigation.
Seems like a small chance this could break hook_views_query_alter() implementations, so not backported to 9.2.x for now.
Comment #60
alexpottFiled a fix for the unfortunate views random sorting fail this issue causes.... #3229665: \Drupal\Tests\taxonomy\Functional\Views\TaxonomyTermFilterDepthTest can randomly fail due to views sorts