There is a data model issue. Taxonomy terms are content, Vocabularies are config. Changing the taxonomy term content can change the config This is wrong. If the vocabulary hierarchy is indeed content, it should be stored somewhere else. The Vocabulary hierarchy field and the taxonomy hierarchy table are two different things, don't confuse them. This is about Vocabulary.hierarchy, which is just used by the UI.

As a User Interface configuration field, there is a also UI issue. If you configure a vocabulary to be hierarchical and the user removes all child terms, making the hierarchy flat, this changes the configuration from hierarchical to flat. Why does user content override the site builder's configuration? Further, if the user makes such a change, converting the vocabulary from hierarchical to flat, they can no longer add hierarchical terms on the taxonomy edit page, they can only do so on the overviews page.

I believe that Vocabulary.hierarchy is only used by the UI:

* If it's set to 0 (disabled), it's used on the overview page to not allow re-ordering. In our case, if it's set to 0, then the UI prevents re-ordering, and our change should never be called.
* If it's set to multiple, the term edit page displays related terms differently.

Our options are:

1. Treat the Vocabulary hierarchy as configuration only. Site builders can set this. Data changes do not change the hierarchy setting.
2. Add a vocabulary content entity that stores the configuration.
3. Remove the hierarchy property from vocabularies and compute it at runtime when needed.

I think (1) is the right option.

The patch from #4 implements option 3.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

douggreen created an issue. See original summary.

Fabianx’s picture

I think what I would do is:

- Split it in 3 parts:

* taxonomy_term, content entity
* taxonomy_hierarchy, content entity
* vocabulary, config entity

One vocabulary "contains" (like references, is combined with, ...):

- 0..1 taxonomy_hierarchy entities
- 0..n taxonomy_term entities

That also sweetly solves the use case of CPS (and related modules), allows generic hierarchy's if there is a generic hierarchy entity, and data model wise it can be in a table, but that is determined by the hierarchy entity itself.

douggreen’s picture

Issue summary: View changes
amateescu’s picture

Priority: Normal » Major
Status: Active » Needs review
Issue tags: +Workflow Initiative
FileSize
14.6 KB

The fact that a change on a content entity can also affect a config entity is a huge problem for the Workspace module, because we want to disallow any config changes at the API level when you are in a non-default workspace.

So I think we should simply remove this hierarchy property from vocabularies altogether and replace it with a very simple run-time query:

SELECT parent_target_id as parent_id, MAX(delta) AS max_delta FROM `taxonomy_term__parent` WHERE bundle = '<vid>' GROUP BY parent_target_id

Vocabulary::getHierarchy() changes into a simple static cache for the result of that query.

Without this fix, all the work we did in #2880149: Convert taxonomy terms to be revisionable is meaningless since saving a revisionable term won't be possible in a non-default workspace, so promoting to major.

Status: Needs review » Needs work

The last submitted patch, 4: 2957381.patch, failed testing. View results

amateescu’s picture

Status: Needs work » Needs review
FileSize
20.87 KB
6.27 KB

Missed a few spots.

jibran’s picture

Status: Needs review » Needs work
Issue tags: +Needs issue summary update, +Needs upgrade path tests, +Needs change record

Wow, #4 is quite simple and smart soultion. We should update the IS. We also need a change record. The patch looks great this is the only thing was able to find questionable:

+++ b/core/modules/taxonomy/taxonomy.post_update.php
@@ -0,0 +1,18 @@
+function taxonomy_post_update_remove_hierarchy_from_vocabularies(&$sandbox = NULL) {
...
diff --git a/core/modules/taxonomy/tests/src/Functional/Rest/VocabularyResourceTestBase.php b/core/modules/taxonomy/tests/src/Functional/Rest/VocabularyResourceTestBase.php

Do we have upgrade path tests?

amateescu’s picture

Issue summary: View changes
Status: Needs work » Needs review
Issue tags: -Needs issue summary update, -Needs upgrade path tests, -Needs change record
FileSize
22.5 KB
2.08 KB

Wrote upgrade path tests, a change record (https://www.drupal.org/node/2979986) and updated the IS.

jibran’s picture

Status: Needs review » Reviewed & tested by the community

Thanks, patch looks ready to me. Do we need an approval from subsystem maintainer?

amateescu’s picture

I don't think so, but if @catch is going to look at committing it, he's also a Taxonomy maintainer :)

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
  1. +++ b/core/modules/taxonomy/src/Entity/Vocabulary.php
    @@ -87,11 +86,11 @@ class Vocabulary extends ConfigEntityBundleBase implements VocabularyInterface {
        * Possible values:
        * - VocabularyInterface::HIERARCHY_DISABLED: No parents.
        * - VocabularyInterface::HIERARCHY_SINGLE: Single parent.
    -   * - VocabularyInterface::HIERARCHY_MULTIPL: Multiple parents.
    +   * - VocabularyInterface::HIERARCHY_MULTIPLE: Multiple parents.
        *
        * @var int
        */
    -  protected $hierarchy = VocabularyInterface::HIERARCHY_DISABLED;
    +  protected $hierarchy;
    

    Can we document that this is not a property from configuration but now a static cache. Also we've introduced a subtle bug. If you:

    1. load the vocab
    2. get the hierarchy
    3. then save a term in a way that would change the hierarchy
    4. load the vocab
    5. get the hierarchy

    It won't be updated but I think it would have been before.

  2. +++ b/core/modules/taxonomy/taxonomy.post_update.php
    @@ -0,0 +1,18 @@
    +  \Drupal::classResolver(ConfigEntityUpdater::class)->update($sandbox, 'taxonomy_vocabulary', function ($vocabulary) {
    +    $vocabulary->set('hierarchy', NULL);
    +    return TRUE;
    +  });
    

    Is the anonymous function actually necessary here? Won't just saving them remove the hierarchy because it is not in the export properties?

amateescu’s picture

Status: Needs work » Needs review
FileSize
24.34 KB
1.85 KB

Thanks for reviewing!

1. Very good point! You're right that the hierarchy would have been updated before. I added a static cache reset for that property in a way that should also work if you do those steps via a REST request, which was not handled before.

2. Nope, because the default callback provided by \Drupal\Core\Config\Entity\ConfigEntityUpdater::update() only checks if the dependencies have been changed, and returns FALSE in our case, so the entities are not saved. Here's a test run without the anonymous functions:

Testing Drupal\Tests\taxonomy\Functional\Update\TaxonomyVocabularyHierarchyUpdateTest

Schema key taxonomy.vocabulary.forums:hierarchy failed with: missing schema
 /home/andrei/work/d8.dev/core/tests/Drupal/Tests/SchemaCheckTestTrait.php:43
 /home/andrei/work/d8.dev/core/tests/Drupal/FunctionalTests/Update/UpdatePathTestBase.php:372
 /home/andrei/work/d8.dev/core/modules/taxonomy/tests/src/Functional/Update/TaxonomyVocabularyHierarchyUpdateTest.php:34
amateescu’s picture

Oops, the interdiff is correct but I forgot to stage the changes in git :/

The last submitted patch, 12: 2957381-12.patch, failed testing. View results

alexpott’s picture

  1. +++ b/core/modules/taxonomy/src/Entity/Vocabulary.php
    @@ -87,11 +86,11 @@ class Vocabulary extends ConfigEntityBundleBase implements VocabularyInterface {
    -  protected $hierarchy = VocabularyInterface::HIERARCHY_DISABLED;
    +  protected $hierarchy;
    

    I think we still some comment that this property is a static cache and not a config property. Mostly because this has been a config property in the past.

  2. +++ b/core/modules/taxonomy/src/TermStorage.php
    @@ -360,6 +361,39 @@ public function getNodeTerms(array $nids, array $vocabs = [], $langcode = NULL)
    +    $query = $this->database->select($table_mapping->getFieldTableName('parent'), 'p');
    +    $query->addField('p', $target_id_column, 'parent_id');
    +    $query->addExpression("MAX($delta_column)", 'max_delta');
    +    $query->condition('bundle', $vid);
    +    $query->groupBy('parent_id');
    +
    +    $result = $query->execute()->fetchAllKeyed();
    

    It'd be good to know how expensive this query is. We already have performance problems with the term overview page and this might add to them. On really large term tables I think this might be expensive.

Status: Needs review » Needs work

The last submitted patch, 13: 2957381-13.patch, failed testing. View results

amateescu’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
24.97 KB
7.01 KB
11.38 KB
11.99 KB
11.91 KB

Re #15

1. As discussed on slack, let's remove that variable from Vocabulary altogether and move it to the place which actually computes it, the term storage :)

2. Here's some performance number for that query, with various number of terms:

2649 terms:

Query result: 1307 total, Query took 0.0031 seconds.

7789 terms:

Query result: 3841 total, Query took 0.0214 seconds.

16294 terms:

Query result: 8089 total, Query took 0.0036 seconds.

At this point, I wasn't even able to load the term overview page with a 128M memory limit, so I think the performance of the query is fine :)

alexpott’s picture

@amateescu thanks - there's another issue to fix the ability to load the page :)

  1. +++ b/core/modules/taxonomy/src/Entity/Vocabulary.php
    @@ -104,14 +91,14 @@ class Vocabulary extends ConfigEntityBundleBase implements VocabularyInterface {
       public function setHierarchy($hierarchy) {
    -    $this->hierarchy = $hierarchy;
    +    $this->entityTypeManager()->getStorage('taxonomy_term')->setVocabularyHierarchyType($this->id(), $hierarchy);
         return $this;
       }
    

    I think we should deprecate this setter can just make it reset the static cache on the TermStorage - as the reality is this is not something you can set.

  2. +++ b/core/modules/taxonomy/src/TermStorage.php
    @@ -360,10 +375,57 @@ public function getNodeTerms(array $nids, array $vocabs = [], $langcode = NULL)
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function setVocabularyHierarchyType($vid, $hierarchy) {
    +    $this->vocabularyHierarchyType[$vid] = $hierarchy;
    +    return $this;
    +  }
    

    I don;t think we should introduce this as this is not something to set.

amateescu’s picture

Re #18, that's perfectly fine with me, I don't know why I thought about introducing a new setter for it :)

Status: Needs review » Needs work

The last submitted patch, 19: 2957381-19.patch, failed testing. View results

amateescu’s picture

Status: Needs work » Needs review
FileSize
25.04 KB

Here's the proper patch :)

jibran’s picture

Status: Needs review » Reviewed & tested by the community

@alexpott that was some amazing reviews. Patch looks really nice now and change notice is also up to date so back to RTBC.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
  1. +++ b/core/modules/taxonomy/src/Entity/Vocabulary.php
    @@ -104,14 +91,15 @@ class Vocabulary extends ConfigEntityBundleBase implements VocabularyInterface {
    -    return $this->hierarchy;
    +    return $this->entityTypeManager()->getStorage('taxonomy_term')->getVocabularyHierarchyType($this->id());
    

    Let's deprecate in favour of calling the term storage - and replace core's usages.

  2. +++ b/core/modules/taxonomy/src/TermStorage.php
    @@ -360,10 +375,49 @@ public function getNodeTerms(array $nids, array $vocabs = [], $langcode = NULL)
    +    // Return early if we already computed this value or if it was set from the
    +    // outside.
    

    There's no setting from outside anymore.

  3. +++ b/core/modules/taxonomy/src/TermStorage.php
    @@ -360,10 +375,49 @@ public function getNodeTerms(array $nids, array $vocabs = [], $langcode = NULL)
    -    unset($vars['treeChildren'], $vars['treeParents'], $vars['treeTerms'], $vars['trees']);
    +    unset($vars['ancestors'], $vars['treeChildren'], $vars['treeParents'], $vars['treeTerms'], $vars['trees'], $vars['vocabularyHierarchyType']);
    

    Nice catch. Extra bug fixes here too .... hmmm.

  4. +++ b/core/modules/taxonomy/taxonomy.module
    @@ -172,33 +171,15 @@ function taxonomy_theme() {
    + * @deprecated in Drupal 8.6.x. Will be removed before Drupal 9.0.0. Use
    + *   \Drupal\taxonomy\VocabularyInterface::getHierarchy() instead.
    ...
    +  @trigger_error('taxonomy_check_vocabulary_hierarchy() is deprecated in Drupal 8.6.x and will be removed before Drupal 9.0.x. Use \Drupal\taxonomy\VocabularyInterface::getHierarchy() instead.', E_USER_DEPRECATED);
    +  return $vocabulary->getHierarchy();
    

    Should we be telling people to get it from the Vocabulary or the term storage direct? I think term storage.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

amateescu’s picture

Status: Needs work » Needs review
FileSize
29.33 KB
10.52 KB

Fixed all of #23, rerolled to 8.7.x and opened #2994830: TermStorage::__sleep() does not unset the 'ancestors' variable for #23.3.

Wim Leers’s picture

Status: Needs review » Reviewed & tested by the community

#25 looks great!

The last submitted patch, 21: 2957381-21.patch, failed testing. View results

catch’s picture

Approach looks really good here, do we need to run EXPLAIN on the query with a decent sized database though?

catch’s picture

Status: Reviewed & tested by the community » Needs review

Here's the query

+++ b/core/modules/taxonomy/src/TermStorage.php
@@ -360,10 +375,48 @@ public function getNodeTerms(array $nids, array $vocabs = [], $langcode = NULL)
+
+    $query = $this->database->select($table_mapping->getFieldTableName('parent'), 'p');
+    $query->addField('p', $target_id_column, 'parent_id');
+    $query->addExpression("MAX($delta_column)", 'max_delta');
+    $query->condition('bundle', $vid);
+    $query->groupBy('parent_id');
+

And the current indexes:

MariaDB [d8]> SHOW INDEXES FROM taxonomy_term__parent;
+-----------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                 | Non_unique | Key_name         | Seq_in_index | Column_name      | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| taxonomy_term__parent |          0 | PRIMARY          |            1 | entity_id        | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| taxonomy_term__parent |          0 | PRIMARY          |            2 | deleted          | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| taxonomy_term__parent |          0 | PRIMARY          |            3 | delta            | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| taxonomy_term__parent |          0 | PRIMARY          |            4 | langcode         | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| taxonomy_term__parent |          1 | bundle           |            1 | bundle           | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| taxonomy_term__parent |          1 | revision_id      |            1 | revision_id      | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| taxonomy_term__parent |          1 | parent_target_id |            1 | parent_target_id | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Do we need something on bundle, delta? Maybe replace the current index that's just on 'bundle'?

amateescu’s picture

@catch, see #17 for some performance numbers. It would be interesting to see if there are any improvements with a [bundle, delta] index, but the result there with 16294 terms seems to indicate that it's not really needed.. I think.

catch’s picture

hmm the 'using temporary, using filesort' and that it's looking at 8k out of 16k rows from the EXPLAIN suggests to me that it does need the improved index (assuming that fixes it).

GaëlG’s picture

FileSize
29.49 KB

Here's a backport of patch #25 for Drupal 8.6 (applies at least on 8.6.0 and 8.6.1), to be referenced in a composer.json file.

amateescu’s picture

I tried adding an index on bundle, delta but it didn't change anything, so instead I looked a bit into the reason why that query was using temporary and filesort, and apparently it's caused by the GROUP BY clause. After inspecting our requirements more carefully I realized that we don't really need to select all parent IDs, we just need to know if we have any parent_id > 0, so here's the simplified query that we can use:

SELECT MAX(parent_target_id) AS max_parent_id, MAX(delta) AS max_delta
FROM
{taxonomy_term__parent} p
WHERE bundle = <vid>

And with an index on bundle, delta, parent_target_id the EXPLAIN looks like this:

+----+-------------+-----------------------+------------+------+------------------------+------------------------+---------+-------+------+----------+-------------+
| id | select_type | table                 | partitions | type | possible_keys          | key                    | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------------------+------------+------+------------------------+------------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | taxonomy_term__parent | NULL       | ref  | bundle_delta_target_id | bundle_delta_target_id | 130     | const |    2 |   100.00 | Using index |
+----+-------------+-----------------------+------------+------+------------------------+------------------------+---------+-------+------+----------+-------------+
amateescu’s picture

Status: Needs review » Reviewed & tested by the community

The performance problems with the query have been addressed, so I think the patch is ready for another review from core maintainers :)

catch’s picture

Status: Reviewed & tested by the community » Fixed

Committed 70e4010 and pushed to 8.7.x. Thanks!

  • catch committed 70e4010 on 8.7.x
    Issue #2957381 by amateescu, GaëlG, alexpott, douggreen, jibran, catch:...
Wim Leers’s picture

I just want to say thank you very much, @amateescu for simplifying one bit of Drupal, which will result in one bit of complexity less in Drupal 9! 🙏

Manuel Garcia’s picture

Awesome work here guys, so glad to see this in!

Should we publish the CR or does it need any final touches?

amateescu’s picture

Thank you both for the kind words :) I published the CR now.

Wim Leers’s picture

Issue tags: +API-First Initiative
+++ b/core/modules/taxonomy/tests/src/Functional/Rest/VocabularyResourceTestBase.php
@@ -54,7 +54,6 @@ protected function getExpectedNormalizedEntity() {
-      'hierarchy' => 0,

This proves the REST API was impacted. In principle, this was a BC break.

This also broke JSON:API: https://www.drupal.org/pift-ci-job/1122706

Wim Leers’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

amateescu’s picture