Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
From the discussion in #2447555: Unnecessary index on langcode and deleted column in dedicated field tables. (comments #3 and #4 over there), we miss two composite indexes on dedicated field tables :
- entity_id, deleted, langcode : for Views joins
- revision_id, deleted, delta, langcode: for SqlContentEntityStorage::loadFromDedicatedTables() in case of "load an older / not current revision"
Comments
Comment #1
yched CreditAttribution: yched commentedAs a side note, we should really add comments in the code saying why each index is present (like "used by the query in Xxx::foo()"), this would save us having to dig in the code for who does which query each time we wonder if our indexes are right :-)
Comment #2
yched CreditAttribution: yched commentedIf our views joins are not indexed as they should, that probably qualifies at least as major :-)
Comment #3
yched CreditAttribution: yched commentedRegarding "add index on entity_id, deleted, langcode : for Views joins" :
We did a couple tests with @Berdir and @fgm :
- A typical Views joins looks like (here, a view with a filter on body field):
This currently use the primary key ("entity_id, deleted, delta, langcode"). It has no use for the "delta" part, so it's mostly using the first two parts of the index.
Putting delta last in the primary key could be slightly better for this query, but the overall query perf is likely to be determined by the actual condition on the field value (and most notably whether the condition is on an indexed column or not - here, node__body.body_value is not indexed)
- The primary key is otherwise mostly used to index the "main entity load request" in SqlContentEntityStorage::loadFromDedicatedTables() - typically :
The primary key is in that order (delta, langcode, rather than langcode, delta) for no good reason. It seems D7's #367595: Translatable fields simply pushed 'langcode' at the end of the existing primary key without caring too much...
From our tests, switching the order in the primary key doesn't make much difference, typically because selecting a bunch of entity_ids reduced the number of rows drastically enough that additionally using the index for the low-cardinality delta and langcode columns is not that critical.
Comment #4
yched CreditAttribution: yched commentedBased on the above, our proposal would be : don't add a new index for Views, but just reorder the primary key to "entity_id, deleted, langcode, delta" :
- it makes no difference for the "entity load" case
- it is slightly better for Views joins (at least theoretically, in practice probably a wash)
- it is more consistent with our runtime data model (entity, field, langcode, delta)
Comment #5
yched CreditAttribution: yched commentedThen, about the second index mentioned in the IS ("revision_id, deleted, delta, langcode", for in case of "load an older / not current entity revision") :
The query in that case is :
- the index is only needed on the revision table
- by the same reasoning, and for consistency with the above, it could be "revision_id, deleted, langcode, delta" (instead of the proposed "... delta, langcode")
The primary key on the revision table is "*entity_id*, revision_id, deleted, delta, langcode" - and I don't think we can actually add a condition on entity_ids in the query above, so the primary key doesn't help here.
So we would need to add a new index on "revision_id, deleted, langcode, delta" in addition to the current primary key.
But : what if we removed entity_id from the primary key in the revision table ? Then we could use that primary key for the "load old entity revision" query, just like we use the primary key of the "current revision table" for the "load current entity revision" query...
Looking at what is done of shared / base fields table
node_field_data has a primary key of "nid, langcode"
node_field_revision has a primary key of "vid, langcode"
so that would be consistent ? :-)
Comment #14
pameeela CreditAttribution: pameeela commented@yched do you think this issue is still relevant, and if so, still major? If so it would be great to get an issue summary update with the current state and perhaps a proposal for how to address it? Asking a lot, I know but this was currently the last major in the whole queue as far as latest update!
Comment #15
Berdir@yched isn't active anymore, but he outlined the results of our tests and discussions very well, so I think anyone could try to update the issue summary based on that.
Implementing it, specifically an update path might be a challenge though.
Also, for the revision tables, we should also look at the performance of saving an entity, with a delete and then insert.
Comment #16
pameeela CreditAttribution: pameeela commentedComment #17
johnwebdev CreditAttribution: johnwebdev commentedGiven the outline in #4
Given the improvement just seems to be slightly better theoretically, I don't think this issues needs to be major anymore, feel free to disagree and promote it back.
Comment #18
pameeela CreditAttribution: pameeela commented