'indexes' => array(
'node_changed' => array('changed'),
'node_created' => array('created'),
'node_default_langcode' => array('default_langcode'),
'node_langcode' => array('langcode'),
'node_frontpage' => array('promote', 'status', 'sticky', 'created'),
'node_status_type' => array('status', 'type', 'nid'),
'node_title_type' => array('title', array('type', 4)),
'node_type' => array(array('type', 4)),
'nid' => array('nid'),
'vid' => array('vid'),
'uid' => array('uid'),
),
That's too much. Indexes have a cost, inserts and updates get slower. Some of them are useless, like the separate index on nid, indexes also always work on parts of them. Others are unnecessary or inefficent or can be combined.
Comments
Comment #1
soul88To understand what indexes do we need it seems a good idea to collect the most common use-cases of queries on nodes. After that we might think what can be changed.
'node_changed' => array('changed'),
Is used for sorting nodes that were changed recently.
'node_created' => array('created'),
Same as 'node_changed'.
'node_default_langcode' => array('default_langcode'),
It seems that this one will hardly ever be used as it's boolean. And fullscan of the table is faster than using index.
'node_langcode' => array('langcode'),
Is needed for translation.
Idea - can we move it to the language module, so it appears only on multilingual sites?
'node_frontpage' => array('promote', 'status', 'sticky', 'created'),
Should be used for sorting purposes to display nodes on front page. But I can hardly imagine we do really need all these fields in the index.
'node_status_type' => array('status', 'type', 'nid'),
//What is it used for?
'node_title_type' => array('title', array('type', 4)),
Is it used for sorting by title?
'node_type' => array(array('type', 4)),
Is used for filtering by node type.
'nid' => array('nid'),
Won't ever be used, as it duplicated the primary.
'vid' => array('vid'),
Is used to get the revision of the node.
'uid' => array('uid'),
Is used to get the user's content for example.
And one more important note: all these indexes we have only on the node_field_data table. Other field_data_field_* look like this:
So am I missing something?
Comment #2
plachThis is going to conflict with #1998366: [meta] SQLite is broken. Can we postpone it?
Comment #3
soul88And I'd also like to leave one idea here, that might look crazy and to listen to your reply.
The idea is the following: let's add autoinc ID field for all the field_* tables and change
PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`langcode`), to UNIQUE key.
Such approach won't involve any major changes in code but will dramatically reduce the size of indexes. Cause for now in case of InnoDB tables we automatically do add ALL the fields from primary key to every index we create. And as we have a few data-fields in each field table (5 fields are primary key and 1-2 fields - do store the data) it might appear that our indexes are several times bigger than the actual data we have.
//in case of my example all the field from PK are stored 7 more times in different indexes.
Comment #4
andypost@Soul88 Introduction of PK will add another index, so makes no sense
About issue - to properly optimize indexes we need to investigate their usage.
So most of core queries should be examined (direct queries - not a lot) and default views' queries
Comment #5
soul88@andypost my idea is based on the statement that default engine for Drupal is MySQL - InnoDb and that it stores all the fields of a PK in every index.
So the math is the following:
autoinc ID: 4-8B per row per index.
current implementation:
`entity_type`: 6-128B
`entity_id`: 4-8B
`deleted`: 1B
`delta`: 4-8B
`langcode`: 4-32B
total to 19-177B per index per row.
It's a size of a single value of PK.
For now we have 7 indexes. So PK in other indexes will weigh 28-56B in case of autoinc INT field and 133-1239B per row in case of current implementation.
As a result in the worst case we'll have 1KB of useless info in the index for each row of each field_* table. Which is pretty a lot, I think. And additionally to that we've got revisions...
That's why I'm saying, that if we add some CHAR(1) field in Drupal - we might have 1KB of useless info for each 1B we actually use. Which is a 1:1000 ratio.
Comment #6
panchoRe #2: Yes, let's postpone this on #1998366: [meta] SQLite is broken. At this point this optimization really makes no sense, plus it would mean another reroll for that other critical issue. Let's see what can be further improved after the dust has settled.
Comment #7
pancho#3 seems to make sense, though.
If we refocus this issue on fields schema, then we might unpostpone it again.
However, we should also keep an eye on node schema, so fields tables might be a spin-off of this issue?
Either way, tagging this as performance-relevant.
Comment #8
mustanggb commentedBlocking issue is closed, re-opening this.
Comment #9
catchCan happen any time no?
Comment #10
soul88Here are some additional numbers:
I've got a table with 755454 records in it - a field from a D7 project (field_data_field_target_nodes).
the structure:
stats:
data size: 65.7 MB
index size: 174.2 MB
What I did:
- created a new table with the same schema
- copied the data into it
- made unique index out of primary
- added primary autoinc index
Stats:
data size: 68.6 MB
index size: 122.4 MB
So I think the outcome is worth trying, isn't it?
//I'll try to prepare a patch a bit later.
I'm also adding the "revisit before beta" tag for now. But feel free to remove it if you think that this is not gonna happen in D8.
Comment #11
andypostSuppose this tag make more sense, and this could happen anytime but better before beta
Comment #12
xjmThis issue was marked as a beta target for the 8.0.x beta, but is not applicable as an 8.1.x beta target, so untagging.
Comment #20
mxr576And the revision table(s) has different constraints and indexes, is it intentional?
Comment #25
larowlanThis came up as a triage target for Bug Smash Initiative
I think it makes sense to reclassify this as a major task. Feel free to reply with reasons for why it should remain a bug
It's probably worth doing some git archaeology to ascertain when and why each of these were introduced.
Comment #26
catchComment #30
acbramley commentedWe need a title and IS update here about exactly what we're proposing to change.
Comment #31
smustgrave commentedWanted to bump 1 more time.
Comment #32
catchThis still needs to be done I think but it's not easy.
We'd need to check all the queries that get run on these tables, then check which of the existing indexes they use if any. There might be another issue around this.
Comment #33
smustgrave commentedWould it be easier to remove 1-2 at a time?
Comment #34
catchWe can probably only remove one or two, the problem is knowing which ones are safe to remove.
Comment #35
smustgrave commentedThere any BC policy over indexes to worry about?
Comment #36
catchNo, it's database schema, and less disruptive than another schema change would be.