'indexes' => array(
      'node_changed' => array('changed'),
      'node_created' => array('created'),
      'node_default_langcode' => array('default_langcode'),
      'node_langcode' => array('langcode'),
      'node_frontpage' => array('promote', 'status', 'sticky', 'created'),
      'node_status_type' => array('status', 'type', 'nid'),
      'node_title_type' => array('title', array('type', 4)),
      'node_type' => array(array('type', 4)),
      'nid' => array('nid'),
      'vid' => array('vid'),
      'uid' => array('uid'),
    ),

That's too much. Indexes have a cost, inserts and updates get slower. Some of them are useless, like the separate index on nid, indexes also always work on parts of them. Others are unnecessary or inefficent or can be combined.

CommentFileSizeAuthor
#20 image (6).png155.24 KBmxr576

Comments

soul88’s picture

To understand what indexes do we need it seems a good idea to collect the most common use-cases of queries on nodes. After that we might think what can be changed.

'node_changed' => array('changed'),
Is used for sorting nodes that were changed recently.

'node_created' => array('created'),
Same as 'node_changed'.

'node_default_langcode' => array('default_langcode'),

'default_langcode' => array(
'description' => 'Boolean indicating whether the property values are in the {language}.langcode of this node.',

It seems that this one will hardly ever be used as it's boolean. And fullscan of the table is faster than using index.

'node_langcode' => array('langcode'),
Is needed for translation.
Idea - can we move it to the language module, so it appears only on multilingual sites?

'node_frontpage' => array('promote', 'status', 'sticky', 'created'),
Should be used for sorting purposes to display nodes on front page. But I can hardly imagine we do really need all these fields in the index.

'node_status_type' => array('status', 'type', 'nid'),
//What is it used for?

'node_title_type' => array('title', array('type', 4)),
Is it used for sorting by title?

'node_type' => array(array('type', 4)),
Is used for filtering by node type.

'nid' => array('nid'),
Won't ever be used, as it duplicated the primary.

'vid' => array('vid'),
Is used to get the revision of the node.

'uid' => array('uid'),
Is used to get the user's content for example.

And one more important note: all these indexes we have only on the node_field_data table. Other field_data_field_* look like this:

CREATE TABLE `field_data_field_testfield` (
  `entity_type` varchar(128) NOT NULL DEFAULT '' COMMENT 'The entity type this data is attached to',
  `bundle` varchar(128) NOT NULL DEFAULT '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
  `deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'A boolean indicating whether this data item has been deleted',
  `entity_id` int(10) unsigned NOT NULL COMMENT 'The entity id this data is attached to',
  `revision_id` int(10) unsigned DEFAULT NULL COMMENT 'The entity revision id this data is attached to, or NULL if the entity type is not versioned',
  `langcode` varchar(32) NOT NULL DEFAULT '' COMMENT 'The language code for this data item.',
  `delta` int(10) unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
  `field_testfield_value` varchar(255) DEFAULT NULL,
  `field_testfield_format` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`langcode`),
  KEY `entity_type` (`entity_type`),
  KEY `bundle` (`bundle`),
  KEY `deleted` (`deleted`),
  KEY `entity_id` (`entity_id`),
  KEY `revision_id` (`revision_id`),
  KEY `langcode` (`langcode`),
  KEY `field_testfield_format` (`field_testfield_format`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Data storage for field badcfcb6-7f45-41a1-a4b1-0203e8f00b78 '

So am I missing something?

plach’s picture

This is going to conflict with #1998366: [meta] SQLite is broken. Can we postpone it?

soul88’s picture

And I'd also like to leave one idea here, that might look crazy and to listen to your reply.

The idea is the following: let's add autoinc ID field for all the field_* tables and change
PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`langcode`), to UNIQUE key.

Such approach won't involve any major changes in code but will dramatically reduce the size of indexes. Cause for now in case of InnoDB tables we automatically do add ALL the fields from primary key to every index we create. And as we have a few data-fields in each field table (5 fields are primary key and 1-2 fields - do store the data) it might appear that our indexes are several times bigger than the actual data we have.

//in case of my example all the field from PK are stored 7 more times in different indexes.

andypost’s picture

@Soul88 Introduction of PK will add another index, so makes no sense

About issue - to properly optimize indexes we need to investigate their usage.
So most of core queries should be examined (direct queries - not a lot) and default views' queries

soul88’s picture

@andypost my idea is based on the statement that default engine for Drupal is MySQL - InnoDb and that it stores all the fields of a PK in every index.

So the math is the following:
autoinc ID: 4-8B per row per index.

current implementation:
`entity_type`: 6-128B
`entity_id`: 4-8B
`deleted`: 1B
`delta`: 4-8B
`langcode`: 4-32B
total to 19-177B per index per row.

It's a size of a single value of PK.

For now we have 7 indexes. So PK in other indexes will weigh 28-56B in case of autoinc INT field and 133-1239B per row in case of current implementation.

As a result in the worst case we'll have 1KB of useless info in the index for each row of each field_* table. Which is pretty a lot, I think. And additionally to that we've got revisions...

That's why I'm saying, that if we add some CHAR(1) field in Drupal - we might have 1KB of useless info for each 1B we actually use. Which is a 1:1000 ratio.

pancho’s picture

Status: Active » Postponed
Issue tags: +revisit before beta

Re #2: Yes, let's postpone this on #1998366: [meta] SQLite is broken. At this point this optimization really makes no sense, plus it would mean another reroll for that other critical issue. Let's see what can be further improved after the dust has settled.

pancho’s picture

Issue tags: +Performance

#3 seems to make sense, though.
If we refocus this issue on fields schema, then we might unpostpone it again.
However, we should also keep an eye on node schema, so fields tables might be a spin-off of this issue?

Either way, tagging this as performance-relevant.

mustanggb’s picture

Issue summary: View changes
Status: Postponed » Active

Blocking issue is closed, re-opening this.

catch’s picture

Issue tags: -revisit before beta

Can happen any time no?

soul88’s picture

Issue tags: +revisit before beta

Here are some additional numbers:

I've got a table with 755454 records in it - a field from a D7 project (field_data_field_target_nodes).

the structure:

CREATE TABLE `field_data_field_target_nodes` (
  `entity_type` varchar(128) NOT NULL DEFAULT '' COMMENT 'The entity type this data is attached to',
  `bundle` varchar(128) NOT NULL DEFAULT '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
  `deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'A boolean indicating whether this data item has been deleted',
  `entity_id` int(10) unsigned NOT NULL COMMENT 'The entity id this data is attached to',
  `revision_id` int(10) unsigned DEFAULT NULL COMMENT 'The entity revision id this data is attached to, or NULL if the entity type is not versioned',
  `language` varchar(32) NOT NULL DEFAULT '' COMMENT 'The language for this data item.',
  `delta` int(10) unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
  `field_target_nodes_target_id` int(10) unsigned NOT NULL COMMENT 'The id of the target entity.',
  PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`language`),
  KEY `entity_type` (`entity_type`),
  KEY `bundle` (`bundle`),
  KEY `deleted` (`deleted`),
  KEY `entity_id` (`entity_id`),
  KEY `revision_id` (`revision_id`),
  KEY `language` (`language`),
  KEY `field_target_nodes_target_id` (`field_target_nodes_target_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Data storage for field 6 (field_target_nodes)'

stats:
data size: 65.7 MB
index size: 174.2 MB

What I did:
- created a new table with the same schema
- copied the data into it
- made unique index out of primary
- added primary autoinc index

CREATE TABLE `atmp_benchmark` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `entity_type` varchar(128) NOT NULL DEFAULT '' COMMENT 'The entity type this data is attached to',
  `bundle` varchar(128) NOT NULL DEFAULT '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
  `deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'A boolean indicating whether this data item has been deleted',
  `entity_id` int(10) unsigned NOT NULL COMMENT 'The entity id this data is attached to',
  `revision_id` int(10) unsigned DEFAULT NULL COMMENT 'The entity revision id this data is attached to, or NULL if the entity type is not versioned',
  `language` varchar(32) NOT NULL DEFAULT '' COMMENT 'The language for this data item.',
  `delta` int(10) unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
  `field_target_nodes_target_id` int(10) unsigned NOT NULL COMMENT 'The id of the target entity.',
  PRIMARY KEY (`id`),
  UNIQUE KEY `EX_PRIMARY` (`entity_type`,`entity_id`,`deleted`,`delta`,`language`),
  KEY `entity_type` (`entity_type`),
  KEY `bundle` (`bundle`),
  KEY `deleted` (`deleted`),
  KEY `entity_id` (`entity_id`),
  KEY `revision_id` (`revision_id`),
  KEY `language` (`language`),
  KEY `field_target_nodes_target_id` (`field_target_nodes_target_id`)
) ENGINE=InnoDB AUTO_INCREMENT=756820 DEFAULT CHARSET=utf8 COMMENT='Data storage for field 6 (field_target_nodes)'

Stats:
data size: 68.6 MB
index size: 122.4 MB

So I think the outcome is worth trying, isn't it?

//I'll try to prepare a patch a bit later.

I'm also adding the "revisit before beta" tag for now. But feel free to remove it if you think that this is not gonna happen in D8.

andypost’s picture

Suppose this tag make more sense, and this could happen anytime but better before beta

xjm’s picture

Version: 8.0.x-dev » 8.2.x-dev
Issue tags: -beta target

This issue was marked as a beta target for the 8.0.x beta, but is not applicable as an 8.1.x beta target, so untagging.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

mxr576’s picture

StatusFileSize
new155.24 KB

And the revision table(s) has different constraints and indexes, is it intentional?

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

larowlan’s picture

Title: Too many indexes on the node_field_* tables » Reduce the number of indexes on the node_field_* tables
Category: Bug report » Task
Priority: Normal » Major
Issue tags: +Bug Smash Initiative

This came up as a triage target for Bug Smash Initiative

I think it makes sense to reclassify this as a major task. Feel free to reply with reasons for why it should remain a bug

It's probably worth doing some git archaeology to ascertain when and why each of these were introduced.

catch’s picture

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

acbramley’s picture

Status: Active » Postponed (maintainer needs more info)
Issue tags: +Needs title update

We need a title and IS update here about exactly what we're proposing to change.

smustgrave’s picture

Wanted to bump 1 more time.

catch’s picture

This still needs to be done I think but it's not easy.

We'd need to check all the queries that get run on these tables, then check which of the existing indexes they use if any. There might be another issue around this.

smustgrave’s picture

Would it be easier to remove 1-2 at a time?

catch’s picture

We can probably only remove one or two, the problem is knowing which ones are safe to remove.

smustgrave’s picture

There any BC policy over indexes to worry about?

catch’s picture

No, it's database schema, and less disruptive than another schema change would be.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.