Reduce the number of indexes on the node_field_* tables [#2015277]

Comment	File	Size	Author
#20	image (6).png	155.24 KB	mxr576

Comment #1

Ukrainian

Ukraine

commented 10 June 2013 at 08:33

To understand what indexes do we need it seems a good idea to collect the most common use-cases of queries on nodes. After that we might think what can be changed.

'node_changed' => array('changed'),
Is used for sorting nodes that were changed recently.

'node_created' => array('created'),
Same as 'node_changed'.

'node_default_langcode' => array('default_langcode'),

'default_langcode' => array(
'description' => 'Boolean indicating whether the property values are in the {language}.langcode of this node.',

It seems that this one will hardly ever be used as it's boolean. And fullscan of the table is faster than using index.

'node_langcode' => array('langcode'),
Is needed for translation.
Idea - can we move it to the language module, so it appears only on multilingual sites?

'node_frontpage' => array('promote', 'status', 'sticky', 'created'),
Should be used for sorting purposes to display nodes on front page. But I can hardly imagine we do really need all these fields in the index.

'node_status_type' => array('status', 'type', 'nid'),
//What is it used for?

'node_title_type' => array('title', array('type', 4)),
Is it used for sorting by title?

'node_type' => array(array('type', 4)),
Is used for filtering by node type.

'nid' => array('nid'),
Won't ever be used, as it duplicated the primary.

'vid' => array('vid'),
Is used to get the revision of the node.

'uid' => array('uid'),
Is used to get the user's content for example.

And one more important note: all these indexes we have only on the node_field_data table. Other field_data_field_* look like this:

CREATE TABLE `field_data_field_testfield` (
  `entity_type` varchar(128) NOT NULL DEFAULT '' COMMENT 'The entity type this data is attached to',
  `bundle` varchar(128) NOT NULL DEFAULT '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
  `deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'A boolean indicating whether this data item has been deleted',
  `entity_id` int(10) unsigned NOT NULL COMMENT 'The entity id this data is attached to',
  `revision_id` int(10) unsigned DEFAULT NULL COMMENT 'The entity revision id this data is attached to, or NULL if the entity type is not versioned',
  `langcode` varchar(32) NOT NULL DEFAULT '' COMMENT 'The language code for this data item.',
  `delta` int(10) unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
  `field_testfield_value` varchar(255) DEFAULT NULL,
  `field_testfield_format` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`langcode`),
  KEY `entity_type` (`entity_type`),
  KEY `bundle` (`bundle`),
  KEY `deleted` (`deleted`),
  KEY `entity_id` (`entity_id`),
  KEY `revision_id` (`revision_id`),
  KEY `langcode` (`langcode`),
  KEY `field_testfield_format` (`field_testfield_format`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Data storage for field badcfcb6-7f45-41a1-a4b1-0203e8f00b78 '

So am I missing something?

Log in or register to post comments

Comment #2

plach

he/him

Italian

Venezia

commented 10 June 2013 at 08:47

This is going to conflict with #1998366: [meta] SQLite is broken. Can we postpone it?

Log in or register to post comments

Comment #3

soul88

Ukrainian

Ukraine

commented 10 June 2013 at 08:47

And I'd also like to leave one idea here, that might look crazy and to listen to your reply.

The idea is the following: let's add autoinc ID field for all the field_* tables and change
PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`langcode`), to UNIQUE key.

Such approach won't involve any major changes in code but will dramatically reduce the size of indexes. Cause for now in case of InnoDB tables we automatically do add ALL the fields from primary key to every index we create. And as we have a few data-fields in each field table (5 fields are primary key and 1-2 fields - do store the data) it might appear that our indexes are several times bigger than the actual data we have.

//in case of my example all the field from PK are stored 7 more times in different indexes.

Log in or register to post comments

Comment #4

andypost

he/him

Russian

commented 10 June 2013 at 09:02

@Soul88 Introduction of PK will add another index, so makes no sense

About issue - to properly optimize indexes we need to investigate their usage.
So most of core queries should be examined (direct queries - not a lot) and default views' queries

Log in or register to post comments

Comment #5

soul88

Ukrainian

Ukraine

commented 10 June 2013 at 09:49

@andypost my idea is based on the statement that default engine for Drupal is MySQL - InnoDb and that it stores all the fields of a PK in every index.

So the math is the following:
autoinc ID: 4-8B per row per index.

current implementation:
`entity_type`: 6-128B
`entity_id`: 4-8B
`deleted`: 1B
`delta`: 4-8B
`langcode`: 4-32B
total to 19-177B per index per row.

It's a size of a single value of PK.

For now we have 7 indexes. So PK in other indexes will weigh 28-56B in case of autoinc INT field and 133-1239B per row in case of current implementation.

As a result in the worst case we'll have 1KB of useless info in the index for each row of each field_* table. Which is pretty a lot, I think. And additionally to that we've got revisions...

That's why I'm saying, that if we add some CHAR(1) field in Drupal - we might have 1KB of useless info for each 1B we actually use. Which is a 1:1000 ratio.

Log in or register to post comments

Comment #6

pancho

UTC+2 🇪🇺 EU

commented 9 July 2013 at 11:29

Status:	Active	» Postponed
Issue tags:		+revisit before beta

Re #2: Yes, let's postpone this on #1998366: [meta] SQLite is broken. At this point this optimization really makes no sense, plus it would mean another reroll for that other critical issue. Let's see what can be further improved after the dust has settled.

Log in or register to post comments

Comment #7

pancho

UTC+2 🇪🇺 EU

commented 9 July 2013 at 11:41

Issue tags:

+Performance

#3 seems to make sense, though.
If we refocus this issue on fields schema, then we might unpostpone it again.
However, we should also keep an eye on node schema, so fields tables might be a spin-off of this issue?

Either way, tagging this as performance-relevant.

Log in or register to post comments

Comment #8

mustanggb commented 31 December 2013 at 00:25

Issue summary:	View changes
Status:	Postponed	» Active

Blocking issue is closed, re-opening this.

Log in or register to post comments

Comment #9

catch

he/him

English

commented 25 February 2014 at 15:03

Issue tags:

-revisit before beta

Can happen any time no?

Log in or register to post comments

Comment #10

soul88

Ukrainian

Ukraine

commented 28 February 2014 at 19:50

Issue tags:

+revisit before beta

Here are some additional numbers:

I've got a table with 755454 records in it - a field from a D7 project (field_data_field_target_nodes).

the structure:

CREATE TABLE `field_data_field_target_nodes` (
  `entity_type` varchar(128) NOT NULL DEFAULT '' COMMENT 'The entity type this data is attached to',
  `bundle` varchar(128) NOT NULL DEFAULT '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
  `deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'A boolean indicating whether this data item has been deleted',
  `entity_id` int(10) unsigned NOT NULL COMMENT 'The entity id this data is attached to',
  `revision_id` int(10) unsigned DEFAULT NULL COMMENT 'The entity revision id this data is attached to, or NULL if the entity type is not versioned',
  `language` varchar(32) NOT NULL DEFAULT '' COMMENT 'The language for this data item.',
  `delta` int(10) unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
  `field_target_nodes_target_id` int(10) unsigned NOT NULL COMMENT 'The id of the target entity.',
  PRIMARY KEY (`entity_type`,`entity_id`,`deleted`,`delta`,`language`),
  KEY `entity_type` (`entity_type`),
  KEY `bundle` (`bundle`),
  KEY `deleted` (`deleted`),
  KEY `entity_id` (`entity_id`),
  KEY `revision_id` (`revision_id`),
  KEY `language` (`language`),
  KEY `field_target_nodes_target_id` (`field_target_nodes_target_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Data storage for field 6 (field_target_nodes)'

stats:
data size: 65.7 MB
index size: 174.2 MB

What I did:
- created a new table with the same schema
- copied the data into it
- made unique index out of primary
- added primary autoinc index

CREATE TABLE `atmp_benchmark` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `entity_type` varchar(128) NOT NULL DEFAULT '' COMMENT 'The entity type this data is attached to',
  `bundle` varchar(128) NOT NULL DEFAULT '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
  `deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'A boolean indicating whether this data item has been deleted',
  `entity_id` int(10) unsigned NOT NULL COMMENT 'The entity id this data is attached to',
  `revision_id` int(10) unsigned DEFAULT NULL COMMENT 'The entity revision id this data is attached to, or NULL if the entity type is not versioned',
  `language` varchar(32) NOT NULL DEFAULT '' COMMENT 'The language for this data item.',
  `delta` int(10) unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
  `field_target_nodes_target_id` int(10) unsigned NOT NULL COMMENT 'The id of the target entity.',
  PRIMARY KEY (`id`),
  UNIQUE KEY `EX_PRIMARY` (`entity_type`,`entity_id`,`deleted`,`delta`,`language`),
  KEY `entity_type` (`entity_type`),
  KEY `bundle` (`bundle`),
  KEY `deleted` (`deleted`),
  KEY `entity_id` (`entity_id`),
  KEY `revision_id` (`revision_id`),
  KEY `language` (`language`),
  KEY `field_target_nodes_target_id` (`field_target_nodes_target_id`)
) ENGINE=InnoDB AUTO_INCREMENT=756820 DEFAULT CHARSET=utf8 COMMENT='Data storage for field 6 (field_target_nodes)'

Stats:
data size: 68.6 MB
index size: 122.4 MB

So I think the outcome is worth trying, isn't it?

//I'll try to prepare a patch a bit later.

I'm also adding the "revisit before beta" tag for now. But feel free to remove it if you think that this is not gonna happen in D8.

Log in or register to post comments

Comment #11

andypost

he/him

Russian

commented 28 February 2014 at 23:27

Issue tags:

-revisit before beta

+beta target

Suppose this tag make more sense, and this could happen anytime but better before beta

Log in or register to post comments

Comment #12

xjm

she/her

English

commented 13 March 2016 at 20:20

Version:	8.0.x-dev	» 8.2.x-dev
Issue tags:	-beta target

This issue was marked as a beta target for the 8.0.x beta, but is not applicable as an 8.1.x beta target, so untagging.

Log in or register to post comments

Comment #13

13 March 2016 at 20:20

Version:

8.2.x-dev

» 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #14

13 March 2016 at 20:20

Version:

8.3.x-dev

» 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #15

13 March 2016 at 20:20

Version:

8.4.x-dev

» 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #16

13 March 2016 at 20:20

Version:

8.5.x-dev

» 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #17

13 March 2016 at 20:20

Version:

8.6.x-dev

» 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #18

13 March 2016 at 20:20

Version:

8.7.x-dev

» 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #19

13 March 2016 at 20:20

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Log in or register to post comments

Comment #20

mxr576

Hungarian

Hungary

commented 24 April 2020 at 08:30

Status	File	Size
new	image (6).png	155.24 KB

And the revision table(s) has different constraints and indexes, is it intentional?

Log in or register to post comments

Comment #21

24 April 2020 at 08:30

Version:

8.9.x-dev

» 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Log in or register to post comments

Comment #22

24 April 2020 at 08:30

Version:

9.1.x-dev

» 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Log in or register to post comments

Comment #23

24 April 2020 at 08:30

Version:

9.2.x-dev

» 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Log in or register to post comments

Comment #24

24 April 2020 at 08:30

Version:

9.3.x-dev

» 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Log in or register to post comments

Comment #25

larowlan

🇦🇺🏝.au GMT+10

commented 11 April 2022 at 00:08

Title:	Too many indexes on the node_field_* tables	» Reduce the number of indexes on the node_field_* tables
Category:	Bug report	» Task
Priority:	Normal	» Major
Issue tags:		+Bug Smash Initiative

This came up as a triage target for Bug Smash Initiative

I think it makes sense to reclassify this as a major task. Feel free to reply with reasons for why it should remain a bug

It's probably worth doing some git archaeology to ascertain when and why each of these were introduced.

Log in or register to post comments

Comment #26

catch

he/him

English

commented 11 April 2022 at 08:17

Issue tags:

+Needs issue summary update

Log in or register to post comments

Comment #27

11 April 2022 at 08:17

Version:

9.4.x-dev

» 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Log in or register to post comments

Comment #28

11 April 2022 at 08:17

Version:

9.5.x-dev

» 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Log in or register to post comments

Comment #29

11 April 2022 at 08:17

Version:

10.1.x-dev

» 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Log in or register to post comments

Comment #30

acbramley commented 20 March 2025 at 03:43

Status:	Active	» Postponed (maintainer needs more info)
Issue tags:		+Needs title update

We need a title and IS update here about exactly what we're proposing to change.

Log in or register to post comments

Comment #31

smustgrave commented 2 September 2025 at 14:48

Wanted to bump 1 more time.

Log in or register to post comments

Comment #32

catch

he/him

English

commented 2 September 2025 at 15:51

This still needs to be done I think but it's not easy.

We'd need to check all the queries that get run on these tables, then check which of the existing indexes they use if any. There might be another issue around this.

Log in or register to post comments

Comment #33

smustgrave commented 2 September 2025 at 16:03

Would it be easier to remove 1-2 at a time?

Log in or register to post comments

Comment #34

catch

he/him

English

commented 2 September 2025 at 16:08

We can probably only remove one or two, the problem is knowing which ones are safe to remove.

Log in or register to post comments

Comment #35

smustgrave commented 2 September 2025 at 16:18

There any BC policy over indexes to worry about?

Log in or register to post comments

Comment #36

catch

he/him

English

commented 2 September 2025 at 16:36

No, it's database schema, and less disruptive than another schema change would be.

Log in or register to post comments

Comment #37

2 September 2025 at 16:36

Version:

11.x-dev

» main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Reduce the number of indexes on the node_field_* tables

Comments