Remove field ID from field table names [#569240]

When we added the ability to clean up deleted fields in #367753: Bulk deletion, we made a very strange change to the name of the field tables. Previously our tables had names such as field_data_body, but after adding the deletion ability we started naming our tables such as field_data_body_1, where "_1" is the field ID. We have these table names so that a deleted field table name does not conflict with a new field that is given the same name while the table's data is being deleted.

This patch makes it so that we rename the table upon deletion, rather than having the field ID tacked on the to the table name at all times. To test the efficiency of this, I created a table with 1 million rows, containing 1.5 GB of data (when exported as a compressed MySQL dump). After adding this data I tested a straight-up RENAME operation:

With 1,000,000 rows of data:

mysql> RENAME TABLE field_data_field_test_8 TO field_data_field_test;
Query OK, 0 rows affected (0.08 sec)

With 1 row of data:

mysql> RENAME TABLE field_data_field_test_8 TO field_data_field_test;
Query OK, 0 rows affected (0.01 sec)

So it does seem like there might be some kind of linear increase in processing time, but at the same time, if we can rename a table in 0.08 seconds when it contains a million rows, I don't think this is a performance problem of concern.

This patch makes it so that tables are named field_data_body while in use and field_data_body_deleted_1 while being deleted. The end result is table names that make much more sense both when in use and while being deleted.

Comment	File	Size	Author
#4	field_sql_storage_deleted_tables.patch	2.02 KB	quicksketch
#4
	field_sql_storage_deleted_tables.patch	2 KB	quicksketch

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Comment #1

quicksketch

he/him

CreditAttribution: quicksketch commented 5 September 2009 at 14:03

Issue tags:

+Fields in Core, +API change

tagging

Comment #2

bjaspan CreditAttribution: bjaspan commented 5 September 2009 at 14:14

Status:

Needs review

» Needs work

I think this is fine in principle, and I considered this approach when doing the bulk delete patch. I decided to have the table names always contain the field id for these reasons:

* I thought it was conceptually simpler for the tables to always have the same name. Who needs "Well, the table name starts as foo, but then later we might rename to this other thing because..."?

* I was concerned about odd race conditions in which we've changed $field['deleted'] to 1 and/or renamed the table but not yet updated the cached schema array, etc. etc., leading to all kinds of hard to debug weirdness. Keeping the table names the same for their whole lifetime avoids all of that risk. Actually, this points out a bug in the current patch: You have to clear the schema cache when you rename the table. Marking CNW.

* No one should ever be hard-coding table names in code; you should get the table names from the API. D6 CCK had such functions, though few people used them, and we are now adding them to Field API. If everyone always gets the table names from a function, who cares whether they have the field id in them?

All that said, if we can be convinced that the race conditions/consistency problems are solved, I do not particularly care if we rename them as per this patch.

One other comment: Table names have max lengths. Adding "_deleted" takes up room. Do we need that string?

Comment #3

catch

he/him

English

CreditAttribution: catch commented 5 September 2009 at 14:18

The one situation you want to be hard coding table names in code, is when checking out stuff via the MySQL command line - and then it'd be lovely to be able to predict the name of the table instead of having to look it up first. So I'd be happy if this can go in.

Comment #4

quicksketch

he/him

CreditAttribution: quicksketch commented 5 September 2009 at 14:34

Status:

Needs work

» Needs review

File	Size
field_sql_storage_deleted_tables.patch	2.02 KB

* I thought it was conceptually simpler for the tables to always have the same name. Who needs "Well, the table name starts as foo, but then later we might rename to this other thing because..."?

As a developer I'm sensitive to the way my database is structured, this change makes a significant improvement in making our table names more "human-friendly", since to most developers the Field ID is a completely arbitrary number that we don't need to be adding.

* I was concerned about odd race conditions in which we've changed $field['deleted'] to 1 and/or renamed the table but not yet updated the cached schema array, etc. etc., leading to all kinds of hard to debug weirdness. Keeping the table names the same for their whole lifetime avoids all of that risk. Actually, this points out a bug in the current patch: You have to clear the schema cache when you rename the table. Marking CNW.

There might be very slim race conditions when deleting a field if 2 Apache instances were running and we changed the table name, but deleting a field seems like a situation where a millisecond down-time should be acceptable.

* No one should ever be hard-coding table names in code; you should get the table names from the API. D6 CCK had such functions, though few people used them, and we are now adding them to Field API. If everyone always gets the table names from a function, who cares whether they have the field id in them?

Regardless our names should make sense, with the "it doesn't matter" way of thinking, we might as well be making random hash table names. Clearly that's not something we want as part of our developer experience. Making the table names sensible is beneficial to our DX.

This version of the patch doesn't add the "_deleted" string to the table name, since most MySQL installs are limited to a 64 character table name limit. If we currently have restrictions on field name length, they should already enforce the field name is short enough to append the field ID.