When we added the ability to clean up deleted fields in #367753: Bulk deletion, we made a very strange change to the name of the field tables. Previously our tables had names such as field_data_body, but after adding the deletion ability we started naming our tables such as field_data_body_1, where "_1" is the field ID. We have these table names so that a deleted field table name does not conflict with a new field that is given the same name while the table's data is being deleted.

This patch makes it so that we rename the table upon deletion, rather than having the field ID tacked on the to the table name at all times. To test the efficiency of this, I created a table with 1 million rows, containing 1.5 GB of data (when exported as a compressed MySQL dump). After adding this data I tested a straight-up RENAME operation:

With 1,000,000 rows of data:

mysql> RENAME TABLE field_data_field_test_8 TO field_data_field_test;
Query OK, 0 rows affected (0.08 sec)

With 1 row of data:

mysql> RENAME TABLE field_data_field_test_8 TO field_data_field_test;
Query OK, 0 rows affected (0.01 sec)

So it does seem like there might be some kind of linear increase in processing time, but at the same time, if we can rename a table in 0.08 seconds when it contains a million rows, I don't think this is a performance problem of concern.

This patch makes it so that tables are named field_data_body while in use and field_data_body_deleted_1 while being deleted. The end result is table names that make much more sense both when in use and while being deleted.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

quicksketch’s picture

Issue tags: +Fields in Core, +API change

tagging

bjaspan’s picture

Status: Needs review » Needs work

I think this is fine in principle, and I considered this approach when doing the bulk delete patch. I decided to have the table names always contain the field id for these reasons:

* I thought it was conceptually simpler for the tables to always have the same name. Who needs "Well, the table name starts as foo, but then later we might rename to this other thing because..."?

* I was concerned about odd race conditions in which we've changed $field['deleted'] to 1 and/or renamed the table but not yet updated the cached schema array, etc. etc., leading to all kinds of hard to debug weirdness. Keeping the table names the same for their whole lifetime avoids all of that risk. Actually, this points out a bug in the current patch: You have to clear the schema cache when you rename the table. Marking CNW.

* No one should ever be hard-coding table names in code; you should get the table names from the API. D6 CCK had such functions, though few people used them, and we are now adding them to Field API. If everyone always gets the table names from a function, who cares whether they have the field id in them?

All that said, if we can be convinced that the race conditions/consistency problems are solved, I do not particularly care if we rename them as per this patch.

One other comment: Table names have max lengths. Adding "_deleted" takes up room. Do we need that string?

catch’s picture

The one situation you want to be hard coding table names in code, is when checking out stuff via the MySQL command line - and then it'd be lovely to be able to predict the name of the table instead of having to look it up first. So I'd be happy if this can go in.

quicksketch’s picture

Status: Needs work » Needs review
FileSize
2.02 KB

* I thought it was conceptually simpler for the tables to always have the same name. Who needs "Well, the table name starts as foo, but then later we might rename to this other thing because..."?

As a developer I'm sensitive to the way my database is structured, this change makes a significant improvement in making our table names more "human-friendly", since to most developers the Field ID is a completely arbitrary number that we don't need to be adding.

* I was concerned about odd race conditions in which we've changed $field['deleted'] to 1 and/or renamed the table but not yet updated the cached schema array, etc. etc., leading to all kinds of hard to debug weirdness. Keeping the table names the same for their whole lifetime avoids all of that risk. Actually, this points out a bug in the current patch: You have to clear the schema cache when you rename the table. Marking CNW.

There might be very slim race conditions when deleting a field if 2 Apache instances were running and we changed the table name, but deleting a field seems like a situation where a millisecond down-time should be acceptable.

* No one should ever be hard-coding table names in code; you should get the table names from the API. D6 CCK had such functions, though few people used them, and we are now adding them to Field API. If everyone always gets the table names from a function, who cares whether they have the field id in them?

Regardless our names should make sense, with the "it doesn't matter" way of thinking, we might as well be making random hash table names. Clearly that's not something we want as part of our developer experience. Making the table names sensible is beneficial to our DX.

This version of the patch doesn't add the "_deleted" string to the table name, since most MySQL installs are limited to a 64 character table name limit. If we currently have restrictions on field name length, they should already enforce the field name is short enough to append the field ID.

quicksketch’s picture

Version: 6.x-dev » 7.x-dev

Doh... wrong version.

bjaspan’s picture

Status: Needs review » Reviewed & tested by the community

I am still concerned that this will lead to low-level Schema API weirdness... but I cannot support my concerns with an example, so RTBC. ;-)

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Ahhh, now that's much nicer. Thanks for the RTBC, Barry.

Committed to HEAD.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

yched’s picture

IMO we should push this one step further: #650748: Tablename for deleted fields