Update, rather than Delete and insert - for significant database performance improvement [#1800286]

It is possible that i am missing the point as I cant find any discussions on the following issue:

On both field_sql_storage_field_storage_write() & hook_field_storage_write() we have the following comment

// Delete and insert, rather than update, in case a value was added.

The code (for update operations) then deletes the field and the revision field and inserts the field and revision field

This to my possibly ignorant eyes looks like incredibly inefficient. I would expect to see the following logic:
1) Query existing value for the field
2) If doesn't exist: Insert field and field revision.
3) If exists and value is different: Update field and field revision.

This will have two improvements:
1) Write to the database only if value changed
2) Select + Update are much faster then Delete + Insert

Some of the benefits on database performance I can see are:

Less database writes (insert/update)
Less Fragmentation by updating instead of delete+insert
Less UNDO and REDO info sql needs to handle
Faster transactions and less chance of deadlocks
Possibly less database index updates

I am no export on the Entity/Field API so if I have missed something significant my apologies, however if it is a valid point I would be happy to look into it further and possibly offer a patch

Comment	File	Size	Author
#36	Node_save_enhancement_using_upsert-1800286-36.patch	6.65 KB	azenned
#36	8.5.x: PHP 7.1 & MySQL 5.5 23,256 pass, 151 fail
#35	Node_save_enhancement_using_upsert-1800286-35.patch	6.55 KB	azenned
#35	8.5.x: PHP 7.1.x-dev & MySQL 5.5 Patch Failed to Apply PHP 7.1 & MySQL 5.5 Patch Failed to Apply PHP 7.2.x-dev & MySQL 5.5 Patch Failed to Apply PHP 7.2 & MySQL 5.5 Patch Failed to Apply

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Comment #1

iAugur CreditAttribution: iAugur commented 2 October 2012 at 08:30

I totally agree with you here (and it is an issue/feature of Drupal 7.x too) But there may be some history behind this? On the face of it it does seem very inefficient and very pessimistic - a comparison and update or insert as you suggest seems preferable and far more efficient.
We have experienced deadlocks from this process on a dev site under heavy test load where products on a commerce site were being bulk updated.
It would be good to get this reviewed for 8. and any fix/change should be portable back to 7 as the code is largely the same.

Comment #2

guy_schneerson CreditAttribution: guy_schneerson commented 16 October 2012 at 09:44

Issue tags:

+Performance

adding Performance tag to issue

Comment #3

sun

German

Karlsruhe

CreditAttribution: sun commented 10 December 2012 at 02:52

This looks strange indeed.

Since the code does not state why it is done that way, it is safe to assume that there is no super-fundamental reason for that, and it's just that one developer thought it would make more sense to do it this way at some point. I agree that deleting and re-inserting data like this is definitely not a good idea for SQL backends.

AFAICS, the new code would have to query existing values first, compare them against new values, and lastly, perform the necessary insert/update + delete queries.

That said, one could possibly even attempt to perform the comparison based on the original field values in $entity->original - although that might involve some risks of race conditions, not sure.

Anyone up for writing a patch?

Comment #4

guy_schneerson CreditAttribution: guy_schneerson commented 10 December 2012 at 09:38

would be happy to give it a go

Comment #6

Damien Tournoud CreditAttribution: Damien Tournoud commented 17 July 2013 at 09:48

Until we decide to go to READ COMMITTED instead of READ REPEATABLE as our default transaction isolation, trying to do incremental modification is just going to increase our concurrency issues and race conditions. I would not recommend trying to tackle this now.

Comment #7

andyceo CreditAttribution: andyceo commented 17 July 2013 at 12:21

In MySQL driver we can use REPLACE instead UPDATE or INSERT.

Comment #8

guy_schneerson CreditAttribution: guy_schneerson commented 17 July 2013 at 16:32

Thanks @Damien Tournoud that makes sense as READ REPEATABLE gives no way of checking if data changed outside the transaction. Looks to me like the cost of READ REPEATABLE is incredibly high compared to the gains.

@andyceo thanks for the tip not sure if REPLACE solves the issue as it the same as a Delete + update and as far as I understand may be a bit more expensive as it will try an Insert before the delete, and that will fail on all entity updates.

Comment #9

danblack CreditAttribution: danblack commented 5 April 2014 at 08:04

REPLACE solves the issue as it the same as a Delete + update and as far as I understand may be a bit more expensive as it will try an Insert before the delete

Database implementers wouldn't have written it if there wasn't a benefit. This article suggests REPLACE over UPDATE in the case where there is no existing record.

#1650930: Use READ COMMITTED by default for MySQL transactions suggests a lower isolation level which I agree could have benefits in some cases.

drupal_write_record seems to be the current implementation for D8 so lets get it right.

So let's mock up a test case for performance, and try out some options like REPLACE and transaction isolation levels, and finding any bulk delete with large replacements (like #2229013-6: PDOException: SQLSTATE[HY000]: General error: 1030 Got error -1 from storage engine: TRUNCATE {cache_field} ; Array ( ) in cache).

Comment #10

mikeytown2 CreditAttribution: mikeytown2 commented 7 April 2014 at 19:26

Issue tags:

+database cache

Comment #11

pounard

French

CreditAttribution: pounard commented 7 April 2014 at 21:22

I suggest we explode the MergeQuery implementation for both PostgresSQL (using MERGE queries) and MySQL (using REPLACE INTO) queries and leave the default as-is for SQLite.

This will gain in both performance an reliability.

Comment #12

danblack CreditAttribution: danblack commented 7 April 2014 at 22:39

I'm writing the mysql implementation now. It falls back to the QueryMerge object if a REPLACE isn't possible.

I can't see where postgres merge is implemented in the main docs - this is the closest I've found to a syntax.

Comment #13

pounard

French

CreditAttribution: pounard commented 8 April 2014 at 16:55

Issue tags:

+Needs backport to 7.x

This should be backported to Drupal 7 as well.

Comment #14

danblack CreditAttribution: danblack commented 11 April 2014 at 00:22

lets wrap the postgresql implementation in a REPEATABLE-READ isolation if we can't find a MERGE implementation (#2236395: extended db_transaction / Database::Connection::startTransaction to include isolation level). sqlite is already serializable by default but it wouldn't hurt to do this the same in case someone lowers it to READ-UNCOMMITTED

Comment #15

danblack CreditAttribution: danblack commented 11 April 2014 at 00:41

Assigned:

Unassigned

» danblack

Comment #16

regilero CreditAttribution: regilero commented 11 April 2014 at 13:24

From my own tests using DELETE+INSERT is an huge mistake, inducing a lot of deadlocks.
Taking a mySQL server 5.5, opening 2 transactions, make a delete on rowid=1 on the first one, then delete on rowid=2 on the second one, and then try to insert rowid 1 on the first and rowid2 on the second, you have an insert deadlock.
MySQL adds some table level locks after the delete queries. Without theses deletes no insert deadlocks. tested on READ COMMITTED and REPEATABLE READS. Of course REPEATABLE READS could add some gap locking things which could make some deadlocks. ~~But here, even in READ COMMITTED mode you have a deadlock if two transactions are trying to save a field (arg!).~~. READ COMMITED mode is immune to that bug after MySQL 5.1 but I made the test on a 5.0 and had a crash.
This happens a lot with Drupal commerce and others Drupal usage where write operations are concurrent and numerous.

So, now that the merge queries deadlocks are fixed, why not using merge queries to update or insert fields?

Test:

create table t1 ( id varchar(128) NOT NULL DEFAULT ''), id2 varchar(128) NOT NULL default '', PRIMARY KEY (id,id2));

--> session 1

SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
DELETE FROM t1 WHERE id=1;
INSERT INTO t1 (id,id2) VALUES (1,1);

In parallel, session 2:

SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
DELETE FROM t1 WHERE id=2;
INSERT INTO t1 (id,id2) VALUES (2,2);

One of the transaction will get a deadlock, without the delete everything is fine.

Comment #17

danblack CreditAttribution: danblack commented 12 April 2014 at 03:16

Version:	8.x-dev	» 7.x-dev
Issue tags:	-database cache, -Needs backport to 7.x

field_sql_storage_field_storage_write() & hook_field_storage_write() are only in D7.

yes, reimplementing with merge sounds like a good idea.

In comment 12 my commitment was to use REPLACE in the db_merge for D8 which is actually #2239071: use REPLACE as part of mysql db_merge

I'll leave it for someone so port these two functions to use db_merge.

Comment #18

sun

German

Karlsruhe

CreditAttribution: sun commented 12 April 2014 at 12:25

Version:	7.x-dev	» 8.x-dev
Category:	Feature request	» Task
Issue tags:		+Entity Field API, +Entity system

ContentEntityDatabaseStorage::doSaveFieldItems() still DELETEs + INSERTs upon an entity update in D8.

Comment #19

pounard

French

CreditAttribution: pounard commented 16 April 2014 at 11:19

Agreed that this needs to be fixed in D8 also, but please make this happen for Drupal 7 first since there are a lot of real sites in real production environements suffering from this. This is kind of production-critical and this should be enough to bypass the "D8 first" rule.

Comment #20

danblack CreditAttribution: danblack commented 21 April 2014 at 11:10

Comment #21

pounard

French

CreditAttribution: pounard commented 10 July 2014 at 12:08

For your information I started working on #2297817: Do not attempt field storage write when field content did not change which partially solves this problem by preventing the field SQL storage to DELETE/INSERT when field data is unchanged while calling field_attach_update(). I'd also like some interested people to review the latest patch of this issue because it actually do reduces a lot SQL backend stress in scenarios where a lot of concurent entity save happen without revisioning enabled (for example, on some commerce sites which have disabled revisions on their entities, it actually does make a huge difference).

Comment #22

pounard

French

CreditAttribution: pounard commented 10 July 2014 at 12:09

Comment #23

nDigitHQ CreditAttribution: nDigitHQ commented 19 November 2014 at 18:49

This issue poses a severe performance degradation in D7 when updating taxonomy terms. Took me a while to track down this thread. I have ~200 taxonomy terms with 20 fields each.

Takes about 6-7 seconds per term to update because of the DELETE/INSERT instead of offloading this to the DB Driver.

In MySQL you can simply upsert which will save some time and keep the system from deleting content (and displaying a 404 on highly-trafficked sites).

If I can be of some help in getting this issue resolved please let me know and I'll jump in.

Comment #24

pounard

French

CreditAttribution: pounard commented 20 November 2014 at 13:44

@nDigitHQ try using #2297817: Do not attempt field storage write when field content did not change's patch

Comment #25

danblack CreditAttribution: danblack commented 21 November 2014 at 05:16

Assigned:

danblack

» Unassigned

Comment #26

nDigitHQ CreditAttribution: nDigitHQ commented 21 November 2014 at 18:38

@pounard -- thank you. I'll take a look there.

I ended up just using an UPSERT query. It reduced the time from ~6.75824163251239 seconds to ~0.39444303512573.

Code:

//Check to see if term exists in the default vocabulary
$matching_terms = taxonomy_get_term_by_name($symbol, __DEFAULT_VOCAB_MACHINE_NAME);

//Loop through existing term(s)
foreach ($matching_terms as $term) {
    $tid = $term->tid;

    foreach ($data as $k => $v) {
        
        $field_name = substr("field_".$k, 0, 31); //Max length of 32 characters on a field name.
        $v = htmlentities($v, ENT_QUOTES);
            
        //Update the field
        $sql = "
        INSERT INTO `field_data_{$field_name}` 
        (entity_type, bundle, deleted, entity_id, revision_id, language, delta, {$field_name}_value)
        VALUES ('taxonomy_term', '".__DEFAULT_VOCAB_MACHINE_NAME."', 0, $tid, $tid, 'und', 0, '$v')
        ON DUPLICATE KEY UPDATE {$field_name}_value='$v'";
        $query = db_query($sql);
        $result = $query->execute();
        
        //Update the revision
        $sql = "
        INSERT INTO `field_revision_{$field_name}` 
        (entity_type, bundle, deleted, entity_id, revision_id, language, delta, {$field_name}_value)
        VALUES ('taxonomy_term', '".__DEFAULT_VOCAB_MACHINE_NAME."', 0, $tid, $tid, 'und', 0, '$v')
        ON DUPLICATE KEY UPDATE {$field_name}_value='$v'";
        $query = db_query($sql);
        $result = $query->execute();
    }
    
    $time_post = microtime(true);
    $exec_time = $time_post - $time_pre;
    echo "Execution Time: $exec_time";
    exit;
       
}

This would obviously need to be modified to extend normal functionality but the query is there.

Comment #27

pounard

French

CreditAttribution: pounard commented 25 November 2014 at 08:38

Using smarter SQL for field INSERT or UPDATE is really hard to achieve since you not only have values but an order (the delta column) to keep in sync, means that almost every field write would require you to do some DELETE, INSERT and UPDATE queries at the same time. Not impossible but I think really hard to achieve in a smart and simple way. My guess is that the original DELETE/INSERT we know has been introduced by the EAV model which does not really fit into a classic relational database.

Comment #28

25 November 2014 at 08:38

Version:

8.0.x-dev

» 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #29

25 November 2014 at 08:38

Version:

8.1.x-dev

» 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #30

ndobromirov CreditAttribution: ndobromirov at FFW commented 25 October 2016 at 10:19

Can the delta sorting be moved to the field-api PHP layer instead?

This way it should be still transparent to the end users of the field API, but they will be un-ordered in SQL, allowing the reduction of I/O there.

This is not excluding the write if only needed patch, mentioned above.

Comment #31

25 October 2016 at 10:19

Version:

8.2.x-dev

» 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #32

25 October 2016 at 10:19

Version:

8.3.x-dev

» 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #33

25 October 2016 at 10:19

Version:

8.4.x-dev

» 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #34

matsbla CreditAttribution: matsbla commented 2 February 2018 at 22:25

Comment #35

azenned CreditAttribution: azenned commented 10 June 2018 at 17:16

File	Size
Node_save_enhancement_using_upsert-1800286-35.patch	6.55 KB
8.5.x: PHP 7.1.x-dev & MySQL 5.5 Patch Failed to Apply PHP 7.1 & MySQL 5.5 Patch Failed to Apply PHP 7.2.x-dev & MySQL 5.5 Patch Failed to Apply PHP 7.2 & MySQL 5.5 Patch Failed to Apply

I was working on the same issue last year,

I this this patch work fine , and i have got a significant performance improvement on multi sites content synchronizing .

Can someone do some benchmark ?

Comment #36

azenned CreditAttribution: azenned commented 10 June 2018 at 17:27

File	Size
Node_save_enhancement_using_upsert-1800286-36.patch	6.65 KB
8.5.x: PHP 7.1 & MySQL 5.5 23,256 pass, 151 fail

1 file was hidden/shown/deleted

File	Size
Node_save_enhancement_using_upsert-1800286-35.patch	6.55 KB
8.5.x: PHP 7.1.x-dev & MySQL 5.5 Patch Failed to Apply PHP 7.1 & MySQL 5.5 Patch Failed to Apply PHP 7.2.x-dev & MySQL 5.5 Patch Failed to Apply PHP 7.2 & MySQL 5.5 Patch Failed to Apply

Comment #37

10 June 2018 at 17:27

Version:

8.5.x-dev

» 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #38

AaronMcHale

English

Edinburgh, Scotland

CreditAttribution: AaronMcHale at The University of Edinburgh commented 6 May 2019 at 14:30

Issue tags:

+Needs issue summary update

Is this still relevant? If so in addition to REPLACE as was mentioned earlier we should look at INSERT ... ON DUPLICATE KEY UPDATE (for MySQL at least). I've used the latter in the past and it seems to work well, but don't remember what the differences (if any) are between the two from a performance or compatibility standpoint.

Also, how similar is this issue to #2239071: use REPLACE as part of mysql db_merge, can one be closed in favour of the other, or are they different enough? I've skimmed most of this issue and a little of the linked one.

Comment #39

amateescu CreditAttribution: amateescu for Pfizer, Inc. commented 6 May 2019 at 15:15

I think @morgantocker's comment from #2239071-13: use REPLACE as part of mysql db_merge is very relevant for this issue too. As far as I understand, using REPLACE is basically the same as the DELETE + UPDATE operations that we're doing now, and it's also MySQL-specific.

Comment #40

6 May 2019 at 15:15

Version:

8.6.x-dev

» 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #41

6 May 2019 at 15:15

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #42

6 May 2019 at 15:15

Version:

8.9.x-dev

» 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #43

6 May 2019 at 15:15

Version:

9.2.x-dev

» 9.3.x-dev

Comment #44

6 May 2019 at 15:15

Version:

9.3.x-dev

» 9.4.x-dev

Drupal 9.3.15 was released on June 1st, 2022 and is the final full bugfix release for the Drupal 9.3.x series. Drupal 9.3.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.4.x-dev branch from now on, and new development or disruptive changes should be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #45

6 May 2019 at 15:15

Version:

9.4.x-dev

» 9.5.x-dev

Drupal 9.4.9 was released on December 7, 2022 and is the final full bugfix release for the Drupal 9.4.x series. Drupal 9.4.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.5.x-dev branch from now on, and new development or disruptive changes should be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #46

6 May 2019 at 15:15

Version:

9.5.x-dev

» 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Update, rather than Delete and insert - for significant database performance improvement

Comments

Related issues

Referenced by