It is possible that i am missing the point as I cant find any discussions on the following issue:

On both field_sql_storage_field_storage_write() & hook_field_storage_write() we have the following comment

// Delete and insert, rather than update, in case a value was added.

The code (for update operations) then deletes the field and the revision field and inserts the field and revision field

This to my possibly ignorant eyes looks like incredibly inefficient. I would expect to see the following logic:
1) Query existing value for the field
2) If doesn't exist: Insert field and field revision.
3) If exists and value is different: Update field and field revision.

This will have two improvements:
1) Write to the database only if value changed
2) Select + Update are much faster then Delete + Insert

Some of the benefits on database performance I can see are:

  • Less database writes (insert/update)
  • Less Fragmentation by updating instead of delete+insert
  • Less UNDO and REDO info sql needs to handle
  • Faster transactions and less chance of deadlocks
  • Possibly less database index updates

I am no export on the Entity/Field API so if I have missed something significant my apologies, however if it is a valid point I would be happy to look into it further and possibly offer a patch

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

iAugur’s picture

I totally agree with you here (and it is an issue/feature of Drupal 7.x too) But there may be some history behind this? On the face of it it does seem very inefficient and very pessimistic - a comparison and update or insert as you suggest seems preferable and far more efficient.
We have experienced deadlocks from this process on a dev site under heavy test load where products on a commerce site were being bulk updated.
It would be good to get this reviewed for 8. and any fix/change should be portable back to 7 as the code is largely the same.

guy_schneerson’s picture

Issue tags: +Performance

adding Performance tag to issue

sun’s picture

This looks strange indeed.

Since the code does not state why it is done that way, it is safe to assume that there is no super-fundamental reason for that, and it's just that one developer thought it would make more sense to do it this way at some point. I agree that deleting and re-inserting data like this is definitely not a good idea for SQL backends.

AFAICS, the new code would have to query existing values first, compare them against new values, and lastly, perform the necessary insert/update + delete queries.

That said, one could possibly even attempt to perform the comparison based on the original field values in $entity->original - although that might involve some risks of race conditions, not sure.

Anyone up for writing a patch?

guy_schneerson’s picture

would be happy to give it a go

Damien Tournoud’s picture

Until we decide to go to READ COMMITTED instead of READ REPEATABLE as our default transaction isolation, trying to do incremental modification is just going to increase our concurrency issues and race conditions. I would not recommend trying to tackle this now.

andyceo’s picture

In MySQL driver we can use REPLACE instead UPDATE or INSERT.

guy_schneerson’s picture

Thanks @Damien Tournoud that makes sense as READ REPEATABLE gives no way of checking if data changed outside the transaction. Looks to me like the cost of READ REPEATABLE is incredibly high compared to the gains.

@andyceo thanks for the tip not sure if REPLACE solves the issue as it the same as a Delete + update and as far as I understand may be a bit more expensive as it will try an Insert before the delete, and that will fail on all entity updates.

danblack’s picture

REPLACE solves the issue as it the same as a Delete + update and as far as I understand may be a bit more expensive as it will try an Insert before the delete

Database implementers wouldn't have written it if there wasn't a benefit. This article suggests REPLACE over UPDATE in the case where there is no existing record.

#1650930: Use READ COMMITTED by default for MySQL transactions suggests a lower isolation level which I agree could have benefits in some cases.

drupal_write_record seems to be the current implementation for D8 so lets get it right.

So let's mock up a test case for performance, and try out some options like REPLACE and transaction isolation levels, and finding any bulk delete with large replacements (like #2229013-6: PDOException: SQLSTATE[HY000]: General error: 1030 Got error -1 from storage engine: TRUNCATE {cache_field} ; Array ( ) in cache).

mikeytown2’s picture

Issue tags: +database cache
pounard’s picture

I suggest we explode the MergeQuery implementation for both PostgresSQL (using MERGE queries) and MySQL (using REPLACE INTO) queries and leave the default as-is for SQLite.

This will gain in both performance an reliability.

danblack’s picture

I'm writing the mysql implementation now. It falls back to the QueryMerge object if a REPLACE isn't possible.

I can't see where postgres merge is implemented in the main docs - this is the closest I've found to a syntax.

pounard’s picture

Issue tags: +Needs backport to 7.x

This should be backported to Drupal 7 as well.

danblack’s picture

lets wrap the postgresql implementation in a REPEATABLE-READ isolation if we can't find a MERGE implementation (#2236395: extended db_transaction / Database::Connection::startTransaction to include isolation level). sqlite is already serializable by default but it wouldn't hurt to do this the same in case someone lowers it to READ-UNCOMMITTED

danblack’s picture

Assigned: Unassigned » danblack
regilero’s picture

From my own tests using DELETE+INSERT is an huge mistake, inducing a lot of deadlocks.
Taking a mySQL server 5.5, opening 2 transactions, make a delete on rowid=1 on the first one, then delete on rowid=2 on the second one, and then try to insert rowid 1 on the first and rowid2 on the second, you have an insert deadlock.
MySQL adds some table level locks after the delete queries. Without theses deletes no insert deadlocks. tested on READ COMMITTED and REPEATABLE READS. Of course REPEATABLE READS could add some gap locking things which could make some deadlocks. But here, even in READ COMMITTED mode you have a deadlock if two transactions are trying to save a field (arg!).. READ COMMITED mode is immune to that bug after MySQL 5.1 but I made the test on a 5.0 and had a crash.
This happens a lot with Drupal commerce and others Drupal usage where write operations are concurrent and numerous.

So, now that the merge queries deadlocks are fixed, why not using merge queries to update or insert fields?

Test:

create table t1 ( id varchar(128) NOT NULL DEFAULT ''), id2 varchar(128) NOT NULL default '', PRIMARY KEY (id,id2));

--> session 1

SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
DELETE FROM t1 WHERE id=1;
INSERT INTO t1 (id,id2) VALUES (1,1);

In parallel, session 2:

SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
DELETE FROM t1 WHERE id=2;
INSERT INTO t1 (id,id2) VALUES (2,2);

One of the transaction will get a deadlock, without the delete everything is fine.

danblack’s picture

Version: 8.x-dev » 7.x-dev
Issue tags: -database cache, -Needs backport to 7.x

field_sql_storage_field_storage_write() & hook_field_storage_write() are only in D7.

yes, reimplementing with merge sounds like a good idea.

In comment 12 my commitment was to use REPLACE in the db_merge for D8 which is actually #2239071: use REPLACE as part of mysql db_merge

I'll leave it for someone so port these two functions to use db_merge.

sun’s picture

Version: 7.x-dev » 8.x-dev
Category: Feature request » Task
Issue tags: +Entity Field API, +Entity system

ContentEntityDatabaseStorage::doSaveFieldItems() still DELETEs + INSERTs upon an entity update in D8.

pounard’s picture

Agreed that this needs to be fixed in D8 also, but please make this happen for Drupal 7 first since there are a lot of real sites in real production environements suffering from this. This is kind of production-critical and this should be enough to bypass the "D8 first" rule.

danblack’s picture

pounard’s picture

For your information I started working on #2297817: Do not attempt field storage write when field content did not change which partially solves this problem by preventing the field SQL storage to DELETE/INSERT when field data is unchanged while calling field_attach_update(). I'd also like some interested people to review the latest patch of this issue because it actually do reduces a lot SQL backend stress in scenarios where a lot of concurent entity save happen without revisioning enabled (for example, on some commerce sites which have disabled revisions on their entities, it actually does make a huge difference).

pounard’s picture

nDigitHQ’s picture

This issue poses a severe performance degradation in D7 when updating taxonomy terms. Took me a while to track down this thread. I have ~200 taxonomy terms with 20 fields each.

Takes about 6-7 seconds per term to update because of the DELETE/INSERT instead of offloading this to the DB Driver.

In MySQL you can simply upsert which will save some time and keep the system from deleting content (and displaying a 404 on highly-trafficked sites).

If I can be of some help in getting this issue resolved please let me know and I'll jump in.

pounard’s picture

danblack’s picture

Assigned: danblack » Unassigned
nDigitHQ’s picture

@pounard -- thank you. I'll take a look there.

I ended up just using an UPSERT query. It reduced the time from ~6.75824163251239 seconds to ~0.39444303512573.

Code:

//Check to see if term exists in the default vocabulary
$matching_terms = taxonomy_get_term_by_name($symbol, __DEFAULT_VOCAB_MACHINE_NAME);

//Loop through existing term(s)
foreach ($matching_terms as $term) {
    $tid = $term->tid;

    foreach ($data as $k => $v) {
        
        $field_name = substr("field_".$k, 0, 31); //Max length of 32 characters on a field name.
        $v = htmlentities($v, ENT_QUOTES);
            
        //Update the field
        $sql = "
        INSERT INTO `field_data_{$field_name}` 
        (entity_type, bundle, deleted, entity_id, revision_id, language, delta, {$field_name}_value)
        VALUES ('taxonomy_term', '".__DEFAULT_VOCAB_MACHINE_NAME."', 0, $tid, $tid, 'und', 0, '$v')
        ON DUPLICATE KEY UPDATE {$field_name}_value='$v'";
        $query = db_query($sql);
        $result = $query->execute();
        
        //Update the revision
        $sql = "
        INSERT INTO `field_revision_{$field_name}` 
        (entity_type, bundle, deleted, entity_id, revision_id, language, delta, {$field_name}_value)
        VALUES ('taxonomy_term', '".__DEFAULT_VOCAB_MACHINE_NAME."', 0, $tid, $tid, 'und', 0, '$v')
        ON DUPLICATE KEY UPDATE {$field_name}_value='$v'";
        $query = db_query($sql);
        $result = $query->execute();
    }
    
    $time_post = microtime(true);
    $exec_time = $time_post - $time_pre;
    echo "Execution Time: $exec_time";
    exit;
       
}

This would obviously need to be modified to extend normal functionality but the query is there.

pounard’s picture

Using smarter SQL for field INSERT or UPDATE is really hard to achieve since you not only have values but an order (the delta column) to keep in sync, means that almost every field write would require you to do some DELETE, INSERT and UPDATE queries at the same time. Not impossible but I think really hard to achieve in a smart and simple way. My guess is that the original DELETE/INSERT we know has been introduced by the EAV model which does not really fit into a classic relational database.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

ndobromirov’s picture

Can the delta sorting be moved to the field-api PHP layer instead?

This way it should be still transparent to the end users of the field API, but they will be un-ordered in SQL, allowing the reduction of I/O there.

This is not excluding the write if only needed patch, mentioned above.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

matsbla’s picture

azenned’s picture

I was working on the same issue last year,

I this this patch work fine , and i have got a significant performance improvement on multi sites content synchronizing .

Can someone do some benchmark ?

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

AaronMcHale’s picture

Is this still relevant? If so in addition to REPLACE as was mentioned earlier we should look at INSERT ... ON DUPLICATE KEY UPDATE (for MySQL at least). I've used the latter in the past and it seems to work well, but don't remember what the differences (if any) are between the two from a performance or compatibility standpoint.

Also, how similar is this issue to #2239071: use REPLACE as part of mysql db_merge, can one be closed in favour of the other, or are they different enough? I've skimmed most of this issue and a little of the linked one.

amateescu’s picture

I think @morgantocker's comment from #2239071-13: use REPLACE as part of mysql db_merge is very relevant for this issue too. As far as I understand, using REPLACE is basically the same as the DELETE + UPDATE operations that we're doing now, and it's also MySQL-specific.

Version: 8.6.x-dev » 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.15 was released on June 1st, 2022 and is the final full bugfix release for the Drupal 9.3.x series. Drupal 9.3.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.4.x-dev branch from now on, and new development or disruptive changes should be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.9 was released on December 7, 2022 and is the final full bugfix release for the Drupal 9.4.x series. Drupal 9.4.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.5.x-dev branch from now on, and new development or disruptive changes should be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.