Problem/Motivation

In #2542776: Add an Upsert class, we added an UPSERT query to the database system which only worked on tables with one unique index / primary key, composed of a single field. This limitation was imposed by the minimum SQLite version supported by Drupal at the time, which lacked proper support for UPSERT queries. In the meantime, SQLite added a real UPSERT in 3.24.

Proposed resolution

For Drupal 9.x we increased the minimum version of SQLite to 8.26 in #3107155: Raise SQLite version requirement to 3.26 in Drupal 9 , and one of the main motivations was the improved UPSERT capability, per #3107155-15: Raise SQLite version requirement to 3.26 in Drupal 9 .

Now that all three core database drivers support UPSERT queries on unique/primary key constraints composed of multiple fields, we should take advantage of it and replace core's usage of "Merge" queries where possible.

The benefit is that a Merge query is potentially composed of 3 separate statements (SELECT, INSERT and UPDATE), while UPSERTs can do the same thing (in most situations) in a single statement.

Remaining tasks

Review.

User interface changes

Nope.

API changes

API addition, TBD.

Data model changes

Nope.

Release notes snippet

TBD.

Issue fork drupal-2547493

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

amateescu created an issue. See original summary.

mgifford’s picture

Status: Postponed » Active
amateescu’s picture

Status: Active » Closed (won't fix)

It was decided in the parent issue (#2542776: Add an Upsert class) that we cannot support multiple fields, mostly because of SQLite, so I'm going to close this one.

amateescu’s picture

Title: Decide if we can/should add support for unique fields composed of multiple fields for Upsert querries » Add support for unique fields composed of multiple fields for Upsert querries, and start using it for config and key/value tables
Version: 8.0.x-dev » 9.1.x-dev
Issue summary: View changes
Status: Closed (won't fix) » Needs review
Issue tags: +Performance
StatusFileSize
new12.87 KB
new13.05 KB

Now that Drupal 9.x requires SQLite 3.26, which has support for proper Upsert queries, we can finally do this :)

This patch depends on #3159073: Use the new UPSERT capability from SQLite 3.24, so I'm uploading a combined patch as well.

amateescu’s picture

Title: Add support for unique fields composed of multiple fields for Upsert querries, and start using it for config and key/value tables » Add support for unique / primary key constraints composed of multiple fields for Upsert querries, and start using it for config and key/value tables
daffie’s picture

Status: Needs review » Needs work

Patch looks good.

  1. The testbot is failing for PostgreSQL.
  2. +++ b/core/lib/Drupal/Core/Database/Query/Upsert.php
    @@ -47,13 +47,47 @@ public function __construct(Connection $connection, $table, array $options = [])
    +    $this->fields($fields);
    

    Why do we do a call to the method fields()? We are not doing that in the method key().

  3. +++ b/core/lib/Drupal/Core/Database/Query/Upsert.php
    @@ -47,13 +47,47 @@ public function __construct(Connection $connection, $table, array $options = [])
    +    if (empty($values)) {
    +      if (!is_numeric(key($fields))) {
    

    Can we combine the 2 if-statements in a single if-statement.

  4. Can we use the upsert also for Drupal\Core\KeyValueStore\DatabaseStorage::setIfNotExists()?
  5. Can we replace all calls to Drupal\Core\Database\Query\Merge? If so, can we deprecate Merge?
  6. Missing testing for upserts with the methods fields() with the parameter $values being not empty.
  7. +++ b/core/lib/Drupal/Core/Database/Query/Upsert.php
    @@ -47,13 +47,47 @@ public function __construct(Connection $connection, $table, array $options = [])
    +   * @param array $fields
    ...
    +  public function keys(array $fields) {
    

    The parameter $fields, is that only an array with values or can it also be an array with keys and values?

  8. Can we do some performance testing to make sure that with this patch Drupal becomes faster.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

jonmcl’s picture

@amateescu: Thank you for your work on this.

Unfortunately it no longer applies to 9.5.x or 10.x. Any chance you have rerolled this patch recently?

mlncn’s picture

Stripped down, MySQL only, no tests, try at an updated patch. Sorry will try to do more. Though i do think the adding the upsert multivalue capability and using elsewhere in Drupal can and should be separated issues.

amateescu’s picture

Status: Needs work » Needs review

Thanks @daffie for the review! Addressed most points in the MR, replying below to the other ones:

Can we use the upsert also for Drupal\Core\KeyValueStore\DatabaseStorage::setIfNotExists()?

Nope, because that method relies on the return status of the merge query, and upsert doesn't support that.

Can we replace all calls to Drupal\Core\Database\Query\Merge? If so, can we deprecate Merge?

Not sure, we can explore that in a separate issue.

The parameter $fields, is that only an array with values or can it also be an array with keys and values?

It's only an array with values.

Can we do some performance testing to make sure that with this patch Drupal becomes faster.

I ran time ddev drush si demo_umami -y three times before and after this change:

Before:
0,06s user 0,02s system 0% cpu 22,412 total
0,06s user 0,02s system 0% cpu 22,000 total
0,06s user 0,02s system 0% cpu 22,642 total

After:
0,05s user 0,03s system 0% cpu 21,223 total
0,05s user 0,03s system 0% cpu 20,941 total
0,06s user 0,02s system 0% cpu 20,550 total

So it definitely improves things :)

andypost’s picture

smustgrave’s picture

what are the odds this could be a breaking change for a contrib module?

amateescu’s picture

@smustgrave, I don't see how this can impact contrib, it's only adding functionality :)

catch’s picture

This looks good, agreed we should leave any deprecation of ::merge() for a separate issue, but maybe good to open that?

I guess my only question would be what the chances are of a third party driver that can't handle this, but they could probably fall back to a merge query or something?

amateescu’s picture

Yup, that's the fallback for any db that doesn't support it.

amateescu’s picture

Title: Add support for unique / primary key constraints composed of multiple fields for Upsert querries, and start using it for config and key/value tables » Add support for unique / primary key constraints composed of multiple fields for Upsert queries
Related issues: +#3549762: [PP-1] Deprecate Merge queries
ghost of drupal past’s picture

Erm what.

  1. With keys added you are now at a point where MERGE and UPSERT functionalities overlap so much that MERGE can just fall back to an upsert if the more complex methods are not called.
  2. As far as I can see with ag -lic upsert |grep -v ests/ there's not a single use of upsert in core. Merge is used widely. Personal anecdote: I needed to check git to see when upsert was added despite I am credited on the original issue from 2015. I am fairly sure I never used it.
  3. Why on earth would you drop MERGE when support is arising finally? In the Merge class doxygen I wrote "Other databases (most notably MySQL, PostgreSQL and SQLite) will emulate this statement by running a SELECT and then INSERT or UPDATE" so many years ago when life was worth living but by now this is not true as PostgreSQL 15 in 2022 added MERGE support -- and D11 requires 16. Documentation and code both should be upgraded accordingly.

tl;dr: drop UPSERT, make MERGE smart, implement full native MERGE, use it for PostgreSQL for now, rejoice. (Oracle and SQL Server drivers will pick up the native MERGE, too. Let's hope one day M{y,aria}SQL will too.)

mondrake’s picture

I'm not sure we necessarily have to get rid of either MERGE or UPSERT - googled a bit and it looks like MERGE is the ANSI standard, whereas UPSERT can be considered an implementation of MERGE with a specific purpose of "update if exists, insert if not exists". Which AFAICS is what Drupal is in need of when currently using MERGE. For the time being, I'd suggest moving ahead here as a generalization of UPSERT, while keeping MERGE for now.

I'd not add a separate ::keys() method here. Rather overload the existing ::key() method to accept string|array. And maybe deprecate passing a string. Or introduce a ::primaryKeyColumns() method to make clear you need to pass all the PK columns to the query.

amateescu’s picture

Issue summary: View changes

I arrived at the same conclusion as @mondrake, that it would be beneficial to keep both. I'd suggest we look at improving MERGE in another issue, because that one will need a lot of work to get done properly if we want to fully implement it, including support for deletions in a single statement for example.

Or introduce a ::primaryKeyColumns() method to make clear you need to pass all the PK columns to the query.

UPSERT queries can work with unique keys as well (as documented by the current MR), and generalizing "unique keys" and "primary keys" brings us back to the keys() method.

As for overloading key() and not adding keys(), I can't say I have a strong preference, just that I find it somewhat weird to pass an array to a key() method. But as I said, no strong preference, maybe a framework manager can weigh in.

there's not a single use of upsert in core

That's because it needs the functionality added in this issue, which wasn't available in SQLite when we added the Upsert query class :)

mondrake’s picture

UPSERT queries can work with unique keys as well (as documented by the current MR), and generalizing "unique keys" and "primary keys" brings us back to the keys() method.

I thought about this. TBH I find a bit confusing that, if you have a primary key (which I think it's the case of 100% tables in Drupal today?), you could UPSERT based on a different unique key - because the upsert may try to update or insert a primary key that would fail uniqueness. In other words it seems to me allowing to shoot oneself's foot. But maybe it's not for here.

How about this?

  • add a ::keyColumns(array $columns) method the array being list<string>; so we don't get into primary vs unique debate but we clearly say we need a list of columns (that could be a list with one element only, of course)
  • deprecate the ::key(string $field) method on behalf of ::keyColumns()
amateescu’s picture

Issue tags: +Performance

That works for me :) Updated the MR and wrote a change record: https://www.drupal.org/node/3565717

mondrake’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

Left comments inline. We also need a deprecation test showing that the upsert using ::key() still works. The PostgreSQL test failures seem legit.

needs-review-queue-bot’s picture

Version: 11.x-dev » main

Updated MR target branch from 11.x to main branch.

nod_ made their first commit to this issue’s fork.

ghost of drupal past’s picture

Here's the thing:

  1. You can't drop MERGE because it supports vastly more than UPSERT. And, again, having support for that is great because finally ANSI 2003 implementation have started to emerge. (There was no need to Google this, there's a mile long comment in the Merge class doxygen towards this.)
  2. Having UPSERT do the same as MERGE means there's more maintenance and more confusion from developers

The original UPSERT issue had this reasoning:

Currently we implement a subset of the ANSI SQL:2003 MERGE command but another subset is desirable, one that SQLite implements with INSERT ON CONFLICT REPLACE, MySQL implements with REPLACE or INSERT ON DUPLICATE KEY. We do not intend this to be too general and will strongly recommend to use only on tables with a primary key and no other unique keys.

Yeah, I was wrong, it's not another subset, rather it's a subset of the subset that Drupal Merge implements. Merge should have been made smart to use upsert when it can. But now that upsert is more capable you can call INSERT ON DUPLICATE KEY / ON CONFLICT when only ::key/::keys/::fields were called. Then deprecate upsert and drop it in due time. Even if someone used upsert the change is easy and painless because the same API can be kept, it's only the $query = new Upsert that needs to be changed to new Merge, the rest is the same. What's wrong with this plan? I can be wrong again, I sure was before.

But I see no reason to keep both. There has been no advantage named to keeping it. None. It was just restated that we shouldn't drop it but not why. Let me ask plain: what advantage would Drupal gain from having both that justifies the doubled maintenance and double the query API people need to learn?

amateescu’s picture

@ghost of drupal past, I looked into how other DBALs (and some ORMs) deal with this distinction, and it seems that Upsert is the more popular option by a large margin:

DBAL/Query Builder Language Upsert Support Upsert Method Cross-DB Upsert SQL MERGE Support Notes
Doctrine DBAL PHP ⚠️ Manual Raw SQL ⚠️ Manual Must write DB-specific SQL
Laravel Query Builder PHP ✅ Good upsert(), updateOrInsert() Underlying layer for Eloquent
SQLAlchemy Core Python ✅ Excellent .on_conflict_do_update() ⚠️ Partial ⚠️ Manual Dialect-specific insert constructs
Django QuerySet Python ✅ Good update_or_create(), bulk_create() Database abstraction layer
Prisma Client JS/TS ✅ Excellent upsert() Generated client, auto-translates
TypeORM QueryBuilder JS/TS ⚠️ Basic Manual .onConflict() ⚠️ Partial Requires DB-specific syntax
Sequelize JS/TS ✅ Good upsert() Promise-based query interface
Drizzle JS/TS ✅ Good .onConflictDoUpdate() ⚠️ Partial Lightweight, minimal overhead
JDBC/Hibernate Java ✅ Good ON CONFLICT DO clause (6.5+) ⚠️ Partial ✅ Good Can use native SQL or JPQL
jOOQ Java ✅✅ Excellent .onConflict(), .mergeInto() Type-safe SQL builder
ADO.NET/EF Core C# ⚠️ Limited Raw SQL or stored procs ⚠️ Manual No abstraction for upsert
Active Record Ruby ✅ Excellent upsert(), upsert_all() Clean query interface
GORM Go ✅ Good .Clauses() with conflict ⚠️ Partial Clause-based query building

So my proposal remains the same, keep the improved Upsert implementation from this issue, and let Merge evolve on its own with proper APIs for what we decide to support from the SQL standard.

mondrake’s picture

Since we're at this, I think we should also change Upsert::execute() to return true or fail with an exception.

The base class has this, which is a signal of fragility:

  /**
   * Executes the UPSERT operation.
   *
   * @return int
   *   An integer indicating the number of rows affected by the operation. Do
   *   not rely on this value as a precise indication of the actual rows
   *   affected: different database engines return different values.
   */

The mysqli driver in the MR cannot return an int, see comments inline.

So changing to true would just be more solid IMO.

mondrake’s picture

Status: Needs work » Needs review

Tests are green on all 4 core drivers.

ghost of drupal past’s picture

I am working on https://git.drupalcode.org/project/drupal/-/merge_requests/14356 mostly to demonstrate what I meant: certain merge queries can be run as upserts, this doesn't need a separate upsert API.

amateescu’s picture

mostly to demonstrate what I meant: certain merge queries can be run as upserts

No one questioned that though. It's pretty obvious that it should work :)

ghost of drupal past’s picture

> It's pretty obvious that it should work :)

Then why are we keeping upserts? There is still no answer to the why two APIs would be worth it. When I asked above I got an LLM generated table of what others are doing.

amateescu’s picture

Both @mondrake and myself answered that in #24, #25 and #32 (the paragraph below the LLM-generated table).

ghost of drupal past’s picture

It seems we have a breakdown in communication. Perhaps I was not clear in my proposal:

  1. replace all $connection->upsert calls with $connection->merge
  2. fold all the functionality of upsert into merge.

The advantages are:

  1. There's one API instead of two so developers need to learn only one.
  2. There's less core code to maintain, the base upsert class can be removed and eventually, as full ANSI SQL:03 support spreads, proprietary upsert implementations can be removed as well. The first one is PostgreSQL which had ON CONFLICT DO UPDATE and now it's not necessary. Hopefully one day the entire "can we do an upsert" logic will be removed from Merge.

Can you point out any specific weaknesses of this plan? Is there any upsert usage that is not covered like this? What code snippet could you provide that justifies the existence of upsert?

mondrake’s picture

#39 I would rather say one way communication only.

1) Let me be bluntly clear: I am not interested in debating this, sorry. Both @amatescu and myself already gave our input. I do not have other.

2) This issue title is 'Add support for unique / primary key constraints composed of multiple fields for Upsert queries'. You are deranging the issue. Please open another one where a MERGE vs UPSERT policy decision can be debated by those interested.

chi’s picture

we should also change Upsert::execute() to return true or fail with an exception.

What if the query updated zero records. Will it be considered as failure?

mondrake’s picture

#16

Can we use the upsert also for Drupal\Core\KeyValueStore\DatabaseStorage::setIfNotExists()?

Nope, because that method relies on the return status of the merge query, and upsert doesn't support that.

Actually IMHO we do not need MERGE nor UPSERT there: what is required is simply an INSERT, capturing with a no-op the exception that is triggered if the record exists already based on the PK. One SQL statement executed only vs the two that MERGE currently requires in Drupal.

That'd be a separate issue though.

mondrake’s picture

#41

What if the query updated zero records. Will it be considered as failure?

That's an impossible situation. Assuming a single row is UPSERTED, either it does not exists yet, and then it is added (=INSERT), or it exists already, and then it is UPDATEd. So one row is always involved. A failure is due either to context/syntax (missing table, wrong key, wrong columns) or to database in invalid state (connection down, database space or quota issue, tech problems, etc). But those conditions trigger execution exceptions, should not be treated as an executed SQL statement.

Same if multiple rows are UPSERTED, with the additional complexity that the rows count for the UPDATEd rows needs to distinguish between rows matched (= the number of rows that are potentially subject to change based on the conditions) and rows changed (= the number of matched rows that had changes to column values not part of the key columns).

So IMHO returning a value from Upsert::execute() is, simply, irrelevant. We should just catch exceptions due to the factors above. It would even be better to give it a void return type.

Now, unfortunately myqsli::$affected_rows returns 0 in case UPSERT updates the row. Hence the comment in its execute() method DOCblock:

   * @return true
   *   Differently from PDO MySQL, we are not able to return a number of rows
   *   here, because $stmt->rowCount() will return a positive number only in
   *   case the UPSERT actually inserts a record (updates are not tracked).
   *   mysqli->info also does not help as the INSERT INTO ... ON DUPLICATE KEY
   *   UPDATE statement execution does not fill in any data. So we return TRUE
   *   unless an exception is thrown.

This makes mysqli inconsistent vs the other drivers, which is just a complication we are adding on our shoulders since we should not care at all about the returned value.

mondrake’s picture

Let's revert the mysqli changes then, and see if tests still fail and why.

mondrake’s picture

mysqli kernel tests fail after the revert, where all other drivers are green.

mondrake’s picture

Status: Needs review » Needs work
mondrake’s picture

longwave’s picture

I think the MR here should not try to change everything at once. Instead let's just concentrate on upsert taking multiple keys (as per the issue title) and defer the conversion to followups.

I also think that key() being allowed to take string|array is the simpler API name? For the array variant, you are still passing a single composite key, structured as an array, so the name makes sense?

longwave’s picture

Status: Needs work » Needs review

MR!15248 extracts the minimum set of changes from MR!12959, and keeps backward compatibility by retaining the old key() method.

This is a problem for downstream database drivers, but won't this be a problem whatever we do here? As soon as we commit a multiple key upsert to core they won't know how to handle it.

mondrake’s picture

#50 maybe we could deprecate passing a string to ::key(), informing that as of D12 contrib/custom drivers have to support composite key as an array of column names? Yes won’t work if the driver overrides the method, but that should be rather an edge case.

longwave’s picture

Is that worth it though? That forces downstream users of upsert() to change all existing cases. The current MR only forces database driver maintainers (of which there are probably less than 10) to make a change to support this new feature, which they are going to have to do anyway.

mondrake’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs tests

#52 good point. Shall we, then?

alexpott changed the visibility of the branch 2547493-add-support-for to hidden.

alexpott’s picture

Version: main » 11.x-dev
Category: Task » Feature request
Status: Reviewed & tested by the community » Fixed

Committed and pushed 3ae3915e6fa to main and 2e8705a4bb1 to 11.x. Thanks!

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.

  • alexpott committed 2e8705a4 on 11.x
    feat: #2547493 Add support for unique / primary key constraints composed...

  • alexpott committed 3ae3915e on main
    feat: #2547493 Add support for unique / primary key constraints composed...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.