Problem/Motivation
In #2542776: Add an Upsert class, we added an UPSERT query to the database system which only worked on tables with one unique index / primary key, composed of a single field. This limitation was imposed by the minimum SQLite version supported by Drupal at the time, which lacked proper support for UPSERT queries. In the meantime, SQLite added a real UPSERT in 3.24.
Proposed resolution
For Drupal 9.x we increased the minimum version of SQLite to 8.26 in #3107155: Raise SQLite version requirement to 3.26 in Drupal 9 , and one of the main motivations was the improved UPSERT capability, per #3107155-15: Raise SQLite version requirement to 3.26 in Drupal 9 .
Now that all three core database drivers support UPSERT queries on unique/primary key constraints composed of multiple fields, we should take advantage of it and replace core's usage of "Merge" queries where possible.
The benefit is that a Merge query is potentially composed of 3 separate statements (SELECT, INSERT and UPDATE), while UPSERTs can do the same thing (in most situations) in a single statement.
Remaining tasks
Review.
User interface changes
Nope.
API changes
API addition, TBD.
Data model changes
Nope.
Release notes snippet
TBD.
| Comment | File | Size | Author |
|---|
Issue fork drupal-2547493
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
- 2547493-upsert-mvp
changes, plain diff MR !15248
- 2547493-add-support-for
changes, plain diff MR !12959
Comments
Comment #2
mgiffordComment #3
amateescu commentedIt was decided in the parent issue (#2542776: Add an Upsert class) that we cannot support multiple fields, mostly because of SQLite, so I'm going to close this one.
Comment #4
amateescu commentedNow that Drupal 9.x requires SQLite 3.26, which has support for proper Upsert queries, we can finally do this :)
This patch depends on #3159073: Use the new UPSERT capability from SQLite 3.24, so I'm uploading a combined patch as well.
Comment #5
amateescu commentedComment #6
daffie commentedPatch looks good.
Why do we do a call to the method fields()? We are not doing that in the method key().
Can we combine the 2 if-statements in a single if-statement.
The parameter $fields, is that only an array with values or can it also be an array with keys and values?
Comment #13
jonmcl commented@amateescu: Thank you for your work on this.
Unfortunately it no longer applies to 9.5.x or 10.x. Any chance you have rerolled this patch recently?
Comment #14
mlncn commentedStripped down, MySQL only, no tests, try at an updated patch. Sorry will try to do more. Though i do think the adding the upsert multivalue capability and using elsewhere in Drupal can and should be separated issues.
Comment #16
amateescu commentedThanks @daffie for the review! Addressed most points in the MR, replying below to the other ones:
Nope, because that method relies on the return status of the merge query, and upsert doesn't support that.
Not sure, we can explore that in a separate issue.
It's only an array with values.
I ran
time ddev drush si demo_umami -ythree times before and after this change:Before:
0,06s user 0,02s system 0% cpu 22,412 total
0,06s user 0,02s system 0% cpu 22,000 total
0,06s user 0,02s system 0% cpu 22,642 total
After:
0,05s user 0,03s system 0% cpu 21,223 total
0,05s user 0,03s system 0% cpu 20,941 total
0,06s user 0,02s system 0% cpu 20,550 total
So it definitely improves things :)
Comment #17
andypostComment #18
smustgrave commentedwhat are the odds this could be a breaking change for a contrib module?
Comment #19
amateescu commented@smustgrave, I don't see how this can impact contrib, it's only adding functionality :)
Comment #20
catchThis looks good, agreed we should leave any deprecation of ::merge() for a separate issue, but maybe good to open that?
I guess my only question would be what the chances are of a third party driver that can't handle this, but they could probably fall back to a merge query or something?
Comment #21
amateescu commentedYup, that's the fallback for any db that doesn't support it.
Comment #22
amateescu commentedOpened #3549762: [PP-1] Deprecate Merge queries.
Comment #23
ghost of drupal pastErm what.
keysadded you are now at a point where MERGE and UPSERT functionalities overlap so much that MERGE can just fall back to an upsert if the more complex methods are not called.ag -lic upsert |grep -v ests/there's not a single use of upsert in core. Merge is used widely. Personal anecdote: I needed to check git to see when upsert was added despite I am credited on the original issue from 2015. I am fairly sure I never used it.tl;dr: drop UPSERT, make MERGE smart, implement full native MERGE, use it for PostgreSQL for now, rejoice. (Oracle and SQL Server drivers will pick up the native MERGE, too. Let's hope one day M{y,aria}SQL will too.)
Comment #24
mondrakeI'm not sure we necessarily have to get rid of either MERGE or UPSERT - googled a bit and it looks like MERGE is the ANSI standard, whereas UPSERT can be considered an implementation of MERGE with a specific purpose of "update if exists, insert if not exists". Which AFAICS is what Drupal is in need of when currently using MERGE. For the time being, I'd suggest moving ahead here as a generalization of UPSERT, while keeping MERGE for now.
I'd not add a separate
::keys()method here. Rather overload the existing::key()method to acceptstring|array. And maybe deprecate passing a string. Or introduce a::primaryKeyColumns()method to make clear you need to pass all the PK columns to the query.Comment #25
amateescu commentedI arrived at the same conclusion as @mondrake, that it would be beneficial to keep both. I'd suggest we look at improving MERGE in another issue, because that one will need a lot of work to get done properly if we want to fully implement it, including support for deletions in a single statement for example.
UPSERT queries can work with unique keys as well (as documented by the current MR), and generalizing "unique keys" and "primary keys" brings us back to the
keys()method.As for overloading
key()and not addingkeys(), I can't say I have a strong preference, just that I find it somewhat weird to pass an array to akey()method. But as I said, no strong preference, maybe a framework manager can weigh in.That's because it needs the functionality added in this issue, which wasn't available in SQLite when we added the Upsert query class :)
Comment #26
mondrakeI thought about this. TBH I find a bit confusing that, if you have a primary key (which I think it's the case of 100% tables in Drupal today?), you could UPSERT based on a different unique key - because the upsert may try to update or insert a primary key that would fail uniqueness. In other words it seems to me allowing to shoot oneself's foot. But maybe it's not for here.
How about this?
::keyColumns(array $columns)method the array beinglist<string>; so we don't get into primary vs unique debate but we clearly say we need a list of columns (that could be a list with one element only, of course)::key(string $field)method on behalf of::keyColumns()Comment #27
amateescu commentedThat works for me :) Updated the MR and wrote a change record: https://www.drupal.org/node/3565717
Comment #28
mondrakeLeft comments inline. We also need a deprecation test showing that the upsert using ::key() still works. The PostgreSQL test failures seem legit.
Comment #29
needs-review-queue-bot commentedUpdated MR target branch from 11.x to main branch.
Comment #31
ghost of drupal pastHere's the thing:
The original UPSERT issue had this reasoning:
Yeah, I was wrong, it's not another subset, rather it's a subset of the subset that Drupal Merge implements. Merge should have been made smart to use upsert when it can. But now that upsert is more capable you can call INSERT ON DUPLICATE KEY / ON CONFLICT when only ::key/::keys/::fields were called. Then deprecate upsert and drop it in due time. Even if someone used upsert the change is easy and painless because the same API can be kept, it's only the $query =
new Upsertthat needs to be changed tonew Merge, the rest is the same. What's wrong with this plan? I can be wrong again, I sure was before.But I see no reason to keep both. There has been no advantage named to keeping it. None. It was just restated that we shouldn't drop it but not why. Let me ask plain: what advantage would Drupal gain from having both that justifies the doubled maintenance and double the query API people need to learn?
Comment #32
amateescu commented@ghost of drupal past, I looked into how other DBALs (and some ORMs) deal with this distinction, and it seems that Upsert is the more popular option by a large margin:
upsert(),updateOrInsert().on_conflict_do_update()update_or_create(),bulk_create()upsert().onConflict()upsert().onConflictDoUpdate()ON CONFLICT DOclause (6.5+).onConflict(),.mergeInto()upsert(),upsert_all().Clauses()with conflictSo my proposal remains the same, keep the improved Upsert implementation from this issue, and let Merge evolve on its own with proper APIs for what we decide to support from the SQL standard.
Comment #33
mondrakeSince we're at this, I think we should also change
Upsert::execute()to returntrueor fail with an exception.The base class has this, which is a signal of fragility:
The mysqli driver in the MR cannot return an int, see comments inline.
So changing to
truewould just be more solid IMO.Comment #34
mondrakeTests are green on all 4 core drivers.
Comment #35
ghost of drupal pastI am working on https://git.drupalcode.org/project/drupal/-/merge_requests/14356 mostly to demonstrate what I meant: certain merge queries can be run as upserts, this doesn't need a separate upsert API.
Comment #36
amateescu commentedNo one questioned that though. It's pretty obvious that it should work :)
Comment #37
ghost of drupal past> It's pretty obvious that it should work :)
Then why are we keeping upserts? There is still no answer to the why two APIs would be worth it. When I asked above I got an LLM generated table of what others are doing.
Comment #38
amateescu commentedBoth @mondrake and myself answered that in #24, #25 and #32 (the paragraph below the LLM-generated table).
Comment #39
ghost of drupal pastIt seems we have a breakdown in communication. Perhaps I was not clear in my proposal:
The advantages are:
Can you point out any specific weaknesses of this plan? Is there any upsert usage that is not covered like this? What code snippet could you provide that justifies the existence of upsert?
Comment #40
mondrake#39 I would rather say one way communication only.
1) Let me be bluntly clear: I am not interested in debating this, sorry. Both @amatescu and myself already gave our input. I do not have other.
2) This issue title is 'Add support for unique / primary key constraints composed of multiple fields for Upsert queries'. You are deranging the issue. Please open another one where a MERGE vs UPSERT policy decision can be debated by those interested.
Comment #41
chi commentedWhat if the query updated zero records. Will it be considered as failure?
Comment #42
mondrake#16
Actually IMHO we do not need MERGE nor UPSERT there: what is required is simply an INSERT, capturing with a no-op the exception that is triggered if the record exists already based on the PK. One SQL statement executed only vs the two that MERGE currently requires in Drupal.
That'd be a separate issue though.
Comment #43
mondrake#41
That's an impossible situation. Assuming a single row is UPSERTED, either it does not exists yet, and then it is added (=INSERT), or it exists already, and then it is UPDATEd. So one row is always involved. A failure is due either to context/syntax (missing table, wrong key, wrong columns) or to database in invalid state (connection down, database space or quota issue, tech problems, etc). But those conditions trigger execution exceptions, should not be treated as an executed SQL statement.
Same if multiple rows are UPSERTED, with the additional complexity that the rows count for the UPDATEd rows needs to distinguish between rows matched (= the number of rows that are potentially subject to change based on the conditions) and rows changed (= the number of matched rows that had changes to column values not part of the key columns).
So IMHO returning a value from
Upsert::execute()is, simply, irrelevant. We should just catch exceptions due to the factors above. It would even be better to give it avoidreturn type.Now, unfortunately
myqsli::$affected_rowsreturns 0 in case UPSERT updates the row. Hence the comment in its execute() method DOCblock:This makes mysqli inconsistent vs the other drivers, which is just a complication we are adding on our shoulders since we should not care at all about the returned value.
Comment #44
mondrakeLet's revert the mysqli changes then, and see if tests still fail and why.
Comment #45
mondrakemysqli kernel tests fail after the revert, where all other drivers are green.
Comment #46
mondrakeComment #47
mondrakeComment #48
longwaveI think the MR here should not try to change everything at once. Instead let's just concentrate on
upserttaking multiple keys (as per the issue title) and defer the conversion to followups.I also think that
key()being allowed to takestring|arrayis the simpler API name? For the array variant, you are still passing a single composite key, structured as an array, so the name makes sense?Comment #50
longwaveMR!15248 extracts the minimum set of changes from MR!12959, and keeps backward compatibility by retaining the old
key()method.This is a problem for downstream database drivers, but won't this be a problem whatever we do here? As soon as we commit a multiple key upsert to core they won't know how to handle it.
Comment #51
mondrake#50 maybe we could deprecate passing a
stringto::key(), informing that as of D12 contrib/custom drivers have to support composite key as an array of column names? Yes won’t work if the driver overrides the method, but that should be rather an edge case.Comment #52
longwaveIs that worth it though? That forces downstream users of
upsert()to change all existing cases. The current MR only forces database driver maintainers (of which there are probably less than 10) to make a change to support this new feature, which they are going to have to do anyway.Comment #53
mondrake#52 good point. Shall we, then?
Comment #55
alexpottCommitted and pushed 3ae3915e6fa to main and 2e8705a4bb1 to 11.x. Thanks!