Merge queries using db_merge

Last updated on
14 October 2016

Merge queries are a special type of hybrid query. Although a syntax is defined for them in the SQL 2003 specification, virtually no database supports the standard syntax. Most, however, provide some alternate implementation using a database-specific syntax. The Merge query builder in Drupal abstracts the concept of a Merge query out into a structured object that can be compiled down to the appropriate syntax for each database. These are sometimes called "UPSERT" queries, a combination of UPDATE and INSERT.

In the general sense, a Merge query is a combination of an Insert query and an Update query. If a given condition is met, such as a row with a given primary key already existing, then an Update query is run. If not, an Insert query is run. In the most common case, it is equivalent to:

if (db_query("SELECT COUNT(*) FROM {example} WHERE id = :id", array(':id' => $id))->fetchField()) {
  // Run an update using WHERE id = $id
}
else {
  // Run an insert, inserting $id for id 
}

The actual implementation varies widely from database to database. Note that while Merge queries are conceptually an atomic operation, they may or may not be truly atomic depending on the implementation for a specific database. The MySQL implementation is a single atomic query, for example, but the degenerate case (above) is not.

The most common idioms for Merge queries are listed below.

Just set it

db_merge('example')
  ->key(array('name' => $name))
  ->fields(array(
      'field1' => $value1,
      'field2' => $value2,
  ))
  ->execute();

In the above example, we instruct the query to operate on the "example" table. We then specify one key field, 'name', with a value of $name. We then specify an array of values to set.

If a row already exists in which the field "name" has the value $name, then fields field1 and field2 will be set to the corresponding values in that existing row. If such a row does not exist, one will be created in which name has the value $name, field1 has the value $value1, and field2 has the value $value2. Thus at the end of the query, the end result is the same regardless of whether or not the row already existed.

Conditional set

In some cases, you may want to set values differently depending on whether or not the record, as identified by the key() fields, already existed. There are two ways to do that.

db_merge('example')
  ->insertFields(array(
      'field1' => $value1,
      'field2' => $value2,
  ))
  ->updateFields(array(
    'field1' => $alternate1,
  ))
  ->key(array('name' => $name))
  ->execute();

The above example will behave the same as the first, except that if the record already exists and we are updating it, field1 will be set to $alternate1 instead of $value1 and field2 will not be affected. The updateFields() method accepts either a single associative array of values or two parallel numeric arrays, one of fields, one of values, that must be in the same order.

db_merge('example')
  ->key(array('name' => $name))
  ->fields(array(
      'field1' => $value1,
      'field2' => $value2,
  ))
  ->expression('field1', 'field1 + :inc', array(':inc' => 1))
  ->execute();

In this example, if the record already exists then field1 will be set to its current value plus 1. That makes it very useful for "counter queries", where you want to increment some counter in the database every time a certain event happens. field2 will still be set to the same value regardless of whether the record exists or not.

Note that expression() may be called multiple times, once for each field that should be set to an expression if the record already exists. The first parameter is the field, the second is an SQL fragment indicating the expression the field should be set to, and the optional third parameter is an array of placeholder values to insert into the expression.

There is also no requirement that a field used in expression() be already present in fields().

Precedence

Given the above API it is quite possible to define queries that do not logically make sense, say if a field is set to both be ignored and to be set to an expression if the record already exists. To minimize potential errors, the following rules apply:

  • If a field is set to an expression(), that takes priority over updateFields().
  • If values are specified in updateFields(), only those fields will be altered if the record already exists. Fields not specified in updateFields() will not be affected.

Note that it may still be possible to define queries that do not make sense. It is up to the developer to ensure that a nonsensical query is not specified as the behavior in that case is undefined.