By default, each time a migration is run, any previously unimported source items are imported (along with any previously-imported items marked for update). If the source data contains a timestamp that is set to the creation time of each new item, and changed to the update time every time the item is updated, then you can have those updated items automatically reimported by setting the field as your highwater field.

To take advantage of highwater marks, define $this->highwaterField in your Migration constructor to indicate which field returned by your source query reflects updates to your data.

So, if your query looks like

    $query = db_select('migrate_example_wine', 'w')
             ->fields('w', array('wineid', 'name', 'body', 'excerpt', 'accountid',
              'posted', 'last_changed', 'variety', 'region', 'rating'));

and the last_changed field is a (UNIX integer) timestamp that is set to the created date/time when the source object is created, and updated whenever it is changed, in your constructor include:

    $this->highwaterField = array(
      'name' => 'last_changed',
      'alias' => 'w',  
      'type' => 'int', 
    );

Because the last_changed field is a UNIX timestamp, 'type' => 'int' is required here. If you have date/time fields that are lexicographically sortable (e.g., '2011-05-19 17:53:12'), you can omit the 'type' entry.
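The difference between the two kinds of comparison can be illustrated in plain PHP. This is a minimal sketch of the general principle, not Migrate's own code; the function names is_newer_as_string() and is_newer_as_int() are made up for the illustration.

```php
<?php
// Hypothetical helper: compare two values as plain strings (lexicographically).
function is_newer_as_string(string $value, string $mark): bool {
  return strcmp($value, $mark) > 0;
}

// Hypothetical helper: compare two values as integers.
function is_newer_as_int(string $value, string $mark): bool {
  return (int) $value > (int) $mark;
}

// Integer timestamps with different digit counts sort wrongly as strings,
// because '1' sorts before '9'.
$old = '999999999';
$new = '1000000000';
var_dump(is_newer_as_string($new, $old)); // bool(false)
var_dump(is_newer_as_int($new, $old));    // bool(true)

// Fixed-width date/time strings sort correctly as strings.
var_dump(is_newer_as_string('2011-05-19 17:53:12', '2011-05-18 09:00:00')); // bool(true)
```

Fixed-width formats such as 'YYYY-MM-DD HH:MM:SS' keep chronological and string order in agreement, which is why they need no numeric handling.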

If your source is an SQL database, sort your query by the highwater field:

    $query->orderBy('last_changed');

What precisely happens when your migration is set up to use highwater marks?

  1. The first time you import, Migrate saves the highwater field value as the "highwater mark".
  2. Over time, new content is added (with last_changed values greater than the highwater mark saved by Migrate), and old content is updated (changing the last_changed value to be greater than the highwater mark).
  3. The next time you run import, Migrate will automatically alter your source query to pull all content where the highwater field is greater than its saved highwater mark. That means it will both import any content added since the last time it ran, and also re-import any content changed since that time.
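The three steps above can be sketched as a small standalone simulation. It is an illustration under stated assumptions, not Migrate's implementation: run_import() is a hypothetical stand-in for one import run, and the rows are plain arrays using the wineid and last_changed columns from the example query.

```php
<?php
/**
 * Hypothetical stand-in for one import run.
 * Returns the IDs that were imported and the new highwater mark.
 */
function run_import(array $rows, ?int $mark): array {
  $imported = [];
  $new_mark = $mark;
  foreach ($rows as $row) {
    // The first run (no mark yet) takes every row; later runs take only
    // rows whose last_changed is greater than the saved mark.
    if ($mark === null || $row['last_changed'] > $mark) {
      $imported[] = $row['wineid'];
      $new_mark = ($new_mark === null)
        ? $row['last_changed']
        : max($new_mark, $row['last_changed']);
    }
  }
  return [$imported, $new_mark];
}

// Step 1: first import, everything is new.
$rows = [
  ['wineid' => 1, 'last_changed' => 100],
  ['wineid' => 2, 'last_changed' => 200],
];
[$first, $mark] = run_import($rows, null);

// Step 2: wine 3 is added and wine 1 is edited on the source.
$rows = [
  ['wineid' => 2, 'last_changed' => 200],
  ['wineid' => 3, 'last_changed' => 250],
  ['wineid' => 1, 'last_changed' => 300],
];

// Step 3: the next run picks up only the new and the changed row.
[$second, $mark] = run_import($rows, $mark);

echo implode(',', $first), "\n";  // 1,2
echo implode(',', $second), "\n"; // 3,1
echo $mark, "\n";                 // 300
```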

Thus, if you want to schedule (e.g., via cron) regular updates of your destination site taking into account both inserts and updates of your source data, highwater marks are very useful.

Comments

chrowe:

Is there a way to check whether the destination has been modified since the last migration, and update the destination node only if it is unchanged, so that edits made on the destination are not overwritten?

dalin:

Beware: if, for example, your highwater field is a datetime, you cannot do something like this in prepare():

    $row->$date_field = str_replace(' ', 'T', $row->$date_field);

The highwater mark stored by Migrate will then contain the 'T', but when Migrate compares it with the date of a row it will do something like

    if ('2012-02-04 18:02:00' <= '2012-02-04T17:01:45') {
      // Highwater mark not reached.
      continue;
    }

Solution: Grab the field twice in your source query, once to import, and once just for the highwater mark.
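A minimal standalone sketch of both the problem and the suggested fix follows. The names prepare_row() and last_changed_import are hypothetical, chosen for the illustration. With a db_select() query, the second copy could presumably be selected with $query->addField('w', 'last_changed', 'last_changed_import'), which is an assumption about how the commenter's advice would be applied rather than something stated in the comment.

```php
<?php
// The stored mark carries the 'T' written during prepare; the raw source
// value of a row changed an hour later still contains a space.
$stored_mark = '2012-02-04T17:01:45';
$raw_row     = '2012-02-04 18:02:00';

// A space (0x20) sorts before 'T' (0x54), so the later row compares as older.
$looks_old = strcmp($raw_row, $stored_mark) <= 0;
var_dump($looks_old); // bool(true)

/**
 * Hypothetical prepare step: reformat only the copy used for import and
 * leave the highwater copy exactly as the source returned it.
 */
function prepare_row(stdClass $row): stdClass {
  $row->last_changed_import = str_replace(' ', 'T', $row->last_changed_import);
  return $row;
}

$row = prepare_row((object) [
  'last_changed'        => '2012-02-04 18:02:00', // highwater copy, untouched
  'last_changed_import' => '2012-02-04 18:02:00', // copy mapped to the destination
]);

echo $row->last_changed, "\n";        // 2012-02-04 18:02:00
echo $row->last_changed_import, "\n"; // 2012-02-04T18:02:00
```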

________________________
Dave Hansen-Lange
Director of Technical Strategy, FourKitchens.com

Benjamin Birkenhake:

Does the highwater mark also work with XML migrations?
I tried a few things, but always get the same error:

"Undefined property: stdClass::$field_highwater File /sites/all/modules/contrib/migrate/includes/source.inc, line 304"

And another question: what does this mean: "(along with any previously-imported items marked for update)"?
How do I mark items for update?
I read through the whole documentation but could not find anything on that …

dgtlmoon:

I think it's a DB only thing :(

mvdve:

It is possible when the XML has a field with time or an update counter.

rogernyc:

Is it possible to apply more than one highwater mark to a single migration?

For example, to update a row if either a 'last_changed' field in the primary source table OR a 'last_changed' field in a left-joined source table has changed.

thanks.

neruda001:

Hi,
I have a date field to use as the highwater field, and no time information is provided to me, so two edits to a source row on the same day result in the same highwater value. If content is updated many times a day and the migration also runs many times a day, only one update per day will be migrated. Later updates to the same source row on the same day will not be migrated, because the Migrate module only considers records whose highwater field is greater than (>) the last imported highwater value.
When you have no time information and use a date as the highwater field, consider modifying the Migrate module to pick up source records whose highwater field is greater than or equal to (>=) the last imported highwater value.

In this scenario I edited line 322 of /plugins/source/sql.inc (>= instead of >):

    $conditions->condition($highwater, $this->activeMigration->getHighwater(),'>=');

and line 313 of /includes/source.inc (>= instead of >):

    if ($row->{$this->highwaterField['name']} >= $this->activeMigration->getHighwater()) {

This applies globally to all migrations, including those that use a date and time, or some other type of field, as the highwater field.

I'm considering whether it would be useful to propose this as a per-migration configuration option in a future version of the Migrate module.

Pablo

pavel.taikov:

I use a custom JSON source whose data is not sorted by the highwater field. I have noticed that the highwater mark in the migrate_status table is updated after each item processed during a migration. So if an item needs to be updated but follows another item with a greater highwater value, the update does not happen. Am I right, or is there another reason? Is there an elegant solution besides sorting the source?
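The effect described in this comment can be reproduced with a small simulation. It assumes, as the comment reports, that the mark advances after every processed item; import_with_running_mark() is a hypothetical stand-in and not Migrate code. The sketch only shows the sorting workaround, applied to the decoded items before they are processed.

```php
<?php
/**
 * Hypothetical stand-in: processes items in the given order and advances
 * the mark after each imported item (the behaviour described above).
 */
function import_with_running_mark(array $items, int $mark): array {
  $imported = [];
  foreach ($items as $item) {
    if ($item['changed'] > $mark) {
      $imported[] = $item['id'];
      $mark = $item['changed'];
    }
  }
  return $imported;
}

// Both items changed after the saved mark of 100, but arrive out of order.
$items = [
  ['id' => 'a', 'changed' => 300],
  ['id' => 'b', 'changed' => 200],
];

// Unsorted: 'a' raises the mark to 300, so 'b' (200) is skipped.
$unsorted_result = import_with_running_mark($items, 100);

// Sorted ascending by the highwater value: both items are imported.
usort($items, function (array $x, array $y): int {
  return $x['changed'] <=> $y['changed'];
});
$sorted_result = import_with_running_mark($items, 100);

echo implode(',', $unsorted_result), "\n"; // a
echo implode(',', $sorted_result), "\n";   // b,a
```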