A fellow Acquian is facing an issue I've heard of before: the source is a longish CSV file that needs to be synced regularly (new records added and changed records updated in Drupal), but there's no field suitable for highwater marks. How do we detect changed records, so we can avoid rewriting everything on each import? He's using an md5 hash of the source row, which I think is the best that can be done under those circumstances, and I think that could be supported directly in MigrateSource:
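As a rough sketch of the row-hashing idea (in Python rather than the module's PHP, with a made-up `row_hash` helper), any change to any column of the raw row produces a different hash:

```python
import csv
import hashlib
import io

def row_hash(raw_row):
    """md5 the raw source row so any change to any field changes the hash."""
    # Join fields with a separator unlikely to appear in the data;
    # hashing the raw line text would work equally well.
    return hashlib.md5("\x01".join(raw_row).encode("utf-8")).hexdigest()

# A stand-in for the CSV source file.
source = io.StringIO("id,title,body\n1,Hello,First post\n2,World,Second post\n")
reader = csv.reader(source)
next(reader)  # skip the header row
hashes = {row[0]: row_hash(row) for row in reader}
```

On the next import, recomputing the hash for an incoming row and comparing it to the stored one tells you whether the row changed.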
- Add a hash column to the migrate map table.
- Support a track_hashes option to the MigrateSource constructor.
- When track_hashes is enabled, take a hash of the raw source row (before prepareKey() or anything else is called) and save it away.
- If the source row already has a map table entry, compare the incoming hash to the saved hash. If they match, skip the row entirely (before even calling prepareRow).
- When saveIDMapping is called after a row is processed, save the hash along with the mapping.
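Pulled together, the steps above might look like this sketch (again Python, not the module's PHP; the dict `map_table` stands in for the migrate map table with its new hash column, and `process_row` stands in for the real import pipeline):

```python
import hashlib

def source_hash(raw_row):
    # Hash the raw row before any preparation, per the scheme above.
    return hashlib.md5("\x01".join(raw_row).encode("utf-8")).hexdigest()

def import_rows(rows, map_table, process_row, track_hashes=True):
    """Import rows, skipping any whose hash matches the saved one.

    map_table maps source key -> {'destid': ..., 'hash': ...}.
    Returns (imported, skipped) counts.
    """
    imported = skipped = 0
    for row in rows:
        key = row[0]  # assume the first column is the source key
        incoming = source_hash(row) if track_hashes else None
        entry = map_table.get(key)
        if track_hashes and entry and entry['hash'] == incoming:
            skipped += 1  # unchanged row: skip before any row preparation
            continue
        destid = process_row(row)
        # saveIDMapping equivalent: record the destination ID and the hash.
        map_table[key] = {'destid': destid, 'hash': incoming}
        imported += 1
    return imported, skipped
```

Running this twice over the same rows imports everything the first time and skips everything the second; edit one row and only that row is reimported.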
I'm focused on the wizard API work at the moment, but patches welcome....