Hello,
I have an existing Drupal 8.5 site with hundreds of nodes. When I run a migration, new nodes are created instead of the existing ones being updated. The nodes have a field that is being used as the key, but that does not seem to work: nodes appear to be updated only if a record for them is present in the migrate_map_xyz table. Any suggestions?

CSV data

serial,type
ABC123,Laptop
XYZ124,Laptop
QRS567,Desktop

Comments

ranavaibhav created an issue. See original summary.

dillix’s picture

Version: 8.5.0-rc1 » 8.6.x-dev
ranavaibhav’s picture

Any advice on this please?

quietone’s picture

@ranavaibhav, How are you running the migration, drush? Do you want to overwrite the existing node or just some fields on the node?

mikeryan’s picture

You'll need to set overwrite_properties in your destination configuration to a list of the D8 fields you wish to overwrite - see https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugi....
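For reference, a minimal sketch of what such a destination section might look like (the field names here are illustrative placeholders, not taken from this migration):

```yaml
destination:
  plugin: entity:node
  overwrite_properties:
    # Only the destination fields listed here are overwritten when an
    # existing node is updated; all other fields are left untouched.
    - title
    - field_type
```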

ranavaibhav’s picture

@quietone - yes, using drush with the --update flag. In my example there are three fields, one of which is the unique key; the other two are being updated.

@mikeryan - correct, I do have that overwrite_properties set. Here is updated code https://codeshare.io/5PPppd

ranavaibhav’s picture

Is it possible at all to update pre-existing content via Migrate?

mikeryan’s picture

Yes. Note that overwrite_properties must list the destination properties to be overwritten, not the source properties that feed them.

ranavaibhav’s picture

This is really not working. I have tried almost every available option.

Recap
At present there is "existing" content that was created manually prior to migration. For those nodes, field_type needs to be updated; field_serial is the key. With the code below, the migration creates new nodes instead of updating the existing ones.

Node Fields
(screenshot attached: 2949564_node_fields.PNG)

YML:

id: simple_assets
migration_group: asset_import
label: Simple Assets

source:
  plugin: csv
  path: public://migrate/simple_assets.csv
  delimiter: ','
  enclosure: '"'
  header_row_count: 1
  keys:
    - serial
  column_names:
    0:
      serial: 'Serial/Service Tag'
    1:
      type: 'Type'
  track_changes: true

process:
  type:
    plugin: default_value    
  uid:
    plugin: default_value
    default_value: 19
  title:
    plugin: concat
    source:
      - type
      - serial
    delimiter: ' Delimiter'
  field_serial: serial
  field_type:
    plugin: entity_generate
    source: type

destination:
  plugin: entity:node
  default_bundle: article
  translations: true
  overwrite_properties:
    - field_type
juliakoelsch’s picture

I also was unable to update existing content. I tried to trace the use of overwrite_properties in the code, but could not figure out the issue.

Our YML file was very similar to yours, @ranavaibhav: csv source, trying to update one field -- a multi-value text field. I tried keying off of a field in the node (product number), and I also updated our csv to add a column with node id to key off of that, and neither worked.

I scoured the internet for a working example and couldn't find one. Would someone who has been successful with an import that only updates existing nodes post their yml file?

mikeryan’s picture

As it happens, I just minutes ago published a blog post and corresponding code - although this post doesn't discuss overwrite_properties (that's planned in a follow-up), you can see the working configuration at https://gitlab.com/mikeryan/d8-migrate-example-002/blob/master/acme_migr....

mikeryan’s picture

Oh, @ranavaibhav - I see the reason new nodes are created is that you aren't specifying what nodes to update - you need to map the nids in your processing pipeline.

I also see "type:" with a default_value plugin but no default_value specified. Since you have default_bundle specified in your destination plugin, you should just remove the mapping to type from the process section.
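Putting those two points together, the start of the process section might look like this (nid_in_source is a hypothetical name for a source column holding the node ID to update; it is not part of the original CSV shown above):

```yaml
process:
  # No 'type' mapping needed: default_bundle in the destination
  # section already sets the bundle.
  nid: nid_in_source  # hypothetical column naming the node to update
  uid:
    plugin: default_value
    default_value: 19
```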

mikeryan’s picture

Status: Active » Postponed (maintainer needs more info)

Well, I went ahead and published that blog post, adding a paragraph for the case where you're updating already-existing content. The key example lines not previously in this thread:

process:
  nid: nid_to_update

Where "nid_to_update" is a column in your CSV of the nids you're updating.

juliakoelsch’s picture

Thank you @mikeryan! I'll check out your blog posts, and compare to my efforts to see what I am doing incorrectly. I really appreciate you taking the time to share your knowledge.

ranavaibhav’s picture

Echo @Spry_Julia. Thank you so very much @mikeryan. Your blog posts are extremely helpful.

I guess I misunderstood the concept, then. The catch here was that I needed to specify nid_to_update in my process pipeline. I was under the impression that the "key" (in my case, "serial") would be used during processing to look up the existing entity, but that is not the case.

In my setup, nodes are created manually by many people during the day, and during the nightly data sync we need certain nodes updated for which the manual entry was incorrect.

Because the nodes are created manually, the map table has no knowledge of these entries, so I now have to find a way to extract the NIDs for those nodes first and populate them in the CSV for the migration to succeed.

ranavaibhav’s picture

I thought I would share how I got around this problem in D7. This is a pure hack, but it worked like a charm. I suspect a process plugin could be built here to look up an existing entity via a defined key, to avoid duplicate entries.

function migrate_mapper_assistance_node_insert($node) {
  if ($node->type == 'asset') {
    // Extract the Service Tag value from the node.
    $service_tag = $node->field_service_tag['und'][0]['value'];

    // Check whether the 'migrate_map_assetmigration' table already has a
    // record for this Service Tag. If not, create one so the migration
    // treats this node as already migrated instead of creating a duplicate.
    $st_query = db_query('SELECT sourceid1 FROM {migrate_map_assetmigration} WHERE sourceid1 = :service_tag', array(':service_tag' => $service_tag))->fetchField();

    if (empty($st_query)) {
      db_insert('migrate_map_assetmigration')
        ->fields(array(
          'sourceid1' => $service_tag,
          'destid1' => $node->nid,
          'needs_update' => 0,
          'rollback_action' => 0,
          'last_imported' => 0,
          'hash' => '',
        ))
        ->execute();
    }
  }
}

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

gravit’s picture

It is a little odd that we are all the way to 8.6 and this is still an issue, as this was a major use case in Drupal 7 with the Feeds module.

Currently I have also had no success tricking migrate into updating existing NIDs that weren't originally created by migrate.

This design just does not work in a user-land scenario where you have a live website that has data being created by users through the UI, and you want to use migration to update fields of that data from external sources. (Something like a "stock quantity" field for a product SKU for example).

@ranavaibhav - maybe a solution would be some type of shim / map generate module that would look at the new node IDs on a production site, and inject a key to the map table - so that you could use migration to update them. Ideally, a migrate plugin should allow us to just specify a GUID (just like feeds) to go and update content regardless of which module was the creator of that content...

mikeryan’s picture

@gravit: If you map the nid you want updated in the process pipeline, and specify the properties you want to alter in overwrite_properties, it does work just fine - we're doing it in my current project.

If the data source doesn't have the desired nid directly, if you're using migrate_plus you can use the entity_lookup process plugin to look it up from your unique key, see https://www.mtech-llc.com/blog/ada-hernandez/entity-lookup-generate-migr....
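A minimal sketch of that lookup approach (configuration keys per migrate_plus's entity_lookup plugin; the bundle and field names are placeholders, not taken from this thread's migrations):

```yaml
process:
  nid:
    # Find the existing node by matching the source's unique key
    # against a field on the destination entity; the resulting nid
    # tells the destination plugin which node to update.
    plugin: entity_lookup
    source: serial
    entity_type: node
    bundle_key: type
    bundle: article
    value_key: field_serial
```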

joekers’s picture

Just wanted to say thanks @mikeryan for your comments here and your blog posts.

I managed to get the overwrite_properties working well where I had one migrate importing product variations and product entities from one database, then a different migrate updating specific fields on these entities from a separate database.

mikeryan’s picture

Status: Postponed (maintainer needs more info) » Fixed

You're welcome!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

aharown07’s picture

Old thread, but this one still shows up frequently in searches on how to use migrate to update already existing Drupal content. There doesn't seem to be a lot out there on this topic.
So, here is some more detail that some may find helpful.

My scenario: I had several thousand media entities whose created dates I needed to alter. They originally came from an outside data source using migrate_source_csv, but I forgot that the import would give each media item a 'created' date equal to the time of the import, and I didn't include the correct data in the csv source.
Of course, the csv source doesn't have mid numbers in it for matching the existing media items.

So here's an example of how to update existing media items with csv data, when you don't have mid numbers in the csv but do have another unique identifier you can use.

Note: this updated the existing items, but at the end of the import the status readout says "xxx created" and "0 updated." I don't know why; I don't have any duplicates. My guess is that in a single import config file situation, 'updated' is not tracked.

I use tab as the delimiter in my 'csv' files.
field_legacyid is where the old unique identifier is stored on each media item from the original migration.
The media items being updated are of a custom type: document.

The entity_lookup processor is necessary because the source csv lacks mid values.

uuid: 1bcec3e7-0a49-4473-87a2-6dca09b91abjan-docmedupd
id: doc_mediaupdate
label: Import media field data to set create date
migration_group: updates

source:
  plugin: 'csv'
  path: '/srv/imports/docmediaupd.tab'
  delimiter: "\t"
  enclosure: '"'
  header_offset: null
  ids: [legacyid]
  fields:
    0:
      name: legacyid
      label: 'Unique identifier'
    1:
      name: legacycreatedate
      label: 'Original created date' 

process:
  mid:
    #use legacyid to find matching media items
    #and pull mid from each
    #by matching legacyid to field_legacyid
    plugin: entity_lookup
    source: legacyid
    entity_type: media
    buldle-key: bundle
    bundle: document
    value_key: field_legacyid
    ignore_case: true
  created:
    #set created date: unix format is required, but
    #source data is in n/j/Y format, so... convert to U
    plugin: format_date
    from_format: 'n/j/Y'
    to_format: 'U'
    from_timezone: 'UTC'
    to_timezone: 'UTC'
    source: legacycreatedate

destination:
  plugin: entity:media
  default_bundle: document
  overwrite_properties:
    - created

(Edit: there are a couple of typos in the bundle_key line above, but this config worked. I don't know why it worked.)

This was difficult to piece together and involved much trial and error, but it seems to have had the desired result.
In Views, this created date is currently labeled "Authored on" for media.

One other detail: having never attempted an update before, the drush syntax wasn't clear to me: drush migrate:import [migrationname] --update

Here's an example for nodes

Note: This information should almost certainly go in the migrate documentation pages somewhere, but I don't know where. If someone can point me in the right direction, I'll move it there and link.

In this example, I have a few thousand nodes of the content type "Source." The content type has a custom field: field_tcounty. For some nodes, this field was not populated during a prior migration.
Goal: import county data into field_tcounty on these nodes only.
The source csv does not contain NID values but has a unique id field: srcid. This field matches the values in a custom field in the Source content type: field_source_id.

This example is also from a tab delimited source file. The file consisted of two 'columns' with no header.

# This worked. Assumes the CSV source does not have
# NIDs in it. NIDs are obtained by reference to the 
# the nodes that match the "source:" field under the 
# lookup plugin
# To run: drush migrate:import srccounties --update
uuid: 1bcec3e7-0a49-4473-87a2-6dca09b91abaxsrcupd
id: srccounties
label: Import Source counties as an update
migration_group: updates

source:
  plugin: 'csv'
  path: '/srv/imports/SourceCounties.tab'
  delimiter: "\t"
  enclosure: '"'
  header_offset: null
  ids: [srcid]
  fields:
    0:
      name: srcid
      label: 'Unique Id'
    1:
      name: county
      label: 'County'

process:
  nid:
    plugin: entity_lookup
    source: srcid
    entity_type: node
#   bundle_key: type
#   failed when bundle_key was used
#   worked with the bundle_key line commented out
    bundle: source
    value_key: field_source_id
    ignore_case: true
    access_check: 0
  field_tcounty:
    source: county
    plugin: get

destination:
  plugin: entity:node
  default_bundle: source
  overwrite_properties:
    - field_tcounty

migration_dependencies: null
annie2512’s picture

This really helped me. Thanks @mikeryan and @aharown07

I have a Faculty content type that is already populated, and one of its fields has to be populated from an external API. I was able to get this working using overwrite_properties and the entity_lookup plugin (with the skip_on_empty plugin used to skip the row if entity_lookup does not find a Faculty node). I also used drush migrate-import --update to update all nodes, including previously imported items, with the current data. Hope this helps someone.
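For anyone looking for the shape of that pipeline, a sketch of chaining entity_lookup with skip_on_empty (the field and bundle names here are placeholders, not from the actual migration):

```yaml
process:
  nid:
    -
      # Find the existing Faculty node by its unique identifier field.
      plugin: entity_lookup
      source: faculty_id
      entity_type: node
      bundle_key: type
      bundle: faculty
      value_key: field_faculty_id
    -
      # If no matching node was found, skip the whole row instead of
      # letting the destination create a new node.
      plugin: skip_on_empty
      method: row
```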

ressa’s picture

Great thread, it really helped me a lot. I was looking for a way to import multiple CSV files sharing a unique nid. I wanted to run multiple migrations, each populating different fields of a content type, and I really wanted to avoid concatenating all the CSV files into a single file.

The key was to also map the nid, as outlined by @mikeryan:

process:
  nid: nid_to_update

Where "nid_to_update" is a column in your CSV of the nids you're updating.

That was all that was needed. No need to add extra overwrite_properties parameters.

arnaud-brugnon’s picture

#23 is probably the closest solution.

Using the entity_lookup plugin is probably the best approach (nid_to_update is probably the ugliest one).

One precision to make entity_lookup work like a charm: don't omit any lookup key (otherwise the entity lookup will fail).
That means you have to set entity_type, bundle, bundle_key and value_key.
bundle_key may seem overkill, but you have to define it.

Full migration file here

id: tax_classification_import
label: Import tax classifications
migration_group: product_import

source:
  plugin: import_tax_classification
  data_fetcher_plugin: http
  data_parser_plugin: json
  verify: FALSE
  fields:
    - name: code
      label: 'Code'
      selector: productTaxClassification/code
    - name: taxRate
      label: 'Tax rate'
      selector: tax/taxRate
  ids:
    code:
      type: string
  request_options:
    verify: false

process:
  tid:
    plugin: entity_lookup
    source: code
    entity_type: taxonomy_term
    bundle: tax_classification
    bundle_key: vid
    value_key: field_code
    ignore_case: true
    access_check: 0
  vid:
    plugin: default_value
    default_value: tax_classification
  name: code
  field_code: code
  status:
    plugin: default_value
    default_value: 1
  field_rate: taxRate

destination:
  plugin: 'entity:taxonomy_term'
  overwrite_properties:
    - field_rate
nick_sh’s picture

I've tried to update already-existing Commerce products, but I got an error: "Duplicate entry '1045' for key 'PRIMARY': INSERT INTO commerce_product".

process:

  product_id:
    plugin: entity_lookup
    source: Nr
    entity_type: commerce_product
    bundle_key: type
    bundle: course
    value_key: field_course_nr_
    ignore_case: true
    access_check: 0

What should I do here?