In ContentLoader::entityExists()
on line 617, the existence check matches all fields on the entity that aren't arrays when checking for the existence of an entity.
// Load entity type handler.
$entity_handler = $this->entityTypeManager->getStorage($entity_type);
// @todo Load this through dependency injection instead.
$query = \Drupal::entityQuery($entity_type);
foreach ($content_data as $key => $value) {
if ($key != 'entity' && !is_array($value)) {
$query->condition($key, $value);
}
}
$entity_ids = $query->execute();
I have a Taxonomy (a bunch of locations) where the terms have several plain-text fields (field_room_code, field_building_code) as well as more complex fields (e.g. an address in field_address). The array-check properly excludes the address from the query, but the plain-text fields are included query. Because my plain-text fields are included in the query, any changes to them result in duplicate taxonomy terms.
One potential fix for this would be to change:
if ($key != 'entity' && !is_array($value)) {
To:
if ($key != 'entity' && !is_array($value) && !preg_match('/^field_/', $key)) {
Comment | File | Size | Author |
---|
Comments
Comment #2
adamfranco CreditAttribution: adamfranco at Middlebury College commentedAttached is a patch that makes the change suggested.
If using custom fields for existence-checking is a desired feature, it would be nice to have those fields be specified in some way so that other fields can be updated and the content re-imported with the updates.
Comment #3
adamfranco CreditAttribution: adamfranco at Middlebury College commentedComment #4
sluceroThe main concern I have with this approach is the assumption that all fields to be excluded will follow the pattern of
field_*
. While this is true of fields created through the field UI, it's not universally true since field created from code may not be prefixed in this way.The question of whether to include content fields in the existence identifications, however, comes down to the use case and structure being used for the content imports.
In most cases, the content is being made available for installation, but it isn't expected to be the sole source of the content for the site. Based on this workflow, defining IDs within the content file opens up a lot of risk for content overlap and ID clashing. In this scenario all of the content should be defined without assigned ID values and existence checking would require comparison of at least some subset of the content fields. To your point, being able to configure which fields are to be used for this comparison could be a very helpful feature.
Comment #5
sluceroAs an alternative workaround for this issue of which fields are being used in the entity comparison you should be able to define even the plain text field values with an array structure which would exclude them from the comparison. For example, a plain text field may be populated in either of these ways:
Comment #6
adamfranco CreditAttribution: adamfranco at Middlebury College commentedThanks for the feedback and work-around suggestion. My use-case for the yaml_content module is to allow version-controlling of taxonomy terms and similar throughout the life of a site and allow these to be migrated from a development server to production.
The eventual workflow will be:
drush cim
anddrush yaml-content-import-profile profile_name
to activate the new configuration and content.Since YAML content doesn't support export now, I'm just manually updating the YAML taxonomy and then importing.
I appreciate the downside of basing decisions off field names. As for identity determination, what do you think about a configuration that maps an entity_type string to a set of fields to use to build an identifier when importing? The default case could be the current behavior of all fields, with a configuration entry for an entity_type reducing the fields compared. I can work up a patch that implements this if that would be helpful.
Comment #7
sluceroI think you're identifying a very good opportunity to make the system more flexible through optional configuration. I think it would be very important to maintain the current behavior as a default to keep the system working with minimal overhead, but added options for customization is a good direction to go.
This is the route I've been going with the v2 rewrite to support a more plug-able system for the processing commands. Depending how it was written, this added behavior for customizing and extending the behavior for identity matching could be very valuable and something I agree would be very helpful.
Providing a patch to start this work would be very helpful if still interested. I could see a number of ways for this to be implemented cleanly including:
I'm definitely open to other suggestions as well, or even just a proof of concept, but I would prefer to extract the content matching logic from the ContentLoader to make it more maintainable.
Comment #8
sluceroSetting back to "Active" and relating to #2893055: Expand logic for identifying content to update since discussion is relating back to identifying existing content.
Comment #9
sluceroChanging this to a feature request since the functionality currently works as expected, but a new configuration option is being proposed.
Comment #10
sluceroRelabeling the ticket to more effectively describe the new feature being requested.
Comment #11
sluceroIn line with these comments above in #7, I've created #2949625: Move Entity Loading and Lookup into Helper Service to work on extracting this logic into a centralized and reusable service.
Next steps
Comment #12
sluceroMarking this as postponed until work in #2949625: Move Entity Loading and Lookup into Helper Service can be completed and stabilized.