When calculating hash, all data from source is taken into an account even when not used by any of the mappings. In some cases this causes big unnecessary hit for performance.

Eg. if you are creating user accounts from columns "name" and "email". In addition to those source contains "changed" timestamp. Whenever changed timestamp is updated user_save is triggered but without any real changes to update.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

mikran’s picture

This patch changes the hash in a way that it's calculated from mapped data only.

Status: Needs review » Needs work

The last submitted patch, improved_hash_calculation-1950182-1.patch, failed testing.

mikran’s picture

Ohh fake columns need additional isset() check there.

mikran’s picture

Status: Needs review » Needs work

The last submitted patch, improved_hash_calculation-1950182-4.patch, failed testing.

mikran’s picture

Last patch for the day. Improved fake mappings and lying comments removed.

mikran’s picture

Status: Needs work » Needs review
twistor’s picture

Title: Irrelevant source updates are causing unnecessary performance hit » Only update when mapped fields are updated.
Category: bug » feature
Status: Needs review » Needs work
+++ b/plugins/FeedsProcessor.incundefined
@@ -758,12 +758,21 @@ abstract class FeedsProcessor extends FeedsPlugin {
-    return hash('md5', serialize($item) . serialize($this->config['mappings']));
+    $mapped_item = array();
+    foreach ($this->config['mappings'] as $mapping) {
+      if (isset($item[strtolower($mapping['source'])])) {
+        $mapped_item[$mapping['source']] = $item[strtolower($mapping['source'])];
+      }
+      else {
+        // non-existent sources are sometimes used to trigger updates. Ensure
+        // that hash is changed if mapping settings are changed.
+        $mapped_item[$mapping['source']] = $mapping;
+      }
+    }
+    return hash('md5', serialize($mapped_item));

This should just use array_key_intersect().

We can't special-case for the CSV parser here. Oh how I hate it for its lowercase shenanigans. I guess we'd have to add a method on parsers, something like getMappingSourceList() or a better name.

The old way of appending serialize($this->config['mappings']) should stay as well.

mikran’s picture

Status: Needs work » Needs review
FileSize
3.48 KB
2.37 KB

My friday-mind can't come up with a better name either but proposed changes are here

rudiedirkx’s picture

This is the full source hash, or hash per item?

mikran’s picture

re #10: hash per item

mikran’s picture

Issue summary: View changes

typo

joelpittet’s picture

Issue summary: View changes

@mikran What about making this configurable/hookable so that you can filter out the ones you don't want to include and then you can use that $mappings in the API docs as an example if you want to only used mapped ones?

That way you won't break BC. I currently use the other fields as extra insurance that a field has updated.

mikran’s picture

Yes it is a BC break but can you explain the use case a bit more? Feeds is still an alpha and I'm facing this issue in every project so I'd imagine it's quite common.

MegaChriz’s picture

Looks good to me. In the attached patch, I have moved the test to a separate test method, so that this feature is more visible in the tests. This is handy when the time comes to port all recently added features of Feeds to the D8 version.
I did notice that when only the order of the source columns changed, items were updated regardless of this change. For example, in case of a CSV, when the colums were first in this order: "guid,title,body" and then a column is moved so that the order becomes "title,body,guid" all items are updated even when no data changed. I think though that such a case is very rare, so I don't find that a good reason to refuse this patch.

@joelpittet
A workaround for your use case (ensure that an item is updated) is to map a source to a dummy target.

MegaChriz’s picture

Status: Needs review » Fixed

Committed #17.

  • MegaChriz committed 453dddf on 7.x-2.x authored by mikran
    Issue #1950182 by mikran, MegaChriz, twistor: Only update when mapped...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

MegaChriz’s picture

Created a follow-up issue for the port to D8: #2990155: Only update when mapped fields are updated