Feeds actually replaces nodes when updating, because it avoids node_load().
Perhaps that's OK in some scenarios... But what if someone needs to update just a few CCK fields or product prices?
Or maybe create a new CCK field and then fill it with Feeds (keeping other data intact) - lots of new opportunities...
Please add an option to enable/disable node_load().

I had to replace

 // If updating populate nid and vid avoiding an expensive node_load().
$node->nid = $nid;
$node->vid = db_result(db_query('SELECT vid FROM {node} WHERE nid = %d', $nid)); 

with $node = node_load($nid, NULL, TRUE); for the last two projects with Ubercart, to update prices/store with Feeds.
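The idea can be sketched as an opt-in branch in the processor; FEEDS_NODE_UPDATE_EXISTING is a hypothetical name for the new setting value, not code from the module:

  // Sketch only. When updating in place, do a full (uncached) node_load()
  // so existing field data survives; otherwise keep the cheap nid/vid path.
  if ($this->config['update_existing'] == FEEDS_NODE_UPDATE_EXISTING) {
    $node = node_load($nid, NULL, TRUE);
  }
  else {
    // If updating, populate nid and vid, avoiding an expensive node_load().
    $node->nid = $nid;
    $node->vid = db_result(db_query('SELECT vid FROM {node} WHERE nid = %d', $nid));
  }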


Comments

alex_b’s picture

That's all it needed? Nice.

So we could replace the "update existing" checkbox with a three-way radio button choice:

Update existing

(*) Do not update existing item nodes
( ) Replace existing item nodes
( ) Update existing item nodes (slower than replacing them)
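
In Form API terms, the radios might look something like this (a sketch only; the option keys and the default are assumptions, not the module's actual code):

  // Sketch of the proposed setting in the node processor's config form.
  $form['update_existing'] = array(
    '#type' => 'radios',
    '#title' => t('Update existing'),
    '#options' => array(
      0 => t('Do not update existing item nodes'),
      1 => t('Replace existing item nodes'),
      2 => t('Update existing item nodes (slower than replacing them)'),
    ),
    '#default_value' => $this->config['update_existing'],
  );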

I'd be happy to accept a patch, ideally with a test.

There *may* be a duplicate issue on the queue. I couldn't find it right now, though.

Monkey Master’s picture

I made a patch (under Windows, hope that's OK) with the radios from #1.
I also changed the resulting message to include created/replaced/updated counts (using format_plural).

alex_b’s picture

Status: Active » Needs review

At first glance this is looking great. Need some time to do a closer review. Will get back to you over the weekend.

Monkey Master’s picture

Also, some small fixes are needed for the mappers - cleanup before updating:

taxonomy.inc: Remove target vocabulary terms from the node before adding new ones.
filefield.inc: Do not keep files already stored in the target filefield.

Otherwise there will be a problem with repeated updates - the same terms and files will be added again and again.

Monkey Master’s picture

FileSize
559 bytes

Just checked content.inc. It also tries to keep old data in a target field - without node_load() that doesn't make sense anyway: $field = isset($node->$target) ? $node->$target : array();
So the patch is similar to the filefield one - replace it with $field = array();

Monkey Master’s picture

Sorry, I missed an isset() check in taxonomy.inc - updated patch.
Added a zip archive with all 4 patches.

Monkey Master’s picture

I think the mappers keep old field values to be able to save multiple source fields to a single target field (for example, saving 'pic1' and 'pic2' to 'pics'). In that case my mapper patches break this functionality. So I have a better solution: drop each target field's value only once per node.

1. In FeedsNodeProcessor.inc, add a temporary array to the node object before $this->map($item, $node); and unset it afterwards:

        $node->feeds_targets = array();
        // Execute mappings from $item to $node.
        $this->map($item, $node);
        unset($node->feeds_targets); 

2. In the mappers, for example content.inc, if the target field is not yet in that array, add it there and drop its value. That way the target value is dropped only once per map process.

  if (!in_array($target, $node->feeds_targets)) {
    $node->feeds_targets[] = $target;
    unset($node->$target);
  } 

I use a temporary array on the node ($node->feeds_targets) because only the node is available in hook_feeds_set_target(). Another option would be to use a global variable for this purpose.

What do you think?
I can prepare the mapper patches if this solution fits.

alex_b’s picture

I think mappers keep old field values to be able to save multiple source fields to a single target field.

Could you be more specific? Where do you see this happening?

Otherwise:

1. I'd love to keep the update user feedback simple, no need to track replaces and updates separately in FNP::process().
2. Let's use a CONSTANT for possible values 0, 1, 2 for 'update_existing'.
3. We'll need tests that verify the replace behavior.

The attached patch merges all patches of #6 into a single one - was there a specific reason why you attached them separately? Either way, this is the preferred format.

Monkey Master’s picture

Could you be more specific? Where do you see this happening?

I didn't see it - I was trying to find a reason for the code in hook_content_feeds_set_target(), e.g. in mappers/content.inc: $field = isset($node->$target) ? $node->$target : array(); instead of simply $field = array();.

1. I'd love to keep the update user feedback simple, no need to track replaces and updates separately in FNP::process().

Personally, I'm happy with no tracking.

2. Let's use a CONSTANT for possible values 0, 1, 2 for 'update_existing'.

I'll make a patch today.

3. We'll need tests that verify the replace behavior.

I haven't used tests before. Maybe somebody else could do it?

Monkey Master’s picture

The code in FeedsNodeProcessor.inc overlaps with my other feature request - Keep batch processing when mapping fails for particular node. I can wait until that is committed and then build upon that version, with the stats update after node_save().

Monkey Master’s picture

FileSize
6.87 KB

Added constants for 'update_existing'
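
For reference, the constants could be defined along these lines (the names here are assumptions based on the description, not necessarily those in the patch):

  // Possible values for the 'update_existing' setting.
  define('FEEDS_NODE_SKIP_EXISTING', 0);    // Do not touch existing nodes.
  define('FEEDS_NODE_REPLACE_EXISTING', 1); // Rebuild the node from scratch.
  define('FEEDS_NODE_UPDATE_EXISTING', 2);  // node_load() and update in place.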

I also changed the messages block so that the import results appear in watchdog:

    // Set messages.
    $messages = array();
    if ($batch->created) {
      $messages[] = format_plural($batch->created, 'Created one @type node.', 'Created @count @type nodes.', array('@type' => node_get_types('name', $this->config['content_type'])));
    }
    if ($batch->replaced) {
      $messages[] = format_plural($batch->replaced, 'Replaced one @type node.', 'Replaced @count @type nodes.', array('@type' => node_get_types('name', $this->config['content_type'])));
    }
    if ($batch->updated) {
      $messages[] = format_plural($batch->updated, 'Updated one @type node.', 'Updated @count @type nodes.', array('@type' => node_get_types('name', $this->config['content_type'])));
    }
    if (!count($messages)) {
      $messages[] = t('There is no new content.');
    }
    foreach ($messages as $message) {
      drupal_set_message($message);
    }
    watchdog('feeds', 'File %file processed: %msg', array('%file' => $batch->getFilePath(), '%msg' => implode(' ', $messages)));

mweixel’s picture

@Monkey Master: Excellent work! I have been trying to find the time to look at this, as I too need to preserve existing additional node data during updates. See #688696: NodeProcessor: Ignore changes to certain fields upon node update for earlier discussions and case examples.

I did experience some problems in the taxonomy mapper. PHP kept blowing up when the mapper's taxonomy_feeds_set_target() function tried to assign existing taxonomy ids to the updated node, reporting "Cannot use object of type stdClass as array". That didn't start happening until I applied your 753426_update_replace.patch. Has that happened to you?

Will your patches work against alpha 13?

@alex_b: Any chance that this functionality will find its way into the module, or will it remain something that has to be patched in?

alex_b’s picture

Version: 6.x-1.0-alpha12 » 6.x-1.x-dev

This patch should go in.

Monkey Master’s picture

I did experience some problems in the taxonomy mapper

My patch adds only the following code to taxonomy mapper:

  if (isset($node->taxonomy)) {
    // Remove target vocabulary terms from node before adding new ones
    foreach ($node->taxonomy as $term) {
      if ($term->vid == $vocab->vid) {
        unset($node->taxonomy[$term->tid]);
      }
    }
  }

Please give more details about that error - I don't get it here.

Will your patches work against alpha 13?

Yes, it works.

mweixel’s picture

@Monkey Master: I applied the patch to an alpha 13 upgrade today. One of the hunks failed: the changes to FeedsNodeProcessor that perform a node_load() if the config isn't set to FEEDS_NODE_REPLACE_EXISTING. I tried to make the edit manually and then reload my feed, which I knew contained updated items. After accessing the feed, Drupal errors out, reporting "An HTTP error 500 occurred. /travelwarnings2/?q=batch&id=44&op=do". Checking the server log, I see that the problem is at line 77 in mappers/taxonomy.inc, which is throwing the "Cannot use object of type stdClass as array" error.

So, if your change to taxonomy.inc isn't involved at that point, could it be that the change to FeedsNodeProcessor is somehow interfering with the batch processing?

Monkey Master’s picture

FileSize
7.17 KB

I updated the patch with a tiny fix - a constant in FeedsNodeProcessor::configDefaults().

the problem is at line 77 in the mappers/taxonomy.inc

Are you using a clean alpha13 with my patch only?
Did you try importing without my patch, or in a different mode, e.g. REPLACE?

alex_b’s picture

Status: Needs review » Needs work
FileSize
5.45 KB

- Rerolled after patch did not apply due to recent commits.
- Removed separate tracking of replaced vs updated stories. Simplifies patch.
- All existing tests pass.

Todo:

- Tests.

cruelgamer’s picture

Tested this patch using Drupal 6.16 and every version of Feeds. I attempted to populate a new CCK field using a CSV import with only a GUID and data for the new field. Each attempt yielded similar results: after import, most fields are blank except for the last CCK field and the taxonomy fields. Part of the patch also fails against alpha15 - I can't remember which part exactly, but I believe it was the taxonomy part.

I also tried hard-coding the PHP code <? $node = node_load($nid, NULL, TRUE); ?> into FeedsNodeProcessor and got the same results.

If anyone is having success with this, please let me know what builds of Drupal and Feeds you're using. So far I am having no luck.

naxoc’s picture

FileSize
6.74 KB

Here is a re-roll to keep up with HEAD. Still no test written, but I will try to find the time.

smartinm’s picture

mattman’s picture

Just stumbled upon this issue when looking into extending the FeedsNodeProcessor class.

Here is yet another use case to support this feature being implemented in FeedsNodeProcessor.

I'm using the iCal Parser (http://drupal.org/project/parser_ical) to pull an .ics feed from a Google Calendar. Using Google Calendar is great for managing time-based events (plus you can use iCal to manage the events on a Mac - very nice).

However, the only things needed from the calendar update (in my case) are the title, timestamp and location (if/when parser_ical supports geo data).

I'm not mapping the description (body) field because it is added/enhanced within the Drupal site by an editor. Therefore, the description and any other CCK fields need to be preserved. Right now they are being wiped out, per this issue. Location module (location_cck) rows are also being killed, as well as the author, etc.

Here's my vote for this patch making it in as soon as possible. I was actually thinking of how I might implement such a feature and was leaning towards a "lock this field" or "preserve this field" checkbox.

Initially, I was thinking it would appear within the mapping of the fields, but this wouldn't provide the level of control needed because mapping only accounts for a portion of all possible fields.

Forcing an update of ONLY the mapped fields, which I believe is what's suggested here, is a great way to start, but the most flexible approach, I believe, would be settings on the content type itself, where you could check which fields should be ignored on any update from FeedsNodeProcessor.

This would add some confusion to the interface, but that could be addressed by putting a note below the mapping listing, mentioning that fields can be locked on updates within the settings for the content type.

Looking forward to the addition and GREAT work on this updated version of how Feeds is working, especially on the interface.

Matt Petrowsky @gotdrupal

andrewlevine’s picture

Subscribe. This patch is also a dependency of my feature request #828000: Use multiple data sources/parsers to save into the same nodes.

andrewlevine’s picture

FileSize
5.41 KB

Re-rolled against the latest HEAD. Still no test. Same as naxoc, I'll try to find time.

andrewlevine’s picture

FileSize
5.43 KB

Argh, sorry. I introduced a syntax error. Here is the valid re-roll.

andrewlevine’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
15.5 KB

Here is the patch from #24 with tests

andrewlevine’s picture

Status: Reviewed & tested by the community » Needs review

Whoops, meant to mark needs review

andrewlevine’s picture

FileSize
1.39 KB

Wow, another thing: you'll need to add this file to the tests/feeds directory and rename it to nodes_changes_1.csv.

alex_b’s picture

Great to see this moving forward. I won't have time to review until the end of the week though. You can use cvsdo to add the new file.

andrewlevine’s picture

Thanks for the tip. Attached is the patch, which now includes the new file.

alex_b’s picture

This version fixes a problem where properties set by FeedsNodeProcessor itself (e.g. user ids) got overwritten even in UPDATE_EXISTING mode - see the modifications to buildNode().

The test case in #29 only checks for a functioning REPLACE_EXISTING mode, which is already covered in FeedsRSStoNodesTest. What's new and needs tests is UPDATE_EXISTING mode. This patch adds a simple assertion to FeedsRSStoNodesTest covering UPDATE_EXISTING.

andrewlevine: it would be great to get your last review here. I think we're close to committing.

All tests passing.

andrewlevine’s picture

Just as a preliminary look (I will look in depth later): did you mean to remove all the changes to the mappers? I haven't looked into what they are for, but I kept them from #19.

alex_b’s picture

Status: Needs review » Needs work

#31: That was accidental. We need to bring that back (see #4).

I am worried, though, that resetting the field when mapping will lead to some regressions - existing deployments of Feeds may assume that mappers act additively, allowing a mapping target to be used more than once in a mapping.

I can't think of a good solution to this problem at the moment.

andrewlevine’s picture

Status: Needs work » Needs review

I think I found a way around the problems we were having with filefield and mappers in general (where we have no extra metadata storage). If you think about it, the current update_existing implementation has the same problem (it doesn't know when a new file has been added or replaced, because it doesn't store the original filename or file hash). The only reason this problem doesn't surface is because the whole node is replaced and the files are re-uploaded.

So why can't we do this same behavior for the new UPDATE_EXISTING behavior (with node_load())? We can! The only change we have to make is to keep some sort of state in the $node variable, or else pass state in as an extra argument to the map() method. Assuming the latter, the code would look something like this:

In FeedsProcessor::map(), instead of

foreach ($this->config['mappings'] as $mapping) {
  // ...stuff...
  $callback($target_item, $mapping['target'], $value);
  // ...stuff...
}

we have something like

$state = array();
foreach ($this->config['mappings'] as $mapping) {
  // ...stuff...
  $callback($target_item, $mapping['target'], $value, $state);
  // ...stuff...
}

Then, in filefield.inc and our other mappers that have no metadata storage and can't simply be ID-based (like taxonomy.inc can), we have:

function filefield_feeds_set_target($node, $field_name, $value, &$state = NULL) {
  // ...stuff...
  $items = (!empty($state) && isset($state['filefield']->$field_name)) ? $state['filefield']->$field_name : array();
  // ...stuff...
  $state['filefield']->$field_name = $items;
}

This should be backwards-compatible with all current mappers using REPLACE_EXISTING (the previous update_existing).

We will usually be able to use the ID approach with mappers like taxonomy.inc (and won't have to worry about this problem). However, we can always resort to this approach for filefield and other additive mappers if we make the minor addition of the optional $state argument.

Marking my approach as 'needs review'. If it sounds OK I will proceed with a patch. This functionality is important for work, so I am willing to work on this until it gets in :)

alex_b’s picture

andrewlevine and I just discussed #33 as a viable solution to continue to support aggregating into fields (additive mapping, issue brought up in #32).

For the purpose of this patch I propose to stop supporting aggregating into fields and to modify the behavior as suggested in #4.

The likelihood that somebody is using this undocumented behavior is rather small; I'd like to find out before we actually implement support for it. I have opened a separate issue on it here: #840626: Support using same mapping target multiple times.

This patch brings back the modifications introduced in #4.

andrewlevine’s picture

I took a more thorough look at #34 and it looks good. I am going to test it on my application that depends on these features later today.

I agree with your comment in #30 that the tests in patch #29 were broken, but if I fixed them, I do think they would be a more extensive set than what is included in the last patch (#34). Is there any reason you added those couple of assertions instead of modifying the tests I wrote in #29?

alex_b’s picture

Is there any reason you added those couple assertions instead of modifying the tests I wrote in #29?

The tests in #29 are very verbose when we really only need a quick assertion of whether updating works. I'm anxious to keep the number of assertions low, as running all the tests in Feeds already takes ages.

andrewlevine’s picture

I have tested the patch against my application and it works perfectly.

I sympathize with your desire to keep the test run time down; it took about 10 minutes for the tests to complete on my setup initially. However, IMHO tests should cover all possible cases, because that means a human won't have to. If I set concurrency to 6 they complete in just over a minute, which isn't bad anyway. My tests were intended to cover these cases: 1. importing partial data over a currently imported feed, 2. importing where no content should be changed, 3. an initial import from a feed. I can probably remove some assertions, but I feel testing these 3 cases is useful.

That said, I won't push the point any further if you disagree, I will just ask about the tests in #34:
I'm not exactly sure what the assertions of the UPDATE_EXISTING case are proving. Is it that items did not recreate/replace nodes because they still don't have authors after the UPDATE query? Wouldn't the same assertion prove true if running the import did nothing at all? Even if we need to keep to a limited number of tests/assertions I think we'd benefit by testing "Importing partial data over a currently imported feed" with a separate feed file with deleted info.

alex_b’s picture

The test checks whether a local change persists when using UPDATE_EXISTING mode. It sets a feed item's user id to 0, reimports with UPDATE_EXISTING, and verifies that the user id has not been reset.

Point 2. (initial import from Feed) is currently covered in FeedsRSStoNodesTest. Point 3 could be easily added.
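
In Drupal 6 SimpleTest terms, that check amounts to something like the following (a sketch only; the surrounding test plumbing, the re-import step and $nid are assumed, not the patch's actual test code):

  // Simulate a local change: reset the author of an imported node to uid 0.
  db_query('UPDATE {node} SET uid = 0 WHERE nid = %d', $nid);
  // ...re-import the feed with 'update_existing' in update mode...
  // Then verify the local change was not overwritten by the import.
  $uid = db_result(db_query('SELECT uid FROM {node} WHERE nid = %d', $nid));
  $this->assertEqual($uid, 0, t('User id was not reset by the update.'));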

andrewlevine’s picture

Status: Needs review » Reviewed & tested by the community

Putting aside our disagreement about the ideal extensiveness of testing, in my mind this is RTBC. I don't think it makes sense to add point 3 to the current tests.

alex_b’s picture

Status: Reviewed & tested by the community » Needs work

The link tests are failing because link does in fact aggregate URL and title into the same field. Hammering out solutions under #840626: Support using same mapping target multiple times.

Sagar Ramgade’s picture

Hi all,

I am trying to save/update nodes from multiple sources: from one feed I am trying to save the nodes, and from the other I am trying to update a few CCK fields of the nodes saved from the first feed.
I tested the patch from http://drupal.org/node/753426#comment-3145468 with the "update existing items" option; it doesn't work.
I get "updated x nodes", but when I check those node fields they appear blank. Can anyone guide me on how to achieve this?
Thank you in advance.

lyricnz’s picture

Without this patch, Feeds is rather broken, IMHO. One quite common use case is keeping a set of data in Drupal in sync with an external source, via CSV or RSS etc. Unless Feeds can update nodes without overwriting (non-mapped) changes made by users, it's not terribly useful for this.

PS: can we just apply the patch as-is, and continue with any outstanding issues separately?

#840626: Support using same mapping target multiple times

andrewlevine’s picture

Here is patch #34 without the changes to the mappers, which should be taken care of by #840626: Support using same mapping target multiple times.

smoothk’s picture

@alex_b and andrewlevine

Hi guys,

could you please briefly explain how this is supposed to work?

I got the patch working (both the last one from #43 and the one from #34); nodes now get updated instead of created from scratch.
I understood from other comments that the patch's goal was also to preserve data that has not changed in the feed.

My case is: I have pictures uploaded manually (CCK imagefield); they get deleted on every feed update, and they shouldn't be.

After countless tests it finally seems to be working (I'm no programmer), so I just wanted to know if this is how it is supposed to work:
CCK fields that are not mapped don't get deleted or updated?

By the way, the patch from #43 gives an error about duplicated terms in taxonomy, but I suppose that is part of splitting the issue in two, the other part being #840626: Support using same mapping target multiple times.

Thank you so much for your work, it saved the day for me.

andrewlevine’s picture

@smoothk, try applying the patch from #840626: Support using same mapping target multiple times as well.

alex_b’s picture

Status: Needs work » Needs review

#44

CCK fields which are not mapped don't get deleted nor updated

Correct.

After applying #840626, I am giving this another review now.

alex_b’s picture

Status: Needs review » Fixed

Reviewed, all tests passing, committed #43 - thank you, everybody, for sticking through this.

http://drupal.org/cvs?commit=389860

andrewlevine’s picture

Thanks, all, for the help getting this patch in. Just in case you are interested: based on this commit, I contributed a module that allows you to import from multiple feeds into the same nodes: http://drupal.org/project/feeds_node_multisource

alex_b’s picture

FYI, we forgot to update the naming conventions in non-node processors: #853144: Consistent use of "replace" vs "update"

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.