It would be very helpful to have the option to RE-import (or update/overwrite/whatever) existing nodes, rather than automatically creating new nodes for any imported data. My organization would be willing to underwrite at least some of this development.

We have a site, for example, that has various data on all 14,000+ U.S. school districts -- federal funding, test scores, student demographics, etc. Certain fields (# of students in a district) change over time, while others need to be added as new data becomes available each year. Right now our only options seem to be:

  1. Enter/revise the data by hand, node by node (!)
  2. Work directly in the mysql tables
  3. Purge all the existing nodes, then re-create them via Node Import

Options 2 & 3 are both do-able, but being able to use Node Import to "revise & extend" existing nodes would be much, much cleaner. And I can imagine lots of other uses for this -- there are still so many cases where data used by a site "lives" and is maintained elsewhere. (This would also have the potential to seriously mess up a site, I know -- but at least for User 1, it'd be a great tool to have.)

Would extending Node Import in this way be possible/practical? Or are there complexities in terms of VIDs, etc., that I'm missing?

Thanks,
TKS

Comments

Zach Harkey

there are still so many cases where data used by a site "lives" and is maintained elsewhere.

This is very true. I also have a client in this same situation. The ability to sync/update nodes from a csv would be a boon for administering large groups of nodes that are actually maintained in another system.

My client is willing to contribute to the effort as well. Can anyone estimate a cost of development for this feature?

Zach Harkey

I just found these two other threads that are driving in pretty much the same direction; there seem to be several people willing to support this feature.

Automated Node Import/Export via RSS/XML/FTP Feed
node_import future directions

makbeta

I just would like to second this request. Ability to update/overwrite nodes will be extremely helpful as it allows for easy bulk editing of the nodes.
Thanks.

TKS

For what it's worth, we just went ahead and did our updates directly in MySQL. It's actually fairly easy, IF you're updating fields that are unique to one CCK nodetype. (When multiple nodetypes use a field, that field gets its own table -- and I haven't worked out the MySQL join requirements for dealing with that.)

It's really just a matter of...

  1. BACKING UP YOUR DRUPAL DATABASE!
  2. Making sure your new data table has a column that contains unique values that can be used to join with the table of existing Drupal data. We used a custom CCK field that had a "District ID" -- but the Node ID would probably be ideal.
  3. Import that new data table into your Drupal database -- we called our imported table "new_district_data". (You can do it in a different database, but the updating is easier if everything's in one.)
  4. Execute this mysql code -- either in phpMyAdmin or via the command line:
    UPDATE content_type_content_CCK-NODETYPE-HERE, new_district_data
    SET content_type_content_CCK-NODETYPE-HERE.field_DRUPAL-FIELDNAME_value = new_district_data.UPDATE-FIELDNAME
    WHERE content_type_content_CCK-NODETYPE-HERE.field_district_id_value = new_district_data.district_id;
    
  5. Delete your temporary table from the Drupal Database.

Once we were certain the updates were working, we strung together the UPDATE commands for each of the fields being overwritten, and ran them through in one big batch.

It helps to make sure the table you create/import with the new data has column names that match what's in the Drupal DB, but it's not required -- you just need to make sure you're matching them up correctly. After all, MySQL won't know that you don't really want to overwrite the old "phone number" field with the new dataset's "fax number" field! (Did I mention that backing up your DB before you begin is key?...)
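To rehearse this procedure before touching a live site, here is a minimal sketch using Python's standard-library sqlite3 module as a stand-in for MySQL. The table names (content_type_district, new_district_data) and the columns are hypothetical stand-ins for the placeholders in step 4. SQLite doesn't accept MySQL's multi-table `UPDATE a, b SET ...` form, so the sketch uses an equivalent correlated subquery. The explicit column map is what guards against the "phone vs. fax" mix-up.

```python
import sqlite3

# Hypothetical mapping: Drupal CCK column -> column in the imported table.
# Listing pairs explicitly is what keeps "fax" from landing in "phone".
COLUMN_MAP = {
    "field_enrollment_value": "enrollment",
    "field_phone_value": "phone",
}


def apply_updates(conn: sqlite3.Connection) -> None:
    """Copy mapped columns from new_district_data into content_type_district,
    matching rows on the shared district ID."""
    for drupal_col, new_col in COLUMN_MAP.items():
        # Column names can't be bound as parameters; they come only from
        # the fixed COLUMN_MAP above, never from user input.
        conn.execute(
            f"""
            UPDATE content_type_district
            SET {drupal_col} = (
                SELECT n.{new_col} FROM new_district_data AS n
                WHERE n.district_id = content_type_district.field_district_id_value
            )
            WHERE EXISTS (
                SELECT 1 FROM new_district_data AS n
                WHERE n.district_id = content_type_district.field_district_id_value
            )
            """
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE content_type_district (
            nid INTEGER PRIMARY KEY, field_district_id_value TEXT,
            field_enrollment_value INTEGER, field_phone_value TEXT);
        CREATE TABLE new_district_data (
            district_id TEXT, enrollment INTEGER, phone TEXT, fax TEXT);
        INSERT INTO content_type_district VALUES
            (1, 'D001', 500, '555-0100'), (2, 'D002', 800, '555-0200');
        INSERT INTO new_district_data VALUES ('D001', 520, '555-0111', '555-0999');
        """
    )
    apply_updates(conn)
    print(conn.execute("SELECT * FROM content_type_district ORDER BY nid").fetchall())
    # [(1, 'D001', 520, '555-0111'), (2, 'D002', 800, '555-0200')]
```

The `WHERE EXISTS` clause matters: without it, districts missing from the new data would have their fields set to NULL.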

kingandy

Man, I was just about to make this exact request.

I think you could do it without worrying about nid/vid, if you put the onus on the data provider to ensure that if a field is nominated as a 'key' field then each value is unique - in the same way as you're given the option to ensure all node titles are unique.

Actually, that might be a good starting point - extend the 'unique node title' check to overwrite the existing node instead of refusing to import the new one, and add functionality to use other (e.g. CCK) fields ...

Just a thought.
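Putting the onus on the data provider could be as simple as rejecting a file whose nominated key column has duplicates or blanks before anything is written. Here is a minimal Python sketch. The function name and its return shape are hypothetical, not existing Node Import behaviour.

```python
import csv
import io
from collections import Counter


def find_key_problems(csv_text: str, key: str) -> dict:
    """Report blank and duplicated values in the nominated key column.

    Row numbers are 1-based data rows (the header row is not counted).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or key not in reader.fieldnames:
        raise ValueError(f"key column {key!r} not found in header")
    blanks = []
    counts = Counter()
    for row_number, row in enumerate(reader, start=1):
        # Short rows give None for missing columns, so treat None as blank.
        value = (row[key] or "").strip()
        if not value:
            blanks.append(row_number)
        else:
            counts[value] += 1
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    return {"blank_rows": blanks, "duplicates": duplicates}
```

An import would only proceed when both lists come back empty, much like the existing "unique node title" check refuses duplicates.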

martinkong

Actually, this may be very easy. Just update supported/node.inc: in node_node_import_fields(), add

'nid' => t('Node: NID'),

into the $fields array.

Then add an ID column to the CSV. You should then be able to map that column to nid. If you want a revision, make sure you choose "Revision" in the publishing options; otherwise no revision will be created.

Of course, you will need to get the correct nids for all the records in the csv. I have tried it with couple simple CCK types and it worked. Haven't tried any CCK with fancy fields though.

May need more work if you want a CSV file to do both insert and update.

Hope this helps.
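For a single CSV that does both insert and update, one option is to split rows by whether the mapped nid column is filled in. This is a sketch in Python. The column name `nid` and the rule that anything non-numeric is an error are assumptions.

```python
import csv
import io


def split_rows_by_nid(csv_text: str, nid_column: str = "nid"):
    """Split CSV rows into (updates, inserts, errors).

    Rows with a positive integer nid update an existing node, rows with an
    empty nid become new nodes, and anything else is reported rather than
    guessed at.
    """
    updates, inserts, errors = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(nid_column) or "").strip()
        if not raw:
            inserts.append(row)
        elif raw.isdecimal() and int(raw) > 0:
            row[nid_column] = int(raw)
            updates.append(row)
        else:
            errors.append(row)
    return updates, inserts, errors
```

Sending bad nids to an error list instead of treating them as inserts keeps a typo from silently creating a duplicate node.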

kingandy

I basically want to bump this request because it's come up again for me.

Without delving too deeply into the inner workings of the module, I think the way I'd like to see it behave is to have a select box at the stage before the preview - alongside the 'unique titles' box and the default value settings and the category behaviour options and so forth - which optionally allows one to select a Unique/Key Field. Probably restricted to fields which have had import fields mapped to them on the previous screen.

Then for each row of the file, instead of beginning the next stage with an empty node object, the module could search the database for a matching key value and (if found) run a node_load for the object in question. The values of this node object could then be altered according to the fields present in the table and saved as normal.

I realise I'm sweeping a world of complicated processing under the carpet there - the field may be in any one of a number of tables, not to mention that node_load does not populate the $node object in the same way as node_save requires it, particularly in the case of CCK radio button fields - but that's the basic behaviour I'd be hoping for.
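Leaving the table and CCK complications aside, the row-by-row behaviour can be sketched in Python. In this sketch a plain dict stands in for the node table, `key_field` is whichever mapped field the user picked as the key, and only the columns present in a row overwrite the loaded node. All names are hypothetical.

```python
from typing import Dict, List


def upsert_rows(nodes: Dict[int, dict], rows: List[dict], key_field: str) -> Dict[str, int]:
    """Update nodes whose key_field matches a row; create nodes for the rest.

    `nodes` maps nid -> node dict and is modified in place. Each row holds
    only mapped fields, so unmapped fields on an existing node survive.
    """
    # "Search the database for a matching key value": build key -> nid.
    index: Dict[object, int] = {}
    for nid, node in nodes.items():
        value = node.get(key_field)
        if value is None:
            continue
        if value in index:
            raise ValueError(f"key {value!r} is not unique among existing nodes")
        index[value] = nid

    stats = {"updated": 0, "created": 0}
    next_nid = max(nodes, default=0) + 1
    for row in rows:
        key = row.get(key_field)
        if key is None:
            raise ValueError(f"row has no value for key field {key_field!r}")
        nid = index.get(key)
        if nid is not None:
            # Found: the "node_load" step, then alter only the row's fields.
            nodes[nid].update(row)
            stats["updated"] += 1
        else:
            # Not found: start from an empty node, as the module does today.
            nodes[next_nid] = dict(row)
            index[key] = next_nid  # a later row with the same key updates it
            next_nid += 1
            stats["created"] += 1
    return stats
```

Adding newly created nodes to the index means a file that repeats a key updates the node it just created instead of importing a duplicate.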

oliveyrc

I would really like this ability too. I have a CSV export from a database that includes a unique reference/primary key; that key is maintained on the other end, so I don't need to worry about it.

I've created the CCK node type I want to import the rows into. What I'd like is the ability to map the unique/primary key from the import file to a field in that CCK node type, plus an option for what to do on a match: update/merge, skip, or overwrite.

How does that sound for a start? I haven't been developing in Drupal for long, and I think this would be a nice little project to get me started with module development. Feel free to chip in with ideas and improvements; hopefully people wanting the same functionality can help test it too.
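The three match options proposed here could differ only in how the saved node is built from the existing node and the incoming row. Here is a minimal Python sketch. The enum and function names are hypothetical, and "overwrite" is assumed to keep only the nid so that comments and node references still point at the same node.

```python
from enum import Enum


class OnMatch(Enum):
    MERGE = "update/merge"   # overwrite only the fields present in the row
    SKIP = "skip"            # leave the existing node untouched
    OVERWRITE = "overwrite"  # replace every field, keeping only the nid


def apply_policy(existing: dict, row: dict, policy: OnMatch) -> dict:
    """Return the node to save for a row whose key matched `existing`."""
    if policy is OnMatch.SKIP:
        return dict(existing)
    if policy is OnMatch.MERGE:
        merged = dict(existing)
        merged.update(row)
        return merged
    if policy is OnMatch.OVERWRITE:
        replaced = dict(row)
        replaced["nid"] = existing["nid"]
        return replaced
    raise ValueError(f"unknown policy: {policy}")
```

Merge is the safest default, since fields that were edited by hand and never mapped from the CSV keep their values.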

philippejadin

@oliveyrc : Did you get any result on this project? This would be a very very nice addition.

chrisroditis

I'd be more than willing to help, oliveyrc! Do you have anything we can test?

oliveyrc

I have to hold my hand up here: I haven't managed to spend much time on this yet. I got sidetracked into adding a "skip header row" option, and being ill for a couple of days over Christmas hasn't helped. If people are interested in this, I'll fire it back up and have another look.

Sound good?

Rich

philippejadin

I'm definitely interested and would like to help if I'm given some directions on the best way to implement this.

In all cases thank you for keeping this feature request alive !

Philippe

redmood

Has anyone managed to do that? I mean update or insert based on a comparison of a specific field?

drupaloSa

I'm also interested in this type of functionality.

Summit

Version: 5.x-1.x-dev » 5.x-1.3

Subscribing, greetings, Martijn

gthing

I would love to have this feature as described by comment #7. That's exactly the way I need it to work.

robomalo

I work at an art museum, and here is my similar scenario and solution to post #7. I don't know if this could help anyone else, and it only works when you have unique titles (the unique field as described in post #7), but don't check the "unique titles" box. I'll try to explain this as well as I can without confusing everyone.

The node title is the artwork's accession number and this will never change. Sometimes they give me exports from our artworks database and some fields change. So I modified node.inc to add a second checkbox under node options that reads "Update nodes with matching titles."

Basically my node.inc queries the node table, then pulls and assigns the node id to the import. You preserve your node id's, for comments, node reference, etc., but all the other fields are replaced with the new data.

I know my attachment won't work for everyone here, but maybe it can act as a catalyst for a more robust feature like this. Download the attachment, remove "_.txt" to make it node.inc, and replace the current node.inc in node_import/supported. Just play with this as a proof of concept for post #7. In my case, it works perfectly.
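The core of that lookup is matching each imported title to an existing nid. Here is a Python sketch of it. A list of `(nid, title)` pairs stands in for the node table query. Titles that match several nodes are reported rather than guessed at, because this approach only works when titles are unique. The function name is hypothetical; the actual modified node.inc is in the attachment.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def assign_nids(existing: Iterable[Tuple[int, str]], titles: List[str]):
    """Map each imported title to an existing nid when the match is unambiguous.

    Returns (matches, new_titles, ambiguous): matches maps title -> nid,
    new_titles have no existing node, ambiguous titles match several nodes.
    """
    by_title: Dict[str, List[int]] = defaultdict(list)
    for nid, title in existing:
        by_title[title].append(nid)

    matches, new_titles, ambiguous = {}, [], []
    for title in titles:
        nids = by_title.get(title, [])
        if len(nids) == 1:
            matches[title] = nids[0]
        elif not nids:
            new_titles.append(title)
        else:
            ambiguous.append(title)
    return matches, new_titles, ambiguous
```

Titles are compared as plain strings here, so something like "ABC's title" needs no escaping. In a real SQL lookup it should be passed as a query placeholder argument rather than concatenated into the query.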

drutube

I just wanted to note here that the last patch (or rather include-file replacement, as it isn't actually a patch) works quite nicely in a certain circumstance. I was trying to create users and advanced profiles from an Excel export from an old ASP site. I had usernames and emails and a scattering of associated field data such as gender and occupation. The problem is that user import doesn't work with CCK fields very easily, or at all. So using the include file referenced above, I just mapped the title to the already created username and was able to bring the data into the users' advanced profile fields. Worked fine.

zeezhao

The fix in #17 works well for me, using 5.x-1.6. Thanks for the code!

OliverColeman

Version: 5.x-1.3 » 5.x-1.6

I've created another patch at http://drupal.org/node/271809 that allows updating existing nodes with the same title. For some reason I didn't check to see if this sort of feature request had already been made. The patch I've made is slightly different to that of this thread as it relies solely on the node title (at the moment anyway; however it would be easy to modify to allow comparing against any field in the node including nid. I would like to implement this at some point as it makes it much more flexible for little extra complexity).

My patch works the same way as that of #17, but uses a drop down box with the options of "Don't import", "Create new node", and "Update existing node" instead of adding a new checkbox.

There seems to be a bug with both my patch and that in #17 (not surprising since they work the same way), it was posted against my patch: http://drupal.org/node/271809#comment-890732

I've marked my feature request as a duplicate of this one.

OliverColeman

I can't duplicate the first part of the error I mentioned in #20 (http://drupal.org/node/271809#comment-890732):

I tried it out, and I discovered that when there are duplicate titles, and "Update existing node" has been selected, even though the preview rightly highlights the errors, the duplicate nodes get re-imported anyway...

also no error file gets created.

Could be because the title had an apostrophe within it e.g. "ABC's title"?

I created two import files (attached) and applied them in order with the "Update existing nodes" option chosen. The first file contains rows for three nodes, with no errors. The second file contains rows for three new nodes, two of which have errors, and updates for the three existing nodes (duplicate titles), two of which have errors. The new nodes with errors don't get imported, and the existing nodes with errors in the update data don't get updated. I just realised the test files are for Ubercart Products; apologies for not using a standard type, but they're easy enough to adapt.

The second part of the error I can duplicate: no error file is downloadable. However no error file is downloadable when the regular option of Creating new nodes (instead of updating) is in effect either, so perhaps this is a different Node Import bug.

zeezhao

Thanks for your reply.

The first bug actually happens during the "unique node" load scenario. (Sorry for the mix-up...) i.e.

- assuming you already have a title in the database
- then you try and load a new import with the same title with the "unique node" check box or drop down selected,
- the preview complains that there are duplicates (as expected)
- but the duplicate gets physically loaded into the database (not expected...)

I am also using Ubercart products, though with essentially the first version of uc_product.inc.

conniec

This worked for me AFTER I checked Revisions on the content type page (this sounds like it should be obvious, but it wasn't to me, so I'm sharing my lesson learned).

Connie

nico_

hey Oliver,

first: Thanks for the patch. I tested your code for a little while, and updating and importing of nodes seem to work quite well. However, I noticed the following (maybe this is related to the error you are referring to):

If you
1. Import a CSV
2. Update one node manually (e.g. by filling in a field that has not been filled out by Node Import; of course, change neither the title nor the content type)
3. Run the same CSV import again and choose "Update existing Nodes" from your select field.

then, for the one node that has been manually changed, a new node is created.

I do not know if this is the way it should be, but on our system, some fields are being filled by Node import and others are changed manually, so it would obviously make sense to update the nodes fields rather than duplicate it like in the described case.

OliverColeman

Hmm. In response to #24, I've discovered some more odd behaviour, but couldn't reproduce that in #24.

1. Some nodes in admin/content/node are marked as new after the second import even though they have the same nid as before, and some are marked as updated (expected). The ones that were marked as new were ones that had fields manually modified, but they still get marked as new even after a third import when no fields have been changed since the last import. This doesn't seem like a big issue, though it could be confusing.

2. With the Page type, most of the fields manually changed between imports seem to get wiped clean if nothing is mapped to them (at least the log, a cck field, 'Promoted to front page', and 'Sticky at top of lists'), the only field that doesn't seem to get wiped if nothing is mapped to it is the Body.

Choosing to create a new revision or not made no difference to these results.

I tested this feature pretty thoroughly for importing Ubercart Products. Almost none of the manually changed Product fields are wiped when updating a product and not specifying a mapping to the fields. The only ones that were wiped get set to a sensible default during processing if there's no mapping to them (I wrote the UC Product extension for node_import). Perhaps that's what's causing problem 2: the Page or node import code is setting default values (i.e. empty strings) for non-mapped fields during processing, which then overwrite the existing values. Though I don't know why this wouldn't affect the body field (maybe it's allowed to be null?)

Unfortunately I don't have time to look at the code at the moment, not sure when I will.
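If that diagnosis is right, the fix is to apply processing defaults only when creating a node, never when updating one. Here is a minimal Python sketch of that rule. The DEFAULTS values and the function name are hypothetical, not what the Page importer actually sets.

```python
from typing import Optional

# Hypothetical defaults an importer might fill in for unmapped fields.
DEFAULTS = {"log": "", "promote": 0, "sticky": 0, "field_notes": ""}


def build_node(row: dict, existing: Optional[dict] = None) -> dict:
    """Build the node to save from a CSV row.

    New node: defaults first, then the mapped values from the row.
    Update: start from the existing node and apply only the mapped values,
    so fields with no mapping keep their manually entered values.
    """
    node = dict(DEFAULTS) if existing is None else dict(existing)
    node.update(row)
    return node
```

With this split, 'Promoted to front page', 'Sticky at top of lists', and similar unmapped fields would survive an update, which matches what was observed for the Ubercart fields that have no processing defaults.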

goodeit

Great patch, Oliver! Do you have any plans to make it compatible with fields other than the Node Title (e.g. a custom 'primary-key' type CCK field)?

Thanks!

OliverColeman

Glad you liked it Goodeit. :) I don't have any specific plans to allow selecting nodes on other fields simply because I'm very busy and the client I developed it for doesn't want it. However if I did I just realised the thing to do would be to provide a select box that lets you pick any of the fields that are mapped to from the CSV as well as the nid (thereby providing maximum flexibility for every node type). This should be fairly trivial to implement.

batbug2

I would LOVE this functionality, because my database doesn't have unique titles, but it has a unique ID field, which never changes. And the database is 3000+ items; it takes a few hours to delete all nodes from Drupal and re-import from scratch! Thanks in advance!

goodeit

@ oliver: "fairly trivial" is subjective ;)

@batbug2: If you are manually deleting all your nodes, you may wish to check out delete_all, a module that might help reduce your deletion time a bit.

yngvewb

Being able to select a unique CCK ID field would make this module perfect!
If you are re-importing, I guess it's very likely that you have an ID from another system you would like to check against, rather than the title, which is not actually a unique field.

batbug2

@goodeit, I only need to delete all nodes of one node type.

goodeit

@batbug2: I'm sorry, I saw this in the features list and assumed it was in the module. Apparently it was added but they never released a new version. You could try getting the new files straight from CVS (at your own risk). Or just keep an eye out for a new version.

AD-DA

Subscribing.

asak

Very cool. subscribing.

PatFrat

Being able to select a unique CCK ID field would be useful.
subscribing

giannhsv

I think being able to update/overwrite existing nodes and add new ones by providing a unique CCK ID field would be a killer feature.

Subscribing.

timl

Another thread (http://drupal.org/node/301209) now pointing to this thread

ianchan

Would this module work together with http://drupal.org/project/datasync?

dieter@drupal.org

subscribing

parrottvision

Brilliant idea. What a difference it will make. Happy to contribute. Will CSV Parser module do similar?

Subscribing.

drdmmr

Very interested in this.

Subscribing.

corpseHU

Category: feature » bug

Hi!

Very good idea!

I tested the patch, but I got this:

www:/data/http/drupal/sites/all/modules/node_import/supported# patch < /data/install/node.inc_.diff
patching file node.inc
Hunk #1 FAILED at 185.
Hunk #2 succeeded at 279 (offset 4 lines).
1 out of 2 hunks FAILED -- saving rejects to file node.inc.rej

Is that FAILED hunk a problem?

temp

Can I use the patch from #20 with the 6.x version?

DiJae

Will this feature be included in the next version of node_import?

drdmmr

subscribing

usa2k

subscribing

Also interested in an optional overwrite.
I use titles to autogenerate node names, so a second import of example-title gets example-title-0.

An interactive override option and an "override all" option would be very cool.
The revision idea is a good one if all of the imported data has changed.

Encarte

subscribing

nevmoor

subscribing

roeneman

subscribing

John Gentilin

I posted some code here: http://drupal.org/node/422282. I'm waiting on someone to provide feedback on whether this is a valid node_import / Drupal approach.

My approach was to write a new module that would hook into the {module}_import_form_{formname}_node_form_alter($data)
callback. When that occurs, I grab the form data that is about to be updated, do a lookup based on my unique columns, then insert the nid / vid into the $data array. I am working with Drupal 6.x, but from my research, this should work with little change for 5.x. Actually it looks easier in 5.x because http://api.drupal.org/api/function/node_save/5 will automatically update vid. For 6.x I cheated and used the existing vid.

The approach is a little technical, but it is only a few lines of code to master.
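The lookup step of this approach, finding an existing node from several "unique columns" and copying its nid and vid onto the data about to be saved, is sketched below in Python. A list of dicts stands in for the database. The column names are hypothetical, and the sketch does not reproduce the actual node_import hook signature.

```python
from typing import Dict, List, Sequence, Tuple


def build_key_index(nodes: List[dict], key_columns: Sequence[str]) -> Dict[Tuple, dict]:
    """Index existing nodes by the tuple of their unique-column values."""
    index: Dict[Tuple, dict] = {}
    for node in nodes:
        key = tuple(node[c] for c in key_columns)
        if key in index:
            raise ValueError(f"duplicate key {key!r} among existing nodes")
        index[key] = node
    return index


def attach_ids(data: dict, index: Dict[Tuple, dict], key_columns: Sequence[str]) -> dict:
    """Copy nid/vid onto the incoming data when its key matches a node.

    Data left without a nid would be saved as a new node. Reusing the
    existing vid mirrors the 6.x shortcut described above.
    """
    key = tuple(data.get(c) for c in key_columns)
    match = index.get(key)
    if match is not None:
        data["nid"] = match["nid"]
        data["vid"] = match["vid"]
    return data
```

A composite key like (sku, warehouse) covers cases where no single column is unique on its own.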

-John G

tomsm

I agree. I also need this feature. :-)

Subscribing

fishhaddock1

Version: 5.x-1.6 » 5.x-1.9

I am using node import to import XLS products into Ubercart. It worked really well for additions, but being able to update products would be amazing. I tried the patch, but it failed, as I am using node import 5.x-1.9. Can anyone look into fixing the patch for the latest version of node import?

tetramentis

Version: 5.x-1.9 » 6.x-1.0-rc4

subscribing

Bobuido

subscribing

quinns

Just wanted to pitch in that the direct method of updating the database listed in #4 works quite well. Be sure to flush all your Drupal caches after performing this.