It would be very helpful to have the option to RE-import (or update/overwrite/whatever) existing nodes, rather than automatically creating new nodes for any imported data. My organization would be willing to underwrite at least some of this development.
We have a site, for example, that has various data on all 14,000+ U.S. school districts -- federal funding, test scores, student demographics, etc. Certain fields (# of students in a district) change over time, while others need to be added as new data becomes available each year. Right now our only options seem to be:
- Enter/revise the data by hand, node by node (!)
- Work directly in the mysql tables
- Purge all the existing nodes, then re-create them via Node Import
Options 2 & 3 are both do-able, but being able to use Node Import to "revise & extend" existing nodes would be much, much cleaner. And I can imagine lots of other uses for this -- there are still so many cases where data used by a site "lives" and is maintained elsewhere. (This would also have the potential to seriously mess up a site, I know -- but at least for User 1, it'd be a great tool to have.)
Would extending Node Import in this way be possible/practical? Or are there complexities in terms of VIDs, etc., that I'm missing?
Thanks,
TKS
Comment | File | Size | Author |
---|---|---|---|
#21 | test1.csv_.txt | 402 bytes | OliverColeman |
#21 | test2.csv_.txt | 655 bytes | OliverColeman |
#20 | node.inc_.diff | 2.71 KB | OliverColeman |
#17 | node.inc_.txt | 8.24 KB | robomalo |
Comments
Comment #1
Zach Harkey commented: This is very true. I also have a client in this same situation. The ability to sync/update nodes from a csv would be a boon for administering large groups of nodes that are actually maintained in another system.
My client is willing to contribute to the effort as well. Can anyone estimate a cost of development for this feature?
Comment #2
Zach Harkey commented: I just found these other two threads that are heading in pretty much the same direction; there seem to be several people willing to support this feature.
Automated Node Import/Export via RSS/XML/FTP Feed
node_import future directions
Comment #3
makbeta commented: I would just like to second this request. The ability to update/overwrite nodes would be extremely helpful, as it allows for easy bulk editing of nodes.
Thanks.
Comment #4
TKS commented: For what it's worth, we just went ahead and did our updates directly in mysql. It's actually fairly easy, IF you're updating fields that are unique to one CCK nodetype. (When multiple nodetypes use a field, that field gets its own table -- and I haven't worked out the mysql join requirements for dealing with that.)
It's really just a matter of...
Once we were certain the updates were working, we strung together the UPDATE commands for each of the fields being overwritten, and ran them through in one big batch.
It helps to make sure the table you create/import with the new data has column names that match what's in the Drupal DB, but it's not required -- you just need to make sure you're matching them up correctly. After all, mysql won't know that you don't really want to overwrite the old "phone number" field with the new dataset's "fax number" field! (Did I mention that backing up your DB before you begin is key?...)
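A minimal sketch of this kind of batch update, written in Python against an in-memory SQLite database standing in for MySQL. The table names (`content_type_district`, `import_districts`) and column names are made up for illustration, and the steps elided above are not reconstructed here. The point is only that the staging-to-Drupal column mapping is written out explicitly, so a mismatch like phone/fax has to be typed in deliberately.

```python
import sqlite3


def apply_updates(conn, key_pair, field_map):
    """Copy staging-table columns over the matching content-table columns.

    key_pair  -- (staging key column, content key column) used to match rows
    field_map -- staging column -> content column, spelled out explicitly so
                 that e.g. a fax column is never written over a phone column
    """
    src_key, dst_key = key_pair
    for src, dst in field_map.items():
        # Correlated subquery, which is fairly portable SQL. MySQL also
        # accepts a multi-table UPDATE ... JOIN ... SET form for the same job.
        conn.execute(
            f"UPDATE content_type_district SET {dst} = ("
            f"SELECT s.{src} FROM import_districts s "
            f"WHERE s.{src_key} = content_type_district.{dst_key}) "
            f"WHERE {dst_key} IN (SELECT {src_key} FROM import_districts)"
        )
    conn.commit()


# Hypothetical data: two existing districts, new figures for only one of them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content_type_district (
        nid INTEGER PRIMARY KEY,
        field_code_value TEXT,
        field_students_value INTEGER);
    CREATE TABLE import_districts (district_code TEXT, students INTEGER);
    INSERT INTO content_type_district VALUES (1, 'A100', 500), (2, 'B200', 750);
    INSERT INTO import_districts VALUES ('A100', 520);
""")
apply_updates(conn, ("district_code", "field_code_value"),
              {"students": "field_students_value"})
rows = conn.execute(
    "SELECT nid, field_students_value FROM content_type_district ORDER BY nid"
).fetchall()
print(rows)
```

Rows with no match in the staging table are left alone because of the final `WHERE ... IN` clause; without it, the subquery would set unmatched rows to NULL.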
Comment #5
kingandy commented: Man, I was just about to make this exact request.
I think you could do it without worrying about nid/vid, if you put the onus on the data provider to ensure that if a field is nominated as a 'key' field then each value is unique - in the same way as you're given the option to ensure all node titles are unique.
Actually, that might be a good starting point - extend the 'unique node title' check to overwrite the existing node instead of refusing to import the new one, and add functionality to use other (e.g. CCK) fields ...
Just a thought.
Comment #6
martinkong commented: Actually, this may be very easy. In supported/node.inc, in node_node_import_fields(), just add
'nid' => t('Node: NID'),
into the $fields array.
Then add an id field to the CSV. You should then be able to map the id field to nid. If you want a revision, make sure you choose "Revision" in the publishing options; otherwise, no revision will be created.
Of course, you will need to get the correct nids for all the records in the CSV. I have tried it with a couple of simple CCK types and it worked. I haven't tried any CCK types with fancy fields, though.
May need more work if you want a CSV file to do both insert and update.
Hope this helps.
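For the mixed insert-and-update case, one possible convention (an assumption, not something the module does) is to treat a blank nid cell as "create" and a filled one as "update". A rough Python sketch of that split, with a made-up column layout:

```python
import csv
import io


def split_rows(csv_text, id_column="nid"):
    """Split CSV rows into (updates, inserts) depending on whether nid is set."""
    updates, inserts = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        nid = (row.get(id_column) or "").strip()
        if nid:
            row[id_column] = int(nid)    # existing node: keep its nid
            updates.append(row)
        else:
            row.pop(id_column, None)     # new node: let the system assign a nid
            inserts.append(row)
    return updates, inserts


sample = "nid,title,students\n12,Springfield,5200\n,Shelbyville,3100\n"
updates, inserts = split_rows(sample)
print(updates)
print(inserts)
```

The two lists could then be run through the update path and the normal create path separately.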
Comment #7
kingandy commented: I basically want to bump this request because it's come up again for me.
Without delving too deeply into the inner workings of the module, I think the way I'd like to see it behave is to have a select box at the stage before the preview - alongside the 'unique titles' box and the default value settings and the category behaviour options and so forth - which optionally allows one to select a Unique/Key Field. Probably restricted to fields which have had import fields mapped to them on the previous screen.
Then for each row of the file, instead of beginning the next stage with an empty node object, the module could search the database for a matching key value and (if found) run a node_load for the object in question. The values of this node object could then be altered according to the fields present in the table and saved as normal.
I realise I'm sweeping a world of complicated processing under the carpet there - the field may be in any one of a number of tables, not to mention that node_load does not populate the $node object in the form that node_save requires, particularly in the case of CCK radio button fields - but that's the basic behaviour I'd be hoping for.
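The basic behaviour described here can be sketched in a few lines of Python. This is only an illustration of the lookup-then-save flow, with a plain dict standing in for the node tables; it deliberately ignores the node_load/node_save and CCK complications just mentioned, and all names are hypothetical.

```python
def import_row(nodes, row, key_field):
    """Update the node whose key_field matches the row, or create a new one.

    nodes -- dict of nid -> node dict, standing in for the node tables
    Returns (nid, "updated") or (nid, "created").
    """
    matches = [nid for nid, node in nodes.items()
               if node.get(key_field) == row[key_field]]
    if len(matches) > 1:
        # The key is only usable if the data provider keeps it unique.
        raise ValueError(
            f"{key_field}={row[key_field]!r} matches {len(matches)} nodes")
    if matches:
        nid, action = matches[0], "updated"
    else:
        nid, action = max(nodes, default=0) + 1, "created"
        nodes[nid] = {}
    nodes[nid].update(row)  # only fields present in the row are touched
    return nid, action


nodes = {7: {"district_code": "A100", "title": "Springfield", "students": 500}}
first = import_row(nodes, {"district_code": "A100", "students": 520},
                   "district_code")
second = import_row(nodes, {"district_code": "B200", "title": "Shelbyville"},
                    "district_code")
print(first, second)
```

Raising on an ambiguous key, rather than picking the first match, keeps a non-unique key from silently overwriting the wrong node.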
Comment #8
oliveyrc commented: I would really like this ability too. I have a CSV export from a database that includes a unique reference/primary key; this is maintained at that end, so I don't need to worry about it.
I've created the CCK node type that I wish to import the rows into. The idea is to give the ability to map the unique/primary key from the import file to a field specified in the CCK node type, plus the option of what to do with a match: update/merge, skip, or overwrite.
How does that sound for a start? I've not been developing in Drupal too long and think this would be a nice little project to get me started in module development. Feel free to chip in and suggest ideas and improvements; hopefully we can also get some testing done on this by people wanting the same functionality.
Comment #9
philippejadin commented: @oliveyrc: Did you get any result on this project? This would be a very, very nice addition.
Comment #10
chrisroditis commented: I'd be more than willing to help oliveyrc! Do you have anything we can test?
Comment #11
oliveyrc commented: I have to hold my hand up here: I've not managed to spend much time on this just yet. I got sidetracked into adding a header skip row option, and spending a couple of days ill over Xmas has not helped. If people are interested in this then I'll fire it back up and have another look.
Sound good?
Rich
Comment #12
philippejadin commented: I'm definitely interested and would like to help if I'm given some directions on the best way to implement this.
In all cases, thank you for keeping this feature request alive!
Philippe
Comment #13
redmood commented: Has anyone managed to do that? I mean update or insert based on a specific field comparison?
Comment #14
drupaloSa commented: I'm also interested in this type of functionality.
Comment #15
Summit commented: Subscribing, greetings, Martijn
Comment #16
gthing commented: I would love to have this feature as described by comment #7. That's exactly the way I need it to work.
Comment #17
robomalo commented: I work at an art museum, and here is my scenario and solution, similar to post #7. I don't know if this could help anyone else: it only works when you have unique titles (the unique field as described in post #7), and you must not tick the "unique titles" option. I'll try to explain this as well as I can without confusing everyone.
The node title is the artwork's accession number and this will never change. Sometimes they give me exports from our artworks database and some fields change. So I modified node.inc to add a second checkbox under node options that reads "Update nodes with matching titles."
Basically my node.inc queries the node table, then pulls and assigns the node id to the import. You preserve your node id's, for comments, node reference, etc., but all the other fields are replaced with the new data.
I know my attachment won't work for everyone here, but maybe it can act as a catalyst for a more robust feature like this. Download the attachment, remove "_.txt" to make it node.inc, and replace the current node.inc in node_import/supported. Just play with this as a proof of concept for post #7. In my case, it works perfectly.
Comment #18
drutube commented: I just wanted to note here that the last patch, or rather include file replacement (as it isn't actually a patch), works quite nicely in a certain circumstance. I was trying to create users and advanced profiles from an Excel export from an old ASP site. I had usernames and emails and a scattering of associated field data like gender and occupation. The problem is that user import doesn't work with CCK fields very easily, or at all. So by using the include file referenced above, I just mapped the title to the already created username and was able to bring the data into the users' advanced profile fields. Worked fine.
Comment #19
zeezhao commented: The fix in #17 works well for me, using 5.x-1.6. Thanks for the code!
Comment #20
OliverColeman commented: I've created another patch at http://drupal.org/node/271809 that allows updating existing nodes with the same title. For some reason I didn't check to see if this sort of feature request had already been made. The patch I've made is slightly different to that of this thread, as it relies solely on the node title (at the moment, anyway). However, it would be easy to modify to allow comparing against any field in the node, including nid. I would like to implement this at some point, as it makes it much more flexible for little extra complexity.
My patch works the same way as that of #17, but uses a drop down box with the options of "Don't import", "Create new node", and "Update existing node" instead of adding a new checkbox.
There seems to be a bug with both my patch and that in #17 (not surprising since they work the same way); it was reported against my patch: http://drupal.org/node/271809#comment-890732
I've marked my feature request as a duplicate of this one.
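As a rough illustration of the three drop-down options (Python, not the patch itself): the option labels come from the description above, while their exact semantics here are an assumption, namely that the choice only matters for rows whose title already exists.

```python
from enum import Enum


class OnDuplicate(Enum):
    SKIP = "Don't import"
    CREATE = "Create new node"
    UPDATE = "Update existing node"


def plan_import(existing_titles, rows, policy):
    """Return a list of (action, title) pairs for the rows of an import file."""
    plan = []
    for row in rows:
        title = row["title"]
        if title not in existing_titles:
            plan.append(("create", title))   # no clash: always a new node
        elif policy is OnDuplicate.SKIP:
            plan.append(("skip", title))
        elif policy is OnDuplicate.CREATE:
            plan.append(("create", title))   # duplicate title allowed
        else:
            plan.append(("update", title))
    return plan


existing = {"Widget"}
rows = [{"title": "Widget"}, {"title": "Gadget"}]
update_plan = plan_import(existing, rows, OnDuplicate.UPDATE)
skip_plan = plan_import(existing, rows, OnDuplicate.SKIP)
print(update_plan)
print(skip_plan)
```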
Comment #21
OliverColeman commented: I can't duplicate the first part of the error I mentioned in #20 (http://drupal.org/node/271809#comment-890732):
I created two import files (attached) and applied them in order with the "Update existing nodes" option chosen. The first file contains rows for three nodes, with no errors. The second file contains rows for three new nodes, two of which have errors, and updates for the three existing nodes (duplicate titles), two of which have errors. The new nodes with errors don't get imported, and the existing nodes with errors in the update data don't get updated. I just realised the test files are for Ubercart Products; apologies for not using a standard type, but they should be easy enough to adapt.
The second part of the error I can duplicate: no error file is downloadable. However no error file is downloadable when the regular option of Creating new nodes (instead of updating) is in effect either, so perhaps this is a different Node Import bug.
Comment #22
zeezhao commented: Thanks for your reply.
The first bug actually happens during the "unique node" load scenario. (Sorry for the mix-up...) i.e.
- assuming you already have a title in the database
- then you try and load a new import with the same title with the "unique node" check box or drop down selected,
- the preview complains that there are duplicates (as expected)
- but the duplicate gets physically loaded into the database (not expected...)
I am also using Ubercart products, though with what is essentially the first version of uc_product.inc.
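To make the expected behaviour concrete, here is a small Python sketch (hypothetical names, not the module's code) in which the import step reuses the preview's verdict, so a row flagged as a duplicate can never be written:

```python
def preview(existing_titles, rows):
    """Return one error message (or None) per row, as a preview step might."""
    seen = set(existing_titles)
    errors = []
    for row in rows:
        if row["title"] in seen:
            errors.append("duplicate title")
        else:
            errors.append(None)
            seen.add(row["title"])
    return errors


def do_import(existing_titles, rows):
    """Import only rows that passed preview(); return (imported, rejected)."""
    imported, rejected = [], []
    for row, error in zip(rows, preview(existing_titles, rows)):
        (rejected if error else imported).append(row)
    return imported, rejected


rows = [
    {"title": "Widget", "price": "4"},   # clashes with an existing node
    {"title": "Gadget", "price": "5"},   # new
    {"title": "Gadget", "price": "6"},   # clashes with the row above
]
imported, rejected = do_import({"Widget"}, rows)
print(imported)
print(rejected)
```

The bug described above would correspond to the import loop ignoring `error` and appending every row to `imported`.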
Comment #23
conniec commented: This worked for me AFTER I checked Revisions on the content type page (this sounds like it should be obvious, but it wasn't to me, so I'm sharing my lesson learned).
Connie
Comment #24
nico_ commented: Hey Oliver,
First: thanks for the patch. I tested your code for a little while, and updating and importing of nodes seems to work quite well. However, I noticed the following (maybe this is related to the error you are referring to):
If you
1. Import a CSV
2. Update one node manually, e.g. by filling in a field that has not been filled in by Node Import (of course, change neither the title nor the content_type)
3. Run the same CSV import again and choose "Update existing Nodes" from your select field.
then, for the one node that has been manually changed, a new node is created.
I do not know if this is the way it should be, but on our system some fields are filled in by Node Import and others are changed manually, so it would obviously make sense to update the node's fields rather than duplicate the node, as happens in the case described.
Comment #25
OliverColeman commented: Hmm. In response to #24, I've discovered some more odd behaviour, but couldn't reproduce that in #24.
1. Some nodes in admin/content/node are marked as new after the second import even though they have the same nid as before, and some are marked as updated (expected). The ones that were marked as new were ones that had fields manually modified, but they still get marked as new even after a third import when no fields have been changed since the last import. This doesn't seem like a big issue, though it could be confusing.
2. With the Page type, most of the fields manually changed between imports seem to get wiped clean if nothing is mapped to them (at least the log, a cck field, 'Promoted to front page', and 'Sticky at top of lists'); the only field that doesn't seem to get wiped if nothing is mapped to it is the Body.
Choosing to create a new revision or not made no difference to these results.
I tested this feature pretty thoroughly for importing Ubercart Products. Almost none of the manually changed Product fields are wiped when updating a product and not specifying a mapping to the fields. The only ones that were wiped are ones that get set to a sensible default during processing if there's no mapping to them (I wrote the UC Product extension for node_import). Perhaps that's what's causing problem 2: the Page or node import code is setting default values (i.e. empty strings) for non-mapped fields during processing, which then overwrite the existing values. Though I don't know why this wouldn't affect the body field (maybe it's allowed to be null?)
Unfortunately I don't have time to look at the code at the moment, not sure when I will.
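The suspected cause of problem 2 can be shown with a toy example in Python. This only illustrates the hypothesis above, not the actual node_import code path; `DEFAULTS` and the field names are made up.

```python
# Hypothetical defaults applied to fields that have no CSV mapping.
DEFAULTS = {"promote": 0, "sticky": 0, "log": ""}


def naive_update(existing, imported):
    """Suspected behaviour: defaults for unmapped fields clobber stored values."""
    return {**existing, **DEFAULTS, **imported}


def mapped_only_update(existing, imported, mapped_fields):
    """Alternative: only fields mapped from the CSV may overwrite the node."""
    merged = dict(existing)
    merged.update({f: imported[f] for f in mapped_fields if f in imported})
    return merged


existing = {"nid": 3, "title": "About", "sticky": 1, "log": "edited by hand"}
imported = {"title": "About", "body": "New text"}

wiped = naive_update(existing, imported)
kept = mapped_only_update(existing, imported, ["title", "body"])
print(wiped)
print(kept)
```

Under this reading, defaults would be applied only when a node is first created, never when an existing one is updated.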
Comment #26
goodeit commented: Great patch, Oliver! Do you have any plans to make it compatible with fields other than the Node Title (e.g. a custom 'primary-key' type CCK field)?
Thanks!
Comment #27
OliverColeman commented: Glad you liked it, Goodeit. :) I don't have any specific plans to allow selecting nodes on other fields, simply because I'm very busy and the client I developed it for doesn't want it. However, I just realised that if I did, the thing to do would be to provide a select box that lets you pick any of the fields that are mapped to from the CSV, as well as the nid (thereby providing maximum flexibility for every node type). This should be fairly trivial to implement.
Comment #28
batbug2 commented: I would LOVE this functionality, because my database doesn't have unique titles but does have a unique ID field, which never changes. And the database is 3000+ items, so it takes a few hours to delete all nodes from Drupal and re-import from scratch! Thanks in advance!
Comment #29
goodeit commented: @oliver: "fairly trivial" is subjective ;)
@batbug2: If you are manually deleting all your nodes, you may wish to check out delete_all, a module that might help reduce your deletion time a bit.
Comment #30
yngvewb commented: Being able to select a unique CCK ID field would make this module just perfect!
If you are re-importing, I guess it's very likely that you have an ID from another system that you would like to check against, rather than the title, which is not actually a unique field.
Comment #31
batbug2 commented: @goodeit, I have to delete all nodes of just one node type.
Comment #32
goodeit commented: @batbug2: I'm sorry, I saw this in the features list and assumed it was in the module. Apparently it was added but they never released a new version. You could try getting the new files straight from CVS (at your own risk). Or just keep an eye out for a new version.
Comment #33
AD-DA commented: Subscribing.
Comment #34
asak commented: Very cool. subscribing.
Comment #35
PatFrat commented: Being able to select a unique CCK ID field would be useful.
subscribing
Comment #36
giannhsv commented: I think being able to update/overwrite existing nodes and add new ones by providing a unique CCK ID field would be a killer feature.
Subscribing.
Comment #37
timl commented: Another thread (http://drupal.org/node/301209) is now pointing to this thread.
Comment #38
ianchan commented: Would this module work together with http://drupal.org/project/datasync?
Comment #39
dieter@drupal.org commented: subscribing
Comment #40
parrottvision commented: Brilliant idea. What a difference it will make. Happy to contribute. Will the CSV Parser module do something similar?
Subscribing.
Comment #41
drdmmr commented: Very interested in this.
Subscribing.
Comment #42
corpseHU commented: Hi!
Very good idea!
I tested the patch, but I got this:
www:/data/http/drupal/sites/all/modules/node_import/supported# patch < /data/install/node.inc_.diff
patching file node.inc
Hunk #1 FAILED at 185.
Hunk #2 succeeded at 279 (offset 4 lines).
1 out of 2 hunks FAILED -- saving rejects to file node.inc.rej
Is it a problem that hunk #1 FAILED?
Comment #43
temp commented: Can I use the patch from #20 for the 6.x version?
Comment #44
DiJae commented: Will this feature be included in the next version of node_import?
Comment #45
drdmmr commented: subscribing
Comment #46
usa2k commented: subscribing
Also interested in optional overwrite
I use Titles to autogenerate node names
Double import from example-title gets example-title-0
An interactive override and an "override all" option would be very cool.
The revision idea is a good one if everything in the import is changed data.
Comment #47
Encarte commented: subscribing
Comment #48
nevmoor commented: subscribing
Comment #49
roeneman commented: subscribing
Comment #50
John Gentilin commented: I posted some code at http://drupal.org/node/422282 and am waiting for someone to provide feedback on whether this is a valid node_import / Drupal approach.
My approach was to write a new module that hooks into the {module}_import_form_{formname}_node_form_alter($data) callback. When that fires, I grab the form data that is about to be updated, do a lookup based on my unique columns, then insert the nid / vid into the $data array. I am working with Drupal 6.x, but from my research this should work with little change for 5.x. Actually it looks easier in 5.x, because http://api.drupal.org/api/function/node_save/5 will automatically update vid. For 6.x I cheated and used the existing vid.
The approach is a little technical, but it is only a few lines of code to master.
-John G
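The lookup-and-inject step might look roughly like this in Python. It is a sketch only: `attach_ids`, the `index` structure and the column names are hypothetical, and the real code at the link above is PHP running inside the form-alter callback.

```python
def attach_ids(data, index, unique_columns):
    """Add nid/vid to data when a node with the same unique values exists.

    data           -- dict of values about to be saved for one import row
    index          -- dict mapping a tuple of unique-column values to (nid, vid)
    unique_columns -- names of the columns that together identify a node
    """
    key = tuple(data.get(col) for col in unique_columns)
    if key in index:
        # Reuse the stored ids so the save step updates instead of inserting.
        data["nid"], data["vid"] = index[key]
    return data


index = {("A100", "2008"): (41, 57)}
hit = attach_ids({"district": "A100", "year": "2008", "students": 520},
                 index, ["district", "year"])
miss = attach_ids({"district": "B200", "year": "2008", "students": 310},
                  index, ["district", "year"])
print(hit)
print(miss)
```

Rows without a match are passed through untouched, so they follow the normal create path.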
Comment #51
tomsm commented: I agree. I also need this feature. :-)
Subscribing
Comment #52
fishhaddock1 commented: I am using node import to import products from XLS into Ubercart. It worked really well for additions, but being able to update products would be amazing. I tried the patch but it failed, as I am using node import 5.x-1.9. Can anyone look into fixing the patch for the latest version of node import?
Comment #53
tetramentis commented: subscribing
Comment #54
Bobuido commented: subscribing
Comment #55
quinns commented: Just wanted to pitch in that the direct method of updating the database listed in #4 works quite well. Be sure to flush all your Drupal caches after performing this.