Last updated August 31, 2010. Created on April 4, 2007.
Edited by Gogowitsch, GoddamnNoise, LinL, Reviresco. Log in to edit this page.

How-to: Import data into CCK nodes

Do you have data that you want to import into your drupal database, such that each row of data becomes a new node? Then this post may be for you. This originally stemmed from a comment in the forums importing data into CCK created node.

(N.B. There is also a Node Import module that will import data from csv and tsv text files into most types of CCK fields.)

There are two issues addressed by this post

  • Which tables need to have data added to them.
  • An example of one import method, using MS Excel for creating the SQL strings

Assumptions, as this is somewhat technical:

  • You understand the basic concepts of drupal node data (you should be somewhat comfortable with what "nid", "vid" and "uid" are).
  • You should know something about the SQL language and field data types. You should also be comfortable interacting with your drupal database using what ever tool you use (I am using phpMyAdmin, so some of the directions may be specific to that).
  • You should attempt this first on a test site. You should not try this on your live-site and come back posting a comment that it didn't work and your site is a mess and you need urgent help - this isn't intended to sound harsh, but realistic.
  • And if you want to use the MS Excel example at the end for creating the SQL strings then, you will need to know how to use MS Excel including creating formulas and using functions.

Section 1: Which Tables

Drupal Tables - Required - Each of these is important, and you should review them and know what each field will store; it is fairly clear when you look at them.

node
this stores the primary information about each node, you need a record in this table for each node
node_revisions
this stores body and teaser fields and allows for revisions, you should have a record in this table for each node
node_comment_statistics
stores info on comment stats for each node. If the Comment module is disabled you do NOT need to import data into the "node_comment_statistics" table. If the Comment module is enabled, then you do, even if the default comment setting is 'disabled' for the node-type.
sequences
you need to update 2 rows. each node has one "nid" and minimally one "vid", the next "nid" and "vid" values are store in this table, once you know what values you will be using, update the two rows with the next number, the rows have name values of "node_nid" and "node_revisions_vid"

Drupal - Additional Concerns - this does not address all possible concerns, but gives an idea of additional tables to look at.

Taxonomy
If you want to import data as part of your taxonomy, you need look at
  • "vocabulary" one row for the vocabulary the "vid" comes form the sequences table, adjust accordingly - I don't know that I would use SQL to get this data into your database - just use your normal drupal interface for creating the vocabulary/category
  • "term_data" one row for each term, the "tid" comes form the sequences table, adjust accordingly. Keep your terms simple at first - not structured in a tree, you can rearrange them in a tree after the fact using the normal drupal interface
  • "term_hierarchy" one row for each term, use "0" for parent - see note above
  • "term_node" should be obvious, relates a node to a term ("nid" and "tid")
  • "term_relation" and "term_synonym" these two are optional with regard to taxonomy and starts to get complex, if you need this, you should be able to figure out, again, I would suggest keeping it simple and avoiding this table in short term - see note above
Node Access
If you are using one of the various module that implement some sort of node access scenario - you may need to add data to the "node_access" table
Organic Groups
If you are using Organic Groups and the node-type you are importing is a "Group" you will need to add data to "og", "og_uid", and possibly "og_uid_global"
If you are using Organic Groups and the node-type you are importing is a child of a "Group" you will need to add data to "og_ancestry"

CCK Table(s)
With CCK, each node type has an additional table named "content_type_{node-type}" with the fields "nid" and "vid". In the following, "my_table" is used as a type to provide some context, so with a node type of "My Table" the CCK table would be "content_type_my_table"

With CCK, the data model is fairly complex and needs a bit of understanding. On one level, is which table your CCK field is located. It was just noted that CCK will create a table called "content_type_my_table", and one would first think that all the fields would be located in that table, but CCK allows for some pretty complex stuff, so it doesn't always locate the field in the obvious table.

CCK Field Location Scenarios

Unique Single Value Fields
A CCK field that does NOT allow multiple values and is not re-used, meaning that on the main field listing at admin/content/types/fields ONLY one node-type appears in the "Used in" column, it also means you did NOT use the "Add existing field" option with the field).
This field will be located in the "content_type_my_table" table.
Re-Used Single Value Fields
A CCK field that does NOT allow multiple values but is re-used, meaning that on the main field listing at admin/content/types/fields MORE THAN one node-type appears in the "Used in" column, it also means you DID use the "Add existing field" option with the field.
This field will be located in its own table "content_field_{field_name}" table. In addition to your field information, the table will have "nid" and "vid" columns.
Multiple Value Fields
A CCK field that DOES allow multiple values, can be either Unique or Re-Used.
This field will be located in its own table "content_field_{field_name}" table. In addition to your field information, the table will have "nid", "vid", and "delta" columns, "delta" is a number, 0 for the first value, 1 for the second, etc.

Another issue with CCK is the field's name. Depending on the CCK field type the field name changes or you might have multiple database fields for a single CCK field. This is not "field data type" but the CCK field type such as "node reference", "text", etc. First, you should know that CCK will prepend your field names with the prefix "field_" In the following "my_field" will be used to provide context so with a field named "My Field" the CCK field name would be "field_my_field"

CCK Field Name Scenarios (not all scenarios are listed - but this should be enough examples to guide you)

CCK Field Type = Node Reference
CCK adds "_nid" to the end of the field name so that the database field name = "field_my_field_nid", the value stored in the field will be a "nid" of a node
CCK Field Type = Integer, Decimal or Text
CCK adds "_value" to the end of the field name so that the database field name = "field_my_field_value", the value stored will be the text the user types in.
CCK Field Type = Text with Filtered Text option
In addition to the "_value" field noted above, CCK adds a field with the "_format" suffix, to store the format/filter value, so that the database field name = "field_my_field_format" is added.
CCK Field Type = Link
CCK adds 3 fields to the database, the 3 database fields have "_url" (for the URL value), "_title" (for the Title value) and "_attributes" (for the Attributes value), so that the database fields "field_my_field_url", "field_my_field_title", and "field_my_field_attributes" are added.

Another issue is the input option such as a "Select List" or "Checkboxes" that limits the choices the user can make. You are allowed to use "keyed" values in the "Allowed values list". If you are using keyed values, remember it's the "key" that will be stored in your CCK tables - and to have them show up properly you should list all your "key|value" pairs before you import your data.

Section 2: An example of an import method, using MS Excel for creating the SQL strings

Step 1: Preparation

  • Get a sheet of paper to make some notes
  • Set up your node type with all the fields you need - using the normal drupal interface
  • Create a sample node using your normal drupal interface ("Create Content" -> etc...) filling in every field (or at least every CCK field) - make a note of it's "nid" value
  • Create another sample node using your normal drupal interface ("Create Content" -> etc...) leaving every non-required field blank (or at least every CCK field) - make a note of it's "nid" value - CCK will insert NULL, 0, or something else, having this example will help you know what you should use if you don't have a value in your data.
  • Get your data that you are importing into an MS Excel spreadsheet and add two columns for "nid" and "vid"
  • Make a note of all the tables that you need to import data into, you will need to create at least one "INSERT" for each
  • Find out the next "nid" and "vid" values from the "sequences" table, and update the "sequences" table based on how many new rows you need will need. Make a note of the lowest and highest "nid" and "vid"
  • Fill in the "nid" and "vid" columnns in your spreadsheet. Each row should increment by one. I use Edit->Fill->Series to fill in the columns incrementing by 1

Step 2: Creating the SQL Strings and Files

Working with one table at a time, export 1 row of data from the table to get the querystring needed.
If you don't recognize a field, or don't know what value to store in the field, do a query that outputs the data from your 2 sample nodes, and use one of those values as appropriate. If that doesn't help, or make sense, then STOP. Don't do something unless it seems clear and reasonable. Get some more information first.

So for the "node" table I get

--
-- Table structure for table `node`
--

CREATE TABLE `node` (
  `nid` int(10) unsigned NOT NULL auto_increment,
  `vid` int(10) unsigned NOT NULL default '0',
  `type` varchar(32) NOT NULL default '',
  `title` varchar(128) NOT NULL default '',
  `uid` int(11) NOT NULL default '0',
  `status` int(11) NOT NULL default '1',
  `created` int(11) NOT NULL default '0',
  `changed` int(11) NOT NULL default '0',
  `comment` int(11) NOT NULL default '0',
  `promote` int(11) NOT NULL default '0',
  `moderate` int(11) NOT NULL default '0',
  `sticky` int(11) NOT NULL default '0',
 
  *** I REMOVED SOME STUFF HERE ***

) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=3643 ;

--
-- Dumping data for table `node`
--

INSERT INTO `node` VALUES (1, 1, 'page', 'My Page Title', 1, 1, 1168116129, 1170524637, 0, 0, 0, 0);

Then in MS Excel, create a formula, something like the following, to create the querystring.

="INSERT INTO node VALUES (" & A2 & ", " & B2 & ", 'tip', '" & C2 & "', 1, 1, 1168116129, 1168116129, 0, 0, 0, 0);"

Column A is my "nid" value
Column B is my "vid" value
Column C is my "title" value
etc.

Copy the output of the column in a text editor, remove the first row (you will see it is not an "INSERT" statement, save the file. You are ready to import this set of data. Move on to the next table, repeat this for each of the tables you need.

Section 3: Importing the Data

  • Put your site in admin mode (not required, but just in case).
  • Import each of the files, one at a time, noting any errors, and fixing all errors.
  • Once you start the import process, complete it, including fixing any errors before going back to your site.
  • Do some spot checking of a few nodes to make sure it worked.
  • Put your site back into normal mode (if applicable).

Issues, Problem Areas, Advice...

The biggest issue is apostrophes and quotation marks. If you have a text editor that highlights/color-codes SQL Statements and/or provides line numbering they can be pretty easy to find and fix. Also you will get an error and it will tell you which line was bad, easy enough to remove the lines above and fix the line with the error and then re-import it.

There are a number of ways to import data, I like using SQL strings simply because I use a text editor highlights/color-codes SQL Statements and provides line numbering, so it is pretty easy to find and fix problems.

Importing long text fields, such as body and/or teaser. Remember the fields store what the user types and the filters are used to interpret it for rendering. It can be difficult to parse the teaser from the body.

Also don't forget if you aren't successful, you can always delete it.

DELETE FROM [table] WHERE nid BETWEEN [x] AND [y]

If you find you are doing a lot of extra stuff, you need to sometimes empty (truncate) the "cache" table and/or the "cache_content" table.

If you realize that you "inserted" the wrong value - such as "node->promote = 1" you can do an "UPDATE" statement to fix the one field.

UPDATE [table] SET [field]=[value] WHERE nid BETWEEN [x] AND [y]

I would also recommend doing a test of maybe just the first 5 rows of your data imported into each of the appropriate drupal tables. Then go to your site and view those nodes to see if it worked. No need to import 3000 items to find something doesn't work.

Ultimately remember, you should be in a test site, if you can't get it to work, then nothing bad has really happened, except maybe some lost time.

Looking for support? Visit the Drupal.org forums, or join #drupal-support in IRC.

Comments

jgoldfeder’s picture

This is great info.

Here are some suggestions for what has worked for me while importing over 20k records.

Instead of following the instructions in section 2, here is how I imported the data. I'm personally comfortable with MYSQL, but the process I used reduced the opportunity for errors. Also, this seemed much easier and faster. Just backup the db before and after you import data. Remember to shut down the site or only operate on a test server when doing this. If you have folks editing stuff, it will break your import. To make life easier, find the last VID and take the next number for both the VID and NID for imported records so you don't have to track which is which (again, simple = good).

Steps
- Create all the databases inside of Drupal and configure them there
- Create one dummy record to ensure you have the proper format and export it
- Match your data to the exported data
- Use phpmyadmin to import the data after saving the data in csv format (zip it if you want)

anonymous07’s picture

Greg Bear’s picture

If you are updating via PHP, I found this function handy: cache_clear_all('content:'. $node->nid .':'. $node->vid, 'cache_content');
It is found all over the CCK module itself and is used to reset the CCK content cache when an item is updated.

This becomes an issue when you might be importing directly to a CCK field over and over again. For instance, I have a "product" content type with many CCK fields. These nodes get updated 4 times each day from a point of sale system. The sql updates direct to the CCK tables were working fine, but if you viewed a node it would be showing the old values for the CCK fields from the cache. If someone submitted that, you'd overwrite the updated changes.

The code above will clear a specific node, after it has been updated.

schildi’s picture

you can do it also using

delete from cache_content where cid like 'content:YOUR_NODEID:%';

See also (mysql syntax):

select CONCAT_WS(':','content',n.nid,n.vid) from node n, node_revisions r where n.nid=r.nid and n.vid=r.vid and YOUR_FILTER ... ;
Michael Phipps’s picture

I've been putting off building an importing routine because there was just too much to think about using this approach. I'm sure I'd make a mistake and not realise, until it was in production a few months.

I've just seen a much better solution at http://drupal.org/node/67887 which uses node_save() and very clean code instead of all this sql manipulation.

<?php
               
global $user;

       

$node = new StdClass();
       
// Information for base node
       
$node->type = 'content type'// *** This needs to be a valid content type
       
$node->uid = $user->uid;

       

// Optional - Useful if you want the create date/time to reflect something other than now
        // $node->created = ** Valid unix time stamp

       

$node->status = 1// Set to 0 if you do not want the content published
       
$node->promote = 0// Set to 1 if you do not want the content promoted to the front page
       
$node->sticky = 0// Set to 1 if you do not want the content sticky
       
$node->title = 'This is the title for the content';
       
$node->body = 'The body of the node';
       
$node->teaser = 'The teaser of the node';

       

node_save($node);
       
$nid = $node->nid;
?>

I think this a much more sane approach. CCK nodes can be handled in this manner as well. See more discussion on this at http://drupal.org/node/67887

excaliburst’s picture

Push!

daveslc’s picture

Is the above method (and list of tables) valid for drupal 6?

Carino’s picture

push!

drupsites’s picture

^ is this vaild for Drupal 6 as well?

markatasu’s picture

Under Assumptions you suggest familiarity with terms and concepts. Can you provide links or references to resources for some of this basic information? Obviously a search on "nid" has been useless. As someone who hasn't got under the hood very much (and for my site at work I have no access to the database), where can I find some basic instruction on this? Most of the books I've found address either the front-end experience exclusively, or focus on the back end at a level suitable for users with degrees in programming.