I was tasked with developing an import tool to pull over the 40,000+ items from our old database. There is a lot of variety in how the old data is stored, so our nodes had to match the old models and the import tool had to account for this. We also have a lot of information that benefits from cross-referencing (e.g. 35,000 contacts shared and used as node references by the 40,000 pieces).
So, I tried using the ImportExport API and ran into a wall with it: we needed to take one piece of source data and export it as as many nodes as necessary (e.g. one piece and its four supporting nodes). As the export builds its data, it creates the supporting nodes, collects their node ids, and then inserts them as references in the "main" node. My working title for this module is "direct_node_import", and I have written it for 4.7.x with an eye to importing CCK-created content types.
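As a rough sketch of that supporting-node pattern (content type and field names here are invented for illustration, not the module's actual schema): a supporting node is saved first with Drupal 4.7's node_save(), which assigns it a nid, and that nid is then stored in a CCK nodereference field on the main node.

```php
<?php
// Hypothetical sketch: save a supporting node, then reference its nid
// from the "main" node via a CCK nodereference field.
$contact = new stdClass();
$contact->type = 'contact';      // assumed CCK content type
$contact->title = 'Jane Doe';
$contact->uid = 1;
node_save($contact);             // node_save() assigns $contact->nid

$item = new stdClass();
$item->type = 'piece';           // assumed CCK content type
$item->title = 'Imported item';
$item->uid = 1;
// CCK nodereference fields store the nid of the referenced node:
$item->field_contact = array(array('nid' => $contact->nid));
node_save($item);
?>
```

Repeating this for each of the four supporting nodes before saving the main node is what the ImportExport API could not accommodate.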
Decisions/Assumptions:
- Expect that command-line execution will be common. With large record sets, a run can take hours to process.
- Expect that the import destination will be remote. In our case, the old server and the new server are 7,000 miles apart. The old server is unlikely to handle Drupal well, so the export work is done there and the import work is done on the new server.
- Use the import site as the source for import directives and taxonomy. To this end, I built in a way to export individual taxonomy references. The exporting site stores and reuses these so that it makes the fewest trips possible.
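The taxonomy caching in the last point can be sketched roughly like this (every name below is hypothetical, including the remote lookup, which stands in for whatever transport the exporting site uses to ask the import site for a term id):

```php
<?php
// Sketch of the exporting site's term cache: look up each
// vocabulary/term pair on the import site once, then reuse the
// returned tid so the exporter makes the fewest trips possible.
$GLOBALS['dni_term_cache'] = array();

function dni_lookup_tid($vocabulary, $term_name) {
  $cache = &$GLOBALS['dni_term_cache'];
  $key = $vocabulary . ':' . $term_name;
  if (!isset($cache[$key])) {
    // dni_remote_term_lookup() is a placeholder for the remote call
    // (e.g. XML-RPC or HTTP) that asks the import site for the tid.
    $cache[$key] = dni_remote_term_lookup($vocabulary, $term_name);
  }
  return $cache[$key];
}
?>
```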