I have a couple of thousand scientific journal articles in XML format that I would like to import into Drupal.

What is the best way to go?

Comments

pobster

Write a custom module which 'sucks' in, say, 50 articles at a time (to relieve the strain on the server), and run it from hook_cron: http://api.drupal.org/api/5/function/hook_cron
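A minimal sketch of that batching idea, written in Python only to show the logic (in a Drupal module the same steps would run from hook_cron). The function names, the JSON state file and the "*.xml" file mask are assumptions made for illustration, not part of any Drupal API. Imported files are tracked by name, so files added to the directory later are picked up on a following run.

```python
import json
from pathlib import Path

BATCH_SIZE = 50  # articles per run, as suggested above


def load_imported(state_file):
    """Return the set of file names already imported (empty on first run)."""
    path = Path(state_file)
    return set(json.loads(path.read_text())) if path.exists() else set()


def pending_files(source_dir, state_file, batch_size=BATCH_SIZE):
    """Return up to batch_size XML file names that have not been imported yet."""
    done = load_imported(state_file)
    names = sorted(p.name for p in Path(source_dir).glob("*.xml"))
    return [n for n in names if n not in done][:batch_size]


def mark_imported(state_file, names):
    """Record the given file names as imported so later runs skip them."""
    done = load_imported(state_file) | set(names)
    Path(state_file).write_text(json.dumps(sorted(done)))
```

Each scheduled run would call pending_files, import the returned files, and then call mark_imported with the names that succeeded, so a failed import is simply retried on the next run.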

Are all the files in the same format?

...You could also just 'translate' them on the fly: provide a list of the files using file_scan_directory, then theme the output. Probably not the best idea though...

Pobster

mu

I have FTP access to a directory with all the files. All journal articles have the same format. A DTD is available. The number of files grows constantly.

The nice thing is:

All files have keywords included (taxonomy).
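As a rough sketch of how those keywords could be pulled out for later mapping to taxonomy terms, here is a small Python example. The element names "article-title" and "kwd" are assumptions; the real names have to be taken from the DTD that comes with the files.

```python
import xml.etree.ElementTree as ET


def extract_article(xml_text):
    """Return the title and keyword list of one article.

    The element names "article-title" and "kwd" are assumed here;
    replace them with the names defined in the actual DTD.
    """
    root = ET.fromstring(xml_text)
    title = root.findtext(".//article-title", default="").strip()
    keywords = [k.text.strip() for k in root.iter("kwd") if k.text and k.text.strip()]
    return {"title": title, "keywords": keywords}
```

The returned keyword list is what an import module would turn into taxonomy terms before saving the article.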

tobbe_s

Did you take a look at the Biblio module? It's tailored to handle references to scientific articles etc. One of its features is importing references from XML files or from files exported from e.g. EndNote.

mu

I did.

But the format I have does not seem to be supported there.

It's a special format that the National Library of the US provides to its licensees.

However, it might be a good idea to talk to the Biblio developers.

tobbe_s

Talk to the developers or have a look at one of the files that define input formats (e.g. endnote8_parser.inc). I don't think it would be very hard to customize that file to handle the National Library format (if it is well-formed XML). I have not used Biblio for very long, or with a very large database, but the features are very nice.

If you have access to EndNote, you could perhaps import the XML files into EndNote and then export the EndNote library to an XML file that you import into Biblio. I've done it using EndNote v.10, and it was absolutely painless.