I have a bunch of old html files of newspaper articles which I've stripped the markup out of and turned into a csv file so that I can pull them into Drupal.
Well... I say I've stripped out the html. Actually, I've left in the <p> and </p> code so that the original paragraphs stay in place, so that we are left with nice readable text.
Unfortunately, what we are left with is something that looks like this:
<p>
Alongside lectures and coping with huge increases in alcohol intake, one of the learning experiences facing the 320,000 new students who go to university over the next couple of weeks is independent living. Halls of residence, buying your own place, lodging with a family or renting with friends: which of these is cheapest, easiest and least stressful?<p>
Here, four students describe the advantages and disadvantages of the different choices:
etc, etc... (sorry about the ugly code formatting... the only way I could get the p's into the post!)
Now, clearly I can't just add line breaks in the csv file, as this would leave each article flowing over more than one row, making csv import pretty impossible.
Is there any way I can persuade node_import to pay heed to the html tags which I'm leaving in the imported data? I've had a look through the menu options and can't see any way of doing it. If it's not possible, then can anyone suggest some sort of workaround that I could apply using find/replace to the data before I try and import it?
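To make the find/replace idea concrete, here is a rough sketch of the sort of pre-processing I have in mind, written in Python. It keeps each article on a single csv row and turns the stray p markers into properly paired tags. The function names and the column index are made up for illustration, and I'm assuming (without having checked) that the imported body will be shown with an input format that allows p tags.

```python
import csv
import re

# Matches the leftover paragraph markers, opening or closing, in any case.
P_TAG = re.compile(r"</?p\s*>", re.IGNORECASE)


def normalise_paragraphs(body):
    """Wrap each paragraph in <p>...</p> and keep the whole body on one line."""
    # Split on the markers, then collapse any whitespace or line breaks
    # inside each chunk so nothing spills onto a second row.
    chunks = [" ".join(chunk.split()) for chunk in P_TAG.split(body)]
    return "".join("<p>{}</p>".format(c) for c in chunks if c)


def rewrite_csv(src, dst, body_column):
    """Copy csv rows from src to dst, cleaning up the article body column.

    body_column is a hypothetical zero-based index of the article text.
    """
    reader = csv.reader(src)
    # Quote every field so commas and quote marks in the text stay safe.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    for row in reader:
        if body_column < len(row):
            row[body_column] = normalise_paragraphs(row[body_column])
        writer.writerow(row)
```

Something like `rewrite_csv(open("articles.csv", newline=""), open("articles_clean.csv", "w", newline=""), 1)` would then produce the cleaned file, where both file names and the column number are just placeholders for my own data.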