We have a parsing script to convert 10,000+ static HTML files to Drupal nodes.
However, we are running into a bit of a snag. Everything looks great when just spitting out the title and body text from the parsing, but stuff gets ugly when trying to put it into Drupal.
After using the code shown below, we have several problems:
1. Everything is cut off after an apostrophe for both title and body. It's like the node_save function stops after it reaches an apostrophe. So if the title is "Matt's Stuff", it's saved in Drupal as just "Matt". But the $pageTitle variable does have the entire title in it. Same for body. The body will just be cut off after the first apostrophe.
2. Sometimes the HTML entity codes for spaces and certain other characters (like "&nbsp;") show up in the title and body instead of the actual characters. Very annoying.
Here is the Drupal-related part:
$node = new stdClass();
$node->title = $pageTitle;
$node->type = 'story';
$node->status = 1;
$node->body = $fileContent;
$node->uid = $authorID;
$node->created = $dateMod;
$node->changed = $node->created;
$node->promote = 0;
$node->sticky = 0;
$node->format = 4;
$node->path = $inputDirectoryPath . '/' . $inputFileName;
node_prepare($node);
node_save($node);
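For the second problem, one thing that might help is decoding the entities before the values are assigned to the node. Below is a minimal sketch, not a confirmed fix: `clean_imported_text()` is a hypothetical helper name (it is not a Drupal function), and it assumes the parsed text is UTF-8. It does not address the apostrophe truncation, which may have a different cause.

```php
<?php
// Hypothetical helper (not part of Drupal): turn HTML entities such as
// &nbsp; or &#039; back into the characters they stand for.
// Assumes the input text is UTF-8.
function clean_imported_text($text) {
  // ENT_QUOTES makes sure single- and double-quote entities are decoded too.
  $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
  // &nbsp; decodes to a non-breaking space (U+00A0, bytes C2 A0 in UTF-8);
  // swap it for an ordinary space.
  return str_replace("\xC2\xA0", ' ', $decoded);
}

// Example: a title as it might come out of the parser.
$pageTitle = clean_imported_text('Matt&#039;s&nbsp;Stuff');
echo $pageTitle, "\n";
```

This is probably safest for the title, which is plain text. The body is HTML, so decoding every entity there could turn escaped markup such as `&lt;` into real tags; for the body it may be better to replace only the specific entities that cause trouble.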