Seems to die when it hits output such as the above in the .xml import file. Trying to debug; will add a new 'if' to handle and see; more later :)

Comments

1kenthomas’s picture

I would say this was my problem (setup), but I can do a dumpXmlReader and it parses the whole file. Therefore I don't know why the existing code causes a hang, but it does :|

Rewriting as case statements.

1kenthomas’s picture

Looks to be an error in how categories are parsed, at:

next($wordpress_import->data['categories']);

Well maybe. Still rewriting.

1kenthomas’s picture

Ditto tag names. You can't execute a read when you don't know what's coming next :)

1kenthomas’s picture

Etc... current code does a while(read()) and then executes reads randomly inside that loop, assuming it knows what it will read next... bad bad bad... crashes XMLReader.
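To illustrate the point about not reading ahead blindly, here is a minimal sketch of the alternative: one loop that inspects each node as it arrives and decides what to do from the node's own name. It is written in Python with the standard library's `iterparse` and is not the module's PHP/XMLReader code. The `collect_items` helper and the sample document are assumptions made up for this example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed WXR-like sample (not a real export).
SAMPLE = b"""<rss version="2.0">
<channel>
<title>Blog</title>
<item>
<title>Hello</title>
<category domain="category" nicename="news"><![CDATA[News]]></category>
<category domain="tag" nicename="misc"><![CDATA[Misc]]></category>
</item>
</channel>
</rss>"""


def collect_items(stream):
    """Walk the document once, dispatching on each element's tag.

    Nothing here assumes which element comes next, so an unexpected
    or missing element cannot desynchronise the reader.
    """
    items = []
    current = None  # the <item> being built, or None when outside one
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start" and elem.tag == "item":
            current = {"title": None, "categories": [], "tags": []}
        elif event == "end" and current is not None:
            if elem.tag == "title":
                current["title"] = elem.text
            elif elem.tag == "category":
                key = "tags" if elem.get("domain") == "tag" else "categories"
                current[key].append(elem.text)
            elif elem.tag == "item":
                items.append(current)
                current = None
    return items


items = collect_items(io.BytesIO(SAMPLE))
```

The same shape maps onto a case statement inside a single read loop, which is the rewrite being attempted above.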

1kenthomas’s picture

... ok, so even if the read is rewritten & doesn't crash XMLReader, it goes through the importing process and creates no nodes...

more to come :)

finex’s picture

I confirm the bug :-(

lavamind’s picture

Status: Active » Postponed (maintainer needs more info)

What version of PHP are you using?

Also, could you please provide a sample export file that reproduces the problem?

lavamind’s picture

Assigned: Unassigned » lavamind

1kenthomas’s picture

Hi, sorry, buried in other projects.

This was with 5.2.6 and 5.3.1. I thought there might be an issue with my version of XML Parser, so I also switched those in and out.

I'll provide a sample file ASAP. The code rewrites I did got me farther (and made things clearer) but did not resolve the problem, alas.

Thanks for your reply & your help!

-Ken

finex’s picture

I've used the following PHP version:
PHP 5.2.6-1+lenny4 with Suhosin-Patch 0.9.6.2 (cli) (built: Nov 22 2009 02:38:03)

I cannot provide an example file. Anyway I've solved using the old import version (1.1).

1kenthomas’s picture

Attachment (status: new, size: 14.15 KB)

Attached as tar.gz.

This file validates (passes test around l. 261, import_read_wxr) but causes WSOD later. One post found.

PHP Version 5.2.6-3ubuntu4.5, though I've tried on PHP up to 5.3.1.

Thanks again-- other XML as I get a chance.

1kenthomas’s picture

Attachment (status: new, size: 25.13 KB)

This finds only one post (though there are multiple) and also WSODs w/out import.

I will try exporting from a different WP environment, just in case...

finex’s picture

Title: Import chokes on XML errors produced by Wordpress » dies on <category domain="category" nicename="nicename"><![CDATA[Nicename]]></category> from WP 2.9.x
Status: Active » Postponed (maintainer needs more info)

Edit: I replied in the wrong thread. Sorry.

lavamind’s picture

Status: Postponed (maintainer needs more info) » Active

There are two problems I found in these XML files.

First, there are atom:link elements. For some reason, Wordpress includes these tags in its output but without declaring the "atom" namespace, therefore producing malformed XML. For now, try removing all tags beginning with atom:link from your export file.
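As a rough sketch of that workaround, the following Python snippet strips `atom:link` elements from the export text before it is parsed. It is not part of the module. The `strip_atom_links` name and the sample string are assumptions for illustration, and the regular expression assumes attribute values contain no `>` character.

```python
import re
import xml.etree.ElementTree as ET

# Matches both the self-closing form and an open/close pair.
ATOM_LINK = re.compile(
    r"<atom:link\b[^>]*?(?:/>|>.*?</atom:link\s*>)",
    re.DOTALL,
)


def strip_atom_links(text):
    """Return the export text with every atom:link element removed."""
    return ATOM_LINK.sub("", text)


# Hypothetical fragment: the "atom" prefix is used but never declared.
broken = (
    '<rss version="2.0"><channel>'
    '<atom:link rel="self" href="http://example.com/feed" />'
    "<title>Blog</title>"
    "</channel></rss>"
)

cleaned = strip_atom_links(broken)
root = ET.fromstring(cleaned)  # parses once the undeclared prefix is gone
```

Run this on a copy of the export file and keep the original untouched.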

Secondly, in the "mtucker" example, there's an XML error on line 618: Wordpress included the & (ampersand) character in an XML value, but strings containing that character should be enclosed in a CDATA section or escaped using &amp;. Try correcting that mistake and your data should import properly.
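Fixing this by hand is tedious on a large export, so here is a hedged Python sketch that escapes bare ampersands while leaving CDATA sections and existing entity references alone. The `escape_bare_ampersands` helper and the sample string are assumptions for this example, not anything shipped with the module.

```python
import re
import xml.etree.ElementTree as ET

# The capturing group makes re.split keep the CDATA sections in the result.
CDATA = re.compile(r"(<!\[CDATA\[.*?\]\]>)", re.DOTALL)
# An ampersand that does not start something shaped like an entity reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Escape stray '&' outside CDATA; text inside CDATA is left untouched."""
    parts = CDATA.split(text)
    # Even indexes are text outside CDATA, odd indexes are CDATA sections.
    for i in range(0, len(parts), 2):
        parts[i] = BARE_AMP.sub("&amp;", parts[i])
    return "".join(parts)


# Hypothetical fragment with one bare ampersand in the title.
broken = (
    "<item><title>Tom & Jerry &amp; co</title>"
    "<body><![CDATA[a & b]]></body></item>"
)
fixed = escape_bare_ampersands(broken)
item = ET.fromstring(fixed)
```

One limitation: anything that merely looks like an entity reference, such as an undefined `&foo;`, is left as is and will still need manual attention.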

I will see whether we can detect these XML errors and refuse to import when any are found.

lavamind’s picture

Title: dies on <category domain="category" nicename="nicename"><![CDATA[Nicename]]></category> from WP 2.9.x » Import chokes on XML errors produced by Wordpress

lavamind’s picture

Okay, I tracked it down to this Wordpress bug report: http://core.trac.wordpress.org/ticket/9633

In summary, this problem has only been fixed recently in the Wordpress export code, and will be released with version 3.0.

Let me reiterate that this is a problem with the WordPress export, which in some cases produces invalid XML. What is really frightening is that the developers don't seem to care about generating proper XML, since their own WXR importer is a dumb regexp script.

I'll need to make some modifications to the way XMLReader is used so that XML errors are better tolerated...

lavamind’s picture

Title: dies on <category domain="category" nicename="nicename"><![CDATA[Nicename]]></category> from WP 2.9.x » Import chokes on XML errors produced by Wordpress
Status: Postponed (maintainer needs more info) » Fixed

Fixed in 6.x-2.x-dev.

First, the import now aborts if XMLReader doesn't reach the end of the WXR file, which most likely indicates XML problems.

Secondly, the README now documents possible causes of the problem and their solutions.
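For anyone who wants to test an export file before importing it, the same idea can be sketched outside Drupal: read the whole file with a streaming parser and report where it stops. This is a Python illustration, not the code committed to 6.x-2.x-dev. The `check_well_formed` name is an assumption for this example.

```python
import io
import xml.etree.ElementTree as ET


def check_well_formed(stream):
    """Read the whole stream; return (ok, message).

    If the parser cannot reach the end of the document, the message
    gives the line and column where it gave up.
    """
    try:
        for _event, elem in ET.iterparse(stream):
            elem.clear()  # discard parsed content to keep memory use low
    except ET.ParseError as err:
        line, column = err.position
        return False, f"line {line}, column {column}: {err}"
    return True, ""


good = check_well_formed(io.BytesIO(b"<a><b>x</b></a>"))
bad = check_well_formed(io.BytesIO(b"<a><b>x & y</b></a>"))
```

For a file on disk, pass a handle opened in binary mode, for example `open(path, "rb")`, where `path` points at your own export.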

Thanks!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

nikitas’s picture

I had the same problem while exporting from WordPress and importing to Drupal.
I used the 6.x-2.x-dev version and removed the atom links plus these XML items from each post, and everything worked just fine!

<wp:postmeta>
<wp:meta_key>_wp_attachment_metadata</wp:meta_key>
<wp:meta_value><![CDATA[a:5:{s:5:"width";s:3:"500";s:6:"height";s:3:"375";s:14:"hwstring_small";s:23:"height='96' width='128'";s:4:"file";s:108:"/home/wpcom/public_html/wp-content/blogs.dir/47f/12741230/files/2010/03/cebaceb1cf84cf83ceb9cebaceb1cf82.jpg";s:10:"image_meta";a:10:{s:8:"aperture";s:3:"2.8";s:6:"credit";s:0:"";s:6:"camera";s:19:"Canon PowerShot G10";s:7:"caption";s:0:"";s:17:"created_timestamp";s:10:"1269348203";s:9:"copyright";s:0:"";s:12:"focal_length";s:3:"6.1";s:3:"iso";s:3:"200";s:13:"shutter_speed";s:4:"0.02";s:5:"title";s:0:"";}}]]></wp:meta_value>
</wp:postmeta>

So if the .xml can't be imported, just remove those items from your posts!
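Removing those items by hand from every post gets old quickly, so here is a hedged Python sketch of the same cleanup. It assumes each item is wrapped in a `<wp:postmeta>` element and that the key appears as plain text rather than inside CDATA. The `strip_attachment_metadata` helper and the sample are made up for this example.

```python
import re

# One <wp:postmeta> block whose key is _wp_attachment_metadata. The
# (?!</wp:postmeta>) guards stop a match from spilling across blocks.
ATTACHMENT_META = re.compile(
    r"[ \t]*<wp:postmeta>(?:(?!</wp:postmeta>).)*?"
    r"<wp:meta_key>_wp_attachment_metadata</wp:meta_key>"
    r"(?:(?!</wp:postmeta>).)*?</wp:postmeta>[ \t]*\n?",
    re.DOTALL,
)


def strip_attachment_metadata(text):
    """Drop attachment-metadata postmeta blocks, keep all other postmeta."""
    return ATTACHMENT_META.sub("", text)


# Hypothetical fragment of a post with two postmeta blocks.
sample = (
    "<item>\n"
    "<wp:postmeta>\n"
    "<wp:meta_key>_edit_last</wp:meta_key>\n"
    "<wp:meta_value>1</wp:meta_value>\n"
    "</wp:postmeta>\n"
    "<wp:postmeta>\n"
    "<wp:meta_key>_wp_attachment_metadata</wp:meta_key>\n"
    '<wp:meta_value><![CDATA[a:1:{s:5:"width";s:3:"500";}]]></wp:meta_value>\n'
    "</wp:postmeta>\n"
    "</item>\n"
)
cleaned = strip_attachment_metadata(sample)
```

As with the other workarounds in this thread, run it on a copy of the export.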