Feed Import - import multiple XML files with XInclude
Sometimes you have to import two or more XML files in the same way. There are several options: create a feed configuration for every XML file, write a PHP function that uses the Feed Import API, or take a different approach and use XML XInclude.
Why? Because you only have to maintain one feed configuration, and you can do it entirely from the UI.
When? Usually when the XML files have the same structure.
How? By referencing all the files from a single wrapper document with XInclude.
Note: you'll need at least version 3.2 of Feed Import.
Create a new feed configuration using XMLDocument or DOMDocument as the source (reader).
Don't use the URL setting; in this case it is better to use a raw source (the option is near the bottom).
For this example we'll import some RSS feeds containing issues of the Feed Import project. We want just the bug reports (https://drupal.org/project/issues/rss/feed_import?categories=1) and the feature requests (https://drupal.org/project/issues/rss/feed_import?categories=3).
The RSS XML looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xml:base="https://example.com/base" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>Channel title</title>
<link>http://example.com/linkch</link>
<description></description>
<language>en</language>
<item>
<title>Some title 1</title>
<link>http://example.com/link</link>
<description>Some description 1</description>
<pubDate>Thu, 14 Jun 2014 19:05:55 +0300</pubDate>
<dc:creator>Creator name</dc:creator>
<guid isPermaLink="true">http://example.com/link</guid>
</item>
<item>
<title>Some title 2</title>
<link>http://example.com/link</link>
<description>Some description 2</description>
<pubDate>Thu, 14 Jun 2014 19:05:55 +0300</pubDate>
<dc:creator>Creator name</dc:creator>
<guid isPermaLink="true">http://example.com/link</guid>
</item>
</channel>
</rss>
So, we create a simple XML structure that includes the two RSS feeds using XInclude. The procedure is simple: first declare the XInclude namespace, then add references to the RSS XML files. Additionally, we filter the included files with XPointer (its xpointer() scheme takes an XPath expression as argument), so that only the item elements are pulled in. The content of xi:fallback is used when a feed cannot be retrieved. The following structure demonstrates the usage of XInclude:
<?xml version="1.0" encoding="UTF-8"?>
<items xmlns:xi="http://www.w3.org/2003/XInclude">
<xi:include href="https://drupal.org/project/issues/rss/feed_import?categories=1" xpointer="xpointer(//item)">
<xi:fallback>No bug reports</xi:fallback>
</xi:include>
<xi:include href="https://drupal.org/project/issues/rss/feed_import?categories=3" xpointer="xpointer(//item)">
<xi:fallback>No feature requests</xi:fallback>
</xi:include>
</items>
To be able to use XInclude, also select "Implement XInclude substitution" under "LibXml options".
All paths for fields will be based on the processed structure, which will be something like:
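To make the effect of the substitution more concrete, here is a rough Python sketch that emulates what the two xi:include elements with xpointer(//item) produce: every item from each source ends up directly under items. This is not how Feed Import does it (the module relies on libxml through PHP); the inline feed strings and the merge_items helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two tiny stand-ins for the bug-report and feature-request feeds (made-up data).
BUGS = """<rss version="2.0"><channel><title>Bugs</title>
<item><title>Bug 1</title><guid>http://example.com/1</guid></item>
<item><title>Bug 2</title><guid>http://example.com/2</guid></item>
</channel></rss>"""

FEATURES = """<rss version="2.0"><channel><title>Features</title>
<item><title>Feature 1</title><guid>http://example.com/3</guid></item>
</channel></rss>"""


def merge_items(sources):
    """Hypothetical helper: copy every <item> of each source under one <items>
    root, roughly what xi:include + xpointer(//item) yields after substitution."""
    root = ET.Element("items")
    for xml_text in sources:
        feed = ET.fromstring(xml_text)
        # './/item' plays the role of the //item XPath used in the xpointer.
        root.extend(feed.findall(".//item"))
    return root


merged = merge_items([BUGS, FEATURES])
titles = [item.findtext("title") for item in merged.findall("item")]
print(titles)
```

Note that the channel-level elements (title, link and so on) are dropped, because only the item elements are selected.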
<?xml version="1.0" encoding="UTF-8"?>
<items xmlns:xi="http://www.w3.org/2003/XInclude">
<item>
<title>Some title 1</title>
<link>http://example.com/link</link>
<description>Some description 1</description>
<pubDate>Thu, 14 Jun 2014 19:05:55 +0300</pubDate>
<dc:creator>Creator name</dc:creator>
<guid isPermaLink="true">http://example.com/link</guid>
</item>
<item>
<title>Some title 2</title>
<link>http://example.com/link</link>
<description>Some description 2</description>
<pubDate>Thu, 14 Jun 2014 19:05:55 +0300</pubDate>
<dc:creator>Creator name</dc:creator>
<guid isPermaLink="true">http://example.com/link</guid>
</item>
<!-- Other items -->
</items>
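Because the parent XPath selects each item, field paths such as title, description or guid are relative to that element. The Python sketch below illustrates the idea with the standard library only. It is not Feed Import's reader; the sample data is shortened, and the dc prefix mapping is passed explicitly as an assumption about how a namespaced path like dc:creator would be resolved.

```python
import xml.etree.ElementTree as ET

# Processed structure after inclusion (sample data; xmlns:dc is declared here
# so that the dc:creator elements can be parsed).
PROCESSED = """<items xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item>
    <title>Some title 1</title>
    <description>Some description 1</description>
    <dc:creator>Creator name</dc:creator>
    <guid isPermaLink="true">http://example.com/link1</guid>
  </item>
  <item>
    <title>Some title 2</title>
    <description>Some description 2</description>
    <dc:creator>Creator name</dc:creator>
    <guid isPermaLink="true">http://example.com/link2</guid>
  </item>
</items>"""

NS = {"dc": "http://purl.org/dc/elements/1.1/"}

root = ET.fromstring(PROCESSED)
rows = []
for item in root.findall(".//item"):  # parent path: //item
    rows.append({
        "title": item.findtext("title"),        # path relative to the item
        "body": item.findtext("description"),
        "author": item.findtext("dc:creator", namespaces=NS),
        "guid": item.findtext("guid"),          # value used as unique path
    })

print(rows[0]["title"], rows[1]["guid"])
```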
That's it! Below is an exported version of a feed configuration that uses XInclude.
{
"entity": "node",
"settings": {
"uniq_path": "guid",
"preprocess": null,
"feed": {
"protect_on_invalid_source": false,
"protect_on_fewer_items": 0
},
"processor": {
"name": "default",
"class": "FeedImportProcessor",
"options": {
"items_count": 0,
"skip_imported": false,
"reset_cache": 100,
"break_on_undefined_filter": true,
"skip_defined_functions_check": false
}
},
"reader": {
"name": "xml",
"class": "SimpleXMLFIReader",
"options": {
"url": "",
"parent": "\/\/item",
"class": "SimpleXMLElement",
"options": {
"256": "256",
"16384": "16384",
"1024": "1024"
},
"namespaces": "",
"stream": "",
"raw": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\\n<items xmlns:xi=\"http:\/\/www.w3.org\/2003\/XInclude\">\\n <xi:include href=\"https:\/\/drupal.org\/project\/issues\/rss\/feed_import?categories=1\" xpointer=\"xpointer(\/\/item)\">\\n <xi:fallback>No bug reports<\/xi:fallback>\\n <\/xi:include>\\n <xi:include href=\"https:\/\/drupal.org\/project\/issues\/rss\/feed_import?categories=3\" xpointer=\"xpointer(\/\/item)\">\\n <xi:fallback>No feature requests<\/xi:fallback>\\n <\/xi:include>\\n<\/items>"
}
},
"hashes": {
"name": "uhm",
"class": "UnpublishingHashManager",
"options": {
"group": "test_xinclude",
"ttl": "60",
"update_chunk": "300",
"insert_chunk": "300"
}
},
"filter": {
"name": "default",
"class": "FeedImportMultiFilter",
"options": {
"param": "[field]",
"include": null
}
},
"fields": {
"title": {
"field": "title",
"column": false,
"paths": [
"title"
],
"default_action": 3,
"default_value": "",
"update_mode": 0,
"filters": [
],
"prefilters": [
]
},
"body": {
"field": "body",
"column": true,
"paths": [
"description"
],
"default_action": 2,
"default_value": "",
"update_mode": 0,
"filters": {
"Convert html entities": {
"function": "html_entity_decode",
"params": [
"[field]"
]
}
},
"prefilters": [
]
},
"created": {
"field": "created",
"column": false,
"paths": [
"pubDate"
],
"default_action": 1,
"default_value": "now",
"update_mode": 0,
"filters": {
"Convert to timestamp": {
"function": "strtotime",
"params": [
"[field]"
]
}
},
"prefilters": [
]
}
},
"static_fields": {
"type": "article",
"body": {
"format": "full_html"
}
},
"functions": [
]
}
}
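The two filters in the export call the PHP functions html_entity_decode and strtotime on the body and created fields. As a rough illustration of what they do to a value, here are approximate Python stand-ins. They are assumptions for demonstration only: PHP's html_entity_decode has its own flags and defaults, and strtotime accepts many more date formats than the RFC 2822 style handled here. The sample values are made up.

```python
import html
from email.utils import parsedate_to_datetime


def decode_entities(value):
    """Approximate stand-in for PHP's html_entity_decode."""
    return html.unescape(value)


def to_timestamp(value):
    """Approximate stand-in for PHP's strtotime, limited to RFC 2822 dates
    such as the ones found in an RSS pubDate element."""
    return int(parsedate_to_datetime(value).timestamp())


# The description arrives with escaped markup; the filter restores the HTML.
body = decode_entities("&lt;p&gt;Some description 1&lt;/p&gt;")

# The pubDate string becomes a Unix timestamp for the "created" field.
created = to_timestamp("Sat, 14 Jun 2014 19:05:55 +0300")

print(body)
print(created)
```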