Select your desired import method.

By default, Feed Import provides six content import methods (readers).

Other modules can add more readers. To create your own reader, follow this tutorial.

XML document

Imports content from XML files using XPATH 1.0 for paths.

  • URL to a valid XML resource - where the file is located (external URL, local file path, stream...)
  • Parent XPATH - the context path for the desired items
  • SimpleXMLElement class - you can use your own implementation of SimpleXMLElement
  • LibXml options - select the desired libxml options
  • Register namespaces for XML - register namespaces here if the XML needs them
  • Stream context options - see Stream context options
  • Raw source string - see Raw source

XML chunked

Imports content from huge XML files using XPATH 1.0 for paths.
This method reads the file in chunks and tries to recompose each item when needed.
The whole XML file is never loaded into memory, so even files several GB in size can be imported.

  • URL to a valid XML resource - where the file is located (external URL, local file path, stream...)
  • Parent XPATH - the context path for the desired items
  • Chunk size in bytes - how many bytes to read in each chunk. You can increase this if the memory limit allows
  • Substring function - which function to use for substring operations. The default PHP substring works in most cases. The Drupal substring is not recommended
  • XML properties - set XML properties for each item
  • Stream context options - see Stream context options
  • SimpleXMLElement class - you can use your own implementation of SimpleXMLElement
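
The module's own recomposition logic is not shown on this page. The following is only a minimal Python sketch of the general idea (read a fixed-size chunk, append it to a buffer, and emit an item as soon as its closing tag arrives). The function name iter_items and the sample data are hypothetical, and the sketch assumes items are not nested inside items of the same name and are never self-closing.

```python
import io
import re
import xml.etree.ElementTree as ET

def iter_items(stream, tag, chunk_size=8192):
    """Yield one parsed element per <tag>...</tag> without loading the whole stream."""
    # Match "<item>" or "<item ...>", but not "<items>".
    open_pattern = re.compile("<" + re.escape(tag) + r"[\s>]")
    close_tag = "</" + tag + ">"
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        while True:
            match = open_pattern.search(buffer)
            if match is None:
                # Keep a short tail in case an opening tag is split across chunks.
                buffer = buffer[-(len(tag) + 1):]
                break
            end = buffer.find(close_tag, match.start())
            if end == -1:
                # Item is incomplete: keep it and wait for the next chunk.
                buffer = buffer[match.start():]
                break
            end += len(close_tag)
            yield ET.fromstring(buffer[match.start():end])
            buffer = buffer[end:]

# Hypothetical sample data; a tiny chunk size forces items to span chunks.
SAMPLE = "<doc><items>" + "".join(
    f'<item group="{i}"><title>Title {i}</title></item>' for i in range(1, 4)
) + "</items></doc>"

titles = [item.findtext("title")
          for item in iter_items(io.StringIO(SAMPLE), "item", chunk_size=16)]
```

Memory use depends on the chunk size and the largest single item rather than on the file size, which is why a larger chunk size is only worthwhile when the memory limit allows it.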

DOM document HTML/XML

Imports content from XML or HTML files using DomDocument. You can register and use php functions in XPATHs.

In some cases field paths must begin with . (dot), because // searches the whole document, not only the nodes relative to the parent. For an example, see Context/Parent path.

  • Document format - the type of the document: XML or HTML
  • URL to a valid XML/HTML resource - where the file is located (external URL, local file path, stream...)
  • Parent XPATH - the context path for the desired items
  • Register php functions for XPATHs - PHP functions must be registered before they can be used in an XPath.
    Example of xpath using substr as registered function: //book[php:functionString("substr", title, 0, 3) = "PHP"]
  • LibXml options - select desired libxml options
  • Silence load errors - will not report errors on document load
  • Strict error checking - throws DOM errors if any
  • Preserve whitespace - do not remove redundant white space
  • Resolve externals - load external entities from a doctype declaration
  • Recover - try to parse documents that are not well-formed
  • Normalize document - puts the document in a "normal" form by simulating save and load
  • Stream context options - see Stream context options
  • Raw source string - see Raw source

SQL query

Reads data from an SQL result set. You have to provide the connection string, the username, the password (if any) and the query.
Paths are column names from the result set. You can group multiple columns using | (pipe).

  • Data Source Name - (DSN) used to connect to a database. If you are connecting to a database that is not MySQL-like, first check that the driver is installed. For more info check PDO
  • Username - database username
  • Password - database password (if any)
  • SQL Query - the query to execute in order to extract the desired info. You can use ? or :param_name as placeholders for params, or write the values directly into the SQL query
  • Query params - params that will be bound to the query (one param per line). The param format is :name=value (where :name is the placeholder), or simply the value if you want to replace a ? placeholder
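
As a rough illustration of the two placeholder styles and the one-param-per-line format, here is a Python sketch that uses the standard sqlite3 module in place of PDO. The helper parse_query_params, the items table and its columns are hypothetical, and the parsing rules are an assumption based only on the description above.

```python
import sqlite3

def parse_query_params(text):
    """Turn 'one param per line' text into a dict (:name=value) or a list (plain values)."""
    named, positional = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(":") and "=" in line:
            name, value = line[1:].split("=", 1)
            named[name] = value
        else:
            positional.append(line)
    return named or positional

# Hypothetical in-memory table standing in for a real data source.
connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row  # lets us read columns by name, like a field path
connection.execute("CREATE TABLE items (title TEXT, body TEXT, grp TEXT)")
connection.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("Title 1", "Body 1", "1"), ("Title 2", "Body 2", "7")],
)

def fetch_titles(query, params_text):
    rows = connection.execute(query, parse_query_params(params_text))
    return [row["title"] for row in rows]

# ? placeholder: the param line is just the value.
positional_titles = fetch_titles("SELECT title, body FROM items WHERE grp = ?", "1")
# :name placeholder: the param line is :name=value.
named_titles = fetch_titles("SELECT title, body FROM items WHERE grp = :grp", ":grp=7")
```

Binding values this way, rather than writing them into the query text, leaves quoting and escaping to the database driver.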

CSV file

Reads data from CSV files. Paths must be indexes or column names. You can group multiple paths using | (pipe).

Requires PHP >= 5.3.

  • URL to a valid CSV resource - where the file is located (external URL, local file path, stream...)
  • Use column names for paths - activate this if the first line of the CSV file contains column names.
    You can use column names in paths only when this is active; otherwise you'll have to use indexes
  • Delimiter - delimiter char
  • Enclosure - enclosure char
  • Escape - escape char
  • Stream context options - see Stream context options
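
The difference between index paths and column-name paths can be sketched in Python with the standard csv module. This is not the module's implementation; the function read_rows and the sample data are hypothetical, and zero-based indexes are an assumption made for the sketch.

```python
import csv
import io

def read_rows(text, use_column_names, delimiter=",", enclosure='"', escape=None):
    """Return one dict per row, keyed by column name or by column index."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quotechar=enclosure, escapechar=escape)
    rows = list(reader)
    if use_column_names:
        # The first line holds the column names, so it is not an item itself.
        header, data = rows[0], rows[1:]
        return [dict(zip(header, row)) for row in data]
    # Without column names, every line is an item and paths are indexes.
    return [dict(enumerate(row)) for row in rows]

WITH_HEADER = 'title,body\n"Title 1","Body, with a comma"\nTitle 2,Body 2\n'
WITHOUT_HEADER = "Title 1,Body 1\nTitle 2,Body 2\n"

by_name = read_rows(WITH_HEADER, use_column_names=True)
by_index = read_rows(WITHOUT_HEADER, use_column_names=False)

first_body = by_name[0]["body"]   # path is a column name
second_title = by_index[1][0]     # path is an index
```

The enclosure char is what allows the delimiter to appear inside a value, as in the first body above.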

JSON file

Reads data from JSON files. Path format is a/b/c and you can group multiple paths using | (pipe).

  • URL to a valid JSON resource - where the file is located (external URL, local file path, stream...)
  • Parent path - the context path for the desired items. This is optional
  • Stream context options - see Stream context options
  • Raw source string - see Raw source

Context/Parent path

A parent path selects all the relevant regions, each of which contains all the properties of one item.

Take a look at the following XML example:

  <doc>
    <name>Some name</name>
    <items>
      <item group="1">
        <title>Title 1</title>
        <body>Body 1</body>
      </item>
      <item group="7">
        <title>Title 2</title>
        <body>Body 2</body>
      </item>
      ...
      <item group="3">
        <title>Title n</title>
        <body>Body n</body>
      </item>
    </items>
  </doc>

If we want to import all items, the parent xpath will be /doc/items/item or //item. If we need only the items that have group=1, we use the following xpath: //item[@group="1"]

Field paths must then be relative to the parent. For the title, the path can be title or, when using DomDocument, ./title
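
The relationship between the parent path and the field paths can be tried out with a short Python sketch using the standard xml.etree.ElementTree module. This only illustrates the concept and is not how Feed Import evaluates paths; the function select_items is hypothetical, and ElementTree needs the parent path written relative to the root element (.//item instead of //item).

```python
import xml.etree.ElementTree as ET

XML = """
<doc>
  <name>Some name</name>
  <items>
    <item group="1"><title>Title 1</title><body>Body 1</body></item>
    <item group="7"><title>Title 2</title><body>Body 2</body></item>
    <item group="3"><title>Title n</title><body>Body n</body></item>
  </items>
</doc>
""".strip()

def select_items(xml_text, parent_path, field_paths):
    """Select items with the parent path, then read each field relative to the item."""
    root = ET.fromstring(xml_text)
    return [
        {name: item.findtext(path) for name, path in field_paths.items()}
        for item in root.findall(parent_path)
    ]

# Both "title" and "./title" are relative to the selected item.
fields = {"title": "./title", "body": "body"}

all_items = select_items(XML, ".//item", fields)
group_one = select_items(XML, ".//item[@group='1']", fields)
```

Because each field path is evaluated against one selected item, the same short path returns a different value for every item.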

Now a JSON example:

  {
    "doc": {
      "name": "Some name",
      "items": [
        {
          "title": "Title 1",
          "body": "Body 1"
        },
        {
          "title": "Title 2",
          "body": "Body 2"
        },
        ...
        {
          "title": "Title n",
          "body": "Body n"
        }
      ]
    }
  }

To get all items we need the following parent path: doc/items

Field paths are relative to the parent, so for the title the path should simply be: title
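
A minimal Python sketch of how an a/b/c path could be resolved against decoded JSON follows. The function resolve is hypothetical and handles only object keys; whether the module supports anything beyond that is not stated here.

```python
import json

def resolve(data, path):
    """Walk a decoded JSON value along a slash-separated path of object keys."""
    if not path:
        return data  # the parent path is optional
    node = data
    for key in path.split("/"):
        node = node[key]
    return node

DOCUMENT = json.loads("""
{
  "doc": {
    "name": "Some name",
    "items": [
      {"title": "Title 1", "body": "Body 1"},
      {"title": "Title 2", "body": "Body 2"}
    ]
  }
}
""")

items = resolve(DOCUMENT, "doc/items")               # parent path
titles = [resolve(item, "title") for item in items]  # field path, relative to each item
```

With an empty parent path the sketch treats the whole document as the context, so a field path such as doc/name is then resolved from the top level.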

Stream context options

You can provide stream context options in JSON format.

In most cases this is not needed, but some sites require, for example, a special header or a special User-Agent before they will serve the resource.

You can read more about stream context here.

A good example is drupal.org, which requires a User-Agent. So, if you want to crawl something from drupal.org you will need something like:

  {
    "http": {
      "method": "GET",
      "header": "User-Agent: Mozilla\/5.0 Gecko\/20100101 Firefox\/23.0\r\n"
    }
  }
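
To check what such an options string decodes to, the following Python sketch parses it and builds (without sending) an equivalent request using the standard urllib module. The function build_request and the example.com URL are hypothetical, and the sketch assumes the header option is a single string with one header per line.

```python
import json
import urllib.request

# Raw string, so the JSON escapes (\/ and \r\n) reach the JSON parser untouched.
OPTIONS_JSON = r'''
{
  "http": {
    "method": "GET",
    "header": "User-Agent: Mozilla\/5.0 Gecko\/20100101 Firefox\/23.0\r\n"
  }
}
'''

def build_request(url, options_json):
    """Decode the JSON options and build a request object without sending it."""
    http = json.loads(options_json).get("http", {})
    headers = {}
    for line in http.get("header", "").split("\r\n"):
        if line:
            name, value = line.split(":", 1)
            headers[name.strip()] = value.strip()
    return urllib.request.Request(url, headers=headers,
                                  method=http.get("method", "GET"))

request = build_request("https://example.com/feed.xml", OPTIONS_JSON)
user_agent = request.get_header("User-agent")  # urllib stores header names capitalized
```

Note that \/ is just the JSON escape for a slash, so the decoded User-Agent contains plain / characters.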

Raw source

When you only want to test the mapping you can use a string rather than a file or other resource.

You should not have any URL in settings while testing with raw source string.

Attachment            Size
XML reader            81.37 KB
Chunked XML reader    69.27 KB
Dom Document reader   107.48 KB
SQL reader            61.71 KB
CSV reader            52.78 KB
JSON reader           47.65 KB

Comments

lucsan

I had an xml feed from computerweekly with an xmlns="http://etc" in the opening tag (in this case )

When using DomDocument with XML selected as format (I couldn't get XML Document to work), Feed Import uses loadHTMLFile if a raw source is used and loadXML if a URI is used. (Is this an oversight?)

The parent path needs a double slash (//), as the output is now wrapped in html and body tags, and the xmlns namespace must be used,

i.e.: //xmlns:containers/xmlns:container

Fields need to be addressed in the same fashion: xmlns:field_name_tag

and naturally (?) the Register namespaces for XML setting needs xmlns=http://whatever.

Note: xmlns is an example of a namespace, your namespace might be different.