This project is in maintenance mode. No new features will be added. New installs should use Feeds extensible parsers instead.

Feeds XPath Parser is a Feeds plugin for parsing XML and HTML documents. It lets site builders use Feeds to import data from complex external data sources. Each element you want to extract is set up with a configurable mapping query, which saves developers from writing complex, single-purpose modules. It also lets end users build web scrapers and other useful tools within Drupal.
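The Python sketch below illustrates the general idea behind mapping queries. It is not the module's code: the module is written in PHP and is configured through the Feeds UI. The sketch assumes a model in which a "context" query selects each repeating element that becomes one imported item, and per-field queries are evaluated relative to that element. The sample document, field names, and the `parse_items` helper are hypothetical. Python's `xml.etree.ElementTree` supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document standing in for an imported XML feed.
SOURCE = """<catalog>
  <book><title>Dune</title><author>Frank Herbert</author></book>
  <book><title>Emma</title><author>Jane Austen</author></book>
</catalog>"""

# Hypothetical context query: each matching element becomes one imported item.
CONTEXT_QUERY = ".//book"

# Hypothetical mapping queries: field name -> path relative to the context element.
MAPPING_QUERIES = {
    "title": "title",
    "author": "author",
}


def parse_items(xml_text, context_query, mapping_queries):
    """Return one dict per context element, filled in by the mapping queries."""
    root = ET.fromstring(xml_text)
    items = []
    for context in root.findall(context_query):
        # findtext() returns None when a mapping query matches nothing.
        items.append({field: context.findtext(query)
                      for field, query in mapping_queries.items()})
    return items


for item in parse_items(SOURCE, CONTEXT_QUERY, MAPPING_QUERIES):
    print(item)
```

Keeping the per-field queries relative to the context element means that adding a new field only needs one more entry in the mapping, not changes to the parsing loop.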

You may need this module if you would like to:

  • Import XML or HTML documents into nodes, users, taxonomy terms, or regular database tables
  • Scrape web pages as if they were regular feed sources, with scheduled imports, updates, and expiry
  • Extract content from HTML documents to create a semantic data bank or mashup

Features

  • Built-in query debugger to help you write XPath queries
  • Tidy support for badly formatted markup
  • Variable substitution, allowing you to use the value from one or more queries as arguments in another
  • Workarounds for quirks in PHP’s XPath support

Notes

If you’re not familiar with XPath but know CSS or jQuery, you might be interested in Feeds QueryPath Parser, which offers the same features with a different query language.

Credits

** You MUST run update.php after upgrading.
