Feeds HTML Parser for Node Creation

By brycesenz on 22 Jun 2010 at 23:58 UTC

Hello,

I currently have a Drupal site set up and am using the Feeds module (http://drupal.org/project/feeds) to create CCK nodes from a few RSS feeds. This is a standard use case of that module, at least as far as I know.

The need has arisen to also create content from non-RSS, non-XML sources. I need a new parser written for the Feeds module that populates CCK fields by parsing raw HTML content. My first thought is to let the user define a regular expression for each field, and to populate that field with whatever the expression matches in the raw HTML. However, I am open to suggestions for different solutions that might be easier for the developer.
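
To make the regex-per-field idea concrete, here is a minimal sketch. It is written in Python rather than as an actual Feeds parser plugin, and the field names, patterns, and the parse_fields helper are all hypothetical; it only illustrates the mapping from user-defined expressions to field values, not how the real module would be wired up.

```python
import re

# Hypothetical mapping of CCK field names to user-defined regular expressions.
# The first capture group of each pattern supplies the field value.
FIELD_PATTERNS = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "field_price": r'<span class="price">\s*\$?([\d.,]+)\s*</span>',
}


def parse_fields(html, patterns):
    """Apply each pattern to the raw HTML and collect the first match per field.

    Fields whose pattern does not match are set to None so the caller can
    decide whether to skip the node or leave the field empty.
    """
    result = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
        result[field] = match.group(1).strip() if match else None
    return result


# Illustrative input only; a real parser would receive the fetched page body.
sample = (
    '<html><body><h1 id="t">Blue Widget</h1>'
    '<span class="price"> $19.99 </span></body></html>'
)
fields = parse_fields(sample, FIELD_PATTERNS)
print(fields)
```

Regular expressions are brittle against markup changes, which is one reason a DOM- or XPath-based approach might turn out to be easier to maintain.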

For reference, I have also looked into the Import HTML module, but I feel that this route (a new Feeds parser) will be a better long-term solution.

Oh, and in case this is ever a consideration in the "Paid Services" section: I fully intend for this to be made open to the rest of the Drupal community (assuming that the wonderful creators of Feeds want it).

Please contact me with a quote if you would be interested in this work, and we can work out the details.

Cheers,
Bryce

Comments

xpath module?

hanno commented 16 July 2010 at 21:16

Maybe you can consider working with the xpathparser project?

---
Datascape

Yes!

brycesenz commented 24 July 2010 at 07:04

Yes, I've been alerted to the project and am actively looking into the code. I have a hard time telling how well it suits my needs at this stage, because the documentation is so sparse. I sincerely don't mean that as a knock in any way; it's just a fact of life for young projects. In any case, I'm excited to see that there is demand for such functionality out there, and I am now working to test my way through the module and help debug.

Thanks for the heads up!
