Hello,
I currently have a Drupal site set up and am using the Feeds module (http://drupal.org/project/feeds) to create CCK nodes from a few RSS feeds. As far as I know, this is a standard use case for that module.
I now also need to create content from non-RSS/non-XML sources. Specifically, I need a new parser for the Feeds module that can populate CCK fields by parsing raw HTML content. My first thought is to let the user define a regular expression for each field; the field would then be populated with the result of applying that expression to the raw HTML. However, I am open to other approaches that might be easier for the developer to implement.
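To make the per-field regex idea concrete, here is a minimal sketch of the extraction step only. It is written in Python purely for illustration (an actual Feeds parser would be a PHP plugin), and everything in it is an assumption: the field names `title` and `field_price`, the sample patterns, and the rule "use the first capture group, or the whole match if the pattern has no groups" are hypothetical, not part of Feeds or CCK.

```python
import re
from typing import Dict, Optional

# Hypothetical per-field patterns a site builder might enter in the parser's
# settings; these field names and expressions are illustrative only.
FIELD_PATTERNS: Dict[str, str] = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "field_price": r'<span class="price">\s*\$?([\d.,]+)\s*</span>',
}


def parse_fields(html: str, patterns: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply one regular expression per field to raw HTML.

    Assumed rule: each field gets the first capture group of the first match,
    or the whole match if the pattern has no groups. Fields whose pattern does
    not match (or whose group did not participate) get None.
    """
    values: Dict[str, Optional[str]] = {}
    for field, pattern in patterns.items():
        # DOTALL lets '.' span newlines; IGNORECASE tolerates <H1> vs <h1>.
        match = re.search(pattern, html, re.DOTALL | re.IGNORECASE)
        if match is None:
            values[field] = None
            continue
        captured = match.group(1) if match.groups() else match.group(0)
        values[field] = captured.strip() if captured is not None else None
    return values


if __name__ == "__main__":
    sample = """
    <html><body>
      <h1 class="headline">Widget Deluxe</h1>
      <span class="price">$19.99</span>
    </body></html>
    """
    print(parse_fields(sample, FIELD_PATTERNS))
```

Run on the sample page, this maps `title` to `Widget Deluxe` and `field_price` to `19.99`. The appeal of this design is that site builders configure fields without writing code; the drawback is that regular expressions over HTML break easily when the page markup changes.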
For reference, I have also looked into the Import HTML module, but I feel that this route (a new Feeds parser) will be a better long-term solution.
Oh, and in case it matters here in the "Paid Services" section: I fully intend to release this to the rest of the Drupal community (assuming that the wonderful creators of Feeds want it).
Please contact me with a quote if you are interested in this work, and we can work out the details.
Cheers,
Bryce
Comments
xpath module?
Maybe you can consider working with the xpathparser project?
---
Datascape
Yes!
Yes, I've been alerted to the project and am actively looking into the code. It's hard for me to tell how well it suits my needs at this stage, because the documentation is so sparse. I sincerely don't mean that as a knock in any way; it's just a fact of life for young projects. In any case, I'm excited to see that there is demand for this kind of functionality, and I'm working now to test my way through the module and help debug it.
Thanks for the heads up!