CVS edit link for twistor

I have been developing with Drupal for a couple of years now, but I have never contributed anything back. Today, I wrote a plugin for the Feeds module that allows someone to run XPath queries against an HTML or XML document. My primary motivation is to scrape web pages in a sane manner and create nodes from the data. The Feeds module provides such an elegant solution to this that I just have to share it. Regex support is also planned for this module, if anyone feels like being masochistic.
In the near future, I have two other modules in the works. One is an API module that wraps different message queue APIs, such as STOMP, ZeroMQ, ActiveMQ, etc., into a generic API. I'm currently looking at both the Queue module and the Messaging module to see if it would make more sense to implement it on top of one of those, but the application is different enough that I doubt it. I'm watching Pipe Dream with bated breath.
The second is what I would call a Comet API. I am working on one that wraps different Comet implementations, such as Orbited and APE. I would also like to use this to implement WebSockets and SSE, should servers for those become stable. The goal is to make it as invisible as possible by tying into the existing AJAX and AHAH APIs and simply circumventing the normal request process. These realtime technologies are happening, and it seems to me that Drupal is missing out.

Comments

twistor’s picture

Status: new
File size: 1.49 KB

Here's the module.

twistor’s picture

Note: I am aware of the Scraper module, but that seems to be a bit of a dud.

twistor’s picture

Status: Postponed (maintainer needs more info) » Needs review

avpaderno’s picture

Status: Needs review » Needs work
Issue tags: +Module review

Hello, and thanks for applying for a CVS account.

As stated in the CVS application requirements, the proposed module must not duplicate the work done for an existing project. Could you describe the differences between the proposed module and the existing project? (The Drupal version for which the module is created is not a difference we are interested in.)

twistor’s picture

The Scraper module allows you to specify a URL and a beginning and end point of a page. It then takes that section of the page and puts it into a block.
First off, my module allows for querying any XML/HTML document, not just web pages. The source could be a local file as well, or anything else that the Feeds module can provide. Second, you can specify an arbitrary number of parts to pull from a page and put those into fields. Finally, this module is a plugin: it's built on top of existing modules for maximum code reuse.

avpaderno’s picture

Status: Needs work » Needs review

twistor’s picture

Status: Needs review » Needs work
Status: new
File size: 1.87 KB

Updated module. It is no longer quite as easy to break things. Added an option to choose between XML and HTML.

twistor’s picture

Status: Needs work » Needs review

twistor’s picture

Status: new
File size: 2.07 KB

Now you can choose which fields output raw XML/HTML. More error checking. Support for leaving a field blank.
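
For readers who have not opened the attachment, the behaviour described so far can be sketched roughly as follows. This is not the attached module's code: the function name `example_xpath_extract()`, the sample markup, and the field names are all made up for illustration, and it only assumes PHP's bundled DOM extension (`DOMDocument`, `DOMXPath`).

```php
<?php
// Hypothetical sample markup; any HTML or XML string would do.
$html = '<html><body><div class="post"><h1>First title</h1>'
  . '<p class="body">Some <b>bold</b> text.</p></div></body></html>';

// Hypothetical mapping of field names to XPath queries.
// An empty query means "leave this field blank".
$queries = array(
  'title'  => '//div[@class="post"]/h1',
  'body'   => '//div[@class="post"]/p[@class="body"]',
  'author' => '',
);

// Fields listed here get raw markup instead of plain text.
$raw_fields = array('body');

/**
 * Illustrative helper (not part of Feeds): runs each query against the
 * document and returns the first match per field.
 */
function example_xpath_extract($source, array $queries, array $raw_fields = array(), $is_html = TRUE) {
  if ($source === '') {
    return array();
  }
  $doc = new DOMDocument();
  // Collect parser warnings internally so sloppy HTML does not spam output.
  $previous = libxml_use_internal_errors(TRUE);
  $loaded = $is_html ? $doc->loadHTML($source) : $doc->loadXML($source);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  if (!$loaded) {
    return array();
  }

  $xpath = new DOMXPath($doc);
  $result = array();
  foreach ($queries as $field => $query) {
    $result[$field] = '';
    if ($query === '') {
      continue;
    }
    $nodes = $xpath->query($query);
    if ($nodes === FALSE || $nodes->length === 0) {
      continue;
    }
    $node = $nodes->item(0);
    // Raw fields keep the matched node's markup; others keep only its text.
    $result[$field] = in_array($field, $raw_fields, TRUE)
      ? $doc->saveXML($node)
      : trim($node->textContent);
  }
  return $result;
}

$fields = example_xpath_extract($html, $queries, $raw_fields);
```

In this sketch only the first matching node per query is used; a real parser would more likely loop over a context query and build one item per match.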

meatbag’s picture

http://drupal.org/project/feeds_xmlparser

Here is an existing module which seems to do the same thing.
I suggest that you contribute there.

avpaderno’s picture

http://drupal.org/project/feeds_xmlparser is not hosted on Drupal.org; it would be more difficult for twistor to contribute to that project.

avpaderno’s picture

Status: Needs review » Fixed

  $info['FeedsXPathParser'] = array(
    'name'        => 'XPath parser',
    'description' => 'Queries an XML or HTML document using XPath.',

Those strings are not translated, as all strings that appear in the user interface should be. The description should be "Queries a XML or HTML document using XPath."
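
As a rough illustration of the translation point, the two strings quoted above could be passed through Drupal's `t()` function. This is a sketch, not the reviewed module's code: the function name `example_xpath_feeds_plugins()` is hypothetical, the strings are kept exactly as submitted, the rest of the plugin definition is omitted, and the `t()` stand-in exists only so the snippet runs outside Drupal.

```php
<?php
// Stand-in so the sketch runs outside Drupal; inside Drupal, core provides t().
if (!function_exists('t')) {
  function t($string) {
    return $string;
  }
}

/**
 * Hypothetical hook_feeds_plugins() implementation; only the two
 * user-facing strings from the review are shown.
 */
function example_xpath_feeds_plugins() {
  $info = array();
  $info['FeedsXPathParser'] = array(
    // Wrapping user-interface strings in t() makes them translatable.
    'name'        => t('XPath parser'),
    'description' => t('Queries an XML or HTML document using XPath.'),
  );
  return $info;
}
```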

Thank you for your contribution! I am going to update your account.
These are some recommended readings to help with excellent maintainership:

You can find more contributors chatting on the IRC #drupal-contribute channel. So, come hang out and stay involved.
Thank you, also, for your patience with the review process.
Anyone is welcome to participate in the review process. Please consider reviewing other projects that are pending review. I encourage you to learn more about that process and join the group of reviewers.

I thank all the dedicated reviewers as well.

tobbe_s’s picture

The description should be "Queries a XML or HTML document using XPath".

No, the description was correct to begin with. It should be "Queries an XML or HTML document using XPath.". "a" is only used before consonant sounds, whereas "an" is used before vowel sounds. Compare "an hour", "a European" and "an X-ray".

Status: Fixed » Closed (fixed)
Issue tags: -Module review

Automatically closed -- issue fixed for 2 weeks with no activity.

avpaderno’s picture

Component: Miscellaneous » new project application
Assigned: Unassigned » avpaderno