loop through html [#1869792]

Hi there,

I want to scrape a news site with Feeds XPath Parser. The site is very basic: it has 10 to 15 titles, each followed by a short block of text.

How can I loop through all the titles and blocks of text and have a node created for each one? The importer also has to check for new articles every 30 minutes and create a new node for each new article it finds.
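The "create a node only for new articles" requirement comes down to remembering which articles were already imported. Feeds importers typically offer a periodic import option and let a mapped target be marked as unique (worth checking in the importer's settings); the sketch below only illustrates that idea in plain Python. The `new_items` helper and the sample data are hypothetical, and it assumes the article title identifies an article.

```python
def new_items(fetched, seen):
    """Hypothetical helper: return only items whose key has not been seen,
    recording the new keys so the next run skips them."""
    fresh = []
    for item in fetched:
        key = item["title"]  # assumption: the title identifies an article
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh


seen = set()

# First run: every article is new, so each one would become a node.
first = new_items([{"title": "First story"}, {"title": "Second story"}], seen)

# Next run 30 minutes later: only the article added since then is returned.
second = new_items([{"title": "Third story"}, {"title": "First story"},
                    {"title": "Second story"}], seen)

print([item["title"] for item in first])   # ['First story', 'Second story']
print([item["title"] for item in second])  # ['Third story']
```

If two articles can share a title, the article's link URL would likely be a safer unique key.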

Any help will be greatly appreciated.

//W

Comments

Comment #1

blogook commented 20 December 2012 at 10:09

I can't believe there aren't more people having the same issue, or perhaps even knowing how to resolve it.

Settings for XPath HTML parser

context:
.//*[@id='content']

title:
.//*[@class='article-title']/table/tr/td/h1/div/a/text()

Doing the above grabs all the titles from the HTML page, but it creates only one node, titled "ARRAY". If I change the title query as follows:

.//*[@class='article-title'][2]/table/tr/td/h1/div/a/text()

It neatly grabs the 2nd title. One workaround would be to create 10 parsers that all do the same thing except that each one grabs a different title, but that's not how I would like it to work.
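One likely explanation, assuming Feeds XPath Parser creates one item per node matched by the context query and evaluates each field query relative to that node: with `.//*[@id='content']` as the context there is only one match, so the title query returns every title at once, which ends up as "ARRAY". Making the repeating `article-title` element the context, with a relative title query, would yield one item per article. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath (no `text()`, so `.text` is read instead). The markup in `PAGE` and the `items_for` helper are hypothetical, since the real page's HTML is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page modelled on the paths in the post's queries;
# the real site's HTML is not shown, so this structure is an assumption.
PAGE = """
<html><body><div id="content">
  <div class="article-title"><table><tr><td><h1><div><a>First story</a></div></h1></td></tr></table></div>
  <p>Teaser one</p>
  <div class="article-title"><table><tr><td><h1><div><a>Second story</a></div></h1></td></tr></table></div>
  <p>Teaser two</p>
  <div class="article-title"><table><tr><td><h1><div><a>Third story</a></div></h1></td></tr></table></div>
  <p>Teaser three</p>
</div></body></html>
""".strip()


def items_for(root, context, title_query):
    """Hypothetical helper: one item per context match, with the title query
    evaluated relative to that match (the assumed parser behaviour)."""
    items = []
    for ctx in root.findall(context):
        # Collect the text of every element the relative query matches.
        items.append([a.text for a in ctx.findall(title_query)])
    return items


root = ET.fromstring(PAGE)

# Settings from the post: a single context match, so all titles land in one item.
one_item = items_for(root, ".//*[@id='content']",
                     ".//*[@class='article-title']/table/tr/td/h1/div/a")

# Alternative: the repeating element is the context, the title query is relative.
per_article = items_for(root, ".//*[@class='article-title']",
                        "table/tr/td/h1/div/a")

print(one_item)     # [['First story', 'Second story', 'Third story']]
print(per_article)  # [['First story'], ['Second story'], ['Third story']]
```

If the parser behaves as assumed, the corresponding settings would be a context of `.//*[@class='article-title']` and a title query such as `table/tr/td/h1/div/a/text()`. The block of text below each title would need its own query relative to the same context, and its exact form depends on where that block sits in the real markup.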

So please, if someone knows how to do it, don't be shy and let us know :-) I have searched the Drupal forums and this module's issue queue extensively, and googled like a maniac hoping to find a solution, but to no avail :(

Thanks in advance,

W//