Experimental project

This is a sandbox project, which contains experimental code for developer use only.

This is a simple module that populates a node with metadata taken from another web page, similar to the way Facebook builds a preview for links shared there.

Create a content type with a link field, a body or description field, and an image field. Optionally, add a text field to hold the source site name, a link field for the 'article:author' data, and a term reference field for the 'article:tag' data.
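
If you prefer to script the setup, the sketch below creates an equivalent content type, assuming a Drupal 7 site with the Link module enabled and a bootstrapped context (for example drush php-eval). The machine names ('scraped_article', 'field_source_link', 'field_image') are illustrative assumptions, not names required by this module.

```php
// Illustrative Drupal 7 sketch; all machine names are assumptions.
$type = node_type_set_defaults(array(
  'type' => 'scraped_article',
  'name' => 'Scraped article',
  'base' => 'node_content',
  'custom' => TRUE,
));
node_type_save($type);
// The body field doubles as the description field mentioned above.
node_add_body_field($type, 'Description');

// Link field that will hold the source URL (Link module field type).
field_create_field(array('field_name' => 'field_source_link', 'type' => 'link_field'));
field_create_instance(array(
  'field_name' => 'field_source_link',
  'entity_type' => 'node',
  'bundle' => 'scraped_article',
  'label' => 'Source link',
));

// Image field for the scraped image.
field_create_field(array('field_name' => 'field_image', 'type' => 'image'));
field_create_instance(array(
  'field_name' => 'field_image',
  'entity_type' => 'node',
  'bundle' => 'scraped_article',
  'label' => 'Image',
));
```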

Configure the scraper at admin/structure/link_scraper: indicate the content type to populate, the link field that will be used as the source of the page content, and the fields that should receive content from the source web page.
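
The module's actual storage of these settings is not documented here; conceptually, though, the configuration amounts to a mapping like the hypothetical array below, pairing each piece of scraped metadata with a target field.

```php
// Purely hypothetical illustration of what the scraper configuration
// expresses; the module's real storage format may differ.
$scraper_mapping = array(
  'content_type' => 'scraped_article',
  'source_link_field' => 'field_source_link',
  'targets' => array(
    'og:title' => 'title',
    'og:description' => 'body',
    'og:image' => 'field_image',
    'og:site_name' => 'field_site_name',
    'article:author' => 'field_author',
    'article:tag' => 'field_tags',
  ),
);
```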

Create a new node. Paste a link into the title field and into the 'url' part of the link field, and leave all other fields blank. To avoid having to populate the title at all, use the auto_entitylabel module and set it to fill the title with the link field value when the title is empty.
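
The same workflow can be reproduced in code. The Drupal 7 style sketch below, using the assumed field name from the setup sketch, creates a node that carries only the title and the link, leaving everything else for the scraper to fill in on save.

```php
// Illustrative only; 'field_source_link' and the URL are assumed values.
$url = 'https://example.com/some-article';
$node = new stdClass();
$node->type = 'scraped_article';
$node->language = LANGUAGE_NONE;
$node->uid = 1;
$node->status = 1;
$node->title = $url;
$node->field_source_link[LANGUAGE_NONE][0]['url'] = $url;
// Body, image and the optional fields stay empty on purpose.
node_save($node);
```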

When the node is saved, the requested field content is retrieved from the Facebook Open Graph metadata on the source page and used to populate the node. If there is no Open Graph metadata, the module searches for Twitter Card metadata, then falls back to the title and description meta tags of the source page.
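
That fallback order can be pictured with the standalone PHP sketch below. It is not the module's code, only an illustration of reading Open Graph tags first, Twitter Card tags second, and the plain title and description meta tags last.

```php
/**
 * Illustrative sketch of the fallback order, not the module's implementation.
 */
function example_extract_metadata($html) {
  $doc = new DOMDocument();
  @$doc->loadHTML($html);           // Tolerate real-world, imperfect HTML.
  $xpath = new DOMXPath($doc);

  // Collect every <meta> tag, keyed by its property or name attribute.
  $meta = array();
  foreach ($xpath->query('//meta') as $tag) {
    $key = $tag->getAttribute('property') ? $tag->getAttribute('property') : $tag->getAttribute('name');
    if ($key !== '') {
      $meta[$key] = $tag->getAttribute('content');
    }
  }

  // Open Graph first, Twitter Cards second, plain meta tags last.
  $result = array('title' => NULL, 'description' => NULL, 'image' => NULL);
  $fallbacks = array(
    'title' => array('og:title', 'twitter:title'),
    'description' => array('og:description', 'twitter:description', 'description'),
    'image' => array('og:image', 'twitter:image'),
  );
  foreach ($fallbacks as $field => $keys) {
    foreach ($keys as $key) {
      if (!empty($meta[$key])) {
        $result[$field] = $meta[$key];
        break;
      }
    }
  }
  // Last resort for the title: the <title> element of the page.
  if ($result['title'] === NULL && $doc->getElementsByTagName('title')->length) {
    $result['title'] = trim($doc->getElementsByTagName('title')->item(0)->textContent);
  }
  return $result;
}
```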

Known issues: If the URL redirects, there is no way to retrieve information from it. Some pages have redirects for login or ads that make it impossible to retrieve the page's source code, and in that case none of the metadata will be inserted into the node.
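
A rough illustration of the symptom, assuming the page is fetched with drupal_http_request(): a redirecting or login-gated URL tends to return a redirect status or an interstitial body instead of the article HTML, so there is nothing for the scraper to parse.

```php
// Illustrative check only; the module's actual error handling may differ.
$url = 'https://example.com/some-article';
$response = drupal_http_request($url);
$redirected = $response->code >= 300 && $response->code < 400;
if ($redirected || empty($response->data)) {
  // No scrapable HTML came back, so the node fields are left empty.
}
```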
