Rule: ScrapingBot Image Crawler (AI Interpolator ScrapingBot)

Last updated on 16 February 2024

This page has not yet been reviewed by AI Interpolator maintainer(s) and added to the menu.

Base data:

Summary:
The ScrapingBot Image Crawler takes a link field, scrapes the linked webpage, and uses the Readability library to work out which header or metadata image is shown when the article is shared on social media. It then stores that image in an image field.
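
To illustrate the idea of picking the social media share image, here is a minimal Python sketch. It is not the module's implementation: the module is described as using the Readability library, whereas this sketch swaps in a simpler technique and only reads the commonly used `og:image` and `twitter:image` meta tags from HTML that has already been fetched. The names `ShareImageParser` and `find_share_image` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class ShareImageParser(HTMLParser):
    """Collects the first og:image / twitter:image meta value in a page."""

    # Meta keys commonly used for share previews, in priority order.
    # Assumption: the page exposes its share image through one of these.
    KEYS = ("og:image", "twitter:image")

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # Open Graph uses property="...", Twitter cards use name="...".
        key = (attr.get("property") or attr.get("name") or "").lower()
        content = attr.get("content", "").strip()
        if key in self.KEYS and content and key not in self.found:
            self.found[key] = content


def find_share_image(html: str, page_url: str) -> Optional[str]:
    """Return the absolute URL of the share image, or None if none is declared."""
    parser = ShareImageParser()
    parser.feed(html)
    for key in ShareImageParser.KEYS:
        if key in parser.found:
            # Resolve relative image paths against the page URL.
            return urljoin(page_url, parser.found[key])
    return None


if __name__ == "__main__":
    sample = '<html><head><meta property="og:image" content="/img/a.png"></head></html>'
    print(find_share_image(sample, "https://example.com/post/1"))
```

In this sketch the returned URL is what would then be downloaded and saved into the image field; the download and storage steps are left out.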

Module needed:
AI Interpolator ScrapingBot

Field types to populate:
Image field (core image module).

Base field types to use as context:

  • Link

Extra Requirements:
You need a ScrapingBot account.

If you want, you can use the code DRUPALAI for 20% off the price of the first month. Use of this code helps pay for testing and development of the module.

Extra Settings:

None

Extra Advanced Settings:

Use Chrome

This makes the scraper load the page in a real browser (Chrome) instead of a plain network scraper that does not render the website.

Wait for Network

Check this if you want to wait for most AJAX requests to finish before the HTML content is returned when using Chrome. This can slow down your scraping or cause it to fail if some requests never finish.

Proxy Country

Set the country to proxy the request from. This is very useful, for instance, if you are scraping American websites that do not comply with GDPR and simply block access instead.

Use Premium Proxy

Uses a premium proxy to scrape websites that detect server IP addresses. For instance, use this if you crawl Rakuten or Netflix. Note that this costs 10 times as many credits, or 20 times as many when JS rendering is on.
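
To show how the four advanced settings relate to each other, here is a minimal Python sketch that collects them into one options dictionary. The key names (`useChrome`, `waitForNetworkRequests`, `proxyCountry`, `premiumProxy`) are assumptions modelled on ScrapingBot's public API naming and should be checked against the current ScrapingBot documentation. The function name `build_scrape_options`, the two-letter country code check, and the choice to reject "Wait for Network" without "Use Chrome" are illustrative decisions, not behaviour confirmed for the module.

```python
from typing import Optional


def build_scrape_options(
    use_chrome: bool = False,
    wait_for_network: bool = False,
    proxy_country: Optional[str] = None,
    premium_proxy: bool = False,
) -> dict:
    """Collect the advanced settings into a single options dictionary."""
    # "Wait for Network" is described as applying when Chrome is used,
    # so this sketch treats the combination without Chrome as a mistake.
    if wait_for_network and not use_chrome:
        raise ValueError("Wait for Network only applies when Use Chrome is enabled")

    if proxy_country is not None:
        # Assumption: countries are given as two-letter codes such as "US".
        proxy_country = proxy_country.strip().upper()
        if len(proxy_country) != 2 or not proxy_country.isalpha():
            raise ValueError("proxy_country should be a two-letter country code")

    # Key names are assumed, not taken from the module's source.
    return {
        "useChrome": use_chrome,
        "waitForNetworkRequests": wait_for_network,
        "proxyCountry": proxy_country,
        "premiumProxy": premium_proxy,
    }


if __name__ == "__main__":
    # Render with Chrome, wait for AJAX requests, and proxy through the US.
    print(build_scrape_options(use_chrome=True, wait_for_network=True, proxy_country="us"))
```

Keeping "Use Premium Proxy" off by default matches the cost note above: it is the most expensive option, so it is only worth enabling for sites that block ordinary server IP addresses.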

Possible example use cases:

  • If you have links to external websites and you want to show your own preview image of that link, this makes it available.
  • If you do internal scraping of pages for textual analysis and you want some graphical element to showcase the webpage, this can be used.
