Rule: Title Crawler (AI Interpolator ScrapingBot)

Last updated on 16 February 2024

Base data:

Summary:
The ScrapingBot Crawler takes a Link field, scrapes the webpage it points to, works out the title of that webpage, and fills in a Text (plain) field with it.

If you only scrape server-side rendered webpages from your own computer or server, consider AI Interpolator Simple Crawler, which does the same thing for free.
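As a rough illustration of what "figure out the title" means once the HTML has been fetched, here is a minimal sketch that uses only the Python standard library. It is not the module's implementation. The names TitleParser and extract_title are hypothetical, and taking the first <title> element as the page title is an assumption.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones (e.g. inside inline SVG) are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace and trim the ends.
        return " ".join("".join(self._parts).split())


def extract_title(html):
    """Return the page title, or an empty string if no <title> is present."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

For example, extract_title("<html><head><title>My page</title></head></html>") gives "My page", which is the kind of value that would end up in the Text (plain) field.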

Module needed:
AI Interpolator ScrapingBot

Field types to populate:
Text (plain) field (core field).

Base field types to use as context:

Link

Extra Requirements:
You need a ScrapingBot account.

If you want, you can use the code DRUPALAI for 20% off the price of the first month. This promotion helps pay for testing and development of the module.

Extra Settings:

None

Extra Advanced Settings:

Use Chrome

This makes sure the page is loaded in a browser instead of by a plain network scraper that does not render the website.

Wait for Network

Check this if you want to wait for most AJAX requests to finish before the HTML content is returned when using Chrome. This can slow down your scraping, or make it fail, if some requests never end.

Proxy Country

Set the country to proxy the request from. This is very useful if, for instance, you are scraping American websites that do not adhere to GDPR and simply block access from Europe instead.

Use Premium Proxy

Uses a Premium Proxy to scrape websites that detect and block server IPs. For instance, use this if you crawl Rakuten or Netflix. Note that this costs 10 times as many credits, or 20 times as many when JS rendering is on.
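The four advanced settings above can be pictured as one small options structure. The following is a sketch under stated assumptions, not the module's code. The class and function names are hypothetical. The option keys (useChrome, waitForNetworkRequests, premiumProxy, proxyCountry) are an assumption about how such settings might be passed to the scraping service and are not confirmed by this page. It also assumes that "JS rendering" corresponds to Use Chrome being enabled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdvancedSettings:
    """Hypothetical container for the four advanced settings described above."""
    use_chrome: bool = False
    wait_for_network: bool = False
    proxy_country: Optional[str] = None  # e.g. "US"; None means no preference
    use_premium_proxy: bool = False


def build_options(settings):
    """Translate the settings into a request options dict (key names are assumed)."""
    options = {
        "useChrome": settings.use_chrome,
        # Waiting for network requests only applies when Chrome is used.
        "waitForNetworkRequests": settings.use_chrome and settings.wait_for_network,
        "premiumProxy": settings.use_premium_proxy,
    }
    if settings.proxy_country:
        options["proxyCountry"] = settings.proxy_country.upper()
    return options


def premium_credit_multiplier(use_chrome):
    """Credit multiplier for a Premium Proxy request, per the figures in the text:
    10 times the credits, or 20 times when JS rendering is on."""
    return 20 if use_chrome else 10
```

With these assumptions, a premium request with Chrome enabled costs 20 times as many credits as the base price, and one without Chrome costs 10 times as many.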

Possible example use cases:

Any job or workflow in which you scrape a webpage for content and also want to show the title of that webpage.
