HTTPRL Spider uses the excellent HTTP Parallel Request & Threading Library to perform an internal cache-seeding spider via Drush. This is not a module; it is a command-line call that can be used to effectively rebuild all Entity Cache and Boost cache files for an entire site. By default it will spider all known, front-facing entities on a site with a single call. It also provides hooks for adding additional non-entity paths to target (such as views and other menu paths).
Run drush dl httprl_spider to get the latest version; it's a Drush plugin.
drush cc all && drush hss --nonblocking
This will clear all caches and then issue non-blocking requests for all front-facing entity paths. This has performance implications, so run it as part of a nightly job rather than all the time (especially on larger sites), as it is effectively a self-imposed denial-of-service attack. A somewhat safer call is
drush hss
which issues the same requests, but sequentially. More conservative still is
drush hss node
which only spiders nodes on the site.
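As a sketch of the nightly-job advice above, a crontab entry could clear caches and re-seed once a day during off-peak hours. The schedule, Drupal root path, and log file here are assumptions to adjust for your environment; it also assumes drush is on cron's PATH:

```shell
# Hypothetical crontab fragment: rebuild caches and re-seed at 2 a.m. daily.
# Adjust the Drupal root (/var/www/html) and log path for your site.
0 2 * * * cd /var/www/html && drush cc all && drush hss >> /var/log/httprl_spider.log 2>&1
```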
Another common call
Say you want to request just a single path in order to rebuild its cached version (like the front page or another critical page).
drush hsr node would issue a call against the path "/node" and rebuild its cached copy in this example.
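Building on that, a handful of critical pages could be re-seeded in sequence after a deploy. This is only a sketch; the paths below are made-up examples, and the commands assume they are run from the Drupal root of a site with HTTPRL Spider enabled:

```shell
# Hypothetical post-deploy warm-up: re-seed a few critical paths one at a time.
drush hsr node
drush hsr about
drush hsr contact
```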
More ridiculous example
We now support XMLRPC Page Load, which allows for simulating the loading of a page via xmlrpc.php calls. This makes it possible to set up crontab jobs that simulate a user hitting the site. It is very useful if you use Authcache or alternate cache bins and want it to appear as though an "author" accessed all the pages on the site without them actually doing it. It also lets you bypass traditional login prompts to seed cached pages without actually exposing those pages to the internet, as the information in the request is never returned in the call (it just tricks Drupal into rendering the result as if delivered via Apache/Nginx).
drush @site hss node --xmlrpcuid=10
This example simulates a request for every node as user 10 (assuming XMLRPC Page Load is enabled on the site). There is also a rather insane demonstration function (depending on the scale of your site) called huss.
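Since XMLRPC Page Load makes this safe to script, the crontab-job idea mentioned above could be sketched like this. The schedule, site alias, user id, and log path are all assumptions to adapt:

```shell
# Hypothetical crontab fragment: nightly authenticated cache seeding.
# Assumes a drush alias @site, XMLRPC Page Load enabled, and user 10 existing.
30 3 * * * drush @site hss node --xmlrpcuid=10 >> /var/log/httprl_spider_auth.log 2>&1
```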
drush @site huss node
This will load every active user on the site and then spider the site as each of them, effectively creating cache derivatives for every page as if every user on the site had hit it. Done properly with something like Authcache, subsequent page loads should be very fast once each role has been accounted for. This is rather experimental but amazingly cool!
This project draws on concepts originally pioneered by Drush Entity Cache Loader, with the main difference being that as it seeds entity cache tables it then makes an HTTP request for the URL of the object it just cached. This populates not only the entity cache but also aggregated CSS/JS files, Boost static HTML copies, the menu cache, and any other cache system. It is effectively the same as an anonymous user visiting the page, except that no real user has to take the performance hit to be served the content.
This project is a spin-off of an earlier project that provides the same functionality, but it will be maintained here as agreed upon by the maintainer.
This project has been created as part of the ELMS initiative at Pennsylvania State University.