Use parallel requests [#2458237]

As the title says. Using the HTTP Parallel Request & Threading Library (HTTPRL) would likely improve the performance of the cron job that "visits" all node pages and make time-out errors less likely.

Comments

Comment #1

lolandese commented 24 March 2015 at 09:45

It comes down to using the contributed function httprl_request() instead of drupal_http_request(), but only when the HTTPRL module is enabled.

We should probably use it in a non-blocking way, without waiting for the response. We wonder, however, whether this would still cause the cache of the page concerned to be rebuilt. This needs testing. Currently we use drupal_http_request() with the HEAD method, for which the server MUST NOT return a message body in the response. It turns out Drupal rebuilds the full page's cache anyway.
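A minimal sketch of that switch for Drupal 7, assuming (as we read the HTTPRL README) that httprl_request() accepts an array of URLs plus the 'method' and 'blocking' options, and that httprl_send_request() then fires all queued requests in parallel. The function names flickrcachewarmer_node_urls() and flickrcachewarmer_run_parallel() are hypothetical, and whether a non-blocking HEAD request still rebuilds the page cache is exactly what remains to be tested:

```php
<?php

/**
 * Hypothetical helper: builds the absolute URL of each node to warm.
 *
 * @param string $base_url
 *   The site's base URL, for example $GLOBALS['base_url'].
 * @param array $nids
 *   Node IDs to visit.
 *
 * @return string[]
 *   One absolute URL per node ID, in the same order.
 */
function flickrcachewarmer_node_urls($base_url, array $nids) {
  $urls = array();
  foreach ($nids as $nid) {
    $urls[] = rtrim($base_url, '/') . '/node/' . (int) $nid;
  }
  return $urls;
}

/**
 * Hypothetical sketch: warm pages in parallel through HTTPRL when enabled,
 * otherwise fall back to sequential drupal_http_request() calls.
 */
function flickrcachewarmer_run_parallel(array $nids) {
  $urls = flickrcachewarmer_node_urls($GLOBALS['base_url'], $nids);
  if (module_exists('httprl')) {
    // Queue all HEAD requests. Assumed option: 'blocking' => FALSE means we
    // do not wait for the responses (untested effect on the page cache).
    httprl_request($urls, array('method' => 'HEAD', 'blocking' => FALSE));
    // Send every queued request at once.
    httprl_send_request();
  }
  else {
    // Sequential fallback, one request per node as we do now.
    foreach ($urls as $url) {
      drupal_http_request($url, array('method' => 'HEAD'));
    }
  }
}
```

Keeping the URL building in a separate pure function lets that part be checked outside Drupal; the dispatch itself only runs inside a bootstrapped site.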

Furthermore, we have tested the cache warmer only with a limited number of nodes. We should create dummy content with Devel generate, each node containing a random Flickr image (using a randomly sorted Flickr block based on geo, date or taxonomy), to see whether the cache is effectively rebuilt for all of them.

Last but not least, we currently use batch processing if the cache lifetime is substantially longer than the cron interval (see the code below). It seems, however, that the cache of all pages is cleared after every cron run anyway, making our batch processing obsolete. This is unexpected behaviour. Could we change or override this in cron?
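The batching logic itself is not shown here, so the following is only a hypothetical sketch of the idea: spread the nodes over the cron runs that fit into one cache lifetime, so that each node is visited once per lifetime. The helper names flickrcachewarmer_batch_size() and flickrcachewarmer_batch() are made up for illustration:

```php
<?php

/**
 * Hypothetical helper: nodes to visit per cron run so that all nodes are
 * warmed once within the cache lifetime.
 *
 * @param int $node_count
 *   Total number of nodes to warm.
 * @param int $cache_lifetime
 *   Cache lifetime in seconds.
 * @param int $cron_interval
 *   Cron interval in seconds.
 *
 * @return int
 *   Number of nodes per cron run.
 */
function flickrcachewarmer_batch_size($node_count, $cache_lifetime, $cron_interval) {
  if ($node_count <= 0) {
    return 0;
  }
  // Cron runs that fit in one cache lifetime; at least one.
  $runs = max(1, (int) floor($cache_lifetime / max(1, $cron_interval)));
  // Round up so every node is covered within those runs.
  return (int) ceil($node_count / $runs);
}

/**
 * Hypothetical helper: the slice of node IDs for a given cron run index.
 */
function flickrcachewarmer_batch(array $nids, $batch_size, $run) {
  if (empty($nids) || $batch_size <= 0) {
    return array();
  }
  $batches = array_chunk($nids, $batch_size);
  // Cycle through the batches, starting over after the last one.
  return $batches[$run % count($batches)];
}
```

For example, with 1000 nodes, a cache lifetime of one day (86400 s) and an hourly cron (3600 s), each run would visit 42 nodes. The run index could be a counter kept with variable_get()/variable_set() (a hypothetical variable), but if cron clears the whole page cache anyway, as observed above, spreading the work like this buys nothing.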

/**
 * Virtually visits all nodes of selected content types to ensure the cache of
 * these pages is rebuilt, avoiding long page loads for a real visitor.
 *
 * Note that with the HEAD method the server MUST NOT return a message-body in
 * the response. It turns out Drupal will rebuild the full page's cache anyway.
 *
 * @param array $nids
 *   The node IDs of the pages to visit.
 */
function flickrcachewarmer_run($nids) {
  // Visit each node.
  foreach ($nids as $nid) {
    if ((variable_get('flickr_curl2', 0) || !function_exists('stream_socket_client')) && function_exists('curl_version')) {
      $result = flickr_curl_http_request($GLOBALS['base_url'] . '/node/' . $nid, array(
        'method' => 'HEAD',
      ));
      $cmethod = 'cURL';
    }
    elseif (function_exists('stream_socket_client')) {
      $result = drupal_http_request($GLOBALS['base_url'] . '/node/' . $nid, array(
        'method' => 'HEAD',
      ));
      $cmethod = 'stream_socket_client';
    }
    if (isset($result)) {
      if ($result->code != 200 && $cmethod == 'stream_socket_client' && function_exists('curl_version')) {
        // Fall back to cURL when drupal_http_request() returns a code other
        // than 200 (valid request, no errors). The most likely ones are 403
        // (Forbidden) and 408 (Request Timeout). Keep the failing code before
        // $result is overwritten, so the message reports the right one.
        $failed_code = $result->code;
        $result = flickr_curl_http_request($GLOBALS['base_url'] . '/node/' . $nid, array(
          'method' => 'HEAD',
        ));
        $cmethod = 'cURL';
        $variables = array('@nid' => $nid, '@code' => $failed_code);
        drupal_set_message(t('Automatic fallback to the cURL connection method kicked in on nid @nid. Result code from the failing request: @code.', $variables), 'warning', FALSE);
        watchdog('flickr', 'Automatic fallback to the cURL connection method kicked in on nid @nid. Result code from the failing request: @code.', $variables, WATCHDOG_WARNING);
      }
      // The request still failed, whichever connection method was used last.
      if ($result->code != 200) {
        // Debug info.
        if (variable_get('flickr_debug', 0) == 2 && module_exists('devel')) {
          dpm(t("Value of 'result' on nid @nid with error in flickrcachewarmer_run() with connection method '@method':", array('@nid' => $nid, '@method' => $cmethod)));
          dpm($result);
        }
        flickr_set_error(t('Could not warm the page cache of nid @nid, error: @error', array('@nid' => $nid, '@error' => isset($result->error) ? $result->error : $result->code)));
      }
    }
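    else {
      drupal_set_message(t("There seems to be no connection method available on your server: neither 'stream_socket_client' nor 'cURL'."), 'error', FALSE);
      watchdog('flickr', "There seems to be no connection method available on your server: neither 'stream_socket_client' nor 'cURL'.", array(), WATCHDOG_ERROR);
      // Without a connection method no other node can be visited either, so
      // stop instead of logging the same error once per node.
      return;
    }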
  }
}