I've done some experiments to find out how much memory is needed to download data via HTTP with Feeds. To measure this I enabled the "Display memory usage" setting from the Devel module and executed the cron manually using the link on the Drupal status report page (admin/reports/status).
I made sure that Feeds would need to download a large file of about 50 MB via HTTP. I found that Feeds needs about three times the file size in memory to fetch and parse that file, so more than 150 MB.

With the help of http://stackoverflow.com/questions/7967531/php-curl-writing-to-file#answ... I managed to use only a third of that memory. I could optimize it much further by not reading the file contents into memory at all, but that may cause backwards compatibility issues for parsers that expect to receive the full data, so I'm leaving that out for now.
There's only one thing I haven't been able to solve yet, due to insufficient knowledge about proxy servers. I could not yet adapt the following code to the optimization:
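
For illustration, here is a minimal sketch of the idea from that Stack Overflow answer: let cURL write the response straight to a temporary file via CURLOPT_FILE instead of returning it as one big string. The function name example_download_to_file() and the array it returns are hypothetical and not taken from the patch. CURLOPT_HEADER is enabled on the assumption that the headers should end up in the same file as the body.

```php
<?php
/**
 * Hypothetical helper: streams an HTTP response (headers included) to a file.
 *
 * @return array|false
 *   Info about the download, or FALSE if the file could not be opened.
 */
function example_download_to_file($url, $temp_file) {
  // Open the destination first; cURL writes to this handle as data arrives.
  $handle = @fopen($temp_file, 'w');
  if ($handle === FALSE) {
    return FALSE;
  }

  $download = curl_init($url);
  // Write the response to the file instead of keeping it in memory.
  curl_setopt($download, CURLOPT_FILE, $handle);
  // Assumption: headers are wanted in the same file, before the body.
  curl_setopt($download, CURLOPT_HEADER, TRUE);
  $ok = curl_exec($download);

  $info = array(
    'ok' => $ok !== FALSE,
    'code' => curl_getinfo($download, CURLINFO_HTTP_CODE),
    // Number of bytes at the start of the file that are headers.
    'header_size' => curl_getinfo($download, CURLINFO_HEADER_SIZE),
  );

  curl_close($download);
  fclose($handle);
  return $info;
}
```

With this approach, peak memory no longer depends on the size of the download itself; only whatever is later read back from the file counts.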

<?php
// When using a proxy, remove extra data from the header which is not
// considered by CURLINFO_HEADER_SIZE (possibly cURL bug).
// This data is only added to the HTTP header when working with a proxy.
// Example string added: <HTTP/1.0 200 Connection established\r\n\r\n>
// This was fixed in libcurl version 7.30.0 (0x71e00) (April 12, 2013),
// so this workaround only removes the proxy-added headers if we are using
// an older version of libcurl.
$curl_ver = curl_version();

if ($proxy_server && $curl_ver['version_number'] < 0x71e00 && _drupal_http_use_proxy($uri['host'])) {
  $http_header_break = "\r\n\r\n";
  $response = explode($http_header_break, $result->data);
  if (count($response) > 2) {
    $result->data = substr($result->data, strlen($response[0] . $http_header_break), strlen($result->data));
  }
}
?>

So I would love some help with that.

Patch will follow.


Comments

MegaChriz created an issue. See original summary.


Note that this solution currently uses an anonymous function, which would break support for PHP 5.2 (anonymous functions require PHP 5.3 or later).

Status: Needs review » Needs work

The last submitted patch, 2: feeds-download-optimize-2880129-2.patch, failed testing.


Status: Needs work » Needs review

If an HTTP request results in a 304 status code, there will be no file contents. In that case it doesn't make sense to read the rest of the temporary file, as filesize($temp_file) will be equal to curl_getinfo($download, CURLINFO_HEADER_SIZE).
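
A rough sketch of that check, assuming the temporary file holds the headers followed by the body. example_response_has_body() is a hypothetical helper and not the code in the patch; $header_size is assumed to be the value from curl_getinfo($download, CURLINFO_HEADER_SIZE).

```php
<?php
/**
 * Hypothetical helper: tells whether a downloaded file has a body at all.
 *
 * @param string $temp_file
 *   Path to the file that contains the headers followed by the body.
 * @param int $header_size
 *   Size of the headers in bytes.
 */
function example_response_has_body($temp_file, $header_size) {
  // filesize() results are cached, so reset the cache before asking.
  clearstatcache();
  // With a 304 response the file contains headers only, so both sizes match.
  return filesize($temp_file) > $header_size;
}
```

When this returns FALSE, reading past the headers can be skipped entirely.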

This patch should at least fix some of the test failures.

Status: Needs review » Needs work

The last submitted patch, 4: feeds-download-optimize-2880129-4.patch, failed testing.


Status: Needs work » Needs review

A single fread() call may return at most 8192 bytes (this is the case for streams that are not plain local files). Therefore we need a loop to read the rest of the file.
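
Such a loop could look roughly like this. example_read_rest() is a hypothetical name, and the handle is assumed to be already positioned just after the headers, for example with fseek($handle, $header_size).

```php
<?php
/**
 * Hypothetical helper: reads everything from the current position to EOF.
 *
 * @param resource $handle
 *   An open, readable file handle.
 */
function example_read_rest($handle) {
  $data = '';
  // Keep reading chunks, since one fread() call may return only part of it.
  while (!feof($handle)) {
    $chunk = fread($handle, 8192);
    if ($chunk === FALSE) {
      // Reading failed; return what was read so far.
      break;
    }
    $data .= $chunk;
  }
  return $data;
}
```

This still holds the complete body in memory once, so it only avoids the extra copies and not the body itself.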