per subject :)

Comment | File | Size | Author
#9 | FeedsHTTPFetcher.inc_.patch | 3.16 KB | serbanghita

Comments

alex_b’s picture

Title: Ability to specify zipped xml feeds » Support archived and compressed feeds (zip, tar.gz)
Component: Miscellaneous » Code
Category: support » feature

This is a feature request. It would have to be implemented at the fetcher level, in both the file fetcher and the HTTP fetcher.

wayout’s picture

Title: Support archived and compressed feeds (zip, tar.gz) » +1 for this

My source only uses compressed feeds :( Does anyone have a patch for this by any chance? (Sorry, I'm not a programmer.)

wayout’s picture

Title: +1 for this » Support archived and compressed feeds (zip, tar.gz)
tomcatuk’s picture

Also interested - I've got access to a feed, but only compressed (zip or gzip).

alex_b’s picture

The challenge here is going to be detecting whether a resource is compressed and, if so, in which format. There is no standard way of doing this. We could try, in this order: content type per HTTP header, then file extension.
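
The order suggested above (Content-Type header first, file extension second) could be sketched roughly like this. The function name feeds_detect_compression() and the list of MIME types are assumptions made for illustration; neither is part of Feeds.

```php
<?php
/**
 * Hypothetical helper: guess the compression format of a fetched resource.
 *
 * @param string $content_type
 *   Value of the Content-Type response header, or '' if unknown.
 * @param string $url
 *   URL or file path of the resource.
 *
 * @return string
 *   'gz', 'zip', or '' when no compression is detected.
 */
function feeds_detect_compression($content_type, $url) {
  // Assumed list of MIME types; servers are not consistent about these.
  $by_type = array(
    'application/zip' => 'zip',
    'application/x-zip-compressed' => 'zip',
    'application/gzip' => 'gz',
    'application/x-gzip' => 'gz',
  );
  // Strip parameters such as "; charset=utf-8" before comparing.
  $parts = explode(';', $content_type);
  $type = strtolower(trim($parts[0]));
  if (isset($by_type[$type])) {
    return $by_type[$type];
  }
  // Fall back to the file extension of the URL path (query string ignored).
  $path = parse_url($url, PHP_URL_PATH);
  $ext = strtolower(pathinfo((string) $path, PATHINFO_EXTENSION));
  return in_array($ext, array('gz', 'zip')) ? $ext : '';
}
```

Note that a generic type such as application/octet-stream falls through to the extension check, and that a .tar.gz URL is reported as 'gz' only; the tar layer would still need separate handling.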

Anonymous’s picture

I'm using the custom Feeds XML Parser (found in the issue queue somewhere).

I was able to implement a quick fix to read the gzip, in my FeedsXMLParser.inc:

find:

$xml = @simplexml_load_file($file);

change it to:

// check source extension and decide to use zlib
$path_parts = pathinfo($source->config['FeedsHTTPFetcher']['source']);
// isset() avoids a notice when the source URL has no extension at all
$use_zlib   = (isset($path_parts['extension']) && $path_parts['extension'] == 'gz') ? 'compress.zlib://' : '';

$xml = @simplexml_load_file($use_zlib . $file);

Basically, the Feeds parser will read the $file as 'compress.zlib://http://www.mysite.com/feed.xml.gz'.

This may look strange, but "compress.zlib://" is actually a PHP stream wrapper prefix: anything read through it is gunzipped on the fly.

Another example:

file_get_contents("compress.zlib:///myphp/test.txt.gz");

You should find the right line in your Parser.inc to implement this yourself.
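
To check that the compress.zlib:// prefix behaves as described, here is a small self-contained sketch. It assumes PHP has the zlib extension enabled; the sample XML and the temporary file are made up for the demonstration.

```php
<?php
// Sample data and temp file are hypothetical, only for this demonstration.
$xml = '<items><item>one</item></items>';
$file = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($file, gzencode($xml));

// Reading through compress.zlib:// decompresses the gzip data on the fly.
$plain = file_get_contents('compress.zlib://' . $file);

// The decompressed string can then be parsed as usual.
$doc = simplexml_load_string($plain);
$first = (string) $doc->item[0];

// Clean up the temporary file.
unlink($file);
```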

alex_b’s picture

Should be implemented on fetcher / batch level. When a file is uploaded from client or downloaded from web, uncompress it right away. Note: of course, let's not uncompress enclosures.

serbanghita’s picture

alex_b is right. I've already implemented the code and will post a patch or the code here in a day or two.

The xml / gzip switch is done in getRaw() in FeedsHTTPFetcher.inc.
I'm looking at $result->headers['Content-Type']:
 1. if it's xml, leave the data as is
 2. if it's gzip, write it to a temporary .gz file on disk, read it back uncompressed, and use the resulting xml

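PS: I tried to use gzuncompress() directly (http://www.php.net/manual/en/function.gzuncompress.php), but it did not work on this data. It expects zlib-format data rather than the gzip file format, which is probably why.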
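
On the gzuncompress() remark: a minimal sketch of why it tends to fail on .gz data, assuming PHP 5.4 or later for gzdecode(). gzuncompress() reads zlib-format data (as written by gzcompress()), while .gz files use the gzip format (as written by gzencode()). The sample string is made up.

```php
<?php
$xml = '<feed><entry>hello</entry></feed>';

// gzip format, as found in .gz files.
$gzip = gzencode($xml);
// zlib format, a different container around the same deflate stream.
$zlib = gzcompress($xml);

// gzdecode() reads gzip data; gzuncompress() reads zlib data.
$from_gzip = gzdecode($gzip);
$from_zlib = gzuncompress($zlib);

// Mixing them up fails: gzuncompress() returns FALSE (with a warning,
// suppressed here) when handed gzip data.
$mismatch = @gzuncompress($gzip);
```

On PHP versions without gzdecode(), going through a temporary file and gzopen(), as described in this comment, is one workaround.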

I will code the same stuff for zip support.

@alex_b is there any way I can modify this from another module, like feeds_social?

Thanks!

serbanghita’s picture

File: FeedsHTTPFetcher.inc_.patch, Size: 3.16 KB

Here is the solution. Full support for gzip and zip.
I've modified the getRaw() method in modules/feeds/plugins/FeedsHTTPFetcher.inc.
I've also attached a patch.

Check it out!

  /**
   * Implementation of FeedsImportBatch::getRaw();
   */
  public function getRaw() {
    feeds_include_library('http_request.inc', 'http_request');
    $result = http_request_get($this->url);

    // Fail early so that an error page is never treated as an archive.
    if (!in_array($result->code, array(200, 201, 202, 203, 204, 205, 206))) {
      throw new Exception(t('Download of @url failed with code !code.', array('@url' => $this->url, '!code' => $result->code)));
    }

    // Check if we got the proper zip or gzip headers.
    $type = isset($result->headers['Content-Type']) ? $result->headers['Content-Type'] : '';
    $generic = strpos($type, 'application/octet-stream') !== FALSE;
    $ext = '';
    if (strpos($type, 'application/zip') !== FALSE || ($generic && strpos($type, '.zip') !== FALSE)) {
      $ext = 'zip';
    }
    if (strpos($type, 'application/x-gzip') !== FALSE || ($generic && strpos($type, '.gz') !== FALSE)) {
      $ext = 'gz';
    }

    // Only require the PHP extension that the detected format needs.
    $supported = ($ext == 'gz' && function_exists('gzopen')) || ($ext == 'zip' && function_exists('zip_open'));
    if ($ext && !$supported) {
      drupal_set_message('Zipped file encoding detected, but you don\'t have PHP with <strong>php_zlib</strong> or <strong>php_zip</strong> support. Your file has <em>' . $ext . '</em> extension.', 'warning');
    }
    elseif ($ext) {
      // Save the download to a temporary file so gzopen()/zip_open() can read it.
      $tmp = file_save_data($result->data, file_directory_path() . '/' . md5(uniqid('', TRUE)) . '.' . $ext, FILE_EXISTS_REPLACE);
      $contents = '';

      if ($tmp && $ext == 'gz' && ($zd = gzopen($tmp, 'r'))) {
        while (!gzeof($zd)) {
          $contents .= gzread($zd, 10000);
        }
        gzclose($zd);
        drupal_set_message('Gzip encoding detected. Decoding the file to xml.');
      }
      // zip_open() returns an error code instead of a resource on failure.
      if ($tmp && $ext == 'zip' && is_resource($zd = zip_open($tmp))) {
        // Concatenate the contents of every entry in the archive.
        while ($zip_entry = zip_read($zd)) {
          if (zip_entry_open($zd, $zip_entry)) {
            $contents .= zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
            zip_entry_close($zip_entry);
          }
        }
        zip_close($zd);
        drupal_set_message('Zip encoding detected. Decoding the file to xml.');
      }

      if ($tmp) {
        // Delete the temporary file.
        @unlink($tmp);
        $result->data = trim($contents);
      }
    }

    return $result->data;
  }
}
tomcatuk’s picture

OK, this might sound like a dumb question, but would I be right in thinking this patch is only intended for compressed XML?

serbanghita’s picture

Oh, I get it. I should have used a general set of messages. Still, the patch works for any other kind of file.
I'll repost the improved code and messages.

tomcatuk’s picture

Thanks Serbanghita, just patched the latest release before uploading it. Initial results are looking great.

Alex....is this likely to make it into the next release?

alex_b’s picture

Status: Active » Needs work

#12: Here is what I see is open:

- Break out decompression functionality into its own helper method in FeedsImportBatch. Goal: make it available to both the FeedsFileBatch and FeedsHTTPBatch classes.
- Implement decompression for getRaw() and getFilePath() in both the FeedsHTTPBatch class and the FeedsFileBatch class.
- Clean up code.
- Add tests.
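
A rough sketch of the first point, moving decompression into a shared helper on a base class so that both batch types can reuse it. The class names ExampleImportBatch and ExampleHTTPBatch and the method decompress() are hypothetical stand-ins, not the actual Feeds classes. The sketch assumes PHP 5.4 or later (for gzdecode()) and only covers gzip.

```php
<?php
/**
 * Hypothetical base class standing in for a shared batch parent.
 */
abstract class ExampleImportBatch {
  /**
   * Return the raw, already decompressed content of the resource.
   */
  abstract public function getRaw();

  /**
   * Shared helper: decompress $data if $format names a supported format.
   */
  protected function decompress($data, $format) {
    if ($format == 'gz' && function_exists('gzdecode')) {
      $plain = @gzdecode($data);
      // Fall back to the untouched data if it was not valid gzip.
      return $plain === FALSE ? $data : $plain;
    }
    return $data;
  }
}

/**
 * Hypothetical subclass; a file-based batch would reuse decompress() too.
 */
class ExampleHTTPBatch extends ExampleImportBatch {
  protected $data;
  protected $format;

  public function __construct($data, $format) {
    // In a real fetcher $data would come from the HTTP response.
    $this->data = $data;
    $this->format = $format;
  }

  public function getRaw() {
    return $this->decompress($this->data, $this->format);
  }
}

$batch = new ExampleHTTPBatch(gzencode('<rss/>'), 'gz');
$raw = $batch->getRaw();
```

Keeping the helper on the parent means a file-based subclass only has to supply its own data and format; the decompression logic and its tests live in one place.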

Michsk’s picture

oh my, this would be awesome

pepej’s picture

Title: Support archived and compressed feeds (zip, tar.gz) » +1 for this
pepej’s picture

Title: +1 for this » Support archived and compressed feeds (zip, tar.gz)
johnhorning’s picture

Did this ever go anywhere? I could use this feature on my Commission Junction product data feed.

ashcin47’s picture

@serbanghita the patch does not work on Drupal 7. Can somebody please share a solution for Drupal 7?

twistor’s picture

Version: 6.x-1.x-dev » 7.x-2.x-dev
MegaChriz’s picture

Status: Needs work » Closed (duplicate)