One of the mappers that never quite made it to FeedAPI was a Imagefield/Filefield mapper.

The clostest patch that almost made it is: #319538: Mapper for FileField / ImageField

Especially with the ability to use Feeds as an import engine, a mapper like this would be a lifesaver for someone trying to import many images or files into a drupal site.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Gerard McGarry’s picture

Can I just add my voice to this request? I've been experimenting with FeedAPI to create a news aggregation site using Drupal. In order to create an even remotely interesting layout, importing images from the feed and refactoring them for Imagefield would be a critical part of the functionality.

How difficult would it be to port the ImageGrabber module across to work with Feeds? I'd be happy to help testing this against one of my test servers if that's of any use.

pbuyle’s picture

Status: Active » Needs review
FileSize
17.46 KB

Here is a patch with a filefield mapper and its test case.

The mappers assumes a source that provides values as array of array of array of enclosures as done by FeedsSimplePieParser. It is more flexible when it comes to the what en enclosure is as it will accept objects, arrays and strings. For an object, it assume the link properties is the URL of the enclosure. For an array, it does the same with the link key. It assume a string to be the direct URL. Relative URLs are assumed to be local files and are not fetched using HTTP (or other where ini_get('allow_url_fopen') is true).

The test case needs more work since it doesn't cover all those possibilities.

pbuyle’s picture

Here is a more complete and fixed patch.
- the test case check for filename (without extension) in created node edit pages.
- dpm() and tabbed indents have been removed
- Support both SimplePie_Enclosure and FeedResultEnclosure (see #641522: Consolidate import stage results and provide generalized enclosure class).

mylos’s picture

I downloaded this patch and I tested it on my site. I have CSV file with few columns and one for images (image_url) so I have URL of image. So tell me please is possible to get those picture from that URL, download it and upload to node's CCK image_field ?

regards
Milos

PS. I got this error:
warning: Invalid argument supplied for foreach() in /home/pinefactory/pine-chest-of-drawers.com/sites/all/modules/feeds/mappers/filefield.inc on line 36.

pbuyle’s picture

It is not currently possible. The mapper expect the source's value to be an array of arrays of arrays. I don't think the CSV Parser can produce such an array. AFAIK, the only compatible parser is the SimplePie one.

It shouldn't be hard to fix it for your case. I would like to work on it, but I don't have enough time at the moment. An existing test case (using SimpleTest) would help.

jerdavis’s picture

Status: Needs review » Needs work

I needed to get this working with CSV data, so I took the example and made a couple of quick tweaks to allow it to work with a non-array as well. This probably needs a bit of cleanup, but works. It'll also likely need the tests updated.

<?php
// $Id:$

/**
 * @file
 * On behalf implementation of Feeds mapping API for filefield.module (CCK).
 * 
 * @todo Support the <em>list</em> subfields (howto provides default value ?)
 * @todo Support the <em>data</em> subfileds (or its own subfiedss ?)
 */

/**
 * Implementation of hook_feeds_node_processor_targets_alter()
 */
function filefield_feeds_node_processor_targets_alter($targets, $content_type) {
  $info = content_types($content_type);
  $fields = array();
  if (isset($info['fields']) && count($info['fields'])) {
    foreach ($info['fields'] as $field_name => $field) {
      if ($field['type'] == 'filefield') {
        $name = isset($field['widget']['label']) ? $field['widget']['label'] : $field_name;
        $targets[$field_name] = array(
          'name' => $name,
          'callback' => $field['type'].'_feeds_set_target',
          'description' => t('The URL for the CCK !name field of the node.', array('!name' => $name)),
        );
      }
    }
  }
}

function filefield_feeds_set_target($node, $field_name, $value) {
  static $fields = array(); 
  $items = isset($node->{$field_name}) ? $node->{$field_name} : array();
  // value structrue is expected to by an array of array of array of enclosure
  if (is_array($value)) {
    foreach($value as $type => $v) {
      foreach($v as $subtype => $enclosures) {
        foreach($enclosures as $enclosure) {
          $error = true;
          $url = _enclosure_resolve($enclosure);
          $file = _fetch_from_url($url);
        }
      }
    }
  }
  else {
    $file = _fetch_from_url($value);
  }
  if($file) {
    $field = content_fields($field_name, $node->type);
    $target_dir = filefield_widget_file_path($field, user_load($node->uid));
    $info = field_file_save_file($file, array(), $target_dir);
    if($info) {
      if($field['list_field']) {
        $info['list'] = $field['list_default'];
      }
      $items[] = $info;
      $error = false;
    }
  }
  if($error) {
    watchdog("feeds", "Could not save enclosure @url in field %field_name", array('@url' => $url, '%field_name' => $field_name));
  }
  $node->$field_name = $items;
}

/**
 * Get the filename for an enclosure URL. If the URL is a valid external URL,
 * its content is download as a temporary file. Otherwise, it is considered
 * as a filename.
 *
 * @param $url
 *   The URL
 * @return
 *   The filename for $url, or 0 on error.
 */
function _enclosure_resolve($enclosure) {
  //Get enclosure URL
  if(is_object($enclosure)) {
    if(method_exists($enclosure, 'get_link')) {
      //SimplePie_Enclosure style
      $url = $enclosure->get_link();
    } elseif(method_exists($enclosure, 'getUrl')) {
      //Generalized enclusore interface, as in inital patch in http://drupal.org/node/641522
      $url = $enclosure->getUrl();
    } elseif(isset($enclosure->link)) {
      //Fallback to a link property
      $url = $enclosure->link;
    }
  } elseif(is_array($enclosure)) {
    $url = $enclosure['link'];
  } elseif(is_string($enclosure)) {
    $url = $enclosure;
  } else {
    return 0;
  }
  return $url;
}

/**
 * Fetch file from URL
 */
function _fetch_from_url($url) {
  //Resolve URL
  if(valid_url($url, true)) {
    //this enclosure is an absolute URL, download it.
    $filepath = file_destination(file_directory_temp().'/'.strtolower(get_class($this)).'/'.basename($url), FILE_EXISTS_RENAME);
    if (ini_get('allow_url_fopen')) {
      return copy($url, $filepath) ? $filepath : 0;
    } else {
      feeds_include_library('http_request.inc', 'http_request');
      $result = http_request_get($url);
      if ($result->code != 200) {
       	watchdog('feeds', 'Download of @url failed with c_enclosure_resolveode !code.', array('@url' => $url, '!code' => $result->code));
      	return 0;
      } else { 
      	return file_save_data($result->data, $filepath);
      }
    }
  } else {
    return file_exists($url) ? $url : 0; 
  }
}
pbuyle’s picture

Status: Needs work » Needs review
FileSize
18.49 KB

Here is a version that should accept yet more structures for $value in filefield_feeds_set_target. It first try to resolve it as an enclosure. If it fails but if $value is an array, it the same process is applied on all its items. A enclosure can be an object, an array or a string.

branislav.mikic’s picture

nice work mongolito404,
thank you a thousand times!

mylos’s picture

Thanks man,

You are my hero. That import it's working like charm with image cache and CCK.
Look at example: http://www.pine-chest-of-drawers.com

regards and many thanks
Milos

Jordi Bufí’s picture

Hi everybody,

I've downloaded the patch and applied it to the necessary files, but there isn't a "filefield" option on the select list for sources (i'm on the "mapping for node processor" page.

I have also created an image filefield on the "feed" node type but it isn't shown on the target select list.

How is it supposed to work?

BWPanda’s picture

Status: Needs review » Reviewed & tested by the community

Working for me too.

rjbrown99’s picture

This is working for me. I'm doing CSV imports of URLs to the field, and it is successfully going out and grabbing the remote image and saving them into the field. I'm able to manipulate via Imagecache. Works well. Thanks.

jarea’s picture

Forgive me if I am being a little thick, but assuming 624088_7_mapper_filefield.patch is the one to run with, to what do I apply this patch? And should it reside in the feeds/mappers directory?

Thanks for the clarification.

rjbrown99’s picture

Drop it into the feeds root directory, then patch -p0 < mypatchfile

Jordi Bufí’s picture

Hi again,

I partly fixed my problem: I had created the image filefield on the wrong content type ("feed", not "feed item").

Now I can select the target "Image" but I don't know how to configure it for getting an image from the html of an RSS element (i.e. the first one). It's possible?

pbuyle’s picture

The mapper handles assigning files from enclosures found in the item to a FileField field. File linked from HTML in the item 'description' are not supported.

PS: IMHO, the best way to handle this is the create a subclass of the SimplePie or Common syndication parser that will extract the URL first linked file from the each item 'description' and add it to its 'enclosures'. It would be even better to fix the imported RSS to use enclosures instead of html link in description.

Jordi Bufí’s picture

Many thanks mongolito404!

rjbrown99’s picture

FWIW, I'm doing my imports via CSV and it works well. My imported CSV field is just a URL with the absolute path to an image.

http://my.server.com/path/to/IMAGE.JPG

alex_b’s picture

Status: Reviewed & tested by the community » Needs work
FileSize
18.93 KB

- Replaced tabs indents with spaces
- Some code style fixes

Looking at this patch better, we should be able to get away with none of the enclosure object encapsulation and merely accept URLs and arrays of URLs as a mapping source. We could patch FeedsSimplePieParser to expose enclosures as arrays of URLs. This would make the patch much simpler by maintaining the same level of functionality. More powerful enclosure handling can be always added in when needed.

rjbrown99’s picture

Interesting potential issue.

I'm doing an import of CSV fields, and one field is an imagefield with a URL like:
http://somesite.com/images/IMAGE_CORRECT_NAME.jpg

Well, one of the imported image fields has the wrong URL. Instead of IMAGE_CORRECT_NAME.jpg they are just feeding me IMAGE_NAME.jpg, which results in a 404 when you visit that URL.

The result is that the feed import never completes. I have to manually stop and kill the import process. If I fix the name of the JPG to the correct name, the feed completes normally.

This is using the patch from #7.

pbuyle’s picture

Some formats provide more than an URL for enclosures. RSS enclosures have a required length and mime-type. Other formats may provides a description of the enclousre as a text. Reducing enclosure to URL means that this information is lost on parsing and cannot be used by mapper. My current code for the mapper in #624088 doesn't include handling for this information but I see two immediate uses:
* The mime-type of an enclosure mapped to an ImageField (which is a FileField) can be used to drop any non-image enclosure
* The description of an enclosure can assigned to the $info['data']['description']

I was waiting for #641522 to be resolved (and some freetime) to work on this. The idea is to rewrite _enclosure_resolve to always return an instance of the enclosure interface defined there. Either because the passed in $enclosure is such an object or because it is a string that can be used to create an new, temporary, enclosure object for an url of a local file.

rjbrown99’s picture

OK, I have more information on the 404 issue from my previous post. In a nutshell, Feeds stops importing when I encounter an imagefield URL that results in a 404.

1) I added a few statements to filefield.inc to see which of the 'if' paths I was following down the Resolve URL trail in the _enclosure_resolve function. In my case, I fall into the "allow_url_fopen" camp. The current code in this module looks like this:

if (ini_get('allow_url_fopen')) {
      return copy($url, $filepath) ? $filepath : 0;
    }

Based on reading the PHP website for the copy function, perhaps it should be enclosed in an if statement to handle the condition where it fails to copy? This is their demonstration code:

if (!copy($file, $newfile)) {
    echo "failed to copy $file...\n";
}

2) I noticed that if I 'cancel' the import, and then re-import, it picks up at the item AFTER the 404 item that stops it. So at least for me, if I encounter a stuck import I can cancel, re-start, and it will keep going.

pbuyle’s picture

 return copy($url, $filepath) ? $filepath : 0;

is (roughly) equivalent to

if (copy($url, $filepath)) {
  return $filepath;
}
else {
  return 0;
}

So copy failures should be handled. You should have an error message in the watchdog. But the import shouldn't freeze. So obviously something is wrong.

rjbrown99’s picture

Hm good point. I realized that after I read about what ? does :)

Here's my log entry:

Dec 12 15:30:52 myhostname drupal: http://mysite.com|1260660652|php|1.2.3.4|http://mysite.com/node/1498/import|http://mysite.com/node/1498/import|1||copy(http://www.retailersite.com/img/products/BROKENIMAGE__L.jpg) [function.copy]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found#015#012 in /path/to/drupal/sites/all/modules/feeds/mappers/filefield.inc on line 100.

Line 100 is return copy($url, $filepath) ? $filepath : 0;

That looks about right as a watchdog error. I'm not sure why the import stops. I left it for hours and it was completely frozen. I'm going to work on a few more imports over the next few days and will report back.

q0rban’s picture

Title: Port FeedAPI Mappers: ImageFiled/FileField » Port FeedAPI Mappers: ImageField/FileField

Subscribe

alex_b’s picture

Assigned: Unassigned » alex_b

#24: I can't confirm that a 404 would result in a stuck import process. I'm also "falling in the allow_furl_open" camp.

I did see though that some of the files are HUGE. I've will change them to a smaller size.

This is the error I get when hitting a 404 resource:

warning: copy(http://localhost/asdf) [function.copy]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in /opt/local/apache2/htdocs/d6/sites/all/modules/feeds/plugins/FeedsSyndicationParser.inc on line 52.
Cannot write content to /tmp/FeedsSimplePieEnclosure-asdf

rjbrown99’s picture

I can reproduce this if I delete the nodes that correspond to the imported items that have the broken link and then re-process the import file. Can I add any additional debug details or statements that could help to shed more light on it? I can also grab packet captures off of the wire if it would be useful.

My CSV import files aren't huge - on the order of ~1000-2000 rows/items.

alex_b’s picture

FileSize
23.56 KB

I'm trying to refocus the patch after #641522: Consolidate import stage results and provide generalized enclosure class got larger in scope (my fault - sorry mongolito404). However, I think we can get strong enclosure support (mongolito404 explained in #21 why we need it) and filefield mapping with one patch - see attachment.

This patch builds on #7/#19, all tests are passing. The most important modifications are:

  • Introduces a class FeedsElement that encapsulates any single mapping source element. A source element can now be either a simple data type (string, number) or a FeedsElement or an array of either of the former two.
  • It further introduces a class FeedsEnclosure extends FeedsElement that supports additional methods like getContent() that are particular to enclosures.
  • FeedsSimplePieEnclosure is a simple adapter derived from FeedsEnclosure that encapsulates SimplePie_Enclosure object.

These changes simplify the code in filefield.inc substantially while creating a straightforward foundation that future special source elements can extend.

Open tasks/questions:

  1. In #7 here was a case checking for $enclosure->link which I didn't take over in this patch - which parser produces this format? Am I missing something on this thread?
  2. As we introduce an additional source element type FeedsElement we should update existing mappers to check for it.
  3. FeedsSyndicationParser should support FeedsEnclosure - can happen in a follow up patch.
  4. Is there another way of mapping to a filefield rather than using field_file_save_file() which forces us to download a resource to a file and then copy it again because it asks for a local file?
  5. file_save_data() fallback which kicks in if allow_furl_open is FALSE in FeedsEnclosure::getFile() is broken. I think it was also broken in the previous patches. It essentially downloads empty files.
  6. File saving is not concurrency safe. Multiple different files with the same name saved at the same time may override each other (any good ideas on how to make this bullet proof?).
alex_b’s picture

Assigned: alex_b » Unassigned

Not working on this atm.

alex_b’s picture

Title: Port FeedAPI Mappers: ImageField/FileField » Mapper: ImageField/FileField (formalize feed elements)

More comprehensive title as #28 formalizes feed source elements better.

JayCally’s picture

How can you get this to work with imagecache? It works great but doesn't create the imagecache files. So I get a broken image link for the thumbnail and small in story image. Anyone know how to implement imagecache?

BenK’s picture

Subscribing....

rjbrown99’s picture

Update: I don't think my 'hung' feed import is due to a 404'd item. I think it's due to a different issue I just created: #681160: Scalability - PHP Memory exhausted stops the Feed import.

David Goode’s picture

FileSize
23.52 KB

Fixed problem number 5 in alex's #28 post and rerolled. The issue was just a minor misuse of the drupal api calls.

David

alex_b’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
6.66 KB

#34: David - thanks for the review and the fix. I have done some minor cleanup of the patch and added the __toString() method to FeedsElement which should allow us to treat FeedsElement objects as strings.

Will commit now in

#28 1. deferred - not critical
#28 2. This is fixed by using __toString() method in FeedsElement.
#28 3. deferred #682508: FeedsSyndicationParser: support enclosures
#28 4. deferred - not critical
#28 5. fixed in #34
#28 6. deferred - not critical

alex_b’s picture

FileSize
23.95 KB

Adding missing files. Disregard patch in #35.

alex_b’s picture

Status: Reviewed & tested by the community » Fixed

Committed, thank you everybody for your work on this patch.

http://drupal.org/cvs?commit=313192

arski’s picture

Status: Closed (fixed) » Fixed

Hi,

I just patched my feeds module and everything seems to be in place. However, when I try to run an import from a .csv file to create some nodes I get the following error:

Fatal error: Class 'FeedsEnclosure' not found in .../httpdocs/sites/all/modules/feeds/mappers/filefield.inc on line 56

I can find this class where it was supposed to be added by the patch but something seems to be failing.. do I need to do something special to make Drupal catch up or something?

Thanks a lot,

Martin

rjbrown99’s picture

Try using HEAD. I think Alex moved that class to a location that can be found by the other supporting modules.

See this issue: #686462: Move FeedsEnclosure class to FeedsParser.inc

arski’s picture

Hi,

Just tried with HEAD.. same result.. maybe its not loaded somehow by the file mapper? :(

Thanks,
Martin

arski’s picture

Hi again,

I guess the latest patch wasn't in HEAD yet, I just applied the patch in that other thread and it works.

Thanks a lot.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

netentropy’s picture

will this make it into a -dev verison in the future?

publicmind’s picture

Status: Fixed » Closed (fixed)

JFYI, FeedAPI ImageGrabber has been ported for Feeds module, (http://drupal.org/project/feeds_imagegrabber). have a look, and if you can, please provide feedback.

regards,
publicmind

netentropy’s picture

i will check it out but i remember from the older version that it doesn't work if someone loads an image from their pc without spaces, then imagegrabber won't read the cck html code and grab the image

publicmind’s picture

its not known to me, please describe it in the issue queue of FIG, so that proper action can be taken to resolve it.

regards,