Closed (fixed)
Project:
Feeds
Version:
6.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
5 Nov 2009 at 10:09 UTC
Updated:
3 Mar 2010 at 14:59 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
Gerard McGarry commentedCan I just add my voice to this request? I've been experimenting with FeedAPI to create a news aggregation site using Drupal. In order to create an even remotely interesting layout, importing images from the feed and refactoring them for Imagefield would be a critical part of the functionality.
How difficult would it be to port the ImageGrabber module across to work with Feeds? I'd be happy to help testing this against one of my test servers if that's of any use.
Comment #2
pbuyle commentedHere is a patch with a filefield mapper and its test case.
The mappers assumes a source that provides values as array of array of array of enclosures as done by FeedsSimplePieParser. It is more flexible when it comes to the what en enclosure is as it will accept objects, arrays and strings. For an object, it assume the link properties is the URL of the enclosure. For an array, it does the same with the link key. It assume a string to be the direct URL. Relative URLs are assumed to be local files and are not fetched using HTTP (or other where ini_get('allow_url_fopen') is true).
The test case needs more work since it doesn't cover all those possibilities.
Comment #3
pbuyle commentedHere is a more complete and fixed patch.
- the test case check for filename (without extension) in created node edit pages.
- dpm() and tabbed indents have been removed
- Support both SimplePie_Enclosure and FeedResultEnclosure (see #641522: Consolidate import stage results and provide generalized enclosure class).
Comment #4
mylos commentedI downloaded this patch and I tested it on my site. I have CSV file with few columns and one for images (image_url) so I have URL of image. So tell me please is possible to get those picture from that URL, download it and upload to node's CCK image_field ?
regards
Milos
PS. I got this error:
warning: Invalid argument supplied for foreach() in /home/pinefactory/pine-chest-of-drawers.com/sites/all/modules/feeds/mappers/filefield.inc on line 36.
Comment #5
pbuyle commentedIt is not currently possible. The mapper expect the source's value to be an array of arrays of arrays. I don't think the CSV Parser can produce such an array. AFAIK, the only compatible parser is the SimplePie one.
It shouldn't be hard to fix it for your case. I would like to work on it, but I don't have enough time at the moment. An existing test case (using SimpleTest) would help.
Comment #6
jerdavisI needed to get this working with CSV data, so I took the example and made a couple of quick tweaks to allow it to work with a non-array as well. This probably needs a bit of cleanup, but works. It'll also likely need the tests updated.
Comment #7
pbuyle commentedHere is a version that should accept yet more structures for $value in filefield_feeds_set_target. It first try to resolve it as an enclosure. If it fails but if $value is an array, it the same process is applied on all its items. A enclosure can be an object, an array or a string.
Comment #8
branislav.mikic commentednice work mongolito404,
thank you a thousand times!
Comment #9
mylos commentedThanks man,
You are my hero. That import it's working like charm with image cache and CCK.
Look at example: http://www.pine-chest-of-drawers.com
regards and many thanks
Milos
Comment #10
Jordi Bufí commentedHi everybody,
I've downloaded the patch and applied it to the necessary files, but there isn't a "filefield" option on the select list for sources (i'm on the "mapping for node processor" page.
I have also created an image filefield on the "feed" node type but it isn't shown on the target select list.
How is it supposed to work?
Comment #11
Anonymous (not verified) commentedWorking for me too.
Comment #12
rjbrown99 commentedThis is working for me. I'm doing CSV imports of URLs to the field, and it is successfully going out and grabbing the remote image and saving them into the field. I'm able to manipulate via Imagecache. Works well. Thanks.
Comment #13
jarea commentedForgive me if I am being a little thick, but assuming 624088_7_mapper_filefield.patch is the one to run with, to what do I apply this patch? And should it reside in the feeds/mappers directory?
Thanks for the clarification.
Comment #14
rjbrown99 commentedDrop it into the feeds root directory, then patch -p0 < mypatchfile
Comment #15
Jordi Bufí commentedHi again,
I partly fixed my problem: I had created the image filefield on the wrong content type ("feed", not "feed item").
Now I can select the target "Image" but I don't know how to configure it for getting an image from the html of an RSS element (i.e. the first one). It's possible?
Comment #16
pbuyle commentedThe mapper handles assigning files from enclosures found in the item to a FileField field. File linked from HTML in the item 'description' are not supported.
PS: IMHO, the best way to handle this is the create a subclass of the SimplePie or Common syndication parser that will extract the URL first linked file from the each item 'description' and add it to its 'enclosures'. It would be even better to fix the imported RSS to use enclosures instead of html link in description.
Comment #17
Jordi Bufí commentedMany thanks mongolito404!
Comment #18
rjbrown99 commentedFWIW, I'm doing my imports via CSV and it works well. My imported CSV field is just a URL with the absolute path to an image.
http://my.server.com/path/to/IMAGE.JPG
Comment #19
alex_b commented- Replaced tabs indents with spaces
- Some code style fixes
Looking at this patch better, we should be able to get away with none of the enclosure object encapsulation and merely accept URLs and arrays of URLs as a mapping source. We could patch FeedsSimplePieParser to expose enclosures as arrays of URLs. This would make the patch much simpler by maintaining the same level of functionality. More powerful enclosure handling can be always added in when needed.
Comment #20
rjbrown99 commentedInteresting potential issue.
I'm doing an import of CSV fields, and one field is an imagefield with a URL like:
http://somesite.com/images/IMAGE_CORRECT_NAME.jpg
Well, one of the imported image fields has the wrong URL. Instead of IMAGE_CORRECT_NAME.jpg they are just feeding me IMAGE_NAME.jpg, which results in a 404 when you visit that URL.
The result is that the feed import never completes. I have to manually stop and kill the import process. If I fix the name of the JPG to the correct name, the feed completes normally.
This is using the patch from #7.
Comment #21
pbuyle commentedSome formats provide more than an URL for enclosures. RSS enclosures have a required length and mime-type. Other formats may provides a description of the enclousre as a text. Reducing enclosure to URL means that this information is lost on parsing and cannot be used by mapper. My current code for the mapper in #624088 doesn't include handling for this information but I see two immediate uses:
* The mime-type of an enclosure mapped to an ImageField (which is a FileField) can be used to drop any non-image enclosure
* The description of an enclosure can assigned to the $info['data']['description']
I was waiting for #641522 to be resolved (and some freetime) to work on this. The idea is to rewrite _enclosure_resolve to always return an instance of the enclosure interface defined there. Either because the passed in $enclosure is such an object or because it is a string that can be used to create an new, temporary, enclosure object for an url of a local file.
Comment #22
rjbrown99 commentedOK, I have more information on the 404 issue from my previous post. In a nutshell, Feeds stops importing when I encounter an imagefield URL that results in a 404.
1) I added a few statements to filefield.inc to see which of the 'if' paths I was following down the Resolve URL trail in the _enclosure_resolve function. In my case, I fall into the "allow_url_fopen" camp. The current code in this module looks like this:
Based on reading the PHP website for the copy function, perhaps it should be enclosed in an if statement to handle the condition where it fails to copy? This is their demonstration code:
2) I noticed that if I 'cancel' the import, and then re-import, it picks up at the item AFTER the 404 item that stops it. So at least for me, if I encounter a stuck import I can cancel, re-start, and it will keep going.
Comment #23
pbuyle commentedis (roughly) equivalent to
So copy failures should be handled. You should have an error message in the watchdog. But the import shouldn't freeze. So obviously something is wrong.
Comment #24
rjbrown99 commentedHm good point. I realized that after I read about what ? does :)
Here's my log entry:
Line 100 is return copy($url, $filepath) ? $filepath : 0;
That looks about right as a watchdog error. I'm not sure why the import stops. I left it for hours and it was completely frozen. I'm going to work on a few more imports over the next few days and will report back.
Comment #25
q0rban commentedSubscribe
Comment #26
alex_b commented#24: I can't confirm that a 404 would result in a stuck import process. I'm also "falling in the allow_furl_open" camp.
I did see though that some of the files are HUGE. I've will change them to a smaller size.
This is the error I get when hitting a 404 resource:
Comment #27
rjbrown99 commentedI can reproduce this if I delete the nodes that correspond to the imported items that have the broken link and then re-process the import file. Can I add any additional debug details or statements that could help to shed more light on it? I can also grab packet captures off of the wire if it would be useful.
My CSV import files aren't huge - on the order of ~1000-2000 rows/items.
Comment #28
alex_b commentedI'm trying to refocus the patch after #641522: Consolidate import stage results and provide generalized enclosure class got larger in scope (my fault - sorry mongolito404). However, I think we can get strong enclosure support (mongolito404 explained in #21 why we need it) and filefield mapping with one patch - see attachment.
This patch builds on #7/#19, all tests are passing. The most important modifications are:
class FeedsElementthat encapsulates any single mapping source element. A source element can now be either a simple data type (string, number) or aFeedsElementor an array of either of the former two.class FeedsEnclosure extends FeedsElementthat supports additional methods like getContent() that are particular to enclosures.FeedsSimplePieEnclosureis a simple adapter derived fromFeedsEnclosurethat encapsulatesSimplePie_Enclosureobject.These changes simplify the code in filefield.inc substantially while creating a straightforward foundation that future special source elements can extend.
Open tasks/questions:
$enclosure->linkwhich I didn't take over in this patch - which parser produces this format? Am I missing something on this thread?FeedsElementwe should update existing mappers to check for it.field_file_save_file()which forces us to download a resource to a file and then copy it again because it asks for a local file?file_save_data()fallback which kicks in ifallow_furl_openis FALSE inFeedsEnclosure::getFile()is broken. I think it was also broken in the previous patches. It essentially downloads empty files.Comment #29
alex_b commentedNot working on this atm.
Comment #30
alex_b commentedMore comprehensive title as #28 formalizes feed source elements better.
Comment #31
JayCally commentedHow can you get this to work with imagecache? It works great but doesn't create the imagecache files. So I get a broken image link for the thumbnail and small in story image. Anyone know how to implement imagecache?
Comment #32
BenK commentedSubscribing....
Comment #33
rjbrown99 commentedUpdate: I don't think my 'hung' feed import is due to a 404'd item. I think it's due to a different issue I just created: #681160: Scalability - PHP Memory exhausted stops the Feed import.
Comment #34
David Goode commentedFixed problem number 5 in alex's #28 post and rerolled. The issue was just a minor misuse of the drupal api calls.
David
Comment #35
alex_b commented#34: David - thanks for the review and the fix. I have done some minor cleanup of the patch and added the __toString() method to FeedsElement which should allow us to treat FeedsElement objects as strings.
Will commit now in
#28 1. deferred - not critical
#28 2. This is fixed by using __toString() method in FeedsElement.
#28 3. deferred #682508: FeedsSyndicationParser: support enclosures
#28 4. deferred - not critical
#28 5. fixed in #34
#28 6. deferred - not critical
Comment #36
alex_b commentedAdding missing files. Disregard patch in #35.
Comment #37
alex_b commentedCommitted, thank you everybody for your work on this patch.
http://drupal.org/cvs?commit=313192
Comment #38
arski commentedHi,
I just patched my feeds module and everything seems to be in place. However, when I try to run an import from a .csv file to create some nodes I get the following error:
Fatal error: Class 'FeedsEnclosure' not found in .../httpdocs/sites/all/modules/feeds/mappers/filefield.inc on line 56
I can find this class where it was supposed to be added by the patch but something seems to be failing.. do I need to do something special to make Drupal catch up or something?
Thanks a lot,
Martin
Comment #39
rjbrown99 commentedTry using HEAD. I think Alex moved that class to a location that can be found by the other supporting modules.
See this issue: #686462: Move FeedsEnclosure class to FeedsParser.inc
Comment #40
arski commentedHi,
Just tried with HEAD.. same result.. maybe its not loaded somehow by the file mapper? :(
Thanks,
Martin
Comment #41
arski commentedHi again,
I guess the latest patch wasn't in HEAD yet, I just applied the patch in that other thread and it works.
Thanks a lot.
Comment #43
netentropy commentedwill this make it into a -dev verison in the future?
Comment #44
publicmind commentedJFYI, FeedAPI ImageGrabber has been ported for Feeds module, (http://drupal.org/project/feeds_imagegrabber). have a look, and if you can, please provide feedback.
regards,
publicmind
Comment #45
netentropy commentedi will check it out but i remember from the older version that it doesn't work if someone loads an image from their pc without spaces, then imagegrabber won't read the cck html code and grab the image
Comment #46
publicmind commentedits not known to me, please describe it in the issue queue of FIG, so that proper action can be taken to resolve it.
regards,