I'm getting garbled characters such as ’ in my feed. I can see this is a character encoding issue.
However, the feed I am pulling from is "iso-8859-1". Is this not an acceptable character set? If not, is there a way to convert it using Feeds?
For example: http://www.medworm.com/rss/userss.php?qu=PCOS&journals=on
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- generator="FeedCreator 1.7.2" -->
<rss version="2.0">
The output looks like this (in a view): http://www.pcosvancouver.com/research
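The "’" sequence is most likely a UTF-8 right single quotation mark (U+2019) whose bytes are being read as Windows-1252, which is what browsers actually use for content labeled ISO-8859-1. Note that ’ cannot be represented in strict ISO-8859-1 at all, so a feed containing it cannot be pure ISO-8859-1. A minimal sketch in Python, used only as a neutral illustration (Feeds itself is PHP):

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, common in article titles.
quote = "\u2019"

# In UTF-8 this one character takes three bytes: E2 80 99.
raw = quote.encode("utf-8")
print(raw)  # b'\xe2\x80\x99'

# Reading those bytes as Windows-1252 turns each byte into its own character.
garbled = raw.decode("cp1252")
print(garbled)  # ’

# Strict ISO-8859-1 maps 0x80 and 0x99 to non-printable C1 control characters,
# so only the "â" would show up.
strict_latin1 = raw.decode("latin-1")
print(ascii(strict_latin1))  # '\xe2\x80\x99'
```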
My basic feed importer is set up as follows:
Fetcher: HTTP Fetcher (download content from a URL; auto-detect feeds)
Parser: Common syndication parser (parse XML feeds in RSS 1, RSS 2 and Atom format)
Processor: Node processor
- Text format: Full HTML
- Replace existing nodes
- Nodes never expire
Thank you for any assistance. I am not sure whether this is a bug or just a noob error.
Comments
Comment #1
peem83: I have the same issue when importing nodes. My CSV contains 'ë' characters. When I change the CSV file's encoding, the import inserts empty titles for the nodes.
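One plausible reason for the empty titles (an assumption; the thread does not confirm it): Drupal works with UTF-8 strings, and in a file saved as ISO-8859-1 or Windows-1252 the "ë" is a single byte that is not valid UTF-8, so text containing it may be rejected or emptied when it is sanitized. This Python sketch only shows the byte-level difference; the sample title and the helper name is_valid_utf8 are hypothetical:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "ë" (U+00EB) in a file saved as ISO-8859-1 / Windows-1252: one byte, 0xEB.
latin1_title = "Zo\u00eb".encode("latin-1")
# The same title saved as UTF-8: "ë" becomes two bytes, C3 AB.
utf8_title = "Zo\u00eb".encode("utf-8")

print(latin1_title, is_valid_utf8(latin1_title))  # b'Zo\xeb' False
print(utf8_title, is_valid_utf8(utf8_title))      # b'Zo\xc3\xab' True
```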
Comment #2
HunterElliott: I believe that to import characters outside the ASCII range properly, you must save or export your feed as a UTF-8 file.
As an example, export something from Excel that contains these characters as a regular CSV, then reopen the file in Excel. You'll see they are now all garbage characters. Then export the original file as a Unicode Text file; the characters should display properly.
(You can also just open these exported files in Notepad or another plain-text editor.)
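Re-saving as UTF-8 can also be done outside Excel. A minimal Python sketch, assuming you know which encoding the file was really saved in; the function names are hypothetical, and the default cp1252 is an assumption (a common encoding for plain CSV exports on Western-language Windows systems). Excel's "Unicode Text" option generally writes UTF-16 tab-separated text rather than UTF-8 CSV, so for such a file source_encoding="utf-16" would be the matching value.

```python
from pathlib import Path


def to_utf8_bytes(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from their real encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    # Working on raw bytes keeps line endings (e.g. Excel's CRLF) unchanged.
    dst.write_bytes(to_utf8_bytes(src.read_bytes(), source_encoding))


# Hypothetical legacy export: "ë" stored as the single cp1252 byte 0xEB.
legacy = "title\r\nZo\u00eb\r\n".encode("cp1252")
print(to_utf8_bytes(legacy))  # b'title\r\nZo\xc3\xab\r\n'
```

Decoding with the wrong source encoding will either raise UnicodeDecodeError or silently produce the same kind of garbage described above, so the source encoding has to be known rather than guessed.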
Comment #3
colle901: For CSV files, UTF-8 encoding is simple to do and works for me. However, is there a solution for XML feeds from external sites where I have no control over the character encoding they supply?
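For feeds you don't control, what matters is whether the label matches the bytes. ISO-8859-1 is a valid XML encoding, and a conforming XML parser reads it from the XML declaration, so a feed that really is ISO-8859-1 and says so should come through intact; garbling appears when the declared encoding does not match the actual bytes. This sketch uses Python's ElementTree as a stand-in for the PHP XML parsing behind Feeds (an assumption that they behave alike here); the feed content and the helper feed_title are made up:

```python
import xml.etree.ElementTree as ET


def feed_title(feed: bytes) -> str:
    # Passing bytes (not str) lets the parser honor the XML declaration's encoding.
    return ET.fromstring(feed).find("channel/title").text


# Correctly labeled ISO-8859-1 feed: "é" is the single byte 0xE9.
latin1_feed = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>'
    b'<rss version="2.0"><channel><title>Caf\xe9</title></channel></rss>'
)
print(feed_title(latin1_feed))  # Café

# Same declaration, but the body is really UTF-8 ("é" is C3 A9).
mislabeled_feed = latin1_feed.replace(b"\xe9", "\u00e9".encode("utf-8"))
print(feed_title(mislabeled_feed))  # CafÃ©
```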
Comment #4
xaqrox: Duplicate of #1428272: Added support of encoding conversions to the CSV Parser
Comment #4.0
xaqrox: Thank-you message added
Comment #5
erwangel: I'm reopening this issue because the "duplicate" it was closed against is only about CSV import, whereas this issue applies to all importers.
Here is my case:
Symptom: the same as the one that started this issue (accented characters such as "é" and "ù" come out garbled after the Feeds import).
Collateral problem: Feeds Tamper could not match strings with preg_match (filter words).
Cause/origin: the incoming feed declared its encoding as iso-8859-1 (<?xml version="1.0" encoding="iso-8859-1"?>), while the server's header said utf-8 (Content-Type: application/rss+xml; charset=utf-8).
Solution: correct the feed's declared encoding when it differs from the real encoding.
So in common_syndication_parser.inc, I added the following:
Discussion: this worked for me, but it is probably not the best place to do it, as it only corrects the "Common syndication parser". A better place may be http_request.inc, in the http_request_get function after the headers are read (line 200). Also, mb_detect_encoding will not always give the right result.
Here is a similar problem, with a patch that was committed to common.inc in an old version of Drupal: Error when importing non utf-8 feeds
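The correction described in this comment (relabel the feed when its declaration disagrees with the HTTP header) can be sketched outside Drupal. This Python version is not erwangel's code and not a Feeds API; the function names and the heuristic (trust a charset=utf-8 header only if the body also decodes as valid UTF-8) are assumptions. It takes the raw body plus the header charset, which matches doing the fix right after the headers are read. Like mb_detect_encoding, it can be fooled: a genuinely ISO-8859-1 body that happens to be valid UTF-8 would be relabeled.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# The encoding pseudo-attribute of an XML declaration at the start of the document.
_DECL_ENCODING = re.compile(
    rb'^(<\?xml[^>]*?\sencoding\s*=\s*["\'])([A-Za-z0-9._-]+)(["\'])'
)


def declared_encoding(feed: bytes) -> Optional[str]:
    """Return the lower-cased encoding named in the XML declaration, if any."""
    match = _DECL_ENCODING.match(feed)
    return match.group(2).decode("ascii").lower() if match else None


def fix_declared_encoding(feed: bytes, header_charset: Optional[str]) -> bytes:
    """Relabel the feed as UTF-8 when the header says UTF-8 and the bytes agree."""
    declared = declared_encoding(feed)
    if declared in (None, "utf-8") or (header_charset or "").lower() != "utf-8":
        return feed  # Nothing to reconcile.
    try:
        feed.decode("utf-8")
    except UnicodeDecodeError:
        return feed  # The body is not UTF-8, so the declaration may be right.
    return _DECL_ENCODING.sub(rb"\1utf-8\3", feed, count=1)


# Hypothetical feed: declared iso-8859-1, served as charset=utf-8, body is UTF-8.
feed = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>'
    b'<rss version="2.0"><channel><title>Caf\xc3\xa9</title></channel></rss>'
)
fixed = fix_declared_encoding(feed, "utf-8")
print(declared_encoding(fixed))  # utf-8
print(ET.fromstring(fixed).find("channel/title").text)  # Café
```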
Comment #6
babusaheb.vikas: I have used Feeds 7.x-2.0-alpha8 and it works for me.
So try it with Feeds 7.x-2.0-alpha8; I hope that solves your problem.
Comment #7
MegaChriz: Retitling.
@erwangel
Regarding the comment in #6, have you tested whether this is still an issue in the latest dev version of Feeds?
Comment #8
MegaChriz
Comment #9
MegaChriz: Marked #1916100: Font error as a duplicate.
Comment #10
MegaChriz: Marked #2832484: HTTP header charset is ignored as a duplicate.
Comment #11
erwangel: @MegaChriz #7: sorry for the late answer. Yes, it works fine for me now! (7.x-2.0-beta2)