After installing import.module, feeds are imported in the wrong encoding, not UTF-8 (by the way, my browser doesn't even recognize this encoding). This happens with all the Russian RSS sources in my news feed :(
Also, my taxonomy does not work with the news feed: none of the RSS feeds is tagged with terms from the new vocabulary for feed news.

Comments

matteo’s picture

Probably it's not the right way, but it works...
I hacked the import module at line 748 and changed UTF-8 to ISO-8859-1.
It works fine now !!!

Matteo

alex-and-r’s picture

It's happening because Russian feeds are published in win-1251 encoding, while Drupal's default encoding is UTF-8. So the only possible way to import these feeds is to convert them from win-1251 to UTF-8 when Drupal imports them.
I had the same problem with my MovableType blog, where the Perl module responsible for parsing RSS feeds works in UTF-8. It either gave me an error or mangled the characters in the feeds so that I could not read them anymore. I solved it by using another Perl module that converts text from one encoding to another (in this case, from win-1251 to UTF-8).

I think the same approach can be used here, but I'm not good at PHP programming at all, so I cannot give direct instructions on what to do... :(

P. S. And by the way, Arseniy, we're both Russian here, yet we're talking in English! ;) That's globalization in action! :)

dries’s picture

The problem is that PHP's XML parser only supports a handful of charsets and AFAIK windows-1251 is not one of them. It's a limitation of PHP: little we can do about it without making things utterly complex.
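To make the limitation concrete, here is a minimal sketch that reads the encoding declared in a feed's XML prolog and checks it against the three charsets documented for `xml_parser_create` (ISO-8859-1, UTF-8, US-ASCII). The helper names `declared_encoding` and `parser_supports` and the sample feed are hypothetical, made up for illustration; they are not part of import.module.

```php
<?php
// Hypothetical helper: extract the encoding named in the XML declaration,
// e.g. encoding="windows-1251". Falls back to UTF-8, which is what XML
// assumes when no encoding is declared.
function declared_encoding($xml) {
  if (preg_match('/^<\?xml[^>]*\bencoding=["\']([A-Za-z0-9._-]+)["\']/', $xml, $m)) {
    return strtoupper($m[1]);
  }
  return 'UTF-8';
}

// Hypothetical helper: true only for the charsets xml_parser_create documents.
function parser_supports($encoding) {
  return in_array(strtoupper($encoding), array('ISO-8859-1', 'UTF-8', 'US-ASCII'), true);
}

// Sample feed header, invented for this example.
$feed = '<?xml version="1.0" encoding="windows-1251"?><rss version="2.0"></rss>';
$encoding = declared_encoding($feed);
echo $encoding, ' natively supported: ', parser_supports($encoding) ? 'yes' : 'no', "\n";
```

A feed that fails this check would have to be converted before it reaches the parser.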

Maybe ask them to export their Russian content using UTF-8? It would make their feeds more accessible as all XML parsers can cope with UTF-8.

alex-and-r’s picture

Thanks, Dries, for your attention to this topic.

Special charsets for the Russian language are the main problem of the Russian sector of the internet, cuz there are so many of them that choosing the appropriate one is a real pain in the ass... Nowadays almost all sites are encoded in win-1251, and so are the RSS feeds. It's a kind of tradition carried over from sites to feeds.
And if we ask them to move to UTF-8 they won't do it, cuz this charset isn't popular on the Russian internet yet. Unfortunately we stick to traditions and use old charsets...

I don't think the situation will change in the next 2 years or so... Eventually we'll migrate to UTF-8, but it won't be tomorrow. :(

So the only solution that I see is to convert the RSS feed to UTF-8 and then pass it to the parser...
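The convert-then-parse idea could look roughly like the sketch below. It assumes the iconv and xml extensions are available; `to_utf8_feed` is a hypothetical helper name and the sample feed is invented. Rewriting the encoding declaration is a precaution, on the assumption that some parser builds trust the declared encoding more than the one passed to `xml_parser_create`.

```php
<?php
// Hypothetical helper: return the feed converted to UTF-8, or false if
// iconv cannot convert from the given source encoding.
function to_utf8_feed($data, $encoding) {
  if (strcasecmp($encoding, 'utf-8') != 0) {
    $converted = iconv($encoding, 'UTF-8', $data);
    if ($converted === false) {
      return false;
    }
    $data = $converted;
  }
  // Make the XML declaration agree with the converted bytes.
  return preg_replace('/^(<\?xml[^>]*\bencoding=["\'])[^"\']+/', '${1}UTF-8', $data, 1);
}

// Sample feed: the title is the Russian word for "hello" in windows-1251 bytes.
$raw = "<?xml version=\"1.0\" encoding=\"windows-1251\"?>"
     . "<rss><title>\xCF\xF0\xE8\xE2\xE5\xF2</title></rss>";

$utf8 = to_utf8_feed($raw, 'windows-1251');

$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parse_into_struct($parser, $utf8, $values);

// Print the text of every <title> element found in the converted feed.
foreach ($values as $node) {
  if ($node['tag'] == 'title' && isset($node['value'])) {
    echo $node['value'], "\n";
  }
}
```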

dries’s picture

Two more suggestions:

  1. Ask the publisher of the feed to create two feeds, one using win-1251 and one using utf-8.
  2. Dig through the PHP documentation and mailing list archives. Other Russians must have had the exact same problem.

alex-and-r’s picture

Today, with 4.3.2 released and promising "Fixed the news aggregator's encoding detection", I returned once again to the problem of aggregating Russian feeds.
But the charsets supported by the PHP function "xml_parser_create" are "ISO-8859-1", "UTF-8" and "US-ASCII", so when I try to update a Russian feed encoded in win-1251 I get errors like "windows-1251: unsupported source encoding in import.module on line 332". So I tried to put something like this in import.module on line 331:

if ($encoding <> "utf-8") {
 iconv ($encoding, "utf-8", $data);
 $encoding = "utf-8";
}

But it didn't help, cuz instead of errors about an incorrect charset for xml_parser_create I began to get errors about XML that is not well-formed. So I guess I'm stuck once again and have no solution for my problem...

P. S. And sorry if I messed something up in the code above, cuz I'm not good at PHP programming at all!

glass-1’s picture

This code is working:


if ($encoding != "utf-8") {
  // iconv() returns the converted string; it does not change $data in place.
  $data = iconv($encoding, "utf-8", $data);
  $encoding = "utf-8";
}
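The difference from the earlier attempt is the assignment: `iconv` returns the converted string and leaves its argument untouched. A small sketch of that behavior, using an invented sample string and assuming the iconv extension is available:

```php
<?php
// Sample input: the Russian word for "hello" as windows-1251 bytes.
$data = "\xCF\xF0\xE8\xE2\xE5\xF2";
$before = $data;

// Return value discarded: $data still holds the windows-1251 bytes.
iconv('windows-1251', 'UTF-8', $data);
$unchanged = ($data === $before);

// Return value assigned: $data now holds UTF-8 bytes.
$data = iconv('windows-1251', 'UTF-8', $data);

echo $unchanged ? "first call left data unchanged\n" : "first call changed data\n";
echo bin2hex($data), "\n";
```

With the result discarded, the parser is told the data is UTF-8 while the bytes are still win-1251, which fits the "not well formed XML" errors reported above.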

alex-and-r’s picture

I think I solved the problem! In import.module at line 331 I added two lines of code that convert the $data variable from its $encoding to UTF-8 by means of the iconv function and then set the $encoding variable to utf-8.
And now I can read news feeds in win-1251 in my Drupal engine (by the way, I have version 4.3.2)!!! I'm so glad, but I'm still afraid that I missed something and my solution has some flaw...

P. S. Unfortunately I cannot post a code example here, cuz Drupal says that my code is suspicious... :(