I'm trying to import some content through a FeedCSVParser, but I get the following error when a field contains accents like 'É'.

The error is:
SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xC9...' for column 'title' at row 1

Anyone ?

Thanks !

Files: 
Comment | File | Size | Author
#25 | feeds-7.x-2.x-fix-parsercvs-1140194.patch | 535 bytes | orb

Comments

emorency’s picture

Title: QLSTATE[HY000]: General error: 1366 Incorrect string value for a field with accents » HQLSTATE[HY000]: General error: 1366 Incorrect string value for a field with accents
emorency’s picture

Title: HQLSTATE[HY000]: General error: 1366 Incorrect string value for a field with accents » SQLSTATE[HY000]: General error: 1366 Incorrect string value for a field with accents
worldlinemine’s picture

We have also run into this issue. Unfortunately, it makes importing content in foreign languages or with foreign names quite difficult. This issue appears to occur with a wide range of special characters (i.e. diacritics) and is a strong indication that whatever attempts the write to the db is not sending UTF-8 characters.

Please note that saving to any of the fields via the standard edit-save on a node does function.

A few examples from our own data import would be:
’ “ ” ì ‘ ' … è ä ó ú í é

A sample error looks like:
SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xA0',\x0A ...' for column 'message' at row 1

Because these are proper terms or quoted materials, altering them would alter the meaning of the material:
"Iphigénie en Tauride" is not the same as "Iphigenie en Tauride"
"La Pasión Según Antígona Pérez" is not the same as "La Pasion Segun Antigona Perez"

If any more information would be of assistance, I'd be very happy to provide it.

After some additional research I ran across two possible approaches to a solution:
- the first option is to use PHP's utf8_encode() function.
- the second option uses some code with PDO, using SET NAMES (though there seemed to be some discussion as to whether or not this worked for everyone):

    $handle = new PDO("mysql:host=localhost;dbname=dbname",
    'username', 'password', 
    array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
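A minimal sketch of the first option (the helper name is mine, and it assumes, as utf8_encode() itself does, that any non-UTF-8 input is ISO-8859-1/Latin-1):

```php
<?php
// Hypothetical helper, not part of Feeds: make sure a parsed CSV value
// is valid UTF-8 before it reaches the database layer.
// utf8_encode() treats its input as ISO-8859-1 (Latin-1).
function csv_value_to_utf8($value) {
  if (mb_check_encoding($value, 'UTF-8')) {
    return $value; // already valid UTF-8, leave untouched
  }
  return utf8_encode($value);
}

// 'É' in Latin-1 is the single byte 0xC9 -- the very byte from the error above.
echo csv_value_to_utf8("\xC9");    // É (now the two-byte UTF-8 sequence C3 89)
echo csv_value_to_utf8("déjà vu"); // already UTF-8, unchanged
```

This leaves well-formed input alone and only re-encodes values that fail the UTF-8 check, so it is safe to apply to every field.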

The following link had some other useful information:

http://akrabat.com/php/utf8-php-and-mysql/

I've tried to track down where the connection to the db occurs in Feeds, but it appears to call some hooks in core. If someone could point me to where in the code the connection to the db occurs so that I can test inserting the PDO code, I'd be happy to try that myself. It's possible that the issue is in some hook/API in core that Feeds is calling, but I'm not savvy enough to identify the location. Thanks for the assistance.

worldlinemine’s picture

Version: 7.x-2.0-alpha3 » 7.x-2.x-dev
Priority: Normal » Major

I'm updating this to a major priority because of the impact on anyone importing data depending on utf8 working as well as switching this to the dev version since it is impacting both 7.x-2.0-alpha3 and 7.x-2.0-dev.

Thanks.

bc24102’s picture

Unfortunately, I am getting this error as well.

Yurii’s picture

Same error for me

adrien.felipe’s picture

I also have this issue.
Could the module just sanitize all in utf-8?

marcoscano’s picture

subscribe

bb_dogg’s picture

You need to ensure the file you are importing is actually in UTF-8 format, and not ANSI for instance, as it will cause the error described.

A number of text editing tools like UltraEdit, Notepad++ etc. can do the conversion.

In Notepad++, for instance, open your CSV file, select "Encoding" > "Convert to UTF-8" and then save your file.

You should now be able to import it successfully.

worldlinemine’s picture

We are definitely saving it in UTF-8 format, so that is not the issue. Microsoft products, for instance, allow you to save CSV or tab-delimited files as UTF-8.

However, feeds (in our install) is definitely not able to properly handle the diacritics present in our files even when the file is saved as utf-8 format. We are not saving the files as ANSI.

If you review my comments in [http://drupal.org/node/1140194#comment-4477876] I describe things in greater detail. Unfortunately, I saw another potential solution recently, but I have misplaced the link. If it would help, I'd be happy to attach test data which fails to import.

martinduys’s picture

I am having exactly the same issue as @worldlinemine. My csv file is definitely formatted utf-8; but I am still receiving the same error.

martinduys’s picture

I'm really not qualified to make suggestions, so apologies in advance for this, but I came across this question on Stack Overflow: http://stackoverflow.com/questions/2615611/utf-8-database-problem , which seems to indicate that there may be an issue with the connection to the database. Could that be it?

I really can't move forward until I have sorted out this issue, but am as stuck as stuck can be, so would appreciate any help that I can get.

ChristopheDG’s picture

I downloaded a feed as UTF-8 and got the same error when importing it.
But then I noticed a '®' in one of the lines. When I removed it, I didn't get any error...

abeger’s picture

@worldlinemine: The utf8_encode solution worked for me. Thanks!

sylvanos’s picture

Hi,

I'm getting the same error with french RSS feeds. Could you explain how the utf8_encode solution worked for you?

Here are a few feeds that will cause the error randomly:

http://toutcru.blogspot.com/feeds/posts/default
http://jeanneemard.wordpress.com/feed/
http://feeds.feedburner.com/ladetentegenerale
http://feeds.adviso.ca/adviso-blogue-interne

using:
7.x-2.0-alpha4
Drupal 7.7

sylvanos’s picture

In case anyone is having this issue: I solved my problem by altering the character set of my MySQL tables. Some of them were set to latin1_swedish..., probably because we used the Backup and Migrate module and a bug caused the tables to lose their format (#1100146: Problems with the input of diacritical characters (PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value)). Anyway, I've run a small script to alter all the tables and everything is fine now. Here's the script:

// your connection
mysql_connect("localhost", "drupal7", "***");
mysql_select_db("drupal7");

// convert every table in the database to UTF-8
// (mysql_fetch_row avoids processing each table twice, which
// mysql_fetch_array would do via its duplicate numeric/assoc keys)
$res = mysql_query("SHOW TABLES");
while ($row = mysql_fetch_row($res)) {
    $table = $row[0];
    mysql_query("ALTER TABLE `" . $table . "` CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci");
    echo $table . " CONVERTED\n";
}

ursula’s picture

I had this problem as well with the feed aggregator, and also with just creating a page using cut-and-paste from a site with special characters. I fixed it by first changing the default character set of my mysql installation to utf8, and then fixing all drupal tables. Here is the detailed recipe for the fix using mysql, drush, backup-migrate, and a text editor. I am using MySQL 5.0.
A great post explaining the issue (but not drupal focused) can be found on http://www.bluebox.net/news/2009/07/mysql_encoding

On the mysql prompt:
show variables like 'char%';
Probably returns something similar to the printout below, or a mixture of different encodings (latin1, utf8, and even something swedish).

mysql> show variables like 'char%'; 
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     | 
| character_set_connection | latin1                     | 
| character_set_database   | latin1                     | 
| character_set_filesystem | binary                     | 
| character_set_results    | latin1                     | 
| character_set_server     | latin1                     | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+

More about tables in latin1:

(on the mysql prompt using your database):
show table status where collation like '%latin1%';
result: | tablename | MyISAM |      ... cut some columns    | latin1_swedish_ci |     NULL | 

(Yes, there is swedish in MySQL by default!)

First: Fix your mysql installation, so that it always uses utf8:
Edit your my.cnf file (usually in /etc). Add the following lines:

[client]
default-character-set = utf8
[mysqld]
default-character-set = utf8

(If you already have [client] and [mysqld] sections, add the character-set lines to these sections)

Now, restart mysql to set these new variables (in CentOS: service mysqld stop;service mysqld start);
The variables should now look like this:

mysql> show variables like 'char%'; 
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       | 
| character_set_connection | utf8                       | 
| character_set_database   | utf8                       | 
| character_set_filesystem | binary                     | 
| character_set_results    | utf8                       | 
| character_set_server     | utf8                       | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+

Next, you'll have to change the character set settings for all of your existing database tables.
Some special characters in existing tables might get screwed up in the process, so it's important to make a backup first.
In my case, the offending tables were all feeds, and I didn't care about these. I did try to create a page once with special characters, before I converted the tables to the correct character encoding, and the insert got rejected with the error message above.

Create a backup of your database:

drush cc all  
drush bb

Optionally, add the name of your site as defined in your .drush directory, like drush @example.com bb.
This should have created a backup in the designated manual backup destination.

Use a text editor to change all instances of latin1 to utf8:

cp yourbackupfile newbackupfile # makes a copy
vi newbackupfile   # starts vi editor with newbackupfile
:1,$s/DEFAULT CHARSET=latin1 /DEFAULT CHARSET=utf8 /g  # substitution of latin1 with utf8
:wq  #saves the file

This substitutes all occurrences of "latin1 " with "utf8 "; You should now search for latin1 in the document, in case you have latin1_swedish_ci in your tables, exchange those, too. I purposely added a space after latin1 to avoid substituting latin1 derivatives. I don't know whether the character set gets dumped using the swedish specification, didn't happen to me.

To check on remaining latin1 occurrences, use grep:
grep -i latin1 newbackupfile
If you get a result, you'd need to fix those lines, too.
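As a non-interactive alternative to the vi session above, the same substitution can be sketched with sed. The dump contents below are a made-up one-line sample; `yourbackupfile` and `newbackupfile` are the placeholder names from the steps above:

```shell
# Sample line from a dump, for illustration only:
printf 'CREATE TABLE t (a INT) DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;\n' > yourbackupfile

# Copy the dump, then rewrite the charset declarations in the copy.
cp yourbackupfile newbackupfile
sed -i 's/DEFAULT CHARSET=latin1 /DEFAULT CHARSET=utf8 /g' newbackupfile
sed -i 's/latin1_swedish_ci/utf8_unicode_ci/g' newbackupfile

# Verify: prints "clean" if no latin1 reference remains.
grep -qi latin1 newbackupfile || echo clean
```

As in the vi version, the trailing space after `latin1` in the first pattern avoids touching latin1 derivatives, which the second pattern then handles explicitly.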

Now, load your new file:
mysql -u user -p"yourpassword" yourdatabase <newbackupfile
(or use drush for this. I usually use the commandline mysql, so I haven't tested the drush version)

Check whether the site is working properly. This should do it, and also fix it for future drupal sites in the same mysql installation.

If the site is broken, restore the un-altered backup file, and repeat the procedure while making sure that you didn't accidentally break an insert statement.

rfay’s picture

Thanks for the excellent writeup!

johnbarclay’s picture

For debugging, keep in mind that you can just put some of the content you are having trouble with in the feeds/test/feeds files and run the simpletests. Debugging encoding issues is a pain. This will determine whether Feeds truly has an issue with a character, or whether your file, HTTP, or SQL data source isn't encoded well. Just make sure the file is saved as UTF-8.

harryc’s picture

Wordpad++ conversion to UTF-8 worked for me. Thanks BB Dog

What is weird is that I have been importing all day using the same spreadsheet layout and converting the files to CSV for import and have had no problems.

Then the General Error 1366. Out of nowhere!

TechNikh’s picture

I had the same issue.
I opened the CSV file in Notepad and saved it as UTF-8, and that solved the problem.

zeta1600’s picture

bb_dog... thanks!

AaronELBorg’s picture

nice work, ursula.

wicked helpful.

EDIT: Actually, bb_dog's notepad trick (#10) worked for me. You's all a bunch of ballers!! Grazi.

orb’s picture

SQLSTATE[HY000]: General error: 1366 Incorrect string value ...

Fix for feeds/libraries/ParserCSV.inc

-      while ($currentIndex <= strlen($line)) {
+      while ($currentIndex <= drupal_strlen($line)) {

OnkelTem’s picture

Hi!

Q to the module developers:

Why don't you just add support for ANSI encodings? How do you imagine the workflow for an ordinary shop regularly exporting data to its website? An old lady clicks EXPORT in a back-office app, gets a CSV in a native ANSI encoding like windows-1251/1252, and then does all this encoding-conversion magic before uploading the file to the website? Odd...

IMO, an encoding setting definitely must be in the importer's configuration.

UPD. Added encoding conversion support: #1428272: Added support of encoding conversions to the CSV Parser
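The core of such a conversion can be sketched in a few lines of plain PHP. The file names here are hypothetical, and the source encoding is assumed to be Windows-1251 as in the scenario above:

```php
<?php
// Simulate a CP-1251 export: the bytes CF F0 E8 E2 E5 F2 spell "Привет".
file_put_contents('export-cp1251.csv', "\xCF\xF0\xE8\xE2\xE5\xF2\n");

// Hypothetical pre-processing step: convert the whole file to UTF-8
// before handing it to the importer.
$raw  = file_get_contents('export-cp1251.csv');
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1251');
file_put_contents('export-utf8.csv', $utf8);

echo $utf8; // Привет
```

Converting the whole file once, rather than field by field, matches the approach of the linked patch: the entire source is in one known encoding, so everything is converted together.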

emackn’s picture

@OnkelTem: While the data contract isn't really set in stone anywhere, issues like this always seem to crop up. Feeds can't really account for all the different ways data files are created. Why not create an RSS feed on your end and eliminate the need for a manual process?

OnkelTem’s picture

@emackn

Why not create a RSS feed on your end and eliminate the need for a manual process.

Eric, you have too high an opinion of office back ends :)
Let's take a look at the market for office/book-keeping automation systems in Russia. I couldn't find data from statistical agencies, but I found voting results on a popular forum, klerk.ru:

http://forum.klerk.ru/showthread.php?t=217649&page=7

I believe these results represent the real situation in Russia, more or less.

So what do we see from the picture:
- The "1C" company takes 86% of the market (!).
- Инфо-бухгалтер (Info-buhgalter) takes 4% of the market. I mention it only because my current project, where I use Feeds, gets its exported data from exactly this system.

Ok, so what are 1C 7.7 (which holds 55% of the market) and 1C 8.x?

There is no point in mentioning that they work on the Windows platform only. Several years ago I read an article arguing that Microsoft should actually pay rewards to 1C, since the only reason companies in Russia have to buy Windows is to run 1C.
Let me tell you now what 1C is.
The most popular version is 1C 7.7, as you can see. It is old-school software that still natively uses DBF for storing data! Yes, it can also use MSSQL, but that support looks like a hack.
1C is built from, and runs on, scripts in a language they invented themselves. It has a limited set of functions, a monstrous Basic-like syntax, a convoluted execution model, and nobody can really extend it like we do with Drupal, for example. Closed-source proprietary software, in other words. Should I mention that there is NO UNICODE and never has been? No encoding conversions, nothing like that. So you can export data either in CP-1251 or in CP-866.
Making an RSS feed?
I believe you are already starting to get the big picture of the game called IT in Russia :)
I ask myself: what piece of software would feed it? 1C itself is like a black box; nothing out, nothing in. It can't even do anything by cron; it can't open ports; it has no TCP/IP server side that can be used by non-1C client software. So without hacking and using 3rd-party software, there's not much you can do with it.

1C 8.x is another story. As far as I know, they have reworked it, so now it's better. They even got a feature to open ports on the computer! A good chance of winning, isn't it? :)

So, Eric, we are bound to use this software. The easiest way to do things is to ask a 1C programmer (servicing a 1C setup in a company) to add an export-to-CSV feature in WINDOWS-1251.

emackn’s picture

Status: Active » Fixed

@orb - committed your patch.

derhasi’s picture

@emackn, the patch of #25 does not solve the problem. A better approach is #1428272: Added support of encoding conversions to the CSV Parser which needs review.

Summit’s picture

Hi, Notepad UTF8 worked for me. Thanks!
greetings, Martijn

susheel_c’s picture

Version: 7.x-2.x-dev » 7.x-2.0-alpha4

Thanks for all the info. I'm using alpha 4 and ran into the problem a few hours ago.

I've used node import before and remembered that there was always trouble with line endings with data exported as a .csv from Microsoft Word (windows or mac). So we used to convert line endings using Notepad++ or Textmate.

I decided to try the same approach here, and when I got the error, saved the file I was importing as UTF-8 with LF line endings in Textmate. Voila! No trouble importing the .csv file.

Thanks a ton again.

bernabsb’s picture

Component: Code » Feeds Import
Status: Fixed » Closed (fixed)

I had the same error when importing in Portuguese language to my site.

Since this language uses characters like ç, ã, é, ê, etc... I did as follows (worked fine for me):

Save data as csv ms-dos;
Open file with TextWrangler software;
Save as:
Line Breaks -> Unix (LF)
Encoding -> Unicode (UTF-8)

Then when importing it worked as expected.

Thanks you all!!!

Bernardo.

wuinfo’s picture

Version: 7.x-2.x-dev » 7.x-2.0-alpha4
Priority: Normal » Major
Status: Fixed » Closed (fixed)

It is working for me. Using Notepad to convert encode to UTF8.

OnkelTem’s picture

Version: 7.x-2.0-alpha4 » 7.x-2.x-dev
Priority: Major » Normal
Status: Closed (fixed) » Fixed

@bernabsb

I believe it's up to the developer to mark a bug Fixed / Closed (fixed).


Automatically closed -- issue fixed for 2 weeks with no activity.

KingMoore’s picture

subscribinate

rfay’s picture

@KingMoore, we've stopped using "Subscribe" comments (finally). There's a "follow" link at the top right of every issue. Please use that instead. It saves an incredible number of interruptions (via email) to an incredible number of people.

Thanks!

KingMoore’s picture

@rfay thanks and sorry for ruining everyone's day/productivity

bensnyder’s picture

@ursula thanks! saved the day!

thetech249’s picture

I had the same problem. I just added a "title" column to the CSV and left it blank; it worked fine after that.

GiorgosK’s picture

saving my feed (csv) as UTF8 got rid of the problem

MXT’s picture

I can confirm that converting the source CSV to UTF-8 resolves the problem.

calebtr’s picture

This thread is very helpful - thanks ursula and all.

The Feeds Tamper module lets you modify fields before writing to the database. It is pluggable.

For a data source I can't control, I wrote a plugin for Feeds Tamper that converted my fields to UTF-8.

Right after I did that, I noticed that someone had already contributed a better patch (that would apply to more than just my use case) in the Feeds Tamper issue queue - http://drupal.org/node/1817516.

ayalon’s picture

Here are my settings:

#Drupal Fix 3
#http://drupal.org/node/1140194
#http://www.farinspace.com/utf8-in-mysql/
init_connect='SET collation_connection = utf8_unicode_ci; SET NAMES utf8;'
default-character-set=utf8
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake

The last line was necessary on RHEL.

OnkelTem’s picture

@calebtr

While this can probably be done on the Feeds Tamper side, I personally consider fixing the charset a task for another level: it should be done by Feeds itself, since Feeds Tamper works with individual fields while all of your data can be in the wrong encoding. I can hardly imagine a case where you need to convert field A but not fields B and C.
The real solution would be to fix the input data's encoding, as was done by this patch: #1428272: Added support of encoding conversions to the CSV Parser. It should be updated to work with the latest Feeds, though.

dmegatool’s picture

sumaiyajaved’s picture

Component: Feeds Import » Code

Hi, Notepad UTF8 worked for me. Thanks! Martijn

purabkharat’s picture

I am thinking of adding a patch.
I think converting the whole text file to UTF-8 format before parsing the CSV is not a good idea. Instead, you can use the following code.
The issue happens when the character set is not UTF-8.
If you get the error while using db_insert, you can use the following code:
$val = mb_check_encoding($val, 'UTF-8') ? $val : utf8_encode($val);

if it is just a matter of some field values.

Drupalor Developer


Rosamunda’s picture

Hi there,
I've tried every solution:
Saved as UTF-8 with Notepad++
Checked the database, and everything is utf8_general_ci

I can't upload the file; these are the errors:

SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\x82nez J...' for column 'title' at row 1
Error message SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\x82nez J...' for column 'message' at row 1

FYI, there's NO "message" field in the imported file, nor in the destination node.

Any other possible solution?
Thanks!!
Rosamunda

Rosamunda’s picture

Status: Closed (fixed) » Active

Forgot to reopen this one... I think it's better than opening a new issue...?

Rosamunda’s picture

Status: Active » Closed (fixed)

I must tell you that this patch, and not any of the other solutions, worked for me.
Just in case anyone wonders...

:)

BrightBold’s picture

Thanks Rosamunda. That patch worked for me too.

mitchseiden’s picture

@worldlinemine: The textpad utf8_encode solution worked for me. Thanks heaps

Nacho_css’s picture

Issue summary: View changes

#10 worked for me as well. Just changed encoding to UTF-8 using "TextWrangler". Thanks!

kaczorski’s picture

I resolved my problem with this issue in the following way (it is a mix of the help given above; special thanks to ursula):
1. Make a backup to an SQL file.
2. In the backup SQL file, find & replace all occurrences of these pairs:
- latin1_general_ci -> utf8_unicode_ci
- latin1 -> utf8
3. Restore the database from that fixed backup file.

I ran into this problem when I updated my Drupal from 7.11 to 7.28.

intoonactive’s picture

#10 worked for me.

marchellodepello’s picture

Thanks TechNikh at #22.
Saving with WordPad as UTF-8 did it.

aniket.mohite88’s picture

Comment #10 Helped me as well.

Acertijo’s picture

thank you #10

hughworm’s picture

utf8_encode() solution from #3 works for me. Thanks!

jaesperanza’s picture

Before doing everything suggested, do #10 first :) Notepad++ did the trick, sheesh! This saves you all the trouble and reading! :)

frederico’s picture

#10 worked for me too. Thanks bb_dogg!!! -Frederico

n_nelson350’s picture

#10 worked for me....Thanks bb_dogg

Jeff Bullock’s picture

Thank you to everyone above for your posts. I saved my csv file with Notepad and specified the encoding as UTF-8. That worked for me.

hughworm’s picture

The solution given in #10 might be fine for developers, but I can't give it to my end customer as a satisfactory solution, especially if the source is a feed. A proper fix is needed that preserves the text characters. Simply adding something like #49 seems to work, though there may be other complications relating to SQL table formats.

hockey2112’s picture

#10 worked for me. Thanks!

drummondf’s picture

#10 worked for me as well, thank you!

A note for people who may still be having trouble with this: if you import a CSV file with the same name as one that has already been loaded and is returning failures, Feeds will continue to use the old file. You need to rename the file, or clear the old one out, in order to trigger the new file.

Cheers!

lahode’s picture

Why not use www.drupal.org/project/feeds_tamper_php
with
return utf8_encode($field);
It works fine for me.

deepsfriend4u’s picture

Thanks lahode,
the feeds_tamper_php module works fine for me as well.

achekulaev’s picture

If you came here looking for the solution for the problem and you have JSON feeds you're importing then see https://www.drupal.org/node/1824506

girideepa’s picture

That's true; opening the file in Notepad and saving as UTF-8 solved the problem.
Thanks

kobee’s picture

#70 feeds_tamper_php worked for me as well, thank you!

nithinkolekar’s picture

@Maintainers of Feeds
This issue should be linked on the module's page, notifying users to save txt/csv files in UTF-8 encoding.

RAWDESK’s picture

Conversion to UTF-8 fixed my issue too.

sachinwable’s picture