Hi -

I've installed the Internationalization module and enabled the 'Language Switcher' block.

I'm able to translate a Page into languages using the "normal" Latin alphabet (German, French, Spanish, etc.) and use the Language Switcher to display the Page in the desired language.

However, with Russian this only partially works. If I create a simple Page with:

Title: Добро пожаловать!
Body: Добро пожаловать в mydomain.com !

the browser displays the Page as:

Title: ???? ??????????!
Body: ???? ?????????? mydomain.com !

This is happening even though my browser's View > Encoding is already set to "UTF-8" (Unicode).

As a partial workaround, I found a site which converts Cyrillic characters to HTML entities:

http://jotpuree.com/utils/encodeCyrillicInHtml.php

So I pasted "Добро пожаловать в" into the field on this site, and it returned:

&# 1044;&# 1086;&# 1073;&# 1088;&# 1086; &# 1087;&# 1086;ж&# 1072;л&# 1086;&# 1074;&# 1072;&# 1090;&# 1100; &# 1074;

Note: For display purposes here in this Drupal forum post, a space has been added after every # character. Otherwise, this would display as "Добро пожаловать в" rather than letting you see the HTML entities! To change this back to actual HTML entities, remove the space after each # character!
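For anyone who would rather not depend on a third-party site, the same kind of conversion can be sketched in a few lines of Python. This is only an illustration of the idea, not how the site above works; the function name to_numeric_entities is made up for this example. It relies on Python's built-in "xmlcharrefreplace" error handler, which replaces every character that plain ASCII cannot represent with a decimal character reference:

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal HTML character reference.

    Hypothetical helper for illustration; ASCII characters pass through unchanged.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


encoded = to_numeric_entities("Добро пожаловать в")
print(encoded)

# html.unescape() reverses the conversion, which is roughly what a browser
# does when it renders the entities.
print(html.unescape(encoded))
```

Unlike the output quoted above, this version has no space after each # character, so it can be pasted as-is.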

I can then paste these HTML entities into the (beginning of the) Body of my Page so that the Body displays correctly, as:

Body: Добро пожаловать в mydomain.com !

However, this workaround doesn't work for the Page's Title field. I assume this is because the Page's Title field is being parsed as TEXT, not as HTML.

It's also interesting to note that the label for "Russian" in the 'Language Switcher' block does display properly, as:

Русский

I understand that Drupal fully supports UTF-8 (Unicode) by default - and I assume you're seeing the Russian characters included in this Drupal forum post!

So it seems that Drupal can handle Cyrillic (Russian) characters, at least in the Body of a Page, and in the 'Language Switcher' block.

I am confident there must be some way to make not only the Page Body and the Language Switcher label ("Русский") but also the Page Title display in Russian.

Thanks!

Comments

cog.rusty’s picture

Do you have Drupal's .htaccess file, and does it contain these lines?

  php_value mbstring.http_input             pass
  php_value mbstring.http_output            pass
  php_value mbstring.encoding_translation   0

Does that happen with new content, or only with content migrated from another Drupal installation?

What MySQL version? Is it 4.0 or older, or 4.1 or newer?
If it is a newer MySQL version, check your database with phpMyAdmin to see whether the default collation of the tables, their columns, and the database itself is utf8_general_ci (or utf8_unicode_ci, as long as it is used consistently).

Also try to set AddDefaultCharset utf-8 or AddDefaultCharset Off in .htaccess (I don't expect this to help, but anyway...)

j4’s picture

Dear cog.rusty,

I am also having the same problem. I have huge texts in Russian to add, so even though the HTML code converter suggested does work, I cannot get my client to use it as Drupal will no longer be a CMS site then!! Please help. I checked my .htaccess file, and the three lines given above are there. MySQL version is 5.0.51a-community. Default character set and collation are utf8_unicode_ci.
Have not tried AddDefaultCharset. I am really in soup. Also some of my primary links disappear when I switch over to Russian language using the switcher. Eagerly awaiting an early reply as I have my first review with the client this weekend. I am using Drupal 6.
Thank you
Warm regards
jaya

cog.rusty’s picture

What Drupal version?
Do the question marks appear when you are viewing a node in Drupal, or only when using some custom database query?

Check the character set and collation specifically of the node_revisions table and its individual columns.
Check the text in the node_revisions table. Has it already become question marks in the database, or is it stored correctly and the problem happens when Drupal displays it?
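To illustrate why this distinction matters, here is a small Python sketch (an analogy only, not what MySQL or Drupal runs internally). latin1 has no code points for Cyrillic letters, so converting Russian text to it has to substitute something, typically a question mark. Once that substitution has been written to the database, the original letters are gone and no display setting can bring them back:

```python
text = "Добро пожаловать!"

# Converting to latin1 (ISO-8859-1): each Cyrillic letter has no equivalent,
# so Python's "replace" error handler substitutes "?" for it. This mimics
# text that has "already become question marks" in storage.
lossy = text.encode("latin-1", errors="replace")
print(lossy.decode("latin-1"))

# Converting to UTF-8 is lossless: decoding returns the original text.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.decode("utf-8") == text)
```

If the table already contains question marks, the text has to be entered again after the character set is fixed; if the stored text is intact, the problem lies somewhere on the display side.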

How are you adding the texts?
From what format are you copying them? Are they plain text?
Is the Drupal text area where you are pasting the texts a plain text area or some wysiwyg editor (and which one)?

Is the iconv() function available in your PHP? What is the result of this php code:

$check_iconv = function_exists('iconv') ? "iconv() exists" : "iconv() doesn't exist";
print $check_iconv;

j4’s picture

Dear cog.rusty,

First I want to thank you for the lengthy questionnaire. Now I know my problem will get solved!! :-)
My answers are in italics:
What Drupal version?

Drupal 6.6

Do the question marks appear when you are viewing a node in Drupal, or only when using some custom database query?

When viewing the node

Check the character set and collation specifically of the node_revisions table and its individual columns.
Check the text in the node_revisions table. Has it already become question marks in the database, or is it stored correctly and the problem happens when Drupal displays it?

Node revisions table has question marks. And the collation for node revisions is all latin1_swedish_ci

How are you adding the texts?
From what format are you copying them? Are they plain text?
Is the Drupal text area where you are pasting the texts a plain text area or some wysiwyg editor (and which one)?

I am copying them as plain text from a Word doc sent by the client and pasting them into the "Body" field.

Is the iconv() function available in your PHP? What is the result of this php code:

I really don't know in which PHP file to put this... :-(

I will correct the node revision table collation and get back to you with the status.

Once again thank you very much.

Warm regards
jaya

j4’s picture

Changing the collation of the node_revisions table did the trick. Thank you so much. The Drupal support system is really wonderful.

Warm regards
jaya

thelittlefrenchy’s picture

So ... I have the same problem
What did you do? Just change the one table, "node_revisions", to UTF-8?
So simple, and I am still lost? :)

What happens to the text in other languages if I do this? Is there any impact or data loss?

Thanks for sharing the solution
Drupal looks nice, and the community even nicer, so I am trying to improve ;)

cog.rusty’s picture

You need to describe your situation in more detail. Do you have "text in other languages" (which languages?) or do you have question marks? What MySQL version?

If you have MySQL 4.1 or newer and you are getting question marks for all non-English languages, then you must change the charset/collation of all the tables and/or individual columns which have latin1/latin1_swedish_ci to utf8/utf8_general_ci. Also you may need to change the default charset/collation of the database itself. If one non-English language is OK but other languages are not, then you may need to do some conversion first, of the language which works. If you have an older MySQL version, then none of this applies and you may need to follow a more complicated procedure.
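As a rough sketch of the first case (MySQL 4.1 or newer, question marks for all non-English text), the Python snippet below only builds the SQL statements as strings so they can be reviewed before being run in phpMyAdmin or the mysql client. The function name, the database name "drupal", and the table list are placeholders for illustration; check which of your own tables are still latin1 and back up the database first. Note also that CONVERT TO re-encodes the existing column data, so it is not the right tool if a language that currently displays correctly needs the separate conversion mentioned above.

```python
def conversion_statements(database, tables,
                          charset="utf8", collation="utf8_general_ci"):
    """Build MySQL statements that switch a database's defaults and the
    listed tables (including their text columns) to the given charset.

    Hypothetical helper: it only returns strings and never touches a database.
    """
    statements = [
        f"ALTER DATABASE `{database}` "
        f"DEFAULT CHARACTER SET {charset} COLLATE {collation};"
    ]
    for table in tables:
        statements.append(
            f"ALTER TABLE `{table}` "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


# "drupal" and this table list are assumptions for the example.
for sql in conversion_statements("drupal", ["node_revisions", "node"]):
    print(sql)
```

Text that was already saved as question marks is not recovered by this; it has to be entered again once the tables use utf8.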

apthorburn’s picture

Hi,

Just wanted to say that this post was excellent and saved me a long evening of work. I was in the process of moving a site from one server to another when I hit the snag of foreign-language characters not displaying as expected (they did display correctly on the other server).

For the two tables node_revisions and languages I reset the collation to utf8_general_ci, and also did this for the schema just to be sure. It all worked a treat after a quick reload of the data.

Thanks again

Andrew