I run a multilingual Drupal site with Cyrillic (Russian), German and English content, and I have had persistent problems with Cyrillic characters.

Drupal uses UTF-8, which is hard-coded in 12 places in common.php and some of the core modules. Because there are at least three additional places with charset settings (Apache, the database and the OS itself), I had persistent problems submitting Cyrillic content (only Microsoft Internet Explorer worked properly). When you back up your content to a mirror server, these problems multiply (see my comments concerning Postgres backup and restore at http://drupal.org/book/view/434). My solution was to switch to an 8-bit (Cyrillic) charset.

It would make moving from release to release much easier for me if there were a single, central charset setting in the conf.php file.

Of course, the best solution is to configure the server OS, Apache and the Drupal database properly. But what can you do if you have no admin rights on the server?

Is there a better (less invasive) solution to my problem?

Comments

Steven’s picture

Changing Drupal's character set is not as easy as flicking a switch: there are many issues with sending out mail, importing/exporting RSS feeds, etc.

We chose UTF-8 because it can represent the characters of every other character set, and because conversion to UTF-8 is now relatively easy (if you have iconv available, Drupal will use it). UTF-8 is also much more robust than some other encodings (such as SJIS or Big5), where substring matching and similar operations can be problematic.
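To illustrate the substring-matching point, here is a minimal Python sketch (not Drupal code). It assumes the well-known Shift_JIS case where the second byte of the katakana character "ソ" is 0x5C, the same byte as an ASCII backslash. The helper name `contains_byte` and the sample character are chosen for this illustration only.

```python
# Illustration only: the sample character and helper name are
# assumptions made for this sketch, not anything taken from Drupal.
KATAKANA_SO = "\u30bd"  # the character "ソ"
BACKSLASH = b"\\"       # the single byte 0x5C


def contains_byte(text: str, encoding: str, needle: bytes) -> bool:
    """Return True if the encoded form of `text` contains `needle`."""
    return needle in text.encode(encoding)


# In Shift_JIS the trailing byte of a two-byte character can collide
# with an ASCII byte, so a naive byte-level search finds a backslash
# that is not really there.
sjis_hit = contains_byte(KATAKANA_SO, "shift_jis", BACKSLASH)

# In UTF-8 every byte of a multi-byte character is >= 0x80, so an
# ASCII byte can never match inside one.
utf8_hit = contains_byte(KATAKANA_SO, "utf-8", BACKSLASH)

print("Shift_JIS false match:", sjis_hit)
print("UTF-8 false match:", utf8_hit)
```

A byte-oriented search, which is what PHP's plain string functions do, is therefore safe for ASCII needles in UTF-8 text but not in Shift_JIS text.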

There are several Drupal sites that use UTF-8 encoded Cyrillic without any issues whatsoever.

Try turning off Drupal's cache. If it suddenly does work with UTF-8, edit includes/common.inc and change the line:

header("Content-Type: text/html; charset=utf-8");

into

drupal_set_header("Content-Type: text/html; charset=utf-8");

Then you will be able to use cache with UTF-8. This was a known issue with certain server configurations and has been fixed in CVS.

aam’s picture

Thanks for your tips Steven, I will try drupal_set_header().

But I do not use Drupal's cache yet. Even if I solve the problem with submitting Cyrillic content, there is still a problem with database backup and restore on the command line (through PHP in my case).

Do the console font on the server and the database font setting also have to be UTF-8 for this to work properly?

Steven’s picture

Stick to UTF-8, once and for all. Trust me, it's for the best.

I'm not a unixhead, but normally console stuff shouldn't have an effect on the actual data, though you will need a UTF-8 locale for the characters to show up correctly.

If you need to convert legacy data, use iconv.
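As a rough sketch of what such a conversion does, here is the same step in Python rather than with iconv itself. It assumes the legacy data is in windows-1251 (the charset mentioned elsewhere in this thread); the function name `to_utf8` and the sample text are made up for the example. On the command line, the equivalent would be something like `iconv -f WINDOWS-1251 -t UTF-8 old.txt > new.txt`, where the file names are placeholders.

```python
def to_utf8(raw: bytes, source_encoding: str = "cp1251") -> bytes:
    """Re-encode bytes from a legacy 8-bit charset to UTF-8.

    `source_encoding` defaults to windows-1251 (cp1251), which is an
    assumption for this sketch; use whatever the data really is in.
    """
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical legacy content: six Cyrillic letters, one byte each.
legacy = "Привет".encode("cp1251")
converted = to_utf8(legacy)

print(len(legacy), "bytes before,", len(converted), "bytes after")
print(converted.decode("utf-8"))
```

The conversion is only correct if the source charset is named correctly; converting data that is already UTF-8 a second time produces garbage, so it is worth checking a sample first.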

aam’s picture

After adding the CharsetSourceEnc utf-8 line to Drupal's .htaccess file, it works with both code variants: header() and drupal_set_header().

Thanks! Now I will have a lot of work moving all my Cyrillic content to UTF-8...

killes@www.drop.org’s picture

I have advocated this kind of approach myself, but it just causes too much trouble. Try to find out why Cyrillic input is failing with UTF-8.

aam’s picture

My website is hosted on a server with "Russian Apache", a special version of Apache in which some settings are hard-coded into the source code. One of these settings is the server-side character set, which cannot be overridden by a <meta http-equiv="Content-Type" CONTENT="text/html; charset=iso-8859-1"> setting in the HTML file.
Because of these two settings, the result depends on your browser:

  • Microsoft IE uses the meta tag first (if it is included in the HTML file) and then looks for a server setting.
  • Mozilla uses the server-side setting first, and then the meta tag (when there are no server settings).
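The effect of the two precedence rules can be sketched in Python. This is only a model of the behaviour as described above, not of real browser internals. It assumes a page whose bytes are UTF-8, a meta tag declaring utf-8, and a server that declares windows-1251; the function names and the sample text are invented for the illustration.

```python
from typing import Optional


def effective_charset(prefers_meta: bool,
                      server_charset: Optional[str],
                      meta_charset: Optional[str]) -> Optional[str]:
    """Pick a charset using the precedence described in this thread."""
    if prefers_meta:
        first, second = meta_charset, server_charset
    else:
        first, second = server_charset, meta_charset
    return first or second


def render(body: bytes, charset: str) -> str:
    """Decode the page bytes the way a browser would for `charset`."""
    return body.decode(charset, errors="replace")


page = "Привет".encode("utf-8")  # the bytes actually sent
server = "cp1251"                # assumed hard-coded server charset
meta = "utf-8"                   # charset declared in the meta tag

ie_like = render(page, effective_charset(True, server, meta))
mozilla_like = render(page, effective_charset(False, server, meta))

print("meta tag first:      ", ie_like)
print("server setting first:", mozilla_like)
```

Under these assumptions the browser that trusts the meta tag shows the text correctly, while the one that trusts the server setting shows twelve wrong characters, one for each UTF-8 byte.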

The solution to my problem is the following setting in the .htaccess file:

CharsetSourceEnc utf-8

Maybe it would be a good idea to include this setting in the .htaccess file shipped with Drupal.

aam’s picture

...after all of this talk, I must say that the CharsetSourceEnc setting no longer seems to matter.

Maybe there were significant changes in Drupal between versions 4.2.0 and 4.4.1: with an older release of Drupal I was forced to use the 8-bit Cyrillic charset windows-1251.