Last updated January 13, 2010. Created on June 9, 2004.
Edited by Heine, ronald_istos, LeeHunter, Steven.

Several people have asked how to specify the character encoding that Drupal uses. The short answer is: you can't.

Drupal uses UTF-8 to encode all its data. UTF-8 is a Unicode encoding, so it can represent text in any language. You no longer need to worry about language-specific encodings for your website (such as Big5, GB2312, Windows-1251 or Windows-1256). Also, when Drupal imports external XML data (such as RSS or XML-RPC), it automatically converts it to UTF-8 (PHP's iconv support is required for most source encodings).
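Drupal itself does this conversion in PHP (with iconv). The Python sketch below only illustrates the same idea and is not Drupal code: decode bytes from a legacy, language-specific encoding into Unicode text, then re-encode that text as UTF-8. The helper name `to_utf8` is hypothetical.

```python
# Illustration only: decode legacy-encoded bytes, then re-encode as UTF-8.

def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert bytes in `source_encoding` (e.g. 'cp1251') to UTF-8 bytes."""
    text = raw.decode(source_encoding)  # legacy bytes -> Unicode text
    return text.encode("utf-8")         # Unicode text -> UTF-8 bytes


# "Привет" as Windows-1251 uses one byte per letter; UTF-8 uses two.
cp1251_bytes = "Привет".encode("cp1251")
utf8_bytes = to_utf8(cp1251_bytes, "cp1251")
print(len(cp1251_bytes), len(utf8_bytes))  # 6 12
print(utf8_bytes.decode("utf-8"))          # Привет
```

Once everything is UTF-8, the same code path handles Cyrillic, Chinese (Big5, GB2312) or Arabic (Windows-1256) input without per-language special cases.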

If you really want to change Drupal's encoding, you will run into a lot of trouble, because Drupal receives and sends data in many different ways (web, e-mail, RSS, XML-RPC, etc.).


Comments

GaryWong

Hi

Nice explanation of why there's no need to fuss with Drupal's character encoding, but what about the underlying data?

For example, if I'm using MySQL 5.5 and I want to support French accented characters, shouldn't I use 'default character set utf8 default collate utf8_general_ci' (as described in http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html) when I create the database?
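To illustrate why the storage charset matters, here is a small Python sketch (not Drupal or MySQL code; the helper names are hypothetical). French accented characters take two bytes each in UTF-8, and if those bytes are later read back as Latin-1 (for example, because a column or connection uses latin1, the MySQL 5.5 default), they turn into mojibake. It also checks against the 3-byte-per-character limit of MySQL's `utf8` charset; characters that need 4 bytes require `utf8mb4`.

```python
# Hypothetical helpers showing how text maps to UTF-8 bytes.

def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 UTF-8 bytes, the limit of
    MySQL's 3-byte `utf8` charset (4-byte characters need `utf8mb4`)."""
    return all(utf8_width(ch) <= 3 for ch in text)


french = "àâçéèêëîïôûùüÿœæ"
print(max(utf8_width(ch) for ch in french))  # 2
print(fits_mysql_utf8(french))               # True
print(utf8_width("€"))                       # 3
print(fits_mysql_utf8("\U0001F600"))         # False (emoji needs 4 bytes)

# UTF-8 bytes misread as Latin-1 produce the classic mojibake:
print("é".encode("utf-8").decode("latin-1"))  # Ã©
```

French text fits comfortably in MySQL's `utf8`; the mismatch between a UTF-8 application and a latin1 database or connection is the more common source of garbled accents.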

Just wonderin',
TIA

Gary

---
Victoria, BC
Canada

philsward

"Drupal uses UTF-8 for encoding all its data."

I was getting "drupal Warning: session_start() [function.session-start]: Cannot send session cache limiter - headers already sent" pointing at common.inc. That led me to a page mentioning that a file might not be encoded properly, so I checked the encoding of common.inc: it shows up as charset=us-ascii, along with 90% of the other .inc files...

I thought it might be my FTP client, which, as it turned out, was set to convert files with the .inc extension to ASCII, so I disabled that and tried again. Same encoding... Then I wget'd the tarball straight from Drupal, extracted it, and I'm still getting the same charset=us-ascii on 90% of the .inc files.

So... does this post still hold true for D7 when the files in the official tarball aren't encoded as UTF-8? Or has someone screwed up the file encoding and nobody's caught it over the last handful of D7 updates?

Confused... o_O

Tiaan

@philsward: I believe the info is still accurate. The files you are referring to probably use only the ASCII subset of Unicode, so they are valid US-ASCII and valid UTF-8 at the same time. UTF-8 was designed as a superset of 7-bit US-ASCII: every ASCII byte means the same thing in UTF-8, so the distinction only matters once a file contains non-ASCII characters. Tools that guess a file's encoding typically report the narrower label, us-ascii, for such files. For most code files, the distinction does not really matter.
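To see this for yourself, here is a rough Python sketch (3.7+ for `bytes.isascii`) that mimics the kind of check charset-guessing tools make; it is an approximation, not the actual algorithm of `file` or any other tool. It tries the narrowest label first, which is why a pure-ASCII .inc file comes out as us-ascii even though it is also valid UTF-8. The function names and the `check_encoding.py` filename are hypothetical.

```python
import sys
from pathlib import Path


def classify_encoding(data: bytes) -> str:
    """Rough charset guess, checking the narrowest label first."""
    if data.isascii():
        # Every byte is below 0x80: valid US-ASCII *and* valid UTF-8.
        return "us-ascii"
    try:
        data.decode("utf-8")
        return "utf-8"  # contains valid multi-byte UTF-8 sequences
    except UnicodeDecodeError:
        return "unknown"  # some other encoding, e.g. Latin-1


def classify_file(path: str) -> str:
    """Classify the raw bytes of a file on disk."""
    return classify_encoding(Path(path).read_bytes())


ascii_php = b"<?php\nfunction example() { return 1; }\n"
print(classify_encoding(ascii_php))                            # us-ascii
print(ascii_php.decode("ascii") == ascii_php.decode("utf-8"))  # True
print(classify_encoding("// Café\n".encode("utf-8")))          # utf-8
print(classify_encoding("// Café\n".encode("latin-1")))        # unknown

if __name__ == "__main__":
    # Usage: python check_encoding.py includes/*.inc
    for name in sys.argv[1:]:
        print(name, classify_file(name))
```

A file reported as us-ascii decodes to exactly the same text whether it is read as ASCII or as UTF-8, so nothing needs to be converted.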