Microsoft's Windows code pages for foreign character sets are single-byte encodings, not Unicode. This causes problems when converting file names to Unicode.

Windows uses code page 1255 for Hebrew. See the link above for a complete list.

I was writing my own data loading program to load lots of products into Ubercart. I'm creating a Jewish music store and many files have Hebrew names. I had problems getting Hebrew to work when doing the conversion manually using $name = transliteration_get($name);
until I found the following function. I don't remember where I saw it.

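function win_heb_to_unicode($heb) {
  // Map each CP1255 Hebrew letter byte (0xE0-0xFA) to its two-byte UTF-8 form.
  // preg_replace_callback() replaces the /e modifier, which PHP 7 removed.
  return preg_replace_callback('/[\xE0-\xFA]/', function ($m) {
    return chr(215) . chr(ord($m[0]) - 80);
  }, $heb);
}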
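
Why subtracting 80 works can be sketched as follows. In CP1255 the Hebrew letters occupy bytes 0xE0 to 0xFA, and in Unicode they occupy U+05D0 to U+05EA in the same order, so every letter encodes to the UTF-8 lead byte 0xD7 (215) followed by the original byte minus 0x50 (80). The helper name cp1255_hebrew_byte_to_utf8() below is hypothetical and only spells out that arithmetic; it is not part of Drupal or the Transliteration module.

```php
<?php
// Hypothetical helper: convert ONE CP1255 Hebrew letter byte (0xE0-0xFA) to UTF-8.
function cp1255_hebrew_byte_to_utf8($byte) {
  $offset = ord($byte) - 0xE0;      // 0 for alef, 26 for tav
  $codepoint = 0x05D0 + $offset;    // U+05D0 .. U+05EA
  // Two-byte UTF-8 layout: 110xxxxx 10xxxxxx
  $lead = 0xC0 | ($codepoint >> 6);
  $trail = 0x80 | ($codepoint & 0x3F);
  return chr($lead) . chr($trail);
}

// Hex dump of the UTF-8 bytes for the first and last Hebrew letters.
echo bin2hex(cp1255_hebrew_byte_to_utf8("\xE0")), "\n";
echo bin2hex(cp1255_hebrew_byte_to_utf8("\xFA")), "\n";
```

Note that this only covers the letters. Other CP1255 characters, such as vowel points and punctuation above 0x7F, need a full code page conversion.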

If you know of any other functions to convert Windows code pages, please add them to this page.

I wonder if there would be a way to figure out the code page and do this automatically.

Comments

smk-ka

The documentation for transliteration_get() says that the input must be UTF-8 encoded. If it isn't, it is your job to convert it, and AFAIK this can't be automated: a code page only tells you how to interpret the 256 possible byte values, e.g. as Latin, Cyrillic or Hebrew. Since the byte values are the same in every code page, you need to know which code page applies and "switch" to it to get the desired interpretation.
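
A small illustration of that point, assuming PHP's iconv extension is installed and supports these code page names (support varies by platform): the same byte decodes to three different characters depending on which code page you pick, and nothing in the byte itself tells you which one was intended.

```php
<?php
// The same single byte, decoded under three different Windows code pages.
$byte = "\xE0";
$decoded = array();
foreach (array('CP1255', 'CP1251', 'CP1252') as $code_page) {
  $decoded[$code_page] = iconv($code_page, 'UTF-8', $byte);
}
// $decoded['CP1255'] is the Hebrew alef, $decoded['CP1251'] the Cyrillic "а",
// and $decoded['CP1252'] the Latin "à".
```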

And regarding the conversion, wouldn't the following also work: drupal_convert_to_utf8($heb, 'CP1255')?
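
Outside of Drupal, roughly the same conversion can be sketched with PHP's iconv extension. The helper name cp1255_to_utf8() is hypothetical, and the "already valid UTF-8" check is only a heuristic added here for illustration: a CP1255 string could in rare cases also happen to be valid UTF-8, so the caller still has to know where the data came from.

```php
<?php
// Hypothetical helper; assumes the iconv extension is available and knows CP1255.
function cp1255_to_utf8($text) {
  // Heuristic: leave the string alone if it already is valid UTF-8
  // (pure ASCII falls into this case as well).
  if (preg_match('//u', $text) === 1) {
    return $text;
  }
  $converted = iconv('CP1255', 'UTF-8', $text);
  // iconv() returns FALSE on failure; fall back to the unmodified input.
  return $converted === false ? $text : $converted;
}

// Usage: a four-letter Hebrew file name stored as CP1255 bytes.
$name = cp1255_to_utf8("\xF9\xEC\xE5\xED");
```

The result can then be passed on to transliteration_get(), which expects UTF-8 input.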

amateescu

Status: Active » Closed (fixed)

Looks like this issue was answered in #1.