Transliteration provides a central transliteration (romanization) service to other Drupal modules, and cleans file names during upload by replacing unwanted characters.

Generally speaking, Transliteration takes Unicode text and tries to represent it in US-ASCII characters (universally displayable, unaccented characters) by transliterating the pronunciation that the text expresses in another writing system into Roman letters.
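As a rough illustration of the idea (not the module's actual implementation), the sketch below maps a handful of characters to ASCII using a small hand-written table. The function name example_transliterate() and the table contents are hypothetical; any non-ASCII character missing from the table is replaced with a placeholder.

```php
<?php
// Hypothetical helper for illustration only; the real module ships far
// larger mapping tables derived from Unidecode.
function example_transliterate($text, $unknown = '?') {
  // Tiny hand-written table (an assumption made for this sketch).
  $map = array(
    'é' => 'e',
    'ü' => 'u',
    'ß' => 'ss',
    'д' => 'd',
    'а' => 'a',
  );
  $output = '';
  // Split the UTF-8 string into individual characters.
  $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
  foreach ($chars as $char) {
    if (strlen($char) === 1 && ord($char) < 0x80) {
      // Already US-ASCII: keep as is.
      $output .= $char;
    }
    elseif (isset($map[$char])) {
      // Known character: use its ASCII replacement.
      $output .= $map[$char];
    }
    else {
      // Unknown character: fall back to the placeholder.
      $output .= $unknown;
    }
  }
  return $output;
}
```

For example, calling example_transliterate('Straße') with this table yields 'Strasse', while a character that is not in the table comes back as '?'.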

According to Unidecode, from which most of the transliteration data has been derived, "Russian and Greek seem to work passably. But it works quite bad on Japanese and Thai."

In Drupal 8 core

Transliteration functionality is now part of Drupal 8 core. See the Transliteration change notice for details.

The rest of this page describes the Drupal 7 Transliteration contributed module. Note that the transliteration functionality in Drupal 8 core does not include any configuration options, update screens, or the like. Only the Third-party integration and Language-specific replacements sections below are somewhat relevant to Drupal 8; see the change notice referenced above for details on how to use the equivalents in the Drupal 8 transliteration service.

Install

Install the module in the usual way.

If you are installing on an existing Drupal site, you will likely want to fix existing file names after installation; this renames all files whose names contain non-ASCII characters. However, any links to those files that you entered manually in content will break, because the original files are renamed. It is therefore a good idea to test the conversion first on a copy of your website. You'll find the retroactive conversion at Configuration and modules >> Media >> File system >> Transliteration.

Configure

This module doesn't require special permissions.

This module can be configured from the File system configuration page (Configuration and modules >> Media >> File system >> Settings).

  1. Transliterate file names during upload: If you need more control over the resulting file names, you might want to disable this feature here and install the FileField Paths module (http://drupal.org/project/filefield_paths) instead.
  2. Lowercase transliterated file names: Enabling this option is recommended to prevent issues with case-insensitive file systems.

Third-party integration

Third-party developers seeking an easy way to transliterate text or file names may use the transliteration functions as follows:

if (function_exists('transliteration_get')) {
  $transliterated = transliteration_get($text, $unknown, $source_langcode);
}

or, in case of file names:

if (function_exists('transliteration_clean_filename')) {
  $transliterated = transliteration_clean_filename($filename, $source_langcode);
}

Note that the optional $source_langcode parameter specifies the language code
of the input. If the source language is not known at the time of
transliteration, it is recommended to set this argument to the site default
language:

  $output = transliteration_get($text, '?', language_default('language'));

Otherwise the current display language will be used, which might produce
inconsistent results.
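
One way to package this recommendation is a small wrapper that always passes the site default language and degrades gracefully when Transliteration is not installed. The wrapper name mymodule_transliterate() is hypothetical, and the fallback (replacing each non-ASCII character with the placeholder) is only an assumption about what a caller might want, not behavior provided by the module.

```php
<?php
// Hypothetical wrapper; "mymodule" stands in for your own module's name.
function mymodule_transliterate($text, $unknown = '?') {
  if (function_exists('transliteration_get') && function_exists('language_default')) {
    // Pin the source language to the site default so results do not vary
    // with the current display language.
    return transliteration_get($text, $unknown, language_default('language'));
  }
  // Assumed fallback when the module is unavailable: split on every
  // non-ASCII character and rejoin the pieces with the placeholder.
  // Expects valid UTF-8 input.
  return implode($unknown, preg_split('/[^\x00-\x7F]/u', $text));
}
```

Without the module installed, the fallback leaves ASCII text untouched and replaces every other character with the placeholder, so the calling code keeps working either way.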

Language-specific replacements

This module supports language-specific variations in addition to the basic transliteration replacements. The following guide explains how to add them:

  1. First find the Unicode character code you want to replace. As an example, we'll be adding a custom transliteration for the Cyrillic character 'г' (hexadecimal code 0x0433) using the ASCII character 'q' for Azerbaijani input.
  2. Transliteration stores its mappings in banks with 256 characters each. The first two digits of the character code (04) tell you in which file you'll find the corresponding mapping. In our case it is data/x04.php.
  3. If you open that file in an editor, you'll find the base replacement matrix consisting of 16 lines with 16 characters on each line, and zero or more additional language-specific variants. To add our custom replacement, we need to do two things: first, we need to create a new transliteration variant for Azerbaijani since it doesn't exist yet, and second, we need to map the last two digits of the hexadecimal character code (33) to the desired output string:
    $variant['az'] = array(0x33 => 'q');
    The variant key (here 'az') is the language code of the input; see the list of language codes.
    Any Azerbaijani input will now use the appropriate variant.
    Also take a look at data/x00.php, which already contains a number of language-specific replacements. If you think your overrides are useful for others, please file a patch.
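
The bank and offset arithmetic from the steps above can be sketched as follows. The helper name example_bank_location() is hypothetical and not part of the module; it only mirrors the file naming (data/x04.php) and the two-digit offset described in the guide, and it assumes a four-digit character code.

```php
<?php
// Hypothetical helper: splits a character code into the data file that
// holds its bank and the offset within that bank of 256 characters.
function example_bank_location($codepoint) {
  // First two hex digits: the bank number.
  $bank = $codepoint >> 8;
  // Last two hex digits: the position inside the bank.
  $offset = $codepoint & 0xFF;
  return array(
    'file' => sprintf('data/x%02x.php', $bank),
    'offset' => $offset,
  );
}

// Cyrillic 'г' from the example above.
$location = example_bank_location(0x0433);
// $location['file'] is 'data/x04.php' and $location['offset'] is 0x33.
```

The offset is the key you would use in the variant array, as in $variant['az'] = array(0x33 => 'q').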

Comments

This module can be configured from the File system configuration page (Configuration and modules >> Media >> File system >> Settings).

In Drupal 8.9.12, this module can be configured from the URL aliases configuration page (Administration >> Configuration >> Search and metadata >> URL aliases).