This thread is for discussing transliteration of strings for core. I am well aware of how controversial this topic is; however, there are perfectly valid use cases for transliterated strings:
- as a column or table name, to stay compatible with all DBMSs
- as the value of the id or name attribute of HTML elements
Currently, CCK for example simply drops non-ASCII characters, which leads to names like field___ for languages written in non-Latin scripts, such as Russian, Japanese and so on. Similarly, if the content of id attributes is created from user input, invalid IDs are generated because the text is not altered in any way. HTML's (and XHTML's!) DTDs specify that "only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used" (http://www.w3.org/TR/xhtml1/#C_8).
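To make the ID problem concrete, here is a minimal Python sketch (not Drupal code) that checks a string against the pattern quoted above and derives a valid ID from arbitrary user input. The function names `is_valid_html_id` and `to_html_id` and the `id-` prefix are made up for illustration. The ASCII folding via Unicode decomposition is only a stand-in for real transliteration: it handles accented Latin letters but drops Cyrillic, Japanese and other scripts entirely, which is exactly the field___ situation described above.

```python
import re
import unicodedata

# The pattern quoted from the XHTML 1.0 compatibility guidelines (C.8).
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def is_valid_html_id(value):
    """Return True if the whole string matches the recommended ID pattern."""
    return ID_PATTERN.fullmatch(value) is not None


def to_html_id(text, prefix="id-"):
    """Derive a pattern-conforming ID from arbitrary text (hypothetical helper).

    Assumption: accented Latin letters are folded to ASCII by decomposing
    them and discarding everything that is not ASCII. This is NOT a real
    transliteration; non-Latin scripts are simply lost.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of disallowed characters into a single hyphen.
    cleaned = re.sub(r"[^A-Za-z0-9:_.-]+", "-", ascii_text).strip("-")
    # The pattern requires a leading letter, so prefix anything else.
    if not cleaned or not cleaned[0].isalpha():
        cleaned = prefix + cleaned
    return cleaned
```

With this sketch, "Émile Zola" becomes "Emile-Zola", while a purely Cyrillic string collapses to the bare prefix. The result is a valid ID, but all information from the input is gone, which is why a proper transliteration table would be needed.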
Comments
Comment #1
smk-ka commented:
Another use case for transliterated strings:
NB: I had the same issue with image.module uploads in general and implemented a solution based on the PHP UTF-8 project.
Comment #2
JirkaRybka commented:
Another related issue: http://drupal.org/node/43505
It discusses cleaning of filenames in upload.module territory, including a few people stating that a file uploaded with a native non-English name might be completely impossible to open or download on some other platforms. There is also my lame attempt at converting characters that have a matching character in the Latin ('English') alphabet (assuming that this will do for related languages and that there is nothing to lose for the others; a better solution would of course be welcome, unless it is too costly performance-wise).
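The kind of conversion described here could look roughly like the following Python sketch. It is not the patch from the linked issue; `clean_filename` and its `fallback` parameter are hypothetical names, and the approach (Unicode decomposition followed by dropping non-ASCII characters) is an assumption chosen because it is cheap and needs no lookup tables. It only helps for Latin-based alphabets.

```python
import re
import unicodedata


def clean_filename(filename, fallback="file"):
    """Reduce an uploaded filename to a conservative ASCII form (sketch).

    Assumption: characters with a Latin base letter are folded to that
    letter; everything else that is not ASCII is dropped.
    """
    # Split off the last extension so it survives the cleaning.
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no dot at all: the whole name is the stem
        stem, ext = filename, ""

    def to_ascii(part):
        decomposed = unicodedata.normalize("NFKD", part)
        ascii_part = decomposed.encode("ascii", "ignore").decode("ascii")
        # Replace runs of anything outside a safe set with one underscore.
        return re.sub(r"[^A-Za-z0-9_-]+", "_", ascii_part).strip("_")

    # Names written entirely in a non-Latin script end up empty,
    # so fall back to a placeholder instead of returning ".pdf".
    stem = to_ascii(stem) or fallback
    ext = to_ascii(ext)
    return f"{stem}.{ext}" if ext else stem
```

A Czech name such as "Žluťoučký kůň.jpg" comes out as "Zlutoucky_kun.jpg", whereas a Russian name keeps only its extension and gets the placeholder stem. That matches the "nothing to lose for others" assumption only if losing the original name is acceptable.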
Comment #3
smk-ka commented:
There is a new module that takes care of transliteration and cleaning of filenames:
Transliterate filenames
It is based on the solution that I mentioned in #1, and is able to process generic file uploads.
Comment #4
drewish commented:
How is this not a duplicate of http://drupal.org/node/110972?
Comment #5
apaderno commented:
I would think that a reply on this topic has already been given in #110972: Implement drupal-standard transliteration. I don't see any reason to keep debating something that has already received a clear answer.
Comment #6
apaderno