Comments

Chi created an issue.

petrovnn’s picture

\core\lib\Drupal\Component\Transliteration\data\ru.php

<?php

/**
 * @file
 * Russian transliteration data for the PhpTransliteration class.
 */

$overrides['ru'] = [
  0x42F => 'JA',
  0x44F => 'ja',
];
andypost’s picture

Version: 8.4.x-dev » 8.6.x-dev
Murz’s picture

Status: Active » Needs review

The problem is not only with the 'я' character but also with several others: ё, ж, й, х, ч, ш, щ, ъ, ы, ь, ю, я.
This was already fixed in the 7.x version of Transliteration, but it seems to have been lost when porting: #357254: Transliteration of Russian letters

So I extended the list with all the needed overrides; the patch is attached.

Murz’s picture

Also note that the transliteration rules from the ru.php file only apply when the Drupal interface language is Russian, so if we try to transliterate a Russian word under a non-Russian interface language, these rules will not be applied.

So we can apply these rules globally by using the /core/lib/Drupal/Component/Transliteration/data/x04.php file. This works with any interface language and transliterates all Russian characters correctly.
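
To illustrate the difference between the two files, here is a minimal sketch of an override-then-default lookup. It is a simplified model, not Drupal's actual PhpTransliteration code: the function name `sketch_replace()`, the array layout, and the default values 'A'/'a' are assumptions for illustration. Only the 'JA'/'ja' overrides come from the ru.php snippet earlier in this issue.

```php
<?php

// Hypothetical generic table for the U+0400 block, keyed by bank (high byte)
// and then by the low byte of the code point. The 'A'/'a' values are
// illustrative stand-ins for an incorrect default, not the real file contents.
$generic = [
  0x04 => [0x2F => 'A', 0x4F => 'a'],
];

// Language-specific overrides keyed by full code point (values from ru.php).
$overrides = [
  'ru' => [0x42F => 'JA', 0x44F => 'ja'],
];

function sketch_replace(int $code, string $langcode, array $generic, array $overrides): string {
  // An override wins, but only for the language it was registered for.
  if (isset($overrides[$langcode][$code])) {
    return $overrides[$langcode][$code];
  }
  // Otherwise fall back to the generic table: the high byte selects the
  // bank, the low byte selects the entry. Unknown characters become '?'.
  return $generic[$code >> 8][$code & 0xFF] ?? '?';
}

echo sketch_replace(0x42F, 'ru', $generic, $overrides), "\n"; // prints "JA"
echo sketch_replace(0x42F, 'en', $generic, $overrides), "\n"; // prints "A"
```

With langcode 'en' the override is never consulted, which is why a fix in the generic table takes effect regardless of the interface language.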

Murz’s picture

Here is another patch that fixes the problem with transliteration of Russian characters globally, not only for the Russian interface language.

andypost’s picture

I bet we should fix both!

Murz’s picture

Here is a combined patch that fixes the problem globally in both files.

andypost’s picture

Assigned: Unassigned » amateescu
Status: Needs review » Reviewed & tested by the community

It looks great to me! Much more natural to parse.
Assigning to the maintainer!
Looking at #2926187: Better Greek transliteration, this probably requires @xjm to commit.

amateescu’s picture

Assigned: amateescu » Unassigned
Status: Reviewed & tested by the community » Needs work

We only need the overrides if a character needs to be transliterated differently in a specific language. See #567832-52: Transliteration in core and the next comment for a similar question and answer.
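
The rule above can be sketched as a small check that drops any override identical to the default. This is a hypothetical helper, not part of Drupal; the function name and all sample values are illustrative assumptions, and both arrays are keyed by full code point for simplicity.

```php
<?php

// Hypothetical helper: keep only the overrides whose replacement differs
// from the default table, since identical entries are redundant.
function sketch_needed_overrides(array $overrides, array $defaults): array {
  // array_diff_assoc() keeps entries of the first array whose key/value
  // pair does not appear in the second array.
  return array_diff_assoc($overrides, $defaults);
}

// Illustrative values only.
$defaults  = [0x42F => 'Ya', 0x44F => 'ya', 0x426 => 'C'];
$candidate = [0x42F => 'Ya', 0x44F => 'ya', 0x426 => 'Ts'];

// Only the entry for 0x426 (Ц) survives, because it differs from the default.
print_r(sketch_needed_overrides($candidate, $defaults));
```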

Murz’s picture

OK, so here is a patch that fixes the errors only in the default values, without touching the overrides.

Murz’s picture

Status: Needs work » Needs review
amateescu’s picture

Assigned: Unassigned » xjm
Status: Needs review » Reviewed & tested by the community

Nice, the patch looks good to me. Passing over to @xjm :)

andypost’s picture

Would be great to see this backported to 8.5

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 12: transliteration-ru_2932249_12.patch, failed testing.

Mixologic’s picture

Status: Needs work » Reviewed & tested by the community

Temporary testbot hiccup.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 12: transliteration-ru_2932249_12.patch, failed testing.

andypost’s picture

Status: Needs work » Reviewed & tested by the community
xjm’s picture


Thank you.

Here is the color diff:

Right now I am looking at the Unicode (is that correct?). One moment...

xjm’s picture

Status: Reviewed & tested by the community » Needs review

Okay I read over the diff carefully looking at the order of the actual characters in:
https://en.wikipedia.org/wiki/List_of_Unicode_characters#Cyrillic

Everything in the 0x10 through 0x40 rows (Russian) looks correct. The other rows appear to be Ukrainian, which I don't read or speak; does anyone else here on the issue? I can try to read up on it but that will take more time. :)

One small question I had about the Russian transliteration. Ц seems to be transliterated as "c". Is that normal/what Russians use when transliterating? As an anglophone I would phonetically write it as "ts".

Thanks!

xjm’s picture


Hm, both seem to be used a lot, with "c" about twice as frequent as "ts". "Cvety" does give me pictures of flowers though whereas "tsvety" seems to be about some rock band. :)

So it looks like (my phonetic assumption notwithstanding) it is usually "c". So ignore my final question; just the "help please with Ukrainian review". :)

xjm’s picture

Status: Needs review » Reviewed & tested by the community

Ah, the only thing changed in the Ukrainian rows is ё, which is missing from the alphabetical order of the Russian, so I think this is correct.
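
For anyone repeating this kind of review, a short script can list code points next to their characters so that a replacement table can be read against Unicode order. This is only a sketch: the function name is hypothetical and it assumes the mbstring extension is available.

```php
<?php

// Hypothetical review aid: return "0xNNN <char>" rows for a code point range.
// Requires the mbstring extension for mb_chr().
function sketch_list_codepoints(int $first, int $last): array {
  $rows = [];
  for ($code = $first; $code <= $last; $code++) {
    $rows[] = sprintf('0x%03X %s', $code, mb_chr($code, 'UTF-8'));
  }
  return $rows;
}

// U+0410 (А) through U+044F (я) hold the basic Russian letters in
// alphabetical order, except Ё/ё, which sit apart at U+0401 and U+0451.
echo implode("\n", sketch_list_codepoints(0x410, 0x44F)), "\n";
echo implode("\n", sketch_list_codepoints(0x401, 0x401)), "\n";
echo implode("\n", sketch_list_codepoints(0x451, 0x451)), "\n";
```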

Back to RTBC. I'll probably commit this later today.

xjm’s picture

Title: Incorrect transliteration of some cyrillic characters » Incorrect transliteration of some Russian cyrillic characters

Retitling since I don't think we reviewed the rest of the Ukrainian. :)

Chi’s picture

"tsvety" seems to be about some rock band. :)

On your screenshot that band is also referenced as "The flowers".

'c' and 'ts' are used interchangeably. I propose we stick to 'ts' as we did it in Drupal 7.

Chi’s picture

I propose we stick to 'ts' as we did it in Drupal 7.

Never mind, Drupal 7 actually uses 'c'.

andypost’s picture

This fix is for the common Russian translit.
Ц is mostly written as C (traditionally) but sounds like "ts" (Tsar).
The same applies to Ч, which is written as ch, although most English speakers pronounce it like "tsh" (probably because they hear it as softer than the native "change").

xjm’s picture

Title: Incorrect transliteration of some Russian cyrillic characters » Incorrect transliteration of some Russian Cyrillic characters

Oops, looks like I forgot to come back to this issue. :)

Thanks @andypost and @Chi, makes sense.

Fixing title capitalization and saving issue credit. I thought about whether this might just be a normal bug, but even the А vs. Я by itself is pretty disorienting, so I've kept it as major.

  • xjm committed cbac738 on 8.6.x
    Issue #2932249 by Murz, xjm, andypost, Chi: Incorrect transliteration of...

  • xjm committed 28a5c91 on 8.5.x
    Issue #2932249 by Murz, xjm, andypost, Chi: Incorrect transliteration of...
xjm’s picture

Version: 8.6.x-dev » 8.5.x-dev
Assigned: xjm » Unassigned
Status: Reviewed & tested by the community » Fixed

Committed and pushed to 8.6.x. Thanks! I also backported it to 8.5.x as a major bugfix.

Anonymous’s picture

#20: ❤️ xjm writing in Russian, awesome!

#25: This would help to eliminate a lot of illiterate mistakes, e.g. "буцы/бутсы -> butsy". But I completely agree with #26/#27. For example, on the site http://translit-online.ru/ you can get 240 different combinations, and this is not the limit 😱 So it's better to focus on one popular option.

Personally, the most controversial one for me is й -> y instead of j. But after long and painful internal arguments, I agree that y is preferable 🙏🏻

#28: Absolutely. I had a rather amusing embarrassment when, on a page listing employee names, the name "Яна (Янина)" was displayed as "Ana", even though these are two different female names. And "Ана" is itself a misspelling (the correct form is "Анна") 😯

Now translit works fine! Many thanks!

xjm’s picture

Adding credit for @amateescu as well for the review in #11 (thanks @amateescu)!

  • xjm committed 5fa7932 on 8.5.x
    Revert "Issue #2932249 by Murz, xjm, andypost, Chi: Incorrect...

  • xjm committed 861fa35 on 8.6.x
    Revert "Issue #2932249 by Murz, xjm, andypost, Chi: Incorrect...

  • xjm committed a4ee1b1 on 8.6.x
    Issue #2932249 by Murz, xjm, andypost, Chi, amateescu: Incorrect...

  • xjm committed 8cb7dfb on 8.5.x
    Issue #2932249 by Murz, xjm, andypost, Chi, amateescu: Incorrect...
xjm’s picture

(The revert and recommit is to add @amateescu to the commit message.)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.