Transliteration causes 2 capital letters at the beginning of a word [#3000630]

Comment	File	Size	Author
#48	3000630-48.patch	4.81 KB	krzysztof domański
#45	3000630-45.patch	4.81 KB	krzysztof domański
#45	3000630-43-45.txt	888 bytes	krzysztof domański
#43	core_3000630-43.patch	4.74 KB	krzysztof domański
#40	core_3000630-40.patch	7.66 KB	krzysztof domański
#40	interdiff-28-40.txt	7.33 KB	krzysztof domański
#28	drupal-transliteration-causes-2-capital-letters-3000630-14.patch	4.33 KB	scott_euser
#20	core_transliteration_unknown_unicode_test.txt	1016 bytes	krzysztof domański
#14	drupal-transliteration-causes-2-capital-letters-3000630-14.patch	4.33 KB	scott_euser
#14	interdiff-3000630-12-14.txt	4.23 KB	scott_euser
#12	drupal-transliteration-causes-2-capital-letters-3000630-12.patch	2.76 KB	scott_euser
#12	interdiff-3000630-9-12.txt	3.78 KB	scott_euser
#9	drupal-transliteration-causes-2-capital-letters-3000630-9.patch	2.98 KB	scott_euser
#9	interdiff-3000630-7-9.txt	2.1 KB	scott_euser
#7	drupal-transliteration-causes-2-capital-letters-3000630-7.patch	2.34 KB	scott_euser
#5	drupal-transliteration-causes-2-capital-letters-3000630-5.patch	1.28 KB	scott_euser

Comment #1

18 September 2018 at 15:04

APolitsin created an issue. See original summary.

Comment #2

andypost

he/him

Russian

commented 18 September 2018 at 16:14

Title:	2 capital letters at the beginning of a word	» Transliteration causes 2 capital letters at the beginning of a word
Issue summary:	View changes
Issue tags:	-russian	+Needs tests

Not sure how to fix this properly; it looks like it needs special processing to "camelize" uppercased letters.

Comment #3

scott_euser commented 22 September 2018 at 13:19

Status:

Active

» Needs review

This could use some feedback. I think this is the most appropriate approach, but perhaps someone has a better idea:

Check whether the string is mixed case; otherwise leave it as is (e.g. all caps stay all caps, all lowercase stays all lowercase).
If it is, and the transliteration of a single character results in multiple characters AND the original character was uppercase, ucfirst it.
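A minimal sketch of these two rules, for illustration only. It assumes a tiny hypothetical lookup table (DEMO_MAP) in place of Drupal's real transliteration data, and the function name demo_transliterate_string() is made up. It uses the mb_* case comparison that later patches in this thread switched to, rather than ctype_*.

```php
<?php
// Hypothetical mini table for illustration only; Drupal ships its own data
// files and these entries are assumptions.
const DEMO_MAP = [
  'Ш' => 'SH', 'ш' => 'sh',
  'т' => 't', 'р' => 'r', 'и' => 'i', 'х' => 'kh',
  'к' => 'k', 'о' => 'o', 'д' => 'd',
];

function demo_transliterate_string(string $string): string {
  // Rule 1: only mixed-case strings get special treatment; all-caps and
  // all-lowercase strings are left exactly as the table produces them.
  $mixed_case = mb_strtolower($string) !== $string
    && mb_strtoupper($string) !== $string;

  $result = '';
  foreach (preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY) as $character) {
    $to_add = DEMO_MAP[$character] ?? $character;
    // Rule 2: an uppercase source character that expands to several letters
    // is emitted as "Xx" rather than "XX".
    if ($mixed_case && strlen($to_add) > 1 && mb_strtoupper($character) === $character) {
      $to_add = ucfirst(strtolower($to_add));
    }
    $result .= $to_add;
  }
  return $result;
}

echo demo_transliterate_string('Штрихкод'), PHP_EOL; // Shtrikhkod
```

With this table, 'штрихкод' comes out as 'shtrikhkod' and an all-caps input such as 'ШШ' keeps 'SHSH', which is the "leave as is" part of the first rule.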

Patch attached. Tests needed.

Comment #4

scott_euser commented 22 September 2018 at 13:20

... and the patch

Comment #5

scott_euser commented 22 September 2018 at 13:24

Status	File	Size
new	drupal-transliteration-causes-2-capital-letters-3000630-5.patch	1.28 KB

There was an issue uploading the patch; trying again with a renamed file.

Comment #6

scott_euser commented 22 September 2018 at 13:25

Issue tags:

+DistributedSprintUK18

Comment #7

scott_euser commented 22 September 2018 at 13:35

Issue tags:

-Needs tests

Status	File	Size
new	drupal-transliteration-causes-2-capital-letters-3000630-7.patch	2.34 KB

Now with a unit test.

Comment #8

vijaycs85

London, UK

commented 22 September 2018 at 13:43

Overall, looks good @scott_euser. Just one minor comment:

+++ b/core/lib/Drupal/Component/Transliteration/PhpTransliteration.php
@@ -107,6 +107,10 @@ public function removeDiacritics($string) {
+    // String is mixed case if not all uppercase and not all lowercase.

@@ -126,6 +130,12 @@ public function transliterate($string, $langcode = 'en', $unknown_character = '?
+      // If this is a capitalised letter of a mixed case word, only capitalise
+      // the first letter and lowercase any subsequent letters.

+++ b/core/tests/Drupal/Tests/Component/Transliteration/PhpTransliterationTest.php
@@ -185,4 +185,19 @@ public function testSafeInclude() {
+    // Test with a mixed case word where a single character results in multiple
+    // and the single character was originally capitalised. The result of the
+    // below should be 'Shtrikhkod' not 'SHtrikhkod'.

Great comments. Very easy to understand what's going on.

+++ b/core/tests/Drupal/Tests/Component/Transliteration/PhpTransliterationTest.php
@@ -185,4 +185,19 @@ public function testSafeInclude() {
+    $input = 'Штрихкод';
+    $expected_output = 'Shtrikhkod';

This could use a dataProvider to cover multiple datasets.

Comment #9

scott_euser commented 22 September 2018 at 14:01

Status	File	Size
new	interdiff-3000630-7-9.txt	2.1 KB
new	drupal-transliteration-causes-2-capital-letters-3000630-9.patch	2.98 KB

Thanks for the feedback! I have switched it to use a data provider matching the style of ::testRemoveDiacritics(). As a result, I have moved the comment explanation of the test above the method.

Comment #10

vijaycs85

London, UK

commented 22 September 2018 at 14:07

Status:

Needs review

» Reviewed & tested by the community

Looks great!

Comment #11

longwave

he/him

English

UK

commented 22 September 2018 at 21:01

Status:

Reviewed & tested by the community

» Needs review

+++ b/core/lib/Drupal/Component/Transliteration/PhpTransliteration.php
@@ -107,6 +107,10 @@ public function removeDiacritics($string) {
+    $mixed_case = (!ctype_lower($string) && !ctype_upper($string));

Do we have to consider strings that contain characters other than letters? For example $mixed_case is true for the strings "a " and "A1", but these are not strictly mixed case.
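The concern can be reproduced by running the quoted expression on its own. In the small sketch below, the wrapper name patch9_mixed_case() is hypothetical. ctype_lower() and ctype_upper() only return true when every byte is a lowercase (or uppercase) letter, so a single space or digit makes both return false, and the string is then treated as mixed case.

```php
<?php
// The expression from the patch as quoted above, wrapped for testing.
function patch9_mixed_case(string $string): bool {
  return !ctype_lower($string) && !ctype_upper($string);
}

var_dump(patch9_mixed_case('a '));  // bool(true)  - space is not a lowercase letter
var_dump(patch9_mixed_case('A1'));  // bool(true)  - digit is not an uppercase letter
var_dump(patch9_mixed_case('ab'));  // bool(false)
var_dump(patch9_mixed_case('AB'));  // bool(false)
```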

+++ b/core/tests/Drupal/Tests/Component/Transliteration/PhpTransliterationTest.php
@@ -185,4 +185,40 @@ public function testSafeInclude() {
+  public function testTransliterationMixedCase($original, $expected) {

While the comments are good I am not sure this needs its own test method, I think this can just be folded into providerTestPhpTransliteration().

Comment #12

scott_euser commented 23 September 2018 at 12:09

Status	File	Size
new	interdiff-3000630-9-12.txt	3.78 KB
new	drupal-transliteration-causes-2-capital-letters-3000630-12.patch	2.76 KB

Thanks for the thorough look! I have updated the patch to account for single character words and words with numbers mixed in and added that to the tests. Needed to convert to using mb_strtoupper and mb_strtolower comparison to handle the new cases.
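The sketch below uses the mb_strtolower()/mb_strtoupper() comparison, taken from the expression in this patch as quoted in #13. The wrapper name patch12_mixed_case() is hypothetical. It shows how "a ", "A1", a single Cyrillic letter and the Шоссе examples are classified.

```php
<?php
// Drop ASCII digits, then require more than one remaining byte that is
// neither all-lowercase nor all-uppercase.
function patch12_mixed_case(string $string): bool {
  $alpha_string = preg_replace('/\d/', '', $string);
  return strlen($alpha_string) > 1
    && mb_strtolower($alpha_string) !== $alpha_string
    && mb_strtoupper($alpha_string) !== $alpha_string;
}

foreach (['a ', 'A1', 'Ш', 'Шоссе', 'ШОССЕ'] as $sample) {
  echo $sample, ': ', patch12_mixed_case($sample) ? 'mixed' : 'not mixed', PHP_EOL;
}
```

strlen() counts bytes, so a single Cyrillic letter such as "Ш" has length 2 and passes the "> 1" guard. It is still reported as not mixed because it equals its own uppercase form.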

I also combined the tests without losing the detailed comments. A similar long comment explaining 3- and 4-byte characters already existed within the array, so I followed that style.

If you have a chance to re-review, it would be greatly appreciated.

Comment #13

krzysztof domański

Poland

commented 23 September 2018 at 16:32

Status:

Needs review

» Needs work

The code below checks whether the whole string is mixed case, so "Щастие ЩЩЩ" returns "Schastie SchSchSch". IMO it should return "Schastie SCHSCHSCH".

public function transliterate($string, $langcode = 'en', $unknown_character = '?', $max_length = NULL) {
  $result = '';
  $length = 0;

  // String is mixed case if it consists of both uppercase and lowercase
  // letters. To accurately check this, remove any numbers and check that
  // remaining characters are not all uppercase and not all lowercase.
  $alpha_string = preg_replace('/\\d/', '', $string);
  $mixed_case = (strlen($alpha_string) > 1 && mb_strtolower($alpha_string) !== $alpha_string && mb_strtoupper($alpha_string) !== $alpha_string);

  // Split into Unicode characters and transliterate each one.
  foreach (preg_split('//u', $string, 0, PREG_SPLIT_NO_EMPTY) as $character) {
    $code = self::ordUTF8($character);
    // (...)
  }
  // (...)
}

We should check whether each individual word is mixed case. The test also needs something like this:
['bg', 'Щастие ЩЩЩ', 'Schastie SCHSCHSCH'],
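The sketch below checks case per word rather than per string. The lookup table is hypothetical and was chosen only so that it reproduces the expected output quoted above. The word/whitespace split borrows the idea used later in #43. None of this is the code from patch #14.

```php
<?php
// Hypothetical mini table; the real mappings live in Drupal's data files.
const BG_DEMO_MAP = [
  'Щ' => 'SCH', 'щ' => 'sch', 'а' => 'a', 'с' => 's',
  'т' => 't', 'и' => 'i', 'е' => 'e',
];

function demo_word_is_mixed_case(string $word): bool {
  return strlen($word) > 1
    && mb_strtolower($word) !== $word
    && mb_strtoupper($word) !== $word;
}

function demo_transliterate_per_word(string $string): string {
  $result = '';
  // Tokens are runs of non-whitespace or runs of whitespace, so every word
  // gets its own mixed-case check.
  preg_match_all('/\S+|\s+/u', $string, $matches);
  foreach ($matches[0] as $token) {
    $mixed_case = demo_word_is_mixed_case($token);
    foreach (preg_split('//u', $token, -1, PREG_SPLIT_NO_EMPTY) as $character) {
      $to_add = BG_DEMO_MAP[$character] ?? $character;
      if ($mixed_case && strlen($to_add) > 1 && mb_strtoupper($character) === $character) {
        $to_add = ucfirst(strtolower($to_add));
      }
      $result .= $to_add;
    }
  }
  return $result;
}

echo demo_transliterate_per_word('Щастие ЩЩЩ'), PHP_EOL; // Schastie SCHSCHSCH
```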

Comment #14

scott_euser commented 23 September 2018 at 17:20

Status:

Needs work

» Needs review

Status	File	Size
new	interdiff-3000630-12-14.txt	4.23 KB
new	drupal-transliteration-causes-2-capital-letters-3000630-14.patch	4.33 KB

Makes sense, thank you for reviewing! I have added that, plus coverage for multiple words with different cases, some with punctuation.

Comment #15

krzysztof domański

Poland

commented 24 September 2018 at 05:21

Status:

Needs review

» Needs work

The patch #14 does not respect $max_length.

foreach (preg_split('//u', $word, 0, PREG_SPLIT_NO_EMPTY) as $character) {

  // (...)

  // Check if this exceeds the maximum allowed length.
  if (isset($max_length)) {
    $length += strlen($to_add);
    if ($length > $max_length) {
      // There is no more space.
      $results = array_filter($results);
      return implode(' ', $results);
    }
  }
}

The code above is now in the inner loop, so it only checks the length of individual words.

For example, say $max_length is 20 and our string has 200 letters made up only of words shorter than 20 letters. The string will never be trimmed inside this loop, and after joining the words we get a result much longer than 20 letters (approximately 200).

Comment #16

scott_euser commented 24 September 2018 at 07:25

Status:

Needs work

» Needs review

Thanks for rechecking. Apologies for questioning, but have you actually tried that? $length is defined outside the foreach loop over words, and the spaces between words are now also added to the length.
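To illustrate the point about length (which #17 confirms), here is a minimal sketch using a hypothetical helper, demo_join_with_limit(). A single $length counter is declared outside the word loop, and the spaces between words are counted too. This caps the whole result even when every word is shorter than the limit. It shows only the counting idea, not the patch code.

```php
<?php
function demo_join_with_limit(array $words, ?int $max_length): string {
  $result = '';
  // One counter for the whole string, declared outside the word loop.
  $length = 0;
  foreach ($words as $index => $word) {
    $characters = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    if ($index > 0) {
      // The space between words counts toward the limit as well.
      array_unshift($characters, ' ');
    }
    foreach ($characters as $to_add) {
      $length += strlen($to_add);
      if ($max_length !== null && $length > $max_length) {
        // No more space: stop both loops by returning.
        return $result;
      }
      $result .= $to_add;
    }
  }
  return $result;
}

// Ten 5-letter words joined with spaces would be 59 bytes; the shared
// counter stops the output at 20.
$words = array_fill(0, 10, 'abcde');
echo demo_join_with_limit($words, 20), PHP_EOL; // abcde abcde abcde ab
```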

There is also test coverage for max length and it verifies that it is working.

Perhaps you can describe the steps you took to get it to incorrectly handle max length?

Comment #17

krzysztof domański

Poland

commented 24 September 2018 at 07:38

@scott_euser You're right. It respects $max_length. Apologies for incorrect comment #15.

Comment #18

krzysztof domański

Poland

commented 24 September 2018 at 10:29

Status:

Needs review

» Needs work

I tested patch #14 with $unknown_character.

In some cases, unexpected results are returned. When an unknown character is between two spaces, everything is correct. When an unknown character is next to a "normal" character, however, the data is truncated.

For example, the expected value for 'Hel' . $unknown_character . 'o World' is "Hel?o World", but it returns "H World". It seems to always cut the string after the first letter.

How to reproduce:

use Drupal\Component\Transliteration\PhpTransliteration;

$transliteration = new PhpTransliteration();
$unknown_character = chr(0x80);

// Without any unknown character
// "Hello World" - expected
// "Hello World" - return
$hello_1 = 'Hello World';
$test_1 = $transliteration->transliterate($hello_1);

// Unknown character between two spaces
// "Hello ? World" - expected
// "Hello ? World" - return
$hello_2 = 'Hello ' . $unknown_character . ' World';
$test_2 = $transliteration->transliterate($hello_2);

// Unknown character between two "normal" characters
// "Hel?o World" - expected
// "H World" - return
$hello_3 = 'Hel' . $unknown_character . 'o World';
$test_3 = $transliteration->transliterate($hello_3);

// Unknown character between one space and one "normal" character
// "Hell? World" - expected
// "H World" - return
$hello_4 = 'Hell' . $unknown_character . ' World';
$test_4 = $transliteration->transliterate($hello_4);

The tests also need cases similar to the following. I think we need other cases too.

['en', 'Hello ' . $unknown_character . ' World', 'Hello ? World'],
['en', 'Hel' . $unknown_character . 'o World', 'Hel?o World'],
['en', 'Hell' . $unknown_character .' World', 'Hell? World'],
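For reference, chr(0x80) on its own is not valid UTF-8, because 0x80 is a continuation byte and cannot start a character. These test inputs are therefore deliberately malformed. The small sketch below confirms this with mb_check_encoding(). It also shows that preg_split() with the /u modifier returns false for such input rather than an array, which is worth keeping in mind for any code that splits strings this way.

```php
<?php
$samples = [
  'Hello World',
  'Hello ' . chr(0x80) . ' World',
  'Hel' . chr(0x80) . 'o World',
  'Hell' . chr(0x80) . ' World',
];

foreach ($samples as $sample) {
  // bool(true) for the first sample, bool(false) for the three with 0x80.
  var_dump(mb_check_encoding($sample, 'UTF-8'));
}

// With /u, PCRE rejects the invalid subject and preg_split() returns false.
var_dump(preg_split('//u', $samples[2], -1, PREG_SPLIT_NO_EMPTY)); // bool(false)
```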

Comment #19

krzysztof domański

Poland

commented 24 September 2018 at 10:41

I just noticed that my example uses the same variable name as the transliterate() argument:
transliterate($string, $langcode = 'en', $unknown_character = '?', $max_length = NULL)

Please rename this or use chr(0x80).

['en', 'Hello ' . chr(0x80) . ' World', 'Hello ? World'],
['en', 'Hel' . chr(0x80) . 'o World', 'Hel?o World'],
['en', 'Hell' . chr(0x80) .' World', 'Hell? World'],

Comment #20

krzysztof domański

Poland

commented 24 September 2018 at 13:14

Status:

Needs work

» Needs review

Status	File	Size
new	core_transliteration_unknown_unicode_test.txt	1016 bytes

After I reset the code (i.e. without patch #14), I ran the test with the following cases.

// Illegal/unknown unicode.
['en', chr(0xF8) . chr(0x80) . chr(0x80) . chr(0x80) . chr(0x80), '?'],
['en', 'Hello ' . chr(0x80) . ' World', 'Hello ? World'],
['en', 'Hel' . chr(0x80) . 'o World', 'Hel?o World'],
['en', 'Hell' . chr(0x80) .' World', 'Hell? World'],

It ended with a failure, so the problem existed before this patch. IMO patch #14 looks good.

3) Drupal\Tests\Component\Transliteration\PhpTransliterationTest::testPhpTransliteration with data set #15 ('en', 'Hell▒ World', 'Hell? World')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-'Hell? World'
+'H'

Now I'm not sure: should we create a separate issue (unknown Unicode characters are not transliterated correctly) or continue in this one?

Comment #21

scott_euser commented 24 September 2018 at 15:58

Hi Krzysztof Domański,

Thank you for reviewing further and confirming the max length behaviour. Yes, let's keep the scope to a single character resulting in 2 characters. Any other transliteration problems should go in a new issue, since this code isn't causing them.

So if you are then happy with it, can you set it back to RTBC?

Thanks!
Scott

Comment #22

krzysztof domański

Poland

commented 24 September 2018 at 16:58

Status:	Needs review	» Reviewed & tested by the community
Related issues:		+#3001997: Transliteration a string containing an unknown character (e.g. 0x80) is not valid

Patch #14 looks good!

I opened a new issue for the separate problem: Transliteration a string containing an unknown character (e.g. 0x80) is not valid.

Comment #23

scott_euser commented 24 September 2018 at 17:21

Sounds good thanks! I'll see if I can take a look at it and see what's going wrong.

Comment #24

imyaro commented 26 September 2018 at 04:02

The patch works well, but why did you use the word "Щастие"? It is a little confusing because it looks like a Russian word written with a mistake (like "Happinez" instead of "Happiness"). Perhaps it would be better to change it?

We can use something like "Шина" (Shina) / "Шоссе" (Shosse) instead. And as the last phrase "Шла Саша по ШОССЕ" (Shla Sasha po SHOSSE).

Comment #25

scott_euser commented 26 September 2018 at 06:18

Hi zvse,

Thanks for the feedback! Actually, it's the correct spelling of "happiness" in Bulgarian (and 'bg' is specified as the source language in the test). Hope that clears things up!

Scott

Comment #26

11 October 2018 at 12:28

Status:

Reviewed & tested by the community

» Needs work

The last submitted patch, 14: drupal-transliteration-causes-2-capital-letters-3000630-14.patch, failed testing.

Comment #27

scott_euser commented 11 October 2018 at 18:15

Status:

Needs work

» Reviewed & tested by the community

It seems a re-run of the tests was triggered, but it failed to complete and stopped partway through. Re-running now and setting the status back to where it was.

Comment #28

scott_euser commented 11 October 2018 at 18:16

Status	File	Size
new	drupal-transliteration-causes-2-capital-letters-3000630-14.patch	4.33 KB

Comment #29

catch

he/him

English

commented 20 November 2018 at 11:58

Status:

Reviewed & tested by the community

» Fixed

Committed and pushed bb7fb6a3dd to 8.7.x and 35c3d18ae0 to 8.6.x. Thanks!

Comment #30

20 November 2018 at 11:58

catch committed bb7fb6a on 8.7.x

Issue #3000630 by scott_euser, Krzysztof Domański, APolitsin, vijaycs85...

Comment #31

20 November 2018 at 11:58

catch committed 35c3d18 on 8.6.x

Issue #3000630 by scott_euser, Krzysztof Domański, APolitsin, vijaycs85...

Comment #32

scott_euser commented 21 November 2018 at 07:35

Thanks!

Comment #33

tacituseu commented 22 November 2018 at 18:39

This might have introduced intermittent test failures (trailing spaces?):
https://www.drupal.org/pift-ci-job/1126573

1) Drupal\Tests\Core\Transliteration\PhpTransliterationTest::testPhpTransliterationWithAlter with data set #1 ('zz', '@Sqnz3\)} ', '@Sqnz3\)} ')
'@Sqnz3\)} ' transliteration to '@Sqnz3\)}' is identical to '@Sqnz3\)} ' for language 'zz' in service instance.
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-@Sqnz3\)} 
+@Sqnz3\)}

/var/www/html/core/tests/Drupal/Tests/Core/Transliteration/PhpTransliterationTest.php:56

https://www.drupal.org/pift-ci-job/1127403

1) Drupal\Tests\Component\Transliteration\PhpTransliterationTest::testPhpTransliteration with data set #0 ('en', 'XM'?Gj|P' ', 'XM'?Gj|P' ')
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-'XM'?Gj|P' '
+'XM'?Gj|P''

/var/www/html/core/tests/Drupal/Tests/Component/Transliteration/PhpTransliterationTest.php:93

Comment #34

krzysztof domański

Poland

commented 6 January 2019 at 08:58

I added a new issue #3015684: Protect transliteration so that it does not trim whitespace.

Comment #35

krzysztof domański

Poland

commented 18 January 2019 at 22:39

-- edited --

Comment #36

krzysztof domański

Poland

commented 18 January 2019 at 22:40

-- edited --

Comment #37

24 November 2018 at 00:24

alexpott committed ce031c3 on 8.7.x

Revert "Issue #3000630 by scott_euser, Krzysztof Domański, APolitsin,...

Comment #38

24 November 2018 at 00:24

alexpott committed e8e94d4 on 8.6.x

Revert "Issue #3000630 by scott_euser, Krzysztof Domański, APolitsin,...

Comment #39

alexpott

he/they

English

🇪🇺🌍

commented 24 November 2018 at 00:26

Status:

Fixed

» Needs work

This change definitely should not cause random test failures (#3015802: Random fail in \Drupal\Tests\Core\Transliteration\PhpTransliterationTest). Also, I think transliteration should not affect spacing.
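The sketch below shows how a split/array_filter()/implode(' ') round trip changes spacing. The helper is hypothetical. Only the array_filter() + implode(' ') part comes from the snippet quoted in #15; splitting with explode(' ') is an assumption about how the words were produced. A trailing space is dropped, which matches the '@Sqnz3\)} ' failure in #33. The random test strings only sometimes end in a space, which may be why the failure was intermittent.

```php
<?php
function demo_rejoin_words(string $string): string {
  // Assumed split step; the thread does not show it.
  $words = explode(' ', $string);
  // As in the #15 snippet: empty entries are removed, then words are
  // glued back together with single spaces.
  $words = array_filter($words);
  return implode(' ', $words);
}

var_dump(demo_rejoin_words('@Sqnz3\)} ')); // string(9) "@Sqnz3\)}"
var_dump(demo_rejoin_words('a  b'));       // string(3) "a b"
```

Without a callback, array_filter() also drops the string "0" because it is falsy in PHP, so a word consisting of just "0" would disappear as well.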

Comment #40

krzysztof domański

Poland

commented 18 January 2019 at 22:33

Status:

Needs work

» Needs review

Status	File	Size
new	interdiff-28-40.txt	7.33 KB
new	core_3000630-40.patch	7.66 KB

Thanks for the revert! Here is a new patch.

Comment #41

krzysztof domański

Poland

commented 18 January 2019 at 22:43

This code is unnecessary:

// String is mixed case if it consists of both uppercase and lowercase
// letters. To accurately check this, remove any numbers and check that
// remaining characters are not all uppercase and not all lowercase.
$alpha_string = preg_replace('/\\d/', '', $value);
$mixed_case = (strlen($alpha_string) > 1 && mb_strtolower($alpha_string) !== $alpha_string && mb_strtoupper($alpha_string) !== $alpha_string);

A shorter version is enough:

// String is mixed case if it consists of both uppercase and lowercase
// letters.
$mixed_case = (strlen($value) > 1 && mb_strtolower($value) !== $value && mb_strtoupper($value) !== $value);
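The sketch below compares the two checks side by side; the function names are hypothetical. Digits map to themselves under mb_strtolower() and mb_strtoupper(), so stripping them first does not change the outcome of the two comparisons. The samples below give the same answer either way.

```php
<?php
// Check with digits stripped first (as in the earlier patch).
function mixed_case_with_strip(string $value): bool {
  $alpha_string = preg_replace('/\d/', '', $value);
  return strlen($alpha_string) > 1
    && mb_strtolower($alpha_string) !== $alpha_string
    && mb_strtoupper($alpha_string) !== $alpha_string;
}

// Simplified check proposed above.
function mixed_case_simple(string $value): bool {
  return strlen($value) > 1
    && mb_strtolower($value) !== $value
    && mb_strtoupper($value) !== $value;
}

foreach (['A1', '1a', '12', 'Ab1', 'Шоссе', 'ШОССЕ', 'шоссе', 'Ш'] as $sample) {
  $same = mixed_case_with_strip($sample) === mixed_case_simple($sample);
  echo $sample, ': ', $same ? 'same' : 'DIFFERENT', PHP_EOL;
}
```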

Comment #42

krzysztof domański

Poland

commented 6 January 2019 at 09:05

Needs a reroll after the fix in #3015992: Not affecting spacing in PhpTransliterationTest.

Comment #43

krzysztof domański

Poland

commented 18 January 2019 at 22:38

Issue summary:

View changes

Status	File	Size
new	core_3000630-43.patch	4.74 KB

12 files were hidden/shown/deleted

Status	File	Size
hidden	drupal-transliteration-causes-2-capital-letters-3000630-5.patch	1.28 KB
hidden	drupal-transliteration-causes-2-capital-letters-3000630-7.patch	2.34 KB
hidden	interdiff-3000630-7-9.txt	2.1 KB
hidden	drupal-transliteration-causes-2-capital-letters-3000630-9.patch	2.98 KB
hidden	interdiff-3000630-9-12.txt	3.78 KB
hidden	drupal-transliteration-causes-2-capital-letters-3000630-12.patch	2.76 KB
hidden	interdiff-3000630-12-14.txt	4.23 KB
hidden	drupal-transliteration-causes-2-capital-letters-3000630-14.patch	4.33 KB
hidden	core_transliteration_unknown_unicode_test.txt	1016 bytes
hidden	drupal-transliteration-causes-2-capital-letters-3000630-14.patch	4.33 KB
hidden	interdiff-28-40.txt	7.33 KB
hidden	core_3000630-40.patch	7.66 KB

1. Match words and runs of whitespace separately, so that mixed case is checked per word (i.e. whether the word contains both uppercase and lowercase letters) and multiple spaces are kept.

// Matches any words and spaces to handle mixed case per word and keep
// multiple spaces.
preg_match_all("/[\S]+|[\s]+/", $string, $matches, PREG_PATTERN_ORDER);

foreach ($matches[0] as $str) {
  if (isset($max_length) && strlen($result) >= $max_length) {
    break;
  }
  // String is mixed case if it consists of both uppercase and lowercase
  // letters.
  $mixed_case = (strlen($str) > 1 && mb_strtolower($str) !== $str && mb_strtoupper($str) !== $str);

  // Split into Unicode characters and transliterate each one.
  foreach (preg_split('//u', $str, 0, PREG_SPLIT_NO_EMPTY) as $unicode_character) {
    if (preg_match('/\s/', $unicode_character)) {
      $to_add = $unicode_character;
    }
    else {
      $to_add = $this->transliterateSingleCharacter($unicode_character, $langcode, $unknown_character);
      // If this is a capitalised letter of a mixed case word, only
      // capitalise the first letter and lowercase any subsequent letters.
      // For example Шоссе should be transliterated into Shosse not SHosse.
      if ($mixed_case && strlen($to_add) > 1 && mb_strtoupper($to_add) === $to_add) {
        $to_add = ucfirst(strtolower($to_add));
      }
    }

    // Check if this exceeds the maximum allowed length.
    $length += strlen($to_add);
    if (isset($max_length) && $length > $max_length) {
      break;
    }

    $result .= $to_add;
  }
}
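Here is a quick check of the tokenizer used above; the wrapper name demo_tokens() is hypothetical. Because the pattern matches runs of non-whitespace and runs of whitespace, every byte ends up in exactly one token. Concatenating the tokens therefore gives back the original string, including multiple and trailing spaces.

```php
<?php
// Same pattern as in the patch excerpt above.
function demo_tokens(string $string): array {
  preg_match_all("/[\S]+|[\s]+/", $string, $matches, PREG_PATTERN_ORDER);
  return $matches[0];
}

$tokens = demo_tokens("Hello  World \t");
print_r($tokens);
// Joining the tokens restores the input exactly.
var_dump(implode('', $tokens) === "Hello  World \t"); // bool(true)
```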

Comment #44

apolitsin commented 6 August 2019 at 11:35

Do we need to add some handling for a Name.Lastname pattern so that email addresses are transliterated nicely?
```diff
+ ['ru', 'Борис.Шпак', 'Boris.Shpak'],
```

Comment #45

krzysztof domański

Poland

commented 7 August 2019 at 06:27

Version:

8.6.x-dev

» 8.7.x-dev

Status	File	Size
new	3000630-43-45.txt	888 bytes
new	3000630-45.patch	4.81 KB

Comment #46

7 August 2019 at 06:27

Version:

8.7.x-dev

» 8.8.x-dev

Drupal 8.7.9 was released on November 6 and is the final full bugfix release for the Drupal 8.7.x series. Drupal 8.7.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.8.0 on December 4, 2019. (Drupal 8.8.0-beta1 is available for testing.)

Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #47

7 August 2019 at 06:27

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Comment #48

krzysztof domański

Poland

commented 22 June 2020 at 15:56

Version:

8.9.x-dev

» 9.1.x-dev

Status	File	Size
new	3000630-48.patch	4.81 KB

3 files were hidden/shown/deleted

Status	File	Size
hidden	core_3000630-43.patch	4.74 KB
hidden	3000630-43-45.txt	888 bytes
hidden	3000630-45.patch	4.81 KB

Re-rolled

Comment #49

22 June 2020 at 15:56

Version:

9.1.x-dev

» 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Comment #50

krzysztof domański

Poland

commented 14 November 2020 at 19:50

Retested. https://www.drupal.org/pift-ci-job/1868691 It was a random test failure #2825845: DST-related test failures in FilterDateTimeTest.

Comment #51

andypost

he/him

Russian

commented 14 November 2020 at 21:30

Status:

Needs review

» Reviewed & tested by the community

Back to RTBC; the timezone-related failure is a separate issue.

Comment #52

alexpott

he/they

English

🇪🇺🌍

commented 6 December 2020 at 12:52

Status:

Reviewed & tested by the community

» Needs work

+++ b/core/lib/Drupal/Component/Transliteration/PhpTransliteration.php
@@ -130,31 +130,71 @@ public function transliterate($string, $langcode = 'en', $unknown_character = '?
+      if (isset($max_length) && strlen($result) >= $max_length) {
+        break;
       }
...
+        if (isset($max_length) && $length > $max_length) {
+          break;
         }

+++ b/core/tests/Drupal/Tests/Component/Transliteration/PhpTransliterationTest.php
@@ -174,6 +174,16 @@ public function providerTestPhpTransliteration() {
+      ['bg', 'Щастие', 'Schastie'],

The first check can be removed if the second one uses break 2;
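Below is a minimal sketch of the break 2; suggestion, applied to a simplified, hypothetical version of the two nested loops (words outside, characters inside). Breaking out of both loops at once means the outer loop no longer needs its own length guard.

```php
<?php
function demo_truncate(array $words, int $max_length): string {
  $result = '';
  $length = 0;
  foreach ($words as $word) {
    foreach (str_split($word) as $to_add) {
      $length += strlen($to_add);
      if ($length > $max_length) {
        // Leave both foreach loops, so the outer loop does not need a
        // separate "strlen($result) >= $max_length" check.
        break 2;
      }
      $result .= $to_add;
    }
  }
  return $result;
}

echo demo_truncate(['abc', ' ', 'def'], 5), PHP_EOL; // abc d
```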

+++ b/core/lib/Drupal/Component/Transliteration/PhpTransliteration.php
@@ -130,31 +130,71 @@ public function transliterate($string, $langcode = 'en', $unknown_character = '?
+        if (preg_match('/\s/', $unicode_character)) {
+          $to_add = $unicode_character;
+        }
+        else {

Is this worth it? I don't think so. A regular expression is probably more expensive than doing...

$code = self::ordUTF8(' ');
if ($code < 0x80) {
  // Already lower ASCII.
  return chr($code);
}

Which is what is effectively happening in HEAD.
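To illustrate: ASCII whitespace is already below 0x80, so it would pass through the existing lower-ASCII fast path unchanged anyway. In the small sketch below, demo_is_lower_ascii() is a hypothetical stand-in for the ordUTF8() check quoted above. It uses ord() on single-byte strings only.

```php
<?php
// A one-byte string below 0x80 is already lower ASCII.
function demo_is_lower_ascii(string $character): bool {
  return strlen($character) === 1 && ord($character) < 0x80;
}

// Each ASCII whitespace character matches /\s/ and also takes the
// lower-ASCII path, so the extra regex branch changes nothing for them.
foreach ([' ', "\t", "\n", "\r"] as $character) {
  var_dump(preg_match('/\s/', $character) === 1 && demo_is_lower_ascii($character)); // bool(true)
}
var_dump(demo_is_lower_ascii('Ш')); // bool(false): multibyte, needs the lookup
```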

+++ b/core/lib/Drupal/Component/Transliteration/PhpTransliteration.php
@@ -130,31 +130,71 @@ public function transliterate($string, $langcode = 'en', $unknown_character = '?
+          $to_add = $this->transliterateSingleCharacter($unicode_character, $langcode, $unknown_character);
...
+  /**
+   * Transliterates single character from Unicode to US-ASCII.
+   *
+   * @param string $character
+   *   A single character.
+   * @param string $langcode
+   *   The language code of the language the character is in.
+   * @param string $unknown_character
+   *   The character to substitute for characters without transliterated
+   *   equivalents.
+   *
+   * @return string
+   *   Non-US-ASCII character transliterated to US-ASCII character, and unknown
+   *   character replaced with $unknown_character.
+   */
+  protected function transliterateSingleCharacter($character, $langcode, $unknown_character) {

I'm not convinced that turning this into a method is correct. We can make it a method if we need the functionality somewhere else; at the moment we don't.

Comment #53

alexpott

he/they

English

🇪🇺🌍

commented 6 December 2020 at 12:58

Also, capitalisation rules are harder than this (German proper nouns, for example). What is supposed to happen for шШш? The code in #48 gives us shShsh, whereas HEAD would give us shSHsh, which I think might be closer to the intent. Not sure.
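The шШш example can be reproduced with a hypothetical two-entry table (ш → sh, Ш → SH); the real mappings live in Drupal's data files. The sketch below applies the rule from the #43/#48 code, where an all-uppercase multi-letter result inside a mixed-case word becomes "Xx", and compares it with a plain lookup.

```php
<?php
// Hypothetical table for illustration only.
const SH_DEMO_MAP = ['ш' => 'sh', 'Ш' => 'SH'];

function demo_sh(string $word, bool $apply_fix): string {
  $mixed_case = strlen($word) > 1
    && mb_strtolower($word) !== $word
    && mb_strtoupper($word) !== $word;
  $result = '';
  foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $character) {
    $to_add = SH_DEMO_MAP[$character] ?? $character;
    // Rule from the patch: uppercase multi-letter output in a mixed-case
    // word is reduced to "Xx", wherever it occurs in the word.
    if ($apply_fix && $mixed_case && strlen($to_add) > 1 && mb_strtoupper($to_add) === $to_add) {
      $to_add = ucfirst(strtolower($to_add));
    }
    $result .= $to_add;
  }
  return $result;
}

echo demo_sh('шШш', true), PHP_EOL;  // shShsh (patch behaviour)
echo demo_sh('шШш', false), PHP_EOL; // shSHsh (plain lookup, as in HEAD)
```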

Comment #54

alexpott

he/they

English

🇪🇺🌍

commented 6 December 2020 at 12:59

Also, regarding #53: the issue title says this is about the beginning of words, but as #53 shows, the current code also affects the middle of words.

Comment #55

6 December 2020 at 12:59

Version:

9.2.x-dev

» 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #56

6 December 2020 at 12:59

Version:

9.3.x-dev

» 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #57

6 December 2020 at 12:59

Version:

9.4.x-dev

» 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #58

6 December 2020 at 12:59

Version:

9.5.x-dev

» 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #59

6 December 2020 at 12:59

Version:

10.1.x-dev

» 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Comment #60

6 December 2020 at 12:59

Version:

11.x-dev

» main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.
