Problem/Motivation

When URL aliases are generated from nodes with Danish characters in the title, Transliteration converts the characters 'å' and 'ø' to 'a' and 'o' instead of 'aa' and 'oe'.

In other words, the node 'På tur med øl' gets the alias 'pa-tur-med-ol' instead of 'paa-tur-med-oel'.
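The difference can be reproduced outside Drupal with a small stand-alone sketch. This is not the PhpTransliteration implementation: the function names (sketch_char, sketch_transliterate, sketch_alias) and the four-entry generic table are assumptions made for illustration, and the alias step is a simplified stand-in for what an alias generator does. The file is assumed to be saved as UTF-8.

```php
<?php

// Convert a code point to a UTF-8 character (Basic Multilingual Plane only).
// json_decode is used here only to avoid depending on the mbstring extension.
function sketch_char(int $codepoint): string {
  return json_decode(sprintf('"\\u%04x"', $codepoint));
}

// Hypothetical, heavily reduced transliteration: language overrides win over
// a generic table that covers only the characters discussed in this issue.
function sketch_transliterate(string $text, array $overrides): string {
  $generic = [0xC5 => 'A', 0xD8 => 'O', 0xE5 => 'a', 0xF8 => 'o'];
  $map = [];
  foreach ($overrides + $generic as $codepoint => $replacement) {
    $map[sketch_char($codepoint)] = $replacement;
  }
  return strtr($text, $map);
}

// Simplified alias building: lowercase, then collapse non-alphanumerics to '-'.
function sketch_alias(string $title, array $overrides): string {
  $ascii = strtolower(sketch_transliterate($title, $overrides));
  return trim(preg_replace('/[^a-z0-9]+/', '-', $ascii), '-');
}

// Same values as the existing Danish data file.
$danish = [0xC5 => 'Aa', 0xD8 => 'Oe', 0xE5 => 'aa', 0xF8 => 'oe'];

$without_overrides = sketch_alias('På tur med øl', []);
$with_overrides = sketch_alias('På tur med øl', $danish);
```

Without the overrides the sketch yields 'pa-tur-med-ol'; with them it yields 'paa-tur-med-oel'.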

A Danish override file exists at core/lib/Drupal/Component/Transliteration/data/dk.php and is read by the PhpTransliteration class. However, 'dk' is not the correct langcode for Danish. It should be 'da'. The file currently contains:

$overrides['dk'] = [
  0xC5 => 'Aa',
  0xD8 => 'Oe',
  0xE5 => 'aa',
  0xF8 => 'oe',
];

Proposed resolution

Renaming the file to da.php and changing the array key from 'dk' to 'da' fixes this issue. However, for backwards compatibility it might be better to just add a new file named 'da.php', since people might have added their own workarounds.
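As a minimal sketch, a da.php data file could look like the following, assuming the four entries are carried over unchanged from dk.php and only the array key changes. The file docblock wording is an assumption.

```php
<?php

/**
 * @file
 * Danish transliteration overrides (sketch of a da.php data file).
 */

// Keyed by the language code 'da'; values copied from the existing dk.php.
$overrides['da'] = [
  0xC5 => 'Aa',
  0xD8 => 'Oe',
  0xE5 => 'aa',
  0xF8 => 'oe',
];
```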

PhpTransliterationTest.php must be corrected accordingly.

Remaining tasks

None

User interface changes

None

API changes

None

Data model changes

None


Comments

nielsstampe created an issue. See original summary.


Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Krzysztof Domański’s picture

Assigned: nielsstampe » Unassigned
Priority: Normal » Major
Status: Active » Needs review

I confirmed that the langcode for Danish is "da" (see attachment). "DK" is the country code for Denmark in core/lib/Drupal/Core/Locale/CountryManager.php, which may have caused the confusion:

'DJ' => t('Djibouti'),
'DK' => t('Denmark'),
'DM' => t('Dominica'),

Quoting the issue summary: "However, for backwards compatibility it might be better to just add a new file named 'da.php', since people might have added their own workarounds."

I think we should leave only one version, but I may be wrong. In the new patch, "dk.php" is deleted.

PhpTransliterationTest.php also needs updating:

['dk', $two_byte, 'A O U Aa Oe aouaaoehello'],
['dk', $random, $random],
Krzysztof Domański’s picture

Issue summary: View changes
Issue tags: +Quick fix
borisson_’s picture

Status: Needs review » Reviewed & tested by the community

I think I agree with the assessment that we should leave the wrong file behind and use @trigger_error to mark it as deprecated. I'm not sure, so I'm setting this to RTBC to bring it to the maintainers' attention.

If we don't want to do the trigger error, this looks good to go.
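For illustration, keeping dk.php with a deprecation notice might look roughly like this. It is only a sketch of the @trigger_error idea: the message text is hypothetical and not taken from Drupal core.

```php
<?php

// Sketch of a dk.php kept only for backwards compatibility.
// The '@' keeps the notice silent unless an error handler collects
// E_USER_DEPRECATED messages. The message wording is an assumption.
@trigger_error("The 'dk' transliteration data file is deprecated. Use the 'da' langcode for Danish instead.", E_USER_DEPRECATED);

$overrides['dk'] = [
  0xC5 => 'Aa',
  0xD8 => 'Oe',
  0xE5 => 'aa',
  0xF8 => 'oe',
];
```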

Krzysztof Domański’s picture

Deleting the wrong "dk.php" file will never break the transliteration, because the following code (core/lib/Drupal/Component/Transliteration/PhpTransliteration.php) checks whether the file exists before including it:

protected function readLanguageOverrides($langcode) {
  // Figure out the file name to use by sanitizing the language code,
  // just in case.
  $file = $this->dataDirectory . '/' . preg_replace('/[^a-zA-Z\-]/', '', $langcode) . '.php';

  // Read in this file, which should set up a variable called $overrides,
  // which will be local to this function.
  if (is_file($file)) {
    include $file;
  }
  if (!isset($overrides) || !is_array($overrides)) {
    $overrides = [$langcode => []];
  }
  $this->languageOverrides[$langcode] = $overrides[$langcode];
}
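The effect of that file check can be tried in isolation. The sketch below copies the logic into a free function (sketch_read_overrides, a hypothetical name) and points it at a temporary directory that contains only a da.php file, which is an assumed setup for the demonstration.

```php
<?php

// Stand-alone copy of the lookup logic, returning the overrides instead of
// storing them on an object. Not the actual Drupal method.
function sketch_read_overrides(string $dataDirectory, string $langcode): array {
  $file = $dataDirectory . '/' . preg_replace('/[^a-zA-Z\-]/', '', $langcode) . '.php';

  // The data file, if present, sets a local variable called $overrides.
  if (is_file($file)) {
    include $file;
  }
  // No file (or no usable data): fall back to an empty override list.
  if (!isset($overrides) || !is_array($overrides)) {
    $overrides = [$langcode => []];
  }
  return $overrides[$langcode];
}

// Temporary data directory holding only da.php.
$dir = sys_get_temp_dir() . '/translit_sketch_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/da.php', "<?php\n\$overrides['da'] = [0xE5 => 'aa', 0xF8 => 'oe'];\n");

$da = sketch_read_overrides($dir, 'da');
// There is no dk.php, so this quietly yields an empty array.
$dk = sketch_read_overrides($dir, 'dk');

// Clean up the temporary files.
unlink($dir . '/da.php');
rmdir($dir);
```

Requesting a langcode without a data file produces an empty override list rather than an error, so only the generic transliteration applies.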

Workarounds that developers may have used so far:

1. Adding a "da.php" file with patch #4 or with a similar custom patch (probably the most common).
2. Extending the PhpTransliteration class in a custom module with the following code:

protected function readLanguageOverrides($langcode) {
  // Figure out the file name to use by sanitizing the language code,
  // just in case.
  $file = $this->dataDirectory . '/' . preg_replace('/[^a-zA-Z\-]/', '', $langcode) . '.php';

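  // Fall back to the misnamed dk.php when no da.php exists.
  if (!is_file($file) && $langcode === 'da') {
    $file = $this->dataDirectory . '/dk.php';
  }

  // Read in this file, which should set up a variable called $overrides,
  // which will be local to this function.
  if (is_file($file)) {
    include $file;
  }
  // dk.php keys its data by 'dk', so expose it under 'da' as well.
  if ($langcode === 'da' && !isset($overrides['da']) && isset($overrides['dk'])) {
    $overrides['da'] = $overrides['dk'];
  }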
  if (!isset($overrides) || !is_array($overrides)) {
    $overrides = [$langcode => []];
  }
  $this->languageOverrides[$langcode] = $overrides[$langcode];
}

What will happen if we remove the wrong file:

In the first case, after a core update (e.g. to 8.7), Composer will report that patch #4 could not be applied. This is safe.
In the second case, one or the other file will be used, so this is also safe.

That is why we probably do not need to keep the wrong file.

Status: Reviewed & tested by the community » Needs work
Krzysztof Domański’s picture

Status: Needs work » Needs review

The last submitted patch, 8: core_transliteration-danish-incorrect-2895315-8-D8.patch, failed testing.

I started a re-test, and it passes.

borisson_’s picture

Status: Needs review » Reviewed & tested by the community

Back to rtbc, with the same remarks as #10.

plach’s picture

Version: 8.6.x-dev » 8.7.x-dev

This should be fixed in the dev branch first.

catch’s picture

Status: Reviewed & tested by the community » Fixed

We should be fine to remove the old file: it will never be found unless someone configures a custom 'dk' language, and it's data rather than an API as such. However, given that, I'm only committing this to 8.7.x so it comes out in a minor release.

Committed 2580766 and pushed to 8.7.x. Thanks!

  • catch committed 2580766 on 8.7.x
    Issue #2895315 by nielsstampe, Krzysztof Domański: Danish characters are...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.