Postponed on #2454829: Configuration translation UI does not support plural sources/targets.

Problem/Motivation

The form for configuring and translating numeric fields has been copied multiple times: NumericField::buildOptionsForm(), TranslateEditForm::buildForm(), PluralVariants::getSourceElement(), and PluralVariants::getTranslationElement(). (That in itself is a problem -- it should be centralized.)

The problem is that the labels used in this form are not clear. Not only that, they are not appropriate for many languages. The reason is that while English, Spanish, etc. have two forms for plural expressions (singular, plural), many other languages have either one form, multiple forms, or two forms that are not "singular" and "plural".

Here's an easy-to-read list of the rules:
http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html
And here's a more definitive set of rules that is not as easy to code or follow, but should be more accurate since it comes from Unicode.org:
http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_r...

And this related issue demonstrates real-world problems that have resulted from our localization teams misunderstanding the labels (the same labels are used on localize.drupal.org). These are the best experts we have in the Drupal community, and they can be taught how to read the labels, yet even they are having difficulty. For ordinary users outside a localization community who are trying to translate their own Drupal site, the problem would be even worse.
#2538142: Some po files have wrong plural translations

So, we need to fix this!

Next: some examples of actual observed problems arising from these labels on localize.drupal.org from that other issue.

Languages with only 1 plural form

In this case the label for the translation is "Singular form". Bad!

So we are seeing output like:

msgid_plural "@count hours"                                      
msgstr[0] "1 сағат"

With this translation, '12 hours' will be translated as '1 сағат'.

Languages with 2 plural forms

For all languages with 2 plural forms, we are currently using these labels:
- Singular form
- Plural form

This is OK for English, Spanish, French, etc. But there are languages where the 1st form is actually for all numbers that end in 1 (Icelandic, for instance), or only for zero (Javanese). So for instance, here's a translation from the current Javanese po file:

msgid_plural "@count comments"                                      
msgstr[0] "1 komèntar"                                               
msgstr[1] "@count komèntar"

In this case the result is that '0 comments' will be translated as '1 komèntar' in Javanese, because the first plural form is used for 0 and the second for all non-zero numbers. This is not correct.
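To make the two failures above concrete, here is a minimal sketch of how a gettext-style lookup picks a msgstr by index and then substitutes the count. The helper `demo_pick_variant()` and the inline index rule are illustrative assumptions, not Drupal's API; the rules used are "always index 0" for a one-form language and "index 0 for zero, index 1 otherwise" for Javanese, as described in this summary.

```php
<?php
// Hypothetical helper (not part of Drupal): pick a translated variant by
// plural index and replace @count with the actual number.
function demo_pick_variant(array $variants, int $index, int $count): string {
  return strtr($variants[$index], array('@count' => (string) $count));
}

// One-form language: the index is always 0, so a hard-coded "1" is wrong.
$one_form_bad = array('1 сағат');
$one_form_good = array('@count сағат');
echo demo_pick_variant($one_form_bad, 0, 12), "\n";   // 1 сағат
echo demo_pick_variant($one_form_good, 0, 12), "\n";  // 12 сағат

// Javanese as described above: index 0 for zero, index 1 otherwise.
$jv_index = function (int $n): int {
  return $n != 0 ? 1 : 0;
};
$jv_bad = array('1 komèntar', '@count komèntar');
echo demo_pick_variant($jv_bad, $jv_index(0), 0), "\n";  // 1 komèntar
echo demo_pick_variant($jv_bad, $jv_index(7), 7), "\n";  // 7 komèntar
```

In both broken cases the translator followed the "Singular form" label and typed a literal 1, which is only safe when the first form really is used for the number 1 alone.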

Languages with more than 2 plural forms

The labels in the UI currently are:

  • 'Singular form'
  • 'First plural form'
  • '2. plural form'
  • '3. plural form'
  • etc.

The problem here is that in many languages, the first form is for something like "Numbers that end in 1", not really singular. Since the English form they are translating will say something like "1 item", it's likely they will put "1 item" in their language as well for the first form, instead of using "@count item" or the equivalent.

So really what is needed are language-specific labels. So for Russian, the labels might be something like:
- Form for numbers ending in 1 but not 11
- Form for numbers ending in 2, 3, 4, but not 12, 13, 14
- Form for all other numbers

Proposed resolution

Add a centralized function or method that generates those labels appropriate for a given language, and translate them in the interface language.

Remaining tasks

Decide. Implement. Review.

User interface changes

We will have appropriate labels for every language.

API changes

None.

Files:
#30 numeric_fieds_form_labels-2499639-30.patch (8.99 KB, maxocub)
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,568 pass(es), 15 fail(s), and 2 exception(s).

Comments

maxocub’s picture

maxocub’s picture

Title: Use better labels for numeric fields translation to multiple plurals langauges » Use better labels for numeric fields translation to multiple plurals languages
Gábor Hojtsy’s picture

Issue tags: -ui +language-ui, +language-config

Not sure those are better labels. In #2449597: Number formatters: Make it possible to configure format_plural on the formatter level @jhodgdon argued the labels overall should be improved.

jhodgdon’s picture

OK... So as things are now, translating plural forms seems like something only a member of the Drupal localization team could do (or someone who is familiar with how it is done there).

For instance, say that I'm a Russian/English speaker. Here's the Russian formula from that plural formulas site:
nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);

So:
- If the number ends in the digit 1, use the first form of the noun, unless it ends in the digits 11.
- If it ends in 2, 3, or 4, use the second form, unless it ends in 12, 13, or 14.
- In other cases, use the third form.
What I don't know is what a Russian speaker would call these forms. But... wouldn't it make sense to let our Russian localization team figure that out? So maybe we need to have strings like this:
t('Singular form'), t('Plural form') ==> for the usual 2-form language case
... not sure what the strings would be for the multi-form cases? I was just going to suggest something like
t('First/singular form'), t('Second/plural form'), t('Third/plural form'), etc. but I do not think that is so good either.
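As a sanity check on the three rules above, the Russian formula can be transcribed into PHP and run against a few numbers. This is only a sketch: the function name `demo_ru_plural_index()` is made up here, and the formula is written with `if` statements rather than copied verbatim, because chained ternaries without parentheses do not behave in PHP the way they do in C.

```php
<?php
// PHP transcription of the gettext formula quoted above (nplurals=3).
// The function name is illustrative only.
function demo_ru_plural_index(int $n): int {
  // Ends in 1, but not in 11.
  if ($n % 10 == 1 && $n % 100 != 11) {
    return 0;
  }
  // Ends in 2, 3 or 4, but not in 12, 13 or 14.
  if ($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20)) {
    return 1;
  }
  // Everything else, including 0 and the teens.
  return 2;
}

foreach (array(0, 1, 2, 5, 11, 12, 21, 22, 101, 111) as $n) {
  echo $n, ' => form ', demo_ru_plural_index($n), "\n";
}
```

Note that 21 and 101 land in the same form as 1, which is why "Singular form" is a misleading label for index 0.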

maxocub’s picture

I made a quick search to find how Russian plural forms are called and found this site: http://www.russianlessons.net/lessons/lesson11_main.php

Maybe it would be clearer to have those kinds of labels (for Russian, as an example):
- Numbers ending in: 1 (but not 11)
- Numbers ending in: 2, 3, 4 (but not 12, 13, 14)
- Numbers ending in: 5, 6, 7, 8, 9, 0 (and 11, 12, 13, 14)

So for Russian, the label 'Singular form' is not quite right, since it's the form for any numbers ending in 1.

I don't know if it would be possible, but we could parse the plural formulas and generate those labels. But I don't know any of those languages, and maybe there are some exceptions that would be hard to put in a label.

jhodgdon’s picture

Would something like this work, making a new function in locale.module:

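function locale_get_plural_form_labels($langcode) {
  // Languages whose plural forms need language-specific labels.
  $exceptions = array(
    'ru' => array(
      t('Form for numbers ending in 1'),
      t('Form for numbers ending in 2, 3, 4'),
      t('Form for other numbers'),
    ),
    // Other exceptions here.
  );

  if (isset($exceptions[$langcode])) {
    return $exceptions[$langcode];
  }

  // Get the number of plural forms for this language.
  $num_forms = \Drupal::translation()->getNumberOfPlurals($langcode);

  if ($num_forms == 2) {
    return array(t('Singular form'), t('Plural form'));
  }

  // Generic labels, up to the 6th plural form.
  $forms = array(t('Singular form'), t('First plural form'), t('Second plural form'), t('Third plural form'), t('Fourth plural form'), t('Fifth plural form'), t('Sixth plural form'));
  // Truncate $forms to $num_forms + 1 array items.
  return array_slice($forms, 0, $num_forms + 1);
}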

That would also centralize the generation of these labels, which seems like a Very Good Thing.

jhodgdon’s picture

So that was a bit rough. I think we'd also need to handle the "only one form" case, for languages without plurals, not sure how we do that in the UI now? Also that last line should truncate to $num_forms items, not $num_forms + 1.

Gábor Hojtsy’s picture

Note that you may be using German to edit a Polish view. So when configuring format plural stuff, you need all the variants for Polish (more than German), because the config you are editing mandates that. That you happen to edit it on a German UI does not matter for the number of fields. It will affect the labels printed on the forms however, which is why your ideas above don't work well (to let the labels be translated as appropriate for the config language). Because in this example, the form labels will be printed in German for the Polish plural configuration :P

jhodgdon’s picture

That's exactly why I think we need a function like #6, actually.

So say my UI language is German, and I'm editing Russian plural forms.

To make the UI, I call locale_get_plural_form_labels('ru'), and it will return:

array( t('Form for numbers ending in 1'), t('Form for numbers ending in 2, 3, 4'), t('Form for other numbers'))

which will take the 3 labels I need for Russian, and translate their text into German.

Right?
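The separation between "which labels" (decided by the language being edited) and "which language the labels are shown in" (decided by the UI language) can be sketched without Drupal. Everything here is a stand-in: `demo_t()` imitates `t()` with a hand-written lookup table, `demo_plural_form_labels()` is a hypothetical name, and the German strings are made up for illustration.

```php
<?php
// Stand-in for t(): look up a UI-language translation and fall back to
// the English source string when none exists.
function demo_t(string $source, array $ui_translations): string {
  return $ui_translations[$source] ?? $source;
}

// Hypothetical label lookup. The label set depends on the language being
// edited; the text of each label depends on the UI language.
function demo_plural_form_labels(string $config_langcode, array $ui_translations): array {
  $labels = array(
    'ru' => array(
      'Form for numbers ending in 1',
      'Form for numbers ending in 2, 3, 4',
      'Form for other numbers',
    ),
  );
  $source = $labels[$config_langcode] ?? array('Singular form', 'Plural form');
  return array_map(function ($label) use ($ui_translations) {
    return demo_t($label, $ui_translations);
  }, $source);
}

// A (made-up) German UI translation table.
$german_ui = array(
  'Singular form' => 'Singularform',
  'Plural form' => 'Pluralform',
);

print_r(demo_plural_form_labels('en', $german_ui));
print_r(demo_plural_form_labels('ru', $german_ui));
```

The Russian call returns three labels regardless of the UI language; they stay in English here only because the sample table has no German entries for them.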

maxocub’s picture

About #7 and languages without plurals, I looked at all the po files (for beta11) and none of them have 'nplurals=1; plural=0;' in the header, but some of them don't even have a 'Plural-Forms:' line (bo, kk, ky, lo, rhg, tr, ug, vi).

Drupal then assumes that those languages have 2 plural forms.

If I add the line 'Plural-Forms: nplurals=1; plural=0;' to one of those po files header and import it, then it won't have a plural form. I then tested it in the translation UI and only one field is displayed (the singular form).

If we want to support languages without plurals, should we add a requirement that those languages include a 'Plural-Forms:' line in their po file, or should we remove the default 'nplurals=2; plural=n>1;'?
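For reference, the behaviour described above (use the header when it has a 'Plural-Forms:' line, otherwise assume two forms) looks roughly like this. This is a sketch, not Drupal's parser: `demo_parse_plural_header()` is a hypothetical helper, it expects already-decoded header text (no surrounding quotes or literal `\n` escapes), and it only extracts the formula as a string without evaluating it.

```php
<?php
// Hypothetical parser: extract nplurals and the formula string from
// decoded .po header text. Returns NULL when there is no Plural-Forms line.
function demo_parse_plural_header(string $header): ?array {
  $pattern = '/^Plural-Forms:\s*nplurals\s*=\s*(\d+)\s*;\s*plural\s*=\s*(.+?);?\s*$/mi';
  if (preg_match($pattern, $header, $m)) {
    return array('nplurals' => (int) $m[1], 'formula' => $m[2]);
  }
  return NULL;
}

// The default mentioned above: 'nplurals=2; plural=n>1;'.
$default = array('nplurals' => 2, 'formula' => 'n>1');

$with_line = "Project-Id-Version: demo\nPlural-Forms: nplurals=1; plural=0;\n";
$without_line = "Project-Id-Version: demo\n";

$a = demo_parse_plural_header($with_line) ?? $default;
$b = demo_parse_plural_header($without_line) ?? $default;

echo $a['nplurals'], "\n";  // 1
echo $b['nplurals'], "\n";  // 2
```

The second case is the one that affects the eight files listed above: without the header line, a language with no plurals is silently treated as having two forms.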

jhodgdon’s picture

Question: do all of the languages that use 2 plurals have the plurals line in their .po file?

If so, it seems like the default should be nplurals=1 for .po files with no information... but I agree it would be better if we had definite information for everything.

Although, people can also create their own languages in the UI, and presumably they would not always have plural forms information when they create it (not sure if they even can), so in that case, given that most languages have 2 plural forms, 2 is a more sensible default.

OK I have no idea what the best default would be :) but yes let's get that info into all of our own .po files?

Gábor Hojtsy’s picture

2 is the most sensible default, which is why that is. Drupal core does not even allow editing the plural forms because that would be a highly technical (under the hood) thing to expose. Only .po file imports may set the plural form (or the l10n_pconfig contrib module exposes this setting with several suggested settings for languages).

Also all .po files on localize.drupal.org/ftp.drupal.org should have a plural form exported. If some are missing, that is a problem. No language is allowed on localize.drupal.org without an explicit plural rule set and that is carried over to the exported .po files, unless there are bugs with that.

maxocub’s picture

The po files I looked at were the beta11 ones on ftp.drupal.org, and yes, some of them are missing the plural forms line.

#11: @jhodgdon: Yes, all of the 2 plural forms languages have the plural line in their po file.

All of the files missing the plural line seem to be for languages without plural forms (except Turkish, for which I found conflicting information):

bo, kk, ky, lo, rhg, tr, ug, vi

(those languages all have no plural on http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html, except Turkish, but on http://cgit.drupalcode.org/l10n_pconfig/tree/l10n_pconfig.module#n204 it does)

I agree that the default should stay as it is and that we should add the information on all po files.

jhodgdon’s picture

Turkish appears to have plurals. Here's a page I found when I searched the internet for "turkish plural": http://en.wikibooks.org/wiki/Turkish/Plural
and there were a bunch of other pages with similar information, so it seems to be accurate.

Also I see Turkish on that pluralforms page and it does also say 2 plural forms there.... Maybe you meant a different language?

maxocub’s picture

I was talking about Turkish.

On http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html it says 2 plural forms,
but on http://cgit.drupalcode.org/l10n_pconfig/tree/l10n_pconfig.module#n204, it says only one (but this one may be outdated)

And for your link in #14, the last lines say:

When a number other than one(two, three, etc.) precedes a noun, the noun remains in singular.

room, two rooms ==> oda, iki oda
hour, three hours ==> saat, üç saat
day, four days ==> gün, dört gün

Confusing...

jhodgdon’s picture

Oh that's interesting. So Turkish has two plural forms for nouns. In format_plural() we're generally putting in "@count [noun]", and in that case we would want it to say "1 oda" for one, "2 oda" for two. But someone could use format_plural() without actually including @count, in theory anyway... and often in English we would not put in the number for singular (presumably also for Turkish), so I think we still would need to allow translators to have 2 forms (singular, plural) for Turkish, right? It looks like Drupal has a bug then in this case.

Gábor Hojtsy’s picture

It is a localization server bug to not include the plural forms for the singular case. Honestly, I don't know how this was not recognized before; I am pretty floored, but well... Opened #2502381: Singular plural formulas are not exported to .po files and proposed a fix there to l10n_server. As for replacing @count in the singular version, Drupal 7 and 8 already do this. See https://api.drupal.org/api/drupal/includes%21common.inc/function/format_... the @count is added to the args at all times and is used for replacement for the singular case too. That the English source string does not include @count for the singular case does not stop Turkish from using it in the "singular" case, which may be their universal case for all. (Turkish on localize.drupal.org is configured to have a single plural form). See http://cgit.drupalcode.org/l10n_pconfig/tree/l10n_pconfig.module#n115 for a list of uses of the $one variant. Localize.drupal.org has a much expanded list of languages now and l10n_pconfig should be updated with current data, but this gives an idea of format distribution among different languages with Drupal.

I think #2502381: Singular plural formulas are not exported to .po files should be resolved on l.d.o, which will make at least future .po files correct with the plural formulas. That is/was a sidetrack for this issue anyway, it should not affect how the core UI works or looks like AFAIS.

Gábor Hojtsy’s picture

#2502381: Singular plural formulas are not exported to .po files is theoretically fixed, rolled out and the .po files are being regenerated. It will take some time to verify since then the files will still need to be synced to ftp.drupal.org where you get them downloaded from. That happens either once or twice a day, I don't remember.

jhodgdon’s picture

Regarding Turkish, I think we should file an issue to discuss (with the Turkish localize.d.o group?) whether the Turkish plural information needs to be changed. I don't know where this issue should be filed? It looks like the defaults in l10n_pconfig are wrong for a start, so I'll file it there and we can move it elsewhere if it should be somewhere else:
#2503057: Turkish may have wrong default plural setup

So now we should probably get back to the discussion of this issue instead of getting sidetracked? :)

maxocub’s picture

Title: Use better labels for numeric fields translation to multiple plurals languages » Use better labels for numeric fields when using a multiple plural forms language
Issue summary: View changes
Mark_L6n’s picture

Some comments:

  1. Noticed that we already have pluralization rules from Symfony: https://api.drupal.org/api/drupal/core%21vendor%21symfony%21translation%21PluralizationRules.php/8
  2. From the point of view of someone in Computational Linguistics, the treatment of plurals in GetText is overly simplistic—no serious NLP program would rely on such basic rules. (The GetText docs even say "Without the input of linguists (which was not available)...") Take-aways from this comment are:
    • Don't need to be perfectionist about each rule
    • Don't get in the way of translators, i.e. don't make the rules too restrictive. An example is Chinese, which commonly has a pluralization rule nplurals=1; plural=0; since Chinese plural noun forms rarely appear. However, they can (see Wikipedia), so a pluralization rule that wouldn't cause translators problems would be nplurals=2; plural=(n > 1);.
  3. As for the question about Russian, it's a good language to illustrate difficulties. If the link above and Wikipedia are accurate, Russian has 6 or more noun cases, each of which has a plural form. If each of the endings are different, you have 6+ singular forms and 6+plural forms of a noun. Already this is too complex for the GetText system. However, we could stipulate that if a noun (or adj+noun) appears in isolation (i.e. not in the context of a larger phrase or sentence), we should use the nominative case for both singular and plural. (Nominative case is used for a noun in subject position, as opposed to direct object or indirect object or object of a preposition position, etc.)
    However, there is a difference between general plurals (i.e. some/many/these things) and numerically-specified plurals (i.e. 2/5/12 things):
    • "If the number, or the last digit of the number is 2, 3 or 4, (example: 22, 42, 103, 4) (but not 12, 13 & 14), then you should use the genitive singular case."
    • But: "If the number ends in any other digit you should use the genitive plural. All the 'teens' (-надцать) fit in to this catagory (11, 12, 13, 14, etc)".

    If we use the genitive singular and the genitive plural in addition to the nominative singular and plural, we have 4 forms. However, the Russian formula above lists nplurals=3;, i.e. just 3 forms. Since the GetText code is based on numbers, it is likely the standard, general plural of a case that is being left out.
    And this is just a simplified example, leaving out other issues mentioned in Wikipedia!
    If we wanted grammatically-accurate labels, we could use:

    • nominative singular
    • nominative plural
    • genitive singular
    • genitive plural

    However, the grammatical terms might not be understood by everybody, and there still is the issue of how to use the nominative plural.

jhodgdon’s picture

Regarding Russian - that is why we try to put things in context and translate phrases/sentences and not nouns in isolation. We do not need to have forms for all the grammatical stuff in Russian. We just need to be able to translate things in context, like "There are @count new comments", where @count could be 0, 1, 2, 3, ... So for Russian we need cases for 0, numbers ending in 1, etc. Not for nominative etc.

Gábor Hojtsy’s picture

Issue summary: View changes
Gábor Hojtsy’s picture

Issue summary: View changes
Gábor Hojtsy’s picture

Status: Postponed » Active
jhodgdon’s picture

OK... So what about the suggestion I made in #6/#8? Meaning:
- Let's say I have a German UI
- And I am translating a plural composite string from English to Russian.
- My UI builder calls a function like the one in #6 locale_get_plural_form_labels('en') for the English side, and locale_get_plural_form_labels('ru') for the Russian side. The English side would return:

array(t('Singular form'), t('Plural form'));

And the Russian side would return something like:

array( t('Form for numbers ending in 1'),
 t('Form for numbers ending in 2, 3, 4'),
 t('Form for other numbers'))

- These return values, since they are passed through t(), would be translated into German.
- My form builder would notice it needs 2 form elements for the English side, and it would split the source string into two and put it there, and it would need 3 form elements for Russian and would put those up.

Would that work?

Gábor Hojtsy’s picture

Are we able to do a finite list of all possible combinations of plural variants for that list and would we need a core release to add new ones?

jhodgdon’s picture

Issue summary: View changes

Ok. Let's see. We lost this link in the issue summary; restoring:
http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html
This is a table of what the plural forms are for most languages.

So scanning that, it looks like we have these variants:

a) 2 forms for singular/plural. This comes in two variants, based on whether 0 is considered singular or plural, but for purposes of *labels*, "Singular form" and "Plural form" (translated) should be fine for both variants.
These are listed in the table with these rules:
nplurals=2; plural=(n > 1);
nplurals=2; plural=(n != 1);

b) Languages with just 1 plural form. In this case we would need to present them with the @count objects thing to translate. These are listed in the table with this rule:
nplurals=1; plural=0;

c) Special case languages with complex rules -- we should get input from the language teams for those languages for what to make the labels say so that they're concise and understandable. These are languages like Arabic, Belarusian, Czech, Russian... There are around 20 of these that would need special cases.

So with the proposed resolution in #27, these cases would be decided and their labels would be put into a function. I guess we could also use a config entity or even a simple config object somehow? But in either case I think we would need to have a Core update to push out changes, like any other string update, unless we can build them into the .po files somehow? I'm not sure how that would work though, because we need to make sure that for each of these cases in a, b, c, the strings get into the pot database and get translated into all the languages. ???
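One way to picture the a/b/c split is a small classifier over the (nplurals, formula) pair. This is a sketch only: `demo_label_group()` is a hypothetical name, and it matches formulas by plain string comparison after stripping whitespace, which is cruder than parsing them.

```php
<?php
// Hypothetical classifier for the three groups listed above. Whitespace
// is removed first so "(n > 1)" and "(n>1)" compare equal.
function demo_label_group(int $nplurals, string $formula): string {
  $normalized = preg_replace('/\s+/', '', $formula);
  if ($nplurals === 1) {
    return 'b';
  }
  $singular_plural = array('(n>1)', '(n!=1)', 'n>1', 'n!=1');
  if ($nplurals === 2 && in_array($normalized, $singular_plural, TRUE)) {
    return 'a';
  }
  // Anything else needs labels written with the language team.
  return 'c';
}

echo demo_label_group(2, '(n != 1)'), "\n";  // a
echo demo_label_group(2, '(n > 1)'), "\n";   // a
echo demo_label_group(1, '0'), "\n";         // b
echo demo_label_group(2, '(n != 0)'), "\n";  // c
```

A two-form rule that is not one of the two singular/plural formulas (the last call) falls into group c under this scheme, so it would not get the "Singular form" / "Plural form" labels by accident.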

maxocub’s picture

File size: 8.99 KB
FAILED: [[SimpleTest]]: [PHP 5.5 MySQL] 98,568 pass(es), 15 fail(s), and 2 exception(s).

Here's a first patch to start with.
The labels are placeholders; I just wanted to see how many exceptions there were.

maxocub’s picture

And some screenshots:

* NumericField:
numericfiled.png

* TranslateEditForm:
translateeditform.png

* PluralVariants:
pluralvariants.png

jhodgdon’s picture

+++ b/core/modules/locale/locale.module
@@ -1378,3 +1378,149 @@ function locale_translation_language_table($form_element) {
+
+function locale_get_plural_form_labels($langcode) {
+  switch ($langcode) {

Nice!

Of course this will need a doc block...

And we'll have to check over the details of the labels, but at first glance it looks good!

jhodgdon’s picture

Status: Active » Needs review

Let's set to Needs Review and see how many tests will need adjusting...

Gábor Hojtsy’s picture

+++ b/core/modules/locale/locale.module
@@ -1378,3 +1378,149 @@ function locale_translation_language_table($form_element) {
+    // Two plural forms (default)
+    default:
+      return array(
+        t('Singular form'),
+        t('Plural form'),
+      );

I don't think we can assume that; the langcode may just be one not listed above, e.g. a special case of the ones listed above. We need to at least return a list of labels relevant for this case until that langcode is added above (for new languages added globally). Thinking of languages that may not be added globally, e.g. if you want to add a de-informal or something on your site, then site developers (and/or site builders) would need some way to provide the correct labels or accept that it falls back on some simplistic "1. variant", "2. variant" etc., appearing fancy only for centrally known languages.

Status: Needs review » Needs work

The last submitted patch, 30: numeric_fieds_form_labels-2499639-30.patch, failed testing.

andypost’s picture

+++ b/core/modules/config_translation/src/FormElement/PluralVariants.php
@@ -23,6 +23,7 @@ class PluralVariants extends FormElementBase {
+    $labels = locale_get_plural_form_labels($source_language->getId());

@@ -51,6 +52,7 @@ protected function getSourceElement(LanguageInterface $source_language, $source_
+    $labels = locale_get_plural_form_labels($translation_language->getId());

+++ b/core/modules/locale/locale.module
@@ -1378,3 +1378,149 @@ function locale_translation_language_table($form_element) {
+function locale_get_plural_form_labels($langcode) {

Why is this function placed in the module file?
This is exactly a helper like \Drupal\Core\StringTranslation\TranslationInterface::getNumberOfPlurals()

Also, there is no reason to do that per language, because we have \Drupal\Core\Language\LanguageManager::getStandardLanguageList(), so maybe it's time to add the plural formula there?

jhodgdon’s picture

So... it sounds like when you define your own language in the UI, or use one of our configured languages, we need to have a way to configure:
- The number of plural forms
- The labels for the plural forms

Then we would have a method on the Language object that would be something like getPluralFormLabels() ?

That makes sense to me... can we do that? Or have I misunderstood what's being suggested in #34/#36?

Gábor Hojtsy’s picture

I don't think the ability to specify number of plurals and labels for them would be seen as anything but a feature. It has not been a feature in any Drupal release before, so for it to not be a feature, this would at minimum be a major UX issue to fix (which it is not I believe). What I meant is that a module would need to be able to hook into that labeling function and that labeling function should have a sane fallback label list for languages that have multiple plurals but no specific fancy labels defined.
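A fallback of that kind could look like the sketch below. The names `demo_fallback_labels()` and `demo_labels()` are hypothetical, the `$known` array stands in for whatever core or a module would supply, and the "1. variant" wording is taken from the earlier comment rather than from any existing code.

```php
<?php
// Hypothetical fallback: numbered labels for any language that has no
// specific labels, so the count always matches the number of plural forms.
function demo_fallback_labels(int $nplurals): array {
  $labels = array();
  for ($i = 1; $i <= $nplurals; $i++) {
    $labels[] = $i . '. variant';
  }
  return $labels;
}

// Specific labels are used only when their count matches the number of
// plural forms; otherwise the numbered fallback is returned.
function demo_labels(string $langcode, int $nplurals, array $known): array {
  if (isset($known[$langcode]) && count($known[$langcode]) === $nplurals) {
    return $known[$langcode];
  }
  return demo_fallback_labels($nplurals);
}

$known = array(
  'en' => array('Singular form', 'Plural form'),
);

print_r(demo_labels('en', 2, $known));           // specific labels
print_r(demo_labels('de-informal', 2, $known));  // numbered fallback
print_r(demo_labels('en', 3, $known));           // count mismatch: fallback
```

Checking the count guards against the case where a site imports a .po file whose plural formula differs from what the known labels were written for.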

maxocub’s picture

File sizes: 71.73 KB, 89.78 KB, 115.86 KB
jhodgdon’s picture

Category: Task » Bug report
Priority: Normal » Major
Issue summary: View changes
Issue tags: +Usability

Updating issue summary. I think this is actually a Major Bug and not a feature request. This is leading to confusion on localize.drupal.org. Adding to summary to explain why.

jhodgdon’s picture

So I guess when you set up a new language, you need to be able to specify:
- The number of plural forms
- The rules for when to use which plural form
This was true before. Now we would add:
- The labels for the plural forms

Can you specify anything about plural forms when you add a language from the UI now? Let's see...

No, you can't. All you get is Language name, Language code, and Left-to-right/Right-to-left.

So that means that any language you add in the UI is limited to using the English/default plural rules.

---------

I also wanted to see where some of this comes from...

First, one of the forms that allows people to translate or set up plural forms is NumericField::buildOptionsForm() with this code:

$plurals = $this->getNumberOfPlurals($this->view->storage->get('langcode'));
for ($i = 0; $i < $plurals; $i++) {
  $form['format_plural_values'][$i] = array(
    '#title' => ($i == 0 ? $this->t('Singular form') : $this->formatPlural($i, 'First plural form', '@count. plural form')),
    '#description' => $this->t('Text to use for this variant, @count will be replaced with the value.'),
  );
}

(and then this is special-cased below for the $plurals == 2 format to saying Singular/Plural)

And then:

protected function getNumberOfPlurals($langcode = NULL) {
  return $this->getStringTranslation()->getNumberOfPlurals($langcode);
}

So this is looking up the number of plurals on the Translation Manager service, which is TranslationManager::getNumberOfPlurals(), which is getting it from a list stored in the state:

$plural_formulas = $this->state->get('locale.translation.plurals') ?: array();
...
return $plural_formulas[$langcode]['plurals'];

When it comes time to translate, TranslationManager::formatPluralTranslated() is ultimately calling locale_get_plural() to figure out which form to use from the mashed stored array of strings. That is using that same state variable:

$plural_formulas = $this->state->get('locale.translation.plurals') ?: array();
// Plural formulas are stored as an array for 0-199. 100 is the highest
// modulo used but storing 0-99 is not enough because below 100 we often
// find exceptions (1, 2, etc).
$index = $count > 199 ? 100 + ($count % 100) : $count;
$plural_indexes[$langcode][$count] = isset($plural_formulas[$langcode]['formula'][$index]) ? $plural_formulas[$langcode]['formula'][$index] : $plural_formulas[$langcode]['formula']['default'];

So, I think if we do this, we should somehow make sure that the labels for the plural forms get stored in this same state variable?
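The 0-199 trick in that excerpt is easy to check in isolation. The sketch below precomputes a lookup table for the Russian formula and then folds larger counts into the 100-199 range, the same way the `$index` line does. The function names are made up, and how the real stored array is built is an assumption; only the folding expression is taken from the excerpt.

```php
<?php
// Transcription of the Russian gettext formula (nplurals=3).
function demo_ru_plural(int $n): int {
  if ($n % 10 == 1 && $n % 100 != 11) {
    return 0;
  }
  if ($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20)) {
    return 1;
  }
  return 2;
}

// Same folding as the excerpt above: counts over 199 map into 100-199,
// which keeps $count % 100 (and therefore $count % 10) unchanged.
function demo_formula_index(int $count): int {
  return $count > 199 ? 100 + ($count % 100) : $count;
}

// Precompute the plural index for 0-199 once.
$table = array();
for ($n = 0; $n <= 199; $n++) {
  $table[$n] = demo_ru_plural($n);
}

foreach (array(5, 121, 1211, 100002) as $count) {
  echo $count, ' => form ', $table[demo_formula_index($count)], "\n";
}
```

The table lookup agrees with the formula here because the Russian rule only depends on the count modulo 10 and modulo 100, both of which the folding preserves.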

So... let's see. Currently the only place outside a test that this state variable is *set* is in PoDatabaseWriter. When it imports a PO file, it is parsing the header to find what the plural form is, and then storing:

$plural = $header->getPluralForms();
if (isset($plural) && $p = $header->parsePluralForms($plural)) {
  list($nplurals, $formula) = $p;
  $locale_plurals[$langcode] = array(
    'plurals' => $nplurals,
    'formula' => $formula,
  );
  \Drupal::state()->set('locale.translation.plurals', $locale_plurals);
}

Hm.....

jhodgdon’s picture

I was slightly wrong in the previous comment about what happens for a language that isn't on localize.d.o, and hence has no plural forms information.
- On a translation form, it will show up as having 2 forms (the default), so it will show Singular/Plural choices to enter.
- When translating, if there is nothing stored for a language in the state variable, locale_get_plural() returns the index -1. And then in TranslationManager::formatPluralTranslated(), it will return the "plural" form for that language (the second form in the list) in all cases. It will never use the "singular" form that was provided on the translation edit form.

jhodgdon’s picture

OK... So the implications of all of this:

a) If you trigger \Drupal\locale\Gettext::fileToDatabase() on a language, which is what causes PoDatabaseWriter to save the plural information to state, then you'll get some plural information in the state for that language -- either what's in the header of the PO file, or English rules if there's nothing. The only place this is called is from locale_translate_batch_import(), which gets triggered when you manually import a PO file or when one gets imported by an update or install of a language.

b) If you create a language but don't import a PO file, you will get currently:
- 2 plural forms in the UI, labeled Singular/Plural
- In translation, only the second (plural) form will ever be used.

So. It seems like for this issue, what we should do is:

1. Make sure that when you import a PO file and the plural stuff is taken care of in PoDatabaseWriter, we also add (untranslated) field labels and field descriptions for the plural forms to the state variable 'locale.translation.plurals' for "known" languages, with sensible defaults for unknown languages. We could perhaps make "sensible" defaults by matching known formats for the formula? It might work... or the "sensible" default could just be the labels we have now. We'd also need to make sure that we store the English strings for field labels/descriptions in the state variable, and somewhere also pass all the variants through t() so they get added to the POT database.

2. When setting up plural translation forms, get the label/description information from the state variable, translate the strings, and if there isn't anything there, we should only show 1 variant but make sure it gets read from / saved from index #1 (second index in the array) because that will be what is used in translations.

3. As a future feature, perhaps let people edit (on language configuration):
- Number of plural forms
- Formulas for plural forms. This would need to be in a format that PoHeader::parsePluralForms() could parse
- Labels for plural forms
When saving, we would call PoHeader::parsePluralForms() to parse the formula and make the required indexed array that is used in translation. But this would be a new feature that Drupal has never supported before, so we should open a separate issue to do this, and push it off to 8.1.x or later (since features are frozen).
[edit: stray line removed here]

How is that for a plan? I think 1/2 would fix the bug, and 3 would be a "nice to have" feature for the future (which could probably also be done in contrib for people who need customized languages).
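
Steps 1 and 2 of the plan could look roughly like the following. This is a minimal Python sketch rather than Drupal code: the `state` dictionary stands in for the 'locale.translation.plurals' state variable, and the `labels` key, the `plural_form_fields()` helper, and the fallback label text are all assumptions made for illustration.

```python
# Stand-in for the 'locale.translation.plurals' state variable.
# The "labels" key is hypothetical; it is what step 1 proposes to add.
state = {
    "ru": {
        "plurals": 3,
        "labels": [
            "Form for numbers ending in 1 but not 11",
            "Form for numbers ending in 2, 3, 4, but not 12, 13, 14",
            "Form for all other numbers",
        ],
    },
}


def plural_form_fields(state, langcode):
    """Return (index, label) pairs for the plural inputs to show in a form."""
    info = state.get(langcode)
    if info and info.get("labels"):
        return list(enumerate(info["labels"]))
    # No plural information stored: show a single input, but bind it to
    # index 1, since that is the variant used in translation in this case.
    return [(1, "Translation")]  # placeholder label, not an actual UI string
```

With this shape, a language that never had a PO file imported gets one field bound to index 1, and a known language gets one field per stored label.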

andypost’s picture

I think we need to add plural info for supported languages statically. There are 2 ways:

1) add another value for known languages to LanguageManager::getStandardLanguageList() (lang1, lang2, RTL, plural_formula)
2) add another method to language manager to return plural data LanguageManager::getStandardLanguagePluralFormula($language)

Also, there's a related issue, but it's not popular.

jhodgdon’s picture

The plural *formulas* are currently coming from the .po files, and I don't think we should change that, should we? I guess someone could import a .po file as a mechanism to change the plural formula for a language.

However, I agree that maybe for the standard languages, putting the information about the plural *labels* would be best here... although I am not sure how that would work if the plural formula in the .po file changed somehow, or if someone imported a .po file with a different formula in it. The labels would then possibly not match the formula or the number of plurals even.

So... I'm not sure if this is a good idea?

maxocub’s picture

Here's the issue I posted on l.d.o about the translation mistakes that the imprecise labels may have caused:
#2538142: Some po files have wrong plural translations

jhodgdon’s picture

Thanks. Definitely illustrates the problems. Updating the issue summary.

andypost’s picture

I'd prefer to rely on CLDR as the source of rules.

Maybe it would be better to use the labels as examples of translation for the language?
Or actually get rid of the labels and use the description part of the input for sane examples?
Both of those require core to ship these mappings (formula and count) and examples as part of the standard language list.

This would also get rid of the problem where importing a wrong .po file can break all translations for a language simply because it has the wrong plural formula.

jhodgdon’s picture

Yes, Unicode.org seems like a better source of information. Let's update the link in the issue summary.

And I think you are probably right that we should decouple .po file import from plural rules and plural labels... But I'm not sure what to do for custom language codes like 'en-UK' or 'es-MX' or whatever. I guess we would want to have a way for people, when defining their own languages, to say "Use the pluralization rules from this base language", and then we could store that somewhere, as a mapping between new languages and known pluralization rules?
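
The "use the pluralization rules from this base language" idea could be sketched as a simple lookup. This is Python for illustration only; the `KNOWN_PLURAL_RULES` and `BASE_LANGUAGE` tables and the `plural_rules_for()` helper are hypothetical names, not existing Drupal APIs, and the formulas are the usual gettext ones for two-form languages.

```python
# Hypothetical per-language plural data: (number of forms, gettext formula).
KNOWN_PLURAL_RULES = {
    "en": (2, "n != 1"),
    "es": (2, "n != 1"),
}

# Hypothetical mapping stored when someone defines a custom language.
BASE_LANGUAGE = {
    "en-UK": "en",
    "es-MX": "es",
}


def plural_rules_for(langcode):
    """Look up plural rules, falling back to the chosen base language."""
    base = BASE_LANGUAGE.get(langcode, langcode)
    # Returns None when neither the language nor its base is known.
    return KNOWN_PLURAL_RULES.get(base)
```

A caller would still need a default for the `None` case, which is the "unknown language" situation discussed earlier in the thread.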

Mark_L6n’s picture

Here is a suggestion for labels: use a commonly-used name of the word-form in that language, to eliminate ambiguity.
This would be in contrast to describing how the word-form is used, as in this quote from the summary:

So really what is needed are language-specific labels. So for Russian, the labels might be something like:
- Form for numbers ending in 1 but not 11
- Form for numbers ending in 2, 3, 4, but not 12, 13, 14
- Form for all other numbers

The names in English for the 3 word-forms referenced there are: nominative singular, genitive singular and genitive plural. These terms are unambiguous, but difficult to remember, and of course we probably would want their Russian equivalents.

Here is an example from Czech about why it's good to let the locals decide what these names should be. While there are Czech terms for 'nominative singular', 'nominative plural' and 'genitive plural', they are not commonly used. Why? Because in Czech schools, to make things simple, they just call the cases 'Case 1', 'Case 2', 'Case 3' etc. and this is therefore what most people use. (I don't know about Russian.)

So suggestion:
1. Give good instructions to local experts about how to configure and name the plural system.
2. Let the local experts give good names. Use unambiguous, commonly-used names for the singular/plural forms used.

Something to include in the instructions for local experts:
Some languages with complex case systems have many plural forms, one for each case (for example, Slavic languages may have 6-7 plural forms). The pluralization system used by Drupal is not structured for all of the case plurals, though, so which plurals should you define? Answer: the number of plural forms used for counting items: 1 comment, 3 comments, 7 comments, etc. (which in Russian and Czech results in 1 singular form and 2 plural forms, I believe).

Crell’s picture

A relevant recent article on pluralization in JavaScript, and the standard ICU MessageFormat: http://alistapart.com/article/pluralization-for-javascript

Gábor Hojtsy’s picture

We can refactor Drupal's plural handling around CLDR and/or ICU or some other standard in a future major version. Drupal 8 is not at a point in its release cycle where such refactoring is appropriate, unless this issue is critical, and even then it would need to be substantially justified.

@jhodgdon: As for why we only set plurals on .po import: as you may have seen, the plural formulas are not user friendly. Asking someone to come up with the right math formula for a language when adding that language sounds like a problem. Of course, Drupal can do anything, hence https://www.drupal.org/project/l10n_pconfig

@andypost, @jhodgdon: indeed, the problem of broken .po files did not escape us through the years, and therefore we usually only consider the first .po file imported when setting the plural rules for the language; if plural rules are already set for a language, they are kept. See PoDatabaseWriter::setHeader(): it would only ever overwrite the header if the overwrite options allowed for it (at least one of the two overwrite options was enabled, and neither is enabled by default).
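
The "first import wins" behavior described here can be sketched as follows. This is a Python illustration based only on the description in this comment, not on the source of PoDatabaseWriter::setHeader(); the function name, the `state` dictionary, and the option keys are assumptions.

```python
def set_plural_header(state, langcode, header_plurals, overwrite_options):
    """Store plural info from a .po header unless existing info must be kept.

    state: dict standing in for the stored per-language plural information.
    overwrite_options: dict of booleans (key names here are assumed).
    Returns True if the stored plural info was written.
    """
    already_set = langcode in state
    may_overwrite = any(overwrite_options.values())
    if not already_set or may_overwrite:
        state[langcode] = header_plurals
        return True
    # Plural rules already exist and no overwrite option is enabled: keep them.
    return False
```

Under these assumptions, a second .po file with a different (possibly wrong) formula leaves the stored rules untouched unless someone explicitly enables an overwrite option.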

@Mark_L6n: the trick is we need labels that can also be translated to other languages, i.e. when you edit plurals of a Czech string on an Irish UI (imagine a site in Irish by default where you are editing some Czech translations).

Mark_L6n’s picture

In light of your comments, another idea:
1) Decide which system Drupal will be compatible with in the future, ICU or CLDR, and use their terminology for labels.
2) Add an information field (or 2) which has:
a) a description of what each label is for (e.g. 'numbers ending in 2, 3, 4, but not 12, 13, 14') for end users needing to know how this works
b) the grammatical name of the item (e.g. `genitive singular`) for people who are looking up in reference material what the proper word-form should be

Crell’s picture

Mark: MessageFormat builds on CLDR, doesn't it? ICU vs. CLDR aren't different formats as I understand it (although my understanding of this space is very novice, I grant).

jhodgdon’s picture

The problem that I see for the idea of labeling with things like "genitive singular" is that while that may be the correct grammatical term for *a single noun* form, we are usually or at least often not translating a single noun, but rather an entire sentence or a phrase. For instance, the string we are translating could be something like:
@count users were updated

Calling this by the grammatical term for the noun "users" in this phrase would most likely confuse people -- even the grammar experts who understand what "genitive singular" means, because there could be multiple nouns in that phrase, and even multiple clauses that might not be all actually correctly translated into the genitive singular case.

So I again go back to a label like "the form for when @count ends in 1 but not 11", which would be descriptive for translators (does anyone disagree that the translators who know the target language would understand which forms they would need to translate if they were described this way?). This format would also provide labels that could be translated successfully into another language by ordinary speakers of that language (whereas I have no idea who could translate "genitive singular" into Spanish, for instance, which has no such linguistic concept).

Mark_L6n’s picture

@Crell: From a quick glance at them, it looks like CLDR added some things to ICU. Do you think Drupal will want to use one of these projects (the newer CLDR I would guess) in the future?

@jhodgdon: To clarify, the last suggestion was to use labels that ICU or CLDR use, which for CLDR are terms like 'one', 'few', 'many', 'other'. Not important, just a suggestion; probably the most important thing is just to decide on a label and move on.
The grammar term would just be an aid to users for looking up information, because that is usually how a word-form would be listed in a reference work. Again, not really important, just an ease-of-use suggestion.

andypost’s picture

@Mark_L6n there's another big question: do we need to update every "1 item" in the codebase to "@count item", at least for Asian languages, which mostly have 1 form...

Gábor Hojtsy’s picture

@andypost: the idea so far was that there is nothing stopping translators from replacing 1 with @count in a translation. In fact, those with only one form were expected to do so: l.d.o was supposed to display only one input field, so it was assumed to be clear that this is the only form used regardless of number. Of course, this issue was opened because some of those assumptions were not right.
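
To illustrate why a single-form language needs @count rather than a literal 1, here is a minimal Python sketch. The `format_plural()` function below is a simplified stand-in for illustration, not Drupal's formatPlural(), and the sample strings follow the example in the issue summary.

```python
def format_plural(count, variants, plural_index):
    """Pick the variant for count and substitute the @count placeholder."""
    return variants[plural_index(count)].replace("@count", str(count))


def single_form(n):
    """A language with one plural form: every count maps to index 0."""
    return 0


hardcoded = ["1 сағат"]         # translator typed "1" literally
placeholder = ["@count сағат"]  # translator used the placeholder

print(format_plural(5, hardcoded, single_form))    # 1 сағат
print(format_plural(5, placeholder, single_form))  # 5 сағат
```

Because the only variant is used for every count, a literal number in it shows up regardless of the actual value.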

andypost’s picture

@Gabor sure, also maybe we need a separate issue about "allow overriding formatPlural() per language" to allow using different numbers for different countables.

PS: Number formatting is another beast...

Gábor Hojtsy’s picture

@andypost why would they need to override it?

Mark_L6n’s picture

@andypost, the rule that I saw for Drupal, in the case of Chinese, looked pretty good.

maxocub’s picture

Hi all, I'd like to reopen this discussion since I'll be sprinting Friday through Sunday, maybe we can get some work done to improve this situation. I don't have anything to add right now, except that I agree with @jhodgdon on #55. I'll think about it more on Friday morning, but in the meantime, if you have more arguments towards a solution, please share.

maxocub’s picture

After reading the excellent article from #51 (thank you @Crell), I like the idea of the CLDR labels used by MessageFormat (one, few, many, etc.). I think they 'kind of' apply to every situation. Maybe we can use them and offer an optional help text where we use the more descriptive labels, like "the form for when @count ends in 1 but not 11".

jhodgdon’s picture

Hm, but which of "one, few, many, etc." applies to "when @count ends in 1 but not 11", or "when @count ends in 2-4 but not 12, 13, or 14" for instance? I mean, 123452 is not really "few", it's a lot.

maxocub’s picture

Haha, you're totally right! That's not really accurate.
I guess the example in the article is wrong on that count: "If the counter has a value that ends in 2–4, excluding 12–14, use the plural form few."

jhodgdon’s picture

Yeah... So I looked carefully through the CLDR article. I just don't think their idea of trying to make the labels uniform across all languages really makes a lot of sense. I mean, I can see where uniformity is good, but if it also means inaccuracy or confusion or lack of specificity, I don't see the benefit.

Take Russian for example. It has forms for:
Numbers ending in 1 but not 11
Numbers ending in 2, 3, 4, but not 12, 13, 14
Everything else [0; numbers ending in 11, 12, 13, 14; numbers ending in 0, 5, 6, 7, 8, 9]

The CLDR rules would label these "one", "few", and "many". I think this would be problematic:

a) We are currently having problems with the "singular" form labels we have now -- translators for languages like Russian or single-form languages are leaving @count out in these cases. Calling it "one" instead of "singular" would not resolve this problem in the slightest.

b) We should ask @andypost or other Russian speakers if they would understand "one" "few" "many" labels. I mean, maybe Andy would because he has been following this discussion, but how about some random Russian/English translator?

Anyway... I think CLDR is trying for uniformity... which is nice, but we have a problem of clarity and understanding. I don't think adopting their labels is a good idea. I think we should instead customize the labels to the actual rules of the languages so that they are clear and correct.

maxocub’s picture

@jhodgdon: I agree with you that we should aim for clarity over uniformity.

I just found this tool: https://github.com/mlocati/cldr-to-gettext-plural-rules

It generates the CLDR labels with examples: http://mlocati.github.io/cldr-to-gettext-plural-rules/

We don't need to use the CLDR labels, but the examples could be useful.

jhodgdon’s picture

That seems *marginally* useful, but... looking at Russian again, their examples are:

one: 1, 21, 31, 41, 51, 61, 71, 81, 101, 1001, …
few: 2~4, 22~24, 32~34, 42~44, 52~54, 62, 102, 1002, …
other: 0, 5~19, 100, 1000, 10000, 100000, 1000000, …

These examples are not incorrect, but they don't make it clear that:
121, 131, etc. belong with "one"
122, 143, etc. belong with "few"
20 and 105-120, etc. belong with "other".

I just don't think anything automatic is going to be all that useful, and I think that we can be much more concise than making a list of numbers... So I really think that for the special languages like Russian, we will be better off having an intelligent human being make the labels/examples, and put them into a big switch statement, rather than relying on anything automatic.
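
As a rough illustration of hand-written labels, here is a Python sketch. It assumes the commonly used gettext plural formula for Russian; the `PLURAL_LABELS` table and function name are hypothetical, and the label text is the wording proposed earlier in this issue.

```python
def russian_plural_index(n):
    """Plural form index for Russian, per the commonly used gettext formula."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


# Hand-written labels keyed by language code (hypothetical structure).
PLURAL_LABELS = {
    "ru": [
        "Form for numbers ending in 1 but not 11",
        "Form for numbers ending in 2, 3, 4, but not 12, 13, 14",
        "Form for all other numbers",
    ],
}

# Show which label applies to the numbers the example lists leave unclear.
for n in (1, 11, 20, 21, 105, 121, 122, 143):
    print(n, "->", PLURAL_LABELS["ru"][russian_plural_index(n)])
```

The three labels cover cases like 121, 122, and 105 that a list of sample numbers leaves the reader to infer.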

jhodgdon’s picture

Version: 8.0.x-dev » 8.1.x-dev

This will change translatable UI text, so according to
https://groups.drupal.org/node/484788
would need to be tagged "rc deadline".

Given that RC is very soon, and we don't really have a direction here, I think we should just move it to 8.1 now. Unless we decide this is a really really important bug, I think it would need to be 8.1 material. If it's 8.1, we have a lot more flexibility in how to tackle it.

Gábor Hojtsy’s picture

I think weighing the benefit vs. disruption, this removes 3 strings and adds a whole bunch, so not sure the RC phase rules it out. Nonetheless it needs considerably more discussion AFAIS, so overall a good call to move to 8.1 IMHO.

jhodgdon’s picture

FYI, on #2545730: Misuse of formatPlural() in Numeric field prefix/suffix I found myself needing to add these plural labels to 2 more forms. So I added a method to StringTranslationTrait to generate the current (not great) labels, with the idea that on this issue here we could:
a) Use this method on the other 4 classes currently generating their own labels
b) Fix the method to generate better labels.

There is a @todo on that method pointing to this issue.

Anyway, anyone interested in this issue could go look at the other patch...

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.