Postponed on #2454829: Configuration translation UI does not support plural sources/targets.
Problem/Motivation
The form for configuring and translating numeric fields has been copied multiple times: NumericField::buildOptionsForm(), TranslateEditForm::buildForm(), PluralVariants::getSourceElement(), and PluralVariants::getTranslationElement(). (That in itself is a problem -- it should be centralized.)
The problem is that the labels used in this form are not clear. Not only that, they are not appropriate for many languages. The reason is that while English, Spanish, etc. have two forms for plural expressions (singular, plural), many other languages have either one form, multiple forms, or two forms that are not "singular" and "plural".
Here's an easy-to-read list of the rules:
http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html
And here's a more definitive set of rules that is not as easy to code or follow, but should be more accurate since it comes from Unicode.org:
http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_r...
And this related issue demonstrates actual real-world problems that have resulted from our localization teams misunderstanding the labels (the same labels are used on localize.drupal.org)... These are the best experts we have in the Drupal community, and they can be educated on how to understand the labels, and even they are having difficulty. For ordinary users who are not part of a localization community, who are trying to translate their own Drupal site, the problem would be even worse.
#2538142: Some po files have wrong plural translations
So, we need to fix this!
Next: some examples of actual observed problems arising from these labels on localize.drupal.org from that other issue.
Languages with only 1 plural form
In this case the label for the translation is "Singular form". Bad!
So we are seeing output like:
msgid_plural "@count hours"
msgstr[0] "1 сағат"
With this translation, '12 hours' will be translated as '1 сағат'.
Languages with 2 plural forms
For all languages with 2 plural forms, we are currently using these labels:
- Singular form
- Plural form
This is OK for English, Spanish, French, etc. But there are languages where the 1st form is actually for all numbers that end in 1 (Icelandic, for instance), or for all non-zero numbers (Javanese). So for instance, here's a translation from the current Javanese po file:
msgid_plural "@count comments"
msgstr[0] "1 komèntar"
msgstr[1] "@count komèntar"
In this case the result is that '0 comments' will be translated as '1 komèntar' in Javanese, because the first plural form is used for 0 and the second for all non-zero numbers. This is not correct.
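The two failures described so far can be reproduced with a minimal Python sketch. This is not Drupal code: `pick`, `single_form_rule`, and `javanese_rule` are names invented here, and the Javanese rule is transcribed from the description above (index 0 for zero, index 1 for every other number).

```python
def pick(msgstrs, index_rule, count):
    """Return the msgstr chosen for count, with @count substituted."""
    return msgstrs[index_rule(count)].replace("@count", str(count))


def single_form_rule(n):
    return 0  # nplurals=1: the same form is used for every number


def javanese_rule(n):
    return int(n != 0)  # index 0 for zero, index 1 for non-zero numbers


# Translations entered under a "Singular form" label hard-code the number 1.
print(pick(["1 сағат"], single_form_rule, 12))                         # 1 сағат
print(pick(["1 komèntar", "@count komèntar"], javanese_rule, 0))       # 1 komèntar

# Keeping @count in every form gives the expected output.
print(pick(["@count сағат"], single_form_rule, 12))                    # 12 сағат
print(pick(["@count komèntar", "@count komèntar"], javanese_rule, 0))  # 0 komèntar
```

In both cases the translator did what the label asked for; the label simply did not describe which numbers the form is used with.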
Languages with more than 2 plural forms
The labels in the UI currently are:
- 'Singular form'
- 'First plural form'
- '2. plural form'
- '3. plural form'
- etc.
The problem here is that in many languages, the first form is for something like "Numbers that end in 1", not really singular. Since the English form they are translating will say something like "1 item", it's likely they will put "1 item" in their language as well for the first form, instead of using "@count item" or the equivalent.
So really what is needed are language-specific labels. So for Russian, the labels might be something like:
- Form for numbers ending in 1 but not 11
- Form for numbers ending in 2, 3, 4, but not 12, 13, 14
- Form for all other numbers
Proposed resolution
Add a centralized function or method that generates those labels appropriate for a given language, and translate them in the interface language.
Remaining tasks
Decide. Implement. Review.
User interface changes
We will have appropriate labels for every language.
API changes
None.
Comment | File | Size | Author |
---|---|---|---|
#30 | numeric_fieds_form_labels-2499639-30.patch | 8.99 KB | maxocub |
Comments
Comment #1
maxocub
Postponed on #2454829: Configuration translation UI does not support plural sources/targets
Comment #2
maxocub
Comment #3
Gábor Hojtsy
Not sure those are better labels. In #2449597: Number formatters: Make it possible to configure format_plural on the formatter level, @jhodgdon argued the labels overall should be improved.
Comment #4
jhodgdon
OK... So as things are now, translating plural forms seems like something only a member of the Drupal localization team could do (or someone who is familiar with how it is done there).
For instance, say that I'm a Russian/English speaker. Here's the Russian formula from that plural formulas site:
nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
So:
- If the number ends in the digit 1, use the first form of the noun, unless it ends in the digits 11.
- If it ends in 2, 3, or 4, use the second form, unless it ends in 12, 13, or 14.
- In other cases, use the third form.
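As a check on the three rules just listed, here is a direct Python transcription of the Russian formula quoted above. The function name `russian_plural_index` is made up for this sketch; only the arithmetic comes from the formula.

```python
def russian_plural_index(n: int) -> int:
    """Return 0, 1 or 2: the index of the form the Russian formula selects."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # ends in 1, but not in 11
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1  # ends in 2, 3 or 4, but not in 12, 13 or 14
    return 2      # everything else, including 0 and 11 to 14


numbers = (0, 1, 2, 5, 11, 12, 21, 22)
print([russian_plural_index(n) for n in numbers])  # [2, 0, 1, 2, 2, 2, 0, 1]
```

Note that 21 selects the same form as 1, which is why "Singular form" is a misleading label for index 0.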
What I don't know is what a Russian speaker would call these forms. But... wouldn't it make sense to let our Russian localization team figure that out? So maybe we need to have strings like this:
t('Singular form'), t('Plural form') ==> for the usual 2-form language case
... not sure what the strings would be for the multi-form cases? I was just going to suggest something like
t('First/singular form'), t('Second/plural form'), t('Third/plural form'), etc. but I do not think that is so good either.
Comment #5
maxocub
I did a quick search to find out what Russian plural forms are called and found this site: http://www.russianlessons.net/lessons/lesson11_main.php
Maybe it would be clearer to have those kinds of labels (for Russian, as an example):
- Numbers ending in: 1 (but not 11)
- Numbers ending in: 2, 3, 4 (but not 12, 13, 14)
- Numbers ending in: 5, 6, 7, 8, 9, 0 (and 11, 12, 13, 14)
So for Russian, the label 'Singular form' is not quite right, since it's the form for any numbers ending in 1.
I don't know if it would be possible, but we could parse the plural formulas and generate those labels. But I don't know any of those languages, and maybe there are some exceptions that would be hard to put in a label.
Comment #6
jhodgdon
Would something like this work, making a new function in locale.module:
That would also centralize the generation of these labels, which seems like a Very Good Thing.
Comment #7
jhodgdon
So that was a bit rough. I think we'd also need to handle the "only one form" case, for languages without plurals; not sure how we do that in the UI now. Also, that last line should truncate to $num_forms items, not $num_forms + 1.
Comment #8
Gábor Hojtsy
Note that you may be using German to edit a Polish view. So when configuring format plural stuff, you need all the variants for Polish (more than German), because the config you are editing mandates that. That you happen to edit it on a German UI does not matter for the number of fields. It will affect the labels printed on the forms however, which is why your ideas above don't work well (to let the labels be translated as appropriate for the config language). Because in this example, the form labels will be printed in German for the Polish plural configuration :P
Comment #9
jhodgdon
That's exactly why I think we need a function like #6, actually.
So say my UI language is German, and I'm editing Russian plural forms.
To make the UI, I call locale_get_plural_form_labels('ru'), and it will return:
which will take the 3 labels I need for Russian, and translate their text into German.
Right?
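A rough Python sketch of that idea follows. It is not the function from #6 (that code is not reproduced in this thread); `PLURAL_FORM_LABELS`, `get_plural_form_labels`, and the `translate` callback standing in for t() are all hypothetical names, the Russian label wording is borrowed from the issue summary, and the single German string is an illustrative stand-in, not a real translation from localize.drupal.org.

```python
from typing import Callable, Dict, List

# Hypothetical English source labels, keyed by the language being translated.
PLURAL_FORM_LABELS: Dict[str, List[str]] = {
    "en": ["Singular form", "Plural form"],
    "ru": [
        "Form for numbers ending in 1 but not 11",
        "Form for numbers ending in 2, 3, 4, but not 12, 13, 14",
        "Form for all other numbers",
    ],
}


def get_plural_form_labels(langcode: str, translate: Callable[[str], str]) -> List[str]:
    """Return langcode's plural form labels, translated into the UI language."""
    return [translate(label) for label in PLURAL_FORM_LABELS[langcode]]


# Stand-in for t() with a German interface; untranslated strings pass through.
german = {"Form for all other numbers": "Form für alle anderen Zahlen"}
labels = get_plural_form_labels("ru", lambda text: german.get(text, text))

print(len(labels))  # 3
print(labels[2])    # Form für alle anderen Zahlen
```

The number of labels depends only on the language being edited, while the wording depends only on the interface language, which is the separation asked for in #8.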
Comment #10
maxocub
About #7 and languages without plurals, I looked at all the po files (for beta11) and none of them have 'nplurals=1; plural=0;' in the header, but some of them don't even have a 'Plural-Forms:' line (bo, kk, ky, lo, rhg, tr, ug, vi).
Drupal then assumes that those languages have 2 plural forms.
If I add the line 'Plural-Forms: nplurals=1; plural=0;' to one of those po files header and import it, then it won't have a plural form. I then tested it in the translation UI and only one field is displayed (the singular form).
If we want to support languages without plurals, should we add a requirement that those languages include a 'Plural-Forms:' line in their po file, or should we remove the default 'nplurals=2; plural=n>1;'?
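The fallback behavior described in this comment can be sketched as follows. This is a simplified Python illustration, not Drupal's PO header parser: `parse_plural_forms` is a hypothetical helper, it assumes the header has already been decoded into plain lines, and the default is the one quoted above.

```python
import re

DEFAULT_PLURAL_FORMS = (2, "n>1")  # default quoted above: nplurals=2; plural=n>1;

PLURAL_FORMS_RE = re.compile(
    r"^Plural-Forms:\s*nplurals\s*=\s*(\d+)\s*;\s*plural\s*=\s*([^;\n]+);?",
    flags=re.MULTILINE,
)


def parse_plural_forms(po_header: str):
    """Return (nplurals, formula) from a decoded PO header, or the default."""
    match = PLURAL_FORMS_RE.search(po_header)
    if match is None:
        # No Plural-Forms line: the language silently gets two forms.
        return DEFAULT_PLURAL_FORMS
    return int(match.group(1)), match.group(2).strip()


print(parse_plural_forms("Language: kk\nPlural-Forms: nplurals=1; plural=0;\n"))  # (1, '0')
print(parse_plural_forms("Language: kk\n"))                                        # (2, 'n>1')
```

With a default like this, a missing header line is indistinguishable from a genuine two-form language, which is why the question of requiring the line matters.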
Comment #11
jhodgdon
Question: do all of the languages that use 2 plurals have the plurals line in their .po file?
If so, it seems like the default should be nplurals=1 for .po files with no information... but I agree it would be better if we had definite information for everything.
Although, people can also create their own languages in the UI, and presumably they would not always have plural forms information when they create it (not sure if they even can), so in that case, given that most languages have 2 plural forms, 2 is a more sensible default.
OK I have no idea what the best default would be :) but yes let's get that info into all of our own .po files?
Comment #12
Gábor Hojtsy
2 is the most sensible default, which is why it is the default. Drupal core does not even allow editing the plural forms because that would be a highly technical (under the hood) thing to expose. Only .po file imports may set the plural form (or the l10n_pconfig contrib module exposes this setting with several suggested settings for languages).
Also all .po files on localize.drupal.org/ftp.drupal.org should have a plural form exported. If some are missing, that is a problem. No language is allowed on localize.drupal.org without an explicit plural rule set and that is carried over to the exported .po files, unless there are bugs with that.
Comment #13
maxocub
The po files I looked at were the beta11 ones on ftp.drupal.org, and yes, some of them are missing the plural forms line.
#11: @jhodgdon: Yes, all of the 2 plural forms languages have the plural line in their po file.
All of the files missing the plural line seem to be for languages without plural forms (except Turkish, for which I found conflicting information):
bo, kk, ky, lo, rhg, tr, ug, vi
(those languages all have no plural on http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html, except Turkish, but on http://cgit.drupalcode.org/l10n_pconfig/tree/l10n_pconfig.module#n204 it does)
I agree that the default should stay as it is and that we should add the information on all po files.
Comment #14
jhodgdon
Turkish appears to have plurals. Here's a page I found when I searched the internet for "turkish plural": http://en.wikibooks.org/wiki/Turkish/Plural
and there were a bunch of other pages with similar information, so it seems to be accurate.
Also I see Turkish on that pluralforms page and it does also say 2 plural forms there.... Maybe you meant a different language?
Comment #15
maxocub
I was talking about Turkish.
On http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html it says 2 plural forms,
but on http://cgit.drupalcode.org/l10n_pconfig/tree/l10n_pconfig.module#n204, it says only one (but this one may be outdated)
And for your link in #14, the last lines say:
Confusing...
Comment #16
jhodgdon
Oh that's interesting. So Turkish has two plural forms for nouns. In format_plural() we're generally putting in "@count [noun]", and in that case we would want it to say "1 oda" for one, "2 oda" for two. But someone could use format_plural() without actually including @count, in theory anyway... and often in English we would not put in the number for singular (presumably also for Turkish), so I think we still would need to allow translators to have 2 forms (singular, plural) for Turkish, right? It looks like Drupal has a bug then in this case.
Comment #17
Gábor Hojtsy
It is a localization server bug to not include the plural forms for the singular case. I don't know how this was not recognized before, honestly; I am pretty floored, but well... Opened #2502381: Singular plural formulas are not exported to .po files and proposed a fix there to l10n_server. As for replacing @count in the singular version, Drupal 7 and 8 already do this. See https://api.drupal.org/api/drupal/includes%21common.inc/function/format_... the @count is added to the args at all times and is used for replacement for the singular case too. That the English source string does not include @count for the singular case does not stop Turkish from using it in the "singular" case, which may be their universal case for all. (Turkish on localize.drupal.org is configured to have a single plural form.) See http://cgit.drupalcode.org/l10n_pconfig/tree/l10n_pconfig.module#n115 for a list of uses of the $one variant. Localize.drupal.org has a much expanded list of languages now and l10n_pconfig should be updated with current data, but this gives an idea of format distribution among different languages with Drupal.
I think #2502381: Singular plural formulas are not exported to .po files should be resolved on l.d.o, which will make at least future .po files correct with the plural formulas. That is/was a sidetrack for this issue anyway, it should not affect how the core UI works or looks like AFAIS.
Comment #18
Gábor Hojtsy
#2502381: Singular plural formulas are not exported to .po files is theoretically fixed, rolled out and the .po files are being regenerated. It will take some time to verify since then the files will still need to be synced to ftp.drupal.org where you get them downloaded from. That happens either once or twice a day, I don't remember.
Comment #19
jhodgdon
Regarding Turkish, I think we should file an issue to discuss (with the Turkish localize.d.o group?) whether the Turkish plural information needs to be changed. I don't know where this issue should be filed. It looks like the defaults in l10n_pconfig are wrong for a start, so I'll file it there and we can move it elsewhere if it should be somewhere else:
#2503057: Turkish had wrong default plural setup
So now we should probably get back to the discussion of this issue instead of getting sidetracked? :)
Comment #20
maxocub
Comment #21
Mark_L6n
Some comments:
nplurals=1; plural=0; since Chinese plural noun forms rarely appear. However, they can (see Wikipedia), so a pluralization rule that wouldn't cause translators problems would be nplurals=2; plural=(n > 1);
However, there is a difference between general plurals (i.e. some/many/these things) and numerically-specified plurals (i.e. 2/5/12 things):
If we use the genitive singular and the genitive plural in addition to the nominative singular and plural, we have 4 forms. However, the Russian formula above lists nplurals=3; i.e. just 3 forms. Since the GetText code is based on numbers, it is likely the standard, general plural of a case that is being left out.
And this is just a simplified example, leaving out other issues mentioned in Wikipedia!
If we wanted grammatically-accurate labels, we could use:
However, the grammatical terms might not be understood by everybody, and there still is the issue of how to use the nominative plural.
Comment #22
jhodgdon
Regarding Russian - that is why we try to put things in context and translate phrases/sentences and not nouns in isolation. We do not need to have forms for all the grammatical stuff in Russian. We just need to be able to translate things in context, like "There are @count new comments", where @count could be 0, 1, 2, 3, ... So for Russian we need cases for 0, numbers ending in 1, etc. Not for nominative etc.
Comment #23
Gábor Hojtsy
Comment #24
Gábor Hojtsy
Comment #25
Gábor Hojtsy
#2454829: Configuration translation UI does not support plural sources/targets landed. Let's continue here!
Comment #27
jhodgdon
OK... So what about the suggestion I made in #6/#8? Meaning:
- Let's say I have a German UI
- And I am translating a plural composite string from English to Russian.
- My UI builder calls a function like the one in #6: locale_get_plural_form_labels('en') for the English side, and locale_get_plural_form_labels('ru') for the Russian side.
The English side would return:
And the Russian side would return something like:
- These return values, since they are passed through t(), would be translated into German.
- My form builder would notice it needs 2 form elements for the English side, and it would split the source string into two and put it there, and it would need 3 form elements for Russian and would put those up.
Would that work?
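The form-building step in that list could look roughly like the Python sketch below. Everything here is illustrative: `build_elements` is a hypothetical helper, the delimiter value is an assumption about how the composite string is stored, and the Russian labels are the ones suggested in the issue summary.

```python
PLURAL_DELIMITER = "\x03"  # assumption: separator between the stored variants

ENGLISH_LABELS = ["Singular form", "Plural form"]
RUSSIAN_LABELS = [
    "Form for numbers ending in 1 but not 11",
    "Form for numbers ending in 2, 3, 4, but not 12, 13, 14",
    "Form for all other numbers",
]


def build_elements(composite, labels):
    """Pair every label with its variant, padding missing variants with ''."""
    variants = composite.split(PLURAL_DELIMITER) if composite else []
    variants += [""] * (len(labels) - len(variants))
    return list(zip(labels, variants))


source_side = build_elements("1 comment\x03@count comments", ENGLISH_LABELS)
target_side = build_elements("", RUSSIAN_LABELS)  # nothing translated yet

print(len(source_side), len(target_side))  # 2 3
print(source_side[1])                      # ('Plural form', '@count comments')
```

The number of form elements on each side follows from the label list for that side's language, so the source and target sides can differ in length without special casing.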
Comment #28
Gábor Hojtsy
Are we able to do a finite list of all possible combinations of plural variants for that list and would we need a core release to add new ones?
Comment #29
jhodgdon
Ok. Let's see. We lost this link in the issue summary; restoring:
http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html
This is a table of what the plural forms are for most languages.
So scanning that, it looks like we have these variants:
a) 2 forms for singular/plural. This comes in two variants, based on whether 0 is considered singular or plural, but for purposes of *labels*, "Singular form" and "Plural form" (translated) should be fine for both variants.
These are listed in the table with these rules:
nplurals=2; plural=(n > 1);
nplurals=2; plural=(n != 1);
b) Languages with just 1 plural form. In this case we would need to present them with the @count objects thing to translate. These are listed in the table with this rule:
nplurals=1; plural=0;
c) Special case languages with complex rules -- we should get input from the language teams for those languages for what to make the labels say so that they're concise and understandable. These are languages like Arabic, Belarusian, Czech, Russian... There are around 20 of these that would need special cases.
So with the proposed resolution in #27, these cases would be decided and their labels would be put into a function. I guess we could also use a config entity or even a simple config object somehow? But in either case I think we would need to have a Core update to push out changes, like any other string update, unless we can build them into the .po files somehow? I'm not sure how that would work though, because we need to make sure that for each of these cases in a, b, c, the strings get into the pot database and get translated into all the languages. ???
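One way to picture the a/b/c split is a lookup keyed on the rule text, sketched here in Python. This is only an illustration of the classification: `SIMPLE_CASES` and `labels_for_rule` are invented names, the single-form label is placeholder wording, and anything not in the table is treated as case (c), where the label text would have to come from the language teams.

```python
SIMPLE_CASES = {
    # (a) two forms, singular/plural, in both variants listed above
    "nplurals=2;plural=(n>1);": ["Singular form", "Plural form"],
    "nplurals=2;plural=(n!=1);": ["Singular form", "Plural form"],
    # (b) one form for every number (placeholder wording)
    "nplurals=1;plural=0;": ["Form for all numbers"],
}


def labels_for_rule(rule):
    """Return labels for cases (a) and (b); None means case (c)."""
    key = "".join(rule.split())  # ignore whitespace differences in the rule
    return SIMPLE_CASES.get(key)


print(labels_for_rule("nplurals=2; plural=(n != 1);"))  # ['Singular form', 'Plural form']
print(labels_for_rule("nplurals=1; plural=0;"))         # ['Form for all numbers']
print(labels_for_rule("nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : 2);"))  # None
```

Matching on the exact rule text is brittle (equivalent formulas can be written differently), so this is better read as a statement of the three cases than as a proposal for how to detect them.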
Comment #30
maxocub
Here's a first patch to start with.
The labels are placeholders; I just wanted to see how many exceptions there were.
Comment #31
maxocub
And some screenshots:
* NumericField:
* TranslateEditForm:
* PluralVariants:
Comment #32
jhodgdon
Nice!
Of course this will need a doc block...
And we'll have to check over the details of the labels, but at first glance it looks good!
Comment #33
jhodgdon
Let's set to Needs Review and see how many tests will need adjusting...
Comment #34
Gábor Hojtsy
I don't think we can assume that; the langcode may be one not listed above, e.g. a special case of the ones listed above, etc. We need to at least return a list of labels relevant for this case until that langcode is added above (for new languages added globally). Thinking of languages that may not be added globally, e.g. if you want to add a de-informal or something on your site, then site developers (and/or site builders) would need some way to provide the correct labels or accept that it falls back on some simplistic "1. variant", "2. variant" etc. and only appears fancy for centrally known languages.
Comment #36
andypost
Why is this function placed in the module file?
This is exactly a helper like \Drupal\Core\StringTranslation\TranslationInterface::getNumberOfPlurals()
Also, there is no reason to do that per language, because we have \Drupal\Core\Language\LanguageManager::getStandardLanguageList(), so maybe it's time to add the plural formula there?
Comment #37
jhodgdon
So... it sounds like when you define your own language in the UI, or use one of our configured languages, we need to have a way to configure:
- The number of plural forms
- The labels for the plural forms
Then we would have a method on the Language object that would be something like getPluralFormLabels() ?
That makes sense to me... can we do that? Or have I misunderstood what's being suggested in #34/#36?
Comment #38
Gábor Hojtsy
I don't think the ability to specify the number of plurals and labels for them would be seen as anything but a feature. It has not been a feature in any Drupal release before, so for it to not be a feature, this would at minimum be a major UX issue to fix (which it is not, I believe). What I meant is that a module would need to be able to hook into that labeling function, and that labeling function should have a sane fallback label list for languages that have multiple plurals but no specific fancy labels defined.
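A minimal Python sketch of that shape, under stated assumptions: `plural_form_labels`, `fallback_labels`, and the hook signature are hypothetical, and the fallback wording follows the "1. variant", "2. variant" suggestion from #34.

```python
KNOWN_LABELS = {"en": ["Singular form", "Plural form"]}  # centrally known languages


def fallback_labels(nplurals):
    """Simplistic numbered labels for languages without specific labels."""
    return [f"{i}. variant" for i in range(1, nplurals + 1)]


def plural_form_labels(langcode, nplurals, alter_hooks=()):
    """Return known labels, else the fallback, then let modules alter them."""
    labels = KNOWN_LABELS.get(langcode)
    if labels is None or len(labels) != nplurals:
        labels = fallback_labels(nplurals)
    labels = list(labels)
    for hook in alter_hooks:
        labels = hook(langcode, labels)
    return labels


def informal_german_hook(langcode, labels):
    """Example of a site module supplying labels for a custom language."""
    if langcode == "de-informal":
        return ["Singular form", "Plural form"]
    return labels


print(plural_form_labels("de-informal", 2))                          # ['1. variant', '2. variant']
print(plural_form_labels("de-informal", 2, [informal_german_hook]))  # ['Singular form', 'Plural form']
```

The length check also covers the case where a known language has been given a different number of plurals than its label list expects.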
Comment #39
maxocub
Comment #40
jhodgdon
Updating issue summary. I think this is actually a Major Bug and not a feature request. This is leading to confusion on localize.drupal.org. Adding to summary to explain why.
Comment #41
jhodgdon
So I guess when you set up a new language, you need to be able to specify:
- The number of plural forms
- The rules for when to use which plural form
This was true before. Now we would add:
- The labels for the plural forms
Can you specify anything about plural forms when you add a language from the UI now? Let's see...
No, you can't. All you get is Language name, Language code, and Left-to-right/Right-to-left.
So that means that any language you add in the UI is limited to using the English/default plural rules.
---------
I also wanted to see where some of this comes from...
First, one of the forms that allows people to translate or set up plural forms is NumericField::buildOptionsForm() with this code:
(and then this is special-cased below for the $plurals == 2 format to saying Singular/Plural)
And then:
So this is looking up the number of plurals on the Translation Manager service, which is TranslationManager::getNumberOfPlurals(), which is getting it from a list stored in the state:
When it comes time to translate, TranslationManager::formatPluralTranslated() is ultimately calling locale_get_plural() to figure out which form to use from the mashed stored array of strings. That is using that same state variable:
So, I think if we do this, we should somehow make sure that the labels for the plural forms get stored in this same state variable?
So... let's see. Currently the only place outside a test that this state variable is *set* is in PoDatabaseWriter. When it imports a PO file, it is parsing the header to find what the plural form is, and then storing:
Hm.....
Comment #42
jhodgdon
I was slightly wrong in the previous comment about what happens for a language that isn't on localize.d.o, and hence has no plural forms information.
- On a translation form, it will show up as having 2 forms (the default), so it will show Singular/Plural choices to enter.
- When translating, if there is nothing stored for a language in the state variable, locale_get_plural() returns the index -1. And then in TranslationManager::formatPluralTranslated(), it will return the "plural" form for that language (the second form in the list) in all cases. It will never use the "singular" form that was provided on the translation edit form.
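The fallback described in this comment can be modeled in a few lines of Python. This sketch only mirrors the description given here; it is not the code of locale_get_plural() or TranslationManager::formatPluralTranslated(), which may contain further branches (for example special handling of a count of exactly 1) that are not modeled. All names are invented for the illustration.

```python
def plural_index(count, langcode, stored_rules):
    """Return the form index, or -1 when nothing is stored for langcode."""
    rule = stored_rules.get(langcode)
    return rule(count) if rule is not None else -1


def pick_variant(count, variants, langcode, stored_rules):
    """Pick the variant for count, falling back to the second form."""
    index = plural_index(count, langcode, stored_rules)
    if 0 <= index < len(variants):
        return variants[index]
    return variants[1]  # no usable index: the "plural" form is used


stored_rules = {"en": lambda n: int(n != 1)}
variants = ["1 item", "@count items"]

print(pick_variant(1, variants, "en", stored_rules))  # 1 item
print(pick_variant(1, variants, "xx", stored_rules))  # @count items
```

For the language with no stored rule, whatever the translator typed into the first field is never selected by this path.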
Comment #43
jhodgdon
OK... So the implications of all of this:
a) If you trigger \Drupal\locale\Gettext::fileToDatabase() on a language, which is what causes PoDatabaseWriter to save the plural information to state, then you'll get some plural information in the state for that language -- either what's in the header of the PO file, or English rules if there's nothing. The only place this is called is from locale_translate_batch_import(), which gets triggered when you manually import a PO file or when one gets imported by an update or install of a language.
b) If you create a language but don't import a PO file, you will get currently:
- 2 plural forms in the UI, labeled Singular/Plural
- In translation, only the second (plural) form will ever be used.
So. It seems like for this issue, what we should do is:
1. Make sure that when you import a PO file and the plural stuff is taken care of in PoDatabaseWriter, we also add (untranslated) field labels and field descriptions for the plural forms to the state variable 'locale.translation.plurals' for "known" languages, with sensible defaults for unknown languages. We could perhaps make "sensible" defaults by matching known formats for the formula? It might work... or the "sensible" default could just be the labels we have now. We'd also need to make sure that we store the English strings for field labels/descriptions in the state variable, and somewhere also pass all the variants through t() so they get added to the POT database.
2. When setting up plural translation forms, get the label/description information from the state variable and translate the strings. If there isn't anything there, we should only show 1 variant, but make sure it gets read from / saved to index #1 (second index in the array), because that will be what is used in translations.
3. As a future feature, perhaps let people edit (on language configuration):
- Number of plural forms
- Formulas for plural forms. This would need to be in a format that PoHeader::parsePluralForms() could parse
- Labels for plural forms
When saving, we would call PoHeader::parsePluralForms() to parse the formula and make the required indexed array that is used in translation. But this would be a new feature that Drupal has never supported before, so we should open a separate issue to do this, and push it off to 8.1.x or later (since features are frozen).
[edit: stray line removed here]
How is that for a plan? I think 1/2 would fix the bug, and 3 would be a "nice to have" feature for the future (which could probably also be done in contrib for people who need customized languages).
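Step 2 of the plan could be sketched like this in Python. It assumes a `labels` key has been added next to the existing plural information in the state variable, which is the proposal in step 1 and not something that exists today; `form_variants` and the single-field label wording are likewise hypothetical.

```python
def form_variants(langcode, plural_state, translate):
    """Return (label, storage_index) pairs for a plural translation form."""
    info = plural_state.get(langcode)
    if not info or not info.get("labels"):
        # Nothing stored: show one field, stored at index 1, because that is
        # the form that will be used in translation (see step 2 above).
        return [(translate("Form for all numbers"), 1)]
    return [(translate(label), index) for index, label in enumerate(info["labels"])]


plural_state = {
    "en": {"plurals": 2, "labels": ["Singular form", "Plural form"]},
}
identity = lambda text: text  # stand-in for t()

print(form_variants("en", plural_state, identity))  # [('Singular form', 0), ('Plural form', 1)]
print(form_variants("xx", plural_state, identity))  # [('Form for all numbers', 1)]
```

Returning the storage index alongside the label keeps the "one visible field, saved at index 1" rule in one place instead of spreading it across the form builders.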
Comment #44
andypost
I think we need to add plural info for supported languages statically, so there are 2 ways:
1) add another value for known languages to LanguageManager::getStandardLanguageList() (lang1, lang2, RTL, plural_formula)
2) add another method to the language manager to return plural data: LanguageManager::getStandardLanguagePluralFormula($language)
Also, there's a related issue, but it is not popular
Comment #45
jhodgdon
The plural *formulas* are currently coming from the .po files, and I don't think we should change that, should we? I guess someone could import a .po file as a mechanism to change the plural formula for a language.
However, I agree that maybe for the standard languages, putting the information about the plural *labels* would be best here... although I am not sure how that would work if the plural formula in the .po file changed somehow, or if someone imported a .po file with a different formula in it. The labels would then possibly not match the formula or the number of plurals even.
So... I'm not sure if this is a good idea?
Comment #46
maxocub
Here's the issue I posted on l.d.o about the translation mistakes that the imprecise labels may have caused:
#2538142: Some po files have wrong plural translations
Comment #47
jhodgdon
Thanks. Definitely illustrates the problems. Updating the issue summary.
Comment #48
andypost
I'd prefer to rely on CLDR as the source of rules.
Maybe it would be better to use the labels as examples of translation for the language?
Or actually get rid of the labels and use the description part of the input for sane examples?
Both of those require core to ship these mappings (formula and count) and examples with the standard language list.
Also, this would get rid of the problem of importing a wrong .po file that can break all translations for a language simply by having the wrong plural formula.
Comment #49
jhodgdon
Yes, Unicode.org seems like a better source of information. Let's update the link in the issue summary.
And I think you are probably right that we should decouple .po file import from plural rules and plural labels... But I'm not sure what to do for custom language codes like 'en-UK' or 'es-MX' or whatever. I guess we would want to have a way for people when defining their own languages, to say "Use the pluralization rules from this base language", and then we could store that somewhere, a translation between new languages and known pluralization rules?
Comment #50
Mark_L6n
Here is a suggestion for labels: use a commonly-used name of the word-form in that language, to eliminate ambiguity.
This would be in contrast to describing how the word-form is used, as in this quote from the summary:
The names in English for the 3 word-forms referenced there are: nominative singular, genitive singular and genitive plural. These terms are unambiguous, but difficult to remember, and of course we probably would want their Russian equivalents.
Here is an example from Czech about why it's good to let the locals decide what these names should be. While there are Czech terms for 'nominative singular', 'nominative plural' and 'genitive plural', they are not commonly used. Why? Because in Czech schools, to make things simple, they just call the cases 'Case 1', 'Case 2', 'Case 3' etc. and this is therefore what most people use. (I don't know about Russian.)
So suggestion:
1. Give good instructions to local experts about how to configure and name the plural system.
2. Let the local experts give good names. Use unambiguous, commonly-used names for the singular/plural forms used.
Something to include on instructions for local experts:
Some languages with complex case systems have many plural forms, one for each case (for ex., Slavic languages may have 6-7 plural forms). The pluralization system used by Drupal is not structured for all of the case plurals, though, so which plurals should you define? Answer: the number of plural forms used for counting items: 1 comment, 3 comments, 7 comments, etc. (which in Russian and Czech results in 1 singular form and 2 plural forms, I believe).
Comment #51
Crell
A relevant recent article on pluralization in JavaScript, and the standard ICU MessageFormat: http://alistapart.com/article/pluralization-for-javascript
Comment #52
Gábor Hojtsy
We can refactor Drupal's plural handling around CLDR and/or ICU or some other standard in a future major version. Drupal 8 is not supposed to be in a state of release to do such refactoring unless this issue is critical and even then it needs to be substantially explained.
@jhodgdon: As for why we only do plural set on .po import, as you may have seen the plural formulas are not user friendly. Asking someone to come up with the right math formula for a language on adding a language sounds like a problem. Of course Drupal can do anything therefore https://www.drupal.org/project/l10n_pconfig
@andypost, @jhodgdon: indeed the problem of broken .po files did not escape us through the years, and therefore we usually only consider the first .po file imported to set the plural rules for the language. If plural rules are already set for a language, see PoDatabaseWriter::setHeader(): it would only ever overwrite the header if the overwrite options allowed for it (at least one of the two overwrite options was enabled, neither of which is enabled by default).
@Mark_L6n: the trick is we need labels that can also be translated to other languages, ie. when you edit plurals of a Czech string on an Irish UI (imagine site in Irish by default and you are editing some Czech translations).
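The "first import wins" behavior described for setHeader() above can be sketched as follows. This is a Python model of the described rule only, not the PHP method: `store_plural_rule` is a hypothetical name, and the two option keys are assumptions used to stand in for the two overwrite options mentioned.

```python
def store_plural_rule(state, langcode, nplurals, formula, overwrite_options):
    """Store the rule unless one is already set and no overwrite option is on."""
    already_set = bool(state.get(langcode, {}).get("plurals"))
    if any(overwrite_options.values()) or not already_set:
        state[langcode] = {"plurals": nplurals, "formula": formula}
    return state[langcode]


defaults = {"not_customized": False, "customized": False}  # assumed option names
state = {}

store_plural_rule(state, "ru", 3, "first formula", defaults)
store_plural_rule(state, "ru", 2, "broken formula", defaults)  # ignored
print(state["ru"]["plurals"])  # 3

store_plural_rule(state, "ru", 2, "broken formula", {**defaults, "customized": True})
print(state["ru"]["plurals"])  # 2
```

So a later broken .po file only damages the stored rule when someone has explicitly enabled overwriting during import.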
Comment #53
Mark_L6n
In light of your comments, another idea:
1) Decide which system Drupal will be compatible with in the future, ICU or CLDR, and use their terminology for labels.
2) Add an information field (or 2) which has:
a) a description of what each label is for (e.g. 'numbers ending in 2, 3, 4, but not 12, 13, 14') for end users needing to know how this works
b) the grammatical name of the item (e.g. `genitive singular`) for people who are looking up in reference material what the proper word-form should be
Comment #54
Crell (Palantir.net)
Mark: MessageFormat builds on CLDR, doesn't it? As I understand it, ICU and CLDR aren't different formats (although I grant that my understanding of this space is very much a novice's).
Comment #55
jhodgdon
The problem that I see with labels like "genitive singular" is that while that may be the correct grammatical term for *a single noun* form, we are usually, or at least often, not translating a single noun but rather an entire sentence or phrase. For instance, the string we are translating could be something like:
@count users were updated
Calling this by the grammatical term for the noun "users" in this phrase would most likely confuse people -- even the grammar experts who understand what "genitive singular" means, because there could be multiple nouns in that phrase, and even multiple clauses that might not be all actually correctly translated into the genitive singular case.
So I again go back to a label like "the form for when @count ends in 1 but not 11", which would be descriptive for translators (does anyone disagree that the translators who know the target language would understand which forms they would need to translate if they were described this way?). This format would also provide labels that could be translated successfully into another language by ordinary speakers of that language (whereas I have no idea who could translate "genitive singular" into Spanish, for instance, which has no such linguistic concept).
Comment #56
Mark_L6n (volunteer)
@Crell: From a quick glance at them, it looks like CLDR added some things to ICU. Do you think Drupal will want to use one of these projects (the newer CLDR, I would guess) in the future?
@jhodgdon: To clarify, the last suggestion was to use the labels that ICU or CLDR use, which for CLDR are terms like 'one', 'few', 'many', 'other'. Not important, just a suggestion; probably the most important thing is just to decide on a label and move on.
The grammar term would just be an aid to users for looking up information, because that is usually how a word-form would be listed in a reference work. Again, not really important, just an ease-of-use suggestion.
Comment #57
andypost
@Mark_L6n there's another big question: do we need to update "1 item" to "@count item" throughout the codebase, at least for Asian languages, which mostly have 1 form...
Comment #58
Gábor Hojtsy
@andypost: the idea so far was that nothing stops translators from replacing 1 with @count in a translation. In fact, translators for languages with only one form were expected to do so: l.d.o was supposed to display only one input field, so it was assumed to be clear that this is the only form used regardless of number. Of course, this issue was opened because some of those assumptions were not right.
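To make the effect of replacing 1 with @count concrete, here is a small Python sketch. It is not Drupal's formatPlural(); `format_plural` and `single_form` are hypothetical helpers that only pick a form by index and substitute the @count placeholder, for a language assumed to have a single plural form.

```python
def format_plural(count, forms, plural_index):
    """Pick the translated form for `count` and fill in @count."""
    index = plural_index(count)
    return forms[index].replace("@count", str(count))


def single_form(n):
    """Plural rule for a language with one form: always index 0."""
    return 0


hardcoded = ["1 сағат"]              # translator kept the literal "1"
with_placeholder = ["@count сағат"]  # translator used the placeholder

print(format_plural(12, hardcoded, single_form))         # 1 сағат
print(format_plural(12, with_placeholder, single_form))  # 12 сағат
```

The first call reproduces the bug from the issue summary: every count is rendered with the literal "1" because the only available form never mentions @count.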
Comment #59
andypost
@Gabor sure. Maybe we also need a separate issue about "allow override formatPlural() per language", to allow using different numbers for different countables.
PS: Number formatting is another beast...
Comment #60
Gábor Hojtsy
@andypost why would they need to override it?
Comment #61
Mark_L6n (volunteer)
@andypost, the rule that I saw for Drupal, in the case of Chinese, looked pretty good.
Comment #62
maxocub
Hi all, I'd like to reopen this discussion since I'll be sprinting Friday through Sunday; maybe we can get some work done to improve this situation. I don't have anything to add right now, except that I agree with @jhodgdon on #55. I'll think about it more on Friday morning, but in the meantime, if you have more arguments towards a solution, please share.
Comment #63
maxocub
After reading the excellent article from #51 (thank you @Crell), I like the idea of using the CLDR labels that MessageFormat relies on (one, few, many, etc.). I think they 'kind of' apply to every situation. Maybe we can use them and offer an optional help text where we use the more descriptive labels, like "the form for when @count ends in 1 but not 11".
Comment #64
jhodgdon
Hm, but which of "one, few, many, etc." applies to "when @count ends in 1 but not 11", or "when @count ends in 2-4 but not 12, 13, or 14", for instance? I mean, 123452 is not really "few", it's a lot.
Comment #65
maxocub
Haha, you're totally right! That's not really accurate.
I guess the example in the article is wrong on that count: "If the counter has a value that ends in 2–4, excluding 12–14, use the plural form few."
Comment #66
jhodgdon
Yeah... So I looked carefully through the CLDR article. I just don't think their idea of trying to make the labels uniform across all languages really makes a lot of sense. I mean, I can see where uniformity is good, but if it also means inaccuracy or confusion or lack of specificity, I don't see the benefit.
Take Russian for example. It has forms for:
Numbers ending in 1 but not 11
Numbers ending in 2, 3, 4, but not 12, 13, 14
Everything else [0; numbers ending in 11, 12, 13, 14; numbers ending in 0, 5, 6, 7, 8, 9]
The CLDR rules would label these "one", "few", and "many". I think this would be problematic:
a) We are currently having problems with the "singular" form labels we have now -- translators for languages like Russian or single-form languages are leaving @count out in these cases. Calling it "one" instead of "singular" would not resolve this problem in the slightest.
b) We should ask @andypost or other Russian speakers if they would understand "one" "few" "many" labels. I mean, maybe Andy would because he has been following this discussion, but how about some random Russian/English translator?
Anyway... I think CLDR is trying for uniformity... which is nice, but we have a problem of clarity and understanding. I don't think adopting their labels is a good idea. I think we should instead customize the labels to the actual rules of the languages so that they are clear and correct.
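The three Russian rules listed above can be written as a short function. This is a Python sketch for illustration only (the function name is hypothetical, and it covers non-negative integers only); it follows the rules exactly as stated in this comment.

```python
def russian_plural_index(n):
    """Return 0, 1 or 2 for the three Russian counting forms."""
    # Numbers ending in 1 but not 11.
    if n % 10 == 1 and n % 100 != 11:
        return 0
    # Numbers ending in 2, 3, 4, but not 12, 13, 14.
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    # Everything else: 0, numbers ending in 11-14, numbers ending in 0, 5-9.
    return 2


for n in (1, 21, 11, 3, 13, 0, 7, 123452):
    print(n, russian_plural_index(n))
```

Note that 123452 lands in the second group, the one CLDR calls "few", which is the mismatch between label and meaning raised in #64.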
Comment #67
maxocub
@jhodgdon: I agree with you that we should aim for clarity over uniformity.
I just found this tool: https://github.com/mlocati/cldr-to-gettext-plural-rules
It generates the CLDR labels with examples: http://mlocati.github.io/cldr-to-gettext-plural-rules/
We don't need to use the CLDR labels, but the examples could be useful.
Comment #68
jhodgdon
That seems *marginally* useful, but... looking at Russian again, their examples are:
one: 1, 21, 31, 41, 51, 61, 71, 81, 101, 1001, …
few: 2~4, 22~24, 32~34, 42~44, 52~54, 62, 102, 1002, …
other: 0, 5~19, 100, 1000, 10000, 100000, 1000000, …
These examples are not incorrect, but they don't make it clear that:
121, 131, etc. belong with "one"
122, 143, etc. belong with "few"
20 and 105-120, etc. belong with "other".
I just don't think anything automatic is going to be all that useful, and I think that we can be much more concise than making a list of numbers... So I really think that for the special languages like Russian, we will be better off having an intelligent human being write the labels/examples and put them into a big switch statement, rather than relying on anything automatic.
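A rough sketch of what such a hand-maintained table could look like, written in Python rather than PHP and using a lookup table in place of a switch statement. Everything here is hypothetical: the function name, the fallback wording, and the exact label text are illustrations, not proposed UI strings.

```python
# Labels written by a human for languages whose forms need explaining.
HAND_WRITTEN_LABELS = {
    "ru": [
        "Form for when @count ends in 1 but not 11",
        "Form for when @count ends in 2, 3, 4 but not 12, 13, 14",
        "Form for all other values of @count",
    ],
}


def plural_form_labels(langcode, number_of_forms):
    """Return one label per plural form for the given language."""
    labels = HAND_WRITTEN_LABELS.get(langcode)
    # Use the hand-written labels only if they match the form count.
    if labels is not None and len(labels) == number_of_forms:
        return labels
    if number_of_forms == 1:
        return ["Form used for every value of @count"]
    # Generic fallback for languages nobody has described yet.
    return [
        f"Plural form {i + 1} of {number_of_forms}"
        for i in range(number_of_forms)
    ]


print(plural_form_labels("ru", 3))
print(plural_form_labels("xx", 1))
```

The length check is a design choice in this sketch: if the stored formula for a language disagrees with the hand-written list, the generic labels are safer than mislabeled fields.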
Comment #69
jhodgdon
This will change translatable UI text, so according to
https://groups.drupal.org/node/484788
would need to be tagged "rc deadline".
Given that RC is very soon, and we don't really have a direction here, I think we should just move it to 8.1 now. Unless we decide this is a really really important bug, I think it would need to be 8.1 material. If it's 8.1, we have a lot more flexibility in how to tackle it.
Comment #70
Gábor Hojtsy
I think, weighing the benefit vs. disruption, this removes 3 strings and adds a whole bunch, so I'm not sure the RC phase rules it out. Nonetheless, it needs considerably more discussion AFAIS, so overall a good call to move to 8.1 IMHO.
Comment #71
jhodgdon
FYI, on #2545730: Misuse of formatPlural() in Numeric field prefix/suffix I found myself needing to add these plural labels to 2 more forms. So I added a method to StringTranslationTrait to generate the current (not great) labels, with the idea that on this issue here we could:
a) Use this method on the other 4 classes currently generating their own labels
b) Fix the method to generate better labels.
There is a @todo on that method pointing to this issue.
Anyway, anyone interested in this issue could go look at the other patch...
Comment #78
andypost
Labeling remains a question.
But the second part of the issue is how to get the number of plurals for a language that was created from the standard list without importing a .po file (to get the formula).
Right now the logic lives in
\Drupal\locale\PluralFormula::getNumberOfPlurals()
and \Drupal\Core\StringTranslation\StringTranslationTrait::getNumberOfPlurals()
Comment #90
dww
This came up as a random triage target in #bugsmash.
I confess to not having read every comment, but this definitely still seems like a bug, and likely major is accurate.
Turned patch #30 into MR 6729.
I'm sure we don't want locale_get_plural_form_labels() as a new procedural method for this as the API addition. 😂 I'm not sure we want to try to maintain this exact list in core code like this, but I'm not yet seeing an alternative. Thankfully, once we get it right, this list will never change except when adding new languages. And multiple plural forms is pretty rare. Even in ʻŌlelo Hawaiʻi (which I co-maintain with @xjm), a language with different versions of "we" for "the two of us" vs. "all of us", it still only has 2 plural forms for the purpose of this issue.
Where do we put the logic?
locale.plural.formula service?
StringTranslationTrait?
Comment #91
dww
p.s. re #90.2: I suggested StringTranslationTrait since it's already providing getNumberOfPlurals().
Comment #92
dww
Fixed the PHPCS errors, both via phpcbf and manually.
In comment #71 @jhodgdon pointed to #2545730: Misuse of formatPlural() in Numeric field prefix/suffix where they also need the labels and decided to add something to StringTranslationTrait. That could be read as a vote for #90.2.
Also tagging that this needs tests before it could be committed, and started saving issue credits for the rich discussion here.