Update: read the summary in comment #51 at http://drupal.org/node/532512#comment-5679184).

Original report

... and the translation interface has no way to determine if they are plural, so it treats them as singular.

That's a severe issue, as it breaks all languages that do not follow English plural conventions (and most of them don't...).

There are two issues:

  • locale() and format_plural() are independent from each other. format_plural() calls t() (and thus locale()) with a single, singular or plural form.
  • the plural information is part of the {locales_target} table, where as it is really a property of the {locales_source} table.

To solve this issue, I suggest we store both the singular and the plural form in the *same* row of the {locales_*} table, in a serialized form. format_plural() would call t() with the full, (serialized) array($singular, $plural), and gets in return a (serialized) array of plural forms.

CommentFileSizeAuthor
#103 532512-101.diff-without-white-space.txt1.92 KBSutharsan
#101 locale_plural_string_storage_broken-532512-101-without-86.patch1.13 KBSutharsan
#101 locale_plural_string_storage_broken-532512-101.patch8.86 KBSutharsan
#96 locale_plural_string_storage_broken-532512-94-without-86.patch1.14 KBSutharsan
#95 locale_plural_string_storage_broken-532512-94.patch8.87 KBSutharsan
#94 locale_plural_string_storage_broken-532512-94-without-#86.patch1.14 KBSutharsan
#86 locale_plural_string_storage_broken-532512-86.patch7.73 KBplach
#83 0001-532512-Plural-string-storage-is-broken.patch8.1 KBPol
#80 17.png59.47 KBPol
#80 0001-532512-Plural-string-storage-is-broken.patch921 bytesPol
#77 yay-plurals-532512-77.patch49.73 KBno_commit_credit
#77 interdiff.txt3.83 KBno_commit_credit
#71 interdiff.txt650 bytesGábor Hojtsy
#71 532512-plural-storage-71.patch49.71 KBGábor Hojtsy
#69 interdiff.txt660 bytesGábor Hojtsy
#69 532512-plural-storage-69.patch49.71 KBGábor Hojtsy
#66 532512-plural-storage-66.patch49.07 KBGábor Hojtsy
#66 interdiff.txt1.04 KBGábor Hojtsy
#64 532512-plural-storage-64.patch49.09 KBGábor Hojtsy
#63 532512-63-before.png32.48 KBSutharsan
#63 532512-63-after.png33.94 KBSutharsan
#58 532512-plural-storage-58.patch46.93 KBGábor Hojtsy
#56 532512-plural-storage-56.patch46.86 KBGábor Hojtsy
#54 532512-plural-storage-54.patch46.85 KBGábor Hojtsy
#53 532512-plural-storage-53.patch38.23 KBGábor Hojtsy
#53 interdiff.txt586 bytesGábor Hojtsy
#52 532512-plural-storage-52.patch38.23 KBGábor Hojtsy
#52 interdiff.txt2.26 KBGábor Hojtsy
#50 interdiff.txt12.78 KBGábor Hojtsy
#50 532512-plural-storage-50.patch35.98 KBGábor Hojtsy
#48 532512-plural-storage-48.patch24.14 KBGábor Hojtsy
#48 interdiff.txt8.48 KBGábor Hojtsy
#44 locale_edit_single_title.png5.11 KBandypost
#43 532512-plural-storage.patch23.01 KBandypost
#43 locale_edit_single.png4.62 KBandypost
#42 532512-plural-storage.patch23.37 KBandypost
#38 532512-plural-storage.patch19.14 KBandypost
#38 locale_list.png11.07 KBandypost
#38 locale_single.png4.74 KBandypost
#38 locale_plural.png9.62 KBandypost
#23 tdemo.tgz1.35 KBandypost
#8 532512-locale-tests-d7.patch3.54 KBandypost
#8 532512-locale-tests-plural-broken-d7.patch3.55 KBandypost
#7 loc.tar_.gz1.4 KBandypost
#7 532512-plural-first-d7.png5.45 KBandypost
#7 532512-single-first-d7.png5.75 KBandypost
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Damien Tournoud’s picture

Title: Plural strings autodetected are stored as singular » Autodetected plural strings are stored as singular
andypost’s picture

Good idea and this approach could simplify logic and memory in _locale_rebuild_js()

andypost’s picture

localization server stores strings in one TEXT field separated with "\0". But there's an issue #690746: Text column type doesn't reliably hold serialized variables
This is a approach of gettext mo files http://www.gnu.org/software/hello/manual/gettext/MO-Files.html

This makes easy to search them. If we serialize them then additional data from serialization will break searches :(

I think we should unify core and l10n_server

andypost’s picture

If it's possible to include this in D7 I could roll a patch. Upgrade path could be hard but I think that sane translation is need!

Schema from l10n_server


  $schema['l10n_server_string'] = array(
    'description' => 'Value of translatable strings found.',
    'fields' => array(
      'sid' => array(
        'description' => 'Internal numeric identifier for a source string.',
        'type' => 'serial',
        'not null' => TRUE,
        'disp-width' => '11'
      ),
      'value' => array(
        'description' => 'The actual translatable string. For strings with multiple plural versions, we store them as the same translatable with a \0 separator (unlike Drupal itself), because it is easier to match translations with them (for multiple plural versions) this way, and we can force people to translate both at once.',
        'type' => 'text',
        'not null' => TRUE
      ),

Current core

  $schema['locales_source'] = array(
    'description' => 'List of English source strings.',
...

      'source' => array(
        'type' => 'text',
        'mysql_type' => 'blob',
        'not null' => TRUE,
        'description' => 'The original string in English.',
      ),
...
    'indexes' => array(
      'source_context' => array(array('source', 30), 'context'),
    ),

I cant understand a reason to make an index on 30 chars of blob also mysql_type should be removed

Suppose we could follow l10n_server approach to join singular & plural in one field. This lead to removing a plid and plural fields from {locales_target} also simplifies a code and makes translation system a bit usable.

andypost’s picture

History search:
- index source(30) is outdated #42463: locale.module: adding an index on the source field improves speeds
- mysql blob used for case sensitive search #93506: locale_* tables need %b

andypost’s picture

Because of nature of http://api.drupal.org/api/function/format_plural/7 I see no visible benefit of joining singular and plural.
Because in request mostly only one of them used. But I think we could add relation between them which could help to improve edit-strings form.

Functions that affected by plural column
http://api.drupal.org/api/function/locale_translate_export_po_form_submit/7

And import strings

andypost’s picture

There's another issue with format_plural() - no ability to differentiate a singular and plural form of string. Attached module shows this (with russian language) - If string used in both t() and format_plural() so only one is stored.

Steps to reproduce:
1) Enable locale module
2) Add some language with plurals >= 2 (for example russian); a list of plurals could be found at http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
3) Enable attached "Loc" module (russian translation should be auto-imported)
4) Visit /loc-test

t('1 line') translates as "@count строка"

If you manually import loc-ru.po file (with mode = "Strings in the uploaded file replace existing ones, new ones are added. The plural format is updated.")

You'll see different translation of first and last lines

This all because

 t('One line') and format_plural($n, 'One line', '@count lines')

potx exports both

#: loc.module:28
msgid "1 line"
msgstr ""

#: loc.module:33
msgid "1 line"
msgid_plural "@count lines"
msgstr[0] ""
msgstr[1] ""
andypost’s picture

Status: Active » Needs review
FileSize
3.55 KB
3.54 KB

Here the patch to illustrate bug

532512-locale-tests-d7.patch - Just improved tests (adds form-id to detect export type)

532512-locale-tests-plural-broken-d7.patch - Shows that no plurals are exported if no translation avaliable

EDIT: interdiff

-    $this->assertRaw('msgid "@count locale test lines"', t('Plural string present in exported file.'));
+    $this->assertRaw('msgid_plural "@count locale test lines"', t('Plural string present in exported file.'));
andypost’s picture

Suppose we should know n-plurals for every language supported by core to build a correct form to translate plural strings.

Status: Needs review » Needs work

The last submitted patch, 532512-locale-tests-plural-broken-d7.patch, failed testing.

Dmitriy.trt’s picture

format_plural() should not use t() and we should add different table for plural forms, or should use t() in different way (one call for singular and plural form imploded with \n or other special symbol).

Drupal core MUST provide methods for correct translation of plural forms to languages with non-English behavior of plural forms. Please, read also this chapter of gettext manual about plural forms: http://www.gnu.org/software/gettext/manual/gettext.html#Translating-plur...
Localization support is not full without it!

andypost’s picture

I stacked with edit forms... I see no ability to build a forms for translation without knowing a number of plural forms for language!

This is required for locale_translate_edit_form()

Possible solutions:
1) for each language core should store a number of plural form... seems hard to find them all
2) This form should assume nplurals=2 for languages that have {languages}.plurals = 0

From this point I think we need review one of locale mainteiners

plach’s picture

subscribe

andypost’s picture

#8 still applies with offset, Suppose we need to commit this quick fix and move this issue into D8

podarok’s picture

subscribe

andypost’s picture

Status: Needs work » Needs review

#8: 532512-locale-tests-d7.patch queued for re-testing.

Gábor Hojtsy’s picture

Issue tags: +D8MI

Tag for Drupal 8 Multilingual Initiative.

Gábor Hojtsy’s picture

Status: Needs review » Needs work

No patch suggested to fix the issue, so marking as needs work.

I think the easiest solution would be to merge the singular/plural values in one string like l10n_server does (it concatenates them with \0). That is not exceptionally nice, BUT the singular+plural string is a unique identifier in .po files (that is what you see when you export the data and the same string appears in both the standalone singular form and the singular/plural form).

That would require format_plural() to do the same concatenation when looking up the strings, but the whole database storage system would be simplified a great deal. There would always be a one to one relationship from source strings to translation strings in the database, because the plural/singular differences would be encoded inside the string. (Also makes it much easier to produce an editing UI with all of them at once) Because we don't need to look up strings individually out of context we don't loose much.

Gábor Hojtsy’s picture

@andypost: would you be interested in helping with introducing a "serialized" plural storage just like in l10n_server?

andypost’s picture

Assigned: Unassigned » andypost

@Gábor Hojtsy I'm going to invest some time this weekend to d8mi. I'll take a storage from l10n_server 7.x branch

andypost’s picture

roadmap

  1. change schema
  2. upgrade hook
  3. ui change
  4. tests

EDIT postponed til this #361597: CRUD API for locale source and locale target strings
Having \0 as delimeter will cause a trouble with postgresql and sqlite #690746: Text column type doesn't reliably hold serialized variables

andypost’s picture

FileSize
1.35 KB

after some testing and investigation I propose to use "\03" Control_character which means End-of-text_character

There's a small module for D8 which I used to test searchability of text inside text field after delimeter

"\0" does not works:

So I start work with "\03"

It could be great if someone will test this module under mssql and oracle, see attachment
You could use it under D7 just change to core=7.x in .info file
- install the module
- navigate to /tdemo
- press "Generate"
- enter any part of strings in "Database" table and press "Search"
- results for searching "@count новых коментария" should return 1 line in result

podarok’s picture

#23
can we provide an option for selecting proper character depend on database type? via configuration option possibly with provided array of successfull selectors

Gábor Hojtsy’s picture

@podarok: that would make dumping and import/export pretty hard, right? If serialization does not work out, we can store the values in their own column. Ie. a plural field in the source table and plural1, plural2, etc fields in the translation table up to as many number of plurals as needed. It would be slower to search in (i.e. from the admin search screen we would need to look up based on possible matches in all columns), and we'd need to be able to define the max number of possible plurals.

podarok’s picture

that would make dumping and import/export pretty hard, right?

not if we will provide CRUDi API for that

Ie. a plural field in the source table and plural1, plural2, etc fields in the translation table up to as many number of plurals as needed.

id - langcode - plural1 - plural2 - plural3 -... - pluralN

right?
such table will not be huge, but very nice for indexing... possibly we can store it in one or few documents in NoSQL databases
also we will store multiple NULL data 8))

andypost’s picture

I think denormalization in this case will bring more troubles than advantages.

Most of time core make queries by source field and refactoring of format_plural() to query imploded string should be the same as quering ine string as it works for now. Also we could use hash column to query faster #851362: Add hash column to {locales_source} to query faster locale strings

right now I bit stacked with refactoring of format_plural() D8 stores number of plurals and formula in variable since #1323176: Decouple UI translation language information from language list schema
also this could cause more race conditions #746240: Race condition in locale() - duplicates in {locales_source}

Gábor Hojtsy’s picture

Yes, Drupal 8 will however does not offer the variable_* API anymore, it will use a new API introduced by the CMI (Config Management Initiative), so I think we should return to possible race conditions at that point. It is hard to predict now what kind of race conditions solutions that would provide (it depends on the storage it is going to use for the values and/or how we use that storage).

Gábor Hojtsy’s picture

@andypost: also, I think that postponing this on #361597: CRUD API for locale source and locale target strings is the wrong solution. By making plural storage simpler, the issue at #361597: CRUD API for locale source and locale target strings is made much simpler also. So solving the more complex problem there first and then coming back to simplify it does not seem like a good way forward. I think it would be great to fix this first.

andypost’s picture

The conclusion I came up now for format_plural() is t(implode(LOCALE_DELIMETER, array($singular, $plural))) and then explode the result and use the result.

I'm still working on schema upgrade path

Gábor Hojtsy’s picture

@andypost: that seems to reflect the gettext storage model for the string well, where it is stored using the singular and plural form both comprising the ID together.

andypost’s picture

@Gábor Hojtsy yes, but schema upgrade seems should be done in batch because a huge data amount stored in tables. But bootstrap require fixed storage so chicken-egg should be resolved in early install/update/fix requirements phase

Gábor Hojtsy’s picture

@andypost: can we somehow pick out the plural source/translation strings only. That should be a pretty small number of strings on a regular Drupal site and would not require a batch. That is if we identify those strings with a simple and quick query.

podarok’s picture

different storage for plurals?
exelent idea!

Gábor Hojtsy’s picture

Issue tags: +sprint

@andypost: any progress?

Adding the sprint tag to keep a closer tab on this.

Gábor Hojtsy’s picture

Issue tags: +language-ui

Adding UI language translation tag.

Gábor Hojtsy’s picture

Title: Autodetected plural strings are stored as singular » Plural string storage is broken, editing UI is missing

Hope this new title is better for the realized new scope of this issue. I'm not sure we need to solve all the issues here. Let me know if I can help you with anything.

andypost’s picture

Status: Needs work » Needs review
Issue tags: +Needs usability review
FileSize
9.62 KB
4.74 KB
11.07 KB
19.14 KB

Patch introduces:
1) Changed a format_plural.
2) Plural strings are stored merged with LOCALE_PLURAL_DELIMITER in locales_source.source and translation locales_target.translation probably we could drop plid and plural from locales_target table not sure that update function possible that could find broken strings (different issue)
3) Import and export are updated to use this storage. _locale_export_remove_plural() could be removed now we have no need in @count[n] strings.
4) admin UI changed to allow translate plural strings. For plurals strings I choose a vertical tabs with languages as tabs. For none-plural strings interface is a same. Form is changed so some tests could fail. The problem here that if there's no plural formula so no ability to determine a number of controls so fallback to singular and plural. This form needs usability review

locale_single.png


locale_plural.png


5) translation search screen is not touched but need some love because merged strings could confuse
locale_list.png

Let's see what testing could say...

Here's goes some background

Currently we have no plural formula and number of plurals for language added within admin interface this causes that we cant use plurals for newly added languages and can't build a edit forms because we don't know a number of fields needed for a language
This information stored in
variable 'locale_translation_plurals' = array('langcode if not en' => array('plurals' => N, 'formula' => '(n > 1)'))

We have exception for English language to not store it's plural information in this variable and fixed troubles caused in #1221276: locale_get_plural() only works for a single language within a request and does not work for English
and small fix already in #655048: Plural formula information blanked when importing a poorly-formed .po file

So I see 2 solutions for solving unknown number of plurals for language
1) Add to standards.inc a predefined list of plural formulas and numbers to copy them into variable after adding a language. That way we allow some contrib module to edit this information if needed and override on import .po file
2) assume that each language has 1 plural formula by default (there's a some languages see Plural forms) and build edit forms with this assumption. I found that Vietnamese has no plural formula because this language has only 1 normal form as the most Asian languages

Modify storage:
1) remove plid and plural from locales_target table because they are not used by core and store all translations imploded by LOCALE_PLURAL_DELIMITER
2) add a bit-flag is_plural to locales_source table indication that string is a plural or just explode loaded string with LOCALE_PLURAL_DELIMITER on load to determine is a string has plurals

We are already going to modify translation table in #1445004: Implement customized translation bit on translations

Status: Needs review » Needs work

The last submitted patch, 532512-plural-storage.patch, failed testing.

Gábor Hojtsy’s picture

Superb, great stuff. Looked at the patch and the changes seem to be very good. Some comments:

2) Plural strings are stored merged with LOCALE_PLURAL_DELIMITER in locales_source.source and translation locales_target.translation probably we could drop plid and plural from locales_target table not sure that update function possible that could find broken strings (different issue)

I think we'll need to somehow resolve the upgrade path in this issue? Do you intend to move that to a new issue? (Or just handling incorrect data from before the update?)

4) admin UI changed to allow translate plural strings. For plurals strings I choose a vertical tabs with languages as tabs. For none-plural strings interface is a same. Form is changed so some tests could fail. The problem here that if there's no plural formula so no ability to determine a number of controls so fallback to singular and plural.

I'm not a usability expert, but that the form looks totally differently for only-singular and singular/plural strings is a bit puzzling. Why not make it work the same way for both? (I think we'll want to move towards an l10n_server (localize.drupal.org-resembling) UI for Drupal 8 if we have the time for it, so I'd not delve into this UI much, but in case we don't have the time for that, we should at least have something consistent IMHO).

Currently we have no plural formula and number of plurals for language added within admin interface this causes that we cant use plurals for newly added languages and can't build a edit forms because we don't know a number of fields needed for a language

Yes, this is a long known limitation of Drupal. The overall plan for Drupal 8 is that integration with files downloaded from localize.drupal.org will be built-in (think we'd build in l10n_update in core essentially), so theoretically the window inbetween adding a language and getting a useful plural formula for it would be reduced to seconds if using that feature. Which means practically, languages would not exists for any reasonable time without a useful plural formula.

If we do want to add some default for plural formulas, we have a good directory of formulas available / setup on localize.drupal.org. We can export from that data, but given the above I'm not sure its worth the effort. We can default the formula to the most common one (nplurals=2; plural=(n!=1);), but then overwriting it would only be possible if you import .po files in overwrite mode, which you usually don't want.

Also, plural formulas look very much black magic to an everyday user. I built http://drupal.org/project/l10n_pconfig about two years ago to expose the setting for sites like localize.drupal.org where it is important to be editable. Yes, it is just a textfield but it is very-very techy and people should usually not touch it at all. (The module also has a built-in list of plural formulas that is not really synced (yet?) to localize.drupal.org's latest data with all the new languages added there).

So all-in-all I think leaving the initialization to the first .po file imported is fine, and the l10n_update integration should bring that much sooner. I think its ok to not let people enter translations for plurals until the formula is initialized (l10n_server does the same). So I'd throw an error on the translation page if there was no formula to build the translation UI on.

1) remove plid and plural from locales_target table because they are not used by core and store all translations imploded by LOCALE_PLURAL_DELIMITER

Yes.

2) add a bit-flag is_plural to locales_source table indication that string is a plural or just explode loaded string with LOCALE_PLURAL_DELIMITER on load to determine is a string has plurals

I think this could be useful for quick filtering purposes and was suggested for l10n_server before.

We are already going to modify translation table in #1445004: Implement customized translation bit on translations

I don't think we should worry about that, whichever gets in first will dictate that the other one needs to adapt to the first. These are two very separate features, so IMHO we should keep them separate.

All-in-all looks good, keep it up! Thanks!

andypost’s picture

Filed issue about UI #1452188: New UI for string translation
Going to fix some tests and proceed with VT implementation for UI in this patch because we definitely need a UI to continue work and allow core to translate plural strings

andypost’s picture

Status: Needs work » Needs review
FileSize
23.37 KB

seems tests should be fixed

andypost’s picture

Unified screen for editing - always using VT

locale_edit_single.png

andypost’s picture

last patch displays a language name as label for text area so fixed sceenshot

locale_edit_single_title.png

Gábor Hojtsy’s picture

@andypost: great, I think this looks good. Needs upgrade path and tests for the upgrade path. I don't think this needs usability review in light of #1452188: New UI for string translation.

andypost’s picture

I think we already have enough tests so just need to extend a current test-set to new storage, Tests for UI already fixed and needs to check new controls. Tests for import|export needs to be extended to detect plurals and their order.

The only possible upgrade path is to find a strings by their translations with plid or plural <> 0

OTOH no upgrade path needed because on D7->D8 upgrade to much strings are changed and translation needs re-roll. Also if strings are been translated with core's interface so there's no relation for plurals and upgrade gives nothing...

Gábor Hojtsy’s picture

Status: Needs review » Needs work

All right, first of all, very good patch. Very good! This is long overdue and it even follows Damien's thinking from mid-2009. Yeah! Once again I don't think we should fuss much about the UI here, since we have #1452188: New UI for string translation to turn the whole thing upside down anyway.

However, while I do agree that upgrades will need to re-import translations, there still can be translations for custom modules, etc., so if we can make an upgrade on a best effort basis to upgrade the plural translations (where it was stored in its ideal form), that would be superb. We'll not upgrade broken stuff, but we'll still go a great way. Tests for that would be great too. Let me know if you need help with any of that, I'm happy to help or find someone else to help! It would be basically about (1) adding an update function to migrate the data (2) add plural source strings and translations in core/modules/simpletest/tests/upgrade/drupal-7.language.database.php, (3) add tests in core/modules/simpletest/tests/upgrade/upgrade.language.test to check that the plural translations were upgraded to their new form properly in the DB.

Apart from that, only minor notes:

+++ b/core/includes/common.incundefined
@@ -161,6 +161,11 @@ const DRUPAL_CACHE_PER_PAGE = 0x0004;
 /**
+ * The deliditer used to split plural strings.
+ */
+const LOCALE_PLURAL_DELIMITER = "\03";

Would be good to document why \03 is the best choice? l10n_server uses \0 and above I think you uncovered msgfmt also uses that for .mo files, but in real testing above, you stated sqlite and PostgreSQL did not work with \0. So that would be good to document. In effect (a) what is \03 and (b) why is it a good choice.

+++ b/core/includes/common.incundefined
@@ -1717,27 +1722,35 @@ function format_xml_elements($array) {
+  // @todo Check this against a single plural form languages. (Vietnamese?)
+  if ($index == 0) {
+    // Singular form.
+    return $translated_array[0];
   }

I don't see a reason why this would not work with single plural formula languages. I think this @todo would need to be removed before the patch can land, right? :)

+++ b/core/includes/gettext.incundefined
@@ -416,29 +416,16 @@ function _locale_import_one_string($op, $value = NULL, $mode = NULL, $lang = NUL
+          // Sort plurals by their form number.
+          ksort($value['msgstr']);
+          // Join plurals forms in one string.
+          $value['msgstr'] = implode(LOCALE_PLURAL_DELIMITER, $value['msgstr']);

// Sort plural variants by their form index.

// Serialize plural variants in one string by LOCALE_PLURAL_DELIMITER.

+++ b/core/includes/gettext.incundefined
@@ -416,29 +416,16 @@ function _locale_import_one_string($op, $value = NULL, $mode = NULL, $lang = NUL
+        // A simple string to import.
+        $english = $value['msgid'];
+        $translation = $value['msgstr'];
+        _locale_import_one_string_db($report, $lang, isset($value['msgctxt']) ? $value['msgctxt'] : '', $english, $translation, $comments, $mode);

I don't think there is a point in naming $english and $translation at this point, we can just pass them on to the function as-is?

(The larger import API will be reworked, so don't worry about long lines here or anything like that).

+++ b/core/includes/gettext.incundefined
@@ -872,28 +838,19 @@ function _locale_import_parse_quoted($string) {
-    $result = db_query("SELECT s.lid, s.source, s.context, s.location, t.plid, t.plural FROM {locales_source} s LEFT JOIN {locales_target} t ON s.lid = t.lid ORDER BY t.plid, t.plural");
+    $result = db_query("SELECT s.lid, s.source, s.context, s.location FROM {locales_source} s");

Why is that we don't need the target table anymore?

+++ b/core/includes/gettext.incundefined
@@ -933,6 +890,12 @@ function _locale_export_po_generate($language = NULL, $strings = array(), $heade
+        // Store nplurals to not use check in export loop.

// Remember number of plural variants to optimize the export.

+++ b/core/includes/gettext.incundefined
@@ -933,6 +890,12 @@ function _locale_export_po_generate($language = NULL, $strings = array(), $heade
+        // Do not export plurals while there's no formula.

// Remember we did not have a plural number for the export.

+++ b/core/modules/locale/locale.pages.incundefined
@@ -278,13 +278,31 @@ function locale_translate_edit_form($form, &$form_state, $lid) {
+    $form['plural'] = array('#type' => 'value', '#value' => 0);
+    // Add original text to the top and some values for form altering.

Break the FAPI element to multiple lines please. The comment looks like was intended above the plural key? Maybe make it more specific:

// Add original text value and mark as non-plural.

+++ b/core/modules/locale/locale.pages.incundefined
@@ -307,22 +325,64 @@ function locale_translate_edit_form($form, &$form_state, $lid) {
+  // Store languages to iterate for validation and submition of the form.

submission

+++ b/core/modules/locale/locale.pages.incundefined
@@ -307,22 +325,64 @@ function locale_translate_edit_form($form, &$form_state, $lid) {
+  $form['translations'] = array('#type' => 'vertical_tabs', '#tree' => TRUE);

Break into multiple lines.

+++ b/core/modules/locale/locale.pages.incundefined
@@ -307,22 +325,64 @@ function locale_translate_edit_form($form, &$form_state, $lid) {
+        // Build a textarea for each plural form.

// Add a textarea for each plural variant.

+++ b/core/modules/locale/locale.pages.incundefined
@@ -307,22 +325,64 @@ function locale_translate_edit_form($form, &$form_state, $lid) {
+            '#title' => ($i == 0 ? t('Sigular form') : format_plural($i, '1 plural form', '@count plural form')),

Maybe make it a bit more friendly:

format_plural($i, 'First plural form', '@count. plural form')

+++ b/core/modules/locale/locale.pages.incundefined
@@ -349,9 +411,19 @@ function locale_translate_edit_form_validate($form, &$form_state) {
+    // Join all plural forms in one string.

// Serialize plural variants in one string by LOCALE_PLURAL_DELIMITER.

+++ b/core/modules/locale/locale.testundefined
@@ -793,7 +793,7 @@ class LocaleImportFunctionalTest extends DrupalWebTestCase {
     // The import should have created 7 strings.
-    $this->assertRaw(t('The translation was successfully imported. There are %number newly created translated strings, %update strings were updated and %delete strings were removed.', array('%number' => 9, '%update' => 0, '%delete' => 0)), t('The translation file was successfully imported.'));
+    $this->assertRaw(t('The translation was successfully imported. There are %number newly created translated strings, %update strings were updated and %delete strings were removed.', array('%number' => 8, '%update' => 0, '%delete' => 0)), t('The translation file was successfully imported.'));

Update the code comment too above? It was inaccurate before too :)

Gábor Hojtsy’s picture

Since you seem to have dropped the ball on this patch and are active on other issues, here is a fix with my suggestions from above. The attached patch solves my suggestions, but it does not add or change any functionality (interdiff also included).

The only place I had a question above on why we don't need the target table joined anymore I figured out myself. Basically, we stored the plural index info in target before, so we needed it joined, but now there is no reason to join it if we only need the source strings. Attached test passes the .po patches on my machine, should hopefully pass core tests.

I did look into the locale tests to see how much coverage there is for plurals, and the results are pretty said. They are imported in several .po files but then never checked for success (or attempted to be edited). So we need testing added there. That is coming up next.

Gábor Hojtsy’s picture

Status: Needs work » Needs review
Gábor Hojtsy’s picture

Added plenty of test coverage for the plural functionality (since there was *none* before that actually checked it imported stuff or exported properly - not surprisingly since the export did not work unless you imported the sources).

The first test method does the following:

- import a French and Croatian translation file with plural translations
- check that the right translations are returned by format_plural() for proper counts (reworked/extended the plural index test)

The second one is back to back import/edit/add/export testing:

- import the same two .po files
- test that on export, the plurals retain their expected structure
- test that the plural shows up on the editing overview
- test that the plural has an editing screen with discrete input fields
- test modifying some of the translations
- add a new plural source string with format_plural() - this was not in the imported .po so this is the point where data/structure loss would have been happening up to before this patch
- save a complete set of translations for this new string
- export the whole thing again for both languages and check the edited and newly added strings are both in the structure expected

I think this is as full a test suite for import/add/edit/export/use-in-api for plurals as can be.

However, there is still no upgrade path or test for the upgrade path. That is still to do. @andypost, do you want to take it from here, or should I?

Gábor Hojtsy’s picture

For reference, here is the data representation for correct and for broken storage.

Data stored properly

Files imported (Hungarian, Croatian):

msgid "Monday"
msgstr "hétfő"

mgsid "1 byte"
msgid_plural "@count bytes"
msgstr[0] "1 bájt"
msgstr[1] "@count bájt"
msgid "Monday"
msgstr "Ponedjeljak"

mgsid "1 byte"
msgid_plural "@count bytes"
msgstr[0] "@count bajt"
msgstr[1] "@count bajta"
msgstr[2] "@count bajtova"

Imported results (source and target table):

lid source
146 Monday
1758 1 byte
1759 @count bytes
3140 @count[2] bytes
lid translation language plid plural
146 hétfő hu 0 0
146 Ponedjeljak hr 0 0
1758 1 bájt hu 0 0
1759 @count bájt hu 1758 1
1758 @count bajt hr 0 0
1759 @count bajta hr 1758 1
3140 @count[2] bajtova hr 1759 2

The source table maps count with additional numbers from 2 and up for the plural indexes. The target table maps translations to the source lid and the whole plural index/ordering system is maintained in the translation table (the main trigger for this issue). The parent lid's are stored in plid and the plural index counter is in plural. Singular values which do have plurals cannot be identified because they have the same characteristics as regular single translatable strings. So a singular string can only be identified by looking up if its lid is one of the plids.

Data stored broken

When the plural is not imported but just saved with format_plural() this information never ends up in the translation table (in part because only source strings are saved, in part because format_plural() will look strings up individually before the patch and will not send over plural information). So what we end up with is a broken state (if the above files are not imported, but we leave to format_plural() to fill in the source strings):

lid source
1 Monday
2 1 byte
3 @count bytes
4 @count[2] bytes

The same source strings are found per the plural formulas, but when translations are added on the UI, there is no plural information assigned, so:

lid translation language plid plural
1 hétfő hu 0 0
1 Ponedjeljak hr 0 0
2 1 bájt hu 0 0
3 @count bájt hu 0 0
2 @count bajt hr 0 0
3 @count bajta hr 0 0
4 @count[2] bajtova hr 0 0

All strings look like simple standalone strings. The @count[2] gives away there was likely some plural stuff there and @count bajta is likely the parent (but not necessarily, there can be a standalone string with the same text, gettext supports that).

Edit: so when we want to export this, we get (for Croatian):

msgid "Monday"
msgstr "Ponedjeljak"

mgsid "1 byte"
msgstr "@count bajt"

msgid "@count bytes"
msgstr "@count bajta"

msgid "@count[2] bytes"
msgstr "@count[2] bajtova"

Yes, all strings become individual and [2] will not be removed from the last count because it is only removed when Drupal knows it was a plural and that Drupal added that in there. This is visibly very far from what we want to have to work.

Solution proposed in this patch

The suggested patch reorganized these as the following:

lid source
1 Monday
2 1 byte\03@count bytes
lid translation language
1 hétfő hu
1 Ponedjeljak hr
2 1 bájt\03@count bájt hu
2 @count bajt\03@count bajta\03@count bajtova hr

Note that (1) we identify the source string with its full identifier as per gettext (msgid and msgid_plural combined), (2) we store the translations related to that compound base string (3) we don't need to hack the translations with @count[2] and such, (4) we don't need to use any clunky chained storage (and can loose the extra columns used for that). The compound storage is good because we want to look them up together more often than not, we need to let people edit them together (something we never provided, but is provided by this patch) and we need to import/export them together. Using \03 allows for the least interference for text search/lookups.

Upgrade path

For the upgrade path, we'll not be able to upgrade previously improperly stored plurals (middle scenario), but we should be able to update properly stored values (first scenario). We can safely drop the plural and plid columns because they had no useful value to debug this problem at the first place.

Gábor Hojtsy’s picture

Here is some Croatian sample data for writing an upgrade path / test. Feel free to replace with a different but similarly complex language if you want to :) The existing languages we had in the dump did not seem to be good targets for this testing this complexity (they did not have enough plural forms). I'll really leave this alone for now, so feel free to pick up again :)

Gábor Hojtsy’s picture

FileSize
586 bytes
38.23 KB

Meh, wrong table name in dump :) This one should work :)

Gábor Hojtsy’s picture

All right, given no other response, extended the patch with test coverage for the upgrade as well (additionally to the test coverage above to editing / data storage). Changes:

- Added sample translation for Catalan too (I pasted that in from localize.drupal.org, yes, it looks exactly like the English text, ask the Catalan people :)
- Added an upgrade function that enumerates all plurals, combines them and saved them according to the new storage; it also drops indexes and columns and recreates index where needed.
- Added tests in the upgrade path, so that it checks if the source and translations were all converted proper (we have a 2 variant and a 3 variant language, so nice variance for testing).

The upgrade tests pass locally. I think this is complete in terms of what we want to achieve, so let's get it reviewed, fixed if needed and then land :)

Status: Needs review » Needs work

The last submitted patch, 532512-plural-storage-54.patch, failed testing.

Gábor Hojtsy’s picture

Status: Needs work » Needs review
FileSize
46.86 KB

#1272840: Add upgrade path for language domains and validation landed in the meantime and used up our update function name. Rerolled :) (Even though #1222106: Unify language negotiation APIs, declutter terminology is likely going to land before this and we'll need to roll this again for another update function, but for now, we can prove it passes and works :).

Status: Needs review » Needs work

The last submitted patch, 532512-plural-storage-56.patch, failed testing.

Gábor Hojtsy’s picture

Status: Needs work » Needs review
FileSize
46.93 KB

And #1222106: Unify language negotiation APIs, declutter terminology landed too. I did not expect this to be happening so soon :) Should have waited more :) There are no immediate locale updates on the horizon, so it looks safe to reroll this now and let it be reviewed :) Passes for me locally for the upgrades at least.

Gábor Hojtsy’s picture

Issue tags: -Needs tests

Also should have all the tests needed.

andypost’s picture

Status: Needs review » Reviewed & tested by the community

Awesome!

@Gábor, thanx a lot! Now I know how to write upgrade tests!

+++ b/core/modules/locale/locale.installundefined
@@ -380,6 +366,122 @@ function locale_update_8004() {
+  $results = db_query("SELECT s.lid, s.source, s.context, s.location, t.translation, t.plid, t.plural, t.language FROM {locales_source} s LEFT JOIN {locales_target} t ON s.lid = t.lid WHERE s.lid IN (:lids) ORDER BY t.language, t.plid, t.plural", array(':lids' => $plural_lids));

Suppose we should remove and fetch DB faster
, s.context, s.location

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs review

Lets rtbc only if the tests came back green at minimum :) your suggestion for the query also looks good. Will add that in.

Status: Needs review » Needs work

The last submitted patch, 532512-plural-storage-58.patch, failed testing.

Sutharsan’s picture

FileSize
33.94 KB
32.48 KB

I experience no problem applying the patch. I do however fail to import a po file.
PDOException: SQLSTATE[42S22]: Column not found: 1054 Unknown column 'plid' in 'field list': INSERT INTO {locales_target} (lid, language, translation, plid, plural) VALUES (:db_insert_placeholder_0, ... ...) in _locale_import_one_string_db() (line 498 of /Users/erik/www/drupal8/core/includes/gettext.inc).
'plid' and 'plural' are still used in code.

I would suggest different interface text on the plural translation page. See attachment. But we could postpone this till #1452188: New UI for string translation.

Gábor Hojtsy’s picture

Status: Needs work » Needs review
FileSize
49.09 KB

@Sutharsan: thanks for testing. Indeed I left over some plid/plural code in the DB import part. I did not run the plural tests locally, only the upgrade tests. Heh. Now removed those, ran both the plural and upgrade tests, and both ran fine for me, so it should be fine for .po imports definitely (the plural tests added with the patch are very extensive in that direction).

Also rolled against latest D8 checkout.

Please review/test/try again :)

Sutharsan’s picture

Import and export tested Ok with Dutch and Croatian. Now, waiting for the bot...

Gábor Hojtsy’s picture

Per @andypost's suggestion above, also removed the s.context and s.location columns from the query in the update as we don't need those pieces of information in the update at all.

Gábor Hojtsy’s picture

Any other concern, issue, feedback?

Gábor Hojtsy’s picture

#66: 532512-plural-storage-66.patch queued for re-testing.

Gábor Hojtsy’s picture

FileSize
49.71 KB
660 bytes

With added changelog entry. Please test/review and help move this forward unless you have any further issues with it. Don't leave it lingering around! :) Thank you!

andypost’s picture

Status: Needs review » Needs work
+++ b/core/modules/locale/locale.installundefined
@@ -380,6 +366,122 @@ function locale_update_8004() {
+  $results = db_query("SELECT lid, plid FROM {locales_target} WHERE plural != 0");

This should be changed to <> see #1226796: Simplify not equal operator '!=' is not supported by all databases, must be '<>'
Suppose plural >= 1 is better, everything other is RTBC

Gábor Hojtsy’s picture

Status: Needs work » Needs review
FileSize
49.71 KB
650 bytes

Changed != to <>. It is not possible for plural to be less than 0, so <> 0 and >= 1 should mean the same thing. However, <> 0 is what expresses that we are looking for items which have some plural denomination. This is explained in in the summary referenced from the update functions phpdoc.

andypost’s picture

Status: Needs review » Reviewed & tested by the community

Now it's good to go!!!

kalman.hosszu’s picture

Status: Reviewed & tested by the community » Needs review

I tested the patch, and I think it's OK!

kalman.hosszu’s picture

Status: Needs review » Reviewed & tested by the community

Sorry, just change the status to RTBC

xjm’s picture

YAY we are finally supporting grammatical Russian plurals?

+++ b/core/modules/locale/locale.installundefined
@@ -380,6 +366,122 @@ function locale_update_8004() {
+      // Drop the Drupal specific numbering scheme from the end of plural formulas.

Over 80 chars.

+++ b/core/modules/locale/locale.installundefined
@@ -380,6 +366,122 @@ function locale_update_8004() {
+  // Remove all plural lids from source and target, only keep ones which
+  // were originally used for the singular strings (now updated to contain
+  // the serialized version of plurals).

Wraps a bit early.

I'll reroll to fix these because this patch is awesome.

20 days to next Drupal core point release.

xjm’s picture

Couple other things so I don't forget:

+++ b/core/modules/locale/locale.installundefined
@@ -380,6 +366,122 @@ function locale_update_8004() {
+  // Collect all lids that are sources to plural variants.

Oops, a capitalization fix is needed here and in subsequent comments (so that we don't wonder why locale is goin' around putting lids on everything).

+++ b/core/modules/locale/locale.pages.incundefined
@@ -307,22 +331,68 @@ function locale_translate_edit_form($form, &$form_state, $lid) {
+        // Fallback for unknow number of plurals.

Typo here.

no_commit_credit’s picture

FileSize
3.83 KB
49.73 KB

The use of the word "remembers" in the comments is also a bit anthropomorphic, but I left it because I couldn't think of a nice way to fix it without rewriting half the comments. ;)

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Spent a good 30 minutes looking at this patch and it looks good. Committed to 8.x.

Gábor Hojtsy’s picture

Issue tags: -sprint

Thanks! Added changelog entry at http://drupal.org/node/1478334 - off the sprint with this :) And now onto lots of interesting things this made possible:

- #1189184: OOP & PSR-0-ify gettext .po file parsing and generation
- #1445004: Implement customized translation bit on translations and
- #1452188: New UI for string translation

Help if welcome there!

Pol’s picture

Status: Fixed » Needs review
FileSize
921 bytes
59.47 KB

Hi,

I've updated my Drupal 8 install and get an error (see screenshot). It happends when $plural_lids is empty.

The patch add a test and checks if the array is not empty before continuing:

  if (count($plural_lids) == 0) {
    return;
  }
Gábor Hojtsy’s picture

Status: Needs review » Needs work

Good catch! I'd use empty() though and remember we'd still need to make the database schema changes, so we cannot just drop out of the update function at that point just yet.

Pol’s picture

Ok I'll work on it now.

Pol’s picture

Here's the updated version. I hope it's the good way to do it.

Gábor Hojtsy’s picture

Status: Needs work » Needs review

Looks good on review. Lets run the tests too.

Gábor Hojtsy’s picture

I think this looks good. @andypost what do you think?

plach’s picture

+++ b/core/modules/locale/locale.install
@@ -386,89 +386,91 @@ function locale_update_8005() {
+  ¶

Fixed trailing whitespace, otherwise unchanged.

16 days to next Drupal core point release.

andypost’s picture

Status: Needs review » Reviewed & tested by the community

Suppose it's good patch looks big but it's purpose to skip string update if no plurals

+++ b/core/modules/locale/locale.installundefined
@@ -386,88 +386,90 @@ function locale_update_8005() {
+  if (!empty($plural_lids)) {
Gábor Hojtsy’s picture

Yes, it is just a huge block of indentation change really to wrap in that condition.

catch’s picture

Priority: Normal » Critical
Issue tags: +D8 upgrade path

If the existing upgrade tests didn't catch this, can we add some extra data so that they will? Looks like it would be necessary to have locale module enabled, but no plurals in the db unless I'm missing something.

Also bumping to critical since the ugprade path is broken.

catch’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs tests
andypost’s picture

Status: Needs work » Reviewed & tested by the community
Issue tags: -Needs tests

@catch Are you sure we need tests for code in hook_update? This is just bug - missed if() in code-flow

catch’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs tests

Yes, we need tests for updates, at least as much as most other places we have tests for.

While it's a missed if in this case, it means there's two code paths the upgrade can take and one isn't tested - we might find other bugs later on with the same update, or need to migrate the same data differently in a later update, so it's better to add the data now then it will cover any further changes as well.

Gábor Hojtsy’s picture

I agree with @catch. I think this would be easy to test also, we can add another test method that drops the translations before updating or just remove the plural translations. Then we'd get the empty set and fail before the patch and pass after.

Sutharsan’s picture

Status: Needs work » Needs review
FileSize
1.14 KB

Test added as Gábor proposed.
The first patch only contains the test and must fail.

Sutharsan’s picture

Second patch contains both the test and #86 and must pass.

Sutharsan’s picture

Another try with the first patch.

Status: Needs review » Needs work

The last submitted patch, locale_plural_string_storage_broken-532512-94-without-86.patch, failed testing.

Sutharsan’s picture

Status: Needs work » Needs review

#96 (test only) failed the test #95 (test + fix) passed the test. Both as expected. Status back to needs review.

Gábor Hojtsy’s picture

In the interest of making this test future compatible, I'd use the same condition as the update, "WHERE plural <> 0", so if we add any more plural translations for testing other things lets say, then this would still accurately test the condition of no plurals (although it would not fail, since we fix the issue in the meantime).

Gábor Hojtsy’s picture

Status: Needs review » Needs work
Sutharsan’s picture

Condition changed. Comment touch-ups.
First patch must fail. Second patch must pass.

Gábor Hojtsy’s picture

Status: Needs review » Reviewed & tested by the community

Looks good. Note for reviewers: most of this patch is indenting lots of lines by two more spaces basically.

Sutharsan’s picture

Which is easy to see using the git-diff -b option.

catch’s picture

Status: Reviewed & tested by the community » Fixed

Thanks for the test addition.

I've committed/pushed this to 8.x (adding parentheses to locale_update_8005() in the comment before commit in case anyone spots that after this goes in).

klonos’s picture

This issue was originally filed against 7.x (got pushed to D8 about a year ago). Is there anything that we can/should do that will benefit D7? Did we introduce API changes?

catch’s picture

Title: Plural string storage is broken, editing UI is missing » Change notification for: Plural string storage is broken, editing UI is missing
Category: bug » task
Status: Fixed » Active

Hmm, there'll be a change for translators (the addition of the plural string delimiter), and there's also a schema change. These probably need a change notification - at least the delimiter could use one, it's not really an API change but it's something we could document as part of the upgrade path.

I'm not sure how doable a backport is or not, so re-opening only for the change notification for now, if people think this is doable though, feel free to move it back to 7.x for that.

Gábor Hojtsy’s picture

Title: Change notification for: Plural string storage is broken, editing UI is missing » Plural string storage is broken, editing UI is missing
Category: task » bug
Status: Active » Fixed

There was/is a change notification submitted as linked from the issue above. http://drupal.org/node/1478334 Please reopen if it does not look right. Since we needed to change storage models to support this, it is impossible to backport to D7 and be compatible with existing modules like l10n_update.

catch’s picture

I missed the change notification, that looks great though.

Agreed this looks to intrusive to backport but worth checking :)

Status: Fixed » Closed (fixed)
Issue tags: -Needs tests, -D8 upgrade path, -D8MI, -language-ui

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Link summary from below.

andypost’s picture

andypost’s picture

Issue summary: View changes

Looks we need split l10n_server storage for "before d8" #2833830-11: Translation for third plural form