Problem/Motivation

I'm currently testing translation support in Drupal 8.9.0. I'm using Norwegian Bokmål which is 100% translated for Drupal core. I have experienced that several strings are missing after importing. (This issue is not the same as #3124942: No translation updates for current version of project.)

We are importing translations in batches of 200. One translation is skipped per batch - roughly 50 strings if there are around 10000 translations in total. The reason is that PoStreamReader::readItem calls readLine (repeatedly) to get the next PO item. To be sure that the item is complete, you need to read one extra line (containing the msg ID of the next string). This isn't a problem inside one operation - the PoStreamReader keeps state, but when you start a new operation the state is lost.

Steps to reproduce

Proposed resolution

I have attached a patch the reports the buffer pointer (byte offset) after the last completed PO item - not the last read line, so we don't loose any items.

Importing Norwegian Bokmål without patch:

From log: Translations imported: 9594 added, 0 updated, 0 removed.
From language stats: Norwegian Bokmål 9594/10027 (95.68%)

Importing Norwegian Bokmål with patch:

From log: Translations imported: 9640 added, 0 updated, 0 removed.
From language stats: Norwegian Bokmål 9640/10068 (95.75%)

I think the reason we don't get 100% is the language list. The test above is done using drush si standard --locale=nb

Remaining tasks

User interface changes

None.

API changes

New public method: core/lib/Drupal/Component/Gettext/PoStreamReader::getSeekLastItem()

Data model changes

None.

Release notes snippet

CommentFileSizeAuthor
#53 3126929.45_53.interdiff.txt4.59 KBdww
#53 3126929-53.patch14.14 KBdww
#53 3126929-53.test-only.patch11.99 KBdww
#45 interdiff-3126929-43-45.txt9.1 KBhuzooka
#45 drupal-po_stream_reader_seek-3126929-45.patch13.86 KBhuzooka
#45 drupal-po_stream_reader_seek-3126929-45.patch13.86 KBhuzooka
#43 interdiff-3126929-37-43.txt3.59 KBhuzooka
#43 drupal-po_stream_reader_seek-3126929-43--fix-only.patch2.03 KBhuzooka
#43 drupal-po_stream_reader_seek-3126929-43--core-9.5.patch13.9 KBhuzooka
#43 drupal-po_stream_reader_seek-3126929-43.patch13.87 KBhuzooka
#41 3126929-nr-bot.txt3.1 KBneeds-review-queue-bot
#40 rerolldiff_37-40.txt1.27 KBayush.khare
#40 3126929-40.patch13.43 KBayush.khare
#37 drupal-3126929-0903-37-po-stream-reader-seek.patch13.5 KBsweetchuck
#36 drupal-3126929-0905-36-po-stream-reader-seek.patch13.51 KBsweetchuck
#35 drupal-3126929-0905-35-po-stream-reader-seek.patch13.46 KBsweetchuck
#34 drupal-3126929-0903-34-po-stream-reader-seek.patch13.45 KBsweetchuck
#28 drupal-3126929-0903-26-po-stream-reader-seek.patch13.44 KBsweetchuck
#25 3126929-25.patch5.93 KBjungle
#25 interdiff-21-25.txt1.78 KBjungle
#21 locale-import-translations-skipped-3126929-21.patch5.2 KBhansfn
#21 interdiff-3126929-19-21.txt1018 byteshansfn
#19 interdiff-3126929-15-19.txt2.5 KBhansfn
#19 locale-import-translations-skipped-3126929-19.patch5.2 KBhansfn
#15 interdiff-3126929-13-15.txt439 byteshansfn
#15 locale-import-translations-skipped-3126929-15.patch5.05 KBhansfn
#13 interdiff_11-13.txt1.68 KBneslee canil pinto
#13 3126929-13.patch5.05 KBneslee canil pinto
#11 3126929-11.patch5.05 KBeiriksm
#11 3126929-11-test-only.patch3.06 KBeiriksm
#7 3126929-7.patch4.8 KBeiriksm
#7 3126929-7-test-only.patch2.8 KBeiriksm
#2 locale-import-translations-skipped-3126929-1.patch1.99 KBhansfn

Issue fork drupal-3126929

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

hansfn created an issue. See original summary.

hansfn’s picture

hansfn’s picture

Status: Active » Needs review
gábor hojtsy’s picture

Wow what a find! How could we test this? Looks like a sample po file with 400 strings would suffice?

hansfn’s picture

Regarding testing: Any PO file with more strings than the chunk size, should do, yes. Maybe test with a smaller chunk size than 200, so the test isn't so slow.

jhodgdon’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests

In other words, this patch needs a test added. :) Great catch though!!

eiriksm’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
StatusFileSize
new2.8 KB
new4.8 KB

...and here is a test.

Very meta, we are checking for the translation of "Next" in Norwegian, on the "next page" :)

The last submitted patch, 7: 3126929-7-test-only.patch, failed testing. View results

jhodgdon’s picture

Status: Needs review » Needs work

The test fails as expected, and the test-plus-patch passes.

However, I did notice in the patch-plus-test that the test results show one coding standards failure:

core/lib/Drupal/Component/Gettext/PoStreamReader.php
line 44	Line indented incorrectly; expected 2 spaces, found 3

I was also wondering if maybe the test should verify that all of the strings in the test PO file were found, instead of just verifying the one that is skipped by the buggy current code? I think it might be good to add maybe 2 more strings to the PO file, and then verify that all of them got imported. This would ensure that another bug that maybe skips the first string in a batch or ?? who knows what, doesn't occur in the future.

jhodgdon’s picture

Oh, forgot to say THANK YOU ericsm for making the test!

eiriksm’s picture

Status: Needs work » Needs review
StatusFileSize
new3.06 KB
new5.05 KB

2 very good points, thanks!

Updated the PO file with 2 more X-Files related strings, and made the test verify all the strings in the file.

Also fixed the coding standard error and added some comments referring to this issue in the test. Sorry for no interdiff :(

The last submitted patch, 11: 3126929-11-test-only.patch, failed testing. View results

neslee canil pinto’s picture

StatusFileSize
new5.05 KB
new1.68 KB

Made readline to read line.

hansfn’s picture

Thank you for fixing the UTF-8 errors with the Norwegian characters - eiriksm normally has full control - but I don't agree with the comment change. We are calling the readLine function, not the "read line" function.

hansfn’s picture

StatusFileSize
new5.05 KB
new439 bytes

OK, I fixed the comment about readLine - added parentheses which is used other places very functions are referred to in comments.

jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

This looks great! Assuming that the test still passes (and I cannot see why it wouldn't, with a one-line docs-only change), this is RTBC. As noted above, the test-only patch failed in the expected way, and with the fix, it passes. The code and test are very clear. @hansfn tested the patch on a real site also.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

dww’s picture

Status: Reviewed & tested by the community » Needs work

Mostly looks great, thanks! A few concerns with the test:

  1. +++ b/core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php
    @@ -0,0 +1,79 @@
    + * Tests building the translatable project information.
    

    Is this true? Doesn't really seem to explain what this test is actually testing. Not sure.

  2. +++ b/core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php
    @@ -0,0 +1,79 @@
    +    // Import 2 batches, since this will reveal the bug.
    +    foreach (array_fill(0, 2, 'drupal') as $round_number) {
    

    A) What's 'drupal' got to do with it?
    B) What's $round_number got to do with it?

    If all we're doing is calling the inner code twice, I think this would be clearer:

    for ($i = 0; $i < 2; $i++) { 
    

    C) p.s. "will reveal the bug" is a little ambiguous. Future everyone would probably be grateful for a more specific explanation, so they don't have to git blame, do a bunch of archeology, find the issue nid and read it. E.g.:

    // Import 2 batches to ensure that subsequent imports properly seek in the .po file. 
    

    (or something... I'm only skimming the issue/patch).

  3. +++ b/core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php
    @@ -0,0 +1,79 @@
    +    foreach ($strings as $string) {
    +      $item = $this->container->get('database')
    +        ->select('locales_target', 'lt')
    +        ->fields('lt')
    +        ->condition('lt.translation', $string)
    +        ->execute()
    +        ->fetchAssoc();
    +      $this->assertNotEmpty($item);
    +    }
    

    If this fails, we're just going to see:

    "Failed asserting that a boolean is not empty."

    We have no way to know which loop iteration we were on that failed. Per #3131946: [policy] Remove PHPUnit assertion messages when possible, and standardize remaining messages for assertions in loops, we should include a custom assertion message so we know exactly what failed. E.g.:

    $this->assertNotEmpty($item, "Translated string $string should exist.");
    
hansfn’s picture

Status: Needs work » Needs review
StatusFileSize
new5.2 KB
new2.5 KB

Thx for the additional review, dww. The for loop puzzled me too ;-) I have tried to address all your comments.

jungle’s picture

Status: Needs review » Needs work

Two nitpicks:

  1. +++ b/core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php
    @@ -0,0 +1,83 @@
    +  public function setUp(): void {
    

    protected function setUp(): void

  2. +++ b/core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php
    @@ -0,0 +1,83 @@
    +   * Test that importing batches imports all translations.
    

    Tests that ...

hansfn’s picture

Status: Needs work » Needs review
StatusFileSize
new1018 bytes
new5.2 KB

Thanks :-) Do you mind do a quick review again?

jungle’s picture

Thank you @hansfn!

  1. The test-only patch in #11 reproduced the bug, that's good.
  2. All @dww's concerns in #18 were addressed in #19.
  3. +++ b/core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php
    @@ -0,0 +1,83 @@
    +  protected function setUp(): void {
    

    The void return type is supported since 7.1, the min PHP version in 8.x is 7.0.8.

    As this is a bug, I think it's worthy backporting to 8.x. So let's make a patch for 8.x without returning type hint.

    Then I'd +1 to RTBC this to let committer to make the decision to backport or not.

  4. +++ b/core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php
    @@ -0,0 +1,83 @@
    +    // Copy test po files to the translations directory.
    +    $this->filePath = \Drupal::service('file_system')->copy(__DIR__ . '/../../test.nb.po', 'public://', FileSystemInterface::EXISTS_REPLACE);
    ...
    +      'uri' => $this->filePath,
    

    Would suggest using vfsStream for tests. core/tests/Drupal/Tests/Component/Gettext/PoStreamWriterTest.php is an example of using it.

maximpodorov’s picture

Thanks for the patch. It solves the problem of the translation import I faced (msgctxt was ignored for some translations in my case).

tanubansal’s picture

Tested and can be moved to RTBC

jungle’s picture

Issue tags: +Global2020
StatusFileSize
new1.78 KB
new5.93 KB
  1. diff --git a/core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php b/core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php
    
    +++ b/core/modules/locale/tests/test.nb.po
    @@ -0,0 +1,27 @@
    +msgid ""
    

    The two files could be ignored from spelling check. core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php is a Test file, the other core/modules/locale/tests/test.nb.po is a test .po file.

    Or spelling check fails (The command to check spelling; cd core && yarn && yarn spellcheck:core)

  2. +++ b/core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php
    @@ -0,0 +1,83 @@
    +    $this->filePath = \Drupal::service('file_system')->copy(__DIR__ . '/../../test.nb.po', 'public://', FileSystemInterface::EXISTS_REPLACE);
    

    Could be dirname(__DIR__, 2) . '/test.nb.po'

Addressing the above and #22.4, tagging "Global2020"

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

sweetchuck’s picture

hansfn’s picture

@Sweetchuck: Why did you add PoStreamReaderTest.php? Is that relevant for this issue?

Updating the existing patch in #25 for 9.3.x-dev is very useful. I would RTBC immediately.

sweetchuck’s picture

The actual bug this issue is about easily testable by pure unit test. No need for kernel test. If the kernel test is also there that is okay for me.

PoStreamReaderTest tests that if the PoStreamReader works correctly or not.

LocaleBatchImportTest tests that if the Gettext::fileToDatabase() uses the PoStreamReader correctly or not.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

bartlangelaan’s picture

Status: Needs review » Reviewed & tested by the community

Wow, this issue was mindblowing. I just couldn't figure out why a single string would not be translated, while the translation was in the po file.
Crazy that this bug exists, as it fails on everty 200th / 400th / 600th / ... item on all translation files. Probabily more users have this issue, but just can't figure out what is causing this.

We applied the patch from #28 and it fixes the bug perfectly.
Also, the tests seem to be testing exactly what they should.
LGTM!

sweetchuck’s picture

Re-roll patch #28 to the latest 9.3.x

sweetchuck’s picture

sweetchuck’s picture

Status: Reviewed & tested by the community » Needs review
StatusFileSize
new13.51 KB
sweetchuck’s picture

StatusFileSize
new13.5 KB
pasqualle’s picture

Status: Needs review » Needs work

There is an issue when the po translation file contains 2 (or more) empty lines at the end.

https://git.drupalcode.org/project/drupal/-/blob/9.5.x/core/modules/loca...

if ($report['seek'] < filesize($file->uri)) {
The last empty line does not increase the seek value, and the filesize is bigger (by the number of extra empty lines at the end). The po file is re-started from the beginning (or form the last chunk?), an the import never finish.

Without this patch, the import (drush locale:update) does finish with files containing 2 empty lines at the end.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

ayush.khare’s picture

Status: Needs work » Needs review
StatusFileSize
new13.43 KB
new1.27 KB

Rerolled #37 for 10.1.x

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new3.1 KB

The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

huzooka’s picture

Assigned: Unassigned » huzooka
huzooka’s picture

Assigned: huzooka » Unassigned
Status: Needs work » Needs review
StatusFileSize
new13.87 KB
new13.9 KB
new2.03 KB
new3.59 KB

I couldn't simplify the logic unfortunately, but this addresses #38 (and the edge case is added to both tests).

huzooka’s picture

Assigned: Unassigned » huzooka
Status: Needs review » Needs work

Sweetchuck suggested that it would be good to ensure that the line breaks at the end of the .po (fixture) files are preserved... (They might be stripped away by a properly configured IDE.) And he is right!

huzooka’s picture

Assigned: huzooka » Unassigned
Status: Needs work » Needs review
StatusFileSize
new13.86 KB
new13.86 KB
new9.1 KB

This patch moves both /core/modules/locale/tests/test.nb.po and /core/tests/fixtures/PoStreamReader/basic.hu.po into Nowdoc strings.

Side-effect: same patch can be applied on 10.1.x and 9.5.x 🙃

huzooka’s picture

huzooka’s picture

smustgrave’s picture

Status: Needs review » Needs work
Issue tags: +Needs Review Queue Initiative, +Needs issue summary update

Tried testing by importing Norwegian Bokmål and got 99.55%

The issue summary should be updated with the default template, what's the proposed solution? remaining tasks? any api changes?

dww’s picture

Issue summary: View changes
Status: Needs work » Needs review
Issue tags: -Needs issue summary update +Bug Smash Initiative

Tried to give this a proper summary.

I haven't closely reviewed all the new test coverage, so not RTBC'ing, but moving back to NR...

smustgrave’s picture

Is this still an issue though? I got up to 99% without the fix when the patch mentions I should get to 95

I Can retest also if there’s a missing step

dww’s picture

Yeah, not totally sure about steps to reproduce manually and how to really see the bug in obvious ways. Back in #11 we had a failing test-only (vs. 8.9.x). Perhaps we should do another test-only here to see if all the tests still fail in 10.1.x without the fix. 😅

Also, removing credit from a couple of 1-off rerolls that didn't work and were then abandoned by the author. 😢

dww’s picture

Yeah, locally, the new Kernel test still fails in 10.1.x:

% ./vendor/bin/phpunit -c core/phpunit.xml core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.php
PHPUnit 9.5.26 by Sebastian Bergmann and contributors.

Testing Drupal\Tests\locale\Kernel\LocaleBatchImportTest
F                                                                   1 / 1 (100%)

Time: 00:01.324, Memory: 10.00 MB

There was 1 failure:

1) Drupal\Tests\locale\Kernel\LocaleBatchImportTest::testImportBatches
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
     3 => 'Brukergrensesnitt'
     4 => 'Tittel'
     5 => 'Brødtekst'
-    6 => 'Neste'
-    7 => 'Små grønne menn'
-    8 => 'Verten'
+    6 => 'Små grønne menn'
+    7 => 'Verten'
 )

The new Unit test also fails, but it's less interesting: 😂

1) Drupal\Tests\Component\Gettext\PoStreamReaderTest::testReadItemsInMultipleSequence
Error: Call to undefined method Drupal\Component\Gettext\PoStreamReader::getSeekLastItem()

But if I restore the fix, then intentionally break PoStreamReader::getSeekLastItem(), the unit test fails more interestingly:

There was 1 failure:

1) Drupal\Tests\Component\Gettext\PoStreamReaderTest::testReadItemsInMultipleSequence
PO item 3
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-'msgid "id-03"
-msgstr "str-03"
+'msgid ""
+msgstr ""
+"Project-Id-Version: Drupal core (9.2.7)\n"
+"POT-Creation-Date: "
+"2021-10-27 10:29+0000\n"
+"PO-Revision-Date: YYYY-mm-DD "
+"HH:MM+ZZZZ\n"
+"Language-Team: Hungarian\n"
+"MIME-Version: "
+"1.0\n"
+"Content-Type: text/plain; "
+"charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: "
+"nplurals=2; plural=(n!=1);\n"
+""

 '

So I'm 99.9% sure this is still a bug, but the steps to manually reproduce it could be better explained. 😅 Tagging for that.

dww’s picture

StatusFileSize
new11.99 KB
new14.14 KB
new4.59 KB

Grrr, I just did a new fairly long dreditor review, which refused to paste. 😢 But maybe it's for the best, since I decided it was easier to just fix everything and upload a new patch/interdiff than to re-do the review. 😂 Hopefully the interdiff mostly speaks for itself.

While I'm at it, here's a new test-only.

dww’s picture

This is the only hunk of the interdiff that might need explanation:

+++ b/core/tests/Drupal/Tests/Component/Gettext/PoStreamReaderTest.php
@@ -232,14 +232,7 @@
-    $item = $reader->readItem();
-    $this->assertSame(
-      $expected['items'][5],
-      (string) $item,
-      'PO item 5',
-    );
-
-    for ($i = 6; $i < count($expected['items']); $i++) {
+    for ($i = 5; $i < count($expected['items']); $i++) {

The removed lines are identical to the body of the for() loop. I saw no reason to do item 5 manually, then a loop for 6 and up. So I folded item 5 into the loop.

The last submitted patch, 53: 3126929-53.test-only.patch, failed testing. View results

smustgrave’s picture

Status: Needs review » Needs work

Moving to NW for an answer to #54 and steps to reproduce.

Even with the patch I still only get to 99%

Going to ping the subsystem maintainer too

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.

giuseppe87 made their first commit to this issue’s fork.

nicxvan’s picture

Priority: Normal » Major

This feels major