Problem/Motivation
I'm currently testing translation support in Drupal 8.9.0. I'm using Norwegian Bokmål which is 100% translated for Drupal core. I have experienced that several strings are missing after importing. (This issue is not the same as #3124942: No translation updates for current version of project.)
We are importing translations in batches of 200. One translation is skipped per batch - roughly 50 strings if there are around 10000 translations in total. The reason is that PoStreamReader::readItem calls readLine (repeatedly) to get the next PO item. To be sure that the item is complete, you need to read one extra line (containing the msg ID of the next string). This isn't a problem inside one operation - the PoStreamReader keeps state, but when you start a new operation the state is lost.
Steps to reproduce
Proposed resolution
I have attached a patch the reports the buffer pointer (byte offset) after the last completed PO item - not the last read line, so we don't loose any items.
Importing Norwegian Bokmål without patch:
From log: Translations imported: 9594 added, 0 updated, 0 removed.
From language stats: Norwegian Bokmål 9594/10027 (95.68%)
Importing Norwegian Bokmål with patch:
From log: Translations imported: 9640 added, 0 updated, 0 removed.
From language stats: Norwegian Bokmål 9640/10068 (95.75%)
I think the reason we don't get 100% is the language list. The test above is done using drush si standard --locale=nb
Remaining tasks
User interface changes
None.
API changes
New public method: core/lib/Drupal/Component/Gettext/PoStreamReader::getSeekLastItem()
Data model changes
None.
Release notes snippet
| Comment | File | Size | Author |
|---|---|---|---|
| #53 | 3126929-53.patch | 14.14 KB | dww |
| #53 | 3126929-53.test-only.patch | 11.99 KB | dww |
Issue fork drupal-3126929
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
hansfn commentedComment #3
hansfn commentedComment #4
gábor hojtsyWow what a find! How could we test this? Looks like a sample po file with 400 strings would suffice?
Comment #5
hansfn commentedRegarding testing: Any PO file with more strings than the chunk size, should do, yes. Maybe test with a smaller chunk size than 200, so the test isn't so slow.
Comment #6
jhodgdonIn other words, this patch needs a test added. :) Great catch though!!
Comment #7
eiriksm...and here is a test.
Very meta, we are checking for the translation of "Next" in Norwegian, on the "next page" :)
Comment #9
jhodgdonThe test fails as expected, and the test-plus-patch passes.
However, I did notice in the patch-plus-test that the test results show one coding standards failure:
I was also wondering if maybe the test should verify that all of the strings in the test PO file were found, instead of just verifying the one that is skipped by the buggy current code? I think it might be good to add maybe 2 more strings to the PO file, and then verify that all of them got imported. This would ensure that another bug that maybe skips the first string in a batch or ?? who knows what, doesn't occur in the future.
Comment #10
jhodgdonOh, forgot to say THANK YOU ericsm for making the test!
Comment #11
eiriksm2 very good points, thanks!
Updated the PO file with 2 more X-Files related strings, and made the test verify all the strings in the file.
Also fixed the coding standard error and added some comments referring to this issue in the test. Sorry for no interdiff :(
Comment #13
neslee canil pintoMade readline to read line.
Comment #14
hansfn commentedThank you for fixing the UTF-8 errors with the Norwegian characters - eiriksm normally has full control - but I don't agree with the comment change. We are calling the readLine function, not the "read line" function.
Comment #15
hansfn commentedOK, I fixed the comment about readLine - added parentheses which is used other places very functions are referred to in comments.
Comment #16
jhodgdonThis looks great! Assuming that the test still passes (and I cannot see why it wouldn't, with a one-line docs-only change), this is RTBC. As noted above, the test-only patch failed in the expected way, and with the fix, it passes. The code and test are very clear. @hansfn tested the patch on a real site also.
Comment #18
dwwMostly looks great, thanks! A few concerns with the test:
Is this true? Doesn't really seem to explain what this test is actually testing. Not sure.
A) What's 'drupal' got to do with it?
B) What's $round_number got to do with it?
If all we're doing is calling the inner code twice, I think this would be clearer:
C) p.s. "will reveal the bug" is a little ambiguous. Future everyone would probably be grateful for a more specific explanation, so they don't have to git blame, do a bunch of archeology, find the issue nid and read it. E.g.:
(or something... I'm only skimming the issue/patch).
If this fails, we're just going to see:
"Failed asserting that a boolean is not empty."
We have no way to know which loop iteration we were on that failed. Per #3131946: [policy] Remove PHPUnit assertion messages when possible, and standardize remaining messages for assertions in loops, we should include a custom assertion message so we know exactly what failed. E.g.:
Comment #19
hansfn commentedThx for the additional review, dww. The for loop puzzled me too ;-) I have tried to address all your comments.
Comment #20
jungleTwo nitpicks:
protected function setUp(): void
Tests that ...
Comment #21
hansfn commentedThanks :-) Do you mind do a quick review again?
Comment #22
jungleThank you @hansfn!
The void return type is supported since 7.1, the min PHP version in 8.x is 7.0.8.
As this is a bug, I think it's worthy backporting to 8.x. So let's make a patch for 8.x without returning type hint.
Then I'd +1 to RTBC this to let committer to make the decision to backport or not.
Would suggest using vfsStream for tests.
core/tests/Drupal/Tests/Component/Gettext/PoStreamWriterTest.phpis an example of using it.Comment #23
maximpodorov commentedThanks for the patch. It solves the problem of the translation import I faced (msgctxt was ignored for some translations in my case).
Comment #24
tanubansal commentedTested and can be moved to RTBC
Comment #25
jungleThe two files could be ignored from spelling check.
core/modules/locale/tests/src/Kernel/LocaleBatchImportTest.phpis a Test file, the othercore/modules/locale/tests/test.nb.pois a test .po file.Or spelling check fails (The command to check spelling;
cd core && yarn && yarn spellcheck:core)Could be
dirname(__DIR__, 2) . '/test.nb.po'Addressing the above and #22.4, tagging "Global2020"
Comment #28
sweetchuckComment #29
hansfn commented@Sweetchuck: Why did you add PoStreamReaderTest.php? Is that relevant for this issue?
Updating the existing patch in #25 for 9.3.x-dev is very useful. I would RTBC immediately.
Comment #30
sweetchuckThe actual bug this issue is about easily testable by pure unit test. No need for kernel test. If the kernel test is also there that is okay for me.
PoStreamReaderTest tests that if the PoStreamReader works correctly or not.
LocaleBatchImportTest tests that if the Gettext::fileToDatabase() uses the PoStreamReader correctly or not.
Comment #33
bartlangelaanWow, this issue was mindblowing. I just couldn't figure out why a single string would not be translated, while the translation was in the po file.
Crazy that this bug exists, as it fails on everty 200th / 400th / 600th / ... item on all translation files. Probabily more users have this issue, but just can't figure out what is causing this.
We applied the patch from #28 and it fixes the bug perfectly.
Also, the tests seem to be testing exactly what they should.
LGTM!
Comment #34
sweetchuckRe-roll patch #28 to the latest 9.3.x
Comment #35
sweetchuckComment #36
sweetchuckComment #37
sweetchuckComment #38
pasqualleThere is an issue when the po translation file contains 2 (or more) empty lines at the end.
https://git.drupalcode.org/project/drupal/-/blob/9.5.x/core/modules/loca...
if ($report['seek'] < filesize($file->uri)) {The last empty line does not increase the seek value, and the filesize is bigger (by the number of extra empty lines at the end). The po file is re-started from the beginning (or form the last chunk?), an the import never finish.
Without this patch, the import (drush locale:update) does finish with files containing 2 empty lines at the end.
Comment #40
ayush.khare commentedRerolled #37 for 10.1.x
Comment #41
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #42
huzookaComment #43
huzookaI couldn't simplify the logic unfortunately, but this addresses #38 (and the edge case is added to both tests).
Comment #44
huzookaSweetchuck suggested that it would be good to ensure that the line breaks at the end of the .po (fixture) files are preserved... (They might be stripped away by a properly configured IDE.) And he is right!
Comment #45
huzookaThis patch moves both
/core/modules/locale/tests/test.nb.poand/core/tests/fixtures/PoStreamReader/basic.hu.pointo Nowdoc strings.Side-effect: same patch can be applied on 10.1.x and 9.5.x 🙃
Comment #46
huzookaComment #47
huzookaComment #48
smustgrave commentedTried testing by importing Norwegian Bokmål and got 99.55%
The issue summary should be updated with the default template, what's the proposed solution? remaining tasks? any api changes?
Comment #49
dwwTried to give this a proper summary.
I haven't closely reviewed all the new test coverage, so not RTBC'ing, but moving back to NR...
Comment #50
smustgrave commentedIs this still an issue though? I got up to 99% without the fix when the patch mentions I should get to 95
I Can retest also if there’s a missing step
Comment #51
dwwYeah, not totally sure about steps to reproduce manually and how to really see the bug in obvious ways. Back in #11 we had a failing test-only (vs. 8.9.x). Perhaps we should do another test-only here to see if all the tests still fail in 10.1.x without the fix. 😅
Also, removing credit from a couple of 1-off rerolls that didn't work and were then abandoned by the author. 😢
Comment #52
dwwYeah, locally, the new Kernel test still fails in 10.1.x:
The new Unit test also fails, but it's less interesting: 😂
But if I restore the fix, then intentionally break
PoStreamReader::getSeekLastItem(), the unit test fails more interestingly:So I'm 99.9% sure this is still a bug, but the steps to manually reproduce it could be better explained. 😅 Tagging for that.
Comment #53
dwwGrrr, I just did a new fairly long dreditor review, which refused to paste. 😢 But maybe it's for the best, since I decided it was easier to just fix everything and upload a new patch/interdiff than to re-do the review. 😂 Hopefully the interdiff mostly speaks for itself.
While I'm at it, here's a new test-only.
Comment #54
dwwThis is the only hunk of the interdiff that might need explanation:
The removed lines are identical to the body of the
for()loop. I saw no reason to do item 5 manually, then a loop for 6 and up. So I folded item 5 into the loop.Comment #56
smustgrave commentedMoving to NW for an answer to #54 and steps to reproduce.
Even with the patch I still only get to 99%
Going to ping the subsystem maintainer too
Comment #61
nicxvan commentedThis feels major