We have experimented with freetm.com, a free tool that uses translation memories for translation. It can handle txt files, but the hard line breaks inside sentences split them into smaller fragments, which makes the translation memories less useful. If we want to comply with the Drupal coding standards and keep lines under 80 characters, the newline characters will have to be re-inserted after translation anyway, regardless of whether we removed them from the original text. To create useful translation memories, we need to remove the newlines that split sentences.
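
To make the idea concrete, here is a minimal Python sketch of the two steps: joining hard-wrapped lines before translation, and wrapping at 80 characters again afterwards. It is not the script used for the patches in this issue. The function names are made up for illustration, and it assumes that paragraphs are separated by blank lines; lists, tables, and code listings in the real source files would need extra care.

```python
import textwrap


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines so that each paragraph becomes one line.

    Assumption: paragraphs are separated by blank lines. Lists, tables and
    code listings are NOT treated specially in this sketch.
    """
    paragraphs = text.split("\n\n")
    # str.split() with no arguments splits on any whitespace run, so the
    # newlines inside a paragraph become single spaces when re-joined.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


def rewrap_paragraphs(text: str, width: int = 80) -> str:
    """Wrap each paragraph again so that no line exceeds `width` characters."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the guide.
    source = "The first sentence is\nsplit over two lines.\n\nSecond paragraph."
    print(unwrap_paragraphs(source))
```

Running `rewrap_paragraphs()` on the translated text restores the 80-character limit, so the wrapping in the repository and the unwrapped text fed to the TM tool can be converted back and forth.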

Comments

balagan created an issue. See original summary.

balagan’s picture

Starting to work on it going backwards alphabetically.

balagan’s picture

As I see it, in the case of tables we are already over the 80-character limit. If we remove the newlines from the middle of sentences in the original text, we would not have to redo this removal whenever the source text changes. Any thoughts on this, Jennifer?

balagan’s picture

Attached file (status: new, size: 75.05 KB)

Removed the newlines that broke sentences across two lines from the last 20 files.
Can this be reviewed as is, or should I split it into smaller patches?

eojthebrave’s picture

Our current guidelines say that text should wrap at 80 characters, like the Drupal coding standards.

However, in my opinion, if this makes translation significantly easier I'm willing to give it a shot. Do you think translation memories like this are something that other translation teams might want to use as well? It's not a concept I'm familiar with. If so, we could consider extending this to all languages of the guide and just remove unnecessary newlines.

As for reviewing this patch and making the call on whether or not to proceed with removing newlines for all HU files, I'll leave that decision to @Balu Ertl, who is the current lead for Hungarian translations. Balu, I think whatever makes the most sense here for you and the others working with you is going to be fine for me. And while I can't say for sure, I'm guessing Jennifer would be open to it as well.

balagan’s picture

I forgot to mention that Balu and I had previously agreed to remove the newlines. Translation memories (TMs) keep track of sentence pairs that have already been translated. They offer suggestions above a certain level of similarity, usually 70%. Translated sentences and terms can be searched via the concordance function, which really helps translators stay consistent. Using a centralized TM like freetm.com lets translators see each other's translations instantly, so if one file refers to the title of another, that title can be looked up via concordance. I admit the tool has a bit of a learning curve, but we are doing all this not only to make our own job easier but also to pioneer the work for other teams, so step-by-step instructions for the whole process are under development. Freetm.com also handles glossaries, so if a team agrees on certain terminology, they can create a glossary, and whenever one of its terms occurs in the sentence being translated, it is shown in the glossary window.
Also, if the source changes, TMs make retranslating the files easy: previously translated sentences can be inserted with a key combination. I recently used freetm.com to translate YAML files for the Drupal Console project, and although it does not support .po files directly, I think .po files can be translated too by applying the same process we use here, so this experience might bring further benefits.
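
For readers who have not used a TM before, the following Python sketch illustrates the two lookups described above: fuzzy suggestions above a similarity threshold, and a concordance search. It is only an illustration and does not reflect how freetm.com works internally. The function names, the sample memory, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions, and the target strings are placeholders rather than real translations.

```python
from difflib import SequenceMatcher


def tm_suggestions(segment, memory, threshold=0.7):
    """Return (score, source, target) tuples for stored sources that are at
    least `threshold` similar to `segment`, best match first."""
    hits = []
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold:
            hits.append((score, source, target))
    return sorted(hits, reverse=True)


def concordance(term, memory):
    """Return every (source, target) pair whose source contains `term`."""
    needle = term.lower()
    return [(s, t) for s, t in memory.items() if needle in s.lower()]


# Hypothetical memory: source sentence -> placeholder for its translation.
memory = {
    "Click Save to store the settings.": "<translation 1>",
    "Install the module.": "<translation 2>",
}

if __name__ == "__main__":
    # A slightly changed source sentence still finds its earlier translation.
    for score, source, target in tm_suggestions(
        "Click Save to store your settings.", memory
    ):
        print(f"{score:.2f}  {source}  ->  {target}")
    print(concordance("module", memory))
```

The sketch also shows why stray newlines hurt: a sentence fragment that ends where a line happened to wrap is unlikely to reach the threshold against any complete sentence stored in the memory.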

balagan’s picture

Attached file (status: new, size: 81.54 KB)

Another 21 files without newlines in sentences.

balagan’s picture

Attached file (status: new, size: 66.58 KB)

Another batch (16 files) without newlines in the middle of sentences.

eojthebrave’s picture

Sounds like a potentially really useful tool to me. Thanks for the explanation.

eojthebrave’s picture

I opened a new issue to discuss removing these newlines, and the 80-character wrapping standard, from the English versions of all the content as well, so that future translation teams will not need to go through this process of removing them all. #2813431: Remove requirement to wrap text at 80 characters

balagan’s picture

Attached file (status: new, size: 115.91 KB)

Newlines removed from files from extend-theme-install.txt to planning-modular.txt.

balagan’s picture

Attached file (status: new, size: 90.96 KB)

The last batch of files without newlines in sentences.

baluertl’s picture

As we discussed in #2813431: Remove requirement to wrap text at 80 characters, here we submit the patches prepared to remove all the newlines from paragraphs and sentences, so that they can be converted into translation units (TUs) for use with a TM. First I applied balagan's 5 patches (see above), then I went through the files again and fixed the sentences they had missed. As with his patches, I split mine into multiple files for easier review. Following Joe's advice, I plan to push them to the repository later today.

  • Balu Ertl committed fa8dfae on 8.x-2.x
    Issue #2813089 by balagan, Balu Ertl: Remove newlines from sentences in...
baluertl’s picture

Status: Needs review » Fixed
Attached file (status: new, size: 23.51 KB)

Both sets of patches (balagan's and Balu Ertl's) have been committed and pushed to the repository under the /source/hu directory, so I am setting the issue status to Fixed. For future reference, here is a quick comparison of working with the TM before and after:

Edited screenshots before and after newline removal

As you can see, the word "installer" (highlighted in yellow) falls beyond the 80-character limit and therefore wraps onto a new line. However, in the Hungarian translation this single word cannot be aligned to unit #15 on its own, because it would make no sense there. After joining it to unit #14, the entire sentence can be handled as one unit.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

balagan’s picture

I used Tikal from the Okapi Framework to convert the files to XLIFF, and I have just realized that Tikal supports custom segmentation rules. In my rules I simply removed \n from one rule, so newlines no longer start a new segment, which means this whole issue is no longer a problem.
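
The effect of such a rule change can be sketched in a few lines of Python. This is not Okapi's rule syntax and not the rule that was actually edited; it is a simplified regex-based illustration with made-up names and a made-up sample sentence. One pattern breaks at sentence ends and at every newline, the other breaks at sentence ends only and treats a newline as ordinary whitespace.

```python
import re

# Break after sentence-ending punctuation that is followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")
# The same rule, plus a break at every newline.
SENTENCE_OR_NEWLINE_BREAK = re.compile(r"(?<=[.!?])\s+|\n")


def segment(text, break_pattern):
    """Split `text` into translation units at every match of `break_pattern`."""
    # Normalize whitespace inside each unit so a wrapped line reads as one.
    units = [" ".join(u.split()) for u in break_pattern.split(text)]
    return [u for u in units if u]


# Hypothetical hard-wrapped source text.
text = "Download the file and run the\ninstaller. Then log in."

if __name__ == "__main__":
    print(segment(text, SENTENCE_OR_NEWLINE_BREAK))
    # ['Download the file and run the', 'installer.', 'Then log in.']
    print(segment(text, SENTENCE_BREAK))
    # ['Download the file and run the installer.', 'Then log in.']
```

With the newline break removed, the wrapped sentence stays in a single unit, so the source files can keep their 80-character wrapping.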

jhodgdon’s picture

OK, that sounds good!

So. It seems like we should close
#2813431: Remove requirement to wrap text at 80 characters
because it is not needed?

And also maybe we should add a section to the proposed text on
#2828137: Add section about using Translation Memory to contributor guide
about how to get the files into the TM? So far the proposed text only covers setting up the TM.

I'll add notes to both of those issues...