Although we have cleared up Core so far, the German translation is still not exportable when the "All releases merged" option is used with Core selected; it still leads to a 500 error. It would be great if this could be fixed.


Comments

Joachim Namyslo created an issue. See original summary.

drumm’s picture

This is due to the process running out of memory in l10n_server/l10n_community/export.inc, at:

  $result = $query->execute();

I’m not seeing any quick wins, other than increasing the memory limit, which would be a temporary solution.

This probably would need to be rewritten to batch and stream the output to really solve the issue.
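As a rough illustration of that batch-and-stream idea, the export could fetch the result set in fixed-size windows and write each entry out immediately instead of accumulating everything in memory. This is only a sketch; the helper names (`l10n_export_string()`) and column names are placeholders, not the actual export.inc API:

```php
<?php

/**
 * Sketch: stream PO output in fixed-size chunks instead of loading
 * the whole result set into memory at once.
 *
 * $query is assumed to be the select query built in export.inc;
 * $handle is an already-open output stream.
 */
function l10n_export_streamed($query, $handle, $chunk_size = 500) {
  $offset = 0;
  do {
    // Fetch only one window of rows per iteration.
    $chunk = $query->range($offset, $chunk_size)->execute()->fetchAll();
    foreach ($chunk as $row) {
      // Write each entry out immediately; nothing is kept around.
      // l10n_export_string() is a hypothetical PO-escaping helper.
      fwrite($handle, 'msgid ' . l10n_export_string($row->source) . "\n");
      fwrite($handle, 'msgstr ' . l10n_export_string($row->translation) . "\n\n");
    }
    $offset += $chunk_size;
  } while (count($chunk) === $chunk_size);
}
```

Peak memory then scales with the chunk size rather than the total number of strings, at the cost of multiple queries.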

Would a multi-select list of releases be useful, for example for getting all 8.x or 8.6.x releases only?

joachim namyslo’s picture

What may be useful would be an option to export strings by active branch, such as 7.x merged or 8.x merged. I'm not sure if this is possible and easy enough to implement.

The rest is already possible and should work fine, because releases like 8.4 are not as big as 8.x; I guess the server will not run out of memory if only a small part of the main branch is exported.

drumm’s picture

Project: localize.drupal.org » Localization server
Component: Infrastructure » Code

Moving to the l10n_server project, where the code for this lives. If this is not an immediate blocker, I’d rather not raise the memory limit, which is already somewhat high.

I’m not seeing any deduping or grouping across releases, so I think each version of Drupal 8.x.x is adding an approximately equal amount of memory usage.

gábor hojtsy’s picture

It may be possible that the code naively dedupes in PHP arrays. @Joachim can you or someone else in the German team help look into l10n_server/l10n_community/export.inc in https://www.drupal.org/project/l10n_server and optimize this? That would be a great contribution to all using the system.

joachim namyslo’s picture

I can use Drupal, but I am not able to write Drupal code, which is a pity. I'll redirect this issue to as many people as possible via drupalchat.me. Maybe this will get an update in the near future.

c-logemann’s picture

Assigned: Unassigned » c-logemann

I'm investing some time to solve this problem and have already started to set up a local l10n server for testing. After getting stopped by an installation issue (#3000918: PDOException on installing sub module "l10n_drupal"), I'm currently importing some translation data to test the export functionality.

c-logemann’s picture

I think a good first step would be to wrap the database call and data-build process in a batch process where each chunk corresponds to a single release. This won't be a big change.

Based on this solution it will be easy to also change the single selection to a multi-selection as suggested in comment #2, or to bundle all releases of a major version as suggested in comment #3. Because combining #2 and #3 would add considerable complexity, it would be fine to decide on either #2 or #3. I think #3 would be a good way to avoid too many uses of the "all" option when people only translate the latest active major versions. So maybe we should also drop "all" as the default and change it to the most recent release or the most recent major version.
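The per-release chunking could be set up with Drupal 7's Batch API roughly like this. A sketch only, assuming the 7.x Batch API; the callback names are illustrative, not the actual l10n_server code:

```php
<?php

/**
 * Sketch: set up a batch with one operation per release.
 * l10n_export_release_chunk() and l10n_export_batch_finished()
 * are hypothetical callbacks.
 */
function l10n_export_start_batch(array $rids, $langcode) {
  $operations = array();
  foreach ($rids as $rid) {
    // Each chunk handles exactly one release.
    $operations[] = array('l10n_export_release_chunk', array($rid, $langcode));
  }
  batch_set(array(
    'operations' => $operations,
    'finished' => 'l10n_export_batch_finished',
    'title' => t('Exporting translations'),
  ));
}

function l10n_export_release_chunk($rid, $langcode, &$context) {
  // A shared temporary file is carried along in the batch results,
  // so each operation appends this release's strings to it.
  if (empty($context['results']['file'])) {
    $context['results']['file'] = drupal_tempnam('temporary://', 'l10n_export_');
  }
  // ... query the strings for $rid and append them to the file ...
}
```

Since the batch context persists between operations, the temp file name only has to be created once and every subsequent chunk appends to it.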

c-logemann’s picture

Title: German translation is not exportable » Rewrite export.inc to avoid memory problems and timeout on export

In my local tests I also got timeout problems, which will likewise be solved by a batch solution.

After chatting with @Joachim Namyslo I think I will focus on #3 to also implement this feature request of #3000298: Clear up translations and give us some more options with Drupal 9.

c-logemann’s picture

Priority: Normal » Major

Because the "export all translations of core" feature is unusable, I'm setting the priority to Major.

c-logemann’s picture

To realize "bundle all releases of a major version" I would like to change the function l10n_community_export() so that its second parameter, $release, receives a string instead of an integer. The corresponding comment change would be:

 * @param string $release
 *   Release string: "rid-*" to generate a tarball for a single rid,
 *   "all-*" for all releases of a major version, or "all" for all releases
 *   of all major versions considered.

Because this is an API change, and the function is possibly used by other modules, I'd like a response from the maintainers on whether I can proceed in this simple way. Solving this issue in a more compatible way would take more time.

c-logemann’s picture

OK, I just realized that I can create a new function that takes strings, and change the old one to convert its argument and call the new one. So I can move on without a response from the maintainers.
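That backwards-compatible wrapper pattern would look roughly like this. The signatures are simplified for illustration; only l10n_community_export() is a real function name from the module, the rest is hypothetical:

```php
<?php

/**
 * Sketch: new string-based export function that does the actual work.
 * "rid-42" means a single release, "all-8" all releases of the 8.x
 * major version, "all" every release of every major version.
 */
function l10n_community_export_by_string($uri, $release_string) {
  // ... new batch-based export logic lives here ...
}

/**
 * Old integer-based entry point, kept for API compatibility: it
 * converts its argument to the new string form and delegates.
 */
function l10n_community_export($uri, $release = NULL) {
  $release_string = ($release === NULL) ? 'all' : 'rid-' . $release;
  return l10n_community_export_by_string($uri, $release_string);
}
```

Existing callers keep working unchanged, while new code can call the string-based variant directly.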

c-logemann’s picture

FYI: I have finished more than 50% of the fix and will try to finish next week.

c-logemann’s picture

Status update: I switched to the Batch API with every release as a chunk. But now the data that is temporarily stored in the database is bigger, and I needed to increase innodb_log_file_size on my local dev system. Maybe it would be better to write out to the file system more than once.

gábor hojtsy’s picture

Could a temporary file be created, with the batch carrying the file ID/name?

c-logemann’s picture

@Gabor: In the current data concept we have header information based on all the data. If this data is really needed, I see two options:

  1. Creating a second temp file at the end and merge with the data file.
  2. Doing a second DB query at the beginning to retrieve this meta data and start the big temp file with this header information.

I currently prefer the second one, because I only see good operating-system commands for merging files. On my own servers I would do this, but it's not a good solution for a contrib module.
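For what it's worth, merging a header file with a data file does not have to shell out to the operating system; PHP's stream functions can do it portably with constant memory use. A minimal sketch (the function name is just illustrative):

```php
<?php

/**
 * Sketch: write the header file, then stream the data file after it.
 * stream_copy_to_stream() copies in chunks internally, so memory use
 * stays constant regardless of file size.
 */
function l10n_merge_files($header_path, $data_path, $out_path) {
  $out = fopen($out_path, 'wb');
  foreach (array($header_path, $data_path) as $path) {
    $in = fopen($path, 'rb');
    stream_copy_to_stream($in, $out);
    fclose($in);
  }
  fclose($out);
}
```

That would make option 1 viable inside a contrib module without relying on `cat` or similar OS commands.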

c-logemann’s picture

It's the "PO-Revision-Date" header. If we can skip this information, or fill it based only on the newest release so that the header can be created in the first batch, this would be easier to solve.

c-logemann’s picture

This weekend I worked on this issue again and finished solution 1, creating a second temp file to merge with the header information. But now I need to organize the serving of the PO file. The old method with a direct handover doesn't work with the Batch API, as described on Stack Overflow. This needs to be fixed, maybe in a direction described in a blog post by Jeff Geerling. Maybe we can avoid an additional page and handle this on the main form page via a redirect, with the temp-file name as a URI argument.

c-logemann’s picture

On Contribution Weekend in FFM/Germany I discussed my current strategy with @kfritsche and got some good inspiration on how to solve the last steps. I'll try to present a patch tomorrow.

c-logemann’s picture

The new plan for the last step is to move the temp file into a download folder and present a link to it via a message. Additionally, the files in the download folder need to be deleted by cron.
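The cron cleanup could be a plain hook_cron() implementation along these lines. A sketch under assumptions: the one-day age limit is arbitrary, and l10n_community_directory() is the helper function mentioned later in this thread:

```php
<?php

/**
 * Implements hook_cron().
 *
 * Sketch: delete generated export files older than a day from the
 * download folder. The age limit is an assumption, not a decision
 * from this thread.
 */
function l10n_community_cron() {
  $max_age = 86400; // One day, in seconds.
  $files = file_scan_directory(l10n_community_directory(), '/\.pot?$/');
  foreach ($files as $file) {
    if (REQUEST_TIME - filemtime($file->uri) > $max_age) {
      file_unmanaged_delete($file->uri);
    }
  }
}
```

Running this on cron keeps the download folder from growing without bound while leaving recent exports available.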

c-logemann’s picture

Status: Active » Needs review
File: new patch (16 KB)

Here is a first patch to review and discuss. The process is still very slow on an "all" export. Maybe we need more data splitting, e.g. another chunk logic, to make it faster.

If the download solution is selected, the cron-based cleanup still needs to be added. To prepare, I already placed a new function, l10n_community_directory(), in the ".module" file.

I added a unique ID to the download filename. I kept the ".po"/".pot" suffix logic and realized that ".po" files are currently blocked by Drupal's default ".htaccess". This also needs to be discussed and solved if we present the result this way. For testing the patch, just remove "po" from the "FilesMatch" directive.
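For reference, the relevant part of Drupal 7's default .htaccess looks roughly like this (the exact extension list varies between Drupal versions, so treat this as an approximation):

```apache
# Excerpt (approximate) from Drupal's default .htaccess: block direct
# access to source and data files, including .po files.
<FilesMatch "\.(engine|inc|info|install|make|module|profile|test|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)(~|\.sw[op]|\.bak|\.orig|\.save)?$">
  Order allow,deny
</FilesMatch>
# For testing the patch, remove "po|" from the pattern above.
```

A cleaner long-term fix might be to serve the exports from a path that is not subject to this directive, rather than loosening it site-wide.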

gábor hojtsy’s picture

Now this even times out when exporting one specific current release of core; it does not need to be all releases merged. Noted by @svenryen.

c-logemann’s picture

Assigned: c-logemann » Unassigned
svenryen’s picture

Those looking to export a .pot from l.d.o and getting a timeout can also check out this module, and generate the file locally: https://www.drupal.org/project/potx

drumm’s picture

Version: 7.x-1.x-dev » 3.0.x-dev
Status: Needs review » Needs work

This was reported via Slack at https://drupal.slack.com/archives/C51GNJG91/p1657399501531629. They found a workaround, and since the 3.0.x branch is under active development, let’s fix it for 3.0.x.

(I’m working on resolving the 7.x-1.x test failures.)

shmy’s picture

Assigned: Unassigned » shmy

I was able to reproduce it on current 3.0.x-dev and I'm working on a fix.

shmy’s picture

File: new attachment (7.67 MB)

Unfortunately, the export of a project with a large code base, like core, has major issues. The generation takes quite some time, and the produced translation files are very large when the "Include metadata" option ("Verbose output" on 7.x-1.x) is enabled. They have a header that includes a "Generated from files" listing. These files are prefixed with the version string when releases are merged.

Locally, I've run several exports of the German translation of core in which almost all releases listed on core's update endpoint were parsed. That's ~630 releases, starting from 4.5.0.

The results are:
- All flags enabled (Download untranslated and translated strings + Inject German suggestions? + Include metadata):
The generation took 36 min and produced a 337 MB file.
The above-mentioned file list is 724,084 lines long, and most lines look like this: # drupal-10.3.x-dev/core/themes/starterkit_theme/starterkit_theme.info.yml: n/a. I've seen only a few that look different, like this one: install.inc,v 1.24 2006/10/23 06:45:17 dries

Every msgid/msgstr item has a reference to a source file and line number. When releases are merged, there is a reference for every release, which makes them really long (multiple thousands of characters). E.g. #: core/authorize.php:146; core/lib/Drupal/Core/Updater/Module.php:130; core/lib/Drupal/Core/Updater/Theme.php:110; ...

Just in case someone is interested, I've attached the compressed file.

- When the Include metadata option was unchecked (and Download untranslated and translated strings + Inject German suggestions? were checked), the generation took 24.69 min and produced a 3.9 MB file.

- The fastest export took 11.82 min and produced a 311.8 MB file (with Download only translated strings + Include metadata checked, and Inject German suggestions? unchecked); with Inject German suggestions? checked it took 13.8 min and produced a 315.5 MB file.

---

That's on modern hardware (Zen 3 CPU + NVMe), running on Linux.

I've never used a local translation application, but I can hardly imagine that this amount of data is actually useful. I wonder what the value of these file listings is. Can such large files even be processed by local translation applications? Are they able to handle / visualize that many references? Do they remove the metadata? If not, the files are too large for an import (the file size limit is 50 MB on production).

The batch rewrite is (probably) still required, because the timeout also happens when a single release is exported, but I think we should limit the All releases merged option to projects that are below a specific threshold (e.g. a line count and/or release count).
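Such a release-count guard could be as simple as the following sketch. The threshold value, the function name, and the assumption that releases live in a {l10n_server_release} table keyed by pid are all mine, not part of any patch here:

```php
<?php

/**
 * Sketch: only offer "All releases merged" for projects below a
 * release-count threshold. Threshold and table/column names are
 * assumptions for illustration.
 */
function l10n_export_allow_all_releases($project) {
  $threshold = 50;
  $count = db_query(
    'SELECT COUNT(*) FROM {l10n_server_release} WHERE pid = :pid',
    array(':pid' => $project->pid)
  )->fetchField();
  return $count <= $threshold;
}
```

The export form could then hide or disable the merged option whenever this returns FALSE.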

shmy’s picture

Status: Needs work » Postponed (maintainer needs more info)
shmy’s picture

Assigned: shmy » Unassigned
fmb’s picture

Assigned: Unassigned » teebeecoder
gábor hojtsy’s picture

I think local translation tools use the reference location value to help translators find where a string came from, which is useful especially if the string is ambiguous. For example:

  #: lib/error.c:116
  msgid "Unknown system error"
  msgstr "Error desconegut del sistema"

But the "Generated from files" list is a Drupalism, and I don't think there is any need to keep it.