Postponed (maintainer needs more info)
Project:
Localization server
Version:
3.0.x-dev
Component:
Code
Priority:
Major
Category:
Bug report
Assigned:
Issue tags:
Reporter:
Created:
14 Aug 2018 at 11:02 UTC
Updated:
26 May 2025 at 08:50 UTC
Comments
Comment #2
drumm
This is due to the process running out of memory in l10n_server/l10n_community/export.inc. I’m not seeing any quick wins other than increasing the memory limit, which would only be a temporary solution.
This probably would need to be rewritten to batch and stream the output to really solve the issue.
Would a multi-select list of releases be useful, for example for getting all 8.x or 8.6.x releases only?
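To illustrate the "batch and stream" idea from this comment, here is a minimal sketch in Python (the module itself is PHP, so this is not its code). The function names and the `{release: {source: translation}}` data shape are assumptions made for the example. Deduplication across releases is deliberately left out. The point is only that entries are written as they are produced instead of being collected in one large array.

```python
import io
from typing import Dict, Iterator, TextIO


def po_quote(text: str) -> str:
    """Quote a string for a PO file (minimal escaping only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def po_entries_for_release(strings: Dict[str, str]) -> Iterator[str]:
    """Yield one PO entry at a time for a single release (hypothetical data shape)."""
    for source, translation in strings.items():
        yield f"msgid {po_quote(source)}\nmsgstr {po_quote(translation)}\n\n"


def stream_export(releases: Dict[str, Dict[str, str]], out: TextIO) -> int:
    """Write entries release by release; only one entry is held in memory at a time."""
    written = 0
    for _release, strings in releases.items():  # one chunk per release
        for entry in po_entries_for_release(strings):
            out.write(entry)
            written += 1
    return written
```

In a real batch run, each release would be one batch operation and `out` would be a file opened in append mode, so memory use stays roughly constant regardless of how many releases are selected.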
Comment #3
Joachim Namyslo
What may be useful would be an option to export strings by active branch, like 7.x merged or 8.x merged. I'm not sure if this is possible and easy enough to implement.
The rest is already possible and should work fine, because releases like 8.4 are not as big as 8.x. I guess the server will not run out of memory if only a small part of the main branch is exported.
Comment #4
drumm
Moving to the l10n_server project, where the code for this lives. If this is not an immediate blocker, I’d rather not raise the memory limit, which is already somewhat high.
I’m not seeing any deduping or grouping across releases, so I think each version of Drupal 8.x.x is adding an approximately equal amount of memory usage.
Comment #5
Gábor Hojtsy
It may be possible that the code naively dedupes in PHP arrays. @Joachim can you or someone else in the German team help look into l10n_server/l10n_community/export.inc in https://www.drupal.org/project/l10n_server and optimize this? That would be a great contribution to all using the system.
Comment #6
Joachim Namyslo
I can use Drupal but I am not able to code Drupal. It's a pity. But I'll redirect this issue to as many people as possible thanks to drupalchat.me. Maybe this will get an update in the near future.
Comment #7
c-logemann
I am investing some time to solve this problem and have already started to set up a local l10n server for testing.
After getting stopped by an installation issue (#3000918: PDOException on installing sub module "l10n_drupal") I'm currently importing some translation data to test export functionality.
Comment #8
c-logemann
I think it would be a good first step to wrap the database call and data build process in a batch process where each chunk corresponds to one single release. This won't be a big change.
Based on this solution it will be easy to also change the single selection to a multi-selection as suggested in comment #2, or to bundle all releases of a major version as suggested in comment #3. Because combining #2 and #3 would add complexity, it would be good to decide on either #2 or #3. I think #3 would be a good way to avoid too many uses of the "all" option when people only translate the latest active major versions. So maybe we should also drop "all" as the default option and change the default to the most recent release or the most recent major version.
Comment #9
c-logemann
In my local tests I also got timeout problems, which will also be solved with a batch solution.
After chatting with @Joachim Namyslo I think I will focus on #3 to also implement this feature request of #3000298: Clear up translations and give us some more options with Drupal 9.
Comment #10
c-logemann
Because the "export all translations of core" feature is unusable, I set the priority to Major.
Comment #11
c-logemann
To realize "bundle all releases of a major version" I would like to change the function "l10n_community_export()" so that its second parameter "$release" receives a string instead of an integer. The corresponding comment change would be:
Because this is an API change to a function that is possibly used by other modules, I would like a response from the maintainers on whether I can proceed in this simple way. Solving this issue in a more compatible way would take more time.
Comment #12
c-logemann
Ok, I just realized that I can create a new function that takes strings and change the old one to pass its value as a string to the new one. So I can move on without a response from the maintainers.
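The compatibility approach described in this comment can be sketched as follows. This is a Python illustration only. The names `export_by_selector` and `export`, the selector syntax ("8.x" for a branch, a numeric string for a single release ID), and the returned description strings are assumptions for the example, not the actual signature of l10n_community_export().

```python
def export_by_selector(project: str, release_selector: str) -> str:
    """New hypothetical entry point: takes a string selector.

    "8.x" style selectors mean "all releases of that branch";
    a purely numeric string means one release ID.
    """
    if release_selector.endswith(".x"):
        return f"{project}: all releases of branch {release_selector}"
    if release_selector.isdigit():
        return f"{project}: single release {int(release_selector)}"
    raise ValueError(f"Unsupported release selector: {release_selector!r}")


def export(project: str, release: int) -> str:
    """Old integer-based signature, kept so existing callers do not break.

    It only converts the ID to a string and delegates to the new function.
    """
    return export_by_selector(project, str(release))
```

Existing callers keep calling `export("drupal", 42)`, while new code can call `export_by_selector("drupal", "8.x")` without any API break.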
Comment #13
c-logemann
FYI: I have finished more than 50% of the fix and will try to finish it next week.
Comment #14
c-logemann
Status update: I changed the export to use the batch API with each release as a chunk. But now the data that is temporarily stored in the database is bigger, and I needed to increase innodb_log_file_size on my local dev system. Maybe it would be better to write out to the file system more than once.
Comment #15
Gábor Hojtsy
A temporary file could be created with the batch carrying the file ID/name?
Comment #16
c-logemann
@Gabor: In the current data concept we have header information based on all data. If this data is really needed I see two options:
I currently prefer the second one because I only see good operating system commands to merge files. For my own servers I would do this but it's not a good solution for a contrib module.
Comment #17
c-logemann
It's the header "PO-Revision-Date". If we can skip this information, or fill it based only on the newest release so that the header can be created in the first batch, this would be easier to solve.
Comment #18
c-logemann
This weekend I was working on this issue again and have already finished solution "1", which creates a second tempfile to merge with the header information. But now I need to organize the serving of the PO file. The old method of handing the file over directly doesn't work with the batch API, as described on Stack Overflow. This needs to be fixed, maybe along the lines described in this blog post by Jeff Geerling. Maybe we can avoid an additional page and handle this on the main form page via a redirect with the tempfile name as a URI argument.
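A rough sketch of the "second tempfile merged with the header" approach mentioned in this comment, written in Python for illustration (the module is PHP). The function name `build_export`, the batch shape `(revision_date, entries)` and the assumption that dates are naive UTC values are all made up for the example. The header is reduced to the single "PO-Revision-Date" line discussed above.

```python
import os
import shutil
import tempfile
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def build_export(batches: Iterable[Tuple[datetime, List[str]]], target_path: str) -> None:
    """Write entries to a body tempfile per batch, then prepend the header."""
    newest: Optional[datetime] = None
    # Pass 1: stream every batch into a temporary body file and track
    # the newest revision date seen so far.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".body", delete=False, encoding="utf-8"
    ) as body:
        for revision_date, entries in batches:
            if newest is None or revision_date > newest:
                newest = revision_date
            body.writelines(entries)
        body_path = body.name
    try:
        # Pass 2: the header can only be written now that all data was seen.
        with open(target_path, "w", encoding="utf-8") as out:
            out.write('msgid ""\nmsgstr ""\n')
            if newest is not None:
                stamp = newest.strftime("%Y-%m-%d %H:%M") + "+0000"
                out.write('"PO-Revision-Date: ' + stamp + '\\n"\n')
            out.write("\n")
            # Copy the body in chunks instead of loading it into memory.
            with open(body_path, encoding="utf-8") as body_in:
                shutil.copyfileobj(body_in, out)
    finally:
        os.remove(body_path)
```

The merge happens inside the application, so no operating system commands are needed, and the chunked copy keeps memory use low even for very large exports.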
Comment #19
c-logemann
Comment #20
c-logemann
At the Contribution Weekend in FFM/Germany I discussed my current strategy with @kfritsche. I got some good ideas on how to solve the last steps. I will try to present a patch tomorrow.
Comment #21
c-logemann
The new plan for the last step is to move the temp file into a download folder and present the link to it via a message. Additionally, the files in the download folder need to be deleted by cron.
Comment #22
c-logemann
Here is a first patch to review and discuss. The process is still very slow on an "all" export. Maybe we need more data splitting, e.g. another chunk logic, to make it faster.
If the download solution is chosen, the cron-based cleanup still needs to be added. To prepare for this I already placed a new function l10n_community_directory() in the ".module" file.
I added a unique ID to the download filename. I kept the ".po" or ".pot" suffix logic and realized that ".po" files are currently blocked by the default Drupal ".htaccess". This also needs to be discussed and solved if we present the result in this way. For testing the patch just remove "po" from the "FilesMatch" directive.
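The download-folder plan from comments #21 and #22 (unique filename, ".po"/".pot" suffix, cleanup by cron) could look roughly like the following Python sketch. The helper names, the filename pattern and the age-based cleanup rule are assumptions for illustration. The patch itself is PHP and is not reproduced here.

```python
import os
import time
import uuid
from typing import List, Optional


def download_filename(project: str, langcode: str, template: bool = False) -> str:
    """Build a hard-to-guess filename, keeping the .po/.pot suffix logic."""
    suffix = "pot" if template else "po"
    return f"{project}-{langcode}-{uuid.uuid4().hex}.{suffix}"


def cleanup_downloads(
    directory: str, max_age_seconds: float, now: Optional[float] = None
) -> List[str]:
    """Delete files older than max_age_seconds; meant to be called from cron."""
    now = time.time() if now is None else now
    with os.scandir(directory) as it:
        entries = list(it)  # snapshot first, then delete
    removed = []
    for entry in entries:
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

A cron hook would call something like `cleanup_downloads(path, 24 * 3600)` so that generated exports do not accumulate on disk.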
Comment #24
Gábor Hojtsy
Now this even times out when exporting one specific current release of core; it does not need to be all releases merged. Noted by @svenreyen.
Comment #25
c-logemann
Comment #26
svenryen commented
Those looking to export a .pot from l.d.o and getting a timeout can also check out this module, and generate the file locally: https://www.drupal.org/project/potx
Comment #28
drumm
This was reported via Slack at https://drupal.slack.com/archives/C51GNJG91/p1657399501531629. They found a workaround, and since the 3.0.x branch is under active development, let’s fix it for 3.0.x.
(I’m working on resolving the 7.x-1.x test failures.)
Comment #29
shmy commented
I was able to reproduce it on the current 3.0.x-dev and I'm working on a fix.
Comment #30
shmy commented
Unfortunately the export of a project that consists of a large code base, like core, has major issues. The generation takes quite some time and the produced translation files are very large when the Include metadata (Verbose output on 7.x-1.x) option is enabled. They have a header that includes a "Generated from files" listing. These files are prefixed with the version string when releases are merged.
On my local I've run several exports of the German translation of core where almost all releases that are listed on core's update endpoint have been parsed. That's ~630 releases, starting from 4.5.0.
The results are:
- All flags enabled (Download untranslated and translated strings + Inject German suggestions? + Include metadata):
  The generation took 36min and produced a 337MB file.
  The above-mentioned file list is 724084 lines long, where most of them look like this:
  # drupal-10.3.x-dev/core/themes/starterkit_theme/starterkit_theme.info.yml: n/a
  I've seen only a few that look different, like this one:
  install.inc,v 1.24 2006/10/23 06:45:17 dries
  Every msgid/msgstr item has a reference to a source file and line number. When releases are merged there is a reference for every release. That makes them really long (multiple thousand characters). E.g.
  #: core/authorize.php:146; core/lib/Drupal/Core/Updater/Module.php:130; core/lib/Drupal/Core/Updater/Theme.php:110; ...
  Just in case someone is interested, I've attached the compressed file.
- When the Include metadata option was unchecked (and Download untranslated and translated strings + Inject German suggestions? were checked) the generation took 24.69min and produced a 3.9MB file.
- The fastest export took 11.82min and produced a 311.8MB file (with Download only translated strings + Include metadata checked and Inject German suggestions? unchecked). With Inject German suggestions? checked it took 13.8min and produced a 315.5MB file.
---
That's on modern hardware (Zen 3 CPU + NVMe), running on Linux.
I've never used a local translation application but can hardly imagine that this amount of data is actually useful. I wonder what the value of having these file listings is. Can those large files even be processed by local translation applications? Are they able to handle / visualize that many references? Do they remove the metadata? If not, the files are too large for an import (the file size limit is 50MB on production).
The batch rewrite is still (probably) required, because the timeout also happens when a single release is exported, but I think we should limit the All releases merged option to projects that are below a specific threshold (e.g. a line count and / or release count).
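The threshold idea at the end of this comment could be expressed as a simple guard, sketched here in Python. The function name and both default limits are placeholders chosen for the example, not values proposed in this issue. Only the two measured numbers in the usage note (about 630 releases, 724084 file-list lines) come from the figures above.

```python
def merged_export_allowed(
    release_count: int,
    file_list_lines: int,
    max_releases: int = 100,          # placeholder limit, not from the issue
    max_file_list_lines: int = 100_000,  # placeholder limit, not from the issue
) -> bool:
    """Offer "All releases merged" only for projects below both thresholds."""
    return release_count <= max_releases and file_list_lines <= max_file_list_lines
```

With these placeholder limits, `merged_export_allowed(630, 724084)` is false for the core export measured above, while a small contrib project with a handful of releases would still get the merged option.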
Comment #31
shmy commented
Comment #32
shmy commented
Comment #33
fmb commented
Comment #34
Gábor Hojtsy
I think local translation tools use the reference location value to help translators find where the string came from, which is useful especially if the string was ambiguous. This:
But the "generated from files" list is a Drupalism and I don't think there is any need to keep that.