Problem/Motivation

There's no way to use big webforms on multilingual sites because fails to insert into locales_source.source field elements of the form.

PDOException: SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'source' at row 1 in                                                          [error]
/var/www/html/core/lib/Drupal/Core/Database/Statement.php:59

This because mysql BLOB limited by 64k http://cgit.drupalcode.org/drupal/tree/core/modules/locale/locale.instal...

Proposed resolution

not yet

Remaining tasks

decide how to deal with webform translation on top of config_translation

User interface changes

API changes

Data model changes

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

andypost created an issue. See original summary.

andypost’s picture

Steps to reproduce

1) create webform which has more the 64k size when exported to config
2) enable more then one language on site
3) try to import import it

jrockowitz’s picture

andypost’s picture

The workaround for me is to write custom plugins that replace long duplicated field sets (participants of the event)

jrockowitz’s picture

I am able to duplicate this issue by

  • Enable the webform_test.module and the webform_test_translation.module
  • Translate 'Test: Form API #states' which is 156 KB form (/admin/structure/yamlform/manage/test_form_states/translate/es/add)
  • Click 'Save Translation'
  • Throws "Drupal\Core\Database\DatabaseExceptionWrapper: SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'source' at row 1:"
jrockowitz’s picture

I am stumped.

The two solutions that I can think of are...

  • Write a patch that increases the 'locales_source.source' column. This will cause performance issues.
  • Alter the query to 'locales_source.source' and prevent the webform.elements from being written to this column. I am not sure of the side effect of this solution.

Neither option seems acceptable.

jrockowitz’s picture

I created patch for #1885192: Field locales_source.source is not suitable for long texts and huge config objects that provides a workaround for this issue.

andypost’s picture

faced with another issue, probably related
When I run `drush locale-check && drush locale-update` to update custom strings for webforms the `elements` key is removed from webform's config translation

maybe related to #2831126: Translation strings from install profile config language/[langcode]/*.yml files don't appear in locale_target or #2650434: Clearing cache via UI in translated language resets config translation of field labels to default language

jrockowitz’s picture

Enabling the 'Interface Translation' module after the Webform module is installed also throws the below error.

ResponseText: {"message":"A fatal error occurred: SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column \u0027source\u0027 at row 1: INSERT INTO {locales_source} (source, context, version) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2);

andypost’s picture

I think the only way to make it work is to add to each element own config schema

So elements key will became not a string but a yaml and each element will define its translatable keys(properties)

It can bring another sort of troubles - collisions with string translation (used to translate config properties)
Maybe sort of "context" may be needed to separate translation of form properties

jrockowitz’s picture

It is going to be almost impossible to define a hierarchical config schema for 50+ elements.

I think we could define a config schema for flattened array of element labels and descriptions.

In the below example elements would be ignored and labels could have a config schema.

elements: |
  name:
    '#title': 'Your Name'
    '#type': textfield
    '#required': true
    '#default_value': '[webform-authenticated-user:display-name]'
  email:
    '#title': 'Your Email'
    '#type': email
    '#required': true
    '#default_value': '[webform-authenticated-user:mail]'
  subject:
    '#title': Subject
    '#type': textfield
    '#required': true
    '#test': 'Testing contact webform from [site:name]'
  message:
    '#title': Message
    '#type': textarea
    '#required': true
    '#test': 'Please ignore this email.'
labels:
  name:
    '#title': 'Your Name'
  email:
    '#title': 'Your Email'
  subject:
    '#title': Subject
  message:
    '#title': Message

Still, I am optimistic that #1885192: Field locales_source.source is not suitable for long texts and huge config objects could be resolved and we maybe could leave things AS-IS.

andypost’s picture

Core issue is workaround not solution.

Maintaining of translation for webforms is real pita, that's why I'm searching for other way

1) translation copies elements key so actually allows translation to be totally different structure UX--

2) it's mostly impossible to keep synchronized huge forms into different languages, single looong line impossible to review and validate when code review in action DX--

3) translation should not contain HTML tags and it breaks core's translation ui, it total mess when you trying to export .po file with custom translations

In long run it will not work and needs to be refactored, maybe in 6.x.

Btw yamlform stoped core's initiative to create forms module and nonstructural nature make all its benefits useless

jrockowitz’s picture

The UX of being able to translate all the elements as a single YAML file is very powerful. I agree that core does not really support this concept.

The nonstructural nature of a Webform's elements being a FAPI render array is also very powerful while I also agree it is completely not applicable to core.

BTW, there some real cool form element's and API's that are could be applicable to core like a reusable multiple value element and dismissable inline status messages.

I think my effort with the Webform module and core is going to have to intersect in the not so distant future. For now, I need to get more people involved and I need some revenue to help me continue to support the Webform module and ultimately we need get out a stable release.

CatherineOmega’s picture

So here's a thought for a simple improvement for translating long elements; could the form not just replace all the \r\n with proper line returns so it looked like the source language? I've just been doing a find-replace in my text-editor and then pasting the resulting code in, and it works fine.

jrockowitz’s picture

@CatherineOmega Drupal 8 Core makes is impossible to reformat YAML when exported during a bulk config-export. @see \Drupal\Component\Serialization\Yaml

The Webform module includes a `drush webform-tidy` command that converts \r\n into line breaks. If you enable the Webform devel module, besides adding 'Export' sub tab to 'Devel' tabs all single configuration exports of Webform configuration files will be tidied.

Here is a tidied Webform config export.

bucefal91’s picture

Hello, guys!

I'll let myself butt in here :) I am also right now considering a case where I need to handle certain kind of translation features for webforms.

I think I keep saying it in every other issue I visit in webform module, but I'll say it once again. jrockowitz, your effort and dedication is admirable on this module. Do you even sleep sometimes? :) I really have no idea how you can put up with so much work in the module. I personally thought it was impossible before I witnessed it in your case.

Alright, now I cut to the business. I know it implies ugly load of work, but I lean towards putting some more structure into 'elements' property too. I do not have any good specific example, but I am rather sure if it becomes more structured, one way or another folks will find a way to leverage that structure.

Another reason in favor, at least from my analysis it appears that a webform in EN must have the same elements as the same webform in FR language. Otherwise some unpleasant effects may be encountered. Just consider a parallel with Views, namely the parallel between webform elements and views handlers. In views, you can't alter the handlers nor their configs, just the UI (labels). Why? I think because if your view becomes absolutely another thing when you go from EN to FR, that generates pain (you'll have much worse time theming, much worse time altering, much worse time overall interacting with that view). Same way I see it for webform elements: if the webform elements change from a translation to a translation, it will be difficult to create a table of submissions (you'll get a sparse matrix in the best case scenario). Many things may actually be hard wired to webform element machine names (webform handlers, strings tokenized on webform submission).

So I personally would define translation of webform as: keep the same elements, just change the UI (labels). And if you are having such a use case where you need to actually present a different set of elements depending on whether it's EN or FR... well, then apparently you need 2 different webforms (one in FR and another in EN). There is little gain in trying to merge them into a single webform with 2 translations while there is definitely certain pain in it.

That's my personal opinion which may .. or may not be right :) The consequences from such opinion that I recognize: Yes, the 1st step towards it is to structure the 'elements' property and make it follow a certain schema. I've given nearly no thought about what actual schema, but from the comments above I see that one concern is how to represent a multi-level hierarchy (presumably unlimited level of nesting). I would define a flat mapping of all the elements + a sequence property of #parents like they have it in form API. Then any hierarchy can be flattened out into a flat mapping for storing into YAML and parsed back into hierarchical tree when read from the YAML with a rather simple parser/encoder logic.

DamienMcKenna’s picture

I'd honestly suggest working out how to translate each form label & description tag individually rather than the entire config object - you don't want someone who has no knowledge of YAML data structures poking around in it. I'd also second the idea that you'd probably want to keep the field structures in sync across all locales, and as soon as you start translating the YAML data you're creating forks of the form - any change would have to be replicated across each translation, and by hand using YAML code instead of the GUI.

jrockowitz’s picture

I do think we could improve the current webform translation UI/UX to display text fields and text areas for individual element properties.

I think restucturing how elements are defined/stored could be a lot of work but this might be an inevitable change/improvement.

@see #2913372: Allow forms to be defined in three segments: schema, UI, data

dubcanada’s picture

Is there a way to make the webform ui work on different languages? It is the same YAML as the default language, we could temporarily just make the UI interface work for the translate area?

DamienMcKenna’s picture

On the config side, are there any handlers / callbacks which could be used to change the config object export? While it'd be nice for core to handle this better, maybe there are ways of extending core?

jrockowitz’s picture

Title: Translation of big webforms broken » [Webform 8.x-6.x] Translation of big webforms broken
Status: Active » Postponed

Maybe this issue can be revisited via Webform 8.x-6.x.

ckaotik’s picture

So elements key will became not a string but a yaml and each element will define its translatable keys(properties)

I agree with #10, but suggest a tweaked solution with less overhead: Provide a generic "webform element" schema, that is still super-powerful YAML code. Serialization for each element should be doable, I think?

elements:
  type: sequence
  label: 'Form elements'
  sequence:
    type: text
    label: 'Element'

Things this helps with:

  • Reduces YAML code to better manageable chunks
  • Allows translation of individual elements, instead of all in a single huge text box
  • Allows to better rerview config changes to a single element
DamienMcKenna’s picture

Should this issue focus on improving the ability to translating the entire configuration object while we separately deal with translating the user-visible strings, or should we skip the configuration translation portion itself and just come up with a simpler means of translating the user-visible strings? Should sites that want to have different field/form configurations per language just have separate webforms?

Pere Orga’s picture

Title: [Webform 8.x-6.x] Translation of big webforms broken » Translation of big webforms broken
Version: 8.x-5.x-dev » 6.x-dev
Status: Postponed » Active

Updating version and status as per #21.

I am also unsure if the title and description should be updated: #3014680: Translation Interface to be textareas and not YAML was closed as a duplicate of this one, but the former is about an expected feature request (providing a user interface for translating webforms), and this one appears to be about a bug when translating big webforms.

jrockowitz’s picture

Fixing this specific issue requires reworking how webforms are stored and translated.

I don't have the free time available to address this issue. I am not sure it should delay the stable release of Webform 6.x.

Improving the Webform module's translation UI/UX can happen before or after 6.x is released. Here is the dedicated ticket.

#3172735: Improve webform elements translation UI and refactor webform translation code

ckaotik’s picture

Should sites that want to have different field/form configurations per language just have separate webforms?

This would likely break how we currently use Webform. Translatable webforms make a site setup possible, where the unknowing editor does not need to know about config translation, translation "just works": Write your node content, select your webform, done. Only more advanced editors get access to build and translate webforms.

Should this issue focus on improving the ability to translating the entire configuration object while we separately deal with translating the user-visible strings, or should we skip the configuration translation portion itself and just come up with a simpler means of translating the user-visible strings?

I'm all in favor of fixing translation of large webforms. I wonder how complex a solution for per-element storage would be to implement. Not schemas for each element type, but rather a "webform element" schema item.

Providing a clean UI for translating all the elements webform supports is an excessive task, especially when considering that some element types need more than "just" label and description translations, e.g. options values on radios, checkboxes and select elements, help texts etc.

jrockowitz’s picture

Status: Active » Postponed

#3172735: Improve webform elements translation UI and refactor webform translation code will address the UI/UX issue but #2836206: [Translations] Translation of big webforms broken is still a problem which I don't think is fixable in 6.x. I am moving this issue back to postponed.

EgbertB’s picture

The inaccessibility of a non TWIG userinterface for the translation makes this module practically unusable for my clients with multiligual sites.....

andypost’s picture

Status: Postponed » Active

As core 9.2 fixed it now needs work to adopt changes on next core release #2844452: Export configuration YAML strings as multiline

See CR https://www.drupal.org/node/3114725

joseph.olstad’s picture

strangely enough, this bug doesn't affect my PostgreSQL environment, only affects my MySQL v8.0.28 environment.

If this was fixed in 9.2.x, hopefully there's an answer already just have to find it.

digging through the CR and the rest.

joseph.olstad’s picture

Figured out a workaround but ya, a PITA

#3271926: mysql 1406 Data too long for column 'source' at row 1 insert into locales_source

Not only is it a pita figuring this out but the SQL error came up when I put the config file in my custom module config/install folder, for some reason this imports using a different mechanism than the configuration import code (that works) in my .install file.

if you're curious, have a look at the linked issue and look at my project source code for the .install file.

I moved the config file from config/install to config/optional then force load it with some code in my .install file, it works without error provided the configuration was exported after being cleaned/re-imported without the dreaded \r .

With that said, PostgreSQL had no issues either way, it was MySQL that was barking about the data column error

joseph.olstad’s picture

ok, interesting conundrum using the web interface this time. Different even larger form, what is most frustrating is the form save does not throw any notice or error at all, no values will change. I would at least expect an exception or notice or error or warning message or at minimum a log entry in the dblog however there is nothing. and I do not see a mysql error message.

Tried with Drupal 9.3.16, Drupal 9.4.5,
tested both webform 6.0.7, 6.1.3, 6.2.x-dev latest , same behavior

Database information: "Percona Server (GPL), Release '41'"
PHP versions tested 7.4.x / 8.0.x
Environment: Acquia Cloud.
PHP

post_max_size 256M
memory_limit:  512M
max_input_nesting_level	64
max_input_time	60
max_input_vars	1000

Workaround is to manually edit the translation yml and import it with the command line.

I have a massive webform, it is 384k original language (Français)
and the translation is 'English' which is 108K

du -hs config/default/language/en/webform.webform.form_inscri_assu_temp_adulte.yml
108K    config/default/language/en/webform.webform.form_inscri_assu_temp_adulte.yml
du -hs config/default/webform.webform.form_inscri_assu_temp_adulte.yml
384K    config/default/webform.webform.form_inscri_assu_temp_adulte.yml
joseph.olstad’s picture

FileSize
33.86 KB

Here is the YML in a tgz file for the webform that fails to save translation in the form GUI

To test this you need to install Drupal with french as the default language and add english as a translation.
install the webform module 6.0.7 or 6.1.3 or 6.2.x or 6.2.x-dev

import the two webforms found inside the tgz file with drush cim --partial

Curiously, no error, no notice, no log messages, no warnings, but doesn't save with /admin/structure/webform/manage/form_inscri_assu_temp_adulte/translate/en/edit

joseph.olstad’s picture

Why is Drupals wysiwyg generating a single line that gets imported into the locales table?

using Drush the discussed fix which is to add translations in multiple lines works properly

the solution therefore is to fix the text processing in the wysiwyg mode.

#2844452: Export configuration YAML strings as multiline

the issue says fixed but for some reason it's not fixed in the webform module

joseph.olstad’s picture

Assigned: Unassigned » joseph.olstad

I have an idea for a core patch that could fix this, it could create some ideas, working on this.

#3271926: mysql 1406 Data too long for column 'source' at row 1 insert into locales_source

I'll use a technique available in core to clean this up, I've written a core patch that takes the output of ckeditor and allows for cleanup.
see this patch that I will modify for this use case to remove the \r to make sure it doesn't come back

#2857273: \Drupal\filter\Plugin\Filter\FilterHtmlCorrector and Html::normalize() and incorrectly "corrects" <source> tags

joseph.olstad’s picture

This might explain why drush is able to import large translations , having issues figuring out why the gui editor generated stuff seems to be single line processed instead of multiline however see this utility, seems like Webforms knows about multiline , wondering how it can be fixed in the GUI like it is for drush.

src/Utility/WebformYaml.php

/**
 * Provides YAML tidy function.
 */
class WebformYaml implements SerializationInterface {

  /**
   * {@inheritdoc}
   */
  public static function encode($data) {
    // Convert \r\n to \n so that multiline strings are properly formatted.

joseph.olstad’s picture

Assigned: joseph.olstad » Unassigned

I tried a bunch of things including increasing the blob to a mediumblob and a longblob
didn't help

I tried my core filter for \r\n , didn't help perhaps I need to re-try that again or dig a bit deeper but running out of time.

joseph.olstad’s picture

known way to get around the problem:
Roll the drush command:

drush webform-tidy /path/to/config/default/language/fr/webform.webform.example.yml
modify the content language/fr/webform.webformamp.exle.yml with the changes for the translation, save it

then:
drush cim;

Now go to view the changes in the translation version of the webfrom.

The changes for the translation of the 'very long' form will then appear in the interface.

I'm not satisfied with this workaround.

Another way to avoid this issue: use a Drupal install with Postgres v12+ instead of mysql/persona/mariadb this way using a Postgresql based system there's no issue with very large webform translations as it's blob type is on the huge side of things, the issue is avoided.

what's puzzling is that I altered the column blob size in "mysql" using a table alter and it didn't fix the problem, something in the db api for mysql likely related to this, with that said, I think there's another solution because core solved this by doing multiline processing for translations. For some reason the webform UI does things differently while the drush side of things can work because it's handling the import /update differently.

andypost’s picture

@joseph.olstad thanks for sharing! Curious how pgsql12+ helping to solve?

joseph.olstad’s picture

pgsql has rediculously high limits for blob columns so the translation string can be gigantic. Also Drupal abstraction api is different for pgsql than it is for mysql so there could be something else happening here because I did alter the locales_source table in mysql and converted the column to mediumblob however I didn't change any code in the mysql db api which might explain the lack of success with this test.

Limits in PGSQL are very high: https://www.enterprisedb.com/postgres-tutorials/postgresql-toast-and-wor...

*limited 4TB (PostgreSQL 9.3+) per entry, & 4 Billion per database

Whereas MySQL / Persona / MariaDB have blob, mediumblob and longblob

BLOB: Can handle up to 65,535 bytes of data. MEDIUMBLOB: The maximum length supported is 16,777,215 bytes. LONGBLOB: Stores up to 4,294,967,295 bytes of data. https://dzone.com/articles/how-to-work-with-blob-in-mysql-database-hoste...

Perhaps we should be working on a db api patch to convert the default string saving to happen into a mediumblob column in Drupal that would be way more than enough.

in Drupal the default for the source column in the locales_source table is 'blob' which is 64 kilobytes max.

I did test a few times and didn't notice any issues in postgresql Drupal 9.4s however mysql based Drupal 9.4s had the large webform translation bug:

curiously, when failing with mysql:

  • no error message which was quite dissappointing
  • no warning from the db api,
  • no warning from mysql,
  • no warning from Drupal,
  • just nothing happens (the translation is not saved).

I think 64 kilobytes is great as a limitation for retro computing but maybe not so great for big information systems with a lot of text/tokens/markup.

It's time to think BIG!

joseph.olstad’s picture

We should probably flip this issue over to Drupal core from the webform project, seems like more of a core issue, I think we need to adjust the default blob type to mediumblob and make sure cores db api allows > 64kb for more than just PostgreSQL, it's 2022 not 1993, we're not dealing with minuscule hard drives anymore.

joseph.olstad’s picture

Ok crap, I think I know why the alter table failed in my testing the alter table in mysql, I was altering the source column instead of the locales_target table target column.

ok retest altering the locales_target table target column
alter the column by the name of 'target' to be a longblob (drupal core doesn't mention mediumblob)

retest, this should fix the problem

joseph.olstad’s picture

In all intents and purposes the locale module is a core module, IMHO this module AND cores db api should start by making a patch to convert the translation column default to a longblob but aim to make it a mediumblob because a longblob is really overkill however it will do the job and we're currently needing a working solution.

In Drupal core they call this a blob:big

core/modules/sqlite/src/Driver/Database/sqlite/Schema.php:      'blob:big'        => 'BLOB',
core/modules/mysql/src/Driver/Database/mysql/Schema.php:      'blob:big'        => 'LONGBLOB',
core/modules/pgsql/src/Driver/Database/pgsql/Schema.php:      'blob:big' => 'bytea',
core/tests/Drupal/KernelTests/Core/Database/SchemaTest.php:      ['type' => 'blob', 'size' => 'big'],

Drupal core does not have a 'blob:medium'

joseph.olstad’s picture

mysql> ALTER TABLE locales_target MODIFY translation LONGBLOB;
Query OK, 13004 rows affected (0.40 sec)
Records: 13004 Duplicates: 0 Warnings: 0

ok this should fix the problem:

drush sqlc;
then run the following table alter:
ALTER TABLE locales_target MODIFY translation LONGBLOB;
retest, cross fingers.

joseph.olstad’s picture

Ok I tested changing both locales_source source column AND
I tested changing also locales_target translation column

as follows:

ALTER TABLE locales_target MODIFY translation LONGBLOB;
ALTER TABLE locales_source MODIFY source LONGBLOB;

This did not fix the problem however I think it likely has to do with the DB api or serialization limits to 65536 bytes

jrockowitz’s picture

Title: Translation of big webforms broken » [Translations] Translation of big webforms broken
Version: 6.x-dev » 6.2.x-dev
joseph.olstad’s picture

Issue tags: +MySQL

Note: this only affects MySQL based systems. Postgres has ridiculously high blob limits and a different abstraction layer. This error does not occur if you're using PostgreSQL, only MySQL. I have not tested SQLite or other systems, I have only tested this scenario on PostgreSQL and MySQL/MariaDB, I found that the situation /problem is only reproducible on MySQL/MariaDB/Persona based systems when using the mysql abstraction layer.

The situation occurs when the webform is equal or larger than 65536 bytes. Sometimes the translations can be longer than the source and thus hit this limit.