Problem/Motivation

Site Studio - Lingotek only pulling in last field in a series of repeater fields

Steps to reproduce

1. The issue happens when using the Field Repeater field type in Cohesion
2. When you add multiple items in the field repeater, only the last item in the list is sent to Lingotek for translation.

I’ve tracked down the problem to this method on the uploading to Lingotek side of things. The logic seems to start to take into account the use of arrays, but doesn’t successfully track the keys and upload all of them.

Currently when it uploads the data to Lingotek it’s uploading:

<uuid-of-the-text-component-of-the-field-repeater>: second item body text

But really it should be uploading:

<uuid-of-the-field-repeater-field>: [
  [<uuid-of-the-text-component>: first item body text],
  [<uuid-of-the-text-component>: second item body text],
]

I’ve done some tinkering locally, and have been able to get it to upload the second set of structured data successfully. But when pulling the data back into Drupal, it doesn’t pull in the translated content. That seems to be related to the logic in this condition.

When looking at the existing value from Drupal, it’s expecting an stdClass, but instead is getting an array of stdClass so it doesn’t actually process the incoming data as it has no logic to do so.

This is as far as I’ve gotten debugging the issue so far. My fixes are probably naive, as I don’t know how the rest of the Lingotek system works and don’t have much familiarity with the inner workings of Cohesion either.

Proposed resolution

Ensure we can upload elements that have multiple values, which the repeaters use.

Remaining tasks

Patch with tests.

User interface changes

None.

API changes

None.

Data model changes

None.

Comments

penyaskito created an issue. See original summary.

penyaskito’s picture

Status: Active » Needs review
StatusFileSize
new3.68 KB

Crediting @marcaddeo for a wonderful report and a starting patch for extracting the data.

Attached patch that should fix the issue.

marcaddeo’s picture

StatusFileSize
new3.7 KB
new774 bytes

@penyaskito I had to decode the incoming entities to get it working, otherwise seems to be good. I've attached a patch and interdiff.

penyaskito’s picture

StatusFileSize
new26.25 KB
new29.95 KB

Thanks @marcaddeo!

Added a test on top of your patch.
No interdiff, the tests-only patch serves as such.

penyaskito’s picture

Issue summary: View changes

The last submitted patch, 5: 3202231-sitestudio-repeaters-6.only-tests.patch, failed testing. View results

hlopez’s picture

Status: Needs review » Reviewed & tested by the community

Looks good! Thank you!

  • penyaskito committed c8c3a14 on 3.5.x authored by marcaddeo
    Issue #3202231 by penyaskito, marcaddeo, hlopez: Site Studio - Lingotek...
penyaskito’s picture

Committed c8c3a14 and pushed to 3.5.x.
Cherry-picked a2f9225 and pushed to 3.6.x.

Attributed to @marcaddeo as he did most of the work here. Thanks!

  • penyaskito committed a2f9225 on 3.6.x authored by marcaddeo
    Issue #3202231 by penyaskito, marcaddeo, hlopez: Site Studio - Lingotek...
marcaddeo’s picture

Status: Reviewed & tested by the community » Needs work

Hey @penyaskito I'm still testing this and finding some issues.

I've been working on an additional patch to get it working and will upload that once I've got that working.

marcaddeo’s picture

StatusFileSize
new10.03 KB

Alright, spent this week working through this and refactoring the Cohesion portions a bit. This is now handling Cohesion components correctly, including nested field repeaters/arrays.

I also found that this was running Html::escape() on the strings/HTML when gathering source data, but this was causing Lingotek's HTML tokenization to not function, so I've removed it.

There's now logic in place to check the components schema type and use that as a determinator when processing each type of data. I also added logging so any unhandled component types will be logged and can be addressed when found.

I believe this should be working fully now, but of course want to get it reviewed here by everyone.

penyaskito’s picture

Status: Needs work » Needs review

This is awesome, Marc. Thanks!
Setting as needs review to trigger the testbot

penyaskito’s picture

Version: 3.5.x-dev » 3.6.x-dev
StatusFileSize
new11.41 KB

Rerolled the patch on top of the one already committed.
This patch will go only on 3.6.x, so testing against that.

penyaskito’s picture

  1. +++ b/src/LingotekContentTranslationService.php
    @@ -824,65 +828,149 @@ class LingotekContentTranslationService implements LingotekContentTranslationSer
    -  private function processCohesionComponentsValues($values, $component_model, $nested_key = []) {
    -    $data_layout = [];
    +  protected function extractCohesionComponentValues(array $component_model, $values) {
    +    $field_values = [];
    

    As I made this private, the API change can be accepted. I'm good having it as protected.

  2. +++ b/src/LingotekContentTranslationService.php
    @@ -824,65 +828,149 @@ class LingotekContentTranslationService implements LingotekContentTranslationSer
    +  protected function extractCohesionComponentValues(array $component_model, $values) {
    

    This full piece of code is far more readable, thanks!

  3. +++ b/src/LingotekContentTranslationService.php
    @@ -824,65 +828,149 @@ class LingotekContentTranslationService implements LingotekContentTranslationSer
    +      $blacklist = [
    +        'cohTypeahead',
    +        'cohEntityBrowser',
    +        'cohFileBrowser',
    +      ];
    +      $component_type = $settings->type ?? NULL;
    +      // Skip blacklisted component types.
    +      if (in_array($component_type, $blacklist)) {
    +        continue;
    +      }
    

    I'm against the use of the term blacklist, let's use a more friendly term as $skippedComponentTypes. The comment then will be superfluous.

The last submitted patch, 15: 3202231-sitestudio-repeaters-15.patch, failed testing. View results

Status: Needs review » Needs work

The last submitted patch, 16: 3202231-sitestudio-repeaters-16.patch, failed testing. View results

penyaskito’s picture

Status: Needs work » Needs review
StatusFileSize
new1.42 KB
new12.81 KB

Added fix for the tests to pass after changing the escaping.
I'm not 100% sure this is desirable for every component type tho.

marcaddeo’s picture

Added fix for the tests to pass after changing the escaping.
I'm not 100% sure this is desirable for every component type tho.

It doesn't seem like any other content that is being pushed to Lingotek is being escaped from what I could tell. The Cohesion components were the only ones, and that was causing errors in the TMS when tokenizing the HTML for translation.

Is there a way to instruct Lingotek that the incoming data is HTML encoded, and only in some places?

marcaddeo’s picture

Issue summary: View changes
penyaskito’s picture

If it works for you I guess we are fine.

If you mean *on upload from Drupal to Lingotek*, there are hooks in the Drupal side, and on TMS import the content run through a set of configurable filters, but being fair I think it's better to send it this way, unless you have a use-case I'm missing.
If you mean *on download from Lingotek to Drupal*, there are hooks in the Drupal side.

Should we commit and release this for you?

marcaddeo’s picture

Status: Needs review » Reviewed & tested by the community

Yup, everything is looking good on our end after testing. This is good to go.

  • penyaskito committed 8260eea on 3.5.x authored by marcaddeo
    Issue #3202231 by marcaddeo: Site Studio - Lingotek only pulling in last...

  • penyaskito committed f144c16 on 3.6.x authored by marcaddeo
    Issue #3202231 by marcaddeo: Site Studio - Lingotek only pulling in last...
penyaskito’s picture

Version: 3.6.x-dev » 3.5.x-dev
Status: Reviewed & tested by the community » Fixed

Committed 8260eea and pushed to 3.5.x.
Cherry-picked f144c16 and pushed to 3.6.x. Thanks!

penyaskito’s picture

penyaskito’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.