Problem/Motivation
Site Studio - Lingotek only pulling in last field in a series of repeater fields
Steps to reproduce
1. The issue happens when using the Field Repeater field type in Cohesion
2. When you add multiple items in the field repeater, only the last item in the list is sent to Lingotek for translation.
I’ve tracked down the problem to this method on the uploading to Lingotek side of things. The logic seems to start to take into account the use of arrays, but doesn’t successfully track the keys and upload all of them.
Currently when it uploads the data to Lingotek it’s uploading:
<uuid-of-the-text-component-of-the-field-repeater>: second item body text
But really it should be uploading:
<uuid-of-the-field-repeater-field>: [
[<uuid-of-the-text-component>: first item body text],
[<uuid-of-the-text-component>: second item body text],
]
I’ve done some tinkering locally, and have been able to get it to upload the second set of structured data successfully. But when pulling the data back into Drupal, it doesn’t pull in the translated content. That seems to be related to the logic in this condition.
When looking at the existing value from Drupal, it’s expecting an stdClass, but instead is getting an array of stdClass so it doesn’t actually process the incoming data as it has no logic to do so.
This is as far as I’ve gotten debugging the issue so far. My fixes are probably naive, as I don’t know how the rest of the Lingotek system works and don’t have much familiarity with the inner workings of Cohesion either.
Proposed resolution
Ensure we can upload elements that have multiple values, which the repeaters use.
Remaining tasks
Patch with tests.
User interface changes
None.
API changes
None.
Data model changes
None.
| Comment | File | Size | Author |
|---|---|---|---|
| #19 | 3202231-sitestudio-repeaters-19.patch | 12.81 KB | penyaskito |
| #19 | 3202231-sitestudio-repeaters.interdiff.16-19.txt | 1.42 KB | penyaskito |
| #16 | 3202231-sitestudio-repeaters-16.patch | 11.39 KB | penyaskito |
| #16 | 3202231-sitestudio-repeaters.interdiff.15-16.txt | 747 bytes | penyaskito |
| #15 | 3202231-sitestudio-repeaters-15.patch | 11.41 KB | penyaskito |
Comments
Comment #3
penyaskitoCrediting @marcaddeo for a wonderful report and a starting patch for extracting the data.
Attached patch that should fix the issue.
Comment #4
marcaddeo commented@penyaskito I had to decode the incoming entities to get it working, otherwise seems to be good. I've attached a patch and interdiff.
Comment #5
penyaskitoThanks @marcaddeo!
Added a test on top of your patch.
No interdiff, the tests-only patch serves as such.
Comment #6
penyaskitoComment #8
hlopez commentedLooks good! Thank you!
Comment #10
penyaskitoCommitted c8c3a14 and pushed to 3.5.x.
Cherry-picked a2f9225 and pushed to 3.6.x.
Attributed to @marcaddeo as he did most of the work here. Thanks!
Comment #12
marcaddeo commentedHey @penyaskito I'm still testing this and finding some issues.
I've been working on an additional patch to get it working and will upload that once I've got that working.
Comment #13
marcaddeo commentedAlright, spent this week working through this and refactoring the Cohesion portions a bit. This is now handling Cohesion components correctly, including nested field repeaters/arrays.
I also found that this was running
Html::escape()on the strings/HTML when gathering source data, but this was causing Lingotek's HTML tokenization to not function, so I've removed it.There's now logic in place to check the components schema type and use that as a determinator when processing each type of data. I also added logging so any unhandled component types will be logged and can be addressed when found.
I believe this should be working fully now, but of course want to get it reviewed here by everyone.
Comment #14
penyaskitoThis is awesome, Marc. Thanks!
Setting as needs review to trigger the testbot
Comment #15
penyaskitoRerolled the patch on top of the one already committed.
This patch will go only on 3.6.x, so testing against that.
Comment #16
penyaskitoAs I made this private, the API change can be accepted. I'm good having it as protected.
This full piece of code is far more readable, thanks!
I'm against the use of the term blacklist, let's use a more friendly term as
$skippedComponentTypes. The comment then will be superfluous.Comment #19
penyaskitoAdded fix for the tests to pass after changing the escaping.
I'm not 100% sure this is desirable for every component type tho.
Comment #20
marcaddeo commentedIt doesn't seem like any other content that is being pushed to Lingotek is being escaped from what I could tell. The Cohesion components were the only ones, and that was causing errors in the TMS when tokenizing the HTML for translation.
Is there a way to instruct Lingotek that the incoming data is HTML encoded, and only in some places?
Comment #21
marcaddeo commentedComment #22
penyaskitoIf it works for you I guess we are fine.
If you mean *on upload from Drupal to Lingotek*, there are hooks in the Drupal side, and on TMS import the content run through a set of configurable filters, but being fair I think it's better to send it this way, unless you have a use-case I'm missing.
If you mean *on download from Lingotek to Drupal*, there are hooks in the Drupal side.
Should we commit and release this for you?
Comment #23
marcaddeo commentedYup, everything is looking good on our end after testing. This is good to go.
Comment #26
penyaskitoCommitted 8260eea and pushed to 3.5.x.
Cherry-picked f144c16 and pushed to 3.6.x. Thanks!
Comment #27
penyaskitoComment #28
penyaskitoCreated #3208606: Avoid the spread operator on array as it is not supported in php 7.3