Problem/Motivation
When translating content using DeepL, we noticed that some paragraphs get lost.
I don't know whether this is a DeepL problem or a Drupal problem.
This happened when translating German to English.
Steps to reproduce
I could reproduce this on a bare Drupal instance with tmgmt_deepl, tmgmt_content, and the modules these two require.
It was reproducible: submitting the same paragraph several times, or on several instances, gave the same result.
For example:
<p>Mit der Shop-Lösung von Primer können Produkte und Inhalt ideal miteinander kombiniert werden. Der Shop ist speziell für die Schweiz optimiert. Ohne grosse Anpassungen bietet dieser sofortige Einsatzmöglichkeiten welche von physischen Produkten, Abonnements, digitalen Produkten oder dem Spenden von Hilfsgüter reichen. Mit individuellen Funktionalitäten und Designs kann der Shop zusätzlich auch nach Ihren Wünschen angepasst werden. Produkte können effizient verwaltet und im Inhalt referenziert werden.</p>
<p>Mit der Shop-Lösung von Primer können Produkte und Inhalt ideal miteinander kombiniert werden.</p>
<p>Der Shop ist speziell für die Schweiz optimiert. Ohne grosse Anpassungen bietet dieser sofortige Einsatzmöglichkeiten welche von physischen Produkten, Abonnements, digitalen Produkten oder dem Spenden von Hilfsgüter reichen.</p>
<p>Mit individuellen Funktionalitäten und Designs kann der Shop zusätzlich auch nach Ihren Wünschen angepasst werden.</p>
<p>Produkte können effizient verwaltet und im Inhalt referenziert werden.</p>
returned
<p>The shop is specially optimized for Switzerland. Without major adaptations, it offers immediate possibilities for use ranging from physical products, subscriptions, digital products or the donation of relief goods</p>
<p>With individual functionalities and designs, the shop can also be adapted to your wishes.</p>
<p>Products can be efficiently managed and referenced in the content.</p>
I broke the paragraph down into several paragraphs because as soon as one sentence was not translated, the whole paragraph vanished. In this case, the first sentence broke the translation of the whole paragraph.
I could not figure out why this specific sentence was a problem. When pasted into deepl.com, it translates fine.
Odd behavior also occurred when translating English into German, but this time extra content was created (in the cases that I could reproduce):
<p>In order to simplify the future replacement of existing systems (e.g. ERP, CRM), the website will be decoupled from third-party systems. However, they will still be seamlessly integrated without creating direct dependencies. This way, the third-party systems will not restrict the agility of the web processes in the future and independent adjustments can be made with no direct impact on other <a href="https://www.google.com/">systems</a>.</p>
<p>By taking these points into account and using the modern Drupal 8 platform, the website is set up for a long lifetime.</p>
was translated to:
<p>Um die zukünftige Ablösung bestehender Systeme (z.B. ERP, CRM) zu vereinfachen, wird die Website von Drittsystemen entkoppelt. Sie werden jedoch weiterhin nahtlos integriert, ohne direkte Abhängigkeiten zu schaffen. Auf diese Weise schränken die Drittsysteme die Agilität der Web-Prozesse in Zukunft nicht ein und unabhängige Anpassungen können ohne direkte Auswirkungen auf andere <a href="https://www.google.com/">Systeme</a>.</p>
<p>vorgenommen werden</p>
<p>Durch die Berücksichtigung dieser Punkte und die Verwendung der modernen Drupal-8-Plattform ist die Website für eine lange Lebensdauer ausgelegt.</p>
We noticed that the <a> elements could also break things (but I could not figure out the logic behind it). I also did not test any other language pairs.
Is there a way to look at what exactly is sent to / returned by DeepL?
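Both symptoms above (a dropped paragraph and an extra one) change the number of <p> elements, so one rough way to flag affected texts is to compare paragraph counts before and after translation. The following is only a sketch: the helper names count_paragraphs and paragraph_delta are made up for illustration and are not part of tmgmt_deepl.

```python
from html.parser import HTMLParser


class ParagraphCounter(HTMLParser):
    """Counts opening <p> tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.count += 1


def count_paragraphs(html: str) -> int:
    """Return the number of <p> start tags found in the fragment."""
    parser = ParagraphCounter()
    parser.feed(html)
    parser.close()
    return parser.count


def paragraph_delta(source_html: str, translated_html: str) -> int:
    """Negative result: paragraphs were lost. Positive: paragraphs were added."""
    return count_paragraphs(translated_html) - count_paragraphs(source_html)
```

A result of 0 does not prove the translation is complete; it only means the paragraph structure survived.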
Comments
Comment #2
steffenr
You could debug the DeeplProTranslator doRequest method. This is the place where the text is passed to the DeepL API.
If things get lost, it could also be a problem with the "Tag handling" setting. Could you check this at /admin/tmgmt/translators/manage/deepl_pro?
We are fine using xml here. So far we have not encountered any loss of content while translating.
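To illustrate what the tag handling setting changes on the wire, here is a minimal sketch of the form fields for a DeepL translate request. It is not the module's doRequest code; the function names are hypothetical, and only the field names (text, source_lang, target_lang, tag_handling) are taken from the public DeepL API. Nothing is sent, so the encoded body can simply be printed or logged for inspection.

```python
from urllib.parse import urlencode


def build_translate_fields(text, source_lang, target_lang, tag_handling=None):
    """Assemble the form fields for one text; tag_handling is optional."""
    fields = [
        ("text", text),
        ("source_lang", source_lang),
        ("target_lang", target_lang),
    ]
    if tag_handling:
        # With "xml", markup in the text is treated as tags, not as plain text.
        fields.append(("tag_handling", tag_handling))
    return fields


def encode_body(fields):
    """URL-encode the fields the way a form POST body would carry them."""
    return urlencode(fields)


if __name__ == "__main__":
    body = encode_body(build_translate_fields("<p>Hallo</p>", "DE", "EN", "xml"))
    print(body)
```

Comparing the body with and without tag_handling for a failing paragraph is one way to confirm whether the setting is actually being applied.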
Comment #3
mathilde_dumond commented
We have no option set here so far. I will add "xml" then, thanks a lot.
Edit: so far so good, looks like that was the trick, thanks again.
Comment #4
berdir
According to https://www.deepl.com/docs-api/handling-xml/sentences-with-markup/, the xml option is more or less expected when dealing with HTML, if I understand that correctly? Maybe instead of making that a global option, the translator could detect per text whether it contains tags and, if so, enable the option automatically?
We could work on that if that makes sense to you?
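The per-text detection suggested here could look roughly like the sketch below. It is an assumption about how such a check might work, not a patch for the module: TAG_PATTERN and detect_tag_handling are hypothetical names, and the regular expression is a simple heuristic rather than a full HTML parser.

```python
import re

# Matches something that looks like an opening or closing tag, e.g. <p>,
# </p> or <a href="...">. A "<" followed by a space or digit does not match.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^<>]*>")


def detect_tag_handling(text):
    """Return "xml" if the text appears to contain markup, else None."""
    return "xml" if TAG_PATTERN.search(text) else None
```

A heuristic like this can misfire on plain text such as "a<b>c", so keeping the global setting available as an override would still be useful.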
Comment #5
steffenr
@Berdir:
Sounds good to me. Feel free to contribute a patch for some kind of "autodetection".
Thx,
SteffenR
Comment #6
steffenr
Comment #7
steffenr
Since this is not a real bug, I'll mark the issue as Closed.
The issue can be fixed by setting the tag handling to xml in the translator settings.