We should add something like masking
#1882108: Change record: mask HTML markup in XLIFF
It seems Microsoft drops HTML, which leads to misformatting.
Comment | File | Size | Author |
---|---|---|---|
#15 | translate-html-2042873-15.patch | 2.08 KB | jacktonkin |
#12 | tmgmt_microsoft-preserve-html-escape-strings-2042873-12.patch | 1.78 KB | jacktonkin |
#7 | tmgmt_microsoft-preserve-html-escape-strings-2042873-7.patch | 1.81 KB | jantoine |
#3 | 2042873-TMGMT-MicrosoftTranslator-Malformed-HTML.JPG | 321.56 KB | pmu-1 |
Comments
Comment #1
Berdir
This used to work just fine, we tested this quite a bit when initially developing this.
Comment #2
miro_dietiker
I have requested an example of what doesn't work from the original reporter.
Comment #3
pmu-1
Here is a good example in the attached screenshot. As you can see:
1. The <br /> tag has an extra space at the opening of the tag and the space is removed at the closing ( < br/> ).
2. <a> tags have extra spaces and the class attribute is shifted forward in the text: < a class = "" href = "... >
3. Most opening tags are OK, but the closing tags are mostly pulled back into the text, though not always.
So this behaviour is really inconsistent.
Suggestion: Is there a way that TMGMT could take care of it by never sending HTML to the Translator Engine, but only the text between HTML tags in multiple requests, that way rebuilding the HTML locally?
Comment #4
miro_dietiker
"Is there a way that TMGMT could take care of it by..."
That's what masking is all about.
However, the external service should support masking natively, too.
... and all that will require quite some work. :-)
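To make the masking idea concrete, here is a minimal sketch in plain PHP, assuming a simple token scheme. The function names `mask_tags()` and `unmask_tags()` and the `[[n]]` token format are hypothetical and are not part of TMGMT or any translator plugin. The regular expression is deliberately naive and would mis-handle attribute values that contain `>`.

```php
<?php
// Hypothetical helpers for illustration; not part of TMGMT.

/**
 * Replaces every HTML tag with a numbered token and remembers the original.
 *
 * Returns a two-element array: the masked text and the list of removed tags.
 */
function mask_tags(string $html): array {
  $tags = [];
  $masked = preg_replace_callback('/<[^>]+>/', function (array $m) use (&$tags): string {
    $tags[] = $m[0];
    // The [[n]] token format is an arbitrary choice for this sketch.
    return '[[' . (count($tags) - 1) . ']]';
  }, $html);
  return [$masked, $tags];
}

/**
 * Puts the remembered tags back in place of their tokens.
 */
function unmask_tags(string $text, array $tags): string {
  return preg_replace_callback('/\[\[(\d+)\]\]/', function (array $m) use ($tags): string {
    // Unknown token numbers are left untouched.
    return $tags[(int) $m[1]] ?? $m[0];
  }, $text);
}

[$masked, $tags] = mask_tags('<p>Hello <a class="x" href="/y">world</a></p>');
// str_replace() stands in for the remote translation service here.
$translated = str_replace(['Hello', 'world'], ['Hallo', 'Welt'], $masked);
echo unmask_tags($translated, $tags), "\n";
```

The sketch only works if the remote service returns the tokens unchanged, which is why native masking support on the service side matters.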
Comment #5
Berdir
Instead of trying to do anything ourselves, we should probably send it as content type text/html; then we just need to check whether we have (valid) HTML and switch the content type: http://msdn.microsoft.com/en-us/library/ff512421.aspx
I haven't found a way to exclude something from translation, which would still be useful for e.g. locale placeholders.
Comment #6
Berdir
Ah, also spans with notranslate class, similar (equal?) to google translate: http://social.msdn.microsoft.com/Forums/en-US/41f09c5d-68ae-4d26-ad93-a2...
Comment #7
jantoine
The attached patch changes the content type to text/html as suggested in #5. It does not, however, validate the HTML. This works fine when translating text without HTML, so it shouldn't affect plain text translations.
It also implements escapeStart and escapeEnd for escaping user defined strings.
This patch is working for me, although I had to extend the TMGMTEntitySourcePluginController class in order to define custom strings to be escaped. Would be great if this could be handled via the UI.
Comment #8
Anybody
The patch works great for me also. If we can get 1-2 more reviews, we can perhaps set it RTBC and get this into the next dev release?
Comment #9
akalam
Works perfectly. Great patch!
Thanks jantoine
Comment #10
Anybody
Comment #11
gge
I just tested this patch and it has been working great so far, but I found two minor things that could be improved.
1. I'm using the CKEditor module and added "{ name : 'Do not translate' , element : 'span', attributes: { 'class': 'notranslate' } }" to ckeditor.styles.js. I can select some text and easily wrap it in a span with class "notranslate" by choosing "Do not translate" from the Select dropdown. Everything is perfect except that the space after the closing </span> is lost.
Original text:
<span class="notranslate">Do not translate this in</span> German
the translated text:
Do not translate this inDeutsch
and should be:
in Deutsch
2. How can "notranslate" be used for the title field?
Thank you!
Comment #12
jacktonkin at ISSUP
Reviving this issue because I'm having similar issues with 8.x-1.0-beta1, with <a> tags occasionally being translated as < un > for English -> Spanish translations.
It was trivial to port the patch from #7, and it appears to work with minimal testing so far. I've additionally removed the Content-Type header as I think it's meaningless for GET requests.
Comment #13
heddn
Is this still working on your site after 4 years? Or did MS fix things on their side so this patch is no longer needed?
Comment #14
jacktonkin at ISSUP
I'm still applying a version of this patch and translations work with it applied. The patch above is against an old version of the API. I have a newer patch I'll re-roll against HEAD.
I haven't tested without this since I updated to the V3 API, but looking at the documentation it seems clear to me that any markup should be submitted to the service with 'textType' => 'html'. https://docs.microsoft.com/en-gb/azure/cognitive-services/translator/ref...
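As a rough illustration of what submitting markup with textType set to html could look like against the V3 API, the sketch below builds the request without sending it. The function name `build_translate_request()` is hypothetical, the endpoint and parameter names follow my reading of Microsoft's V3 reference, and the authentication headers (such as the subscription key) are left out on purpose. It is not the code from the patch.

```php
<?php
// Sketch only: assembles a Translator Text API v3 request, does not send it.
// build_translate_request() is a hypothetical helper, not module code.
function build_translate_request(string $text, string $from, string $to): array {
  $query = http_build_query([
    'api-version' => '3.0',
    'from' => $from,
    'to' => $to,
    // Tells the service to treat the input as markup rather than plain text.
    'textType' => 'html',
  ], '', '&');
  return [
    'url' => 'https://api.cognitive.microsofttranslator.com/translate?' . $query,
    // Authentication headers are intentionally omitted from this sketch.
    'headers' => ['Content-Type' => 'application/json'],
    // The V3 API expects a JSON array of objects with a "Text" property.
    'body' => json_encode([['Text' => $text]], JSON_UNESCAPED_SLASHES),
  ];
}

$request = build_translate_request('<p>Hello <a href="/y">world</a></p>', 'en', 'es');
echo $request['url'], "\n", $request['body'], "\n";
```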
Also, thank you so much for taking the time to prepare a Drupal 9 compatible release of this module!
Comment #15
jacktonkin at ISSUP
Updated patch.
Comment #16
heddn
I don't see where this is used. Or does it come into play from the parent class?
I am really going to have to depend on you to tell me if this is a change that risks breaking anything. I only lightly use this module and don't have sufficient time to thoroughly grok if this is a risky change.
Comment #17
jacktonkin at ISSUP
Yes, parent::escapeText() and parent::unescapeText() wrap substrings that shouldn't be translated with those <span> tags. This is so locale placeholders (e.g. @title in t("Created new content @title", ['@title' => $node->label()]);) don't get translated. See #5 above. I believe that escaping like this is only available when translating HTML on Microsoft's service.
All of the text we send for translation using this service has been entered in CKEditor and is wrapped in paragraph tags at least, so I don't know for sure that sending text that isn't wrapped in HTML tags will cause a problem. I did see problems in the past with HTML tags being translated, which is why I took up this issue. Also, I'd have thought that most of the translation sources in Drupal are markup rather than plain text (in the sense that they are safe to include directly in the rendered page without further escaping) so it's hard to see how this can make things worse?
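The effect of that escaping can be sketched in plain PHP as follows. This is a simplified stand-in, not the parent class's actual implementation: the constants, the two function names and the placeholder pattern (only @name and %name are matched) are assumptions made for illustration.

```php
<?php
// Hypothetical stand-ins for the escaping described above; the real
// escapeText()/unescapeText() in the parent class may work differently.
const ESCAPE_START = '<span class="notranslate">';
const ESCAPE_END = '</span>';

/**
 * Wraps @name and %name placeholders so the service leaves them alone.
 */
function escape_placeholders(string $text): string {
  // The lookbehind avoids matching inside words, e.g. e-mail addresses.
  return preg_replace('/(?<!\w)[@%][A-Za-z_]\w*/', ESCAPE_START . '$0' . ESCAPE_END, $text);
}

/**
 * Removes the notranslate wrappers again, keeping their content.
 */
function unescape_placeholders(string $text): string {
  return preg_replace('/<span class="notranslate">(.*?)<\/span>/s', '$1', $text);
}

$escaped = escape_placeholders('Created new content @title');
echo $escaped, "\n";
echo unescape_placeholders($escaped), "\n";
```

Because the wrapper is itself markup, this approach only makes sense when the text is sent as HTML, which matches the point made in #5.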
Comment #19
heddn
Thanks for explaining your logic. Committed it.