Translate from file (import/export) doesn't work with Filtered HTML [#2272487]

I tried exporting in both HTML and XLF.

With HTML, the translations came back blank after I replaced the text (or is there something else I should do?).

With XLF, the imported translation showed the HTML markup as literal text. Example:

Source text in the imported XLF file:
<source xml:lang="en">PAYMENT METHODS Online&nbsp;with Paypal <ul> <li>all major electronic cards</li> <li>immediate processing</li> <li>secure &amp;&nbsp;guaranteed</li> <li>no additional costs</li> </ul> Offline&nbsp;with Cash on Delivery (COD) <ul> <li>pay the courier at your door</li> <li>extra charge of 5€ at checkout</li> </ul></source>

Target text in the imported XLF file (note that it differs from the source only in the translated text):
<target xml:lang="it" state="translated">METODI DI PAGAMENTO Online con Paypal <ul> <li>tutte le principali carte elettroniche</li> <li>elaborazione immediata</li> <li>sicuro e garantito</li> <li>nessun costo aggiuntivo</li> </ul> Offline con contrassegno <ul> <li>paga il corriere alla consegna</li> <li>costo aggiuntivo di 5€ alla cassa</li> </ul></target>

This is what the translator interface showed:
Source:

<p><strong>DELIVERY&nbsp;INFO</strong></p>
<p>Ready to ship in 7 days.</p>
<ul>
	<li>Express courier</li>
	<li>1-2 days in Europe</li>
	<li>Live tracking</li>
</ul>

Translation:
CONSEGNA Pronto per la spedizione in 7 giorni <ul> <li>Corriere espresso</li> <li>1-2 giorni in Europa</li> <li>Tracciamento in tempo reale</li> </ul>

And this on the translated product display (that is, the HTML tags are shown to the user!):
CONSEGNA Pronto per la spedizione in 7 giorni <ul> <li>Corriere espresso</li> <li>1-2 giorni in Europa</li> <li>Tracciamento in tempo reale</li> </ul>

I am using Entity Translation and, of course, trying to translate text that uses the Filtered HTML format.

Also, manual XLF translation is very inefficient because the target element is not prepopulated! If it were, we could replace all the text at once in a text editor such as Sublime Text 2.
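Until the export fills in the target, a small pre-processing script could do it before editing. This is a hypothetical sketch: the function name prefill_xliff_targets() is mine, not part of TMGMT, and it assumes an XLIFF 1.2 file built from trans-unit, source and target elements. It copies each source into a missing or blank target and leaves existing translations alone.

```php
<?php
/**
 * Hypothetical pre-processing step (not part of TMGMT): copies the content of
 * each <source> into its <target> when the target is missing or blank, so a
 * translator can overwrite the text in place.
 */
function prefill_xliff_targets(string $xliff): string {
  $dom = new DOMDocument();
  $dom->preserveWhiteSpace = true;
  $dom->loadXML($xliff);

  foreach ($dom->getElementsByTagName('trans-unit') as $unit) {
    $source = $unit->getElementsByTagName('source')->item(0);
    if ($source === null) {
      continue;
    }

    $target = $unit->getElementsByTagName('target')->item(0);
    if ($target === null) {
      // Create <target> in the same namespace as <source>, directly after it.
      $target = $source->namespaceURI
        ? $dom->createElementNS($source->namespaceURI, 'target')
        : $dom->createElement('target');
      $source->parentNode->insertBefore($target, $source->nextSibling);
    }
    elseif (trim($target->textContent) !== '') {
      // Never overwrite a translation that is already there.
      continue;
    }

    // Drop whitespace-only children, then deep-copy the source content
    // (text plus any inline elements) into the target.
    while ($target->firstChild !== null) {
      $target->removeChild($target->firstChild);
    }
    foreach ($source->childNodes as $child) {
      $target->appendChild($child->cloneNode(true));
    }
  }

  return $dom->saveXML();
}
```

With the targets prefilled, a find-and-replace pass in a text editor only has to touch the target elements.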

Comment	File	Size	Author	Test result
#16	tmgmt-translate_from_file_html_contents-2272487-16.patch	1.34 KB	liberatr	7.x-1.x: PHP 5.5 & MySQL 5.5, D7 145 pass
#11	2272487-13.patch	660 bytes	Leksat	7.x-1.x: PHP 5.5 & MySQL 5.5, D7 142 pass


Comments

Comment #1

kopeboy commented 22 May 2014 at 12:17

Updated the issue summary.

Comment #2

kopeboy commented 22 May 2014 at 12:18

Updated the issue summary.

Comment #3

Matroschker commented 3 June 2014 at 20:39

This seems to be the same as the issue I created today (https://drupal.org/node/2279265), sorry!

You can work around it (at least for XLIFF) by enabling 'Extended XLIFF processing', but then I had problems with my image tags: it seems some free XLIFF tools don't support it. As a result, my images disappear from the page after I accept the translation.

Matroschker

Comment #4

Matroschker commented 15 September 2014 at 10:08

Any update here?
I checked this issue again with
TMGMT: Last updated: June 27, 2014 - 18:13, Last packaged version: 7.x-1.0-rc1+7-dev

If 'Extended XLIFF processing' is disabled, after the translation is imported the HTML tags are shown to the user as text instead of being rendered.
If 'Extended XLIFF processing' is enabled, the HTML is rendered correctly, but my images are moved to another place on the page, e.g. an image that was inside a table is now outside it.

So: no change.

Matroschker

Comment #5

dasginganinja commented 16 September 2014 at 15:28

Subscribing. I have the same issue. I'm thinking about writing my own module that extends the file translator plugin and simply decodes the entities to get around it. I guess the real question is: what is the proper long-term way to do this?

Comment #6

molenick commented 18 March 2015 at 21:21

Comment #7

jonnydev13 commented 30 April 2015 at 14:55

I still seem to have this issue.

When I create a job with XLIFF export, translate it, and import the XLIFF file, it just reports that there was an error, with no details about what the error was.

When I create a job with HTML export, translate it, and import the HTML file, it reports success, but the translated page has no content when I review it. This is probably a separate issue, but the translated content was part of a Drupal book, and both the original and the translation now show up in the book outline.

Can anyone who has worked on this module's code give some direction to everyone on this thread?

Comment #8

dragonfire353 commented 7 May 2015 at 18:23

Here's a quick fix for new translation imports in 7.x-1.0-rc1.

Add this to entity/tmgmt.entity.job_item.inc at line 630 or, if that isn't the right place, near the end of the addTranslatedDataRecursive() function, just before $this->updateData:

$values['#translation']['#text'] = str_replace('&lt;', '<', $values['#translation']['#text']);
$values['#translation']['#text'] = str_replace('&gt;', '>', $values['#translation']['#text']);

Again, this is a quick fix, so use it at your own risk.
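One thing to watch with this quick fix: it only restores < and >, so any other entities in the imported markup stay encoded. A minimal sketch with an assumed input string (not taken from a real export) compares it with decoding every entity once via PHP's html_entity_decode():

```php
<?php
// Assumed example of an imported string whose markup arrived entity-encoded.
$imported = '&lt;a href=&quot;/pay&quot;&gt;Cash &amp;amp; card&lt;/a&gt;';

// The quick fix from #8: restore only < and >.
$partial = str_replace(array('&lt;', '&gt;'), array('<', '>'), $imported);
// $partial is '<a href=&quot;/pay&quot;>Cash &amp;amp; card</a>'

// Decoding every entity once also restores the attribute quotes and the ampersand.
$full = html_entity_decode($imported, ENT_QUOTES, 'UTF-8');
// $full is '<a href="/pay">Cash &amp; card</a>'
```

html_entity_decode() removes a single level of encoding, so '&amp;amp;' becomes '&amp;', which still renders as '&'.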

Comment #9

Bram Tassyns commented 14 September 2015 at 12:59

We also had this problem with HTML export.
I fixed it by changing the import function in tmgmt_file.format.html
(basically, instead of taking only the text content, it also converts every element it encounters back to XML):

  public function import($imported_file) {
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = true;
    $dom->formatOutput = false;
    $dom->loadHTMLFile($imported_file);
    $xpath = new DOMXPath($dom);

    $data = array();
    foreach ($xpath->query("//div[@class='atom']") as $atom) {
      // Assets are our strings (e.g. fields in nodes).
      $key = $this->decodeIdSafeBase64((string) $atom->getAttribute('id'));
      // The content of the node might include HTML, so we need all children
      // (not just the text content).
      $content = '';
      foreach ($atom->childNodes as $child) {
        $content .= $dom->saveXML($child);
      }
      // Compensate for DOM's conversion of the '\r' part of Windows line endings.
      $content = str_replace('&#13;', "\r", $content);
      $data[$key]['#text'] = (string) $content;
    }
    return tmgmt_unflatten_data($data);
  }
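To see why #9 serializes the child nodes instead of reading the text, here is a minimal sketch on a single hand-written atom (the markup is an assumption modelled on the export format shown in #13):

```php
<?php
// A single atom as the HTML export might contain it (assumed markup).
$html = '<html><body><div class="atom" id="x">Ready <strong>now</strong> &amp; fast</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$atom = (new DOMXPath($dom))->query("//div[@class='atom']")->item(0);

// Reading only the text drops the markup.
$textOnly = $atom->textContent;
// $textOnly is 'Ready now & fast'

// #9's approach: serialize every child node in document order.
$markup = '';
foreach ($atom->childNodes as $child) {
  $markup .= $dom->saveXML($child);
}
// $markup is 'Ready <strong>now</strong> &amp; fast'
```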

Comment #10

Say_Ten (as a volunteer) commented 26 November 2015 at 10:47

Same issue here, and the same solution with html_entity_decode(). My concern: if there are import systems that don't mangle the markup, would this function know the difference, or should the decoding be handled in the XLIFF file format code?

Comment #11

Leksat (at Amazee Labs) commented 8 February 2016 at 08:19

Status: Active » Needs review

File	Size	Test result
2272487-13.patch	660 bytes	7.x-1.x: PHP 5.5 & MySQL 5.5, D7 142 pass

The following change has worked well for many of our company's clients since mid-2015:

 -      return $translation;
 +      return decode_entities($translation);

We had no issues with this code.
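On the concern in #10: decoding cannot tell whether an escaped tag came from the export or is legitimately escaped text in the content. A minimal sketch, using a stand-in that assumes Drupal 7's decode_entities() behaves like html_entity_decode($text, ENT_QUOTES, 'UTF-8'), shows both cases; decode_if_escaped() is a hypothetical heuristic, not part of TMGMT or the patch:

```php
<?php
// Stand-in for Drupal 7's decode_entities(), assumed to be equivalent to this.
function decode_entities_standin(string $text): string {
  return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

// Hypothetical heuristic: decode only when the string has no raw tags at all,
// i.e. when it looks like the export escaped the whole thing.
function decode_if_escaped(string $text): string {
  return strpos($text, '<') === false ? decode_entities_standin($text) : $text;
}

// Markup that arrived escaped, as in this issue: decoding fixes it.
$escaped = '&lt;p&gt;Ready to ship&lt;/p&gt;';
// Markup that arrived intact and mentions a tag as literal text.
$intact = '<p>Use &lt;br&gt; for a line break</p>';

$fixed = decode_entities_standin($escaped);   // '<p>Ready to ship</p>'
$broken = decode_entities_standin($intact);   // '<p>Use <br> for a line break</p>'
```

The heuristic only helps when a string is either entirely escaped or entirely intact; mixed cases would still need fixing where the export encodes the text.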

Comment #12

Leksat (at Amazee Labs) commented 8 February 2016 at 08:27

I just found that #2279265: Without 'Extended XLIFF processing' review shows eg '&lt;' instead of '<' has the same patch, but I'm not sure which issue should be marked as the duplicate :/

Comment #13

sri@re commented 24 January 2017 at 12:58

Hi, I'm new to Drupal. I selected the file translator and imported this HTML file:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xml:lang="de-CH" lang="de-CH" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <meta name="JobID" content="3" />
    <meta name="languageSource" content="en" />
    <meta name="languageTarget" content="zh-hans" />

    <title>Job ID 3</title>
  </head>
  <body>
          <div class="asset" id="3">
                  <div class="atom" id="bM11bbm9kZV90aXRsZQ">szdfszcs</div>
              </div>
      </body>
</html>

but I'm getting the error "Failed to validate file, import aborted." Can anyone please clarify?
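One thing that can be checked independently of the validation step is the atom id. Decoding the id in this file suggests it is a "b" prefix followed by URL-safe base64 with the padding removed; that format is my inference from this sample, not something confirmed from TMGMT's decodeIdSafeBase64(). A minimal sketch with a hypothetical decode_atom_id():

```php
<?php
/**
 * Hypothetical decoder for atom ids, assuming the format is a leading "b"
 * followed by URL-safe base64 ("-" and "_") with the "=" padding removed.
 * Returns null if the rest is not valid base64.
 */
function decode_atom_id(string $id): ?string {
  $b64 = strtr(substr($id, 1), '-_', '+/');
  // Put back the padding so the length is a multiple of 4.
  $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);
  $decoded = base64_decode($b64, true);
  return $decoded === false ? null : $decoded;
}

echo decode_atom_id('bM11bbm9kZV90aXRsZQ'), "\n"; // 3][node_title
```

Here it yields "3][node_title", which matches the asset id 3 and the node title field, so the id itself looks intact. This does not explain why validation failed.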

Comment #14

Kristen Pol (at Hook 42) commented 15 May 2017 at 20:41

I tried the code in #9 above and it didn't work in my case, so I used the following instead. It's a bit hacky and could be better, but it seems to work, so we can keep moving:

  /**
   * {@inheritdoc}
   */
  public function import($imported_file) {
    $dom = new DOMDocument();
    $dom->loadHTMLFile($imported_file);
    $xml = simplexml_import_dom($dom);
    $data = array();
    foreach ($xml->xpath("//div[@class='atom']") as $atom) {
      // Assets are our strings (e.g. fields in nodes).
      $key = $this->decodeIdSafeBase64((string) $atom['id']);
      // This is for plain text.
      $content = (string) $atom;
      foreach ($atom->children() as $child) {
        // This is for HTML.
        $content .= $child->asXML();
      }
      // Get rid of some Windows characters.
      $content = str_replace('&#13;', '', $content);
      $content = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $content);
      $data[$key]['#text'] = $content;
    }
    return tmgmt_unflatten_data($data);
  }
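A caveat with this approach (and the identical logic in #15): as far as I know, casting a SimpleXMLElement to string returns only its direct text nodes, and the loop then appends the child elements after all of that text. A minimal sketch on a hand-written atom shows mixed content getting reordered, next to the child-node serialization used in #9:

```php
<?php
// A mixed-content atom: text, an inline element, then more text.
$atom = simplexml_load_string('<div class="atom">Pay <b>on delivery</b> at the door</div>');

// #14/#15: direct text first, then each child element appended after it.
$content = (string) $atom;
foreach ($atom->children() as $child) {
  $content .= $child->asXML();
}
// $content is 'Pay  at the door<b>on delivery</b>': the inline element moved.

// Serializing the child nodes in document order, as #9 does, keeps the order.
$node = dom_import_simplexml($atom);
$ordered = '';
foreach ($node->childNodes as $child) {
  $ordered .= $node->ownerDocument->saveXML($child);
}
// $ordered is 'Pay <b>on delivery</b> at the door'
```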

Comment #15

lobodakyrylo (as a volunteer) commented 23 October 2019 at 16:39

The code in #9 and #14 doesn't work with RC3.

This is my version:

  /**
   * {@inheritdoc}
   */
  public function import($imported_file, $is_file = TRUE) {
    $dom = new DOMDocument();
    $dom->loadHTMLFile($imported_file);
    $xml = simplexml_import_dom($dom);

    $data = array();
    foreach ($xml->xpath("//div[@class='atom']") as $atom) {
      // Assets are our strings (e.g. fields in nodes).
      $key = $this->decodeIdSafeBase64((string) $atom['id']);

      // This is for plain text.
      $content = (string) $atom;
      foreach ($atom->children() as $child) {
        // This is for HTML.
        $content .= $child->asXML();
      }
      // Get rid of some Windows characters.
      $content = str_replace('&#13;', '', $content);
      $content = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $content);
      $data[$key]['#text'] = $content;
    }
    return tmgmt_unflatten_data($data);
  }

Comment #16

liberatr (for Autodesk Knowledge Network) commented 9 March 2020 at 22:06

File	Size	Test result
tmgmt-translate_from_file_html_contents-2272487-16.patch	1.34 KB	7.x-1.x: PHP 5.5 & MySQL 5.5, D7 145 pass

Here is the code from the previous comment as a patch.

Comment #17

liberatr (for Autodesk Knowledge Network) commented 9 March 2020 at 23:41

NOTE: if your translated content contains CSS in a <style> tag, you will get output like this:

<style type="text/css"><![CDATA[.some-css-selector{
    display: inline-block;
    width: 100%;
  }
  ]]></style>

Fix (if this matters to you):
$content = str_replace(array('<![CDATA[', ']]>'), '', $content);
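An alternative I have not tested against TMGMT: serialize the child nodes with DOMDocument::saveHTML(), which accepts a node argument since PHP 5.3.6, instead of saveXML()/asXML(). HTML serialization should write style contents out as-is rather than as a CDATA section, so no stripping would be needed. A minimal sketch on a hand-written atom (assumed markup):

```php
<?php
// Hand-written atom containing inline CSS (assumed markup, for illustration).
$html = '<html><body><div class="atom" id="x">'
  . '<style type="text/css">.some-css-selector{display:inline-block}</style>'
  . '<p>Hi</p>'
  . '</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$atom = (new DOMXPath($dom))->query("//div[@class='atom']")->item(0);

// Serialize each child with HTML rules instead of XML rules, so the style
// contents are written out as-is rather than wrapped in a CDATA section.
$content = '';
foreach ($atom->childNodes as $child) {
  $content .= $dom->saveHTML($child);
}
```

saveHTML() also writes void elements as <br> rather than <br/>, which is closer to what a Filtered HTML field normally stores.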

Comment #18

liberatr (for Autodesk Knowledge Network) commented 10 March 2020 at 19:52

Status: Needs review » Reviewed & tested by the community
