Hello,

Submitting any node with more than 1200 characters results in this error:

Translation has been rejected with following error: Unable to connect to Google Translate service due to following error: Request-URI Too Large at https://www.googleapis.com/language/translate/v2/? [... very long url here... ]

This should be solvable by sending the data via POST instead of GET.

Useful references:
https://groups.google.com/forum/?fromgroups=#!topic/google-ajax-search-a...
https://developers.google.com/translate/v2/using_rest
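For illustration, here is a minimal sketch of a form-encoded POST body that carries the same parameters the GET code puts in the URL (key, source, target and one q per text segment). It assumes Google accepts repeated q parameters in the body, as it does in the query string. The function name build_translate_body() is hypothetical. PHP's http_build_query() would encode an array as q[0]=...&q[1]=..., which is not the repeated q=...&q=... form, so the pairs are assembled by hand.

```php
<?php
// Hypothetical helper: builds an application/x-www-form-urlencoded body with
// one "q" parameter per source segment, mirroring the repeated "&q=" pairs
// the existing GET code appends to the URL.
function build_translate_body($key, $source, $target, array $segments) {
  $pairs = array(
    'key=' . rawurlencode($key),
    'source=' . rawurlencode($source),
    'target=' . rawurlencode($target),
  );
  foreach ($segments as $segment) {
    $pairs[] = 'q=' . rawurlencode($segment);
  }
  return implode('&', $pairs);
}

$body = build_translate_body('MY-API-KEY-HERE', 'it', 'en', array('Ciao mondo', 'è un gioco'));
// Send $body with POST and the header "X-HTTP-Method-Override: GET";
// the URL itself then no longer grows with the amount of text.
```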


Comments

Sifro

OK, I've tried, but I'm out of ideas. The solution must be simple and close at hand, but I couldn't find it.

I first tried using drupal_http_request(), but I get this error:

Translation has been rejected with following error: Unable to connect to Google Translate service due to following error: Bad Request at https://www.googleapis.com/language/translate/v2

"Bad Request", and nothing else? It seems like one of Microsoft's highly descriptive errors!

This is the code I used:


$params = '';
if (!empty($q)) {
  foreach ($q as $source_text) {
    $params .= '&q=' . str_replace('%2F', '/', rawurlencode($source_text));
  }
}

$options['method'] = 'POST';
$options['headers']['X-HTTP-Method-Override'] = 'GET';
$options['data'] = 'key=MY-API-KEY-HERE&source=it&target=en' . $params;
$url = 'https://www.googleapis.com/language/translate/v2';

$response = drupal_http_request($url, $options);

And this is the $options array:

array(3) {
  ["headers"]=>
  array(2) {
    ["Content-Type"]=>
    string(10) "text/plain"
    ["X-HTTP-Method-Override"]=>
    string(3) "GET"
  }
  ["method"]=>
  string(4) "POST"
  ["data"]=>
  string(7667) "key=AIzaSyAM7Yk248uGMHhzVg2nKNdMC1m0RrNPa2E&source=it&target=en&q=%3Cp%20style%3D%22margin-bottom%3A%200cm%22%3EEdgeWorld%20%C3%A8%20un%20browser%20game%20prodotto%20dalla%20Kabam%2C%20azienda%20che%20abbiamo%20gi%C3%A0%20imparato%20a%20conoscere%20grazie%20a%20diversi%20titoli%20di%20qualit%C3%A0%2C%20che%20ci%20metter%C3%A0%20nei%20panni%20di%20un%20comandante%20dell%26%2339%3Besercito.%3C/p%3E%3Cp%20style%3D%22margin-bottom%3A%200cm%22%3EIl%20gioco%20%C3%A8%20ambientato%20nel%20futuro%2C%20precisamente%20nel%202711%2C%20su%20un%20pianeta%20alieno%20disabitato%20chiamato%20Cerulea%20nel%20bel%20mezzo%20di%20una%20guerra%20che%20vede%20innumerevoli%20fazioni%20una%20contro%20l%26%2339%3Baltra.%3C/p%3E%3Cp%20style%3D%22margin-bottom%3A%200cm%22%3E%26nbsp%3B%3C/p%3E%3Cp%20style%3D%22margin-bottom%3A%200cm%22%3EEffettuata%20la%20registrazione%2C%20ci%20verr%C3%A0%20immediatamente%20messa%20davanti%20%3Cem%3Ela%20nostra%20base%20operativa%20che%2C%20nel%20corso%20delle%20battaglie%2C%20si%20evolver%C3%A0%20dal%20catorcio%20che%20%C3%A8%20adesso%20fino%20a%20diventare%20una%20potenza%20di%20qualche%20chilometro%20quadrato.%3C/em%3E%3C/p%3E%3Cp%20style%3D%22margin-bottom%3A%200cm%22%3EPer%20insegnarci%20i%20rudimenti%20del%20gioco%2C%20ci%20verr%C3%A0%20in%20aiuto%20Kira%2C%20un%20essere%20blu%20cibernetico%20%28che%20sembra%20la%20sorella%20minore%20di%20Cortana%20di%20Halo%29%20che%20ci%20aiuter%C3%A0%20a%20muovere%20i%20primi%20passi%20nel%20gioco.%3C/p%3E%3Cp%20style%3D%22margin-bottom%3A%200cm%22%3E%3Cstrog%3EIl%20nostro%20compito%2C%20ovviamente%2C%20sar%C3%A0%20quello%20di%20potenziare%20la%20nostra%20base%2C%20di%20modo%20che%20sia%20pronta%20per%20gestire%20attacchi%20o%20eventuali%20difese.%3C/strong%3E%3C/p%3E%3Cp%20style%3D%22margin-bottom%3A%200cm%22%3EAccumulando%20risorse%2C%20saremo%20in%20grado%20di%20costruire%20strutture%20via%20via%[...cut...]"
}
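One plausible cause of the bare "Bad Request" is visible in the dump above: the body is form-encoded, but the Content-Type header says text/plain, so the server may not parse key, source, target and q out of it. This is an inference, not something confirmed in this thread. Below is a hedged sketch of drupal_http_request() options that declare the body as form data instead. The helper name google_post_options() is hypothetical; only the 'method', 'headers' and 'data' keys already used above are relied on.

```php
<?php
// Hypothetical helper: returns drupal_http_request() options for a POST whose
// body is already form-encoded (e.g. "key=...&source=it&target=en&q=...").
function google_post_options($body) {
  return array(
    'method' => 'POST',
    'headers' => array(
      // Declare the body as form data rather than text/plain.
      'Content-Type' => 'application/x-www-form-urlencoded',
      // Ask Google to treat the POST as the GET call it replaces.
      'X-HTTP-Method-Override' => 'GET',
    ),
    'data' => $body,
  );
}

$options = google_post_options('key=MY-API-KEY-HERE&source=it&target=en&q=Ciao');
// In Drupal 7 this would then be passed as: drupal_http_request($url, $options);
```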

So I changed approach. I found an example online that uses cURL and decided to try it that way.

With cURL I could successfully send queries to Google, and Google returned the right translations.
But for some reason the integration with TMGMT doesn't work: I see "translation in progress" instead of "ready for review", without any errors.

This is the code I used:

$params = '';

if (!empty($q)) {
  foreach ($q as $source_text) {
    $params .= $source_text;
  }
}

$values = array(
  'key'    => 'AIzaSyAM7Yk248uGMHhzVg2nKNdMC1m0RrNPa2E',
  'target' => 'en',
  'source' => 'it',
  'q'      => strip_tags($params),
);

var_dump($values);

// Turn the form data array into a URL-encoded string for the POST body.
$formData = http_build_query($values);

// Create a connection to the API endpoint.
$ch = curl_init('https://www.googleapis.com/language/translate/v2');

// Return the response instead of printing it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Write the form data to the request body (this also makes it a POST).
curl_setopt($ch, CURLOPT_POSTFIELDS, $formData);

// Tell Google to treat this POST request as a GET request.
curl_setopt($ch, CURLOPT_HTTPHEADER, array('X-HTTP-Method-Override: GET'));

// Execute the HTTP request.
$json = curl_exec($ch);
curl_close($ch);

// Decode the JSON response into an associative array.
$response = json_decode($json, true);
var_dump($response);

// For debugging only: stop here to see the dumped arrays.
die();

    // If we do not have data, we got an error.
    if (!isset($response['data'])) {
      throw new TMGMTGoogleException('Google Translate service returned following error: @error',
        array('@error' => $response['error']['message']));
    }

    return $response;
  }

The code above gives me this:

array(4) {
  ["key"]=>
  string(39) "AIzaSyAM7Yk248uGMHhzVg2nKNdMC1m0RrNPa2E"
  ["target"]=>
  string(2) "en"
  ["source"]=>
  string(2) "it"
  ["q"]=>
  string(2462) "Urban Rivals (nome completo Clint Urban Rivals) è un gioco di carte collezionabili francese on-line stile Yu-Gi-Oh. Fondato nel 2006, inizialmente come gioco per cellulari, e successivamente come browser game. Ad oggi conta venti milioni di utenti unici creati, con una media di ventimila persone online contemporaneamente e più di un miliardo e mezzo di partite giocate in tutto.Il gameplay è [...cut...]"
}
array(1) {
  ["data"]=>
  array(1) {
    ["translations"]=>
    array(1) {
      [0]=>
      array(1) {
        ["translatedText"]=>
        string(2117) "Urban Rivals (full name Clint Urban Rivals) is a collectible card game online French style Yu-Gi-Oh. Founded in 2006, initially as a mobile game, and later as a browser game. Now has twenty million unique users created, with an average of twenty thousand people online at the same time and more than a billion and a half games played in tutto.Il gameplay is entirely focused on PvP challenges [...]"
      }
    }
  }
}
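The dump above hints at why the job never leaves "in progress": all segments of $q were concatenated into one q value (note "in tutto.Il gameplay", where two fields were joined without a separator), so Google returned a single translation. The plugin, however, assigns translations back to $keys_sequence by position, one per segment. That reading is an inference from the output, not something confirmed in this thread. Below is a sketch of the mapping step with one q per segment, assuming the response shape shown above (data, translations, translatedText) and the in-order correspondence the existing plugin code already relies on; map_translations() is a hypothetical name.

```php
<?php
// Hypothetical helper: pairs each translatedText with the key of the segment
// it came from, assuming translations come back in request order.
function map_translations(array $keys, array $response) {
  if (!isset($response['data']['translations'])) {
    throw new RuntimeException('No translations in the response.');
  }
  $keys = array_values($keys);
  $translations = array_values($response['data']['translations']);
  if (count($translations) !== count($keys)) {
    // E.g. several segments were merged into one "q" before sending.
    throw new RuntimeException(sprintf('Expected %d translations, got %d.', count($keys), count($translations)));
  }
  $mapped = array();
  foreach ($translations as $i => $translated) {
    $mapped[$keys[$i]] = $translated['translatedText'];
  }
  return $mapped;
}
```

Failing loudly on a count mismatch makes the "merged segments" case visible instead of leaving items silently untranslated.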

If I remove the die() call, run the script and then go to node/MY-NODE-ID/translate, I can see that the status is "not translated" and the pending translation is "in progress", whereas, as far as I understand, it should be in the "needs review" status.

If I go to admin/config/regional/tmgmt/jobs/MY-JOB-ID, I see 0/2/7 under progress, but I don't really understand what that means.

Translating through file export/import works fine.

I'm completely stuck; I hope someone can help me. Please let me know if there's anything else I can add to the discussion.

micwille

I attached a patch that changes the GET request to a POST request.

The data is sent in x-www-form-urlencoded format, and the existing options passed to the doRequest function are overwritten with drupal_http_request options that enable a POST request.

The limit is now effectively 5000 characters, as set by the API itself:
https://developers.google.com/translate/v2/faq
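With a per-request cap (the FAQ linked above gives 5000 characters), segments can be packed greedily into batches that each stay under the cap, instead of splitting $q into batches of a fixed number of items with array_chunk($q, $this->qChunkSize). Below is a sketch of that packing. batch_segments() is a hypothetical name, and measuring with mb_strlen() is an assumption, since how Google counts characters is not settled in this thread.

```php
<?php
// Hypothetical helper: groups segments into batches whose total length stays
// within $limit characters, preserving the original keys. A single segment
// longer than $limit gets its own batch and still needs splitting.
function batch_segments(array $segments, $limit = 5000) {
  $batches = array();
  $current = array();
  $current_length = 0;
  foreach ($segments as $key => $segment) {
    $length = mb_strlen($segment, 'UTF-8');
    if ($current && $current_length + $length > $limit) {
      // Adding this segment would exceed the limit: close the batch.
      $batches[] = $current;
      $current = array();
      $current_length = 0;
    }
    $current[$key] = $segment;
    $current_length += $length;
  }
  if ($current) {
    $batches[] = $current;
  }
  return $batches;
}
```

Keeping the keys in each batch makes it straightforward to map the translations back to the right data items afterwards.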

zhuber

Status: Active » Needs review
File size: 1.91 KB

I had trouble applying the last patch, although the changes seem to have worked for me.

I recreated the changes, cleaned up some of the syntax formatting and then rerolled the patch. I would also like to patch the TMGMT module to show the total character count in addition to the total word count. The word count can be misleading, since the Google Translate API has a maximum character limit rather than a word limit.
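To illustrate the difference between the two counts, below is a sketch that reports both for a single string, assuming whitespace-separated words and UTF-8 input. It is only for comparison and is not how tmgmt_word_count() counts words; text_counts() is a hypothetical name.

```php
<?php
// Counts whitespace-separated words and UTF-8 characters in a string.
function text_counts($text) {
  $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
  return array(
    'words' => count($words),
    'characters' => mb_strlen($text, 'UTF-8'),
  );
}

$counts = text_counts('Il gioco è ambientato nel futuro');
// array('words' => 6, 'characters' => 32)
```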

Berdir

Version: 7.x-1.0-alpha2 » 7.x-1.x-dev

Testbot bump, this might need changes in the tests.

Status: Needs review » Needs work

The last submitted patch, tmgmt_google-http_post-1799502-3.patch, failed testing.

mikel1

I had spotty luck getting POST to work, and it only raises the limit from 1400 to 5000 characters (which is still too small for many pages I want to translate). Attached is a patch that removes the length limit entirely by breaking up large fields. It does not change the method from GET to POST, but it fulfills the spirit of this feature by removing length limits on translated data.

The new code works as follows.

If a field is too big (more than $maxCharacters after URL encoding), the code looks for HTML paragraph tags and tries to split the field into small enough chunks. If a paragraph is still too big, it tries to split it on sentence boundaries (delimited by ".", "!" or "?"). If a single sentence is still too big, it splits it on whitespace boundaries. If a single word is too big, it leaves it untranslated. It makes a reasonable attempt to put as much text as possible into each call to Google, to minimize latency.
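The splitting strategy described above can be sketched in plain PHP. This is an illustration of the approach, not the patch's code: split_for_request() is a hypothetical name, $max plays the role of $maxCharacters, the three separator patterns mirror the ones in the patch excerpt quoted in the review below, and length is measured with strlen(urlencode()) as in the patch. Unlike the patch, it does not check whether a separator falls inside an HTML tag, and it only produces the chunks; translating them is left out.

```php
<?php
// Hypothetical sketch of the paragraph, sentence, whitespace splitting.
function split_for_request($text, $max, $level = 0) {
  $patterns = array(
    '/(<\/p[^>]*>)/i',     // Closing paragraph tags.
    '/([.!?])/',           // Sentence punctuation.
    '/((?:\s|&nbsp;)+)/',  // Runs of whitespace.
  );
  if (strlen(urlencode($text)) <= $max) {
    return array($text);
  }
  if ($level >= count($patterns)) {
    // No finer separator left (e.g. one huge word): return it unchanged.
    return array($text);
  }
  // Split, keeping each separator attached to the piece before it.
  $parts = preg_split($patterns[$level], $text, -1, PREG_SPLIT_DELIM_CAPTURE);
  $pieces = array();
  for ($i = 0; $i < count($parts); $i += 2) {
    $piece = $parts[$i] . (isset($parts[$i + 1]) ? $parts[$i + 1] : '');
    if ($piece !== '') {
      $pieces[] = $piece;
    }
  }
  // Greedily pack pieces into chunks; oversized pieces go one level finer.
  $chunks = array();
  $current = '';
  foreach ($pieces as $piece) {
    if (strlen(urlencode($piece)) > $max) {
      if ($current !== '') {
        $chunks[] = $current;
        $current = '';
      }
      $chunks = array_merge($chunks, split_for_request($piece, $max, $level + 1));
    }
    elseif (strlen(urlencode($current . $piece)) <= $max) {
      $current .= $piece;
    }
    else {
      $chunks[] = $current;
      $current = $piece;
    }
  }
  if ($current !== '') {
    $chunks[] = $current;
  }
  return $chunks;
}
```

Because every separator stays attached to the piece before it, implode('', $chunks) gives back the original text, so translated chunks can be reassembled in order.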

I've been using this on my sites for a couple of weeks now, and it appears to work well on nodes with large bodies.

I think this patch is still useful even if the module is changed to use POST, as it removes the 5000-character limit. To change the limit for splitting large text, simply change $maxCharacters. It is currently set to 1400 because that seems close to the maximum size Google Translate will accept.

Hope this helps.

Berdir

Status: Needs work » Needs review
Berdir

Status: Needs review » Needs work

Thanks for working on this.

+++ b/tmgmt_google.plugin.inc
@@ -34,7 +34,16 @@ class TMGMTGoogleTranslatorPluginController extends TMGMTDefaultTranslatorPlugin
+  protected $translationREs = array('/(<\/[pP][^>]*>)/',   // Paragraphs
+                                    '/([.!?])/',           // Sentences
+                                    '/((?:\s|&nbsp;)+)/'); // white space

tmgmt_word_count() has a list of characters it considers punctuation; not all of them apply here, but maybe some do?

In general, splitting text up is quite tricky and will require a good number of tests, including for non-English languages.
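As one non-English example, the sentence pattern from the patch, '/([.!?])/', does not match the full-width punctuation used in Chinese and Japanese text. Below is a sketch of a Unicode-aware variant. The extra characters are an illustrative choice, not the list tmgmt_word_count() uses, and split_sentences() is a hypothetical name.

```php
<?php
// Sentence split that also recognises full-width and other non-ASCII sentence
// punctuation. The /u modifier makes PCRE treat pattern and subject as UTF-8.
function split_sentences($text) {
  $parts = preg_split('/([.!?。！？…])/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
  $sentences = array();
  for ($i = 0; $i < count($parts); $i += 2) {
    // Re-attach the captured punctuation to the sentence it ends.
    $sentence = $parts[$i] . (isset($parts[$i + 1]) ? $parts[$i + 1] : '');
    if ($sentence !== '') {
      $sentences[] = $sentence;
    }
  }
  return $sentences;
}

$sentences = split_sentences('你好。你好吗？Ciao.');
// array('你好。', '你好吗？', 'Ciao.')
```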

Additionally, many translators need this, so we should try to extract it into common functions/methods so that, for example, the Microsoft translator can use it too. That would then live in the core module.

Might also require some thinking to come up with a good API to reduce code duplication.

+++ b/tmgmt_google.plugin.inc
@@ -100,18 +101,67 @@ class TMGMTGoogleTranslatorPluginController extends TMGMTDefaultTranslatorPlugin
+      // put in as many fields as we can...

Comments should start with an uppercase character, end with a ".", stay under 80 characters and consist of complete, actual English sentences.

+++ b/tmgmt_google.plugin.inc
@@ -100,18 +101,67 @@ class TMGMTGoogleTranslatorPluginController extends TMGMTDefaultTranslatorPlugin
+      // FIXX SEE IF IT WORKS LIKE THIS BEFORE MAKING MORE CHANGES

Comments like this should be removed :)

+++ b/tmgmt_google.plugin.inc
@@ -100,18 +101,67 @@ class TMGMTGoogleTranslatorPluginController extends TMGMTDefaultTranslatorPlugin
+      $qLength = 0;

Only class properties should use camelCase; normal variables should be $q_length/$query_length (variable names are usually not abbreviated).

+++ b/tmgmt_google.plugin.inc
@@ -100,18 +101,67 @@ class TMGMTGoogleTranslatorPluginController extends TMGMTDefaultTranslatorPlugin
+        $fieldLength = strlen(urlencode($field)) + 3;

Drupal has wrappers for this: drupal_strlen(). That uses the mbstring extension when available.

The question is: how does Google count/treat multibyte characters?
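The difference matters for the length check. strlen() counts bytes, drupal_strlen() counts characters (mb_strlen() stands in for it below, since this runs outside Drupal), and the URL-encoded length that actually hits the GET limit is larger still. Which of these Google counts against its limit is exactly the open question here. For a single Italian "è" in UTF-8:

```php
<?php
// Three ways of measuring the same one-character string "è" (UTF-8 source).
$text = 'è';
$bytes = strlen($text);                   // 2: "è" is two bytes in UTF-8
$characters = mb_strlen($text, 'UTF-8');  // 1
$encoded = strlen(urlencode($text));      // 6: urlencode() gives "%C3%A8"
```

Each byte becomes at most three characters after URL encoding, so accented text can take up to three times its byte length in the URL. This is one reason a fixed threshold behaves differently across languages.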

+++ b/tmgmt_google.plugin.inc
@@ -100,18 +101,67 @@ class TMGMTGoogleTranslatorPluginController extends TMGMTDefaultTranslatorPlugin
+      // FIXX this goes away
       // Split $q into chunks of self::qChunkSize.
-      foreach (array_chunk($q, $this->qChunkSize) as $_q) {
+      /* foreach (array_chunk($q, $this->qChunkSize) as $_q) {
 
         // Get translation from Google.
         $result = $this->googleRequestTranslation($job, $_q);
 
         // Collect translated texts with use of initial keys.
-        foreach ($result['data']['translations'] as $translated) {
-          $translation[$keys_sequence[$i]]['#text'] = $translated['translatedText'];
+        foreach ($result as $translated) {
+          $translation[$keys_sequence[$i]]['#text'] = $translated;
           $i++;
         }
       }

Then remove it :)

+++ b/tmgmt_google.plugin.inc
@@ -128,6 +178,78 @@ class TMGMTGoogleTranslatorPluginController extends TMGMTDefaultTranslatorPlugin
+   * Helper method to split long text into chunks and translate it.

I don't like Helper method prefixes, just state what it does, starting with a verb: "Splits long text into chunks and translates it."

+++ b/tmgmt_google.plugin.inc
@@ -128,6 +178,78 @@ class TMGMTGoogleTranslatorPluginController extends TMGMTDefaultTranslatorPlugin
+  function translateChunks($text, $index, TMGMTJob $job)

Opening { should be on the same line as the method.

+++ b/tmgmt_google.plugin.inc
@@ -128,6 +178,78 @@ class TMGMTGoogleTranslatorPluginController extends TMGMTDefaultTranslatorPlugin
+    if (preg_match('/^((<\/?[pP][^>]*>)|(<[bB][rR][^>]*\/?\s*>)|\n|\s|&nbsp;)+$/', $text)) {
+      return $text;
...
+    if ($index >= count($this->translationREs)) {
+      // Too big, and no way to split it - return text untranslated
+      return $text;

Is this the case where it can't be split? We can't just return the source text then, can we?
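If returning the source text is not acceptable, one alternative is a last-resort split at a fixed number of characters. It uses the multibyte functions so that a UTF-8 character is never cut in half. It will cut words apart, so the translation of such a chunk is likely to be poor; it only guarantees that every piece fits. hard_split() is a hypothetical name, and $size is in characters, so it has to be chosen with the URL-encoded growth in mind.

```php
<?php
// Last-resort split of text that has no usable separators: pieces of at most
// $size characters, never breaking inside a multibyte character.
function hard_split($text, $size) {
  $pieces = array();
  $length = mb_strlen($text, 'UTF-8');
  for ($start = 0; $start < $length; $start += $size) {
    $pieces[] = mb_substr($text, $start, $size, 'UTF-8');
  }
  return $pieces;
}
```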

+++ b/tmgmt_google.plugin.inc
@@ -128,6 +178,78 @@ class TMGMTGoogleTranslatorPluginController extends TMGMTDefaultTranslatorPlugin
+      // print_r($match, false);
+      $nextPos = $match[1] + strlen($match[0]);
...
+        // print("continuing because this is inside a tag '" . substr($text, $translatedUpTo, $nextPos - $previousPos) . "'\n");

Debug code should be removed.

CarlHinton

This seems to be a major issue with the translator. I would suggest breaking nodes up into short sentences and then posting them one by one. Breaking at a full stop doesn't work well (I tried it), as the full stop is used for many, many purposes, not least as a decimal point.
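One way to avoid breaking on decimal points is to treat ".", "!" or "?" as a sentence end only when whitespace or the end of the text follows it. This is a heuristic sketch with a hypothetical function name: abbreviations such as "e.g. this" will still be split after "e.g.".

```php
<?php
// Splits after ., ! or ? only when followed by whitespace or end of text, so
// "3.5" stays intact. Zero-width lookarounds keep all characters, including
// the punctuation, in the resulting pieces.
function split_sentences_keep_decimals($text) {
  return preg_split('/(?<=[.!?])(?=\s|$)/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$parts = split_sentences_keep_decimals('Costa 3.5 euro. Poi basta!');
// array('Costa 3.5 euro.', ' Poi basta!')
```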

Anybody


This problem is severe and stops the module from working in many cases. How can we proceed here? It's really bad.

Anybody

I experimented a lot and was NOT successful using the POST method; still, I think there must be a way.

The patch from #6 works well, but it also seems to run into limits in some cases. So my suggestion would be to clean up #6 as suggested and then commit it to the next dev release. I think many people are having problems with the GET length limitation.

carsonw

The patch from #6 worked for me, and I agree with @Anybody's comments in #10 and #11.