Convert translation request method to POST to get past 5000 character limit that results with GET requests.

Original
Module works great... so long as you don't have an HTML text field on your page that has > 5000 characters text + HTML combined. Have you given thought to breaking down a field that has > 5000 characters into multiple requests and concatenating the results?

Comments

lpeabody created an issue. See original summary.

lpeabody’s picture

https://cloud.google.com/translate/docs/reference/libraries#client-libra...

This should be using the google/cloud-translate Composer package.

EDIT: FWIW, I did a test and replaced the module's existing API usage with the Google PHP API library and I was able to remove the 5000 character limit entirely, and just pass it the raw translation data using TranslateClient::batchTranslation(). I was able to translate my whole site in about an hour (yay free trial credit). I'll look at posting a patch for it this weekend!

westerix’s picture

Did you have any luck with a patch for D8?

The character limit was easy enough to workaround in D7 but struggling to get my head around it in D8 without falling back to refactoring my content types.

lpeabody’s picture

Hey, never created the patch. I ended up just prototyping it in a project. However! It's just a single file, along with needing to require google/cloud-translate:^1.2 in your composer.json. File here https://gist.github.com/lpeabody/c3ec7615098ecce536cddcedff4e4985. If your API key is correctly set in the UI, then all you need do is add this file as a Plugin in a module and include the library as mentioned (I think, it's been quite a few months since I worked with this).

It's fine for prototyping I would say, but could use some enhancement and error checking particularly around handling rate limitations.

westerix’s picture

Just wanted to offer my thanks for the prototype, it works very well but as you say needs some refinement for more graceful error handling.

I'll see what I can do to extend that functionality now I have the base to work from.

ajfernandez’s picture

Hi, I need the same functionality for 7.x version of the module.

Anyone has a patch or knows how to fix this issue for 7.x version?

Thanks in advance!

westerix’s picture

ajFernandex - Within 7.x this patch addressed majority of issues for me: https://www.drupal.org/project/tmgmt_google/issues/1799502#comment-7710247

ajfernandez’s picture

Thanks for your response @westerix!

Which version should I apply the patch to? alpha2 or dev? I tried to apply the patch on the dev version but it gave me an error.

Thanks again!

omar alahmed’s picture

A patch for @lpeabody's work is created. The patch creates 2 files: composer.json to install google/cloud-translate:^1.2 library and a new tmgmt translation provider uses Google batch translation.

mohammed j. razem’s picture

Title: Batch jobs that have greater than 5000 characters » Allow batch jobs that have greater than 5000 characters
mohammed j. razem’s picture

Status: Active » Needs review
omar alahmed’s picture

StatusFileSize
new4.53 KB

Adding PHPCS coding standards.

lpeabody’s picture

Status: Needs review » Needs work

Two pretty minor issues to address:

+      // Pull the source data array through the job and flatten it.
+      $data = \Drupal::service('tmgmt.data')
+        ->filterTranslatable($job_item->getData());

1. Should use DI to use tmgmt.data.

+      // Save the translated data through the job.
+      // NOTE that this line of code is reached only in case all translation
+      // requests succeeded.
+      $job_item->addTranslatedData(\Drupal::service('tmgmt.data')
+        ->unflatten($translation));

2. Same as above.

criz’s picture

StatusFileSize
new30.09 KB

The patch works.

But one thing is still missing: When there is already a job item with more than 5000 characters, the job form can't be saved because of the checkTranslatable() override. This overrride and the $maxCharacters variable should be just removed if there is no limit any more.

Google TMGMT Job Form Screenshot

dejan0’s picture

Thanks for the previous patch, it helped us to find out what are actual problems with the 'character limit'.

BTW, it patch doesn't work with Continuous jobs at all. It missing parts of code related to Continuous jobs. But it works fine for simple page translate, and there is no 5000 characters limit. We needed Continuous jobs working, so I get deeper into it and realized that limits are originating from the GET method used to send Guzzle HTTP requests to Google Translate API.

The previous patch includes an additional library google/cloud-translate, which sends POST requests instead of GET, so it fixed the characters limit issue. Also, google/cloud-translate's `translateBatch` method doesn't work anything 'fancy', just supporting translating multiple strings (array) with one request, and it's already supported by TMGMT_Google. Example: `$q = ['Title', 'Body text', etc]`. It doesn't send multiple requests or so

So, I thought it would be nice to have it working within the original TMGMT Provider (Google), and without including another library dependency. So I have created a simple patch which fixes 'character limit' issues. It changes GET requests to POST, removing a limit function, etc. Please find attached patch, probably it would be helpful for someone :)

I'm not sure about Google Translate API limits, their documentation is a bit confusing, so if someone realizes what are actual limits, let us know please :) Documentation says hard limit for requests is 30K Unicode characters, but in my case, I managed to translate 35.000+ characters without any problems using this patch. Looks like the Google Cloud Translation API is migrating to a new quota system in these days

When/If we know actual Google Translate API limits (number of characters per request, number of requests in some time period, etc) - it makes a sense to work on improvements here, like splitting content into parts (this wouldn't be easy I guess), add Drupal Batch Jobs, etc

criz’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 15: 3025137-15-remove_5000_character_limit_use_POST_instead_of_GET.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

lpeabody’s picture

Why not use the official Google library for interacting with Google services? Regardless of whether or not it doesn't do anything "fancy", why reduplicate work that is already done by the library? Haven't seen a convincing argument here.

dejan0’s picture

Hi, lpeabody! It's completely fine to use the official Google library, but I think, in that case, most of the code/methods should be re-written and adapted to a new library. I have seen here is already some good work done and created a working system with GuzzleHTTP & official Google Translate API, so just tried to do some minimal changes and make it improved. And it's not any reduplicate work, just a couple lines of code added. Just my 2 cents :)
Cheers

dejan0’s picture

updated patch attached, fixed test error

dejan0’s picture

Status: Needs work » Needs review
jedgar1mx’s picture

this is great 👍. I will test #20

mohammed j. razem’s picture

Status: Needs review » Reviewed & tested by the community

The patch works well. Passed all the tests and it should be safe to say it's RTBC.

jedgar1mx’s picture

Works Great!!!

criz’s picture

Status: Reviewed & tested by the community » Needs work

Great, it works also for us so far. Probably this line needs to be removed before RTBC:
// TODO: Investigate if there are any other Google Translate API limits after switching to POST method
I don't think there is one?

mohammed j. razem’s picture

Status: Needs work » Needs review
StatusFileSize
new3.04 KB

Makes sense. The checkTranslatable method is now empty and has no validations to check.

I believe all Google Translate API limits and quotes - other than the 5000 characters limit on GET requests - are managed from the Google APIs console. Which means those limits will be checked after the request is sent to Google, as it differs from a Google console app to another.

I have removed the checkTranslatable method. Hope this patch passes the tests now.

mohammed j. razem’s picture

Both D8.9 and D9 tests passed.

Since last patch was submitted by me I hope someone else can test it and mark as RBTC.

berdir’s picture

Status: Needs review » Needs work
+++ b/src/Plugin/tmgmt/Translator/GoogleTranslator.php
@@ -266,16 +246,9 @@ class GoogleTranslator extends TranslatorPluginBase implements ContainerFactoryP
-    // Prepare Guzzle Object.
-    $request = new Request('GET', $url);
-
     // Build the query.
     $query_string .= '&key=' . $translator->getSetting('api_key');
-    if (isset($request_query['q'])) {
-      foreach ($request_query['q'] as $source_text) {
-        $query_string .= '&q=' . urlencode($source_text);
-      }
-    }
+
     if (isset($request_query['source'])) {
       $query_string .= '&source=' . $request_query['source'];

I was confused why the tests were still passing, when we completely change how we pass things along.

Turns out the tests are broken. Specifically \Drupal\tmgmt_google_test\Controller\GoogleTranslatorTestController::translate(), we check for certain query arguments and attempt to return an error if missing, but we don't actaully *return* the error.

So lets fix those checks so that the test can actually fail and then update it to look for post data. Maybe we should also add an explicit check for the expected translatable string.

Sorry for putting that on this issue, but I don't want to get the tests even further from what they should do and we need to adjust them anyway here.

ryrye’s picture

#26 is working really well for me. Great work!

andileco’s picture

#26 did not work for me. It removed my inability to submit due to the rate limit, but then after submitting, I received this error:

"Translation has been rejected with following error: Google Translate service returned following error: User Rate Limit Exceeded"

Do I also need to require google/cloud-translate with patch #26?

Thanks!

ankondrat4’s picture

#26 doesn't apply on D9.4.7
I created a new version of this patch for pass tests on D9.

ankondrat4’s picture

Sorry, I did mistake. Patch #26 applied on D9.4.7. I had some other patches for this module and forgot to check this.

ezeedub’s picture

Title: Allow batch jobs that have greater than 5000 characters » Convert request method to POST to get past 5000 character limit
Issue summary: View changes
Status: Needs work » Needs review
StatusFileSize
new12.79 KB

Here's an attempt at fixing the tests. The main issue was simply not returning the JSON missing the translatedText item with error messages, etc.. I also added looking for the POST data, the request method and cleaned up some assertions. I'm not a phpunit expert, so there may be more to do. Hopefully, this moves the ball forward.

ezeedub’s picture

StatusFileSize
new12.89 KB

Fix linting issue.

  • Berdir committed 3434ded3 on 8.x-1.x authored by ezeedub
    Issue #3025137 by dejan0, Omar Alahmed, ezeedub, Mohammed J. Razem,...
berdir’s picture

Status: Needs review » Fixed

Thanks for improving the tests, didn't test very large characters but seems to work at least as well as before.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

ankondrat4’s picture

Hello.
As I understood well, these changes were applied in tmgmt_google 8.x-1.1
Thank you.