This patch adds JavaScript localization to Drupal.

Strings in JavaScript can now be wrapped in Drupal.t() to translate them to another language. This function is pretty much an exact copy of the PHP t() function. The only change is that it also handles plural formatting. The syntax is:

Drupal.t('This is a @variable.', { '@variable': 'string' });
Drupal.t([ '@count comment', '@count comments' ], { '@count': 4 });

If you pass an array ([]) as first argument, it tries to return the correct plural form based on the @count argument. Other variable prefixes like ! and % are also available and behave just like the PHP equivalents.
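The behaviour described above can be sketched roughly as follows. This is an illustration only, not the code from the patch: the helper name checkPlain is hypothetical, the translation lookup is left out, only the two-form singular/plural case is handled, and the `<em>` wrapping for % placeholders is an assumption about what the PHP side does.

```javascript
// Minimal HTML escaping for '@' and '%' placeholders (hypothetical helper).
function checkPlain(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Sketch of a t()-like function. The translation lookup is deliberately omitted.
function t(str, args) {
  args = args || {};
  if (typeof str === 'object') {
    // An array was passed: index 0 is the singular form, index 1 the plural.
    // Without a @count argument, fall back to the singular form.
    var count = args['@count'];
    str = str[(count === undefined || Number(count) === 1) ? 0 : 1];
  }
  for (var key in args) {
    if (!Object.prototype.hasOwnProperty.call(args, key)) {
      continue;
    }
    var value;
    switch (key.charAt(0)) {
      case '@':
        // Escaped text.
        value = checkPlain(args[key]);
        break;
      case '%':
        // Escaped and emphasized text (assumed markup).
        value = '<em>' + checkPlain(args[key]) + '</em>';
        break;
      default:
        // '!' and anything else: inserted as is.
        value = String(args[key]);
    }
    // Replace every occurrence of the placeholder, e.g. '@count'.
    str = str.split(key).join(value);
  }
  return str;
}
```

Note that the keys of the second argument are the literal placeholder names, prefix included, which is why the plural example passes '@count' rather than 'count'.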

Translations for JavaScript are provided in the same manner as regular translations. The locale module looks in the database for strings that occur in JavaScript files and converts them to a file in files/locale/ so the translations can be cached.

For this to work, extractor.php must be modified so that it also parses JavaScript files and converts the Drupal.t() calls to translation strings in the resulting .pot file. Note that it also needs to handle Drupal.t()’s plural capabilities. (I have not yet written a patch for extractor.php.)


Comments

kkaefer’s picture

There is currently a flaw in this patch: When locale.module is enabled and a JavaScript translation file is available, drupal.js and jquery.js and the translation file are added to each page request. I have a solution for this, but this goes in another patch: drupal_add_js(..., 'core') is changed so that core files are only added when there is also other JavaScript on the page. Then, we can add drupal.js, jquery.js and the JavaScript localization file as 'core' and all other files as 'module' or 'theme'.

Regarding drupal_add_js(..., 'theme'): This patch adds a (pretty basic) theming mechanism to drupal.js. Theme files can overwrite default theme functions (currently, there is only Drupal.theme('placeholder', ...)) with their own.

Steven’s picture

I'm not sure about the robustness of this approach... it depends on the extractor.php script being used to generate the .pot files in the first place. A string from a .js file will never be automatically added to the locales_source table, unlike a regular t() call.

Also, you do not necessarily need another .js file to use drupal.js/jquery.js. You can always use inline JS.

kkaefer’s picture

Status: Needs work » Needs review

(The drupal_add_js patch can be found at http://drupal.org/node/118094 and has nothing to do with this patch.)

A string from a .js file will never be automatically added to the locales_source table, unlike a regular t() call.

It would be possible to add such translations with an AJAX callback to locales_source. However, I fear that this feature could be used to drive a DoS attack if it is not properly implemented.

Actually, I don’t really see a proper way to implement this. Allowing new translations from all clients is impossible, as anyone could submit garbage to our locales_source table. Recreating the translation file every time (which is necessary whenever JS translations change) leads to higher server load and more bandwidth use (regular visitors would download a new translation file on each page view).

Working with tokens doesn’t really buy us much here either: an attacker can just get a valid token and mount the same attack. Limiting the number of new translations a user can supply with one token would be possible, but I’m not sure it is really worthwhile.

Gábor Hojtsy’s picture

Status: Needs review » Needs work

Very interesting, indeed! We need some way to translate strings in JS files, as more and more stuff goes there now. I have two immediate ideas to facilitate an automated collection algorithm, so not only extractor.php based translation is supported:

1. Pull every string out of .js files and into the modules requiring them. When a JS file is added, the module can call t() and also instruct Drupal to add the strings to a JS file used for translation. Maybe this instruction is done by calling a locale.module function (to be implemented) or by implementing a hook, e.g. hook_strings(), from which Drupal can pick up every string used in JS on the site and build a JS file with translations.

2. Once a page has a list of JS files used, Drupal goes through all JS files and greps (preg_match()) Drupal.t() calls, and grabs strings from there. A JS file is generated with strings for the page. This is very similar to how the CSS cache works, so if you have a different page with a different set of JS files, they need to be parsed again, and a new JS cache file will be generated and subsequently used on that page. The unfortunate result of this is that more JS cache files are created. The advantage is that unlike (1) above, there is no new concept to introduce to PHP coders.

By the way, I would strongly request a Drupal.format_plural() in JS, so that the Drupal API is mirrored. Alternatively, get rid of format_plural() in Drupal and overload t(), if core maintainers are keen on that idea :)

The patch will surely not apply now that locales_meta is not there anymore, and there were a lot of other database/code changes committed around this corner of Drupal.

Anyway, good ideas here!

andreashaugstrup’s picture

I hope this comment isn't uncalled for. I just saw Gábor's wiki page in the Internationalization group and thought I would add my comment. This is the first time I try adding a comment to an issue so I hope I don't break anything.

I have been using drupal_add_js(..., 'setting') to provide localized strings for javascript using the module name to make sure one module doesn't overwrite another's language strings. E.g.

  drupal_add_js(array('my_module' => array(
    'lang' => array(
      'ok' => t('Ok'),
      'saving' => t('Saving...'),
      'edit' => t('Edit'),
      'delete' => t('Delete'),
    ),
    )
  ), 'setting');

Strings would then be available at Drupal.settings.my_module.lang. I realize this requires more work than wrapping text in t() in the javascript, but it's the solution I've been using since I am not competent to make changes in Drupal Core.

The technique works fine, although it is a bit tedious if the JavaScript contains many strings. The main downside is that two identical drupal_add_js(..., 'setting') calls don't overwrite each other but are merged. So calling drupal_add_js() twice with the same setting on one page turns the value in the JS object into an array. E.g.

  drupal_add_js(array('my_string' => 'foo'), 'setting');
  drupal_add_js(array('my_string' => 'foo'), 'setting');

This has the unintended result that Drupal.settings.my_string is an array with two identical elements rather than a string. The issue comes up when hook_nodeapi is used to add JavaScript strings while viewing a node: on the front page, where several nodes are rendered, you get arrays with 10 or more identical elements.
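The effect is consistent with the settings being combined by something like PHP's array_merge_recursive(); that is an assumption here, not something stated in this thread. A rough JavaScript emulation of that merge (the function names are made up for illustration) shows how a repeated string turns into an array:

```javascript
// True for plain objects, false for arrays, null and scalars.
function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Approximates a recursive merge in which duplicate scalar values are
// collected into an array instead of overwriting each other.
function mergeRecursive(a, b) {
  var result = {};
  var key;
  for (key in a) {
    if (Object.prototype.hasOwnProperty.call(a, key)) {
      result[key] = a[key];
    }
  }
  for (key in b) {
    if (!Object.prototype.hasOwnProperty.call(b, key)) {
      continue;
    }
    if (!Object.prototype.hasOwnProperty.call(result, key)) {
      // New key: just copy it.
      result[key] = b[key];
    }
    else if (isPlainObject(result[key]) && isPlainObject(b[key])) {
      // Both sides are nested settings: merge them key by key.
      result[key] = mergeRecursive(result[key], b[key]);
    }
    else {
      // Same key with scalar (or already collected) values: append.
      result[key] = [].concat(result[key], b[key]);
    }
  }
  return result;
}
```

Merging { my_string: 'foo' } with itself yields { my_string: ['foo', 'foo'] }, and every further identical call adds one more element.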

kourge’s picture

Since t() is such a common function (isn't that why it's named t() instead of drupal_translate()?), I suggest that there should be an official, endorsed shortcut function for Drupal.t() such as $t(), $T(), or _(). Although a simple var $t = Drupal.t; statement can easily define a shortcut because of JavaScript's dynamic nature, endorsing an official shortcut can make life easier for extractor.php.

Gábor Hojtsy’s picture

Konstantin, what we can do here, is to code something like the CSS compressor. Grab the JS files when they are about to be displayed and extract strings from them (yes, we would need a really small regex based extractor in Drupal core for JS files). Then we can insert the strings to db. Components of the system could be:

- a "locale_js_files" variable, containing the names of files already parsed
- when $scripts is built up for the page, compare it to locale_js_files, and read in -> parse -> insert into DB the strings from the files not yet parsed, add their names to "locale_js_files"
- when a JS is included in $scripts, which had strings extracted, also include the JS containing the translation strings (we can alternatively have a translation JS for every set of JS combinations, like the CSS is done)
- if a locale import is done or the web based editor is used, wipe the JS translation file to get built again later when required
- warn devel.module maintainers that they can add a button/link to wipe the JS translation file

Someone to implement? You mostly just need to copy the CSS cache stuff.
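The bookkeeping in the first two points of the list can be sketched as below. In Drupal this logic would live in PHP on the server; JavaScript is used here only for illustration, and parsedFiles, sourceStrings and updateJsFiles are hypothetical names standing in for the "locale_js_files" variable, the source string table and the update step.

```javascript
// Hypothetical in-memory stand-ins for the "locale_js_files" variable
// and for the table of source strings.
var parsedFiles = [];
var sourceStrings = [];

// files:   map of file path to file contents (stands in for reading from disk).
// extract: function(contents) returning an array of strings found in the file.
// Returns the paths that were parsed during this call.
function updateJsFiles(files, extract) {
  var newlyParsed = [];
  for (var path in files) {
    if (!Object.prototype.hasOwnProperty.call(files, path)) {
      continue;
    }
    if (parsedFiles.indexOf(path) !== -1) {
      // Already parsed on an earlier request: skip the file entirely.
      continue;
    }
    var strings = extract(files[path]);
    for (var i = 0; i < strings.length; i++) {
      // Store each source string only once.
      if (sourceStrings.indexOf(strings[i]) === -1) {
        sourceStrings.push(strings[i]);
      }
    }
    parsedFiles.push(path);
    newlyParsed.push(path);
  }
  return newlyParsed;
}
```

Only files missing from the list are read and parsed, so the cost on a normal page view is one list lookup per script.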

Steven’s picture

Regarding shortcuts: I've actually wanted to change core to use '$' in jQuery in safe fashion (i.e. without interfering with another library). You can do this with a trick like this:

(function ($) {
  
})(jQuery);

The same could be used to make 't' available with a one letter function.

kourge’s picture

I've been wrapping core JavaScript files in anonymous functions as Steven described because I needed to use the Prototype library alongside jQuery. (I had to replace jQuery with version 1.1, call noConflict(), and add the 1.0 compatibility pack, but that's a long, long story.) Some modules already do this, such as Fivestar.

Because of JavaScript's dynamic nature, extracting strings that are passed to Drupal.t() through shortcut functions can be tricky. This is a problem that needs to be tackled. For example, should extractor.php look for Drupal.t() or t()? Or both? Should it intelligently try to sniff for other shortcuts that are present? Or should it simply follow the convention? Should it treat t() calls as translatable even if there is no code like this to introduce the shortcut?

(function (t) {
// ...
})(Drupal.t);

Or this?

var t = Drupal.t;
// Or even this:
window.t = Drupal.t;
Gábor Hojtsy’s picture

kourge, PHP is equally dynamic. Anyone could do $magic = 't'; $magic('Mystring'); and there is no way we can pick that up. We enforce a convention, which works pretty nicely. Enterprising developers can go other ways, but their work will not be localizable; that is how it works.

kkaefer’s picture

@Gabor:

As for “JS aggregation”, I’m not sure if that’s the way to go because you are not required to turn that feature on. What do you do if you don’t aggregate JS files? There will be no JS string translations…

I think we have two possibilities here:

  • Merge the JS translations into the module.pot when the .pot file is created (like the original patch) and mark them as “JS” (the translation comment contains the filename, which ends with ‘.js’).
  • Look in an actual Drupal installation for JavaScript files and extract the strings from there. The problem is that we can’t ship pre-translated JS files (unless we also use the method described in the first point) and that I have no idea when to execute that (Cron maybe? Manual toggle?)

@Steven: Good idea, => separate patch

Gábor Hojtsy’s picture

Konstantin, it seems like I was not clear. Steven was missing the "autocollection" feature of locale module from here, referring to the problem that only extractor generated and imported translations would contain strings from a JS, and Drupal would not be able to collect them. You suggested that an AJAX solution could work, but would be too open.

That was when I came in and suggested that because we have all those JavaScript files verbatim on the server file system, we can load them in, parse the Drupal.t() calls out and store them in the database as strings to translate, just like t() works when it encounters a new string. So the question is when do we encounter JavaScript files?

- Either new enabled modules could have their JS files automatically parsed and strings inserted to the database,
- Or when drupal_add_js() is called, Drupal can check a list to see whether that JS file was parsed before, and if not, it can ask locale module to parse it and insert source strings into the DB.

So my point is that (1) we have the files on the server, so there is no need for AJAX, and (2) we can find places to parse them with some simple regex; it just needs to be decided where. If we can define calling conventions for translatable strings (I still ask for duplication of the Drupal API, having t() and format_plural() separately), a small regex can be developed and we have what Steven is missing. The translation template extractor would use the same regex, so it needs to be developed anyway. We just need to include it in Drupal core and use it when required.
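A small regex of the kind meant here might look like the sketch below. It assumes a calling convention in which the first argument of Drupal.t() is always a plain string literal; escaped quotes, concatenation and plural calls are not handled, and the function name is made up for illustration.

```javascript
// Collects the first argument of every Drupal.t('...') or Drupal.t("...") call.
// Assumes the argument is a literal without escaped quotes.
function extractTranslatableStrings(source) {
  // Group 1 is the opening quote, group 2 everything up to the same quote.
  var pattern = /Drupal\.t\(\s*(['"])((?:(?!\1).)*)\1/g;
  var strings = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    strings.push(match[2]);
  }
  return strings;
}
```

Calls such as Drupal.t(someVariable) are skipped, which is the same limitation the PHP extractor has for t() calls with non-literal arguments.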

This has nothing to do with aggregation or dynamic loading or anything, I just tried to illustrate my point with them. Hopefully the above distilled version is cleaner now.

Gábor Hojtsy’s picture

Konstantin, is it likely that a new patch will see the light (considerably) before the code freeze? It would be great!

kkaefer’s picture

Yes, I will post a patch before code freeze, but not before Wednesday as I'm having my ultimate final exams on Tuesday and Wednesday. I have already rerolled large parts of the patch and incorporated the changes from your i18n patches, but the regex parser for JS files still needs to be written.

Gábor Hojtsy’s picture

Well, if you can post an intermediary patch, we can at least discuss that and/or help you out with the regex while your time does not permit coding.

kkaefer’s picture

FileSize
14.05 KB

Here’s the patch.

kkaefer’s picture

Oh, it still uses the textgroup "javascript". That needs to change.

Gábor Hojtsy’s picture

Reviewed the code. Some notes:

- I would do a db_result() in locale_translate_delete(), you only need the language code
- We use $langcode where the language code is used and $language where a language object is used, so the parameter of _locale_rebuild_js() does not seem to be right... this would also remove ambiguity that $language becomes an object later
- move $data_hash = md5($data); between the if() statement using it and the comment above it, so it is clearly visible what it is for
- what are the use cases behind these? why do you include format_plural code here? why do you support passing an array object when you only handle the first argument? this seems to needlessly complicate writing an extractor for Drupal.t() calls (and it does not mirror the t() API). What is the value of these?

+  if (typeof str == 'object') {
+    if (Drupal.locale.strings[str[0]]) {
+      str = Drupal.locale.strings[str[0]];
+    }
+    if (args && args['@count']) {
+      if (Drupal.locale.pluralFormula) {
+        str = str[Drupal.locale.pluralFormula(args['@count'])];
+      }
+      else {
+        str = str[(args['@count'] == 1) ? 0 : 1];
+      }
+    }
+    else {
+      str = str[0];
+    }
+  }

Otherwise this patch looks very nice, good job! Rock on!

kkaefer’s picture

Status: Needs work » Needs review
FileSize
40.02 KB

New patch, everything should work now. If it doesn’t, that is a bug ;). Files are checked on drupal_add_js. A list is used to prevent checking the files on every page load, so only newly added files are parsed. Files are re-parsed when the JS cache is emptied.

Gábor Hojtsy’s picture

Did a quick review of the code, and it generally looks quite right. A few notes:

- lots of line ending whitespace changes, which make it hard to tell real changes apart from whitespace changes
- "$language = db_result(db_query('SELECT language FROM {locales_source} WHERE lid = %d', $lid));" should use $langcode; we use langcode for language code names (for whatever reason, this was not implemented that way in the DB, but the variable names for language codes are $langcode)
- drupal_get_js() should shortly document what happens in _locale_update_js_files(), and why is it called in this unusual manner (I know, but others will not)
- Seems like _locale_update_js_files() is called on all page views and parses JS files on all page views... Or am I mistaken? This does not look right. This would easily result in race conditions on the "javascript_parsed" variable
- Why is _locale_rebuild_js() concerned about the settings of the "language_default" variable? IMHO it should not? What do I miss here?
- Although you added Drupal.formatPlural() on my request, you did not *enforce* its usage (it is just a wrapper around Drupal.t()), nor do you *support* it in the parser. You did not comment on why it is important to support both functionalities in t() rather than mirror the Drupal API, which on the other hand you mirrored so nicely with the theming stuff, for example.
- The DB update function should work with / use the schema API functions now.

So in summary, this patch looks quite good as far as I see, it seems we only need to work out some smaller issues now.

kkaefer’s picture

(patch follows, CVS is not working atm)

  • Removed the whitespace changes. (That was caused by pressing ⌘⇧R in TextMate).
  • Renamed $language to $langcode. Maybe we should look into renaming the database column to langcode as well to keep things consistent.
  • Documented the three times NULL usage of drupal_add_js().
  • Moved the update function to schema API functions.
  • Added a new default file script.js, similar to style.css. This file is included automatically if it’s present and should contain theme overrides.

On Drupal.formatPlural(): While I think it can be useful to mirror the server-side API, I also think we should adapt to JavaScript’s flexibility. I think the array as first parameter for Drupal.t() looks quite straightforward (it could even be done on the server side). Also, due to JavaScript’s nature, it will become extremely complicated to parse Drupal.formatPlural() calls with the first parameter being an arbitrary value. Users could use things like (function() { ... })() as parameter and the regex to parse such constructs would get ridiculously complex, whereas the current regex is rather simple. I am even tempted to remove Drupal.formatPlural() completely as it would save a lot of hassle.

locale_rebuild_js() is concerned about the setting of language_default because in this variable, the JavaScript file hash is stored as well. If it is not updated, it might get outdated resulting in permanent regeneration of JS files because the keys seemingly do not match.

Yes, _locale_update_js_files() is called on almost every page view (when there are JS files added with drupal_add_js()). However, it keeps a list (as a variable) of files that are already parsed and only parses “new” files. The list of files is cleared when the general JS cache is also cleared. However, I’m pretty open to suggestions in this matter. I am aware that adding yet another function call on virtually every page view is not a good thing, but I couldn’t come up with a better alternative. Another way parsing could work is on module installation/activation: all JS files in the module’s folder are parsed and added to the database. Is that a better approach than parsing JS files when they are added to the page?

kkaefer’s picture

And here comes the patch.

Gábor Hojtsy’s picture

I am glad you reconsidered your FormatPlural approach, and actually split the functionality to map the server API to the JS API. That looks quite nice IMHO (and whether you throw lambda functions at it or not is equally enforceable whether we use arrays passed to Drupal.t() or Drupal.FormatPlural() AFAIS). *But* you did not update the regexp code to account for this nice change.

Now the default language modification is clear. It would be nice to get a short comment there with this information. Basically this is done to ensure data consistency, because we have the default language object stored as a variable to save some speed.

Excuse me about the "JS parsing called on every page" argument, I did not notice the if (!in_array($path, $parsed)) portion of the code. My bad. I don't know of a better way than this; checking against that list should be sufficient.

Anyway, this patch is in a very good shape now (by the looks, I did not test it yet) and the JS stuff is fantastically documented. A short list of things to do:

- fix the JS parsing regex
- add a small note to the default variable update as explained above
- add little phpdoc blocks to _locale_parse_js_file($location) and _locale_update_js_files() to describe what they do
- add a small reference from FormatPlural to "the server side format_plural()", just as you have done with t()

I would welcome people to test this patch, and report any final problems.

kkaefer’s picture

Parsing the JS files with the first argument of formatPlural being anything is way more difficult than just taking the first two arguments, which have to be strings (as in Drupal.t() with the plural functionality it had before). Any suggestions for a regular expression that correctly parses these calls?

Drupal.formatPlural(test(), 'One item', '@count items');
Drupal.formatPlural((function() { /* , */ return 4; })(), 'One comment', '@count comments');
Drupal.formatPlural(variable, 'This is singular', 'There are @count strings');
[...]
Gábor Hojtsy’s picture

Well, I would search for '([^']+)',\s*'([^']*@count[^']*)'\); to get the two strings from the end of the line argument list... Sure, this is not 100% accurate, you can't have your quotes escaped and such, but we can make conventions here. IMHO we could even make conventions on how FormatPlural() should be called, like we have for several other (PHP) functions to properly extract the strings used.
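Applied in JavaScript for illustration (the real parser would run in PHP), the suggested expression picks the two strings off the end of each of the three example calls above, whatever the first argument looks like. The function name is hypothetical.

```javascript
// Uses the expression suggested above: two single-quoted strings at the end of
// the argument list, the second one containing @count.
function extractPlurals(source) {
  var pattern = /'([^']+)',\s*'([^']*@count[^']*)'\);/g;
  var results = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    results.push({ singular: match[1], plural: match[2] });
  }
  return results;
}
```

As noted, this does not cope with escaped quotes, and it would also match any other call that happens to end in such a pair of strings, so it only works together with a calling convention.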

kkaefer’s picture

  • Added parsing for Drupal.formatPlural() calls
  • The parser now recognizes concatenated strings (with +). This can be useful because JavaScript doesn’t support spanning strings over multiple lines without using concatenations (i.e. no \n allowed within a string)
  • Escaped chars like \' are allowed in strings
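
One way such a parser could handle concatenation and escaped quotes is sketched below. This is not the regex from the patch: the names are made up, only single-quoted literals and Drupal.t() are covered, and only the \' and \\ escapes are undone.

```javascript
// One single-quoted literal; backslash escapes such as \' are allowed inside.
var LITERAL = "'(?:[^'\\\\]|\\\\.)*'";

// "Drupal.t(" followed by a literal and any number of "+ literal" continuations.
var CALL = 'Drupal\\.t\\(\\s*(' + LITERAL + '(?:\\s*\\+\\s*' + LITERAL + ')*)';

function extractConcatenated(source) {
  var callPattern = new RegExp(CALL, 'g');
  var results = [];
  var match;
  while ((match = callPattern.exec(source)) !== null) {
    // Split the matched argument back into its individual literals.
    var pieces = match[1].match(new RegExp(LITERAL, 'g'));
    var joined = '';
    for (var i = 0; i < pieces.length; i++) {
      // Drop the surrounding quotes, then undo \' and \\ escapes.
      joined += pieces[i].slice(1, -1).replace(/\\(['\\])/g, '$1');
    }
    results.push(joined);
  }
  return results;
}
```

A call written as Drupal.t('It\'s ' + 'a long ' + 'string') is then stored as the single source string "It's a long string".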
kkaefer’s picture

(Forgot to update the regex constant)

AmrMostafa’s picture

Status: Needs review » Needs work

IE6's JS doesn't seem to support the string[index] method of accessing string characters, which causes some parts of the JS introduced here to fail (I stumbled upon it in Drupal.t() formatting), so I guess it's better to replace this syntax with string.charAt(index).
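A small illustration of the suggested replacement, using a made-up helper that reads the prefix of a placeholder key:

```javascript
// Returns the placeholder prefix ('@', '%' or '!') of a key such as '@count'.
function placeholderPrefix(key) {
  // key[0] relies on index access to strings; charAt(0) is the portable form.
  return key.charAt(0);
}
```

charAt() also returns an empty string rather than undefined when the index is out of range, which makes comparisons a little more predictable.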

kkaefer’s picture

Status: Needs work » Needs review
FileSize
38.01 KB

Thanks for pointing that out. Fixed now.

hass’s picture

i subscribe to this cool feature

ChrisKennedy’s picture

+1 - I need a client-side t() to properly translate strings for dynamically validating textarea #maxlength. The dynamic password strength/confirmation validation patch would also be better translated with this functionality.

Gábor Hojtsy’s picture

I tried to apply, test and eventually commit it, but all I got was this:

patching file install.php
patching file includes/common.inc
Hunk #4 FAILED at 1749.
1 out of 5 hunks FAILED -- saving rejects to file includes/common.inc.rej
patching file includes/form.inc
Hunk #1 succeeded at 1562 (offset 13 lines).
patching file includes/locale.inc
patching file includes/theme.inc
patching file misc/autocomplete.js
patching file misc/drupal.js
Hunk #1 succeeded at 1 with fuzz 2.
patching file misc/progress.js
patching file misc/tableselect.js
patching file misc/teaser.js
patching file misc/upload.js
patching file modules/locale/locale.install
patching file modules/locale/locale.schema
patching file modules/system/system.js
patching file modules/system/system.module
Hunk #1 succeeded at 614 (offset 7 lines).
Hunk #2 succeeded at 1058 (offset 7 lines).
Hunk #3 succeeded at 1112 (offset 7 lines).

Seems like some stuff changed since the patch was made. While updating the patch, I would suggest the following little cleanups too:

- in _locale_update_js_files(), you use $path for the "file name with path" value, but it suddenly turns into $location in _locale_parse_js_file(), this is confusing... fileapi calls this $filepath... a $path is a directory trail, which might not include the file name, a $location is already well known in the locale module, and means a different thing, maybe $filepath would be better at both places, but at least use the same variable name
- $plural_matches[1] = array_merge($plural_matches[1], $t_matches[1]); seems to be odd... why would you merge the values into an index of an array, and then call this 'all matches'... $plural_matches[1] does not suggest it contains all matches, so the foreach following that looks odd

These are really small nitpicks, and I would have fixed them myself while testing had the above patch errors not come up, but as we need the patch rerolled anyway, the above would be handy to fix.

Gábor Hojtsy’s picture

Status: Needs review » Needs work

Anyway, I went on, fixed the errors reported by CVS, fixed the nitpicks pointed out by myself and another few:

- the $parsed array was not defined, so it threw a PHP error
- the javascript column was added twice on the upgrade path, removed the first one
- the JS files were not updated when a file gets imported, so added that in

Tested the patch, looked through the code, and it seemed to be fine, so committed to Drupal 6.

Marking this "needs work", so the theme upgrade guide is updated to reflect on translation changes as well as the new automatic script.js discovery.

Gábor Hojtsy’s picture

BTW this was the patch I committed, for future reference.

Gábor Hojtsy’s picture

Konstantin, could you please update the theme update page to reflect these changes, so we can close the issue?!

kkaefer’s picture

Status: Needs work » Closed (fixed)
Gábor Hojtsy’s picture

Great, thanks!

firecentaur’s picture

in the example above:

Drupal.t([ '@count comment', '@count comments' ], { '@count': 4 });

I understand why @ is included in the place holders, but
can you kindly explain why the final parameter has a @count in it:

{'@count':4}

Gábor Hojtsy’s picture

@firecentaur: please don't use issues to post support questions. You can use three types of placeholders, and there can be a %count, a @count and a !count, and these can be different. See http://hojtsy.hu/files/Drupal7TranslationCheatSheetv2.pdf