Problem

  • Some info hooks require you to declare human-readable values with t(). Some others don't. Getting it wrong potentially leads to security issues.
  • Info hook values for low-level stuff defined in code and similar things are always the same, but Drupal creates different, language-specific caches for them.

Goal

  • Restore sanity.
  • Drop language-specific caches. Only translate/localize strings on presentation.

Details

  • #503550: Translated strings are cached in _info() hooks and #813370: Hook_menu_alter() not supported by potx made us (inconsistently) introduce two patterns:
    1. Info hooks that define things using t() for human-readable values
    2. Info hooks that define things NOT using t() for human-readable values

    Nonsense? Nonsense.

  • Some (most) info hooks require you to use t() now, and we cache (the same) information for each language, just translated.
  • The mere reason for doing so is that we're not able to determine translatable/localizable strings through our potx template extractor otherwise.
  • Lacking a proper mechanism for doing so, we helped ourselves by introducing weird magic for functions like watchdog() that requires you to put the log message string within the function arguments on the same line in order to be identified by potx' string extractor.

Proposal

  1. Introduce tl(), standing for "translate later/lazy", [bikeshed], to signify a string to be extracted for string translation.
  2. Remove language-specific caches for info hooks, everywhere.
Files: 
CommentFileSizeAuthor
#31 interdiff.txt659 bytesalmaudoh
#31 translation_no_op-31.patch3.1 KBalmaudoh
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 94,344 pass(es).
[ View ]
#29 plenty_t.zip658 bytesalmaudoh
#29 translation_no_op-29.patch2.45 KBalmaudoh
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 92,599 pass(es), 85 fail(s), and 46 exception(s).
[ View ]
#27 translation_no_op.patch1.75 KBalmaudoh
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 26,015 pass(es), 1,014 fail(s), and 62 exception(s).
[ View ]
#12 d8_transltable.patch7.95 KBfago
FAILED: [[SimpleTest]]: [MySQL] 48,960 pass(es), 0 fail(s), and 16 exception(s).
[ View ]
#3 drupal8.tl_.3.patch1.5 KBsun
PASSED: [[SimpleTest]]: [MySQL] 35,353 pass(es).
[ View ]

Comments

sun’s picture

Berdir’s picture

Restoring sanity sounds nice :)

+1

Would we use that in hook_menu() as well?

This would also be helpful for contrib which have similar patterns (not just info hooks, but e.g. writing untranslated strings to the db and translate on display like menu/watchdog).

However, wouldn't we also need something like t_dear_potx_i_know_what_i_am_doing($variable) to prevent it from throwing those warnings that it currently does if you do t($variable)?

sun’s picture

Status:Active» Needs review
StatusFileSize
new1.5 KB
PASSED: [[SimpleTest]]: [MySQL] 35,353 pass(es).
[ View ]

Perhaps we'd just remove the warning for t($variable) for D8?

Attached patch implements tl().

Of course, there will be a lot to convert after introducing this. I'd suggest to handle every single subsystem/API on its own in separate issues, since each of those patches will have to:

  1. Add tl() to all strings in the API hook implementations.
  2. Remove the per-language caching and whatever else from the API.
  3. Adjust all forms and view/output code to call t() on the strings.
Berdir’s picture

Another thing that might make use of this are exception messages.

Those are currently a huge mess in core and there are open issues for that. Simply using tl() won't be enough for them because they might have arguments which need to be passed through (and either used when the exception is displayed and/or passed to watchdog() to save them).

Docblock in the patch looks nice, probably makes sense to add a reference to t() as well?

Gábor Hojtsy’s picture

One of the main reasons this was not done before is performance. Why introduce a function call even for the most innocent thing as to return a string, which would slow down the page a bit. If we only use these with info hooks, it would of course only be called when the info caches are regenerated but it is a small hit nonetheless. Magic comments were suggested before, but those could be ugly too. Think /* t( */ and /* )t */ or something along those lines.

Also consider that we'd need to support format_plural() type singular/plural pairs as well. These are stored and retrieved in pair, so they not equal to t($singular) and t($plural). They were not equal in D7 and much less so in D8 that we changed their storage model too earlier. So with that in mind, you'd need format_plurall() too (note double l).

Finally, if this is all only needed for potx, and we can agree on a code convention that does not require API changes (no new function), it is easily introduced in any Drupal version. Potx is a contrib module and it can react to anything in the code so long as we code it in :) So we might not need a core change at all to make this happen. We might only need a coding style change.

(This is a recurring discussion, there are likely many places in issues and g.d.o where you can find previous instances of the same discussion. The above is my short recollection of what we discussed before in previous instances).

Gábor Hojtsy’s picture

Also not sure why this would be a critical, especially that it can be solved outside core even.

sun’s picture

@Gábor Hojtsy: I filed this as critical, because of the first and foremost problem mentioned in the OP:

Some info hooks require you to declare human-readable values with t(). Some others don't. Getting it wrong potentially leads to security issues.

As mentioned in #3, a tl() facility cannot be simply tacked on the current code. The actual API/module code needs to actively support and be aware of it. If hook implementations don't return translated strings anymore, then the strings need to be translated later on (on presentation, when translations are actually needed).

I don't consider performance to be an issue. It's only used in cases of info hook definitions (of which all are cached), and this is actually the typical no-op function being used in performance benchmarks. Even if you'd call this a million of times, like so:

const ITERATIONS = 1000000;

$data = 'text is the argument';

$start = microtime(true);
for ($i=0; $i < ITERATIONS; ++$i) {
}
$stop = microtime(true);
echo 'nothing: ' . ($stop - $start) . ' seconds' . PHP_EOL;

$start = microtime(true);
for ($i=0; $i < ITERATIONS; ++$i) {
  $text = tl($data);
}
$stop = microtime(true);
echo 'function tl(): ' . ($stop - $start) . ' seconds' . PHP_EOL;

then the total time is negligible:

> php debug.php
nothing: 0.26343202590942 seconds
function tl(): 1.5744860172272 seconds

On format_plural(), eek. ;) I'm not sure whether that use-case actually exists within the tl() scope. I'm only aware of one issue that wanted to add format_plural() support to watchdog() strings, but I think I marked that won't fix, because no one really cares whether log messages use proper plural language rules.

chx’s picture

Drop language-specific caches. Only translate/localize strings on presentation.

That's ... not practical? Good caching involves storing pieces of rendered HTML, you can't really disassemble that into the source strings to translate that on the fly.

Crell’s picture

-1 from me on using a noop function as a marker. That has performance implications as well as hard dependency problems. Can we not use annotations somehow?

How does anyone else do this? Or do they simply use keyed lookup files (something I know we considered but decided against for some reason...)?

xjm’s picture

Whatever solution is implemented should then be used for annotations' Translation objects. (See #1683644: Use Annotations for plugin discovery).

fago’s picture

Coming back to this one from #1853096: Integrate symfony validation violations with Drupal translation. Over there, we need a way to pass on a translated message such that afterwards we can output the translation + the untranslated message, e.g. to log it to watchdog. Problem being, our regular way to do this (pass on message template + args) does not cope with format_plural(), what we need absolutely for validation violation messages.

Thus, I was thinking about introducing a simple class holding translatable messages

<?php
class Translatable {
  
  
// Provide ways to construct it with
   // - single message + args
   // - plural message + args
   // - optionally, with context also

 
public function getMessage() {
     return
t($this->message, $this->arguments);
}
?>

Once we'd have such a simple value-object we could easily
- write potx support for that
- pass it on to functions that need to postpone translation like watchdog() or violation constraints.
- put it into our caches (that's why I'm posting it here ;)

Not sure whether having those objects in caches is a performance concern - that would obviously require benchmarks. But it would be a simple way to unify per-language caches and solve potx issues. Actually, I'm surprised I've not found any talk about a solution like this one here?

fago’s picture

StatusFileSize
new7.95 KB
FAILED: [[SimpleTest]]: [MySQL] 48,960 pass(es), 0 fail(s), and 16 exception(s).
[ View ]

I did a quick patch and performance test. I converted t() calls from system_data_type_info() and tested deserialization performance via that code snippet.

I was not able to get a measurable performance difference, e.g.
Translatable classes
1424.64ms for 10000 runs.
0.142464ms per run

with pre-translated strings
1439.51ms for 10000 runs.
0.143951ms per run

Simple patch attached. Note, that we could use that class from plugin annotations would result in a better DX.

Gábor Hojtsy’s picture

@fago: the patch at #1813762: Introduce unified interfaces, use dependency injection for interface translation proposes to swap most of what is under t() out to be pluggable, so I think there are probably lots of opportunities to work together here, instead of making things pluggable on different levels in parallel, somehow figure out the best way :)

fago’s picture

Thanks for the pointer I was not aware of that issue, it doesn't seem to address the problem of postponing translation though?

Gábor Hojtsy’s picture

No it does not do that.

Status:Needs review» Needs work

The last submitted patch, d8_transltable.patch, failed testing.

Crell’s picture

Also potentially relevant: #1843798: [meta] Refactor Render API to be OO

That issue isn't targeting translation per se, but the parallels are strong.

Gábor Hojtsy’s picture

Version:8.x-dev» 9.x-dev

Does not seem to have any chance to happen in Drupal 8.

Gábor Hojtsy’s picture

Issue summary:View changes

Updated issue summary.

Gábor Hojtsy’s picture

Issue tags:-D8MI
catch’s picture

Version:9.x-dev» 8.0.x-dev
Category:Feature request» Task
Priority:Critical» Major

This would solve some outstanding critical issues in 8.x and it might even be a duplicate.

Gábor Hojtsy’s picture

In one of our criticals at #2363099: Using translated field definition descriptions in entity schema results in different schema definitions, resulting in update.php changes this came up but then got discarded in favour of special casing these method calls for the potx parser.

stefan.r’s picture

Just bumping this because this is coming up again in another issue and will surely come up in contrib as well.

@Crell what kind of annotations were you thinking of? If we duplicate the string in the annotation we'd need to police that it actually exists in the file because it's so easy for it to deviate.

Maybe something like this could work, where we just take the default value of the property?

<?php
/**
 * @LazyTranslate
 */
public $message = 'The email address %value is already taken.';
?>
almaudoh’s picture

Maybe something like this could work, where we just take the default value of the property?

This potentially would work for constraint violation messages and other cases where the translatable string is stored as a class property. But are there other cases where the translatable string is not stored in a class property?

stefan.r’s picture

Hmm maybe that's a limitation we can live with?

Otherwise maybe an annotation right above a regular variable assignment, do we know if any other PHP projects do this?

Gábor Hojtsy’s picture

[6:08pm] stefan_r: GaborHojtsy: hi! would you mind having a look at https://www.drupal.org/node/1542144
[6:08pm] Druplicon: https://www.drupal.org/node/1542144 => Mark strings as localizable/translatable (new t()-alike string function that isn't t(), only for potx) [#1542144] => 25 comments, 3 IRC mentions
[6:08pm] stefan_r: GaborHojtsy: and letting me know if the annotations idea has any merit / chance of getting in?
[6:09pm] GaborHojtsy: stefan_r: I don’t think the annotation is much simpler than adding some dead code with the string, although easier to maintain
[6:09pm] Mark_L6n: eiriksm: I’ve worked on MT systems, will write some feedback and put it there
[6:09pm] GaborHojtsy: stefan_r: technically it could be parsed out with the string
[6:10pm] stefan_r: stefan_r: crell had some worries about dependencies and performance
[6:10pm] GaborHojtsy: stefan_r: the ultimate problem in D8 is there are 10 APIs to mark a string translatable now http://hojtsy.hu/blog/2013-jul-24/drupal-8-multilingual-tidbits-10-context-specific-text-translation-apis
[6:10pm] stefan_r: with the nt()
[6:10pm] stefan_r: stefan_r: so we don't want yet another one? :)
[6:11pm] stefan_r: Isn't this a problem contrib developers are going to run into as well?
[6:11pm] sebcorbin`away is now known as sebcorbin.
[6:11pm] GaborHojtsy: stefan_r: well, its a problem they already ran into for 10 years :D
[6:11pm] stefan_r: :)
[6:12pm] GaborHojtsy: stefan_r: one of the “D8 standard kind of” ways would be to have a $module.messages.yml that would include these and be exposed to potx
[6:12pm] stefan_r: hey that could work
[6:12pm] GaborHojtsy: stefan_r: along the lines of info.yml, menu.links.yml, etc.
[6:12pm] stefan_r: but the problem is you'd have duplication
[6:12pm] GaborHojtsy: stefan_r: potx has a standard way of those explained so in fact any contrib module could invent this even
[6:13pm] stefan_r: because the code also needs access to it
[6:13pm] GaborHojtsy: stefan_r: well, code can read from this
[6:13pm] stefan_r: by calling a MessagesFileParser? :)
[6:13pm] GaborHojtsy: stefan_r: yeah something… anyway, I’ll post this discussion on the issue… not a very baked idea just brainstorming
[6:14pm] stefan_r: GaborHojtsy: thanks. Berdir seemed to really have wanted to get something in
stefan.r’s picture

A $module.messages.yml file would make sense, though I think @Crell mentioned in #9 that was decided against at some point, does anyone remember why?

almaudoh’s picture

Status:Needs work» Needs review
StatusFileSize
new1.75 KB
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 26,015 pass(es), 1,014 fail(s), and 62 exception(s).
[ View ]

Looking through this again, I think a no-op version of t() or $this->t() would still be the best option. Perhaps a fourth optional argument to specify that it's a no-op. This would need profiling though for the extra if statement.

Status:Needs review» Needs work

The last submitted patch, 27: translation_no_op.patch, failed testing.

almaudoh’s picture

Status:Needs work» Needs review
StatusFileSize
new2.45 KB
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 92,599 pass(es), 85 fail(s), and 46 exception(s).
[ View ]
new658 bytes

Fixed the test fails.

Also did some basic profiling with the patch using ab with xdebug disabled. Disabled internal page cache module. Installed attached plenty_t module which calls t() 10000 times.

ab -n 1000 -c10 http://localhost/drupal-dev

HEAD:
Run1:

Requests per second:    1596.26 [#/sec] (mean)
Time per request:       0.626 [ms] (mean)
Time per request:       0.626 [ms] (mean, across all concurrent requests)
Transfer rate:          971.16 [Kbytes/sec] received

Run2:

Requests per second:    1604.50 [#/sec] (mean)
Time per request:       0.623 [ms] (mean)
Time per request:       0.623 [ms] (mean, across all concurrent requests)
Transfer rate:          976.18 [Kbytes/sec] received

patch:
Run1:

Requests per second:    1605.53 [#/sec] (mean)
Time per request:       0.623 [ms] (mean)
Time per request:       0.623 [ms] (mean, across all concurrent requests)
Transfer rate:          976.80 [Kbytes/sec] received

Run2:

Requests per second:    1591.07 [#/sec] (mean)
Time per request:       0.629 [ms] (mean)
Time per request:       0.629 [ms] (mean, across all concurrent requests)
Transfer rate:          968.00 [Kbytes/sec] received

This indicates that there is really no overhead to the if-statement needed by the no_op approach.

I will look at using xhprof maybe later.

Status:Needs review» Needs work

The last submitted patch, 29: translation_no_op-29.patch, failed testing.

almaudoh’s picture

Status:Needs work» Needs review
StatusFileSize
new3.1 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 94,344 pass(es).
[ View ]
new659 bytes
Crell’s picture

Re #26: I don't recall details. There was discussion between D8MI and CMI back in 2011 and early 2012 regarding using placeholders in code, with a lookup file for the actual strings. There are other projects that work that way, but not all of them. I *think* the reason it was decided against was that it was hinder DX for module authors, but I don't recall for sure. Perhaps Gabor or Heyrocker would remember.

For my part, my main concern is avoiding runtime dependencies. A no-op function call in a random class somewhere may have minimal performance impact, but it's still a function that must be defined at runtime and therefore hurts testing, code separation, etc. That's why, eg, we're not translating exception messages; it would mean any class that might throw an exception always needs the translation system, which is not acceptable (especially given the potentially numerous circular dependencies it could cause). Some way of denoting a translatable string that does not have a runtime requirement is fine; what that is, I don't care as much. :-)

stefan.r’s picture

Well it'd be the n-th way of translating something but I don't think the DX hindrance is that horrible. If we can't do a no-op function likely anything we pick will be a DX hindrance, but this seems like an important enough feature to make us want to pick at least something.

Maybe annotations would be the least "controversial" bet? We could just parse any variable assignment in the line underneath the annotation:

<?php
// @localizable
$foo = 'bar';
?>