In many languages the same English string can (and should) be translated differently, depending on context.
Here are two patches, one for locale.inc and one for locale.module, that allow the same string to be translated more than once.

The patch records the correct path (filename) and line number of the translated string as its "location", instead of the URL where it was first seen.
It also allows you to download the .po file for only one module at a time.
These two improvements make it a lot easier to find the correct context for the translation.

It makes a "best effort" when finding translations: first trying to match on file:line, then on file only, and finally on any translated string with the same 'source'.
Be aware that the first few page loads after a new translation is added are really slow, until the database has been updated with all the new strings and locations.
Also note that this will lead to several "unused" strings in the database. I have an idea about some sort of timestamp to check when a string was last used, and a cron job that removes old, unused strings, but I think this could cause more problems than it fixes.
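The "best effort" ranking described above can be sketched as follows. This is a minimal illustration in Python rather than the patch's PHP; the function name `best_translation` and the row layout are assumptions made for the example, and it compares full file paths where the patch matches on the base filename.

```python
def best_translation(rows, file, line):
    """Pick the best translation from rows of (location, translation).

    Ranking follows the description above: an exact file:line match (100)
    beats a match in the same file (75), which beats any other location (50).
    """
    best, best_rate = None, 0
    target = f"{file}:{line}"
    for location, translation in rows:
        if not translation:
            continue  # skip rows that have no translation yet
        if location == target:
            return translation, 100  # exact match, stop searching
        loc_file = location.rsplit(":", 1)[0]
        if loc_file == file and best_rate < 75:
            best, best_rate = translation, 75
        elif best_rate < 50:
            best, best_rate = translation, 50
    return best, best_rate
```

A rate below 100 is the case where the patch copies the translation to a new row for the current location.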

Anyway, here are the patches. Let me know if you want them as attachments instead.

locale.module:

--- locale.module.orig  2005-03-23 08:42:29.000000000 +0100
+++ locale.module       2005-03-25 10:10:39.180616160 +0100
@@ -142,29 +142,67 @@

   // We don't have this translation cached, so get it from the DB
   else {
-    $result = db_query("SELECT s.lid, t.translation FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE s.source = '%s' AND t.locale = '%s'", $string, $locale);
+    $caller = debug_backtrace();
+    $docroot = realpath($_SERVER['DOCUMENT_ROOT']);
+    $file = ereg_replace($docroot, '', $caller[1]['file']);
+    $basefile = basename($file);
+    $line = $caller[1]['line'];
+    $origstring = $string;
+    $result = db_query("SELECT s.lid, s.location, t.translation FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE s.source = '%s' AND t.locale = '%s'", $string, $locale);
     // Translation found
-    if ($trans = db_fetch_object($result)) {
+    while ($trans = db_fetch_object($result)) {
       if (!empty($trans->translation)) {
-        $locale_t[$string] = $trans->translation;
-        $string = $trans->translation;
+        if ($trans->location == "$file:$line") {
+          // We have 100% match
+          $locale_t[$string] = $trans->translation;
+          $string = $trans->translation;
+          $match = $trans->lid;
+          $rate = 100;
+          break;
+        }
+        elseif (eregi($basefile, $trans->location) && ($rate < 100)) {
+          // We have a match in the same file, but on a different line
+          $locale_t[$string] = $trans->translation;
+          $string = $trans->translation;
+          $match = $trans->lid;
+          $rate = 75;
+        }
+        elseif ($rate < 50) {
+          // We have a match in another file
+          $locale_t[$string] = $trans->translation;
+          $string = $trans->translation;
+          $match = $trans->lid;
+          $rate = 50;
+        }
+      }
+    }
+    // We have a translation, but not a full file:line match
+    if (($match) && ($rate < 100)) {
+      // Lets update source and target with the correct location
+      db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", "$file:$line", $origstring);
+      if ($locale) {
+          $lid = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $origstring, "$file:$line"));
+          db_query("INSERT INTO {locales_target} (lid, locale, translation) VALUES (%d, '%s', '%s')", $lid->lid, $locale, $string);
       }
     }

     // Either we have no such source string, or no translation
-    else {
-      $result = db_query("SELECT lid, source FROM {locales_source} WHERE source = '%s'", $string);
-      // We have no such translation
+    elseif (!$match) {
+      $result = db_query("SELECT lid, source FROM {locales_source} WHERE source = '%s' AND location = '%s'", $origstring, "$file:$line");
       if ($obj = db_fetch_object($result)) {
         if ($locale) {
-          db_query("INSERT INTO {locales_target} (lid, locale) VALUES (%d, '%s')", $obj->lid, $locale);
+          $trans = db_fetch_object(db_query("SELECT lid FROM {locales_target} WHERE lid = '%d' AND locale = '%s'"", $obj->lid, $locale));
+          // We have no such translation
+          if (!$trans) {
+            db_query("INSERT INTO {locales_target} (lid, locale) VALUES (%d, '%s')", $obj->lid, $locale);
+          }
         }
       }
       // We have no such source string
       else {
-        db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", request_uri(), $string);
+        db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", "$file:$line", $string);
         if ($locale) {
-          $lid = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $string));
+          $lid = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $string, "$file:$line"));
           db_query("INSERT INTO {locales_target} (lid, locale) VALUES (%d, '%s')", $lid->lid, $locale);
         }
       }
@@ -410,7 +448,7 @@
   include_once 'includes/locale.inc';
   switch ($_POST['op']) {
     case t('Export'):
-      _locale_export_po($_POST['edit']['langcode']);
+      _locale_export_po($_POST['edit']['langcode'], $_POST['edit']['filename']);
       break;
   }
   print theme('page', _locale_admin_export_screen());

And for locale.inc

--- locale.inc.orig     2005-03-23 18:03:27.000000000 +0100
+++ locale.inc  2005-03-25 09:58:22.433809358 +0100
@@ -176,11 +176,9 @@
         if ($key == 0) {
           $plid = 0;
         }
-        $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $english[$key]));
+        $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $english[$key], $comments));
         if ($loc->lid) { // a string exists
           $lid = $loc->lid;
-          // update location field
-          db_query("UPDATE {locales_source} SET location = '%s' WHERE lid = %d", $comments, $lid);
           $trans2 = db_fetch_object(db_query("SELECT lid, translation, plid, plural FROM {locales_target} WHERE lid = %d AND locale = '%s'", $lid, $lang));
           if (!$trans2->lid) { // no translation in current language
             db_query("INSERT INTO {locales_target} (lid, locale, translation, plid, plural) VALUES (%d, '%s', '%s', %d, %d)", $lid, $lang, $trans, $plid, $key);
@@ -198,7 +196,7 @@
         }
         else { // no string
           db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", $comments, $english[$key]);
-          $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $english[$key]));
+          $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $english[$key], $comments));
           $lid = $loc->lid;
           db_query("INSERT INTO {locales_target} (lid, locale, translation, plid, plural) VALUES (%d, '%s', '%s', %d, %d)", $lid, $lang, $trans, $plid, $key);
           if ($trans != '') {
@@ -213,11 +211,10 @@
     else {
       $english = $value['msgid'];
       $translation = $value['msgstr'];
-      $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $english));
+      $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $english, $comments));
       if ($loc->lid) { // a string exists
         $lid = $loc->lid;
         // update location field
-        db_query("UPDATE {locales_source} SET location = '%s' WHERE source = '%s'", $comments, $english);
         $trans = db_fetch_object(db_query("SELECT lid, translation FROM {locales_target} WHERE lid = %d AND locale = '%s'", $lid, $lang));
         if (!$trans->lid) { // no translation in current language
           db_query("INSERT INTO {locales_target} (lid, locale, translation) VALUES (%d, '%s', '%s')", $lid, $lang, $translation);
@@ -662,7 +659,7 @@
   while(strlen($comm) < 128 && count($comment)) {
     $comm .= substr(array_shift($comment), 1) .', ';
   }
-  return substr($comm, 0, -2);
+  return trim(substr($comm, 0, -2));
 }

 /**
@@ -689,18 +686,37 @@
 }

 /**
+ * Get a list of all files with at least one translatable string
+ */
+function _locale_active_modules() {
+  $loc = db_query("SELECT location FROM {locales_source}");
+  $filenames[''] = t('All files');
+  while ($locat = db_fetch_object($loc)) {
+    $basename = basename(preg_replace('/:.*/', '', $locat->location));
+    if ($basename) {
+      $filenames[$basename] = $basename;
+    }
+  }
+  ksort($filenames);
+  return $filenames;
+}
+
+/**
  * User interface for the translation export screen
  */
 function _locale_admin_export_screen() {
   $languages = locale_supported_languages(FALSE, TRUE);
   $languages = array_map("t", $languages['name']);
   unset($languages['en']);
+  $filenames = _locale_active_modules();
+
   $output = '';

   // Offer language specific export if any language is set up
   if (count($languages)) {
     $output .= '<h2>'. t('Export translation') .'</h2>';
     $form = form_select(t('Language name'), 'langcode', '', $languages, t('Select the language you would like to export in gettext Portable Object (.po) format.'));
+    $form .= form_select(t('File name'), 'filename', '', $filenames, t('Select the file you would like to export strings from.'));
     $form .= form_submit(t('Export'));
     $output .= form($form);
   }
@@ -719,13 +735,21 @@
  *
  * @param $language Selects a language to generate the output for
  */
-function _locale_export_po($language) {
+function _locale_export_po($language, $filename = NULL) {
   global $user;
+  if ($filename) {
+    $filename = "/%$filename%";
+    $sort = '(substring_index(s.location, ":", -1)+0)';
+  }
+  else {
+    $filename = '/%';
+    $sort = 'substring_index(s.location, ":", 1), (substring_index(s.location, ":", -1)+0)';
+  }

   // Get language specific strings, or all strings
   if ($language) {
     $meta = db_fetch_object(db_query("SELECT * FROM {locales_meta} WHERE locale = '%s'", $language));
-    $result = db_query("SELECT s.lid, s.source, s.location, t.translation, t.plid, t.plural FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE t.locale = '%s' ORDER BY t.plid, t.plural", $language);
+    $result = db_query("SELECT s.lid, s.source, s.location, t.translation, t.plid, t.plural FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE t.locale = '%s' and s.location like '%s' ORDER BY t.plid, t.plural, $sort, s.source, s.lid", $language, $filename);
   }
   else {
     $result = db_query("SELECT s.lid, s.source, s.location, t.plid, t.plural FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid GROUP BY s.lid ORDER BY t.plid, t.plural");
@@ -750,7 +774,14 @@

   // Generating Portable Object file for a language
   if ($language) {
-    $filename = $language .'.po';
+    if ($filename) {
+      $filename = preg_replace('/[^A-z0-9\.\-_]/', '', $filename);
+      if (!$filename) {
+        $filename = 'all';
+      }
+      $filename .= '.';
+    }
+    $filename .= $language .'.po';
     $header .= "# $meta->name translation of ". variable_get('site_name', 'Drupal') ."\n";
     $header .= '# Copyright (c) '. date('Y') .' '. $user->name .' <'. $user->mail .">\n";
     $header .= "#\n";

Comments

Stefan Nagtegaal’s picture

Attachment (status: new, size: 5.84 KB)

I remember that Dries and Steven preferred uploaded patch/diff files over diffs pasted into the issue itself.
So, attached you'll find the patch for locale.inc.

This is such a nice feature and should _really_ get into core at some point. Whatever is wrong with this patch, I'll keep updating it until it has hit the trunk.
I love it!

Stefan Nagtegaal’s picture

Attachment (status: new, size: 4.31 KB)

I remember that Dries and Steven preferred uploaded patch/diff files over diffs pasted into the issue itself.
So, attached you'll find the patch for locale.module.

This is such a nice feature and should _really_ get into core at some point. Whatever is wrong with this patch, I'll keep updating it until it has hit the trunk.
I love it!

(Set status to patch again.)

chx’s picture

Please consider this for 4.6. The need for this functionality is really great.

Olen’s picture

Just discovered a small bug.
There is an extra " at the end of this query on line 194 of locale.module:

$trans = db_fetch_object(db_query("SELECT lid FROM {locales_target} WHERE lid = '%d' AND locale = '%s'"", $obj->lid, $locale));

gábor hojtsy’s picture

Things to note here:

  • debug_backtrace() could be expensive, it does not seem to me that someone benchmarked this change
  • I expect realpath() to be quite expensive, since it tries to resolve all possible symbolic links in the path, so it does quite some file system checks. Note that there are really a lot of t() calls on a page!
  • The locale caching code was not changed as far as I can tell, and only the non cached strings will be checked for file name and line number, so those that have real problems (short strings) are not affected by this patch, as they are precached and loaded and checked without the line numbers... Excuse me if I find this funny :)
  • The really big roadblock here is that you need to find a way to represent these multiple strings in the PO file. First, it is not possible to have different translations for the same string in PO files. Second, even if it were possible, the extractor would need to extract every filename:line-unique source string separately (i.e. you would have ~20 "Submit" strings to translate for core alone, etc.). So you need to provide some solution for representing this in the PO files, or else this whole idea is pointless.
Olen’s picture

> Things to note here:
>
> * debug_backtrace() could be expensive, it does not seem to me that someone benchmarked this change

I have not done a real benchmark, but at least things don't "feel" slower. This function was much faster than I feared. But other solutions that give the same info in a less expensive way would be highly appreciated.

What I had in mind at first was to build something from extractor.php to extract the strings from the files and add them all to the database at once, not waiting for them to be accessed, but debug_backtrace at least gave the right location without making too many changes to existing code.

> * I expect realpath() to be quite expensive, since it tries to resolve all possible symbolic links in the path, so it does quite some file system checks. Note that there are really a lot of t() calls on a page!

The reason I used realpath is just because I use a couple of symlinks for the base_dir, and did not want them in the location field. I guess this is not true for most people, so it could probably be removed.

> * The locale caching code was not changed as far as I can tell, and only the non cached strings will be checked for file name and line number, so those that have real problems (short strings) are not affected by this patch, as they are precached and loaded and checked without the line numbers... Excuse me if I find this funny :)

If this is true, I totally agree. I was not aware of the precache. I believed things were only cached on first access (and hence affected by my patch).

> * The really big roadblock here is that you need to find a way to represent these multiple strings in the PO file. First, it is not possible to have different translations for the same string in PO files. Second, even if it were possible, the extractor would need to extract every filename:line-unique source string separately (i.e. you would have ~20 "Submit" strings to translate for core alone, etc.). So you need to provide some solution for representing this in the PO files, or else this whole idea is pointless.

I partly agree. For me, "Submit" was one of the reasons for adding this. That string should be translated to at least three or four different words or expressions in Norwegian to be correct in all places.
Another reason I started on this patch was that I wanted to find out exactly where a string originates when I do translations.
'Locations' of the form "/?PHPSESSID=foobar" do not make it easy to work out how to translate a string that does not have a clear and unambiguous meaning.

What happens in the patch today, is that if someone calls t('Submit') for the first time in a new location, the translations are searched. And if a translation of the same string is found - either in the same file or at all - the tables are updated and the new location is added to the _source table. The translated string is then added to the _target table as well.
So if you translate 'Submit' once, that translation is used everywhere. But if you need to change it in one or more places, download the (now incorrect) .po file for that module (or other file) and change it on that single line.
(Of course, this could lead to the opposite problem: if you want to change _all_ translations of "Submit", this would now have to be done in ~20 places instead of one. I am also working on an improved version of the built-in translation tool that will take care of this, as well as fix a few other issues to make it more useful, even if it is not meant to compete with specialized applications such as Kbabel or GnomeTranslator.)

The formal correctness of the PO files was secondary to me when I started this work, as the important issue was to make the translated strings be correct in Drupal.
I am sure that developers of other applications have already run into the problem that some strings need to be translated differently in different parts of an application, and that there must be a way to represent that in a PO file.
I'll have to read a bit about i18n and PO to find the best solution to this.

I think I am trying to solve an important issue, but it should of course be done the right way.
Thanks for pointing this out.

gábor hojtsy’s picture

Olen, you really need to investigate the original locale code further. Since short strings are cached by Drupal, your code will not be called for the 'Submit' string, and the proper file/line will not be found. You added the check to the place where only the long strings are searched for (actually the strings not cached).

We also need to have a completely PO-friendly way of representing this. This might be of secondary concern to you, but the explosion in the number of Drupal interface translations resulted from the fact that it finally became easy to translate the interface with ready-to-use desktop tools. No matter how friendly you make the web interface, it is still tremendously easier to do text editing on the desktop.

Doing realpath() on all t() calls on a constant value is quite pointless, and it should not be done. If it is desired to be called, then the result should be cached somewhere. Resolving symlinks takes time.
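One way to follow the caching suggestion above is to resolve the document root once and reuse the result. The sketch below shows the idea in Python, not PHP; `resolved_docroot` and `relative_location` are hypothetical names chosen for this illustration, and the memoization is a stand-in for whatever static cache the real code would use.

```python
import os
from functools import lru_cache


@lru_cache(maxsize=None)
def resolved_docroot(docroot):
    # Symlink resolution touches the file system, so do it once per
    # distinct docroot and serve later calls from the cache.
    return os.path.realpath(docroot)


def relative_location(docroot, file, line):
    # Build the "file:line" location relative to the resolved docroot.
    rel = os.path.relpath(file, resolved_docroot(docroot))
    return f"{rel}:{line}"
```

With this shape, every translation lookup after the first pays only for a dictionary lookup instead of repeated symlink resolution.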

I agree that this problem is apparent, and it would be ideal to have some fix, but this is not there yet.

Stefan Nagtegaal’s picture

Olen, has any more work been done on this issue? Please share your thoughts and ideas, because I would truly like this to get into core if Goba and Gerhard also think that "this is a good thing"tm.

Olen’s picture

Sorry, I've been busy "going live" with a site the last week, so I have not had the time to comment or do any more work on this for some days. Now things have calmed down a bit, and I am ready to fix the issues that have come up.

After reading a bit about PO, I realize that today it is not possible to have multiple translations of the same string in the same "domain".
As far as I can see, "domain" seems to more or less mean "po file", so this should be a minor issue (with my patch, you already have the option to download one .po file per drupal file).
All one would need to do to make it 100% PO-compatible is to add a "DISTINCT" to the query creating the .po-file to download.
(Please correct me if I'm wrong).

That way you could first export "All", translate all (distinct) strings, import the file, surf around a bit to locate the places where the translations need to be changed, and afterwards export the particular .po files for whatever files (domains) you need to change.
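To see what a "DISTINCT" export would and would not collapse, here is a small sketch using an in-memory SQLite database in place of Drupal's database layer. The table and column names follow the patch; the sample rows, the `nb` locale code, and the helper names `make_db` and `export_pairs` are made up for the illustration.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE locales_source (lid INTEGER PRIMARY KEY, location TEXT, source TEXT);
        CREATE TABLE locales_target (lid INTEGER, locale TEXT, translation TEXT);
    """)
    # Invented sample data: one source string seen in three locations.
    sample = [
        (1, "/modules/node.module:10", "Submit", "Send"),
        (2, "/modules/user.module:5", "Submit", "Send"),
        (3, "/modules/comment.module:7", "Submit", "Lagre"),
    ]
    for lid, location, source, translation in sample:
        conn.execute("INSERT INTO locales_source VALUES (?, ?, ?)", (lid, location, source))
        conn.execute("INSERT INTO locales_target VALUES (?, 'nb', ?)", (lid, translation))
    return conn


def export_pairs(conn, locale, filename=""):
    # An empty filename yields the pattern '%%', which matches every location.
    query = """
        SELECT DISTINCT s.source, t.translation
        FROM locales_source s INNER JOIN locales_target t ON s.lid = t.lid
        WHERE t.locale = ? AND s.location LIKE ?
        ORDER BY s.source, t.translation
    """
    return conn.execute(query, (locale, f"%{filename}%")).fetchall()
```

In this sketch the "All" export collapses the two identical "Send" rows but still returns "Submit" twice, because DISTINCT applies to the source/translation pair. Only the per-file exports are guaranteed a single row here, which suggests the combined export would need more than DISTINCT once translations diverge.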

To really make this work as expected, it would be best to have a script run when new files are added or updated to get all strings from all files and add them to the locales_*-tables. That way one would ensure that all strings are translated the first time.
Today strings are added to the tables the first time they are "seen", which makes the tables grow slowly and is frustrating for translators, as the files need to be downloaded many times to get all the strings.

So my proposal is to do something like:

  • Copy/alter/make a new version of 'extractor.php' to insert strings into the database.
  • Run this script (from cron?) to make sure strings from new modules are inserted automatically.
  • Maybe do some cleanup of no longer needed strings at the same time (export them to an unused.$date.po or something).
    • As files change, you may have the same string multiple times in the database, i.e. in the locations "example.module:123" and "example.module:124", because a line was added or removed somewhere above the string in an update of the module.
  • Use "filename" as "domain" and create both .po and .pot-files from the database.
  • Use "DISTINCT" on the queries to make sure the same string only appears once in every .po (and .pot)
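The first step of the proposal above (an extractor that collects strings together with their file:line location) could look roughly like this. It is a simplified Python sketch, not a port of extractor.php: the function name `extract_strings` is hypothetical, the regular expression only handles a plain quoted literal as the first argument of `t()`, and escape sequences are left as written.

```python
import re

# Matches t('...') or t("...") with a literal first argument.
T_CALL = re.compile(r"""\bt\(\s*(?:'((?:[^'\\]|\\.)*)'|"((?:[^"\\]|\\.)*)")""")


def extract_strings(filename, text):
    """Return a list of (location, source) pairs, location being file:line."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in T_CALL.finditer(line):
            single, double = match.groups()
            source = single if single is not None else double
            found.append((f"{filename}:{lineno}", source))
    return found
```

Each pair could then be inserted into the source table in one pass, instead of waiting for the string to be seen on a page load.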
killes@www.drop.org’s picture

"domain" means today "one Drupal site" because we do offer one accumulated PO file per site for download. If we had two different translations per original string that PO file would be invalid. I don't see a way around that problem. We also prefer attached patches.
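The constraint described above, one translation per original string within a single PO file, can be checked before export. The following is a small Python sketch with a hypothetical helper name, `duplicate_msgids`, operating on (msgid, msgstr) pairs rather than on a real PO parser.

```python
from collections import Counter


def duplicate_msgids(entries):
    """Return the msgids that occur more than once in a list of
    (msgid, msgstr) pairs destined for a single PO file."""
    counts = Counter(msgid for msgid, _ in entries)
    return sorted(msgid for msgid, n in counts.items() if n > 1)
```

An accumulated site-wide export would be safe to write only when this list is empty.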

jose reyero’s picture

Version: x.y.z » 6.x-dev

While the feature looks really interesting, I don't think we are quite there yet. As Gabor pointed out, debug functions will possibly be expensive in terms of performance, and I wouldn't feel really comfortable using all this debug code for real functionality (anything besides actual debugging).

I also would like to point you at some related development that builds on locale module, dynamic object translation, here http://drupal.org/node/141461

About this feature, basically, we need to provide some context to the t() function. I have a pair of suggestions for this:

  1. Provide some string id, or context string, as a parameter, like in the other patch I've mentioned. This possibly does away with the simplicity of the nice t() function but may make sense in some cases
  2. Provide some global context for page executions so Drupal can keep information about in which module we are and what we are doing. This may be also used by a great deal of other functionality, like query or form rewriting depending on context. Some code about how this can be done:
// We can keep track of which module we are running if we make all 'cross module' calls with module_invoke
// Something similar may be used for the menu callback system
function module_invoke($module, $function /* , ... */) {
  global $context;
  if (!isset($context)) {
    $context = array();
  }
  array_push($context, $module); // Or array_push($context, "$module:$function");
  // ...
  // The rest of module invoke here. From this point on we can know which module we are running by examining the global $context array
  // ...
  array_pop($context);
}
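The first suggestion above, passing a context string to t(), can be sketched as a lookup keyed by (context, source) that falls back to the context-free translation. This is an illustrative Python sketch, not Drupal's API: the `translations` table, the "comment form" context and the sample Norwegian strings are all invented for the example.

```python
# Invented sample data: the key is (context, source); an empty context
# is the default translation.
translations = {
    ("", "Submit"): "Send",
    ("comment form", "Submit"): "Lagre",
}


def t(source, context=""):
    # Prefer the context-specific translation, then the default one,
    # and finally return the untranslated source string.
    for key in ((context, source), ("", source)):
        if key in translations:
            return translations[key]
    return source
```

Existing calls without a context keep working unchanged, which keeps the cost of the extra parameter limited to the places that actually need it.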
dami’s picture

Version: 6.x-dev » 7.x-dev

Problem still exists, move to 7.x.
Cross referencing similar issue #334283: Add msgctxt-type context to t()

jose reyero’s picture

Issue tags: +i18n sprint
chx’s picture

If we begin to push information into a global, then it's high time to benchmark debug_backtrace instead.

jose reyero’s picture

Status: Active » Postponed

I'd postpone this one in favor of #334283: Add msgctxt-type context to t(), at least for i18n sprint.

gábor hojtsy’s picture

Why not make a duplicate of that one then?

jose reyero’s picture

Status: Postponed » Closed (duplicate)
Issue tags: -i18n sprint

Yes, maybe better.

As a side note, the approach discussed here (adding 'Drupal context') was really different, but anyway I think the right one is adding semantic context to strings, which is in the other issue.

So closing this one. Let's discuss all contexts in the other thread, #334283: Add msgctxt-type context to t()