(Yes, I know this is going to be long, but there is a lot of insight there about issues with localizing user defined stuff, so please read everything, instead of just skimming through).

MOTIVATION:

Drupal has a simple mechanism for translating source strings: t(). It works nicely on standalone strings, it works nicely with gettext based tools, translation import and export is possible. t() has the following important characteristics:

  • It assumes you translate from English to some other language.
  • It assumes that all text you pass to it consists of standalone strings of characters (with no relation to one another).
  • It assumes that a generic string editing widget is fine for translating these.

Unfortunately none of these hold for user defined data. Take aggregator feeds and categories as examples (but keep in mind that this can be site settings, content type details, user profile fields, etc.):

  • You define your aggregator categories and feeds in your site default language (this is what we are about to assume). This can be anything, oftentimes English.
  • You need translations of different properties of your objects at once, i.e. when you get an aggregator category displayed, you need both the title and description translated. Properties of your objects are closely related.
  • You might need specific widgets to translate user defined data. Best would be to have an aggregator category title and description translated on the same screen, so the translator sees the relation.

So we need to translate against some language, which is not English, we need to be able to load related translations at once, and we might need to be able to provide a UI to translate related strings together.

You might jump on this and say that we already have the aggregator UI to do this, we just need to have aggregator objects in multiple languages, related together. There are some obvious problems with this:

  • You don't want to have your feeds in multiple copies, downloaded multiple times by your cron hook, just to translate the title for users.
  • You won't give "manage aggregator" permissions to your translators, they should not be able to meddle with your aggregator settings.
  • Translators need an overview of what is translated, what needs to be done. If translation is spread across all Drupal admin screens, it is impossible to keep up.
  • Outsourcing of translation of aggregator stuff would rely on having import/export for aggregator objects, which is frankly not there, and not likely to happen.

So we need an interface separate from the aggregator module, but one that somehow carries over the nature of the objects being edited there. This could go very far, so we deliberately restrained ourselves and came up with a very simple to use translation API, which in turn can be used to provide a very simple translation interface, but can be built upon to provide ubercool stuff. So we are trying to provide the enabling pieces in core, understanding that it is not feasible to provide an all-encompassing shiny solution in Drupal 6.

IMPLEMENTATION:

So the idea was to extend locale module to be able to translate different parts of Drupal. We already have locale module supporting different "text groups", and locale module defining a default text group for use with t() in Drupal 6.x-dev. Translations of other objects would go to text groups based on the definition of module developers. How is that done (actual patch excerpt):

function aggregator_locale($op = 'groups') {
  switch($op) {
    case 'groups':
      return array('aggregator' => t('Aggregator'));
    case 'objects':
      return array('aggregator' => array(
        'category' => array('cid', 'title', 'description'),
        'feed'     => array('fid', 'title'),
      ));
  }
}

Drupal 6.x-dev already has hook_locale() in with the "groups" op. So aggregator module can say that it defines an "aggregator" text group. What is new here is that aggregator module gives us some metadata about objects for localization. It says that it has a category and a feed object type with the shown fields for localization. The first field is always some kind of id to identify the object instance. So locale module knows that these types of objects exist there, and it needs to look into localizing title and description for aggregator categories and title for feeds.

We implemented a very simplistic object translation function, which works this way (actual patch excerpt):

$category = dt('aggregator', 'category', db_fetch_object(db_query('SELECT cid, title, block FROM {aggregator_category} WHERE cid = %d', $id)));

You need to identify the text group and object type used, and pass on the object itself. The implementation of dt() is defined in locale module. It has the knowledge of the category object type defined in the aggregator text group from hook_locale(), so it knows that the block property is not localizable, but the title is. It also notes that description is not passed, so it will not waste time loading a translation for that.
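
To make the lookup logic concrete, here is a minimal sketch of how a dt()-like function could use the hook_locale() metadata. This is not the code from the patch: example_locale_objects(), example_dt() and the in-memory $translations array are hypothetical stand-ins, and the real implementation reads translations from the locale tables.

```php
<?php
// Hypothetical stand-in for the metadata returned by hook_locale('objects').
// The first entry of each field list is the id column, as described above.
function example_locale_objects() {
  return array(
    'aggregator' => array(
      'category' => array('cid', 'title', 'description'),
      'feed'     => array('fid', 'title'),
    ),
  );
}

// Sketch of a dt()-like function. $translations is an in-memory array keyed
// by text group and then by location (an assumption made for illustration).
function example_dt($textgroup, $type, $object, array $translations) {
  $objects = example_locale_objects();
  if (!isset($objects[$textgroup][$type])) {
    // Unknown object type: nothing to localize.
    return $object;
  }
  $fields = $objects[$textgroup][$type];
  $id_field = array_shift($fields);
  if (!isset($object->$id_field)) {
    return $object;
  }
  foreach ($fields as $field) {
    // Skip localizable properties that were not loaded on this object.
    if (!isset($object->$field)) {
      continue;
    }
    $location = $type . ':' . $object->$id_field . ':' . $field;
    if (isset($translations[$textgroup][$location])) {
      $object->$field = $translations[$textgroup][$location];
    }
  }
  return $object;
}

// Usage: 'block' is not listed as localizable and 'description' is not loaded.
$category = (object) array('cid' => 3, 'title' => 'News', 'block' => 5);
$translations = array('aggregator' => array('category:3:title' => 'Hírek'));
$category = example_dt('aggregator', 'category', $category, $translations);
```

Only the title is replaced here; the block property is left alone and no lookup is attempted for the missing description.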

The good thing with user defined objects is that they are user defined ;) so we have them saved before we need to translate them (unlike how t() works). So when a user defined object is saved, aggregator module notifies locale module that it has user defined stuff up for translation. Actual patch excerpt follows:

module_invoke('locale', 'dynamic_update', 'aggregator', 'category', (object) $edit);

That is unsurprisingly telling locale module to save an object's localizable strings for later translation (into the aggregator text group, with the category object type). Now we save these with the following database values:

textgroup = 'aggregator', location = 'category:$cid:title', source = 'the title value'
textgroup = 'aggregator', location = 'category:$cid:description', source = 'the description value'
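
As a rough illustration of how those rows could be derived from a saved object, here is a small sketch. The helper example_locale_source_rows() is hypothetical and only builds the rows in memory; the patch itself writes them to the locale tables.

```php
<?php
// Hypothetical helper: builds the (textgroup, location, source) rows that
// would be stored for one object. $fields lists the id column first.
function example_locale_source_rows($textgroup, $type, array $fields, $object) {
  $id_field = array_shift($fields);
  $rows = array();
  foreach ($fields as $field) {
    // Only properties that are present and non-empty need translation.
    if (isset($object->$field) && $object->$field !== '') {
      $rows[] = array(
        'textgroup' => $textgroup,
        'location'  => $type . ':' . $object->$id_field . ':' . $field,
        'source'    => $object->$field,
      );
    }
  }
  return $rows;
}

// Usage with values as they might arrive from the category edit form.
$edit = array('cid' => 7, 'title' => 'Tech news', 'description' => 'Feeds about technology');
$rows = example_locale_source_rows('aggregator', 'category', array('cid', 'title', 'description'), (object) $edit);
```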

We basically reuse the location field provided by locale module, because it allows us to simply import and export these for translation outsourcing, and because it is already well integrated with locale module. We understand that this might have performance consequences, especially since the location field is currently not indexed, so this is a possible performance sore spot to discuss.

The circle closes on the user interface. For now we just provide a basic text editing widget for all object properties, as if they were standalone strings like locale module implies. But there are big opportunities here because we store the exact 'location' of strings, so we could provide custom UIs for them.

PERFORMANCE:

Performance is key here. We would welcome groundbreaking ideas on how this should be optimized. We initially thought about three ways to translate Drupal objects:

  • On the SQL level. Problem is that many times we don't need a localized object (like when editing the original object on the web interface, or when reviewing stuff on the admin interface). Also there are a bazillion SQL rewrites already, so adding ours would easily lead to breaking others.
  • On the object level. This is what we have chosen, but the performance implication is that if we need to translate tens or hundreds of objects in a row, we need tens or hundreds of SQL queries, because we only translate objects as we get them.
  • On the "list of objects level". If you have an SQL query and a result list of objects, it would be possible to translate the whole list at once, which would be one SQL query per list. The problem is that most Drupal modules do not build up lists of objects, but handle objects one-by-one. If we required building lists of objects to translate, that would have a noticeable performance impact on sites not using localization, and that was a showstopper for this idea.

So we ended up with object translation for these reasons, but we are left with the performance problems. Different text groups might need different caching and preloading. Imagine that we extend this to menu items or taxonomy terms. Big sites could have hundreds to thousands of taxonomy terms, for which querying translations one-by-one is a nightmare. But caching them all in memory for quick access is also a no-no. Caching all profile field translations for a language in memory when we need profile fields translated is not that bad an idea, on the other hand.

So we are looking forward to performance ideas. We added a callback possibility, so contrib modules can hang their custom callbacks on some text groups to optimize preloading and caching as required by needs of the actual web site.

OPEN ENDED DESIGN:

Our simplistic design kept the users in mind. They need a very simple UI, which is streamlined for the types of objects they translate. Having strings with exact locations allows us to present a UI with all properties belonging to the same object to translate at once. Because we have the properties clearly identified, modules will be able to form_alter() the locale form shown to provide tips (in case of a "user welcome email" being edited for example, where token tips need to be specified), custom UI (in case of the site logo for example, where uploading of an image is required), and custom validation and submit hooks (obviously for the user email or site logo). This interface is not there yet, because our open ended design even allows that to be developed in contrib. The user defined object translation however needs to be in core, so we should concentrate on providing a good backend, keeping user convenience in mind.

Comment  File        Size      Author
#17      dt_3.patch  24.24 KB  Gábor Hojtsy
#12      dt_2.patch  19.97 KB  Gábor Hojtsy
         dt.patch    18.72 KB  Gábor Hojtsy

Comments

Gábor Hojtsy’s picture

OMG, forgot to give the proper credit here! Although I have done the above writeup, it is a result of lots of discussions with Jose A Reyero and Roberto Gerola. The code in the patch was prepared by Jose and myself.

wroos’s picture

I would prefer a database/SQL solution. Maybe a look at how leading DBMS companies (with CMS) like Oracle solve the internationalization/localization/multi-lingual/multi-domain problem would be helpful. Maybe db views and aliases could also help.

Gábor Hojtsy’s picture

wroos, maybe you have some closer ideas to what we can do here? Database views unfortunately are not supported by our target databases, so such remote ideas are not applicable.

Jose Reyero’s picture

Just subscribing to this issue. But I'd like to add to Gabor's excellent writeup that
- This patch should have a negligible performance impact for single-language sites
- We are just providing a simple basic translation mechanism to be reusable by any module, but there's nothing here preventing specific modules from using some other solution for specific objects.
- The translation interface solution, while simple, is powerful and provides a single entry-point for translating everything, which is very good for translators IMHO. We can have an administrator building the site in the default language plus a team of translators that don't need to access administration settings.

Gábor Hojtsy’s picture

It seems to be logical to let people run wild with their ideas, not having a chance to be locked down by our ideas, but because there does not seem to be any activity here, maybe it is better to share some of our ideas.

1. SQL rewrite. Implement a hook_rewrite_sql() to replace columns in queries with their translated equivalents. That would require CONCAT() matching, a description of all columns, and the text group and object type passed to the rewrite function. Possibly: db_rewrite_sql('SELECT * FROM {aggregator_category} WHERE cid = %d', $cid);. So we would need to know what to replace * with (a list of column names, some replaced with joined columns from locales_target and locales_source). We would need to deal with falling back to default values in SQL, in case a translation is not present in locales_target. That would mean every translated query is joined against at least two more tables. dt() would not be required at all.

2. Double SQL run to gather the list of objects exactly required. We could introduce another SQL wrapper for queries, so locale module could run every query needing localization on its own too. For locale module purposes, we would only need the id of the record and the text group and object type. So db_locale_sql(db_rewrite_sql('SELECT * FROM {aggregator_category} WHERE cid = %d', $cid), 'aggregator', 'category'); could result in locale module replacing * with cid and fetching all rows, so we get a list of exactly the cids required. Alternatively we can guess the text group and object name from the table, but that would limit extensibility here. After we have the cids, we can prefetch all translations for these cids, so when a dt() call comes along, we can look the values up from the locale array cache. A possible optimization here is to look at the list of columns required too, and only query those translations.

3. Intelligent precaching. We could allow locale text group definitions to provide some hints on how we should precache values when we encounter a first dt() call. For user profile fields, it is logical to precache all translations; for thousands of menu items or taxonomy terms, it is not really logical. So modules can inform locale module about what precaching strategy it should use. Most of the time, we would load more data than required, but we would need very small modifications of Drupal code to work.

4. Size based precaching. Locale module does load all short strings into memory, because most of the time, we need the short strings displayed. This is also mostly true for user defined strings (with some possible exceptions like the footer message). Longer texts, like aggregator category descriptions or node type descriptions, are rarely required to be displayed. A possible problem with this approach is that if you have a few hundred or a thousand taxonomy terms, and you use dt() for taxonomy terms, all would be loaded, although it is probably not required. Maybe we can mix this with intelligent precaching above, and provide hints about what not to precache.

5. One SQL query per request. This is what we do in the patch above, and it is clearly not scalable. We need better ideas. Come pour your ideas in!
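
To illustrate the prefetching step of idea 2, here is a minimal sketch, assuming we already have the list of ids from the first SQL run. The function names and the $example_store array (a stand-in for the locale tables, keyed by location) are hypothetical; nothing here is in the patch.

```php
<?php
// Hypothetical in-memory stand-in for the locale tables, keyed by location.
$example_store = array(
  'category:1:title' => 'Hírek',
  'category:2:title' => 'Sport',
  'category:2:description' => 'Sporthírek',
);

// Builds every location key needed for a set of object ids, so that all
// translations can be requested in one lookup instead of one per object.
function example_prefetch_locations($type, array $ids, array $fields) {
  $locations = array();
  foreach ($ids as $id) {
    foreach ($fields as $field) {
      $locations[] = $type . ':' . $id . ':' . $field;
    }
  }
  return $locations;
}

// Simulates the single "WHERE location IN (...)" query against the store.
function example_prefetch(array $store, array $locations) {
  return array_intersect_key($store, array_flip($locations));
}

$locations = example_prefetch_locations('category', array(1, 2), array('title', 'description'));
$cache = example_prefetch($example_store, $locations);
// Later dt() calls could read from $cache without touching the database.
```

Untranslated properties (category 1 has no description translation here) are simply absent from the cache, so a dt() call would fall back to the source value.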

Gábor Hojtsy’s picture

Title: Introduce dynamic object translation API (optimize this!) » Introduce dynamic object translation API, optimization strategies
Jose Reyero’s picture

Title: Introduce dynamic object translation API, optimization strategies » Introduce dynamic object translation API (optimize this!)

Really interesting ideas, Gabor, some comments

1. SQL rewrite. Implement a hook_rewrite_sql() to replace columns in queries with their translated equivalence.... . That would mean every translated query joined against two more tables at least. dt() would not be required at all.

I think 'joined against two more tables' only is way too optimistic :-) For this solution to really work we'd need better per object and per language storage, like reworking the whole database schema of Drupal. The worst thing about this one though, is that it would also impact performance -a lot I'm afraid- for single language sites.

2. Double SQL run to gather the list of objects exactly required. ....

Hey, this is a very good one. We could even use our existing 'db_rewrite_sql' hook if we pass some more information with this hook, like the query arguments.

3. Intelligent precaching. We could allow locale text group definitions to provide some hints on how should we precache values when we encounter a first dt() call...........

I guess this precaching may be intelligent but also highly speculative, like betting on what you're going to need next: it may be of some use later or may be a waste of time and memory, you'll never know in advance.
Maybe if we did some precaching using the actual current page path we could work out something better, because when you know which page you are on you really can know what you're going to need to build that page.

Just to think about, but not to start coding: What if we could do some 'intelligent' precaching, meaning by intelligent that it learns from experience and keeps a page cache based on past page requests?

Another option here: having the modules inform the translation system about what they're going to need, just in the cases where the list of objects to translate is expected to be big:

function locale_dynamic_preload($textgroup, $object_type, $query) {
  // $query may be an array with joins and conditions.
}

4. Size based precaching..... A possible problem with this approach is that if you have a few hundred or a thousand taxonomy terms, and you use dt() for taxonomy terms, all would be loaded, although it is probably not required. Maybe we can mix this with intelligent precaching above, and provide hints about what not to precache.

You've already mentioned the problem here. I wouldn't go for this one.

5. One SQL query per request. This is what we do in the patch above, and it is clearly not scalable. We need better ideas. Come pour your ideas in!

Emm... did I mention 'array loading', which would consist of loading all the objects into an array and then having all them passed to the localization system and translated at once?

However, I want to point out that for all of the schemes mentioned above we are starting to need the 'object id' to be in a separate column, not tied to 'location'. We'll need it to be able to use the locales tables with more joins or conditions.

From the options above I especially like the second one (2. Double SQL run to gather the list of objects exactly required. ....) and I think we could start working on an improved 'db_rewrite_sql' which allows some more parameters.

moshe weitzman’s picture

subscribe

Owen Barton’s picture

In terms of optimizing caching, how about something like this:

cache_locale
------------------
md5
data (serialized array of all strings & translations for a page)

cache_locale_path
--------------------------
path (drupal internal path)
md5

Cache creation would then be something like:
- Page requests a bunch of strings
- String keys and translations are cached in a static
- At hook exit these string sets are serialized and md5ed and dumped into cache_locale
- Also at this time, the Drupal page path, and the precalculated md5 is inserted into cache_locale_path

So this would allow strings *for a path* to be preloaded (on the first dt() access), and also, string sets to be *shared* amongst pages. The latter is very important IMO, because it is very likely that a large number of pages (e.g. most node pages) will share a pretty small pool of string sets.

This should greatly reduce the amount of data storage required for the cache, and also avoid loading unneeded strings into memory, which is what happens with one big t()-like cache that is always loaded whatever the page.

The one area this obviously needs attention is if different roles (or users - eek!) need different sets of translations (e.g. on a node edit form some users might see additional CCK fields). I can see a few ways of dealing with this:
1. Simply do a query whenever a missing translation is needed (dumb, but simple)
2. Store the role (or uid....ugg) as an additional key in cache_locale_path (fast, but would lead to massive tables).
3. Try to do something intelligent to keep the size of these tables down - like store one entry for anonymous string sets, one for authenticated and then only store per-role if the md5 for that role differs from the md5 for auth users.

I think the latter should be pretty attainable, certainly for dealing with the 'different string sets on a page per role' use case. For the 'different string sets on a page per user' I think we should just warn that this should be avoided whenever possible, and fall back to an extra query if something is missing from the cache.

There is also the potential for reducing the size of the cache_locale_path table by storing the md5 for the first page as the menu wildcard path (i.e. user/*/edit) and then only storing the full path (i.e. user/23/edit) if the md5 differs from this. This might add a query to the translation cache loading, and so would need to be benchmarked - ideally with a somewhat *real* site load, to see how this plays off against the potentially reduced size of cache_locale_path.
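
A minimal sketch of the two-table idea above, using plain arrays in place of the cache_locale and cache_locale_path tables. The function names are hypothetical, and sorting the string set before hashing is an extra assumption added here so that the same strings requested in a different order still share one entry. Role handling and the wildcard path optimization are left out.

```php
<?php
// In-memory stand-ins for the two proposed tables.
$cache_locale = array();       // md5 => serialized string set
$cache_locale_path = array();  // Drupal internal path => md5

// Stores the string set used on a path; identical sets share one entry.
function example_cache_set($path, array $strings, array &$cache_locale, array &$cache_locale_path) {
  // Assumption: sort by key so equal sets always serialize identically.
  ksort($strings);
  $data = serialize($strings);
  $hash = md5($data);
  $cache_locale[$hash] = $data;
  $cache_locale_path[$path] = $hash;
}

// Returns the preloaded string set for a path, or an empty array on a miss.
function example_cache_get($path, array $cache_locale, array $cache_locale_path) {
  if (!isset($cache_locale_path[$path])) {
    return array();
  }
  return unserialize($cache_locale[$cache_locale_path[$path]]);
}

// Two node pages that use the same strings end up sharing one cache entry.
example_cache_set('node/1', array('Submit' => 'Beküldés', 'Home' => 'Címlap'), $cache_locale, $cache_locale_path);
example_cache_set('node/2', array('Home' => 'Címlap', 'Submit' => 'Beküldés'), $cache_locale, $cache_locale_path);
$strings = example_cache_get('node/2', $cache_locale, $cache_locale_path);
```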

webchick’s picture

It appears that dt is not declared anywhere? So if you go to admin/content/aggregator you get:

Fatal error: Call to undefined function: dt() in /home/.honor/snarkles/webchick.net/i18n/modules/aggregator/aggregator.module on line 1034

can't test this patch. :(

mgifford’s picture

Ok, I do really want to see much better support for i18n in Drupal 6 and May is running out quickly. Unfortunately this patch doesn't really give me a sense of what environment to test it in. From the description I was expecting that you'd want volunteers to pull down the latest tarball of Drupal 6 from:
http://drupal.org/node/97368

And apply the core patches to that. However, as webchick pointed out, there isn't a dt() function defined (at least in there). Then I remembered that the good folks at Development Seed have a subversion repository set up (lost the best reference for this, but this isn't bad) to allow them to test the core code:
http://www.developmentseed.org/blog/node/486

I had already downloaded this, so just grepped within that and found it defined there in common.inc. So I just wanted to clarify that we need to set up a test environment based on this repository so that we can test this with the proposed core changes.

A reference for how to set up the testing environment we're supposed to be evaluating would be useful. How the Development Seed svn repository compares with Drupal's cvs repository would also be good to know.

Mike

Gábor Hojtsy’s picture

FileSize
19.97 KB

OMG, excuse me. dt() was clearly not included in the patch. Oh. mgifford: The idea is that this patch can be tested much like any other 6.x patch: download the latest development version of Drupal 6.x, apply the patch, test.

An updated patch attached which should be up to date to latest Drupal 6.x-dev and includes everything for this functionality. Please test! Thanks!

Owen Barton’s picture

I think we need to draw some more attention to this patch - this is an important part of bringing I18n to D6!

What do folks think about my caching proposal? Would it work? Too heavyweight? Too lightweight? Is there a non-caching approach we could use instead?

mgifford’s picture

Ok, I'm a bit too tired at this point in the evening, but wanted to write something up on this while I had the chance and things were still fresh in my head.

1. The i18n module does not need to be installed to test this, right?
2. This is just a demo of how this patch would affect the aggregator module.
3. I'm not used to patches that are in different directories like this. I pulled this patch into the root directory and then had to specify where each of the sub-directories' modules were. Patches seemed to be imported fine though. What's the easiest way to apply this in the future?
4. So as far as workflow goes, you want us to do the following after we apply the patch:
   a) add a category & a newsfeed to aggregate
   b) go to the locale "Translate interface" link - admin/build/translate/search - and search within the Aggregator group for the text that you just added
   c) provide the translation
   d) switch languages to verify that the proper language is showing up

I don't get any results in 4b. There doesn't seem to be anything showing up in the aggregator group. However, if it was you'd want us to use the existing locale interface to find the appropriate strings?

I hacked a more user friendly way to use locale with menus in the past - http://openconcept.ca/menu_string_translations - folks just want to be able to click a link to pull up the term they want to translate. A popup or ajax thing would be nice for this.

Mike

Gábor Hojtsy’s picture

mgifford: To test this patch:

- grab latest Drupal 6.x-dev
- apply the patch
- install Drupal just as you would
- add at least one foreign language and enable it
- add an aggregator category and/or feed
- go to the locale translation overview interface and you will see the aggregator category
- go to the search form and find your aggregator texts

Grugnog2: as you can see, we don't have a non-caching solution which would not be disruptive (i.e. performance decreasing) for single language sites. Menu key based caching seems to be the most promising solution without taking roles into account, falling back on SQL queries if the current page requires more strings.

mgifford’s picture

I'm trying here..

- grab latest Drupal 6.x-dev

Mine's not the latest, but less than a week old - May 18 08:03

- apply the patch

Got dt_2.patch installed

- install Drupal just as you would

It had already been installed, but that shouldn't matter. Didn't see any db changes.

- add at least one foreign language and enable it

I've got 4 in place

- add an aggregator category and/or feed

http://drupal6.dev.openconcept.ca/?q=aggregator

Notice the notices:
notice: Undefined variable: output in /home/dm_60/modules/aggregator/aggregator.module on line 1427.
notice: Undefined variable: output in /home/dm_60/modules/aggregator/aggregator.module on line 1381.

- go to the locale translation overview interface and you will see the aggregator category

http://drupal6.dev.openconcept.ca/?q=admin/build/translate

More notices:
notice: Undefined variable: output in /home/dm_60/modules/aggregator/aggregator.module on line 1381.

- go to the search form and find your aggregator texts

I'm searching the aggregator texts with a blank String so that any string that is an aggregator should show up. Right now it doesn't.

Mike

Gábor Hojtsy’s picture

FileSize
24.24 KB

Well, please accept my apologies... Just checked out latest Drupal HEAD, applied the patch and reproduced the problem. Unfortunately the editing form and the search only work if you have at least one translation in the DB already (and the editing form has other problems). Now I included the search fixes which are also part of this patch: http://drupal.org/node/146033 So you can actually translate and see the feature working.

Unfortunately the question on this patch is not whether it works or not, but how to make it scalable and then actually implement that scalability fix. The descriptions above were geared at those who would be interested in having this not bog down Drupal. Attached a new patch against latest Drupal 6.x-dev which you can see working.

Jose Reyero’s picture

I think we - well, I mean Gabor :-) - are making more effort here just keeping the patch up to date with Drupal HEAD than making serious improvements.

AFAIK, this patch works and so far is the only option I've seen to have some decent generic object localization in Drupal. The options currently are not having the functionality, or having it slow. But this mechanism allows further optimization by contributed modules, with that callback option there.

So the worst case scenario here is enabling localization just for the interface, and experiencing these performance problems. This could be fixed just adding a few options to disable localization for specific 'textgroups'.

And then, if we allow using all the rest of Drupal features without this enabled, I'd urge core committers around to consider the patch for inclusion forgetting about these performance problems, which would be just for sites *actually* using the feature, but not for others.

Grugnog2,
I think that caching may work but not only for this, but for all objects, like nodes, taxonomy, etc that are displayed in the page. So I'd like to see this kind of caching generalized for all objects, not only locale strings.

Owen Barton’s picture

Well, after a couple of weeks thought and coding I have proposed a per-user caching patch over at http://drupal.org/node/152901

Please take a look and post reviews.

I wrote that with this particular use case in mind - based approximately on the plan I outlined in comment #9. I made some pretty significant simplifications along the way, but I think that there is a good chance that this patch (perhaps with some more changes once we have some benchmarks) should be able to provide the significant improvements we need to get object caching into core.

We need some more client implementations written (to use the per-user caching), so we can get a better idea of how the patch performs in a realistic setting. One of these should certainly be the object caching patch (we would also need some kind of content generator script for benchmarking) - any volunteers?

Jose Reyero’s picture

Though I'd like very much to see this patch committed, looks like it's not moving fast enough for Drupal 6, so I've posted a simplified 'solution' here: http://drupal.org/node/155047 (Object translation. Wrapper function for contrib modules.)

Gábor Hojtsy’s picture

Status: Needs review » Postponed

Which makes this postponed.

sun’s picture

Status: Postponed » Active

Maybe this is the completely wrong issue for this comment, maybe not. I haven't read your study yet, Gábor. But I know the i18n and Localizer modules as well as Joomla!'s JoomFish. I just wanted to finally write up an approach that smk-ka and I came up with a long time ago while tinkering with the Localizer module and trying to find a working solution to translate any kind of user provided data. I'll try to keep it short; if you're interested further, please ask for details and we'll try to explain.

Basically, all information a user can enter into a Drupal site might need to be translatable.

There are two important terms in that sentence: "enter" and "translatable". Unlike other systems, we have the advantage of FAPI. All user provided data is entered into structurally predefined forms. Now, what about adding a simple #translate = TRUE or #translatable = TRUE to translatable fields?
A form containing only translatable fields could be served to translators, not necessarily in the site administration - we just need the form's fields. They already contain all information that f.e. JoomFish needs to redefine in XML files. The value of #translate defaults to FALSE, so no contrib module is affected by default.

With that information, we would know what's actually translatable and what not. Next step: Map form fields to table columns. Great, we have the schema API already in D6 and a data API right on the way. Now, what about adding a simple #column = 'node.body' to translatable fields? Thus, the system is able to know which form field corresponds to which table column in the database.

Next step: If the schema API knows, which column is in which table, the language system could automatically add a new column 'language' to all affected tables. Thus, the language system would not depend on any custom language integration in contributed or custom modules.

Last step and probably the most difficult one: Auto-update or auto-merge non-translatable fields from the original table row to the translated rows - upon each update of the original row. This means that rows containing translated data would already contain all necessary, non-translatable values of a data object. However, the whole thing presumes and requires that we would not have any unique indexes in all tables containing translatable fields (resp. columns).
We have the advantage of db_query(), which could automatically add language = 'de' to the WHERE clause if a translatable object is about to be retrieved. This effectively would mean that we would not need separate translation tables (like f.e. JoomFish). Furthermore, we would need almost no additional queries to fetch translated content from the database - and - an administrator would be able to decide if a piece of content is simply 'internationalized' (i.e. content is only translated, but the data object stays the same for all languages) or if it should be 'multilingual' (for example completely separate nodes, including separate menu items and comments).
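
To give an idea of the first two steps, here is a minimal sketch that walks a Form API style array and collects the fields flagged as translatable together with their table columns. The #translatable and #column properties are the proposed markers from this comment, not existing Form API properties, and example_translatable_fields() is a hypothetical helper.

```php
<?php
// Collects fields flagged with the proposed '#translatable' property from a
// Form API style nested array, mapping each to its proposed '#column' value.
function example_translatable_fields(array $form, array $parents = array()) {
  $found = array();
  foreach ($form as $key => $element) {
    // Keys starting with '#' are properties, not child elements.
    if (!is_array($element) || (is_string($key) && $key !== '' && $key[0] === '#')) {
      continue;
    }
    $path = array_merge($parents, array($key));
    if (!empty($element['#translatable'])) {
      $found[implode('][', $path)] = isset($element['#column']) ? $element['#column'] : NULL;
    }
    // Descend into fieldsets and other nested elements.
    $found += example_translatable_fields($element, $path);
  }
  return $found;
}

$form = array(
  'title' => array('#type' => 'textfield', '#translatable' => TRUE, '#column' => 'node.title'),
  'options' => array(
    '#type' => 'fieldset',
    'status' => array('#type' => 'checkbox'),
    'body' => array('#type' => 'textarea', '#translatable' => TRUE, '#column' => 'node.body'),
  ),
);
$fields = example_translatable_fields($form);
```

A translator-only form could then be built from just the entries in $fields, leaving the checkbox and any other non-translatable settings out.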

We actually had some code for parts of this developed for Localizer v2.0 (which unfortunately was never released, but should still be in CVS, AFAIK).
We should have written up this idea earlier, but we always thought we would need to explain it in more depth.

Now, what do you think? Any objections?

sun’s picture

> However, the whole thing presumes and requires that we would not have any unique indexes in all tables containing translatable fields (resp. columns).

Forgot to mention that it also presumes that the language system would add another column, pid, for a parent id (pointing to the original object row in the table), which is actually based on the Localizer implementation. This effectively ensures compatibility with AUTO_INCREMENT.
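
A rough sketch of how the 'language' and pid columns could work together, using Python with an in-memory SQLite database instead of Drupal's db_query(). The `category` table, its columns other than language and pid, the helper names, and the fallback to the original row when no translation exists are all assumptions for illustration; this is not the Localizer code.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE category (
           cid INTEGER PRIMARY KEY AUTOINCREMENT,
           pid INTEGER,             -- parent id, NULL on the original row
           language TEXT NOT NULL,  -- column added by the language system
           title TEXT,              -- translatable
           refresh INTEGER          -- non-translatable
       )"""
)


def add_original(language, title, refresh):
    cur = db.execute(
        "INSERT INTO category (pid, language, title, refresh) VALUES (NULL, ?, ?, ?)",
        (language, title, refresh),
    )
    return cur.lastrowid


def add_translation(pid, language, title):
    # The non-translatable value is copied from the original row.
    db.execute(
        "INSERT INTO category (pid, language, title, refresh) "
        "SELECT cid, ?, ?, refresh FROM category WHERE cid = ?",
        (language, title, pid),
    )


def update_original(cid, refresh):
    # Auto-merge: the original row and all rows pointing at it get the new value.
    db.execute(
        "UPDATE category SET refresh = ? WHERE cid = ? OR pid = ?",
        (refresh, cid, cid),
    )


def load(cid, language):
    # Language condition added to the WHERE clause; the translated row sorts
    # first, the original row is the (assumed) fallback.
    return db.execute(
        "SELECT title, refresh FROM category "
        "WHERE (pid = ? AND language = ?) OR cid = ? "
        "ORDER BY pid IS NULL LIMIT 1",
        (cid, language, cid),
    ).fetchone()


cid = add_original("en", "News", 3600)
add_translation(cid, "de", "Nachrichten")
update_original(cid, 7200)
german = load(cid, "de")
french = load(cid, "fr")
```

Because every translated row gets its own AUTO_INCREMENT id and only points back via pid, a unique index on any other column would get in the way, which is the limitation mentioned above.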

Gábor Hojtsy’s picture

Version: 6.x-dev » 7.x-dev

Well, forms are not too close to database fields yet. Take the date input forms, where we have dropdowns and then input fields based on the dropdown selection: this could easily get tricky. The ideas you explain seem good, and would definitely be in line with where Drupal 7 could go if the data API becomes a reality. We need to reclassify this issue for D7 anyway.

zeta ζ’s picture

Title: Introduce dynamic object translation API (optimize this!) » Introduce dynamic object translation API, optimization strategies

Subscribe.

And re-instate Gábor’s title (lost due to cross-posting) while I’m here.

sun’s picture

FYI: #translatable is now available. It implements a generic object translation method quite similar to this patch. We've already played with db_rewrite_sql() for fetching these objects, which basically worked out.
Pre-caching language data makes less sense to me, because strings (for example taxonomy terms) can be altered on another page.
I wouldn't extend the existing locales_* tables, because we certainly need more information for translation objects. Translatable currently uses this schema, which supports any kind of object (other than nodes):

tid (int)                // Translation object id
object_name (varchar)    // Object name (f.e. 'block', 'menu')
object_key (varchar)     // Source object id (f.e. '123' or 'block-2-garland')
object_field (varchar)   // Object field name (f.e. 'title')
translation (text)       // Field translation
locale (varchar)         // Language code
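
For illustration only, this is roughly how a lookup against such a table could go. It is a sketch in Python with SQLite, not the Translatable module's code; the table name `translatable`, the `translate_object()` helper and the sample data are made up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE translatable (
           tid INTEGER PRIMARY KEY,  -- translation object id
           object_name TEXT,         -- e.g. 'block', 'menu'
           object_key TEXT,          -- e.g. '123' or 'block-2-garland'
           object_field TEXT,        -- e.g. 'title'
           translation TEXT,         -- field translation
           locale TEXT               -- language code
       )"""
)
db.execute(
    "INSERT INTO translatable "
    "(object_name, object_key, object_field, translation, locale) "
    "VALUES ('block', 'block-2-garland', 'title', 'Benutzeranmeldung', 'de')"
)


def translate_object(object_name, object_key, obj, locale):
    """Return a copy of obj with every translated field overlaid."""
    rows = db.execute(
        "SELECT object_field, translation FROM translatable "
        "WHERE object_name = ? AND object_key = ? AND locale = ?",
        (object_name, object_key, locale),
    )
    translated = dict(obj)
    for field, translation in rows:
        if field in translated:
            translated[field] = translation
    return translated


block = {"title": "User login", "region": "left"}
block_de = translate_object("block", "block-2-garland", block, "de")
```

All fields of one object come back in a single query, and fields without a translation simply keep their source value.
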
Jose Reyero’s picture

Issue tags: +i18n sprint
catch’s picture

Just read through this, and saw Gabor's original point about lists of objects:

> On the "list of objects level". If you have an SQL query and a result list of objects, it would be possible to translate the whole list at once, which would be one SQL query per list. The problem is that most Drupal modules do not build up lists of objects, but handle objects one-by-one. If we would require building lists of objects to translate, that would have a noticeable performance impact on sites not using localization, and that was a showstopper for this idea.

That's hopefully going to be the standard pattern in core and contrib by the time Drupal 7 comes out, for nodes (which aggregator feeds will also be), users, taxonomy terms, files, vocabularies, etc. For a node listing, we currently load all taxonomy term objects in a single request to the database, and hook_taxonomy_term_load() acts on all those objects at once. So, I don't see a reason why we couldn't move the dt() function to accepting an array of objects (keyed by object ID). Assuming translations could be fetched via those IDs in one request, we're looking at an extra query per object type instead of an extra query per object, which starts to look more like an extra 5-20 queries per page instead of 1-300 or so.
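
Something along these lines, sketched in Python with SQLite. The `translation` table layout and the `dt_multiple()` name are hypothetical; the only point is that one IN (...) query covers the whole list of one object type.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE translation "
    "(object_type TEXT, object_id INTEGER, field TEXT, language TEXT, translation TEXT)"
)
db.executemany(
    "INSERT INTO translation VALUES (?, ?, ?, ?, ?)",
    [
        ("term", 1, "name", "de", "Nachrichten"),
        ("term", 2, "name", "de", "Sport"),
    ],
)

query_count = 0  # counts translation queries, to show the saving


def dt_multiple(object_type, objects, language):
    """Translate a dict of objects keyed by ID, in place, with one query."""
    global query_count
    if not objects:
        return objects
    placeholders = ", ".join("?" for _ in objects)
    query_count += 1
    rows = db.execute(
        "SELECT object_id, field, translation FROM translation "
        f"WHERE object_type = ? AND language = ? AND object_id IN ({placeholders})",
        (object_type, language, *objects.keys()),
    )
    for object_id, field, translation in rows:
        objects[object_id][field] = translation
    return objects


terms = {1: {"name": "News"}, 2: {"name": "Sports"}, 3: {"name": "Weather"}}
dt_multiple("term", terms, "de")
```

Three terms cost one translation query here instead of three; term 3 has no translation and keeps its source name.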

Ideally, to do this properly, we also need to centralise loading of objects away from direct queries in different places into centralised API loading functions - so there's a single call to dt() (and hence single point of translation) in each module. Again that's happening slowly in the various core systems.

So this seems like a really promising approach to me, which could just use refreshing.

Jose Reyero’s picture

> I don't see a reason why we couldn't move the dt() function to accepting an array of objects (keyed by object ID)

Yes, that would be interesting. Actually, when preparing this patch we had this other option already half done, though in the end we decided to drop it in favor of this simpler approach. Also, as you say, loading lists of objects was not common in D6; hopefully that will be different in D7.
One important requirement for loading lists, though, is that the strings are properly indexed by object type and object id. That would need some extension to the current data model, and maybe doesn't fit very well with the current locale tables.

> Ideally, to do this properly, we also need to centralise loading of objects away
Yes, that's why this one is important: #365899: API methods for schema-based load and delete operations

However, if we want a system that performs well, we may need to think twice before jumping into object translation. Ideally, we shouldn't need to translate objects every time they're loaded, but just before they're rendered for display. So something like the node_view mechanism for all objects (terms, menu items, etc.) would be good here.
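
As a toy illustration of that split (Python, with in-memory dicts standing in for the database; `term_load()` and `term_view()` here are made-up stand-ins, not Drupal functions):

```python
# Source-language data and translations, standing in for database tables.
TERMS = {1: "News", 2: "Sports"}
TRANSLATIONS = {("term", 1, "name", "de"): "Nachrichten"}


def term_load(tid):
    # Loading never touches translations; it returns the source-language object.
    return {"tid": tid, "name": TERMS[tid]}


def term_view(term, language):
    # Translation happens only here, right before the object is rendered.
    key = ("term", term["tid"], "name", language)
    name = TRANSLATIONS.get(key, term["name"])
    return f"<li>{name}</li>"


term = term_load(1)
html_de = term_view(term, "de")
html_fr = term_view(term, "fr")
```

Objects that are loaded but never displayed (access checks, cron, and so on) would then cost no translation work at all.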

nedjo’s picture

Title: Introduce dynamic object translation API, optimization strategies » Object translation option #1: locale system, optimization strategies

See discussion of this and various other object translation API possibilities at http://groups.drupal.org/node/18735.

Jose Reyero’s picture

New write up about why we absolutely need this (i18n sprint), http://groups.drupal.org/node/18816

About the points under discussion here:
1. This may be for limited textgroups (like content type names and descriptions), so caching the whole textgroup is possible.
2. We may add an option for the textgroup to be cacheable / non-cacheable (working on the patch).
3. Also, as 'load_multiple' methods are making their way into D7, localizing the whole list at once becomes more of an option.
4. We've also been talking about the possibility of localizing objects only when rendering (right before); rendering functions may need improvement for that.
5. We are also considering retrieving the string information from the schema ('translatable' property for fields).
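
To illustrate points 1 and 2, here is a minimal sketch of a per-textgroup cacheable flag, in Python. The `TextgroupStore` class, its method names and the sample strings are hypothetical and are not the patch being worked on; `lookups` just counts simulated database queries.

```python
class TextgroupStore:
    def __init__(self, strings, cacheable):
        # strings: {(textgroup, language): {key: translation}}, stand-in for the DB
        self._strings = strings
        self._cacheable = set(cacheable)  # textgroups flagged as cacheable
        self._cache = {}
        self.lookups = 0  # simulated database queries

    def _fetch_group(self, textgroup, language):
        self.lookups += 1
        return dict(self._strings.get((textgroup, language), {}))

    def _fetch_one(self, textgroup, key, language):
        self.lookups += 1
        return self._strings.get((textgroup, language), {}).get(key)

    def translate(self, textgroup, key, language, default):
        if textgroup in self._cacheable:
            cache_key = (textgroup, language)
            if cache_key not in self._cache:
                # Small textgroup: load every string for this language at once.
                self._cache[cache_key] = self._fetch_group(textgroup, language)
            return self._cache[cache_key].get(key, default)
        # Non-cacheable textgroup: one lookup per string.
        value = self._fetch_one(textgroup, key, language)
        return default if value is None else value


store = TextgroupStore(
    {
        ("nodetype", "de"): {"story:name": "Artikel", "page:name": "Seite"},
        ("taxonomy", "de"): {"term:1:name": "Nachrichten"},
    },
    cacheable=["nodetype"],
)
names = [
    store.translate("nodetype", "story:name", "de", "Story"),
    store.translate("nodetype", "page:name", "de", "Page"),
]
lookups_after_nodetype = store.lookups
terms = [
    store.translate("taxonomy", "term:1:name", "de", "News"),
    store.translate("taxonomy", "term:2:name", "de", "Sports"),
]
```

The two content type names cost one lookup in total, while each taxonomy string costs its own.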

Jose Reyero’s picture

For schema-based handling of strings in textgroups, which may work with this one, see #367603: Object translation option #3: 'translatable' schema field attribute, parallel tables

plach’s picture

Version: 7.x-dev » 8.x-dev

It seems we will have to wait and see if dynamic database translation fully addresses this issue and reconsider everything for D8.

andypost’s picture

subscribe

andypost’s picture

Issue summary: View changes

seems this one could be closed

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Version: 8.2.x-dev » 8.3.x-dev

Version: 8.3.x-dev » 8.4.x-dev

Nikita Petrov’s picture

@andypost, was it implemented in D8? I am looking for a way to translate the theme logo field; it is a 'file' field, and I need a different logo file for each language. Is it possible to enable translation for file fields?

andypost’s picture

Version: 8.4.x-dev » 8.5.x-dev
Assigned: Gábor Hojtsy » plach

@Nikita Petrov yes, you can translate the block with the logo and provide different files. I'm not sure a proper UI is there, but this issue is about something different :)

In D8 we use translation objects almost everywhere now, so only the question of optimization is left here.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Version: 8.7.x-dev » 8.8.x-dev

Version: 8.8.x-dev » 8.9.x-dev

Version: 8.9.x-dev » 9.1.x-dev

Version: 9.1.x-dev » 9.2.x-dev

Version: 9.2.x-dev » 9.3.x-dev

Version: 9.3.x-dev » 9.4.x-dev

Version: 9.4.x-dev » 9.5.x-dev

Version: 9.5.x-dev » 10.1.x-dev

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.