Needs work
Project:
Drupal core
Version:
main
Component:
locale.module
Priority:
Normal
Category:
Task
Assigned:
Unassigned
Reporter:
Created:
12 Jul 2010 at 08:06 UTC
Updated:
8 Apr 2019 at 09:44 UTC
Comments
Comment #1
damien tournoud commented
Hm. I'm not sure about that.
1. The hash is not guaranteed to be unique. As a consequence, it should not be a unique key
2. We should have both hash = :hash and source = :source
One method I would consider is to combine the partial-index feature of MySQL and the functional index feature of other database engines by making an index on (context, source(30)) and querying on:
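The combined-index approach can be sketched as follows, using SQLite's expression indexes as a stand-in for MySQL's prefix indexes and PostgreSQL's functional indexes. The table mirrors a simplified {locales_source}; this is an illustration of the idea, not core code.

```python
import sqlite3

# Simplified stand-in for Drupal's {locales_source} table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locales_source (lid INTEGER PRIMARY KEY, source TEXT, context TEXT)"
)
# Expression index playing the role of MySQL's (context, source(30)) prefix
# index or a PostgreSQL functional index.
conn.execute(
    "CREATE INDEX source_context ON locales_source (context, substr(source, 1, 30))"
)
conn.execute("INSERT INTO locales_source (source, context) VALUES ('Add content', '')")

# The prefix comparison lets the index narrow the candidates; the full source
# comparison keeps the result exact even when two strings share a prefix.
row = conn.execute(
    "SELECT lid FROM locales_source "
    "WHERE context = ? AND substr(source, 1, 30) = substr(?, 1, 30) AND source = ?",
    ("", "Add content", "Add content"),
).fetchone()
print(row)  # → (1,)
```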
This query can be optimized by at least MySQL, PostgreSQL and SQL Server: MySQL will satisfy context = :context AND source = :source with its partial index, while PostgreSQL and SQL Server should be able to satisfy context = :context AND SUBSTRING(source, 1, 30) = SUBSTRING(:source, 1, 30) using a functional index (or what SQL Server calls an indexed computed column, but the idea is the same).
Comment #2
andypost
Here's a benchmark of the front page with 20 nodes and locale enabled:
ab -n 50 -c 2 http://drupal7/
before patch
after patch
Comment #4
andypost
Agreed about uniqueness: because we have a version column, the same source could be stored for different versions.
A functional index is the current implementation, but import could be much faster because less data is transferred between PHP and MySQL.
Also, some contrib modules change these indexes, like i18n does: #803380: locales_source.location index
Comment #5
andypost
Patch with a normal index.
Comment #6
andypost
Marked as duplicate of #811158: We should not index on (source, context)
Comment #7
damien tournoud commented
Could we have a reroll of this one?
Comment #8
andypost
For D8 we don't need hook_update_N().
For D7 there's trouble with the update, because update.php tries to use t().
Comment #9
c960657 commented
No, we don't keep individual sources for different versions. The version field is for keeping a record of when we last asked for a translation of the string.
I guess Damien's point is that you cannot be sure that the hash function will produce a unique string (?). There is a (theoretical) possibility that two strings will generate the same hash. If your code assumes that the hashes are unique, you might as well mark the column as unique.
I suggest we add a separator, e.g. \0, to avoid collisions between e.g. $source='foo', $context='bar' and $source='foobar', $context=''.
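A minimal demonstration of the collision the separator prevents, using Python's hashlib as a stand-in for the PHP hashing (purely illustrative, not core code):

```python
import hashlib

def hash_key(source, context, sep=""):
    # Concatenate source and context with an optional separator and hash.
    return hashlib.md5((source + sep + context).encode()).hexdigest()

# Without a separator, ('foo', 'bar') and ('foobar', '') concatenate to the
# same string "foobar" and therefore collide.
print(hash_key("foo", "bar") == hash_key("foobar", ""))              # True
# With a "\0" separator the two concatenations differ, so the hashes differ.
print(hash_key("foo", "bar", "\0") == hash_key("foobar", "", "\0"))  # False
```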
It is no longer an MD5 hash.
drupal_hash_base64() seems to always return a 44-character string.
I suggest we name the column just "hash" (or "hash_key", assuming it is unique).
Comment #10
andypost
Not sure about uniqueness - data could be migrated. Also, "hash" is a reserved word in SQL.
Agreed on "\0" separator!
Field comment fixed.
Added a fix to not preprocess JS files in MAINTENANCE_MODE, because preprocessing tries to access the table by the new hash column.
Comment #11
c960657 commented
I think we should kill the duplicates in the upgrade hook. Having duplicates leads to undefined results (with and without the patch). For each duplicated source we can easily delete those that have not been translated. If more than one has been translated, we should keep the one that is returned by the default SELECT query used in locale() (assuming that the query always returns the same row - if it doesn't, then we can't do better than just keeping a random row).
Hmm, the name is used in {registry_file}.
Comment #12
andypost
A unique hash column could be a PITA if we don't take VERSION into account.
The idea of cleaning up duplicates is great, but out of scope for this patch. OTOH I found no contrib tools to do it; suppose this could be a separate task within the upgrade process. We have a lot of sites that moved 5->6->7, so they have a lot of outdated strings.
EDIT: "hash" is a reserved word in Oracle PL/SQL.
Comment #13
droplet commented
Sub from #705284: Locale strings Import Performance
ref: #622500: Add hashkey to strings and translations for quicker comparison
In the l10n.drupal.org case, it compares 100 times as many strings as Drupal core, but string import is still very fast (3 times faster than Drupal core at importing the same .po). It seems we still have some room for improvement.
Comment #14
catch
Haven't reviewed the patch here, but someone opened #1131048: Interface translation import times out during install (and there are duplicates) suggesting the installer won't complete on some hosting due to locale strings import - would this help with that?
Comment #15
droplet commented
Is drupal_hash_base64() too complex in this case? We only need a unique key here.
Using CRC32 is another choice.
From l10n data, we have only a few collisions; CRC32 saves more space & comparisons.
My l10n test data: ~196310 rows
SELECT CRC32(value) AS crc, COUNT(value) AS total FROM l10n_server_string GROUP BY crc HAVING total > 1
Result: 22 rows
** I haven't done a performance test.
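The same collision check can be approximated outside the database. A sketch using Python's zlib.crc32 on a synthetic corpus (the real test above ran against the l10n_server_string table, so the exact counts will differ):

```python
import zlib
from collections import Counter

# Synthetic stand-in for ~200k distinct source strings.
strings = ["String number %d" % i for i in range(200_000)]

# Count how many CRC32 values are shared by more than one string, mirroring
# the GROUP BY ... HAVING total > 1 query from the comment above.
counts = Counter(zlib.crc32(s.encode()) for s in strings)
collisions = sum(1 for c in counts.values() if c > 1)
print(collisions)  # a handful at most: CRC32 is 32-bit, so rare but real
```

The birthday bound predicts on the order of n²/2³³ colliding values for n strings, which for 200k strings is only a few, matching the 22-in-196310 observation above in order of magnitude.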
Comment #16
gábor hojtsy
Comment #17
gábor hojtsy
Adding the UI language translation tag.
Comment #18
andypost
SQLite does not support md5() and other such functions, so it doesn't matter which algorithm we use.
For performance reasons there's only one bottleneck: translation import.
Comment #19
gerhard killesreiter commented
Gabor asked me for input on this.
He asked about using md5: I don't see this as a problem, since we already use it for cache keys (e.g. filter cache).
I think benchmarks in this case should be done by evaluating queries directly in the SQL CLI, since Apache/PHP add too much fluctuation.
Comment #20
gábor hojtsy
OK, so it looks like we don't want to use drupal_hash_base64() then.
Comment #21
sutharsan commented
Rerolled #10. None of the recent comments have been included yet.
Comment #22
droplet commented
See #15 & #20.
Comment #23
sutharsan commented
crc32 hash implemented. Known problem: the upgrade test fails because D7 tables have no hashkey field (yet).
For performance tests with different hash algorithms I suggest a few scenarios:
The first test gives a normal situation. The second test disables the short string cache. The third test puts the highest load on the database by requesting many translations.
Comment #25
droplet commented
Just querying on CRC32 is not safe; it should add a comparison, e.g.:
SELECT lid FROM {locales_source} WHERE hashkey = :hashkey AND source = :source
Theoretically, other hash functions are the same, but most scripts (and Drupal core?) assume they never hit collisions. (I'm not sure.)
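A hedged sketch of this hash-then-verify lookup pattern, using SQLite and CRC32 in Python. Table and index names are illustrative, not the actual core schema:

```python
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locales_source (lid INTEGER PRIMARY KEY, source TEXT, hashkey INTEGER)"
)
# A plain (non-unique) index: CRC32 can collide, so uniqueness is not assumed.
conn.execute("CREATE INDEX hashkey_idx ON locales_source (hashkey)")

def add_string(source):
    conn.execute(
        "INSERT INTO locales_source (source, hashkey) VALUES (?, ?)",
        (source, zlib.crc32(source.encode())),
    )

def lookup(source):
    # The hash narrows the index scan; comparing the full source guards
    # against the rare CRC32 collision.
    return conn.execute(
        "SELECT lid FROM locales_source WHERE hashkey = ? AND source = ?",
        (zlib.crc32(source.encode()), source),
    ).fetchone()

add_string("Add content")
print(lookup("Add content"))    # → (1,)
print(lookup("No such string")) # → None
```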
Comment #26
andypost
"\0" is a bad separator; more info can be found at #532512: Plural string storage is broken, editing UI is missing
In that issue we introduced LOCALE_PLURAL_DELIMITER, which is "\3". Also, "\0" is not compatible with pgsql.
Not sure we should proceed with crc32 - all of core uses md5() and drupal_hash_base64().
So there's no reason to implement a third hash algorithm. Let's convert this to md5().
Comment #27
gábor hojtsy
Well, the \0 will never actually reach the database in this patch, right? So whether it is compatible with some database does not matter.
Comment #28
sutharsan commented
@droplet, fully agree that crc32 alone is not safe. It was not meant to be; it is only meant to be a database accelerator. My criteria for a hash are 1. database performance and 2. PHP performance. We don't need a secure or (absolutely) unique result; we only need a relatively unique result to speed up the database index. I think 22 duplicates in 200k strings is unique enough. The combination with 'source' will do the trick. But performance tests may have the final answer.
@gabor, agreed that "\0" does not hit the database. Additionally, I checked that CRC32 produces a different result with and without this separator.
Comment #29
sutharsan commented
Source, context and hash included in queries where applicable.
Patch re-rolled because of #746240: Race condition in locale() - duplicates in {locales_source}
Comment #30
sutharsan commented
I carried out performance tests. I have found no grounds to use a hashkey for performance improvement.
The tests
Two kinds of test were performed:
With three variations each:
All tests are carried out with increasing number of records in the locales_source and locales_target database tables.
The results
1. No significant difference in performance between crc32 and md5.
2. Import is slightly slower when using an additional hashkey column.
3. Reading translations is not measurably slower when using an additional hashkey column.
Note that all measurements show a large variation in results; +/- 20% is no exception! The variation is smaller when measuring with MySQL only (test 2), and larger when using both PHP and MySQL (test 1).
For details see the attachments.
One series of tests was carried out using a different patch in which the locales_source table had only an index on the 'hashkey' column. No significant performance difference was found.
Comment #32
andypost
This query should run without s.source = 'Add content' AND s.context = ''. The reason to use a hash is to eliminate querying on the source blob field! The string used for the search is md5($source . "\3" . $context), which should be faster.
source & context should be removed
same
context is always a string
source and context should be removed
Comment #33
sutharsan commented
@andypost, your assumption is that the hashkey is unique. As @droplet showed in #15, CRC32 is not usable. It would be nice if we could test md5($source . "\0" . $context) with all source strings of l.d.o. But chances are good: this probability table calculates the chance of two strings having the same md5 hash as less than 10^-18 (l.d.o contains approx. 350000 source strings; md5 is a 128-bit hash).
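The quoted probability follows from the birthday bound; a quick back-of-the-envelope check (the 350000 figure comes from the comment above):

```python
# Birthday bound: for n random strings and a b-bit hash, the probability of
# at least one collision is approximately n^2 / 2^(b+1).
n = 350_000   # approximate number of source strings on l.d.o
b = 128       # MD5 output width in bits
p = n * n / 2 ** (b + 1)
print(p)  # on the order of 1e-28, comfortably below 10^-18
```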
Comment #34
droplet commented
@Sutharsan,
LDO also makes an assumption on the hashkey.
If we assume the md5 hashkey is unique, I think we can move the hashkey into the locales_target table and remove the LEFT JOIN?
Comment #35
sutharsan commented
@droplet,
What assumption does l.d.o make on the hashkey?
We could do away with lid, but not with the LEFT JOIN. And replacing lid with hashkey would significantly increase the amount of data in locales_target. And less is faster.
Comment #36
sutharsan commented
Hashkey changed to md5, source and context removed from the queries, source_context index removed. Todo: upgrade path.
New performance tests show approximately equal results with and without the hashkey. Executing t() and format_plural() with the hashkey takes 2-6% longer.
Comment #37
sutharsan commented
Comment #38
andypost
LDO uses a hashkey to look up an existing string in l10n_drupal_callback_save_string(), so its import is much faster.
Also see other places: _l10n_community_import_one_string() and l10n_gettext_store_string().
As committed in #532512: Plural string storage is broken, editing UI is missing, it's better to use \3 as the separator, but LDO does:
Comment #39
sutharsan commented
_locale_import_one_string_db() does use the hashkey to look up an existing source. Note that #1189184: OOP & PSR-0-ify gettext .po file parsing and generation will overhaul the import process. My bigger worry is the reduced performance of t() and format_plural() due to this change. If we can't fix that, let's drop the issue.
Comment #40
andypost
Changed hash('md5') to md5() - my tests show it's about 2 times faster.
Added an update hook; let's wait for the upgrade tests.
I think the hashkey field should be pre-populated on update, and we actually need a unique index on it.
Comment #41
andypost
Suppose we should add the column somewhere in update_prepare_d8_bootstrap().
EDIT: In #12 I mentioned the VERSION key.
Comment #42
andypost
Let's discuss a measurement strategy, taking into account that we have pgsql and sqlite too.
The patch adds an update function, locale_update_8006(), with a batch to calculate hashes. Probably we should run this update right after locale_update_8002(), before we get new duplicates in strings.
This hunk is removed in the patch.
Suppose we need another approach to prevent string duplicates, because update.php could be run in non-English, and while no hashes are calculated we get duplicates that were killed in locale_update_8002().
Comment #43
andypost
Another idea is to use this hash in the upcoming core i18n_server integration, so having the same algorithm as the server is preferable. Related: #1445004: Implement customized translation bit on translations
The bot is happy; now we need performance tests to pass the gate.
Comment #44
podarok
#43 @andypost
What kind of tests do we need?
Just ab?
Or possibly just a menu callback with some kind of locale code?
Comment #45
sutharsan commented
My strategy was to use focused testing with minimum influence from processes we are not working on. Testing with 'realistic' situations does not give useful information. As an example, let's test the response time of the Drupal front page using ab. The variation in the response time is large (StdDev > 10%), approx. 50 strings are loaded through t(), translated strings are cached in t() per page call, and translated strings are cached by locale() in the database. A single (uncached) call of t() takes approx. 0.2 msec, so 50 t() calls on the home page take approx. 10 msec. Even large variations of 20% in t() execution (= 2 msec) will never be noticed in ab tests of the home page (70 - 100 msec). Testing realistic situations is not usable; we need focused testing.
I executed three kinds of tests:
All tests were performed with a varying number of strings in the database, simulating small and medium size sites.
No. 1 tests the database performance of a set of select queries which could be executed by t().
No. 2 tests the execution time of the import function _locale_import_read_po(). Time is measured by adding time_start() and time_read() functions before and after the function call.
No. 3 tests the execution time of a large number of t() and format_plural() function calls. Internal caching is turned off as much as possible to focus on the database retrieval. For this I have written a tTest module which you can find in my sandbox. The module calls t() or format_plural() with a fixed set of strings taken from D7 core. Locale string caching was turned off during all tests.
Comment #46
droplet commented
Back to needs work first:
- update.php fails when Localize is enabled.
- the hashkey field is not added in locale_update_8006.
- switching between patched/non-patched leaves some strings without a hashkey (this may also affect benchmark results).
Comment #47
andypost
Also we need to discuss a hash strategy - I think it's good to have the same hash-generation algorithm as l10n_server.
@droplet, re 1 & 2: the field is added in update_prepare_d8_language().
Comment #48
andypost
I'm going to change the current storage: remove lid and add hashkey as the primary key for locales_source, and hashkey + langcode as the primary key for locales_target. What to do with the version column? The only possible performance degradation could appear on joins... we can test that later.
Benefits:
1) easy to sync with l.d.o
2) kills duplicates by the nature of the PK
3) probably makes mass string import easier
Comment #49
gábor hojtsy
The version column, once again, is only used in locale() for lookup I believe (and is updated on use), to limit the size of the in-memory cache for short strings (if you had strings left over from previous versions). I'm not sure of its effectiveness: with Drupal releases spread this far apart (3 years in between them), and with many more contribs in use which also change a lot (but are not represented in this version tracking), I think it's likely that the version tracking does not have much usefulness (and it slows down pages on major version updates as long as these updates happen for strings).
Comment #50
gábor hojtsy
Correction: it looks like the version column is also used for minor versions, in which case this might still be a good way to weed out outdated strings on a site. That is really the only use of it that I know of.
Comment #51
soul88
As I'm already aware of this: http://drupal.org/node/1189184 I know that the following code will be thrown away at some point. But as I've already written it, I'll post it here. It may show how batch inserts/updates could be implemented in the current situation.
1. I didn't write it to the end and didn't test it, so it's more a sketch of the idea than code ready for use.
2. It relies on the schema changes posted by andypost and modified at the code sprint today. We assumed that the hash should be the substitution for "lid" as the PK.
P.S. As far as I can tell, db_merge() can't do batch updates, which is why I decided to do insert+delete instead of one update.
P.P.S. The motivation for this code comes from here: http://drupal.org/node/361597#comment-5899520
Comment #52
jair commentedComment #53
alansaviolobo commented
42: 851362-locale-hashkey-42.patch queued for re-testing.
Comment #55
alansaviolobo commented
Re-rolled the patch.
Comment #57
alansaviolobo commented
Comment #58
alansaviolobo commented
Ignore the previous comment.
Comment #59
alansaviolobo commented
Comment #61
gábor hojtsy
Comment #64
rodrigoaguilera
I will reroll this.
Comment #65
rodrigoaguilerarerrol with no conflicts
Comment #66
rodrigoaguilera
Comment #67
gábor hojtsy
OK, how does this change the performance of imports and lookups? @Sutharsan in #25 explained a strategy but did not explain the results AFAICS. We need performance results.
Comment #68
andypost
Comment #70
damienmckenna
Bumping to the 8.2.x branch.
Comment #77
mpp commented
Note that #65 is still using the old array notation.