The locale module uses MySQL-specific data types for some key language-related tables:
  'translation' => array(
    'type' => 'text',
    'mysql_type' => 'blob',
    'not null' => TRUE,
    'description' => 'Translation string value in this language.',
  ),
  'source' => array(
    'type' => 'text',
    'mysql_type' => 'blob',
    'not null' => TRUE,
    'description' => 'The original string in English.',
  ),
BLOB comparisons are CASE SENSITIVE on MySQL, but because the 'binary' spec has not been set on these columns, they end up CASE INSENSITIVE when deployed on other database engines.
From the MySQL docs:
For nonbinary strings (CHAR, VARCHAR, TEXT), string searches use the collation of the comparison operands. For binary strings (BINARY, VARBINARY, BLOB), comparisons use the numeric values of the bytes in the operands; this means that for alphabetic characters, comparisons will be case sensitive.
This completely breaks the caching logic in the locale() function (the one that translates strings across the whole site): because the cache expects case-sensitive behaviour, the result is a large number of database queries issued on every request!
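To make the failure mode concrete, here is a hypothetical illustration (not code from core; the query and table name are only a sketch of the lookup locale() performs):

```php
// Hypothetical illustration of the collision described above.
// These are two distinct source strings for the translation system:
$sources = array('Home', 'home');
// With a case-insensitive column, a lookup such as
//   SELECT * FROM {locales_source} WHERE source = 'home'
// matches both rows, so the set of strings locale() believes are
// stored never agrees with what the database actually returns, and
// it falls back to querying again on every request.
```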
Proposed solution:
Add the binary specification to these fields, and a core update to fix already created columns.
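A sketch of what this could look like in Drupal 7 terms. The update number 7xxx is a placeholder, and only one of the two columns is shown; this is an assumption about the shape of the fix, not a committed patch:

```php
// Hypothetical sketch: portable spec for the translation column.
// 'binary' => TRUE asks the Schema API for case-sensitive storage;
// the MySQL driver then picks a blob-style type on its own, so no
// 'mysql_type' override is needed.
$spec = array(
  'type' => 'text',
  'not null' => TRUE,
  'binary' => TRUE,
  'description' => 'Translation string value in this language.',
);

/**
 * Sketch of a core update to fix already-created columns.
 */
function locale_update_7xxx() {
  db_change_field('locales_target', 'translation', 'translation', array(
    'type' => 'text',
    'not null' => TRUE,
    'binary' => TRUE,
    'description' => 'Translation string value in this language.',
  ));
}
```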
Comments
Comment #1
david_garcia CreditAttribution: david_garcia commented
Comment #2
david_garcia CreditAttribution: david_garcia commented
Thank God this is still an issue in D8, otherwise it would never have been fixed in D7.
In locale_schema():
Comment #3
andypost
Comment #4
mradcliffe
I think right now, locales_source and locales_target would be case sensitive on PostgreSQL, but not in SQLite, because of #2454733: Add a user-space case-insensitive collation to the SQLite driver. It may make sense to create a failing test for SQLite to demonstrate the problem.
@david_garcia, sqlsrv needs to treat this as "blob" type too, right? What about mongo? Does it matter? Oracle driver?
Comment #5
david_garcia CreditAttribution: david_garcia commented
Rather than using database-specific types, we should rely on the abstraction layer and only use the 'binary' spec to say that these columns must be case sensitive.
MS SQL uses collations to deal with case, so there is no need to treat this as blob or binary storage. And we have the 'binary' spec for that, so we do not care what the internal storage really is. We want to store text, and it must be case sensitive.
I cannot speak about the other database engines because I am totally lost there.
Regarding PostgreSQL, in this issue:
#2464481: PostgreSQL: deal with case insensitivity
they are trying to bring case insensitivity into play; if that happens, the issue will appear on PostgreSQL too.
Thus, the portable field spec should be something like this (with the MySQL-specific type removed and the 'binary' spec added):
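The spec the commenter describes appears to have been lost in extraction; reconstructed from the original definitions above (dropping 'mysql_type' and adding 'binary'), it would presumably read:

```php
// Reconstructed portable spec: 'mysql_type' => 'blob' removed,
// 'binary' => TRUE added so every driver knows the columns must be
// case sensitive.
'translation' => array(
  'type' => 'text',
  'not null' => TRUE,
  'binary' => TRUE,
  'description' => 'Translation string value in this language.',
),
'source' => array(
  'type' => 'text',
  'not null' => TRUE,
  'binary' => TRUE,
  'description' => 'The original string in English.',
),
```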
Comment #6
mradcliffe
If 'binary' (or maybe a 'case-sensitive' property) were a required property in the Schema API, then I could solve the PostgreSQL case-sensitivity issue. We would then need to add it to all hook_schema() implementations and look through the field and entity storage property definitions for Sql as well. Making it mandatory is definitely a disruptive change, though.
Comment #7
david_garcia CreditAttribution: david_garcia commented
To my understanding, omitting the 'binary' spec means 'binary' => FALSE.
That is how it behaves now on MySQL, right? If you do not use the 'binary' spec, the column gets the default collation, utf8_general_ci, which is case insensitive.
So there would be no need to force the 'binary' spec on all schema definitions; just make sure it is clear that omitting the 'binary' spec means "case insensitive" by default.
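For illustration, the two behaviours described above side by side (a sketch of how the MySQL driver is commonly understood to map these specs; the exact collation depends on the site's defaults, utf8_general_ci is assumed here):

```php
// Hypothetical illustration of the mapping described above.

// Without 'binary': roughly a TEXT column with the default collation
// (e.g. utf8_general_ci), so comparisons treat 'Foo' and 'foo' as
// equal (case insensitive).
$spec_default = array('type' => 'text', 'not null' => TRUE);

// With 'binary' => TRUE: roughly a blob-style column, where
// comparisons use the raw byte values, so 'Foo' and 'foo' are
// distinct (case sensitive).
$spec_binary = array('type' => 'text', 'not null' => TRUE, 'binary' => TRUE);
```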
Comment #8
catch
#2580671: Locale DB schema case insensitive (blob) only on MySQL not on other databases is a newer issue but has more details on the problem, so I am marking this one as a duplicate, but see you over there!