The locale module is using MySQL specific data types for some key language related tables:

     'translation' => array(
        'type' => 'text',
        'mysql_type' => 'blob',
        'not null' => TRUE,
        'description' => 'Translation string value in this language.',
      ),
      'source' => array(
        'type' => 'text',
        'mysql_type' => 'blob',
        'not null' => TRUE,
        'description' => 'The original string in English.',
      ),

BLOB behaves as CASE SENSITIVE, and because the binary spec has not been set on these columns, when deployed on other database engines these columns are CASE INSENSITIVE.

From the MySQL docs:

For nonbinary strings (CHAR, VARCHAR, TEXT), string searches use the collation of the comparison operands. For binary strings (BINARY, VARBINARY, BLOB), comparisons use the numeric values of the bytes in the operands; this means that for alphabetic characters, comparisons will be case sensitive.

This completely messes up the caching logic in the locale() method - the one doing translations on all the website - and because the caching is broken (it expects a CASE SENSITIVE behaviour) the result is a big bunch of database queries issued on every request!

Porposed solution:

Add the binary specification to these fields, and a core update to fix already created columns.

Comments

david_garcia’s picture

Issue summary: View changes
david_garcia’s picture

Version: 7.x-dev » 8.0.x-dev

Thanks god this is still an issue in D8, otherwise it would have never got fixed in D7.

In locale_schema():

      'source' => array(
        'type' => 'text',
        'mysql_type' => 'blob',
        'not null' => TRUE,
        'description' => 'The original string in English.',
      ),
andypost’s picture

Issue tags: +D8MI, +language-base
mradcliffe’s picture

I think right now, locales_source and locales_target would be case sensitive on PostgreSQL, but not in SQLite because of #2454733: Add a user-space case-insensitive collation to the SQLite driver. It may make sense to create a test for SQLite to fail to demonstrate the problem.

@david_garcia, sqlsrv needs to treat this as "blob" type too, right? What about mongo? Does it matter? Oracle driver?

david_garcia’s picture

Rather than using database specific types, we should rely on the abstraction layer and only use the "binary" spec to tell this must be case insensitive.

MS SQL uses collations to deal with cases so no need to treat this as a blob or binary storage. And we have the binary spec for that, so we don't care about what the internal storage really is. We want to store text, and it must be case sensitive.

I cannot speak about the other database engines because I am totally lost there.

Regarding POSTGRE, in this issue:

#2464481: PostgreSQL: deal with case insensitivity

they are trying to bring case insensitivity into play, if that happens, the issue will appear on postgre too.

Thus, the portable field spec should be something like this:

      'source' => array(
        'type' => 'text',
        'binary' => TRUE,
        'not null' => TRUE,
        'description' => 'The original string in English.',
      ),

(removed the specific MySQL type and added the binary spec)

mradcliffe’s picture

Title: Locale caching algorithm is broken on Non MySQL databases » Locale caching algorithm is broken on Non MySQL/PostgreSQL databases

If binary or maybe case-sensitive were a required property in schema API, then I could solve the PostgreSQL case sensitive issue. We would then need to add that to all hook_schema() implementations and look through the field and entity storage property definitions for Sql as well. Making it mandatory is definitely a disruptive change thoug.

david_garcia’s picture

To my understading, omitting the binary spec means 'binary=FALSE'.

That's how it behaves now on MySQL, right? If you do not use the binary spec, it uses the default collation that is " utf8_general_ci" and that behaves as case insensitive.

So there would be no need to force the 'binary' spec on all schema definitions, just make sure that it is clear that omiting the binary spec means "case insensitive" by default.

catch’s picture

Status: Active » Closed (duplicate)

#2580671: Locale DB schema case insensitive (blob) only on MySQL not on other databases is a newer issue but has more details on the problem, marking this as duplicate but see you over there!