The locale module uses MySQL-specific data types for some key language-related tables:
  'translation' => array(
    'type' => 'text',
    'mysql_type' => 'blob',
    'not null' => TRUE,
    'description' => 'Translation string value in this language.',
  ),
  'source' => array(
    'type' => 'text',
    'mysql_type' => 'blob',
    'not null' => TRUE,
    'description' => 'The original string in English.',
  ),
BLOB comparisons are CASE SENSITIVE on MySQL, but because the 'binary' spec has not been set on these columns, they end up CASE INSENSITIVE when deployed on other database engines.
From the MySQL docs:
For nonbinary strings (CHAR, VARCHAR, TEXT), string searches use the collation of the comparison operands. For binary strings (BINARY, VARBINARY, BLOB), comparisons use the numeric values of the bytes in the operands; this means that for alphabetic characters, comparisons will be case sensitive.
This completely breaks the caching logic in the locale() function (the one that translates strings across the whole site): because the cache expects case-sensitive behaviour, the result is a large number of database queries issued on every request!
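To make the failure mode concrete, here is a hypothetical illustration (not code from core; the query and table name are only a sketch of the lookup locale() performs):

```php
// Hypothetical illustration of the collision described above.
// These are two distinct source strings for the translation system:
$sources = array('Home', 'home');
// With a case-insensitive column, a lookup such as
//   SELECT * FROM {locales_source} WHERE source = 'home'
// matches both rows, so the set of strings locale() believes are
// stored never agrees with what the database actually returns, and
// it falls back to querying again on every request.
```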
Proposed solution:
Add the binary specification to these fields, and a core update to fix already created columns.
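A sketch of what this could look like in Drupal 7 terms. The update number 7xxx is a placeholder, and only one of the two columns is shown; this is an assumption about the shape of the fix, not a committed patch:

```php
// Hypothetical sketch: portable spec for the translation column.
// 'binary' => TRUE asks the Schema API for case-sensitive storage;
// the MySQL driver then picks a blob-style type on its own, so no
// 'mysql_type' override is needed.
$spec = array(
  'type' => 'text',
  'not null' => TRUE,
  'binary' => TRUE,
  'description' => 'Translation string value in this language.',
);

/**
 * Sketch of a core update to fix already-created columns.
 */
function locale_update_7xxx() {
  db_change_field('locales_target', 'translation', 'translation', array(
    'type' => 'text',
    'not null' => TRUE,
    'binary' => TRUE,
    'description' => 'Translation string value in this language.',
  ));
}
```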
Comments
Comment #1
david_garcia CreditAttribution: david_garcia commented
Comment #2
david_garcia CreditAttribution: david_garcia commented
Thank God this is still an issue in D8, otherwise it would never have been fixed in D7.
In locale_schema():
Comment #3
andypost
Comment #4
mradcliffe
I think right now, locales_source and locales_target would be case sensitive on PostgreSQL, but not in SQLite, because of #2454733: Add a user-space case-insensitive collation to the SQLite driver. It may make sense to create a failing test for SQLite to demonstrate the problem.
@david_garcia, sqlsrv needs to treat this as "blob" type too, right? What about mongo? Does it matter? Oracle driver?
Comment #5
david_garcia CreditAttribution: david_garcia commented
Rather than using database-specific types, we should rely on the abstraction layer and only use the 'binary' spec to say that these columns must be case sensitive.
MS SQL uses collations to deal with case, so there is no need to treat this as blob or binary storage. And we have the 'binary' spec for that, so we do not care what the internal storage really is. We want to store text, and it must be case sensitive.
I cannot speak about the other database engines because I am totally lost there.
Regarding PostgreSQL, in this issue:
#2464481: PostgreSQL: deal with case insensitivity
they are trying to bring case insensitivity into play; if that happens, the issue will appear on PostgreSQL too.
Thus, the portable field spec should be something like this (with the MySQL-specific type removed and the 'binary' spec added):
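The spec the commenter describes appears to have been lost in extraction; reconstructed from the original definitions above (dropping 'mysql_type' and adding 'binary'), it would presumably read:

```php
// Reconstructed portable spec: 'mysql_type' => 'blob' removed,
// 'binary' => TRUE added so every driver knows the columns must be
// case sensitive.
'translation' => array(
  'type' => 'text',
  'not null' => TRUE,
  'binary' => TRUE,
  'description' => 'Translation string value in this language.',
),
'source' => array(
  'type' => 'text',
  'not null' => TRUE,
  'binary' => TRUE,
  'description' => 'The original string in English.',
),
```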
Comment #6
mradcliffe
If 'binary' (or maybe a 'case-sensitive' property) were a required property in the Schema API, then I could solve the PostgreSQL case-sensitivity issue. We would then need to add it to all hook_schema() implementations and look through the field and entity storage property definitions for Sql as well. Making it mandatory is definitely a disruptive change, though.
Comment #7
david_garcia CreditAttribution: david_garcia commented
To my understanding, omitting the 'binary' spec means 'binary' => FALSE.
That is how it behaves now on MySQL, right? If you do not use the 'binary' spec, the column gets the default collation, utf8_general_ci, which is case insensitive.
So there would be no need to force the 'binary' spec on all schema definitions; just make sure it is clear that omitting the 'binary' spec means "case insensitive" by default.
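For illustration, the two behaviours described above side by side (a sketch of how the MySQL driver is commonly understood to map these specs; the exact collation depends on the site's defaults, utf8_general_ci is assumed here):

```php
// Hypothetical illustration of the mapping described above.

// Without 'binary': roughly a TEXT column with the default collation
// (e.g. utf8_general_ci), so comparisons treat 'Foo' and 'foo' as
// equal (case insensitive).
$spec_default = array('type' => 'text', 'not null' => TRUE);

// With 'binary' => TRUE: roughly a blob-style column, where
// comparisons use the raw byte values, so 'Foo' and 'foo' are
// distinct (case sensitive).
$spec_binary = array('type' => 'text', 'not null' => TRUE, 'binary' => TRUE);
```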
Comment #8
catch
#2580671: Locale DB schema case insensitive (blob) only on MySQL not on other databases is a newer issue but has more details on the problem, so I am marking this one as a duplicate, but see you over there!