Problem/Motivation

On MySQL, VARCHAR defaults to a case insensitive collation. Currently in Drupal there is no way to specify case-sensitive (aka 'BINARY') collation. (PostgreSQL and SQLite both default to case sensitive collation, so this does not apply to them)

Proposed resolution

Patch schema API to support VARCHAR BINARY attribute on MySQL via 'binary' => TRUE. This attribute could be ignored on non-mysql dbs as VARCHAR is already case-sensitive on postgres / sqlite.

The BINARY attribute in MySQL is not to store binary data, that would be VARBINARY and BLOB. It's to store text with a defined character set but a binary case-sensitive collation. The BINARY attribute automatically chooses an appropriate collation for the column's character set. E.g. varchar(255) BINARY is equivalent to varchar(255) CHARACTER SET utf8 COLLATE utf8_bin if the table has DEFAULT CHARSET=utf8.

For Postgres and Sqlite, operations on varchar and text columns are case-sensitive by default, so as far as I know setting binary collation is only useful for MySQL.

Rationale: There are at least two other patches that are blocked on this API addition:

While this is an API change, it is technically only an API addition and thus could be backported to D7.

Remaining tasks

Patch was approved, but discussion is now about how to improve what's being changed, and extend support for PostgreSQL and SQLite

User interface changes

(new or changed features/functionality in the user interface, modules added or removed, changes to URL paths, changes to user interface text)

API changes

(API changes/additions that would affect module, install profile, and theme developers, including examples of before/after code if appropriate)

Original report by [bfroehle]

// Text of original report here.

On MySQL, VARCHAR defaults to a case insensitive collation. Currently in Drupal there is no way to specify case-sensitive (aka 'BINARY') collation. (PostgreSQL and SQLite both default to case sensitive collation, so this does not apply to them).

Suggestion:
Patch schema API to support VARCHAR BINARY attribute on MySQL via 'binary' => TRUE. This attribute could be ignored on non-mysql dbs as VARCHAR is already case-sensitive on postgres / sqlite.

The BINARY attribute in MySQL is not to store binary data, that would be VARBINARY and BLOB. It's to store text with a defined character set but a binary case-sensitive collation. The BINARY attribute automatically chooses an appropriate collation for the column's character set. E.g. varchar(255) BINARY is equivalent to varchar(255) CHARACTER SET utf8 COLLATE utf8_bin if the table has DEFAULT CHARSET=utf8.

For Postgres and Sqlite, operations on varchar and text columns are case-sensitive by default, so as far as I know setting binary collation is only useful for MySQL.

Rationale: There are at least two other patches that are blocked on this API addition:

While this is an API change, it is technically only an API addition and thus could be backported to D7.

Comments

bfroehle’s picture

bfroehle’s picture

Status: Active » Needs review
sun’s picture

Issue tags: +Needs tests

Thanks, this should be covered by database tests though.

bfroehle’s picture

StatusFileSize
new4.52 KB
new2.34 KB

With tests.

sun’s picture

Status: Needs review » Reviewed & tested by the community

Thanks, this looks good to me.

dries’s picture

This looks good to me too. Would be nice to get a couple more +1s though.

bfroehle’s picture

Attribution for the patch, by the way, should go to mfb. (This was set as the author line in the patches, but not noted here).

drewish’s picture

StatusFileSize
new4.55 KB

Updated the docs to make the data type clearer. Side note it should be Boolean not boolean because it's an eponym but I'll save that for a separate issue and do a core wide find and replace.

drewish’s picture

Status: Reviewed & tested by the community » Needs review

I'll add that this has my +1, but I'll kick it back to CNR so someone who didn't touch the patch can double check it.

bfroehle’s picture

Status: Needs review » Reviewed & tested by the community

The #4..#9 interdiff is comments only:

- *     - 'binary': For type 'char', 'varchar' or 'text' fields on MySQL, forces
- *       a case-sensitive binary collation. This has no effect on other database
- *       types for which case sensitivity is already the default behavior.
+ *     - 'binary': A boolean indicating that MySQL should force 'char',
+ *       'varchar' or 'text' fields to use case-sensitive binary collation.
+ *       This has no effect on other database types for which case sensitivity
+ *       is already the default behavior.

This seems reasonable so I'm moving this back to RTBC.

Anonymous’s picture

+1 from me.

Using a binary (utf8_bin) on tables like the date_formats is excellent. I had a similar issue #1006826: Case sensitive custom date formats 'd-m-y' and 'd-M-Y': PDOException: duplicate entry 'd-m-y-custom' and marked it as a duplicate of this issue. This solution from #4 is the right way.

catch’s picture

Category: task » bug
Priority: Normal » Critical

Critical because it is blocking #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case, this part of the patch has just been moved here for easier review.

damien tournoud’s picture

Status: Reviewed & tested by the community » Needs work

Sorry for bumping this back, but I am very uncomfortable with this. The current situation is a total mess that this patch doesn't even come close to resolving.

Before the patch:

MySQL: case insensitive, accent insensitive collation; affects the = operator and LIKE operator
PostgreSQL: collation only affects sort, ILIKE is used instead of LIKE (which is case insensitive but accent sensitive)
SQLite: the BINARY collation is the default: binary, exact matching for =, exact prefix for LIKE

After the patch:

MySQL:
 - For non-binary columns: case insensitive, accent insensitive collation; affects the = operator and LIKE operator
 - For binary columns: binary, exact matching for =, exact prefix for LIKE
PostgreSQL: for both non-binary and binary columns: collation only affects sort, ILIKE is used instead of LIKE (which is case insensitive but accent sensitive)
SQLite: for both non-binary and binary columns: binary, exact matching for =, exact prefix for LIKE

So it seems to me that we need to extend support for PostgreSQL and SQLite:

  • For PostgreSQL: we need to implement the binary column as a real bytea, but how to make the ILIKE operator conditional?
  • For SQLite: we should at least use the NOCASE collation for non-binary column, pending implementation of better collations (see my patch in https://bugs.php.net/bug.php?id=55226)
drewish’s picture

Damien, well how would you propose we deal with #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case? Because the upgrade path is still broken.

makara’s picture

Sub.

From what I see, the different DBs cannot be exactly the same. I'd suggest to keep the current behaviors and only use "case sensitive like" when needed, i.e. for filenames.

sylvain lecoy’s picture

what about adding a 'collate' field in the schema definition ?

jbrown’s picture

[made some edits]

I've spent a few hours researching this. Please correct any mistakes.

We need to have the option to index char and varchar columns with and without case sensitivity.

PostgreSQL may be able to have both indexes for a single column: http://archives.postgresql.org/pgsql-sql/2003-09/msg00395.php , but MySQL cannot: http://lists.mysql.com/mysql/205551 as it is specified in the collation of the column.

So we need an option in the schema for whether or not a column should be case sensitive.

For case sensitive columns:
MySQL - use utf8_bin collation / BINARY keyword
PostgreSQL - default
SQLite - default

For case insensitive columns:
MySQL - use utf8_general_ci

PostgreSQL:

SQLite - use NOCASE collation

jbrown’s picture

Update to my previous post:

I don't think the database layer has access to the schema information, so for case insensitive columns ILIKE would have to be specified in condition() calls. This would then be converted to LIKE for mysql and sqlite. For postresql it would be converted to "LOWER() LIKE LOWER()" (to use the index).

andypost’s picture

Bump. This critical still has no patch and solution.

#19 db-backend has ability to retrieve info from information_schema see schema.inc

I think we can extend LIKE with ILIKE or similar operator

#14 points to postgresql and sqlite - pgsql conversion to bytea could be doen in follow-up but sqlite still depends on unresolved https://bugs.php.net/bug.php?id=55226

mikeryan’s picture

There's no controversy over the patch in #9 being the correct fix for the critical MySQL upgrade issue, correct? Given that inconsistent collation behavior among the database engines is a pre-existing condition that is not made any worse by this patch, why don't we open a separate major issue for normalizing behavior across the other engines, and commit this one?

tim.plunkett’s picture

Tagging.

catch’s picture

iamEAP’s picture

Status: Needs work » Needs review

Agree with mikeryan in #21.

I've broken this out into a new issue #1518506: Normalize how case sensitivity is handled across database engines.

In the meantime, can we take another look at #9?

berdir’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests, -Needs issue summary update, -API addition, -D7 upgrade path, -Needs backport to D7

#9: db_1237252.patch queued for re-testing.

Agreed, triggered a request of the patch in #9, I don't think he will apply anymore :)

Status: Needs review » Needs work

The last submitted patch, db_1237252.patch, failed testing.

berdir’s picture

Here's a re-roll for the patch from #9.

iamEAP’s picture

Status: Needs review » Reviewed & tested by the community

Works for me and there was pretty widespread consensus previously. RTBC.

btmash’s picture

Since this is now strictly concerning mysql, it would be a +1 from me.

Crell’s picture

Status: Reviewed & tested by the community » Needs work

This doesn't appear to address #14 and #18, both of which are valid concerns. This feels more like a MySQL hack than a proper solution to still more database fugliness.

Crell’s picture

Status: Needs work » Reviewed & tested by the community

Update: Talked with catch a bit more in IRC. I totally missed the link to #1518506: Normalize how case sensitivity is handled across database engines above, probably because it was not colored. :-) Given that the follow-up is critical, I can deal with this as a temporary solution to un-bork the update process.

catch’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

I agree with getting the easier fix in to unblock the upgrade path fixes, those two {files} upgrade issues are very painful.

Committed/pushed to 8.x, marking for backport to 7.x.

iamEAP’s picture

Status: Patch (to be ported) » Needs review
StatusFileSize
new3.83 KB

Same against 7.x.

Status: Needs review » Needs work

The last submitted patch, core-binary_attr_mysql-1237252-33.patch, failed testing.

drumm’s picture

Status: Needs work » Needs review
StatusFileSize
new4.37 KB

Reroll.

webchick’s picture

Issue tags: +drupal.org D7
iamEAP’s picture

Status: Needs review » Reviewed & tested by the community

I'm using #35 in an upgrade process, myself. Marking RTBC, but could hear from others, possibly.

mfb’s picture

hrmph I didn't get commit credit after all :p Looks RTBC to me.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Great, thanks a lot, all. mfb, I gave you credit in D7 anyway. :)

Committed and pushed to 7.x. Thanks!

webchick’s picture

Issue tags: +7.13 release notes

x

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

reformat issue summary