Problem/Motivation
On MySQL, VARCHAR defaults to a case insensitive collation. Currently in Drupal there is no way to specify case-sensitive (aka 'BINARY') collation. (PostgreSQL and SQLite both default to case sensitive collation, so this does not apply to them)
Proposed resolution
Patch schema API to support VARCHAR BINARY attribute on MySQL via 'binary' => TRUE. This attribute could be ignored on non-mysql dbs as VARCHAR is already case-sensitive on postgres / sqlite.
The BINARY attribute in MySQL is not to store binary data, that would be VARBINARY and BLOB. It's to store text with a defined character set but a binary case-sensitive collation. The BINARY attribute automatically chooses an appropriate collation for the column's character set. E.g. varchar(255) BINARY is equivalent to varchar(255) CHARACTER SET utf8 COLLATE utf8_bin if the table has DEFAULT CHARSET=utf8.
For Postgres and Sqlite, operations on varchar and text columns are case-sensitive by default, so as far as I know setting binary collation is only useful for MySQL.
Rationale: There are at least two other patches that are blocked on this API addition:
- #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case
- #1012620: Unique key on date_formats.(format|type) is problematic for case insensitive collations
While this is an API change, it is technically only an API addition and thus could be backported to D7.
Remaining tasks
Patch was approved, but discussion is now about how to improve what's being changed, and extend support for PostgreSQL and SQLite
User interface changes
(new or changed features/functionality in the user interface, modules added or removed, changes to URL paths, changes to user interface text)
API changes
(API changes/additions that would affect module, install profile, and theme developers, including examples of before/after code if appropriate)
Original report by [bfroehle]
// Text of original report here.
On MySQL, VARCHAR defaults to a case insensitive collation. Currently in Drupal there is no way to specify case-sensitive (aka 'BINARY') collation. (PostgreSQL and SQLite both default to case sensitive collation, so this does not apply to them).
Suggestion:
Patch schema API to support VARCHAR BINARY attribute on MySQL via 'binary' => TRUE. This attribute could be ignored on non-mysql dbs as VARCHAR is already case-sensitive on postgres / sqlite.
The BINARY attribute in MySQL is not to store binary data, that would be VARBINARY and BLOB. It's to store text with a defined character set but a binary case-sensitive collation. The BINARY attribute automatically chooses an appropriate collation for the column's character set. E.g. varchar(255) BINARY is equivalent to varchar(255) CHARACTER SET utf8 COLLATE utf8_bin if the table has DEFAULT CHARSET=utf8.
For Postgres and Sqlite, operations on varchar and text columns are case-sensitive by default, so as far as I know setting binary collation is only useful for MySQL.
Rationale: There are at least two other patches that are blocked on this API addition:
- #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case
- #1012620: Unique key on date_formats.(format|type) is problematic for case insensitive collations
While this is an API change, it is technically only an API addition and thus could be backported to D7.
| Comment | File | Size | Author |
|---|---|---|---|
| #35 | 1237252.patch | 4.37 KB | drumm |
| #33 | core-binary_attr_mysql-1237252-33.patch | 3.83 KB | iamEAP |
| #27 | db_1237252_reroll.patch | 3.96 KB | berdir |
| #9 | db_1237252.patch | 4.55 KB | drewish |
| #4 | 1237252-test-only.patch | 2.34 KB | bfroehle |
Comments
Comment #1
bfroehle commentedThe attached patch is the relevant parts of #966210-45: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case.
Comment #2
bfroehle commentedComment #3
sunThanks, this should be covered by database tests though.
Comment #4
bfroehle commentedWith tests.
Comment #5
sunThanks, this looks good to me.
Comment #7
dries commentedThis looks good to me too. Would be nice to get a couple more +1s though.
Comment #8
bfroehle commentedAttribution for the patch, by the way, should go to mfb. (This was set as the author line in the patches, but not noted here).
Comment #9
drewish commentedUpdated the docs to make the data type clearer. Side note it should be Boolean not boolean because it's an eponym but I'll save that for a separate issue and do a core wide find and replace.
Comment #10
drewish commentedI'll add that this has my +1, but I'll kick it back to CNR so someone who didn't touch the patch can double check it.
Comment #11
bfroehle commentedThe #4..#9 interdiff is comments only:
This seems reasonable so I'm moving this back to RTBC.
Comment #12
Anonymous (not verified) commented+1 from me.
Using a binary (utf8_bin) on tables like the date_formats is excellent. I had a similar issue #1006826: Case sensitive custom date formats 'd-m-y' and 'd-M-Y': PDOException: duplicate entry 'd-m-y-custom' and marked it as a duplicate of this issue. This solution from #4 is the right way.
Comment #13
catchCritical because it is blocking #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case, this part of the patch has just been moved here for easier review.
Comment #14
damien tournoud commentedSorry for bumping this back, but I am very uncomfortable with this. The current situation is a total mess that this patch doesn't even come close to resolving.
Before the patch:
After the patch:
So it seems to me that we need to extend support for PostgreSQL and SQLite:
bytea, but how to make the ILIKE operator conditional?Comment #15
drewish commentedDamien, well how would you propose we deal with #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case? Because the upgrade path is still broken.
Comment #16
makara commentedSub.
From what I see, the different DBs cannot be exactly the same. I'd suggest to keep the current behaviors and only use "case sensitive like" when needed, i.e. for filenames.
Comment #17
sylvain lecoy commentedwhat about adding a 'collate' field in the schema definition ?
Comment #18
jbrown commented[made some edits]
I've spent a few hours researching this. Please correct any mistakes.
We need to have the option to index char and varchar columns with and without case sensitivity.
PostgreSQL may be able to have both indexes for a single column: http://archives.postgresql.org/pgsql-sql/2003-09/msg00395.php , but MySQL cannot: http://lists.mysql.com/mysql/205551 as it is specified in the collation of the column.
So we need an option in the schema for whether or not a column should be case sensitive.
For case sensitive columns:
MySQL - use utf8_bin collation / BINARY keyword
PostgreSQL - default
SQLite - default
For case insensitive columns:
MySQL - use utf8_general_ci
PostgreSQL:
SQLite - use NOCASE collation
Comment #19
jbrown commentedUpdate to my previous post:
I don't think the database layer has access to the schema information, so for case insensitive columns ILIKE would have to be specified in condition() calls. This would then be converted to LIKE for mysql and sqlite. For postresql it would be converted to "LOWER() LIKE LOWER()" (to use the index).
Comment #20
andypostBump. This critical still has no patch and solution.
#19 db-backend has ability to retrieve info from information_schema see schema.inc
I think we can extend LIKE with ILIKE or similar operator
#14 points to postgresql and sqlite - pgsql conversion to bytea could be doen in follow-up but sqlite still depends on unresolved https://bugs.php.net/bug.php?id=55226
Comment #21
mikeryanThere's no controversy over the patch in #9 being the correct fix for the critical MySQL upgrade issue, correct? Given that inconsistent collation behavior among the database engines is a pre-existing condition that is not made any worse by this patch, why don't we open a separate major issue for normalizing behavior across the other engines, and commit this one?
Comment #22
tim.plunkettTagging.
Comment #23
catchTagging since the blocks #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case.
Comment #24
iamEAP commentedAgree with mikeryan in #21.
I've broken this out into a new issue #1518506: Normalize how case sensitivity is handled across database engines.
In the meantime, can we take another look at #9?
Comment #25
berdir#9: db_1237252.patch queued for re-testing.
Agreed, triggered a request of the patch in #9, I don't think he will apply anymore :)
Comment #27
berdirHere's a re-roll for the patch from #9.
Comment #28
iamEAP commentedWorks for me and there was pretty widespread consensus previously. RTBC.
Comment #29
btmash commentedSince this is now strictly concerning mysql, it would be a +1 from me.
Comment #30
Crell commentedThis doesn't appear to address #14 and #18, both of which are valid concerns. This feels more like a MySQL hack than a proper solution to still more database fugliness.
Comment #31
Crell commentedUpdate: Talked with catch a bit more in IRC. I totally missed the link to #1518506: Normalize how case sensitivity is handled across database engines above, probably because it was not colored. :-) Given that the follow-up is critical, I can deal with this as a temporary solution to un-bork the update process.
Comment #32
catchI agree with getting the easier fix in to unblock the upgrade path fixes, those two {files} upgrade issues are very painful.
Committed/pushed to 8.x, marking for backport to 7.x.
Comment #33
iamEAP commentedSame against 7.x.
Comment #35
drummReroll.
Comment #36
webchickDrupal.org is hitting this, according to #1481026-11: Get servers in place for Drupal 7 staging/dev environments.
Comment #37
iamEAP commentedI'm using #35 in an upgrade process, myself. Marking RTBC, but could hear from others, possibly.
Comment #38
mfbhrmph I didn't get commit credit after all :p Looks RTBC to me.
Comment #39
webchickGreat, thanks a lot, all. mfb, I gave you credit in D7 anyway. :)
Committed and pushed to 7.x. Thanks!
Comment #40
webchickx
Comment #41.0
(not verified) commentedreformat issue summary