Updated: Comment #5
Problem/Motivation
Any string-like field specified in hook_schema
as 'binary' => FALSE
should compare case insensitively on PostgreSQL and SQLite too. They don't.
Proposed resolution
For SQLite, it is fixed in #2454733: Add a user-space case-insensitive collation to the SQLite driver none. It's hopeless. We can throw in NOCASE but it only helps the 26 ASCII letters. There's nothing we can do unless someone writes an sqlite3 driver (not PDO) and uses the (undocumented) createCollation method to register a comparator -- and that will be *slow* to run PHP for every comparison.
For PostgreSQL, we need to require the citext module. It is possible we want to require 9.1.2 but at least 8.4 for sure.
Remaining tasks
The pgsql driver needs to use the citext type when not binary. It is not slower than varchar http://www.depesz.com/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/ . Also, it needs to check it exists during install requirements. This I do not know how to.
User interface changes
None. Or, who knows... you have a case sensitive sort in Drupal 7? Well, that was a bug!
API changes
None.
Related Issues
#2068655: Entity fields do not support case sensitive queries
#1237252: DB Case Sensitivity: Allow BINARY attribute in MySQL.
Comments
Comment #1
catchRe-classifying since we already have issues open for the functional bugs.
Comment #2
c960657 CreditAttribution: c960657 commentedThis was also discussed in 2009 in #333054: Make all varchar columns case sensitive (regression) (was make page cache case sensitive). But much has changed in the db layer since then, so I suggest we restart the discussion here.
Comment #3
c960657 CreditAttribution: c960657 commentedHere is some info on how other database engines handles this:
http://stackoverflow.com/questions/12855/database-case-insensitive-index
According to this, MS SQL uses collations like MySQL, and Oracle and DB2 use special indexes like PostgreSQL.
Comment #4
greg.1.anderson CreditAttribution: greg.1.anderson commentedc.f. #1575790: Update #7002 fails on postgres - ILIKE operator on bytea not supported.
Comment #5
webchickWe are over thresholds, blocking progress on Drupal 8, and there's been no patch for this issue, nor a plan of attack. I am demoting to major. If this is incorrect, please specify why.
Comment #5.0
chx CreditAttribution: chx commentedUpdated issue summary.
Comment #6
chx CreditAttribution: chx commentedComment #7
mikl CreditAttribution: mikl commentedAs someone who runs Drupal on PostgreSQL routinely, I see this as a minor issue (I've never had complaints). If it can be worked around by requiring newer versions of PostgreSQL, fine by me, but requiring you to install plugins for PostgreSQL would be a major headache for sysadmins, making the cure worse than the disease, IMO.
Comment #8
chx CreditAttribution: chx commentedInstall plugins no, it's one CREATE EXTENSION command on 9.1 and later.
Comment #9
mikl CreditAttribution: mikl commentedYeah, that should not be a problem if we use the
CREATE EXTENSION IF NOT EXISTS
variant. CREATE EXTENSION requires superuser access to the database cluster, so for security, you should not have Drupal run it, but rather run it with the a superuser account before installing Drupal.Comment #9.0
mikl CreditAttribution: mikl commentedUpdated issue summary.
Comment #10
bzrudi71 CreditAttribution: bzrudi71 commentedComment #11
chx CreditAttribution: chx commentedThis means we need to add per driver test exclusions since the test that lead @bzrudi71 here can not pass on SQLite.
Comment #12
amateescu CreditAttribution: amateescu commentedNote that I've implemented a fix for the SQLite driver in #2454733: Add a user-space case-insensitive collation to the SQLite driver.
Comment #13
daffie CreditAttribution: daffie commentedBoth MySQL and SQLite now support case sensitive queries. PostgreSQL can support it with the citext module. If we require the citext module for PostgreSQL all three supported databases support case sensitive queries. I am for the requirement of the citext module. We can add it in this issue or create a new issue for it.
Comment #14
bzrudi71 CreditAttribution: bzrudi71 commentedGiven that this will/should fix two remaining test fails and make PostgreSQL behave as any other DB we should at least try to create proof-of-concept patch and see if that works as expected. If it works and we decide to go for it, this will require changes to the Installer and Documentation and also patches for the docker containers (not to forget ;-)
Comment #15
bzrudi71 CreditAttribution: bzrudi71 commentedOkay, I gave that a very quick test. Enabled CITEXT and matched all varchar and text types to citext in Schema->getFieldTypeMap(). Install works fine and the types changed in database (USER-DEFINED):
And yes, that makes taxonomy TermTest pass now. But it doesn't fix the views GlossaryTest. A quick run of the database test group shows a new exception in CaseSensitivityTest (seems easy to fix btw). I think a full test run will show even more new fails and exceptions, on the other hand we could cleanup the pg-driver itself by removing some quirks like LIKE->ILIKE rewrites and so on.
Anyway, this needs a complete bot run to expose possible new fails and my local environment is ways to slow to do so ;-)
Someone going to mod docker to get a full run :-)
Comment #16
bzrudi71 CreditAttribution: bzrudi71 commentedNo one? ;-)
Don't worry, I just did a complete docker run with enabled citext. As expected we have around a hand full of new fails and exceptions, especially (and as expected) in the entity tests. Anyway, this will make PostgreSQL more compatible and prevent from new fails and exceptions in the future. Let's do it:
#2464481: PostgreSQL: deal with case insensitivity
Comment #21
RoSk0Cross-posting from @Damien Tournoud in #2477413-28: Increase minimum version requirement for Postgres to 9.1.2
I think this a great idea.
Comment #22
andypostYep, we can add collation to schema in minor core release
Comment #24
ferfebles CreditAttribution: ferfebles as a volunteer and commentedI would like to note that there are Postgresql performance problems due to how case sensitivity is handled.
In our case, we have more than 110K comments and 278K nodes. Our Drupal instance took about 30 seconds just to open the content admin interface.
The real problem was a lot of small queries like:
where the ILIKE forced Postgres to do a sequential scan over the URL_ALIAS table: 145K rows for every URL in a Drupal page.
We solved the problem enabling the 'pg_trgm' extension and creating indexes that ILIKE can use. But I think too that switching to case-sensitive, collation-sensitive by default could be a better long-term solution.
More information about our solution here: https://ferfebles.github.io/2018/04/16/Improving-large-Drupal-Postgres-p...
Comment #26
xurizaemon@ferfebles, thanks for the detailed post on pg_trgm approach. #2988018: [PP-1] Performance issues with path alias generated queries on PostgreSQL has a (currently WIP) patch which allows case-insensitive aliases, but doesn't depend on indexes to do it. Case sensitive URLs seems like another option with merit for sure.
Comment #32
bklineAlthough the description claims that this is fixed for SQLite, that's not what I'm seeing. For a 9.2.7 site running on MySQL, queries (both entity queries and direct database queries) match string values without regard to case, but those same queries return different results when running
phpunit
tests on SQLite, which is failing to match strings which differ only in case, breaking the tests. I'm working around the problem by replacing the '=' operator with 'LIKE' and avoiding 'IN' and instead usingorConditionGroup()
to break out each value into a separate 'LIKE' test, but obviously that's not an ideal solution.Comment #33
bklineAlso, I see that this issue has the
Needs issue summary update
tag. Although I see tracking information for how the issue's other tags got assigned, I don't see anything comparable for this tag. Where would I find that information, as well as a definition for the semantics of the tag?Comment #37
Chi CreditAttribution: Chi commentedThe issue summary does not explain why it's actually needed. Forcing case insensitivity across different databases has proven to be problematic. What if we just remove this obligation from DB drivers? Services that build SQL queries can take care about case insensitively themselves when needed.