Updated: Comment #5

Problem/Motivation

Any string-like field specified in hook_schema as 'binary' => FALSE should compare case insensitively on PostgreSQL and SQLite too. They don't.

Proposed resolution

For SQLite, it is fixed in #2454733: Add a user-space case-insensitive collation to the SQLite driver none. It's hopeless. We can throw in NOCASE but it only helps the 26 ASCII letters. There's nothing we can do unless someone writes an sqlite3 driver (not PDO) and uses the (undocumented) createCollation method to register a comparator -- and that will be *slow* to run PHP for every comparison.

For PostgreSQL, we need to require the citext module. It is possible we want to require 9.1.2 but at least 8.4 for sure.

Remaining tasks

The pgsql driver needs to use the citext type when not binary. It is not slower than varchar http://www.depesz.com/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/ . Also, it needs to check it exists during install requirements. This I do not know how to.

User interface changes

None. Or, who knows... you have a case sensitive sort in Drupal 7? Well, that was a bug!

API changes

None.

#2068655: Entity fields do not support case sensitive queries
#1237252: DB Case Sensitivity: Allow BINARY attribute in MySQL.

Comments

catch’s picture

Category: bug » task
Priority: Major » Critical

Re-classifying since we already have issues open for the functional bugs.

c960657’s picture

This was also discussed in 2009 in #333054: Make all varchar columns case sensitive (regression) (was make page cache case sensitive). But much has changed in the db layer since then, so I suggest we restart the discussion here.

c960657’s picture

Here is some info on how other database engines handles this:
http://stackoverflow.com/questions/12855/database-case-insensitive-index

According to this, MS SQL uses collations like MySQL, and Oracle and DB2 use special indexes like PostgreSQL.

greg.1.anderson’s picture

webchick’s picture

Priority: Critical » Major

We are over thresholds, blocking progress on Drupal 8, and there's been no patch for this issue, nor a plan of attack. I am demoting to major. If this is incorrect, please specify why.

chx’s picture

Issue summary: View changes

Updated issue summary.

chx’s picture

Title: Normalize how case sensitivity is handled across database engines » Non-MYSQL engines sort binary FALSE case sensitively
mikl’s picture

Title: Non-MYSQL engines sort binary FALSE case sensitively » Normalize how case sensitivity is handled across database engines

As someone who runs Drupal on PostgreSQL routinely, I see this as a minor issue (I've never had complaints). If it can be worked around by requiring newer versions of PostgreSQL, fine by me, but requiring you to install plugins for PostgreSQL would be a major headache for sysadmins, making the cure worse than the disease, IMO.

chx’s picture

Install plugins no, it's one CREATE EXTENSION command on 9.1 and later.

mikl’s picture

Yeah, that should not be a problem if we use the CREATE EXTENSION IF NOT EXISTS variant. CREATE EXTENSION requires superuser access to the database cluster, so for security, you should not have Drupal run it, but rather run it with the a superuser account before installing Drupal.

mikl’s picture

Issue summary: View changes

Updated issue summary.

bzrudi71’s picture

chx’s picture

Issue tags: +sqlite

This means we need to add per driver test exclusions since the test that lead @bzrudi71 here can not pass on SQLite.

amateescu’s picture

Note that I've implemented a fix for the SQLite driver in #2454733: Add a user-space case-insensitive collation to the SQLite driver.

daffie’s picture

Issue summary: View changes

Both MySQL and SQLite now support case sensitive queries. PostgreSQL can support it with the citext module. If we require the citext module for PostgreSQL all three supported databases support case sensitive queries. I am for the requirement of the citext module. We can add it in this issue or create a new issue for it.

bzrudi71’s picture

Given that this will/should fix two remaining test fails and make PostgreSQL behave as any other DB we should at least try to create proof-of-concept patch and see if that works as expected. If it works and we decide to go for it, this will require changes to the Installer and Documentation and also patches for the docker containers (not to forget ;-)

bzrudi71’s picture

Okay, I gave that a very quick test. Enabled CITEXT and matched all varchar and text types to citext in Schema->getFieldTypeMap(). Install works fine and the types changed in database (USER-DEFINED):

d8=# select column_name, data_type, character_maximum_length
d8-# from INFORMATION_SCHEMA.COLUMNS where table_name = 'node__body';
 column_name  |  data_type   | character_maximum_length 
--------------+--------------+--------------------------
 bundle       | USER-DEFINED |                         
 deleted      | smallint     |                         
 entity_id    | bigint       |                         
 revision_id  | bigint       |                         
 langcode     | USER-DEFINED |                         
 delta        | bigint       |                         
 body_value   | USER-DEFINED |                         
 body_summary | USER-DEFINED |                         
 body_format  | USER-DEFINED |                         
(9 rows)

d8=#

And yes, that makes taxonomy TermTest pass now. But it doesn't fix the views GlossaryTest. A quick run of the database test group shows a new exception in CaseSensitivityTest (seems easy to fix btw). I think a full test run will show even more new fails and exceptions, on the other hand we could cleanup the pg-driver itself by removing some quirks like LIKE->ILIKE rewrites and so on.

Anyway, this needs a complete bot run to expose possible new fails and my local environment is ways to slow to do so ;-)
Someone going to mod docker to get a full run :-)

bzrudi71’s picture

No one? ;-)
Don't worry, I just did a complete docker run with enabled citext. As expected we have around a hand full of new fails and exceptions, especially (and as expected) in the entity tests. Anyway, this will make PostgreSQL more compatible and prevent from new fails and exceptions in the future. Let's do it:

#2464481: PostgreSQL: deal with case insensitivity

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

RoSk0’s picture

Version: 8.4.x-dev » 8.5.x-dev
Issue tags: +MySQL

Cross-posting from @Damien Tournoud in #2477413-28: Increase minimum version requirement for Postgres to 9.1.2

I'm not optimistic about citext. It would make more sense to me to switch to case-sensitive, collation-sensitive *by default everywhere* (including on MySQL), and handle case and collation sensitivity manually in PHP. This is the only way to have a well-defined, consistent and efficient behavior on all database engines.

I think this a great idea.

andypost’s picture

Yep, we can add collation to schema in minor core release

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

ferfebles’s picture

I would like to note that there are Postgresql performance problems due to how case sensitivity is handled.
In our case, we have more than 110K comments and 278K nodes. Our Drupal instance took about 30 seconds just to open the content admin interface.
The real problem was a lot of small queries like:

SELECT URL_ALIAS.SOURCE, LANGCODE, PID FROM URL_ALIAS
WHERE (ALIAS ILIKE '/SOME/URL/PATH/HERE')
      AND (LANGCODE IN ('ES', 'UND'))
ORDER BY LANGCODE ASC NULLS FIRST, PID DESC NULLS LAST;

where the ILIKE forced Postgres to do a sequential scan over the URL_ALIAS table: 145K rows for every URL in a Drupal page.
We solved the problem enabling the 'pg_trgm' extension and creating indexes that ILIKE can use. But I think too that switching to case-sensitive, collation-sensitive by default could be a better long-term solution.
More information about our solution here: https://ferfebles.github.io/2018/04/16/Improving-large-Drupal-Postgres-p...

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

xurizaemon’s picture

@ferfebles, thanks for the detailed post on pg_trgm approach. #2988018: [PP-1] Performance issues with path alias generated queries on PostgreSQL has a (currently WIP) patch which allows case-insensitive aliases, but doesn't depend on indexes to do it. Case sensitive URLs seems like another option with merit for sure.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

bkline’s picture

Although the description claims that this is fixed for SQLite, that's not what I'm seeing. For a 9.2.7 site running on MySQL, queries (both entity queries and direct database queries) match string values without regard to case, but those same queries return different results when running phpunit tests on SQLite, which is failing to match strings which differ only in case, breaking the tests. I'm working around the problem by replacing the '=' operator with 'LIKE' and avoiding 'IN' and instead using orConditionGroup() to break out each value into a separate 'LIKE' test, but obviously that's not an ideal solution.

bkline’s picture

Also, I see that this issue has the Needs issue summary update tag. Although I see tracking information for how the issue's other tags got assigned, I don't see anything comparable for this tag. Where would I find that information, as well as a definition for the semantics of the tag?

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Chi’s picture

Any string-like field specified in hook_schema as 'binary' => FALSE should compare case insensitively on PostgreSQL and SQLite too. They don't.

The issue summary does not explain why it's actually needed. Forcing case insensitivity across different databases has proven to be problematic. What if we just remove this obligation from DB drivers? Services that build SQL queries can take care about case insensitively themselves when needed.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.