Active
Project:
Drupal core
Version:
main
Component:
base system
Priority:
Normal
Category:
Task
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
11 Nov 2008 at 19:12 UTC
Updated:
20 Mar 2020 at 16:52 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
m3avrck commentedSubscribing...
Comment #2
gábor hojtsyOh, ok, this looks like a regression, since #99970: Page cache matching is case insensitive is indeed not reflected in the Drupal 6 schema.
Comment #3
damien tournoud commentedThat's a fun issue: the schema API introduced in Drupal 6 doesn't know anything about collation, and can't set the "BINARY" flag. Bumping to 7. This is a critical issue to solve and backport to 6.
Comment #4
Scott Reynolds commentedWow never thought this would ever get any attention. So I never explain a bit more.
With the new menu system, not sure that u can have case sensitive paths any more. For instance in Drupal 6, /NODE and /node are the same on MySQL. It is actually a lot more involved then just adding a feature to schema. Menu system needs to updated first, before changing the page_cache table. Otherwise, u end up caching NoDE and node but both line up with the same callback and result. This then puts two records in the page cache that are redundant. There is no difference in Drupal between NoDE and node, so caching two sets is useless.
Now this behaviour I believe will be different on Postgres as it is case sensitive ? Thus the menu system will behave like it does in D5( NoDE and node will be different).
Comment #5
c960657 commentedComment #6
c960657 commentedThis patch makes all varchar columns case-sensitive. AFAIK this makes MySQL behave like PostgreSQL. Amazingly enough all tests still pass (I have only tested with MySQL).
So far the patch does not convert existing tables. In fact, making URLs case-insentive may break some non-standard URLs when upgrading from D6->D7 (e.g. http://example.com/NODE/1), unless some other trickery is done.
Also, I have no idea about the consequences for other parts of Drupal.
Comment #7
c960657 commentedSorry, wrong patch.
Comment #8
c960657 commentedThis also makes existing tables case-sensitive during upgrade from D6.
I chose to upgrade tables from D6 core modules. I couldn't find a way to get the tables belonging to the currently installed modules only - drupal_get_schema() reflects the table names after the upgrade process has completed, but I think it's better to modify tables belonging to external modules before their hook_update_N has been invoked (so they have a chance to change back the collation if actually really want the case-insensitive behaviour and are MySQL-only). Another option is to modify all tables in the database, but that is probably a bit to bold.
The system_update_N function was added as 0012 in order to appear prior to any table renames.
Note that to test this, you may need to comment out system_update_7019 after having applied the patch (see #363262: Missing index on url_alias table).
Comment #9
dries commentedAt some point, it is probably useful to have a bit more control over this at the API level but right now, this patch improves the current situation. This patch looks good to me, but
1. Let's add a code comment to:
+ $table['mysql_suffix'] = 'DEFAULT CHARACTER SET UTF8 COLLATE utf8_bin';That code comment should capture some of the knowledge in this issue.
2. Please also add a test case for this problem to support future database backends; e.g. it helps make sure that new backends works as expected.
Thanks!
Comment #10
Scott Reynolds commentedDo we really want to make every VARCHAR case sensitive ? Kindof a scary statement to me. I think it probably warrants investigation into how it affects other things.
My first reaction to this issue was to make the menu router and the page cache case sensitive, not all VARCHARs.
Comment #11
c960657 commentedI became aware that at least one place in core (user_load_multiple()) currently relies on LIKE being case-insensitive. With this patch, this is no longer the case. This can be fixed (I think) by adding
COLLATE utf8_general_cito mapConditionOperator() for LIKE, though this prevents index usage. But do we want to keep LIKE case-insensitive? If we make it case-sensitive, people needing to do a case-insensitive match will need to resort to e.g.LOWER(username) = LOWER('foo')- is that acceptable? If there is a great need to this, possibly the query builder could offer support for ILIKE.Also note that SQLite does not support proper case-insensitive comparison for strings containing non-ASCII characters.
Do we need to do something to catch old non-standard URLs and redirect them to the proper URL (e.g. from /nOdE/1 to /node/1) to prevent the D6->D7 upgrade from breaking existing URLs?
@Scott Reynolds: I agree that it's scary. But on PostgreSQL all VARCHARs are case-sensitive, so we have to move in that direction in order to keep consistency across databases.
Comment #12
Scott Reynolds commentedOk couple interesting questions that not prepared to answer myself at this time.
Ok so thats all I got, if that makes sense to anybody else then /me shakes ur hand. I will take a look at these issues this week and come up with answers. But if anyone else beats me too it plz post.
Comment #13
c960657 commentedUsernames are made case-insensitive using LOWER(), also on PostgreSQL, so you cannot have both Kathy and kathy. This behaviour is consistent between MySQL and PostgreSQL, and the patch doesn't change this.
I changed user_load_multiple() to use LOWER() like elsewhere in core. This will have a negative performance impact on MySQL. This may be reduced by #279851: Replace LOWER() with db_select() and LIKE() where possible. Autocomplete already uses LOWER(), so no performance impact there.
I also added some tests as requested, and removed the explicit use of mysql_type in locale.install.
Comment #14
dries commentedThe last version of this patch looks good and addresses my issues. I will commit it tomorrow if no one objects. Thanks.
Comment #16
c960657 commentedReroll.
Comment #17
dries commentedCommitted to CVS HEAD. Thanks!
Comment #18
catchSeems like I missed the boat on voicing objections, but I've got to re-open this:
It's not acceptable to use LOWER(foo) like we currently do - this is the cause of slow queries on a lot of Drupal sites and there's only two ways to fix it.
1. Add an extra column
--- previously this was the only cross-database compatible way to deal with this issue. You'll note that #279851: Replace LOWER() with db_select() and LIKE() where possible is a spin-off of at least 3 similar issues which tried to do the same thing - all have so far been rejected by Dries or stalled for months or years. I don't think that issue is ever going to be committed. If Dries is prepared to look at the approach on there and +1 it for at least users and taxonomy terms, then let's close this one again, but given previous feedback on adding extra columns I don't have high hopes.
2. Map LIKE() to case-insensitive operators (ILIKE in postgresql) - which was removed in this patch, and only made possible by dbtng.
--- doing it that way gives us indexable queries on MySQL - something we've never had before, without requiring more than a few lines additional logic in database drivers. It also isn't any penalty in other database systems. I much prefer this approach to #1 personally, this patch makes it impossible to achieve.
SQLITE not having case-insensitive LIKE() for non-ASCII characters is a red-herring, it doesn't do LOWER() for non-ascii characters either unless you load the icu extension. http://www.sqlite.org/lang_corefunc.html - so it's going to be equally broken either way.
We can fix the page cache issue without completely hamstringing our only real opportunity to remove a major cause of slow queries in core. I don't see why it's necessary to conflate the two especially when c960657 already picked up potential issues with it.
Comment #19
catchHere's at least one of the taxonomy issues which is now unfixable without an additional column #277209: Remove lower from taxonomy autocomplete
Comment #20
damien tournoud commentedOh my... this is a big change, nearly not tested. No input was taken from any of the database maintainers. I believe we should revert that, and think about it a little bit more.
Comment #21
catchHere's a patch to revert the change.
Comment #22
catchAnd a pending patch with tests which now fail due to the change - #343788: Taxonomy module doesn't use it's own APIs (yes we could use LOWER() there if we really have to, but can you tell I was hoping not...)
Comment #23
dries commentedI agree that we should revert this patch and discuss it more. It's broken indeed -- too bad the tests didn't catch this regression. The patch in #21 that reverts the change should be rolled back.
Comment #25
c960657 commentedUpdated patch for reverting the change.
Comment #26
catchSince this is a straight revert and Dries suggested rolling this back, marking to RTBC.
Comment #27
webchickOk, committed the rollback patch. Setting back to active, since it sounds like we're back to the drawing board on this. What a thorny issue. :\
Comment #28
Paul Natsuo Kishimoto commented#299433: Sort out query operator mapping may be a duplicate. It's older but has less discussion.
Comment #29
Crell commentedOh my, somehow this never ended up on my radar.
I agree with catch above that we need case-insensitive behavior more often than not. That's why the Postgres driver forces all built queries (we can't do anything about static queries) to be case-insensitive.
If we use a built query (don't we already?) does this issue go away?
I'm marking the other issue as a dupe, since it never got any traction.
Comment #30
sun.core commentedI guess this is D8 stuff now.
Comment #31
damien tournoud commentedOnly feature requests and tasks can be assigned to D8 for now, because the Drupal 8 tree is not opened yet.
This is marked as a critical bug, we should assess if we still consider it critical.
Comment #32
mustanggb commentedCould you force it to lowercase when caching then retrieve with a case insensitive LIKE statement
Comment #33
Bojhan commentedComment #34
damien tournoud commentedSee #31.
Comment #36
webchickI'm not sure how this could possibly be a critical bug given that 19 point releases of Drupal 6 have shipped with it.
Comment #37
pwolanin commentedIt seems like we never addressed the bug described in #4 - that the cache CID may be different for 2 pages that map to the same callback (node vs. Node). Should we open a new issue for that?
Also, in general it seems like we may have Drupal behaving inconsistently based on the database and its settings or collation. While in general URLs are case sensitive, should Drupal be explicit that path ($_GET['q']) handling is case insensitive? Perhaps within http://api.drupal.org/api/drupal/includes--bootstrap.inc/function/reques... we should lowercase it?
Comment #38
tim.plunkettEchoing #36, this has been the status quo for the entire lifetime of both current releases of Drupal.
Comment #46
eric_a commented