Problem/Motivation
It is impossible to convert condition('name', db_like($name), 'LIKE') in user_is_blocked. This curious construct (LIKE without wildcards) has replaced LOWER(name) = $name in #279851: Replace LOWER() with db_select() and LIKE() where possible. entityQuery does not mention case sensitivity and insensitivity.
Proposed resolution
What = means in DBTNG is not clarified anywhere. Neither is the above usage of LIKE. So,
- On
db_likewe documentLIKEbeing used as the case insensitive = operator. And add the (MySQL-specific)LIKE BINARYto the system. - Then we document on
QueryInterface::conditionthat the meaning of=depends on the field being binary or not. - Because the fields/properties in schema can be marked
BINARYbut there is no TypedData equivalent, add aBinaryStringItem. - Change file_managed.uri to use this. There's no other binary entity propery.
- Add a setting to the configurable text fields whether they are binary or not.
Remaining tasks
Do it.
User interface changes
None.
API changes
For MySQL users, there will be no changes, because MySQL is already doing case insensitive comparison for every non-binary field and sensitive comparison for binaries. We are merely documenting and clarifying this.
Related Issues
#279851: Replace LOWER() with db_select() and LIKE() where possible
#2073425: Make PostgreSQL conform to MySQL and use a case insensitive collation by default
Note that this is major because username comparison not being clear is a borderline security issue.
| Comment | File | Size | Author |
|---|---|---|---|
| #127 | entity-query-case-2068655-127-interdiff.txt | 1.73 KB | berdir |
| #127 | entity-query-case-2068655-127.patch | 34.12 KB | berdir |
| #124 | entity-query-case-2068655-123-interdiff.txt | 2.57 KB | berdir |
| #124 | entity-query-case-2068655-123.patch | 33.72 KB | berdir |
| #120 | entity-query-case-2068655-120-interdiff.txt | 11.7 KB | berdir |
Comments
Comment #1
chx commentedThe decision is this: we document that = is case insensitive for text and varchar fields and case sensitive for binary fields. What is binary and what is not, well, the respective TypedData object will need to define binary TRUE we think.
Comment #2
dawehnerI also prefer the operation solution than adding a new parameter to the condition() method to allow configuration.
Comment #2.0
chx commentedUpdated issue summary.
Comment #2.1
chx commentedUpdated issue summary.
Comment #2.2
chx commentedUpdated issue summary.
Comment #3
chx commentedComment #4
alexpottI've discussed this with @chx on IRC and concluded that from my point of view, the problem as described sounds significant, but is difficult/untestable due to mysql only testbots. And should be addressed before release - I think the status of major is correct. I believe that the only changes will be API additions.
Comment #5
chx commentedComment #6
chx commentedThis contains a small portion of #2044841: EFQ relationships broken for entity types with bundles otherwise the entity query test goes kaboom. Assigning to Crell to review the DBTNG changes and the addTables argument passing cos that's a bit of funny business but then again, addTables need to return two things and also the condition classes yield very well to this sort of passing.
Comment #7
berdirWrong id.
Comment #8
chx commentedfixed
Comment #10
berdirHm, BC mode getting in the way again?
(The bc decorator removal patch for nodes is green, which means this can go away soon! Please review)
Comment #11
chx commentedWell, we can add this temporarily.
Comment #13
chx commentedWhat about this?
Comment #15
chx commentedI meant this :)
Comment #17
chx commentedComment #18
Crell commentedIf I understand correctly, the exec summary here is "document what MySQL is already doing as the API, and then add code to make other DB backends behave like MySQL already is". Is that an accurate summary? If so, I'm OK with that in concept.
I see why this is being added. I'd really like to see if there's an alternative, though. This just feels wrong to me to expose an internal detail. I don't know what the alternative is off hand (after midnight, brain shut down), but I'd like to at least try to find one.
Being a DB-savvy guy, I get that "binary" in this case means "stored using a BLOB field in the DB so behaves in a case-sensitive fashion". To the the entire world not counting the dozen or so DB-sensitive people around, "Binary String" likely means "Legal values are 'Yes' and 'No'". We probably want a better name here. And then for the "binary" property, too.
My knee-jerk thought here is that you shouldn't make this a flag but an interface on the class. Then you can do an instanceof CaseSensitiveInterface or something.
Comment #19
chx commented> I see why this is being added. I'd really like to see if there's an alternative, though
Sure. We can add the escapeLike to SelectInterface and propagate up.
> To the the entire world not counting the dozen or so DB-sensitive people around
Hey! We are much more than a dozen! But you are right. I have renamed to 'case sensitive'.
> My knee-jerk thought here is that you shouldn't make this a flag but an interface on the class. Then you can do an instanceof CaseSensitiveInterface or something.
This a per-column property, not a per-class property so, sorry, we can't do that.
I have documented the behavior on QueryInterface::condition as well. It's really pretty... NOT. However, unlike in D7 where this was just floating in the air this is at least documented, no matter how ugly.
Comment #21
chx commentedI left out ConditionAggregate::translateCondition from the binary=>case sensitive rename.
Comment #22
chx commentedRehauled the massive wall of text on condition to be a bit more readable. I feel like we need to document on DBTNG as well but as the $operator of DBTNG condition() is not listed (as it can be anything really that SQL supports) I am a bit wary on that.
#2072467: Add a @defgroup typed_data and use it to document typed data concepts, definitions and group related classes and interfaces. has been filed to document Typed Data better, currently there's no place to put the warning that 'case sensitive' properties need to define 'binary' => TRUE in hook_schema.
Also, I think it's another followup is to add a field setting to the Text* configurable fields to be case sensitive. What a mess.
Comment #23
Crell commentedThanks, chx. I'm not RTBCing because I don't know EFQ well enough to vet those parts, but the DBTNG parts look good to me now.
One other nit:
$sqlQuery should be $sql_query. It's a local variable, not an object property. (This is the case in a few places.)
Comment #24
chx commentedRenamed $sql_query and documented sorts. Given that SQLite does not have Unicode collation tables and pdo_sqlite does not expose the create collation functionality, short of storing a unicode-lowercased version of every case insensitive string field, the only thing we can say about sorts on case insensitive strings is that the behavior is storage specific -- for SQLite it will be most of the time case sensitive, if you enforce the column to be nocase it will be case sensitive for non-ASCII characters only...
Comment #25
berdirNitpick review incoming...
Missing @inheritdoc.
Seems overly complicated, wouldn't "Escapes wildcard characters in a LIKE pattern" be enough? But I can see that you copied this from the other one, so fine with keeping it like that.
missing type on @return, should have an empty line above @see and missing () on escapeLike
Any specific reason that you changed the id but not the label?
Let's make this an @inheritdoc.
Does this need a , after "is a string"?
I think the use here is no longer used in the current patch?
Comment #26
chx commented> Missing @inheritdoc.
Like the rest of Select.php. When in Rome, do as the Romans do. We will surely fix this in one go.
Rest fixed.
Comment #27
Crell commentedComment #28
chx commentedI refuse to tell people they essentially can't sort on strings. Simply: get your database in order and get it to use a CI collation. SQLite is SOL, but I won't waste doxygen on replicating SQLite docs on the QueryInterface. PostgreSQL maintainers might need to raise the requirements to 9.1 for this but a brief distro checks shows that Ubuntu 12.04LTS, Debian 7.0, Fedora 16 (RHEL 7 will be 17 at least) all supporting it. #2073425: Make PostgreSQL conform to MySQL and use a case insensitive collation by default filed. MongoDB will do what needs to be done and store a lowercased version of string fields; some enterprising soul can do the same for SQLite in contrib or in core if we strongly desire conformity. If that's what it takes to get this in, I will write it once we have a SQLite bot.
Comment #28.0
chx commentedUpdated issue summary.
Comment #29
chx commentedHere's TextItem with case sensitive settings.
Comment #31
chx commentedComment #32
amateescu commentedShould we also handle case sensitivity for config entity query in this issue or should I open another one?
Try this code in the standard profile which should have a view named 'Content':
Comment #33
plachYou may want to change also the
$sqlConditionvariable to$sql_conditionfor consistency.This patch definitely needs a reroll :)
Comment #34
plachComment #35
fagoWe can do interface, as have classes for those properties also, e.g. lookup "string". You can access the object by doing $item->get('value'). Also, we could do a new data type in addition to "string".
One idea that came to my mind was going with "string" for the binary variant and "text" for the other one, as the collation enabled matching should apply to all written text, right? Does that make sense or does it sound too arbitrary? Of course it would have to be documented on the respective classes/interfaces.
"case sensitive" sounds wrong to me, as it's not only about the case, but the whole collation thing right?
Comment #35.0
fagoUpdated issue summary.
Comment #36
catch#2068329: Convert user SQL queries to the Entity Query API exposes this bug for non MySQL databases, so bumping to critical.
Comment #37
longwaveAttempted to reroll this, let's see how it goes.
Comment #39
longwaveComment #40
longwaveComment #42
longwaveComment #45
chx commentedComment #46
longwaveComment #47
jibran46: 2068655-typed-data-case-sensitive-46.patch queued for re-testing.
Comment #48
jibranno need to add &
Comment #50
martin107 commentedReroll.
Comment #51
martin107 commentedComment #53
martin107 commentedCorrected some flaws introduced by me in #50
OptionsFieldUITest now passes locally - lets see what testbot makes of the others.
Comment #55
martin107 commentedForumUninstallTest now works ( with more of my mangling removed )
Comment #57
chx commentedIt seems to me
$condition['case sensitive']is never filled in from the field definition and the only reason the normal entity query test doesn't freak out because the operator is never = or <> I put a breakpoint on those lines in Drupal\Core\Entity\Query\Sql\Condition and they are never hit.Comment #58
martin107 commented@chx Thanks that made me see my error!
In local tests exceptions removed.
PS applied fix from #48
Comment #59
chx commentedThanks so much! Could we add some tests please?
Comment #60
martin107 commentedComment #61
xjmComment #62
xjmPatch needs to be updated for #2247991: [May 27] Move all module code from …/lib/Drupal/… to …/src/… for PSR-4 (it also does not apply before that change).
Comment #63
catchComment #64
chx commentedErm, nope. Not even SQL, much postgresql specific.
Comment #65
fagoI'm not sure why this adds a separate field item for strings while it adds a setting for text items. I think we should the same for both, i.e. add a setting.
Comment #66
effulgentsia commentedComment #67
effulgentsia commentedComment #68
jacobsanfordAttempt at a reroll. I believe this preserves the intended functionality. Def needs review.
Comment #69
martin107 commentedtriggering testbot
Comment #71
martin107 commentedMy mistake see below
I can see files in the patch with was following structure .../core/lib/Drupal/Core/Database/Connection.phphere are the instructions for rerolling psr4 patcheshttps://groups.drupal.org/node/424758
Comment #72
berdirThose are Drupal\Core classes, they still use the lib folder
Comment #73
damiankloip commentedLet's see how this gets on.
Also, I think we might want to think about the fact that in field settings we are using 'case_sensitive' and in the query conditions we are using 'case sensitive'.
Comment #75
lokapujyaMay not be the right solution, but it should get the tests to run.
Comment #76
pounardI don't know if this is the right solution, but it could be written this way:
Which avoid calling getFieldDefinition() and its logic twice.
But that's a really tiny nitpick.
Comment #77
yched commentedNo opinion on the approach atm, just dumb code remarks :
This FieldItem class provides no schema() ?? This stores nothing ?
Settings now are defined in static methods, not in the annotation.
Debug leftover
Comment #80
berdirI have no idea if this will work properly, we'll see.
Comment #82
berdirCleanup and test fixes.
Comment #83
fabianx commentedThe patch matches the description in the issue summary.
Overall it looks great to me.
Question: That there have been missing escapeLike calls, would that have any consequences for Drupal 7, too? From this patch?
Comment #84
chx commentedThanks but still needs tests.
Comment #85
berdirFound two things that I think we can remove now. That should give us some more integration test coverage, for example for the binary flag of varchar fields.
Also added the setting and schema logic to the new string field types.
Comment #87
berdirFixing config schema, string field types don't have any schema, various other core field types are probably missing it too.
Comment #89
webflo commentedRerolled added the missing tests.
Comment #92
webflo commentedComment #93
webflo commentedComment #96
webflo commentedThe schema exception is trigged because StringLongItem inherits the max length setting from StringItem. Found another bug during debugging. The storage settings for StringItem are hidden because the method name is wrong. Filled #2380391: Fix storage settings StringLongItem to fix both bugs.
Comment #97
webflo commentedComment #98
webflo commentedYeah, the patch from #2380391: Fix storage settings StringLongItem got in and resolved the schema exception.
Comment #99
chx commentedI really, really like this patch; there is one problem: the entity queries use db_like. entity queries are not SQL specific while db_like is. Also LIKE is not a supported entity query operator -- perhaps we should add an exception for unsupported operators :/ And it shouldn't be necessary; the very point of the patch is (or should be!) that just
condition('field_cs', $fixtures[0]['uppercase'] . $fixtures[1]['lowercase'],does a case sensitive comparison becausefield_csis case sensitive.Comment #100
berdirRerolled, merged config schema and removed LIKE/db_like(). Tests still pass.
@chx: db_like() allows to escape part of the LIKE condition, you can say something like '%' . db_like($user_string) . '%fixedstring'. We have CONTAINS/STARTS_WITH and so on instead to support a limited subset of what you could do with a LIKE. I think those are already case insensitive automatically as the implementation uses LIKE there obviously, so we should be good? Can you confirm that?
Comment #101
xjmComment #102
fagoI've reviewed the patch. Imo, the approach is architecturally sound, I found a few glitches in the code though:
Missing docs.
Here it checks the field item property definition setting - what seems to be fine.
But here it checks the schema? It should check the field item property definition as well.
The langcode is not about display, but about query only right? So the docs are wrong.
This (wrongly) removes the unique key index for uri?
Comment #103
berdirAs discussed, some refactoring on how $case_sensitive is passed around. I also don't see how base fields worked at all...
Restored the unique column for files.
Comment #105
berdirAh, that actually hid the fact that file.uri did *not* work ;)
This fixes it and provides more test coverage, also fixed the other tests, just using a different variable name to avoid the conflict there, don't want to go into details there.
Comment #106
plachJust a quick look:
Sorry if this has just been discussed, but why doesn't
$case_sensitivedefault to FALSE here?Is this being used? Are we missing test coverage?
Cannot tell from the patch: are these two branches handling single column field vs multiple column fields? And if so are we supporting case sensitivity for all columns in the IF branch?
Comment does not wrap correctly.
Comment #107
berdir1. Because it's an internal method that we only call in one place, so it doesn't really matter :) And there we have to define it first anyway.
2. The $case_sensitive = FALSE is a left-over from when there was no default value.
3. As discusses, yes, right now, entity query only supports querying base fields on the main property, opened: #2391217: Support base fields with multiple columns in entity queries
4. Fixed.
Comment #108
chx commentedI thought I posted this already? Where's my comment?? So the problem is that CONTAINS/STARTS_WITH/ENDS_WITH does not obey the case sensitive setting. We can fix that using
LIKE BINARYfrom MySQL which maps to a simpleLIKEin PostgreSQL.Comment #109
berdirOk, this was fun.
So I was writing tests for STARTS_WITH, ENDS_WITH and CONTAINS, and everything passed except one CONTAINS check.
So I started debugging it and found three nested bugs why it never worked in the way we think it did:
1. $propertyConditions only gets loaded when you write something like fieldname.value, so it was never loaded at least in the test and we never updated $case_sensitive.
2. Then, we checked the setting on the property definition, but we set it only on the field. So I changed the field types to pass through the setting to the relevant property.
3. Third, the translateConditions() method had a case '=', but $operator can be NULL at that point if it was not provided, so we never went into that condition unless you manually pass '=' in.
So why did the test pass you ask? Because we set binary on the schema, and then MySQL does it the way we expect to automatically, assuming you use a case insensitive collation (and if you do not, then all of this is also bogus, because it will always be case sensitive).
I also updated the string and text field types as they haven't been updated to the format changes:
- Removed the option from TextItemLong and summary as a blob/case sensitive storage with a text format seems highly unlikely.
- Moved the default setting to StringItemBase
I also updated translateConditions() a bit, we always need to call escapeLike() there.
PS: Oh, btw, my test only failed because I had a typo in the field name and use the wrong field in one case. So that's the only reason I found all this.
Comment #110
berdirUntested test-only patch.
Comment #111
yched commentedMoving this setting to StringItemBase means UriItem field type now has a 'case_sensitive' setting too ?
- That does not seem to make real sense for the uri field type, does it ?
- I don't see the corresponding config schema entry for field.storage_settings.uri ?
Comment #113
berdirUriItem actually extends from StringItem, so it already have, but I did remove the default setting. The schema is there, as discussed.
Comment #114
chx commentedDoxygen'd
LIKE BINARY.Comment #115
chx commentedComment #117
fago:D
Reviewed it. Patch looks good in general.
That's quite confusing if you discover it. Can I now control (=pass) it or not? Maybe we can clarify better that the caller cannot control whether something is to be queried case sensitive or not.
There is, so that's deprecated?
Should be deprecated? If not, needs a comment why it's needed + docs.
Deprecated addition - no text field support any more? (+1 on that)
Making this public worries me a little bit, as it might make future changes more problematic. But in the end it should be about the same as a protected method as it's not in the interface, so ok.
Is there documentation for the added parameter somewhere?
Oh it's in the interface - although it is supposed to be internal?
Sounds good, but as berdir explained to me that depends on the schema to work. So that needs to be documented here then. This sounds like a flaw in the database abstraction layer to me as it depends on the database (mysql/postgre) on whether like is case-sensitive or not. If so, shouldn't this be fixed in the database layer?
Maybe we can do the docblock only once and point to the single one from here instead?
Comment #120
berdirMore fun when debugging this. The condition for $case_sensitive was the wrong way round, so it was still not correct ;)
1. Discussed with alexpott, I think we found a better API. There is a separate method now that you can call. I also made it TRUE/FALSE/NULL, so that we only change it to LIKE if we know that the field type is explicitly not case sensitive (so that we don't use LIKE on integer fields ;))
2./3.: That's why the patch is removing those ;)
4. Yeah, I left it there but agreed that we don't need it either.
5. Yes, we could also make it a trait, that's a bit more code that needs to change ;)
6. It's the same parameter that you reviewed in 1, which is now gone.
7. Still the same thing as 1. What did you do exactly ;p
8. Updated the documentation as discussed. I don't know what else we can do and @alexpott is OK with this.
Note that the interdiff is against #113, but I did include @chx's addition for LIKE BINARY in the new documentation.
Comment #121
fagoThis looks great to me - this should be ready given it comes back green. Just found a small issue:
This really is bool|null.
Comment #123
claudiu.cristeaVery, very small nitpick: "if the field is ..." with capitalized "I".
Comment #124
berdirRemoved the setting from the migration again.
Comment #125
plachLet's do this
Comment #126
alexpottThis looks great. Just a very small thing that can be improved with docs.
I think we should have a bit more documentation for why there is no else for to cover the case sensitive code path. It took me a minute or two to get it.
Comment #127
berdirThis is the best I could come up with. Wasn't sure about documenting the other cases, but I think they're pretty clear.
Also removed the weird case '' and added a @see for the new LIKE/LIKE BINARY docs.
Comment #128
plachLooks good to me
Comment #129
fagoYep, changes look good - thanks.
Comment #130
alexpottThanks for the docs improvements. This issue addresses a critical bug and is allowed per https://www.drupal.org/core/beta-changes. Committed 88a46ae and pushed to 8.0.x. Thanks!