LOWER(name) queries perform badly; add column name_lower and index. [#83738]

Comment	File	Size	Author
#197	lower.patch	5.67 KB	ThomasH

#187	lower_with_sync-73738-186.patch	16.04 KB	chx

#186	lower_with_sync-73738-185.patch	15.99 KB	chx

#184	lower_with_sync-73738-184.patch	16 KB	chx

#183	lower_with_sync-73738-183.patch	15.18 KB	chx

#182	lower_with_sync-73738-182.patch	15.18 KB	chx

#179	lower_with_sync-73738-178.patch	15.18 KB	chx

#178	lower_with_sync-73738-178.patch	15.57 KB	chx

#177	lower_with_sync-73738-177.patch	15.26 KB	chx

#170	lower_with_sync.200711151000UTC.patch	16.22 KB	sammys

#164	lower_with_sync.patch	15.07 KB	chx

#149	name_lower-0.16.patch	8.71 KB	hswong3i

#148	name_lower-0.15.patch	5.48 KB	hswong3i

#124	lower-bench.txt	2.14 KB	JirkaRybka
#123	lower.patch	18.71 KB	John Morahan

#119	lower.patch	16.85 KB	chx

#118	lower.patch	18.62 KB	John Morahan

#116	lower.patch	16.33 KB	chx

#109	lower.patch	18.03 KB	John Morahan

#108	lower.patch	18.03 KB	John Morahan

#102	lower_8.patch	17.89 KB	John Morahan

#101	lower_7.patch	17.13 KB	John Morahan

#99	lower20071101jv.patch	17.02 KB	jvandyk

#96	lower_6.patch	17.12 KB	John Morahan

#95	lower_5.patch	15.68 KB	John Morahan

#93	lower_4.patch	15.18 KB	John Morahan

#77	lower_3.patch	13.47 KB	chx

#75	lower_2.patch	12.91 KB	chx

#74	lower_1.patch	10.93 KB	chx

#73	lower_0.patch	10.71 KB	chx

#47	lower.patch	4.96 KB	chx

#24	name_lower_20060916_0118_patch.txt	5.03 KB	sammys
	name_lower.patch	5.06 KB	robertDouglass

Comment #1

chx CreditAttribution: chx commented 12 September 2006 at 14:03

Version:	4.7.3	» x.y.z
Priority:	Normal	» Critical
Status:	Active	» Needs review

This is very true. Taxonomy should follow. I can attest that the slowest query (far surpassing search and all) at a huge site I run is the LOWER(name) query of taxonomy.

Log in or register to post comments

Comment #2

chx CreditAttribution: chx commented 12 September 2006 at 14:05

Why critical? Two reasons. One, the autocomplete Robert mentions and the other is the taxonomy that should follow -- have you ever tried taxonomy pages when your table has hundreds of thousands of terms?

Log in or register to post comments

Comment #3

Dries CreditAttribution: Dries commented 12 September 2006 at 14:45

This is not critical, IMO. Also, the proposed solution is a hack, imo.

Log in or register to post comments

Comment #4

robertDouglass CreditAttribution: robertDouglass commented 12 September 2006 at 14:51

I'd be thrilled with an alternate solution. But if we cannot get the same performance gain any other way, I am fully comfortable with the proposed solution as well. What are our alternatives?

Log in or register to post comments

Comment #5

Dries CreditAttribution: Dries commented 12 September 2006 at 14:52

By default, MySQL searches are not case sensitive so the lower()s could be removed. Unfortunately, PostgreSQL is case sensitive so it can't be removed. Maybe there is a clean way to make PostgreSQL not case sensitive?

Log in or register to post comments

Comment #6

Dries CreditAttribution: Dries commented 12 September 2006 at 14:56

Crazy idea (not tested): can we create an index on lower(field)?

CREATE INDEX my_index ON my_table (LOWER(name));

Log in or register to post comments

Comment #7

robertDouglass CreditAttribution: robertDouglass commented 12 September 2006 at 14:59

I'll give it a try =)

Log in or register to post comments

Comment #8

robertDouglass CreditAttribution: robertDouglass commented 12 September 2006 at 15:12

That would be for Postgres, right? (guess I should learn how to use it) It doesn't seem like it would do any good to have the index be lowercase since MySQL can't use the index to begin with. And the goal of creating the index that way would have to be to force Postgres into behaving like MySQL (ie case insensitive search). It seems like the best solution would be if Postgres had some setting that would configure the behavior of their string comparison operator. Lacking that, I don't see another solution (other than whipping out the database-specific queries... yuck).

Log in or register to post comments

Comment #9

robertDouglass CreditAttribution: robertDouglass commented 12 September 2006 at 15:14

Just to clarify the problem once more: any time we write SQL that DOES_SOMETHING(column-name), eg. LOWER(name), there is no chance for existing indexes of any nature to be used. Forming the index differently won't correct the problem, only taking the function off of the column name can help.

Log in or register to post comments

Comment #10

webchick

she/they

English

Vancouver 🇨🇦

CreditAttribution: webchick commented 12 September 2006 at 15:32

Not sure...

CREATE INDEX my_index ON my_table (LOWER(name));

My mad DB skillz are a bit rusty, but it strikes me that even if one or two oddball DBMS systems support something like that, it's going to be the exception rather than the rule. This is especially important now that people are active developing Oracle and MS SQL abstraction layers for Drupal.

An extra column, on the other hand, is something that can be used across the board.

Log in or register to post comments

Comment #11

Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commented 12 September 2006 at 16:17

Just a comment: I don't think that adding a lowered version of the column is a particularly bad hack. it is about similiarly hacky as the introduction of the node_comment_statistics table.

Log in or register to post comments

Comment #12

dww

we/he/they

CreditAttribution: dww commented 12 September 2006 at 18:09

we're talking 4 or 5 orders of magnitude faster, here. ;) i've seen far worse hacks for far less benefit. i agree w/ killes, this doesn't strike me as particularly hacky at all, and storing an already-LOWER() copy of name seems much more portable than playing with various DB-specific hacks.

that said, a little bit of info from the pgsql docs and google results:

http://archives.postgresql.org/pgsql-php/2003-05/msg00044.php
claims that the following works for exactly this sort of problem:
CREATE INDEX id_lower_content ON mytable(lower(column_name))
for more on this, check out: http://www.pgadmin.org/docs/dev/pg/indexes-expressional.html
(less promising) http://archives.postgresql.org/pgsql-php/2002-03/msg00031.php
suggests another way to do it, but i suspect the performance will be crappy:
SELECT * FROM table WHERE field ~* 'searchterm' AND ....
i think that's just another way of doing ILIKE the case-insensitive version of LIKE, that some other thread mentioned was about 10 times slower than LIKE

i'll leave it upto the core committers to argue about portability vs. simplicity vs. hackery. personally, i think storing the LOWER(name) as a seperate column is just fine, and will definitely be the easiest to port (if slightly inefficient w/ space in the DB).

Log in or register to post comments

Comment #13

m3avrck CreditAttribution: m3avrck commented 12 September 2006 at 18:14

What if we had a global variable ... kind of like $connection, but with the $db_type (or do we have this? I forget):

if mysql:
select * from users WHERE name = 'bob'

else if pgsql:
select * from users WHERE lower(name) == lower('bob')

Wouldn't that be the cleanest way?

Log in or register to post comments

Comment #14

chx CreditAttribution: chx commented 12 September 2006 at 18:23

my 2cents are on the separate column. you are heading in the a very dark woods with that if $db_type -- what happens if someone else wants to write a similar query ? Multiple that if? brrrr.

Log in or register to post comments

Comment #15

webchick

she/they

English

Vancouver 🇨🇦

CreditAttribution: webchick commented 12 September 2006 at 18:24

The problem with that approach is that core can conceivably have many more than just mysql and postgresql supported. There are patches in the queue to add Oracle and MS SQL support, for example. And only a matter of time before someone does an SQLite port, or a Sybase port, or a DB2, etc. That kind of database-specific code will get unmanagable really, really fast.

Log in or register to post comments

Comment #16

webchick

she/they

English

Vancouver 🇨🇦

CreditAttribution: webchick commented 12 September 2006 at 18:26

Also, then postgres and other database systems don't benefit from the incredible performance benefit of not having to lowercase the fields.

So yep, duplicate column still seems the best way to go for now. :)

Log in or register to post comments

Comment #17

m3avrck CreditAttribution: m3avrck commented 12 September 2006 at 18:27

I concur about the if $db_type, just playing devils advocate there.

Way back when I talked with Robert about this I actually suggested the extra "lower" column in the first place as probably the best solution. And I think it still stands.

Still doesn't hurt to throw out bad suggestions ... makes the best solution look all that much better :-p

I still agree with my original thought (and the path this patch took), have a separate "lower" column.

Log in or register to post comments

Comment #18

jvandyk CreditAttribution: jvandyk commented 12 September 2006 at 18:49

I took the extra column approach a while ago with a very popular site. I think the payoff is worth the extra column.

Log in or register to post comments

Comment #19

hyperlogos CreditAttribution: hyperlogos commented 12 September 2006 at 20:07

Just in case anyone needed further confirmation, I just applied this patch, and now autocomplete is actually usable. (I'm on a shared host, so it was pretty useless before.)

Log in or register to post comments

Comment #20

dww

we/he/they

CreditAttribution: dww commented 12 September 2006 at 21:14

Status:

Needs review

» Needs work

finally had a chance to look closely at the patch itself. ;) mostly, it seems fine on a quick skim (i haven't tried testing it personally). however, system_update_183() is obviously missing the pgsql cases, so this needs work before it could be committed...

Log in or register to post comments

Comment #21

robertDouglass CreditAttribution: robertDouglass commented 12 September 2006 at 21:50

@dww - did you test on HEAD or 4.7? I rolled it against 4.7 and haven't had time to apply it to HEAD.

Log in or register to post comments

Comment #22

dww

we/he/they

CreditAttribution: dww commented 12 September 2006 at 22:18

i didn't test at all. i just visually inspected that there's no 'pgsql' case in system_update_183() in http://drupal.org/files/issues/name_lower.patch

speaking of which, if this issue is targetting the trunk (as the version currently states), that update # needs to be changed, too...

Log in or register to post comments

Comment #23

sammys CreditAttribution: sammys commented 13 September 2006 at 00:10

-1 to the patch as it can't be tested on pgsql. Time for someone to practice some pgsql syntax! :)

Other than that, IMHO the concept is good and the extra 10% of space for 4 orders of magnitude more speed is worth it in core.

Sammy Spets
Synerger
http://synerger.com

Log in or register to post comments

Comment #24

sammys CreditAttribution: sammys commented 15 September 2006 at 15:20

Status:

Needs work

» Needs review

File	Size
name_lower_20060916_0118_patch.txt	5.03 KB

patch including pgsql update code for 4.7.

Log in or register to post comments

Comment #25

robertDouglass CreditAttribution: robertDouglass commented 15 September 2006 at 15:34

This is getting rolled out on a client site today, so we'll have some reports from the wild for you soon.

Log in or register to post comments

Comment #26

Zen CreditAttribution: Zen commented 15 September 2006 at 18:50

Or you could just get rid of the "preserve case" feature and store the usernames in lower case right from the get-go? Capitalisation or w/e can be done in the presentation layer.. The only people who'll get ruffled by this are those with 1337 nicks like "deEp ThRo4T" or "Marquis de Sade" ..

Unless I'm missing the big picture here, this is a minor feature that we can just nix..

Just a thought.
-K

Log in or register to post comments

Comment #27

webchick

she/they

English

Vancouver 🇨🇦

CreditAttribution: webchick commented 15 September 2006 at 18:53

@Zen: I dislike that idea. It goes against Drupal's philosophy of not mangling user input.

I should be able to have the nick "webchick" and someone else should be able to have the nick "Joan of Arc" -- no amount of logic on the theme layer is going to allow both of those things to happen.

Log in or register to post comments

Comment #28

Zen CreditAttribution: Zen commented 15 September 2006 at 18:58

Also, if this feature needs to be retained, it might be more elegant to use PgSQL's operator overloading feature to accomplish this. Not sure if it's possible, but I'll try and ask Cvbge about it. The separate column approach is IMHO rather tacky.

Cheers,
-K

Log in or register to post comments

Comment #29

robertDouglass CreditAttribution: robertDouglass commented 15 September 2006 at 22:18

It would be good to research some of the other rdbms solutions out there as well and see what issues we might run into if we do an Oracle or MSSQL port. The one shining advantage of the name_lower approach is that we can eat our cake and port it too. It is absolutely guaranteed to run the same on any database we use.

I'm not against a db specific approach, especially if we can apply it to the other major systems I mentioned. But the "hackishness" or "tackiness" or "this hurts my normalized eyes-ness" of the patch shouldn't prevent us from harvesting the performance gain. One way or another, this bottleneck needs to be resolved.

Log in or register to post comments

Comment #30

drumm

he/him

NY, US

CreditAttribution: drumm commented 21 September 2006 at 15:44

Version:

x.y.z

» 4.7.3

Looks like a 4.7 patch.

Log in or register to post comments

Comment #31

killes@www.drop.org CreditAttribution: killes@www.drop.org commented 24 September 2006 at 12:23

Version:

4.7.3

» x.y.z

The issue still exists in HEAD:

Log in or register to post comments

Comment #32

dww

we/he/they

CreditAttribution: dww commented 24 September 2006 at 12:52

i think neil's point is that the performance implications of this fix are so enormous that we could consider this a bug that deserves to be fixed in 4.7, in addition to being fixed in the trunk. i'm not going to move the version again, but i'd vote for fixing this in both branches.

Log in or register to post comments

Comment #33

pwolanin CreditAttribution: pwolanin commented 24 September 2006 at 14:41

(subscribing) I'd wondered why the autocomplete lookup of usernames was so painfully slow. I'll try to test this patch on a copy of my 4.7 site.

Log in or register to post comments

Comment #34

pwolanin CreditAttribution: pwolanin commented 24 September 2006 at 15:44

With this patch in place, username autocomplete is remarkably faster (qualitatively).

Log in or register to post comments

Comment #35

gopherspidey CreditAttribution: gopherspidey commented 24 September 2006 at 17:36

Well at least for postgres, it looks like comment #6 is the correct way to help list problem. According to the postgres manual.

http://www.postgresql.org/docs/8.1/interactive/indexes-expressional.html

Since mysql is by default case insensitive, and does not seem to allow indexs on SQL functions. A hack around the problem, we could just remove the "LOWER()" from the where clauses in select statements before we make the query.

Log in or register to post comments

Comment #36

dww

we/he/they

CreditAttribution: dww commented 24 September 2006 at 23:32

re: pgsql -- yeah, as i linked in #12. however, instead of trying to find DB-specific code for this on every DB we currently (and in the future, might) support, we could just add the extra field. it's simple, it'll always work, we'll never have to port it, and it's 4 or 5 orders of magnitude faster than what we have now.

Log in or register to post comments

Comment #37

JohnAlbin

he/they/他

English

Taipei, Taiwan

CreditAttribution: JohnAlbin commented 1 October 2006 at 23:34

No one seems to be addressing Robert's comment about spoofing addresses.

The motivation for using LOWER is that usernames should be case insensitive [...] to prevent people from spoofing other users' identies (you're "Big_chief" on your site and someone registers as "Big_Chief" and starts acting like the site owner).

If you look at this comment (http://drupal.org/node/56487#comment-107050) on login spoofing:

AIM (AOL Instant Messenger) [...] allows spaces in logins, but "folds" them, so an AIM login of "JimBob32" is equivalent to "Jim Bob 32" is equivalent to "J i m B o b 3 2".

..we could extend the "second column idea" to be both index-able and less spoof-able.

So this indexable field would be created by lowercasing name and stripping all non-alphanumeric characters.

Ideally, this new value could be placed in name, with a new field called display_name which has the capitalization, spacing and punctuation the user prefers to be displayed in comments, etc. (As a bonus, you could allow apostrophes in the the display name for those named O'Brian). And since the display_name can be "folded" to get the name, you could still use the display name in logins and Drupal Authentication.

Of course, this would mean the ubiquitous $user->name would need to be changed to $user->display_name. And would affect user login validation and a slew of other components.

I don't know if this is feasible, but I thought I would make the suggestion before we implement a second column.

Log in or register to post comments

Comment #38

pwolanin CreditAttribution: pwolanin commented 2 October 2006 at 00:58

That is certainly a valuable suggestion, but I think that changing the meaning of $user->name is unlikely to find much support (at least prior to Drupal 6.0) due to the widespread breakage that would ensue. For now, it seems it would be pretty trivial to strip the whitespace, etc. from the entry in the name_lower column and that would make the system more robust.

Log in or register to post comments

Comment #39

robertDouglass CreditAttribution: robertDouglass commented 2 October 2006 at 08:21

Nice suggestions. I think we'll most likely keep $user->name (as the display name), but I can definitely see stripping white space and non-alphnumeric chars from the name_lower column. This, like you said, would make it harder to spoof user names and easier to customize your own user name.

This is for a later patch, however, because there is still a lot of resistance to the idea of having a second column for lookups to begin with.

Log in or register to post comments

Comment #40

Hayakawa CreditAttribution: Hayakawa commented 7 September 2010 at 13:07

Version:	7.x-dev	» x.y.z
Component:	database system	» user system
Status:	Closed (duplicate)	» Needs review

i was really lost in the comments. is this final version of patch after the suggestions? and are there any bottlenecks or imcompatibilities caused by this patch?
thank you very much

Log in or register to post comments

Comment #41

robertDouglass CreditAttribution: robertDouglass commented 2 October 2006 at 18:21

@Hayakawa: this patch is ready to go, if it is decided that it is an approach that we want to pursue. If someone comes up with a different approach that people feel is a better architecture, it might win out. So far, many prominent developers have expressed support for the patch, and a couple have expressed misgivings. What that means is that a better solution is hoped for. But if you want to use the patch, I feel confident that it will work well for you. It is being used by the biggest client that I work for on their production site and has caused the performance gains that are hoped for.

Log in or register to post comments

Comment #42

JohnAlbin

he/they/他

English

Taipei, Taiwan

CreditAttribution: JohnAlbin commented 2 October 2006 at 19:04

If Robert's latest patch can be finalized and put into Drupal 5.0 (or even 4.7), I would recommend that we do it. It provides (right now) a substantial performance improvement.

And for those developers who are reticent to add a new column simply to get a performance gain, think of it this way...

In Drupal 6.0, the new column could be a trifecta for these issues:

username lookup performance
prevention of much username spoofing
better display name capabilities (allowing apostrophes, ampersands, commas, etc.)

Log in or register to post comments

Comment #43

Steven CreditAttribution: Steven commented 3 October 2006 at 14:13

Stripping the lower column of whitespace is a neat idea, but could break existing sites. It would be cool if we could add this for new sites though.

One possible approach would be to only use literal lowercasing in system_update_N() to create the column, but to insert the stripped version for any new users from then on. Their usernames will be checked with the strict restrictions too.

That way we don't have to provide an option and we don't break any site.

Log in or register to post comments

Comment #44

pwolanin CreditAttribution: pwolanin commented 3 October 2006 at 14:43

@Steven - I'm not sure that would work, since you need a apply a function to lower case and strip whitespace before you run the query. If you don't know in advance whether or not to strip, that won't work.

Is it possible to provide a pre-update tool to check whether there will be conflicting usernames after the whitespace stripping? Or is there any way to provide an interactive process during the update?

Log in or register to post comments

Comment #45

sammys CreditAttribution: sammys commented 4 October 2006 at 05:12

FYI: Correct syntax for creating pgsql indexes for drupal tables is:

CREATE INDEX {table}_fieldname_idx ON {table} (fieldname)

Log in or register to post comments

Comment #46

moshe weitzman CreditAttribution: moshe weitzman commented 5 October 2006 at 22:19

Status:

Needs review

» Needs work

needs a 5.0 port.

but further, this is one of those updates which is difficult in our update syste, once you update your code, you can't login. we handled that at one time by having people manually update their sessions table. might be needed here too.

Log in or register to post comments

Comment #47

chx CreditAttribution: chx commented 6 October 2006 at 00:02

Status:

Needs work

» Needs review

File	Size
lower.patch	4.96 KB

I rerolled for HEAD, was surprisingly easy, only hunk failed for user.module.

Why you can't login? I do not see it.

Log in or register to post comments

Comment #48

moshe weitzman CreditAttribution: moshe weitzman commented 6 October 2006 at 01:03

Status:

Needs review

» Reviewed & tested by the community

sorry, i was unclear. if a user is not logged as uid=1 after updating source, he is unable to login and perform update because the new column doesn't exist. however, i tested setting $access_check = FALSE in update.php and that works so is a good workaround for this situation. we'll just have to document this in the upgrade notes for 5.0

i tested chx' recent patch for functionality and update path. works well. rtbc.

Log in or register to post comments

Comment #49

Dries CreditAttribution: Dries commented 8 October 2006 at 06:30

Assigned:	Unassigned	» Dries
Category:	bug	» task
Priority:	Critical	» Normal

I still think this is not the cleanest approach. Also, I am going to benchmark this before it can get committed. I'm tempted to postpone this, as I think we might be able to do a better job with this ...

Changing the status from 'critical bug report' to 'normal task'. This doesn't break your website and therefore isn't critcal.

Log in or register to post comments

Comment #50

Dries CreditAttribution: Dries commented 8 October 2006 at 06:31

Status:

Reviewed & tested by the community

» Needs work

Also changing the status to code needs work. Will update the status once benchmarked.

Log in or register to post comments

Comment #51

robertDouglass CreditAttribution: robertDouglass commented 8 October 2006 at 07:01

Let me know if there is a way I can help with the benchmarking.

Log in or register to post comments

Comment #52

coreb CreditAttribution: coreb commented 30 November 2006 at 02:22

Version:

x.y.z

» 5.x-dev

Moving out of the "x.y.z" queue to a real queue.

It doesn't appear to be a new feature, so I'm putting it in 5.x-dev.

Log in or register to post comments

Comment #53

m3avrck CreditAttribution: m3avrck commented 7 January 2007 at 07:30

Let's bring this back into the light -- lower() sucks :-)

Log in or register to post comments

Comment #54

Gary Feldman CreditAttribution: Gary Feldman commented 18 January 2007 at 07:02

I just came across a problem where users can't login, because the clause "LOWER(name) = LOWER('The Users Name') was failing when the name was in mixed case (on MySQL 4.1, Drupal 4.7.5). This seems to be related to this thread. I fixed it by simply removing the lower (for now) in user_load, trusting that the unique key will prevent duplicates that differ in case.

It seems incredible to me that a problem this severe could exist for this long (though given the number of keyboards without shift keys, perhaps it's not so incredible) . So I'm willing to assume for the moment that I'm missing something. If so, I'd appreciate it if someone could let me in on the secret. If not, should I raise the priority of this or submit it as a separate bug?

Log in or register to post comments

Comment #55

Gary Feldman CreditAttribution: Gary Feldman commented 18 January 2007 at 07:56

Removing the LOWER made the user id case sensitive. So I changed it to

   ...CONVERT($key using utf8) = '%s'";

This is specific to MySql 4.1.1 and higher.

The way I read the MySQL docs, the existing Drupal code using LOWER would work on previous versions of MySQL. Furthermore, I think it would work on a clean install of Drupal with later versions of MySQL, because the clean install sets users to be a varchar. It's the upgrade process that converts varchar columns to varbinary, for reasons I have yet to discern.

Log in or register to post comments

Comment #56

JohnAlbin

he/they/他

English

Taipei, Taiwan

CreditAttribution: JohnAlbin commented 18 January 2007 at 17:33

Version:

5.x-dev

» 6.x-dev

Hi Gary,

This thread is about a proposed performance improvement. And while the patch does remove the LOWER() MySQL function call, it's really unrelated to the bug you are experiencing.

Please, submit a new bug so that we can tackle both of these issues separately.

Thanks!

Log in or register to post comments

Comment #57

quicksketch

he/him

CreditAttribution: quicksketch commented 16 April 2007 at 03:26

Revisiting Dries's idea of making pgSQL case insensitive. It's possible to override operators in PostGreSQL (so I've heard). If it would be possible to make the = operator case insensitive for varchar comparisions, maybe that would be better solution as it would work for all database columns and keep the entire problem contained in the pgsql database implementation.

From http://www.houseoffusion.com/groups/cf-talk/thread.cfm/threadid:49827:

Some operations on data can be case insensitive. So your solution is not case insensitive data, your
solution is a case insensitve operator. Luckily it is very easy to define your own operators in
PostgreSQL. For instance, this defines case-insensitive equality and inequality operators for text
datatypes:

CREATE FUNCTION case_insensitive_equality(text, text) RETURNS boolean
    AS 'SELECT LOWER($1) = LOWER($2)'
    LANGUAGE SQL
    IMMUTABLE
    RETURNS NULL ON NULL INPUT;
CREATE FUNCTION case_insensitive_inequality(text, text) RETURNS boolean
    AS 'SELECT LOWER($1) <> LOWER($2)'

    LANGUAGE SQL
    IMMUTABLE
    RETURNS NULL ON NULL INPUT;
CREATE OPERATOR === (
    leftarg = text,
    rightarg = text,
    procedure = case_insensitive_equality,
    commutator = ===,
    negator= <=>
);
CREATE OPERATOR <=> (
    leftarg = text,
    rightarg = text,
    procedure = case_insensitive_inequality,

    commutator = <=>,
    negator= ===
);

I would recommend against overwriting the current = and <> operators with these operators because I
have no idea what the side-effects may be and I don't think they are indexable.

So it looks like we have some questions to answer here. They might not indexable, but surely this will offer a performance improvement over the LOWER() method currently in use which kills the indexes anyway for all database types.

Log in or register to post comments

Comment #58

dww

we/he/they

CreditAttribution: dww commented 16 April 2007 at 03:39

They might not indexable, but surely this will offer a performance improvement over the LOWER() method currently in use

huh? ;) read the definition of the === operator you just pasted above:

SELECT LOWER($1) = LOWER($2)

by defining your own case-insensitive equality operator, you're just hiding the LOWER() calls, not removing them.

Log in or register to post comments

Comment #59

kbahey CreditAttribution: kbahey commented 16 April 2007 at 04:45

@robertDouglass

Are you sure the query cache is not a factor in the improved results at the top of this page? Try restarting MySQL between tests and see.

@All
This approach is one of the no-nos in database design. The reason is that you have data redundancy and one field is a product of the other in the same row. Things get out of hand if one part of the application would update one field but forget to update the other, and then you have data integrity issues and all sorts of problems.

This may not be such an issue if we only have one place for the INSERT and one place for the UPDATE of the name. But if we have more then we are just opening ourselves to trouble in the future.

One approach that allows a new column, yet ensure its integrity, is if we had triggers: this would automatically update the name_lower field every time we have an insert/update to the row, and would be transparent to the application. But these are only available in MySQL 5.0.2 onwards.

The other option to solve this is to get rid of the LOWER()s in SQL altogether. The database is case insensitive anyway, and uniqueness is guaranteed since we have a unique index on name. We can do the lower part in PHP on the name before we pass it to db_query().

Log in or register to post comments

Comment #60

kbahey CreditAttribution: kbahey commented 16 April 2007 at 04:50

I mean instead of this:

db_query("SELECT uid FROM {users} WHERE LOWER(name) = LOWER('%s')", $name);

We do something like this:

db_query("SELECT uid FROM {users} WHERE name = '%s'", strtolower($name));

Can someone benchmark this?

Log in or register to post comments

Comment #61

m3avrck CreditAttribution: m3avrck commented 16 April 2007 at 04:50

I agree with Khalid. Do this at the PHP level with strtolower(), the database should not store 2 representations of the same thing, it's going to cause problems eventually.

Log in or register to post comments

Comment #62

quicksketch

he/him

CreditAttribution: quicksketch commented 16 April 2007 at 04:56

by defining your own case-insensitive equality operator, you're just hiding the LOWER() calls, not removing them.

Yeah you're right, I should read code more carefully. Shoot, another option down the drain :(

Log in or register to post comments

Comment #63

quicksketch

he/him

CreditAttribution: quicksketch commented 16 April 2007 at 05:02

The problem with kbahey's approach is that the user table stores the data including mixed case usernames. Using strtolower() in PHP and then searching will cause any username containing an uppercase letter to fail when using pgSQL whose comparisons are case sensitive, the original cause of all the trouble in this issue.

Log in or register to post comments

Comment #64

kbahey CreditAttribution: kbahey commented 16 April 2007 at 13:25

But Robert's original test and patch was on MySQL.

The motivation was bad performance on MySQL because of the lack of using the index since the name was wrapped in LOWER().

Log in or register to post comments

Comment #65

pwolanin CreditAttribution: pwolanin commented 16 April 2007 at 14:52

Until reliable triggers are available in MYSQL, it seems very hard to think of a solution other than storing the lower-cased username as another column. While this may not be a great practice, it's database-agnostic.

Even if Drupal was pushed in MySQL 5.0, it's not clear that most users would be able to create a TRIGGER:

CREATE TRIGGER was added in MySQL 5.0.2. Currently, its use requires the SUPER privilege.

http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html

Log in or register to post comments

Comment #66

robertDouglass CreditAttribution: robertDouglass commented 16 April 2007 at 21:45

I'm still in favor of the extra column, but rerolled to remove whitespace as was suggested earlier in the thread. Anybody is welcome to beat me to it, since I won't be doing it this week.

Log in or register to post comments

Comment #67

pwolanin CreditAttribution: pwolanin commented 16 April 2007 at 22:39

How do you deal with the update path then? by removing the white space, previously valid usernames may collide.

Log in or register to post comments

Comment #68

kbahey CreditAttribution: kbahey commented 18 April 2007 at 04:50

I still don't like an extra derived column. It is not good practice and a whole can of worms.

Here are some benchmarks using MySQL 5.0.22.

I used today's HEAD and created 10,000 users using devel.

Then I ran two queries, searching for a user called "paritoh" (lower case p in the db):

  $sql = "SELECT * FROM {users} WHERE LOWER(name) = LOWER('%s')";
  db_query($sql, 'Paritoh'));

This is how it is in core today.

Result: Time=22.491 Rows=1 SQL=select * from users where LOWER(name) = LOWER('%s')

  $sql = "SELECT * FROM {users} WHERE name = '%s'";
  db_query($sql, strtolower('Paritoh'));

This is using the case conversion outside the database:

Result: Time=0.395 Rows=1 SQL=select * from users where name = '%s'

56X orders of magnitude improvement.

So, for MySQL, the performance problem is solved by doing the case conversion in PHP, with no backward compatibility issues or the need for a derived column.

This leaves PostgreSQL, which is case sensitive.

By using database abstraction, we always have compromises such as lowest common denominator, or optimizing for one but not the other.

Alternatives:

- Store names in lower case. They are already unique and have a unique index, so no data issues are there, but will be offputting to users to see their names change case. In this case, we can still use LOWER in the SQL, but like so SELECT * FROM {users} WHERE name = LOWER('%s') which works with all databases, and is as fast as the PHP strtolower() for MySQL.

- Do some magic in the abstraction layer to rewrite the SQL for each engine. For example, if we have LOWER(name), we make it just name for MySQL.

- Other alternatives?

Log in or register to post comments

Comment #69

catch

he/him

English

CreditAttribution: catch commented 26 October 2007 at 23:06

Status:

Needs work

» Closed (duplicate)

http://drupal.org/node/105161 and http://drupal.org/node/181625 were duplicates.

Log in or register to post comments

Comment #70

catch

he/him

English

CreditAttribution: catch commented 26 October 2007 at 23:07

Status:

Closed (duplicate)

» Needs work

oops, setting status back, too many tabs.

Log in or register to post comments

Comment #71

merlinofchaos CreditAttribution: merlinofchaos commented 1 November 2007 at 04:21

Category:	task	» bug
Priority:	Normal	» Critical

Coming back over from http://drupal.org/node/181625

I am setting this back to critical and back to bug report.

From firebus' report (I've been working with him), not fixing this made nanowrimo.org unusuable with 800 logged in users; fixing this got him up to 1400 logged in users and a slow but usable site.

This is a critical performance problem.

Log in or register to post comments

Comment #72

hswong3i CreditAttribution: hswong3i commented 1 November 2007 at 09:07

Subscribing

Log in or register to post comments

Comment #73

chx CreditAttribution: chx commented 1 November 2007 at 09:57

Assigned:	Dries	» chx
Status:	Needs work	» Needs review

File	Size
lower_0.patch	10.71 KB

I would side with merinofchaos, as posted in another issue, one single term query on NowPublic runs for 2.42s w/ LOWER and 0.02 without. So, I would argue Dries' earlier

This doesn't break your website and therefore isn't critcal.

-- while the website does run but if its so slow that it can't be used, then IMO that's a broken. The patch is not difficult. The update is not nice but that's life.

Log in or register to post comments

Comment #74

chx CreditAttribution: chx commented 1 November 2007 at 10:04

File	Size
lower_1.patch	10.93 KB

Doh! Forgot the indexes.

Log in or register to post comments

Comment #75

chx CreditAttribution: chx commented 1 November 2007 at 10:23

File	Size
lower_2.patch	12.91 KB

forgot access table as I only search modules. I do not think there is any point keeping the mask case sensitive it's not the usual user submitted data that can not be altered. So, I simply lowercase that and not keep a lower_mask.

Log in or register to post comments

Comment #76

hswong3i CreditAttribution: hswong3i commented 1 November 2007 at 10:23

Give a simple scan about the patch in #74 and sounds beauty: including both upgrade and normal queries patch.

BTW, seems not yet include the handling of drupal_is_denied() (http://drupal.org/node/174025)? It is also a critical issue that affected by non-indexed LOWER() handling. Should we also take care of that within this issue? May be also handled by 2 columns? One for compare (lower case) and the other for admin UI (natural case) ?

Log in or register to post comments

Comment #77

chx CreditAttribution: chx commented 1 November 2007 at 10:30

File	Size
lower_3.patch	13.47 KB

hswong3i says, I should warn the admin about the mask change. Be it.

Log in or register to post comments

Comment #78

John Morahan CreditAttribution: John Morahan commented 1 November 2007 at 11:28

Status:

Needs review

» Needs work

Applied #77, installed a new site into a fresh database, and on completing installation, user 1 was not logged in. Trying to log in gives this error: "The username admin has not been activated or is blocked."

Also, the indexes were not created. (looks like they were only added to the upgrade path)

Log in or register to post comments

Comment #79

hswong3i CreditAttribution: hswong3i commented 1 November 2007 at 12:06

Sorry, please hold on for a while! A simple suggestion: should we split the user "real name", "account name" and "user ID" as 3 different type of fields? Let's take vexim (http://silverwraith.com/vexim/) database users table schema as example:

user_id: as like as our uid, auto_increment.
localpart: the local part of email address (e.g. localpart@domain). It is always unique, and restricted with email local part restriction: [a-z0-9_-]. Similar as our name field, plus lower case limitation.
realname: An additional field with NO strict limitation. User will able to input whatever string as their display name.

So my suggested changes:

uid: unchange.
name: field keeped, and translate all account name into lower case. As like as name_lower proposed in chx's patch.
realname: new field, free typing. As backward compatible, we can simply copy the existing name into this new field.

Benefit include:

No change to name and keep its function. Contribute developers should not be confused by the lower case transformation.
realname is much more meaningful and useful. E.g. I always use profile.module and create some field similar as this nature for user full/real name.
name_lower is only meaningful when compare, but no other benefit. On the other hand, contribute developer may feel confusing: which field should use for which cases? Should I always use name_lower during compare? Or combine the use of both field? Or some other style? We need to document the change clearly.

Yes, this suggestion is a bit complicated when compare with chx's patch: patch in #77 is much simple and direct, and should be functioning with no question (may be we need some more debug, but this shouldn't be difficult). This suggestion may require for more changes, e.g. most UI should update and display both user real name and account name. But this is not a new idea: most forum system supported for this style.

Please feel free to work hard with chx's name_lower patch, it is target for today. BTW, my suggestion shouldn't be too complicated, and is suitable for a long term development. If it should belongs to D7, I will come back for this afterward, and focus on chx's work for the needs of today :)

Log in or register to post comments

Comment #80

alpritt CreditAttribution: alpritt commented 1 November 2007 at 14:57

@#79: While I agree with a need for real display names, it seems like something that would complicate this issue. Whatever the outputted username is, will be the one people search for and must therefore be translatable into the lowercase version. So we cannot have a display name of 'John Doe' and then a username of 'jdoe' since other users would end up searching for the wrong name.

Comment #37 seems like a much more workable idea. And probably something that should be taken into consideration in support of the above patch -- even if we intend to implement it at a later date. If this is the case, maybe the naming of the field should not be as specific as 'name_lower', but should be something more generic like 'name_unique'.

Log in or register to post comments

Comment #81

alpritt CreditAttribution: alpritt commented 1 November 2007 at 15:06

From #79: "This suggestion may require for more changes, e.g. most UI should update and display both user real name and account name. But this is not a new idea: most forum system supported for this style."

Yes, this would solve the problem I raised above. However, it will do so at the cost of the UI. Displaying both names would be confusing.

Log in or register to post comments

Comment #82

webchick

she/they

English

Vancouver 🇨🇦

CreditAttribution: webchick commented 1 November 2007 at 15:09

I think hswong's suggestion is when someone enters "John Doe" as their username on registration, behind the scenes, "john doe" is stored as the user.username field, and "John Doe" is stored in the user.realname field.

How does all this lowercase business work with accented letters, Japanese characters, etc?

Log in or register to post comments

Comment #83

catch

he/him

English

CreditAttribution: catch commented 1 November 2007 at 15:20

@webchick: This suggests extra lower case column for accented characters so shouldn't be an issue in this case.

Japanese doesn't have a concept of lower case, so at the db level I assume it makes no difference at all (could only find a mysql internals list discussion from 2000 to support this, and it wasn't directly related).

Log in or register to post comments

Comment #84

hswong3i CreditAttribution: hswong3i commented 1 November 2007 at 15:52

For clarify my idea (http://edin.no-ip.com/html/?q=node/346), here are some useful screenshots from vexim and eGroupWare:
http://edin.no-ip.com/html/files/vexim_adminuseradd.png
http://edin.no-ip.com/html/files/vexim_adminuserchange.png
http://edin.no-ip.com/html/files/egroupware_edit_user.png
http://edin.no-ip.com/html/files/egroupware_setup_authentication.png

The proposal (split field into realname, name and uid) is not too complicated, backward compatible, and widely adopted by many other systems. Should this be one of the possible solution?

Log in or register to post comments

Comment #85

hswong3i CreditAttribution: hswong3i commented 1 November 2007 at 16:05

May be we should restrict "account name" (name) as strict as email local part: [a-z0-9-_.], and leave "real name" (realname) as free as possible. This can simplify the checking (whatever UTF8 or not, we only accept simple text input), and also extensible if we may adopt SMTP authentication backend.

Log in or register to post comments

Comment #86

catch

he/him

English

CreditAttribution: catch commented 1 November 2007 at 16:31

hswong3i - username currently accepts most UTF-8 characters - including Japanese etc., your suggestion would be a regression if it restricted to [a-z0-9-_.], and I really think the "real name" idea should be left for D7 - we already have profile.module for this.

Looks like #77 needs schema declaration updated in .install, will try to reproduce John Morahan's bug later on.

Log in or register to post comments

Comment #87

John Morahan CreditAttribution: John Morahan commented 1 November 2007 at 16:50

I'm using MySQL version 5.0.22-Debian_0ubuntu6.06.5-log - if that helps.

Log in or register to post comments

Comment #88

hswong3i CreditAttribution: hswong3i commented 1 November 2007 at 16:57

@catch: agree. The idea of restricted name ([a-z0-9-_.]) may be too new and also too late within D6 dev cycle. We may need more time and discussion for this additional issue. I am just exploring about its possibility.

BTW, It should be a good idea if we employ the pair up of realname + name, rather than name + name_lower. This should be more meaningful, and backward compatible: no harm to contribute developer even they don't update the old LOWER(name) handling, will not break existing code but just without performance boost.

Log in or register to post comments

Comment #89

kbahey CreditAttribution: kbahey commented 1 November 2007 at 17:02

As I said earlier in #68, the problem does not exist in MySQL, and we can overcome it with a simple code change that does not involve any schema changes.

Again, I am objecting to chx's approach of derived columns that only hold a variation of an existing column. These are just trouble down the road, and we will pay dearly for them in maintenance and complexity.

If Postgres needs that, then why not limit the complexity to it?

As to hswong3i's suggestion, I need clarifications on:

- The local part of the email may not be unique. There can be someone@hotmail and someone@yahoo with the same "someone" part, but different people. Unlikely but possible.

- Your scheme enforces that the login name ('name' in your writeup) is always lower case. This means that now we cannot have 'Dries', or 'Amazon', but only 'dries' and 'amazon'? That may not be acceptable to some people ("Why can't the login name be what I want it to be?") Also, what about UTF-8 characters? How would Jose/Andre with accents on the e be handled? Strip those as well?

Log in or register to post comments

Comment #90

moshe weitzman CreditAttribution: moshe weitzman commented 1 November 2007 at 17:37

your suggestion to use a different schema for pg users is is more complex than the extra column, IMO. what about al lthe other DBMS? you have objected without proposing an acceptable alternative.

Log in or register to post comments

Comment #91

hswong3i CreditAttribution: hswong3i commented 1 November 2007 at 17:52

@moshe: totally agree. We shouldn't have a hybrid solution which is case-by-case and database-by-database dependent.

@kbahey: local part is not unique, just as like as it is not unique when using drupal.module within D5.3. SMTP authentication should always binded with A authentication server. In case of eGroupWare, we can define the "Mail domain (for Virtual mail manager)", so user will able to use both "username@example.com" and "username" if Mail domain is set as "example.com" (Sorry this topic shouldn't belongs to here, I will create a new one for D7)

On the other hand, login name is restricted, but not realname. Realname allow UTF-8, with no strict limitation. The idea is similar as the "Full name" field which is used within drupal.org but created by profile.module. Realname should be something always binded with user, no matter how the drupal site is used for (in general speaking...). It is not necessary to split this essential field into a non-standardized profile.module handling.

Log in or register to post comments

Comment #92

chx CreditAttribution: chx commented 1 November 2007 at 19:30

realname whatever it is , too big for D6 and does not belong to this thread. Bring that somewhere else. (So much banter and noone fixed the patch!)

Log in or register to post comments

Comment #93

John Morahan CreditAttribution: John Morahan commented 1 November 2007 at 19:30

File	Size
lower_4.patch	15.18 KB

Found the login bug: there was a missing break; in the switch statement in user_save(). Leaving as CNW because this still needs the schema update.

Log in or register to post comments

Comment #94

John Morahan CreditAttribution: John Morahan commented 1 November 2007 at 19:39

Also, user_save() does not populate {users}.name_lower for newly created accounts.

Log in or register to post comments

Comment #95

John Morahan CreditAttribution: John Morahan commented 1 November 2007 at 19:47

File	Size
lower_5.patch	15.68 KB

Now it does.

Log in or register to post comments

Comment #96

John Morahan CreditAttribution: John Morahan commented 1 November 2007 at 19:59

Status:

Needs work

» Needs review

File	Size
lower_6.patch	17.12 KB

Added the schema change, and added placeholder-for-uid-1 to {users}.name_lower in system_install(), so as not to break the new unique key.

Log in or register to post comments

Comment #97

catch

he/him

English

CreditAttribution: catch commented 1 November 2007 at 20:19

Ok so I patched a fresh checkout and installed with no issues. offset with comment module but I think my tarball is a little out of date.

Made a new user manually and logged in with them, then logged out and back in with the admin, all fine. Posted nodes and comments, did some user admin etc. no problems there either. Also checked the db and the name_lower column and respective index were there as expected.

Looking pretty good!

Log in or register to post comments

Comment #98

catch

he/him

English

CreditAttribution: catch commented 1 November 2007 at 20:22

Status:

Needs review

» Reviewed & tested by the community

missed a bit.

Log in or register to post comments

Comment #99

jvandyk CreditAttribution: jvandyk commented 1 November 2007 at 20:45

Status:

Reviewed & tested by the community

» Needs review

File	Size
lower20071101jv.patch	17.02 KB

Fixed some grammar issues.

The tests for lowercase masks in the access rules UI were incorrect.

Log in or register to post comments

Comment #100

David Strauss

he/him

CreditAttribution: David Strauss commented 1 November 2007 at 22:05

The index on user.name needs to go, but this patch looks great otherwise.

Log in or register to post comments

Comment #101

John Morahan CreditAttribution: John Morahan commented 1 November 2007 at 22:57

File	Size
lower_7.patch	17.13 KB

Like so?

Log in or register to post comments

Comment #102

John Morahan CreditAttribution: John Morahan commented 1 November 2007 at 23:32

File	Size
lower_8.patch	17.89 KB

Seems user_autocomplete() got lost somewhere along the way

Log in or register to post comments

Comment #103

kbahey CreditAttribution: kbahey commented 2 November 2007 at 02:12

@moshe and @hswong3i:

My comment on MySQL, is that 95% of Drupal sites use MySQL already (we can debate the figure separately). So for the sake of the 5% we are introducing a derived column, denormalizing the schema, which is not needed by those 95%. Dowing a strlower() in PHP solves the problem for MySQL.

I don't use PostgreSQL, nor my clients do, and hence have no need to solve that problem. Someone who uses it can scratch that itch.

The issue with using derived columns is that they have to computed every time the original changes, and the probability of errors because they get out of sync is there. Yes, the UPDATE and INSERT can be in only a few places, but there are modules that can mess this up easily.

Anyways, it seem that I am the only one voicing this concern. I wish we can poll a PostgreSQL expert on what options are there before we see this as the ultimate solution to this problem.

Log in or register to post comments

Comment #104

hswong3i CreditAttribution: hswong3i commented 2 November 2007 at 03:48

Sorry that since drupal_is_denied() is more high priority than user.name (drupal_is_denied() is called for EVERY page generation, whatever deny or not), I adopt both killes's patch (http://drupal.org/node/174025#comment-317309) and chx's patch comment (http://drupal.org/node/83738#comment-332240), and shift the focus of drupal_is_denied() back to its own issue (http://drupal.org/node/174025#comment-332305).

drupal_is_denied() should delay no more, and shouldn't wait for user.name :)

Log in or register to post comments

Comment #105

hswong3i CreditAttribution: hswong3i commented 2 November 2007 at 10:40

I folk this issue and handle with what I proposed in #79: http://drupal.org/node/188734. That issue should not be as simple as name_lower, but should be more complete and elegant :)

Log in or register to post comments

Comment #106

markus_petrux CreditAttribution: markus_petrux commented 2 November 2007 at 11:23

hmm... not sure if someone checked this already... there's a way to extend PostgreSQL by creating new data types, so it is possible to create a "Case insensitive text data type". Here's a project that does this:

http://gborg.postgresql.org/project/citext/projdisplay.php

Log in or register to post comments

Comment #107

chx CreditAttribution: chx commented 2 November 2007 at 15:04

denormalizing is not for speed purposed, next version we can even put in triggers, about citext, well, that's not civarchar so it's somewhat problematic.

Log in or register to post comments

Comment #108

John Morahan CreditAttribution: John Morahan commented 2 November 2007 at 22:21

Status:

Needs review

» Needs work

File	Size
lower.patch	18.03 KB

I worked on this a bit while d.o was down.

I don't think there should be a unique key on {term_data}.name_lower - what if you had terms with the same name in different vocabularies, or in different locations in a hierarchy? I changed it to a regular index in this version.

I also renamed _system_update_6036_strotolower() to _system_update_6036_strtolower() (assuming this was just a typo) and corrected a comment "fail over" to "fall through" in user_load().

I tested the upgrade path and found several problems: db_add_field() needs 'type' => 'varchar', not 'name' => 'varchar'; the access update was checking for db_table_exists('term_data'); the unique key on {users}.name_lower was added before the field was populated; and the multipart update did not increment its counter. These should all be fixed in the attached patch.

I guess the access / drupal_is_denied() stuff should probably be removed from this patch, now that it's being dealt with in the other issue. I haven't done that yet.

Log in or register to post comments

Comment #109

John Morahan CreditAttribution: John Morahan commented 2 November 2007 at 22:25

File	Size
lower.patch	18.03 KB

Note to self: read patch before uploading.

Log in or register to post comments

Comment #110

John Morahan CreditAttribution: John Morahan commented 2 November 2007 at 23:44

Another thing: edit and save a user account, the email address is folded to lowercase. This should not happen as email addresses (left of the @) are case sensitive.

Log in or register to post comments

Comment #111

David Strauss

he/him

CreditAttribution: David Strauss commented 3 November 2007 at 01:18

"This should not happen as email addresses (left of the @) are case sensitive."

I think it's OK to lowercase the email addresses. I don't know a single email server that actually observes case.

Log in or register to post comments

Comment #112

hswong3i CreditAttribution: hswong3i commented 3 November 2007 at 02:37

There is a miss typing in system_update_6035 and system_update_6036, from #102:

db_add_field($ret, 'users', 'name_lower', array('name' => 'varchar', 'length' => 60, 'not null' => TRUE, 'default' => ''));

It should be 'type' => 'varchar' but not 'name' => 'varchar'

Log in or register to post comments

Comment #113

chx CreditAttribution: chx commented 3 November 2007 at 08:45

Assigned:	chx	» Unassigned
Status:	Needs work	» Needs review

Log in or register to post comments

Comment #114

catch

he/him

English

CreditAttribution: catch commented 3 November 2007 at 12:15

OK tested adding and deleting users, created rules, logging in etc. everything fine. The message in access rules is sound as well (rules are stored in lower case etc., reassuring).

However changing e-mail address to something uppercase in user/edit you see it being LOWERed when it saves. I can't actually remember ever seeing an e-mail address with uppercase letters in it but I guess some people might use them - does it need a message to say "All e-mail addresses are stored in lower case, however this will not affect e-mails the system sends to you?" on user/edit maybe? But it should only do this if there's uppercase characters otherwise it'd be annoying. That's tiny, and could be a very small followup patch.

hswong3i's comments in #113 are already fixed in the patch from #109. I can't see any other issues but this could probably do with one more set of eyes.

Log in or register to post comments

Comment #115

John Morahan CreditAttribution: John Morahan commented 3 November 2007 at 13:03

Status:

Needs review

» Needs work

Well, if email addresses are going to be lowercased, then they need to be lowercased on creating a new account, and on updating from D5, too.

Log in or register to post comments

Comment #116

chx CreditAttribution: chx commented 3 November 2007 at 13:46

Status:

Needs work

» Needs review

File	Size
lower.patch	16.33 KB

Creating a new acc, installing or not, goes through user_save which lowercases. The update needed some love indeed.

Log in or register to post comments

Comment #117

John Morahan CreditAttribution: John Morahan commented 3 November 2007 at 14:04

Status:

Needs review

» Needs work

Creating a new account goes through the else branch of the big if (is_object($account) && $account->uid) in user_save(), and that part lowercases only name_lower, not mail / init.

Log in or register to post comments

Comment #118

John Morahan CreditAttribution: John Morahan commented 3 November 2007 at 14:46

Status:

Needs work

» Needs review

File	Size
lower.patch	18.62 KB

So I added another case to the switch.

Log in or register to post comments

Comment #119

chx CreditAttribution: chx commented 3 November 2007 at 14:56

File	Size
lower.patch	16.85 KB

I have changed the INSERT switch to default strtolower the same as the UPDATE one. Signature is held case sensitive.

Log in or register to post comments

Comment #120

John Morahan CreditAttribution: John Morahan commented 3 November 2007 at 15:00

Okay, that works too.

Log in or register to post comments

Comment #123

John Morahan CreditAttribution: John Morahan commented 4 November 2007 at 16:59

File	Size
lower.patch	18.71 KB

There's now another system_update_6035(); rerolling.

Log in or register to post comments

Comment #124

JirkaRybka CreditAttribution: JirkaRybka commented 4 November 2007 at 21:46

Status:

Needs review

» Needs work

File	Size
lower-bench.txt	2.14 KB

I tested the patch by upgrading an existing 6.x clone of real site. Everything seems to work well (upgrade too), adding access rules, editing users... By reading the patch, I got an impression that more things are probably affected than I managed to test, though.

Minor problems:
- The upgrade functions should return some sensible message. Currently updates 6038, 6040, and 6041 claim that "No queries were executed", which is quite not true.
- In user.module, one of the switch ($key) blocks, there are two cases on the same line.

I tried to benchmark this patch, expecting a bit of extra speed due to the access rules check optimized. The results are attached. Unfortunately, the gain with my site's data seems to be rather small:
- Cached pages: 2.0%
- Not cached pages: 1.4%

This makes me inclined to say, although there IS some gain indeed, that such a small gain doesn't make a good reason for schema change this late in D6 release cycle. Other changes than access check are not covered by my benchmarking, but I believe these are targetting rarely requested pages, making no good reason either.

Marking as 'Needs work' for the two tiny problems mentioned above, AND for the need to do some more benchmarking and/or consideration to decide whether this should go into 6.x or 7.x (I'll vote for 7.x).

Log in or register to post comments

Comment #125

David Strauss

he/him

CreditAttribution: David Strauss commented 4 November 2007 at 21:50

@JirkaRybka Your performance tests are completely unusable unless we know the size of your user and access tables.

Log in or register to post comments

Comment #126

chx CreditAttribution: chx commented 4 November 2007 at 21:52

Status:

Needs work

» Reviewed & tested by the community

Two cases on the same line, great, noone said that's a problem, evah. Update messages can be added later. As you say it's not only this one. Benchmark, what more benchmark? I benchmarked above. Create a term_data table with a couple hundred thousand terms and then try to SELECT with LOWER('foo')=LOWER(name) and then 'foo'=name (see above) 2.42s vs. 0.02s. What more do you want? Do not expect this patch to boost your Drupal install with a hundred terms or users. Try, as I say, a hundred thousand.

Log in or register to post comments

Comment #127

JirkaRybka CreditAttribution: JirkaRybka commented 4 November 2007 at 21:54

Status:

Reviewed & tested by the community

» Needs work

users: 554 rows
access: 17 rows
term_data: 102 rows

Log in or register to post comments

Comment #128

JirkaRybka CreditAttribution: JirkaRybka commented 4 November 2007 at 21:58

Status:

Needs work

» Reviewed & tested by the community

I didn't mean to change status now...

Re: #126 - if you think other than me, OK, you've all right to that. I said my opinion. Schema change seem late to me, unless there's a lot of gain. (I can't test with hundreds of thousands, if you expect that, then I just wasted hours of my time.)

Log in or register to post comments

Comment #129

chx CreditAttribution: chx commented 5 November 2007 at 07:27

No you did not waste your time because functionality testing is needed and welcome.

Log in or register to post comments

Comment #130

Gábor Hojtsy

he/him

Hungarian

Hungary

CreditAttribution: Gábor Hojtsy commented 7 November 2007 at 10:02

I am handing this issue to Dries for consideration, whether such a database change is worth the gains so late in the cycle.

Log in or register to post comments

Comment #131

chx CreditAttribution: chx commented 7 November 2007 at 13:17

Of course it is! :)

Log in or register to post comments

Comment #132

m3avrck CreditAttribution: m3avrck commented 7 November 2007 at 15:30

FWIW, I have been running a variation of this patch on most of my sites for the past 4-5 months... have not noticed any ill effects and my database is happy :-)

Log in or register to post comments

Comment #133

merlinofchaos CreditAttribution: merlinofchaos commented 7 November 2007 at 15:39

As firebus said, fixing this problem allowed him to double the number of logged in users his site was able to handle on the same hardware.

If this is not a critical performance enhancement, I don't know what is.

Log in or register to post comments

Comment #134

mfer CreditAttribution: mfer commented 8 November 2007 at 15:20

subscribing

Log in or register to post comments

Comment #135

Gábor Hojtsy

he/him

Hungarian

Hungary

CreditAttribution: Gábor Hojtsy commented 9 November 2007 at 23:22

Status:

Reviewed & tested by the community

» Needs work

LOWER() and strtolower() are not the same, the former is UTF-8 aware, the later is not. So they are not interchangable. Read more here: http://drupal.org/node/174025#comment-625306

I mark this issue "needs work", but actually I would welcome if we could solve the smaller issue at http://drupal.org/node/174025 for which there is an alternate fix embedded in the bigger patch here for whatever reason. Once the UTF-8 friendlyness issue is fixed there, that patch can be committed, and this patch can also get smaller, and can stand on its own instead of trying to solve multiple issues at once.

Log in or register to post comments

Comment #136

moshe weitzman CreditAttribution: moshe weitzman commented 10 November 2007 at 02:09

sounds like a job for drupal_strtolower() in unicode.inc

Log in or register to post comments

Comment #137

Gábor Hojtsy

he/him

Hungarian

Hungary

CreditAttribution: Gábor Hojtsy commented 10 November 2007 at 10:32

Well, refrain to drupal_strtolower() if we are fine with *requiring* the PHP mbstring extension, or our code does funny things. If you look into drupal_strtolower(), it either works with the mbstring extension or falls back on a custom implementation with support for latin-1 only (not even suitable for this area of Europe, where I live).

Log in or register to post comments

Comment #138

hswong3i CreditAttribution: hswong3i commented 10 November 2007 at 10:39

So if mbstring our standard and suggested PHP extension? e.g. I found that admin/logs/status (D5.3) have a "Unicode library" checking. Should we assume users understand the risk if they don't have mbstring installed when using unicode with drupal?

Log in or register to post comments

Comment #139

hswong3i CreditAttribution: hswong3i commented 10 November 2007 at 15:24

Are strtolower() nor drupal_strtolower() suitable for solving slow LOWER() issue? NO, they don't; they can't replace LOWER() unicode compatible feature (please refer to Gabor's comment for more detail: http://drupal.org/node/174025#comment-625544).

What we should do with slow LOWER() is: use LOWER() to filter INSERT/UPDATE, and use LOWER() for filtering user input ONLY during SELECT (but don't use LOWER() for column or else cancel indexing). The key point is: fully utilize LOWER(), on the other hand don't cancel indexing :)

P.S. I have already update both http://drupal.org/node/174025 and http://drupal.org/node/188734 based on Gabor's suggestion. If we decided to keep users.name_lower as slow LOWER() solution, some works may needed for it :(

Log in or register to post comments

Comment #140

klando CreditAttribution: klando commented 10 November 2007 at 21:50

I haven't read the all the thread, but options I see are :

* the so perfect index on table(lower(field))
* use ILIKE operator which work under postgresl as an insensitive comparaison, but the first option is the best. in this second one, index on the name should be : ceate index foo on table(field text_pattern_ops);

adding a new field for that ?! it is very very strange idea .....

Log in or register to post comments

Comment #141

David Strauss

he/him

CreditAttribution: David Strauss commented 10 November 2007 at 22:29

Why don't we just use the LOWER() function in the database for the constant values? There's no performance penalty.

Log in or register to post comments

Comment #142

markus_petrux CreditAttribution: markus_petrux commented 11 November 2007 at 08:58

...there is also the option to add a new data type to PostgresSQL, as pointed to by #106, or the possibility to build different queries depending on the SQL engine.

BTW, citext seems to work for text and varchars. It would be nice to hear from someone versed in PostgreSQL about this approach.

Log in or register to post comments

Comment #143

klando CreditAttribution: klando commented 11 November 2007 at 15:09

as far as only 'lower(field)=lower($name)' is resquested (i.e. not 'field like ...'), the best approach in postgresql *is* the index on lower(). I am not a drupal user's since long time, and I will investigate more in the core code next weeks (I hope). I can claim to be versed to postgresql...

In postgresql good result ordering are defined by lc_collate well positionned for the language indexed when using like/ilike operator. Also, you have to use the special 'text_pattern_ops' when indexing in the idea to use LIKE. when using the '=', functionnal index can be used (the lower() for example, or special criteria more complex-the words > x letters and with no numbers..or everything you want)

Log in or register to post comments

Comment #144

chx CreditAttribution: chx commented 11 November 2007 at 16:05

yes, use LOWER on constants. About pgsql being able to index on lower, that's not too significant as we won't have two different schema and query and all for pgsql.

Log in or register to post comments

Comment #145

klando CreditAttribution: klando commented 12 November 2007 at 00:53

Well, afais it is an issue for big website, those which get people able to add an index on a specific demand.

Log in or register to post comments

Comment #146

hswong3i CreditAttribution: hswong3i commented 12 November 2007 at 03:12

@klando: I am totally agree with chx's comment in #144. We should always avoid database specific tweaking and hacking if possible. If an universal solution is available as alternative choose, seems it should be chosen as a better solution :)

Log in or register to post comments

Comment #147

kbahey CreditAttribution: kbahey commented 12 November 2007 at 23:03

A far saner approach to this can be seen in #191504 without the hoops we have to jump through to do this.

Log in or register to post comments

Comment #148

hswong3i CreditAttribution: hswong3i commented 13 November 2007 at 05:16

Assigned:	Unassigned	» hswong3i
Status:	Needs work	» Needs review

File	Size
name_lower-0.15.patch	5.48 KB

Recall my previous comment for users.fullname (http://drupal.org/node/188734#comment-628268):

On the other hand, I guess we are now able to conclude name_lower as the most suitable solution for D6, by the counter prove of fullname for D7, based on its complete research founding and successful foreseeing. Yes, name_lower is very late within D6 development cycle; it also seems ugly, lossy and temporary solution. BTW, name_lower hidden most handling from UI, and there is no change to client behavior. It also give no impact to existing API, as no additional abstraction is required. And the most important is, name_lower should give us the most smooth upgrade path for D7 with minimum impact :)

I merge the progress result of users.fullname with users.name_lower, which utilizing the use of LOWER() in order to boost up overall performance if possibile. Since the performance bottleneck is mainly related to user_load() and user_access(), I try JUST FOCUSING on the use of users.name_lower within this 2 functions, so reduce the impact for D6, and also the impact for D6 (name_lower) -> D7 (fullname) migration as minimum as possible.

Patch is tested with MySQL (5.0.32-Debian_7etch1-log) and PostgreSQL (8.1.9, Debian 4.1.1-21) with no error. Also tested with D5.3 -> D6 upgrade (MySQL). Currently, only handle users related patch; taxonomy patch will handle afterward.

Log in or register to post comments

Comment #149

hswong3i CreditAttribution: hswong3i commented 13 November 2007 at 07:23

File	Size
name_lower-0.16.patch	8.71 KB

Follow up with taxonomy patch. UPDATE is required after drupal_write_record() since LOWER() is not replaceable by strtolower() nor drupal_strtolower(). JUST FOCUSING on taxonomy_save_term() and taxonomy_get_term_by_name() as they are the key point of performance bottleneck. Change as minimum as possible in order to reduce the impact for D6 and the upgrade path for D7 (with whatever alternative and better solution).

Patch is tested with MySQL (5.0.32-Debian_7etch1-log) and PostgreSQL (8.1.9, Debian 4.1.1-21) with no error. Also tested with D5.3 -> D6 upgrade (MySQL).

Log in or register to post comments

Comment #150

chx CreditAttribution: chx commented 13 November 2007 at 12:13

Upcoming is a patch which will use triggers to maintain name_lower on databases that support that and something else for MySQL 4.1. The MySQL part is done, I am waiting for the postgresql trigger.

Log in or register to post comments

Comment #151

hswong3i CreditAttribution: hswong3i commented 13 November 2007 at 12:30

Component:

user system

» database system

@chx: WOW! Trigger! I love it (Oracle's trigger give me A LOT OF FUN once before...)! Just one minor concern: seem schema API don't currently support trigger officially, how can we support trigger without affecting existing D6 implementation?

Log in or register to post comments

Comment #152

klando CreditAttribution: klando commented 13 November 2007 at 12:47

Component:

database system

» user system

1. SQL language does not provide trigger facility to postgreql, we need at least plpgsql.
2. perhaps some rules/views
3. I still find *bad* the adding column, or this implementation.
4. some operator rewriting are also possible, perhaps ....

As I read(in posts), what is needed is a NAME and a UNIX_NAME. Which mean some name for webpage and some unix_name for auth. Not a name and name_lower (even only the field name change, it is important for the model)

Another point is that, core drupal devs claims same queries for any db, I am not sure it is good:
* when you have an issue on mysql, patchs impact postgresql and vice-versa
* when you want to optimise the code you can not.

Every db engine have is own positive point, and I think it is important to catch the advantages of the db server. It can not be done with only identical queries.

I am not already really versed to drupal code, so I am disapointed : I can not offer any help until I get time to look at the code.

Is there timeline in drupal which make so urgent the patch ? Be patient, working on a solution can be longest than few days.

Log in or register to post comments

Comment #153

chx CreditAttribution: chx commented 13 November 2007 at 12:49

Interested parties can get the code from http://drupalbin.com/107 combine with postgresql: create function term_name_lower() returns trigger language plpgsql as $f$ begin NEW.name_lower := lower(NEW.name); return NEW; end; $f$; and select has_language_privilege(oid,'USAGE') from pg_language where lanname='plpgsql'; to check for plpgsql hence trigger possibility. I have no time to roll together right now.

Log in or register to post comments

Comment #154

chx CreditAttribution: chx commented 13 November 2007 at 13:11

Assigned:

hswong3i

» chx

The PostgreSQL code is from AndrewSN on #postgresql who also said:

[AndrewSN] if you absolutely have to use the same query on mysql and pg, then the trigger is probably the best solution
[chx] AndrewSN: may I quote this on drupal.org ?
[AndrewSN] yeah, you can quote me

So here were are. Patch, as said, forthcoming either very late today or early tomorrow. If someone does not do it before, of course. All code is provided.

Log in or register to post comments

Comment #155

hswong3i CreditAttribution: hswong3i commented 13 November 2007 at 13:27

Component:

user system

» database system

@klando: If we are talking about the pair up of NAME and UNIX_NAME, it is just similar idea as users.name (for auth) and users.fullname (for webpage): http://drupal.org/node/188734. As proved by users.fullname, we will need some time for UI update, which seems not suitable for current D6 status. I am giving a great hope for this in D7.

Every DB engine have its own positive point, e.g. Oracle support both trigger and sequence (I use it for emulating MySQL auto_increment within Oracle driver), on the other hand SQLite don't. BTW, that's not means we MUST have trigger and sequence supporting; complicated handling can always split into small and simple alternative solution, which usually simpler for code maintenance and more portable.

Anyway, I am agree about time limitation. name_lower + trigger should able to simplify overall implementation, but debug of trigger may also delay overall D6 release schedule (well, I don't really know if we still have such schedule...). If works can be done by a simpler solution, it should be more suitable for our current requirement with smaller impact.

Log in or register to post comments

Comment #156

klando CreditAttribution: klando commented 13 November 2007 at 17:22

for info, I provide here some infos from mysql:
http://dev.mysql.com/doc/refman/5.1/en/case-sensitivity.html

So mysql can perfectly be case sensitive.

Additionnaly, in this thread, the guy provide an explain of a request which *use* index. (WHERE LOWER(name) LIKE LOWER('a%') ORDER BY name LIMIT 100;)

I have no mysql here, perhaps having a look at that ?

Log in or register to post comments

Comment #157

klando CreditAttribution: klando commented 13 November 2007 at 20:14

oops :-)
the thread : http://forums.mysql.com/read.php?24,171205,171205

Log in or register to post comments

Comment #158

chx CreditAttribution: chx commented 14 November 2007 at 10:03

klando, I told you on IRC we are aware of the fact that MySQL can be case sensitive but I pray how that's relevant here? We are adding a name_lower column anyways.

Log in or register to post comments

Comment #159

cburschka

they

CreditAttribution: cburschka commented 14 November 2007 at 10:14

What is needed to review this patch? (Unfortunately I don't have pgSQL, and I can only benchmark a large site if I first generate one with devel).

Log in or register to post comments

Comment #160

chx CreditAttribution: chx commented 14 November 2007 at 14:22

Status:

Needs review

» Needs work

Um sorry for not updating the status.

Log in or register to post comments

Comment #161

hswong3i CreditAttribution: hswong3i commented 14 November 2007 at 18:06

Status:

Needs work

» Needs review

We seems now a bit off-topic. This issue is all about LOWER(name) queries perform badly; add column name_lower and index. From my point of view, it is already complete in #149. name_lower + indexing can solve slow LOWER() for D6, without require any other additional API/database abstraction/schema changes. It is the most simple solution that we can now figure out, with minimum impact to existing implementation (yes, it is even simpler than users.fullname issue).

Using trigger for name_lower maintenance should always be a better and handy solution than calling UPDATE manually, but this is completely another issue. We still have many other concern about trigger, e.g. what I can guess about: 1. it is not the lowest common factor of databases, some database (e.g. SQLite) may not able to support this; 2. we need most modern database for trigger supporting (e.g. MySQL 4.x+, PgSQL + plpgsql); 3. do we really need trigger? will it really give us benefit when compare with alternative simple solution? 4. do we really have time and chance for committing trigger within D6? etc.

This issue seems now hijacked by trigger. We may come back and update the ugly and lossy UPDATE handling by trigger, when trigger goes core support; but that's not means we need to wait for it or else postpone the commit of name_lower for D6. According to our understanding, slow LOWER() is a critical performance impact which should fix within D6 if possible. Should we shift focus back to #149, in order to commit this issue on time?

Log in or register to post comments

Comment #162

robertDouglass CreditAttribution: robertDouglass commented 14 November 2007 at 21:10

Just for the record, I agree with hswong3i in #161. I have (obviously) supported the name_lower + UPDATE solution from the beginning.

Log in or register to post comments

Comment #163

cburschka

they

CreditAttribution: cburschka commented 15 November 2007 at 06:22

It seems to be the lowest common denominator, and a very straightforward space-for-searchtime trade, like any form of indexing. Trying to get it to use the special API of each of the database systems looks clean in theory, but the schema split would be impractically complex.

As a precedent example, what about db_next_id and {sequences}? MySQL's auto_increment feature hasn't been used since D5...

Log in or register to post comments

Comment #164

chx CreditAttribution: chx commented 15 November 2007 at 07:22

File	Size
lower_with_sync.patch	15.07 KB

First of all, you left out a few LOWER(name) calls in users. Second, taxonomy_autocomplete searched for '%foo%' which is not consistent with user_autocomplete, not consistent with how the world works (specifically checked with gmail) and the world works this way because if you start with a % then you can't index. There is no worse place to not index than in an autocomplete function.

Keeping the name_lower update in the update query only is burying our heads in the sand. You presume people will always use the API to change the database. We all know this is not the case. Of course, they can also take care about keeping the DB in sync but if it comes for free, why not have core do that? And indeed it comes for free if your database supports triggers and as MySQL 4.1 general support ended almost a year ago, it's quite OK to suppose that 5.0 is prevalent. You might say, even if it's so, some users have ISPs who do not support triggers. For them, we provide a function which does a table scan, this comes at the same cost as one LOWER(%s)=LOWER(name) did which also scanned the table, and during the scan, it picks those fields where the name_lower is out of sync -- presumedly very few -- and updates those. This function is called from taxonomy_save_term and taxonomy_get_term_by_name but not from taxonomy_autocomplete -- autocomplete must be speedy. users table is synced in user_search and in user_save but nowhere else, this should be more than enough. If you are worried by there are reads without a sync call, please note that on every save we sync every row in the table and there are extra synces too -- so this is a lot better than just simply keeping an update in the save functions.

Almost forgot to credit David Strauss for the underlying idea. Good we can edit comments now.

Log in or register to post comments

Comment #165

markus_petrux CreditAttribution: markus_petrux commented 15 November 2007 at 07:57

What about performance? cost of using triggers? cost of db_sync_name_lower() for those that can't use triggers?

Also, does this change affect/limit the possibilities to implement more DB engines into Drupal? What about performance in those cases? (I belive there was something already made for Oracle or DB2).

Then, maybe I missed this if it was already commented before... comparing name_lower = LOWER('%') might be unnecessary, if when you inserted the name_lower value using drupal_strtolower(), the same conversion would apply when comparing, no?

Log in or register to post comments

Comment #166

sammys CreditAttribution: sammys commented 15 November 2007 at 08:03

Status:

Needs review

» Needs work

Looking ok with a visual check. Not testing yet as chx and I have discussed caching the return value of db_has_trigger() into a variable. Of course, this will require some care if the system is moved from one version of MySQL to another. However, the need to avoid the query overhead outweighs the number of incidents AND the likelihood of downgrading is very low.

--
Sammy Spets
Synerger
http://synerger.com

Log in or register to post comments

Comment #167

hswong3i CreditAttribution: hswong3i commented 15 November 2007 at 08:24

@markus_petrux: LOWER() is not replaceable by strtolower() nor drupal_strtolower(). Please read #139 for more information.

@chx and sammys: just a little request: would you mind to free name_lower + indexing from trigger, and handle trigger as another issue? This issue may still have chance for sliding into D6 even it is such late, if we are able to limit the impact as small as possible; but I have no idea if it is hijacked by trigger. As mentioned in #161, we can raise TOO MANY different questions about trigger, and the duty of answering these questions should belong to its own issue :'(

Log in or register to post comments

Comment #168

chx CreditAttribution: chx commented 15 November 2007 at 08:48

Status:

Needs work

» Needs review

I will implement sammys' suggestion later but that should not stop reviews coming. @hswong3i, the change without triggers/sync is not acceptable for me because it's utterly fragile. @markus, I have detailed the cost of sync (almost the same as an old SELECT unless there is sync to do, then one UPDATE) and the cost of trigger, since when we care about the speed of writes? I guess it's a bit slower but also, a very little bit.

Log in or register to post comments

Comment #169

sammys CreditAttribution: sammys commented 15 November 2007 at 08:51

Giving chx a little backup on this... cost of using the trigger only affects DB writes. Not worth worrying about write cost on the user table when the benefit far outweighs. Thanks to chx for rolling the patch!

Log in or register to post comments

Comment #170

sammys CreditAttribution: sammys commented 15 November 2007 at 10:19

File	Size
lower_with_sync.200711151000UTC.patch	16.22 KB

Patch updated to fix following:
* Unique index fails because creation attempted before field was populated
* Update/install fails to create triggers on PostgreSQL because of incorrect CREATE TRIGGER syntax

Also rolled in the latest commits made to system.install so the update numbers are correct.

Checked update/insert with queries on PostgreSQL and it behaves as expected.

Please double check this new patch on a MySQL installation and then mark as RTBC if all ok.

Log in or register to post comments

Comment #171

markus_petrux CreditAttribution: markus_petrux commented 15 November 2007 at 10:31

Oh, I see the LOWER() stuff... good then. Thanks for pointing that out :-|

So if using Triggers... shouldn't db_sync_name_lower() be invoked at INSERT time, not at SELECT time?

Also, I believe SHOW TRIGGERS needs SUPER privilege on MySQL, is this ok with the MySQL user used to access the DB?

Then, maybe checking for Triggers support could be cached somehow? Maybe detecting if CREATE TRIGGER works or not?

Edited: hmm... if using SHOW TRIGGERS, then what if that statement returns a lot of informacion. Wouldn't it be better to query the MySQL system catalogs?

Log in or register to post comments

Comment #172

markus_petrux CreditAttribution: markus_petrux commented 15 November 2007 at 10:21

Oh, variable $name_field seems to be undefined in db_sync_name_lower().

Log in or register to post comments

Comment #173

chx CreditAttribution: chx commented 15 November 2007 at 12:20

OK, I will check db_sync_name_lower and also store in a variable the trigger results. This we already agreed on, just not implemented.

The crux of the matter is that sync is not enough on write time because then you need to always do it. This is very fragile as triggers fire automatically and it's quite safe to presume that the developer will have MySQL 5.0. If you say, no triggers, put the update into user_save, then the first contrib query direct against users can break this -- if we denormalize then we should not put the burden introduced by denormalizaiton on contrib authors if we can avoid.

Log in or register to post comments

Comment #174

markus_petrux CreditAttribution: markus_petrux commented 15 November 2007 at 14:42

What about using cron to do the db_sync_name_lower stuff?

Also, say you're running MySQL 4.x and install D6 (how many sites are like this today, is everyone using MySQL 5.x outthere?). Well, you have no triggers, so D6 installation/upgrade doesn't populate them. Let say your hosting upgrades to MySQL 5.x, so you could use triggers, but... how could you migrate? So... would it be possible to trigger this from a site option? ...maybe a script to make the relevant changes?

Log in or register to post comments

Comment #175

chx CreditAttribution: chx commented 15 November 2007 at 16:05

The site status report screen could try installing 'em and warning you if !db_has_trigger. And I believe most of the world is on MysQL 5.x, again active support ended for 4.1 on 2006 dec 31. I would not rely on cron for the syncing when we are aiming at low end sites here -- what if they run cron daily?

Log in or register to post comments

Comment #176

David Strauss

he/him

CreditAttribution: David Strauss commented 15 November 2007 at 19:25

"what if they run cron daily?"

Or never.

Log in or register to post comments

Comment #177

chx CreditAttribution: chx commented 15 November 2007 at 21:41

File	Size
lower_with_sync-73738-177.patch	15.26 KB

Added a variable, kept fresh on system_status about triggers. Fixed up db_sync_name_lower.

Log in or register to post comments

Comment #178

chx CreditAttribution: chx commented 15 November 2007 at 22:06

File	Size
lower_with_sync-73738-178.patch	15.57 KB

Doh, I really fixed name_lower field but then throwed it away because i forogt to merge in sammys' patch first ah well here it is.

Log in or register to post comments

Comment #179

chx CreditAttribution: chx commented 15 November 2007 at 22:15

File	Size
lower_with_sync-73738-178.patch	15.18 KB

W/o menu.inc part.

Log in or register to post comments

Comment #180

adrian CreditAttribution: adrian commented 16 November 2007 at 12:37

Why not use rules in postgres? (as in what are the actual issues with them)

[?
create or replace "users_upd" as on update do also (
update users set name=LOWER(NEW.real_name) WHERE uid = NEW.uid;
);

create or replace "users_insert" as on insert do also (
update users set name=LOWER(NEW.real_name) WHERE uid = NEW.uid;

);

?]
although i concede we might need to use currval('{users}_uid_seq') on insert to get the value regardless of whether the insert statement had a uid field or not.

Log in or register to post comments

Comment #181

chx CreditAttribution: chx commented 16 November 2007 at 13:08

Rule is very debated. let's leave that for a followup. klando even says they will fail for certain queries. please leave them out of this already very long and troubled issue.

Log in or register to post comments

Comment #182

chx CreditAttribution: chx commented 1 December 2007 at 07:31

File	Size
lower_with_sync-73738-182.patch	15.18 KB

Applied with offset only so I rerolled.

Log in or register to post comments

Comment #183

chx CreditAttribution: chx commented 1 December 2007 at 09:28

File	Size
lower_with_sync-73738-183.patch	15.18 KB

System.install had a conflict in update numbering.

Log in or register to post comments

Comment #184

chx CreditAttribution: chx commented 1 December 2007 at 09:42

File	Size
lower_with_sync-73738-184.patch	16 KB

install works again without triggers.

Log in or register to post comments

Comment #185

catch

he/him

English

CreditAttribution: catch commented 1 December 2007 at 10:44

Status:

Needs review

» Needs work

Patched clean d6, ran install from scratch. No problems.

Then I tried to create a user from admin >> users and got this:

    * user warning: Duplicate entry '' for key 3 query: INSERT INTO users (name, mail, pass, status, timezone, init, created, access) VALUES ('catch', 'xxx@googlemail.com', '088f6bcd4621d373cade4e832627b4f6', 1, 0, 'xxx@googlemail.com', 1196505455, 1196505455) in drupal6/modules/user/user.module on line 315.
    * notice: Trying to get property of non-object in drupal6/modules/user/user.module on line 330.
    * notice: Trying to get property of non-object in drupal6/modules/user/user.module on line 2180.
    * notice: Trying to get property of non-object in drupal6/modules/user/user.module on line 2183.
    * notice: Undefined property: stdClass::$uid in drupal6/modules/user/user.module on line 2198.
    * notice: Undefined property: stdClass::$name in drupal6/modules/user/user.module on line 2198.

Log in or register to post comments

Comment #186

chx CreditAttribution: chx commented 1 December 2007 at 11:00

Status:

Needs work

» Needs review

File	Size
lower_with_sync-73738-185.patch	15.99 KB

I created a uniq key on users.name_lower which collided with anonymous user. Checked: term_data.name_lower is just an index, as users is now.

Log in or register to post comments

Comment #187

chx CreditAttribution: chx commented 1 December 2007 at 11:08

File	Size
lower_with_sync-73738-186.patch	16.04 KB

Patch had a typo. Sigh.

Log in or register to post comments

Comment #188

catch

he/him

English

CreditAttribution: catch commented 1 December 2007 at 11:43

Status:

Needs review

» Reviewed & tested by the community

OK I tested this with and without triggers in MySQL 5 by adding SUPER for my db user during the second of two patched installs.

With both I tested creating users, adding rules, adding users who should be denied by the rules, and adding taxonomy terms. All worked as expected, so rtbc.

Output of SHOW TRIGGERS when the MySQL user has SUPER included for reference:

mysql> SHOW TRIGGERS LIKE '%';
+------------------------------------+--------+-----------+--------------------------------------+--------+---------+----------+-----------------------+
| Trigger                            | Event  | Table     | Statement                            | Timing | Created | sql_mode | Definer               |
+------------------------------------+--------+-----------+--------------------------------------+--------+---------+----------+-----------------------+
| term_data_name_lower_before_insert | INSERT | term_data | SET NEW.name_lower = LOWER(NEW.name) | BEFORE | NULL    |          | user@localhost |
| term_data_name_lower_before_update | UPDATE | term_data | SET NEW.name_lower = LOWER(NEW.name) | BEFORE | NULL    |          | user@localhost |
| user_name_lower_before_insert      | INSERT | users     | SET NEW.name_lower = LOWER(NEW.name) | BEFORE | NULL    |          | user@localhost |
| user_name_lower_before_update      | UPDATE | users     | SET NEW.name_lower = LOWER(NEW.name) | BEFORE | NULL    |          | user@localhost |
+------------------------------------+--------+-----------+--------------------------------------+--------+---------+----------+-----------------------+
4 rows in set (0.04 sec)

Log in or register to post comments

Comment #189

markus_petrux CreditAttribution: markus_petrux commented 1 December 2007 at 20:18

ahem... SHOW TRIGGERS requires the MySQL user to have SUPER privilege. I belive this might be a problem for some installations when Trigger support could be checked at installation time by checking the result of CREATE TRIGGER, or maybe using a query against the system catalogs.

Log in or register to post comments

Comment #190

Gábor Hojtsy

he/him

Hungarian

Hungary

CreditAttribution: Gábor Hojtsy commented 2 December 2007 at 15:47

Version:

6.x-dev

» 7.x-dev

These level of changes, and the introduction of triggers is way past what is possible now with Drupal 6. Please focus on stabilizing what's there in the code, and happy hacking with this early in Drupal 7. We are moving in circles and would never reach Drupal 6 stable if we introduce new stuff on this scale still.

Log in or register to post comments

Comment #191

chx CreditAttribution: chx commented 2 December 2007 at 17:17

Assigned:

chx

» Unassigned

If that is so then I unassign myself, I do not really care, my site has solved it simply (we run mysql, you see) and i can keep my core hacks around. *shrug*

Log in or register to post comments

Comment #192

markus_petrux CreditAttribution: markus_petrux commented 2 December 2007 at 20:56

Status:

Reviewed & tested by the community

» Needs work

Well, it looks like this needs a bit more love.

Log in or register to post comments

Comment #193

catch

he/him

English

CreditAttribution: catch commented 19 May 2008 at 15:09

At the risk of repeating myself, I still think it'd be fairly simple to do it like this:

Expand username to allow more characters, currently apostrophes are excluded. This field is always displayed in theme_user_name() whatever happens, and by default, it's also the name that people use to log in (which will need to happen on all existing Drupal sites that upgrade anyway).

Add a new 'login-name' field - which is lowercased in the database to save the bad LOWER() queries we currently have. This is very similar to: http://drupal.org/node/83738 which only dealt with the performance aspects.

Add an admin option for users to specify separate login and display names. When this happens, there's two fields on registration user/edit instead of one, and they're populated separately instead of from the same field. We populate the column even if the option isn't switched on (to fix the LOWER() issue), the difference is you'd be able to edit them separately in the UI instead of one field. In fact that'd be pretty much the only difference.

login names are forced to be unique - since we do LOWER() on usernames this ought to work with existing data. Admins get options for the following:

1. Allow users to change their login name AND/OR display name (if enabled) so two permissions.
2. Specify whether 'display names' should be unique (if display names are enabled).

If you disable display names on a live site, you're warned that all display names will be deleted from the database and that usernames will be displayed instead (which'd in code terms would be update users set display_name = login_name

We might need to keep the login name 'as typed' somewhere (a third column, maybe another table since it'd only be used very rarely) - this would make the 'switch display names off' behaviour less destructive, for example - so if I'm Dries, and I make a display name Dries Buytaert, it get's set back to Dries instead of dries when the admin disables display names).

Log in or register to post comments

Comment #194

klando CreditAttribution: klando commented 30 May 2008 at 12:05

I agree with catch.

Except (perhaps) the update users set display_name = login_name, I am not enought confident with that part of the code, but it sound better to me that the admin choice is only done at php level (what to display ? login | username).

Log in or register to post comments

Comment #195

catch

he/him

English

CreditAttribution: catch commented 1 July 2008 at 15:11

I have a patch for taxonomy over at http://drupal.org/node/277209 using mb_strtolower() which appears to be working fine. This approach ought to be applicable to user.module if it's any good.

Log in or register to post comments

Comment #196

Brumisek CreditAttribution: Brumisek commented 27 February 2009 at 11:01

Any glue for 5.x version of Drupal? I have big site with Drupal 5.x and exactly THIS problem is reason why my site is slow. I'm try find any ideas how to take it out, but here are patchs only for 7.0 version :( Could you anybody help me?

Log in or register to post comments

Comment #197

ThomasH CreditAttribution: ThomasH commented 10 March 2009 at 10:42

Version:

7.x-dev

» 5.2

File	Size
lower.patch	5.67 KB

Drupal 5 patch attached... hope it useful for someone!

Log in or register to post comments

Comment #198

catch

he/him

English

CreditAttribution: catch commented 10 March 2009 at 11:21

Version:

5.2

» 7.x-dev

Back to 7.x.

Log in or register to post comments

Comment #199

mikhailian CreditAttribution: mikhailian commented 27 March 2009 at 11:49

I am appying this kind of patches for a few years already, from 4.x times (I am running 6.x now).

Just wondering when this LOWER() nonsense finally disappears from the Drupal codebase.

Log in or register to post comments

Comment #200

mfer CreditAttribution: mfer commented 27 March 2009 at 14:18

@mikhailian This will change in drupal when someone writes a patch that's a generic solution (not just MySQL). Care to take a shot at it?

Log in or register to post comments

Comment #201

catch

he/him

English

CreditAttribution: catch commented 27 March 2009 at 19:47

Status:

Needs work

» Closed (duplicate)

There's a more up-to-date patch not requiring trigger/SUPER permissions at #279851: Replace LOWER() with db_select() and LIKE() where possible - just needs a re-roll and some benchmarking. Was leaving this open since it's a different solution, but not been much activity here for a while so closing this as duplicate.