Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
A client just brought up an interesting point; there is never any need to SELECT * FROM users WHERE uid=0, is there? Likewise, updating the login and data columns for uid is equally senseless. Could we spare ourselves all the uid=0 queries, perhaps by checking for uid=0 in user_load? (I haven't searched to see if there are any current implementations of these ideas... I'm just looking for feedback).
Comment | File | Size | Author |
---|---|---|---|
#36 | uid0.patch.txt | 648 bytes | robertDouglass |
#28 | sessions-user_0.patch | 4.7 KB | robertDouglass |
#26 | pluggablesessions_5.patch | 5.92 KB | robertDouglass |
#22 | sessions-user.patch | 3.91 KB | robertDouglass |
#15 | session-user_1.patch | 2.81 KB | robertDouglass |
Comments
Comment #1
robertDouglass CreditAttribution: robertDouglass commentedIn particular, it looks like this code from sessions.inc could be improved:
$user = new stdClass(); $user->uid = 0;
do the trick?That would save a query per page load if it worked.
Comment #2
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedfunny, I thought about the same today. Yes, we need the select since we store session data in the sessions table. Things like language switches rely on this.
Comment #3
robertDouglass CreditAttribution: robertDouglass commentedSorry, I didn't understand that. I know we need to select from the sessions table for whatever sid we get, but do we ever need to select from the users table for uid = 0, and why?
Do you think my idea for the "SELECT FROM sessions LEFT JOIN users" would work?
Comment #4
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedOops, yeeah, we probably don't need that.
Comment #5
robertDouglass CreditAttribution: robertDouglass commentedSweet! It works. Three queries condensed into one.
Comment #6
robertDouglass CreditAttribution: robertDouglass commentedThere is likewise no need to select uid = 0 in user_load. With this patch, I just build an object, call the user_module_invoke('load'...), and return it. This saves two queries when anonymous users access non-cached pages.
Comment #7
m3avrck CreditAttribution: m3avrck commentedI'm a bit tired looking at the patch, but can't the bottom of that patch change that second IF to an ELSE IF to prevent the query building there?
Otherwise the patch looks great, knocking off one query page is a nice performance boost :)
Comment #8
robertDouglass CreditAttribution: robertDouglass commentedThe $user object gets returned before then, so it never gets to that.
Comment #9
Dries CreditAttribution: Dries commentedFor the anonymous user we still unpack $user?
I think we might be able to refactor sess_read() some more -- we'd need less if-checks.
The change to user_load is a bit icky.
Comment #10
robertDouglass CreditAttribution: robertDouglass commentedApologies if this patch is weird in any way... it was resisting making a nice patch and I had to tweak it by hand.
Comment #11
robertDouglass CreditAttribution: robertDouglass commentedthis one is better.
Comment #12
Dries CreditAttribution: Dries commentedOn my test setup, this patch improves performance by 4% (when serving cached pages to anonymous users). That's a substantial improvement. :-)
When serving non-cached pages to anonymous users, performance improves by
There are two small improvements left for user_read():
1. Change
if (!$user = ...)
toif ($user = ...)
by switching things around. This eliminate a '!'.2. The last line of sess_read() can be eliminated
return !empty($user->session) ? $user->session : '';
. Just do the returns in the if-else clauses. This saves an '!empty()' test in the critical path.Also, anonymous users can have session information too (eg. to store comment display settings) ... Would that still work after your patch? Mmm. Your patch seems to assume that when session information is found, it must be from an authenticated user ... or something. Please double-check.
Comment #13
Dries CreditAttribution: Dries commentedWhen serving non-cached pages to anonymous users, performance improves by ... 1%. ;-)
Comment #14
robertDouglass CreditAttribution: robertDouglass commentedMade the code changes.
The sessions work just the same way they did before. The initial query in sess_read is this:
For an anonymous user, it returns all the information from the sessions table and nothing from the users table by virtue of being a left join. I tested the comment settings and they work as expected.
Comment #15
robertDouglass CreditAttribution: robertDouglass commentedpatch
Comment #16
Bart Jansens CreditAttribution: Bart Jansens commentedDo we really need a left join? All users, including uid 0, exist in the users table.
I think we can improve session handling a bit further. When no session ID is sent by the client, it is generated by PHP, but we still check if it exists in the database (twice even). We can skip this step because we already know that no session exists. Aside from performing better in ab-benchmarks, it also saves a few queries for the first request made by a client and more important, crawlers. On many sites crawlers are responsible for a significant percentage of all requests (like it or not).
pseudo code:
- if (empty($_COOKIE[session_name()]) : no session ID was sent by the client. Create the simple uid 0 $user object.
- else try loading it from the database. If a session is found, return that user object
- else, (client sent a session id that no longer exists, probably expired) return the simple uid 0 object
We can add a similar optimisation to sess_write as well:
- if (empty($_COOKIE[session_name()] && empty($value)) : do nothing
That way we can serve pages to crawlers without a single database query - assuming nothing was written to $_SESSION.
I'm not sure about combining the queries though, IIRC this used to be a single query in the past, needs some investigating, it was a long time ago ;)
Comment #17
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedI am too curious why you replyca an inner by a left join. left joins are usually more expensive and also usually not needed.
Comment #18
Dries CreditAttribution: Dries commentedBart: great idea! That will help a lot with agressive crawlers. :-)
Comment #19
Bart Jansens CreditAttribution: Bart Jansens commentedThe patch that split the query was http://drupal.org/node/44947
Maybe we could do "select * from session where key=X" and only when uid > 0, "select * from user where uid = X" and merge the two results. But i'm no DB expert, i don't know which is faster
Comment #20
robertDouglass CreditAttribution: robertDouglass commentedSmart people! Great suggestions. If nobody beats me to it I'll re-roll this tomorrow. I'm looking forward to Dries putting it through his performance testing environment.
Comment #21
Dries CreditAttribution: Dries commentedBart: will sess_read() be called with $key = NULL/0/empty string? We'll want to test that.
Comment #22
robertDouglass CreditAttribution: robertDouglass commenteddoes the above
Comment #23
Dries CreditAttribution: Dries commentedI believe the sess_write() changes are incorrect. We still need to update the access time in the user table, or things like the "who's online" block stop working properly.
I removed the changes to sess_write() and benchmarked this patch. The result is a 13% speed-up for cached serving pages!
Comment #24
robertDouglass CreditAttribution: robertDouglass commentedI believe that the code is correct (or at least it is as correct as the existing code). If you run it with the debugging code added, you can verify that the first request (no cookie) goes into the first "if", and would otherwise not go into the second "if", thus the query in between is superfluous:
The question is, in the first if, do we want to already write to the session table? I think not. What would we write, after all?
Comment #25
drummdrupal_anonymous_user() will need some function documentation.
Comment #26
robertDouglass CreditAttribution: robertDouglass commentedChanged the syntax for the parameter validation to something less vertical.
Comment #27
robertDouglass CreditAttribution: robertDouglass commented!*%$ wrong patch for this issue.... ignore the last one.
Comment #28
robertDouglass CreditAttribution: robertDouglass commentedI added documentation to drupal_anonymous_user and moved it to bootstrap.inc. It doesn't belong in session.inc because that contradicts my goal of making our session handling pluggable (because any implementing session API would have to recreate drupal_anonymous_user as well). So it seems it belongs in bootstrap.
Comment #29
robertDouglass CreditAttribution: robertDouglass commentedComment #30
Dries CreditAttribution: Dries commentedTested the last version of this patch (unmodified) and performance increases by 20% when serving cached pages, and by 2% when serving non-cached pages. That's big! :)
We should shif focus to non-cached pages now...
Comment #31
moshe weitzman CreditAttribution: moshe weitzman commentedi think the name column for uid=0 is used to give anonymous user a unique name: 'anonymous coward, anonymous hero, etc.' we had a UI for this but i can't find it right now. are we ditching this feature? not sure if we can store this in variables without incurring performance penalty.
Comment #32
Dries CreditAttribution: Dries commentedThere is still a setting for the anonymous user name. If you can't find it, we should move it. Where did you look for it initially, or where did you expect to find it?
IIRC, we added the uid=0 column so we can do inner joins rather than left joins.
Comment #33
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedWe indeed added uid = 0 for doing inner joins which are faster. The name and all the other fields are not used. We did have a lenghty discussion on the matter, but we never switched away from using the variables we have for this.
Comment #34
moshe weitzman CreditAttribution: moshe weitzman commentedthe first place i looked was admin/user/settings ... we should set this name in drupal_anonymous_user(), right? then we can remove the variable_get() from theme_username() and anywhere else
Comment #35
Dries CreditAttribution: Dries commentedCommitted this to CVS HEAD, partially.
I didn't commit the user.module part because there is no good reason to load the anonymous user. Modules that load the anonymous user through user_load() are b0rked.
Marking this 'code needs work' as Moshe's suggestions might need further follow-up.
Comment #36
robertDouglass CreditAttribution: robertDouglass commentedif modules loading the anonymous user through user_load are b0rked, then user.module is guilty of the crime.
Comment #37
m3avrck CreditAttribution: m3avrck commentedLooks good to me?
Comment #38
robertDouglass CreditAttribution: robertDouglass commentedComment #39
drummCommitted to HEAD.
Comment #40
(not verified) CreditAttribution: commented