A client just brought up an interesting point; there is never any need to SELECT * FROM users WHERE uid=0, is there? Likewise, updating the login and data columns for uid is equally senseless. Could we spare ourselves all the uid=0 queries, perhaps by checking for uid=0 in user_load? (I haven't searched to see if there are any current implementations of these ideas... I'm just looking for feedback).

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

robertDouglass’s picture

In particular, it looks like this code from sessions.inc could be improved:


function sess_read($key) {drupal_set_message("getting a normal session");
  global $user;

  // retrieve data for a $user object
  $result = db_query("SELECT sid FROM {sessions} WHERE sid = '%s'", $key);
  if (!db_num_rows($result)) {
    $result = db_query("SELECT u.* FROM {users} u WHERE u.uid = 0");
  }
  else {
    $result = db_query("SELECT u.*, s.* FROM {users} u INNER JOIN {sessions} s ON u.uid = s.uid WHERE s.sid = '%s'", $key);
  }
  • Do we really need to SELECT where u.uid = 0? Wouldn't $user = new stdClass(); $user->uid = 0; do the trick?
  • Couldn't we do the third query as a LEFT join from sessions to users and use that query both to see if the client has a session AND if they have a user account?

That would save a query per page load if it worked.

Gerhard Killesreiter’s picture

funny, I thought about the same today. Yes, we need the select since we store session data in the sessions table. Things like language switches rely on this.

robertDouglass’s picture

Sorry, I didn't understand that. I know we need to select from the sessions table for whatever sid we get, but do we ever need to select from the users table for uid = 0, and why?

Do you think my idea for the "SELECT FROM sessions LEFT JOIN users" would work?

Gerhard Killesreiter’s picture

Oops, yeeah, we probably don't need that.

robertDouglass’s picture

Category: support » bug
Status: Active » Needs review
FileSize
1.18 KB

Sweet! It works. Three queries condensed into one.

robertDouglass’s picture

FileSize
2.07 KB

There is likewise no need to select uid = 0 in user_load. With this patch, I just build an object, call the user_module_invoke('load'...), and return it. This saves two queries when anonymous users access non-cached pages.

m3avrck’s picture

I'm a bit tired looking at the patch, but can't the bottom of that patch change that second IF to an ELSE IF to prevent the query building there?

Otherwise the patch looks great, knocking off one query page is a nice performance boost :)

robertDouglass’s picture

The $user object gets returned before then, so it never gets to that.

Dries’s picture

For the anonymous user we still unpack $user?

   $user = drupal_unpack($user);

I think we might be able to refactor sess_read() some more -- we'd need less if-checks.

The change to user_load is a bit icky.

robertDouglass’s picture

FileSize
2.71 KB

Apologies if this patch is weird in any way... it was resisting making a nice patch and I had to tweak it by hand.

robertDouglass’s picture

FileSize
2.7 KB

this one is better.

Dries’s picture

On my test setup, this patch improves performance by 4% (when serving cached pages to anonymous users). That's a substantial improvement. :-)

When serving non-cached pages to anonymous users, performance improves by

There are two small improvements left for user_read():

1. Change if (!$user = ...) to if ($user = ...) by switching things around. This eliminate a '!'.

2. The last line of sess_read() can be eliminated return !empty($user->session) ? $user->session : '';. Just do the returns in the if-else clauses. This saves an '!empty()' test in the critical path.

Also, anonymous users can have session information too (eg. to store comment display settings) ... Would that still work after your patch? Mmm. Your patch seems to assume that when session information is found, it must be from an authenticated user ... or something. Please double-check.

Dries’s picture

When serving non-cached pages to anonymous users, performance improves by ... 1%. ;-)

robertDouglass’s picture

Made the code changes.

The sessions work just the same way they did before. The initial query in sess_read is this:

SELECT u. * , s. *
FROM sessions s
LEFT JOIN users u ON s.uid = u.uid
WHERE s.sid = /* php session id */
LIMIT 0 , 30

For an anonymous user, it returns all the information from the sessions table and nothing from the users table by virtue of being a left join. I tested the comment settings and they work as expected.

robertDouglass’s picture

FileSize
2.81 KB

patch

Bart Jansens’s picture

Do we really need a left join? All users, including uid 0, exist in the users table.

I think we can improve session handling a bit further. When no session ID is sent by the client, it is generated by PHP, but we still check if it exists in the database (twice even). We can skip this step because we already know that no session exists. Aside from performing better in ab-benchmarks, it also saves a few queries for the first request made by a client and more important, crawlers. On many sites crawlers are responsible for a significant percentage of all requests (like it or not).

pseudo code:
- if (empty($_COOKIE[session_name()]) : no session ID was sent by the client. Create the simple uid 0 $user object.
- else try loading it from the database. If a session is found, return that user object
- else, (client sent a session id that no longer exists, probably expired) return the simple uid 0 object

We can add a similar optimisation to sess_write as well:
- if (empty($_COOKIE[session_name()] && empty($value)) : do nothing

That way we can serve pages to crawlers without a single database query - assuming nothing was written to $_SESSION.

I'm not sure about combining the queries though, IIRC this used to be a single query in the past, needs some investigating, it was a long time ago ;)

killes@www.drop.org’s picture

I am too curious why you replyca an inner by a left join. left joins are usually more expensive and also usually not needed.

Dries’s picture

Bart: great idea! That will help a lot with agressive crawlers. :-)

Bart Jansens’s picture

The patch that split the query was http://drupal.org/node/44947

Maybe we could do "select * from session where key=X" and only when uid > 0, "select * from user where uid = X" and merge the two results. But i'm no DB expert, i don't know which is faster

robertDouglass’s picture

Smart people! Great suggestions. If nobody beats me to it I'll re-roll this tomorrow. I'm looking forward to Dries putting it through his performance testing environment.

Dries’s picture

Bart: will sess_read() be called with $key = NULL/0/empty string? We'll want to test that.

robertDouglass’s picture

FileSize
3.91 KB

does the above

Dries’s picture

I believe the sess_write() changes are incorrect. We still need to update the access time in the user table, or things like the "who's online" block stop working properly.

I removed the changes to sess_write() and benchmarked this patch. The result is a 13% speed-up for cached serving pages!

robertDouglass’s picture

I believe that the code is correct (or at least it is as correct as the existing code). If you run it with the debugging code added, you can verify that the first request (no cookie) goes into the first "if", and would otherwise not go into the second "if", thus the query in between is superfluous:

function sess_write($key, $value) {
  global $user;

  // If the client doesn't have a session, and one isn't being created ($value), do nothing.
  if (empty($_COOKIE[session_name()]) && empty($value)) {watchdog('session', '0');
     return TRUE;
  }

  $result = db_queryd("SELECT sid FROM {sessions} WHERE sid = '%s'", $key);

  if (!db_num_rows($result)) {watchdog('session', '1');
    // Only save session data when when the browser sends a cookie. This keeps
    // crawlers out of session table. This improves speed up queries, reduces
    // memory, and gives more useful statistics. We can't eliminate anonymous
    // session table rows without breaking throttle module and "Who's Online"
    // block.
    if ($user->uid || $value || count($_COOKIE)) {watchdog('session', '2');
      db_queryd("INSERT INTO {sessions} (sid, uid, cache, hostname, session, timestamp) VALUES ('%s', %d, %d, '%s', '%s', %d)", $key, $user->uid, $user->cache, $_SERVER["REMOTE_ADDR"], $value, time());
    }
  }
  else {watchdog('session', '3');
    db_queryd("UPDATE {sessions} SET uid = %d, cache = %d, hostname = '%s', session = '%s', timestamp = %d WHERE sid = '%s'", $user->uid, $user->cache, $_SERVER["REMOTE_ADDR"], $value, time(), $key);

    // TODO: this can be an expensive query. Perhaps only execute it every x minutes. Requires investigation into cache expiration.
    if ($user->uid) {watchdog('session', '4');
      db_queryd("UPDATE {users} SET access = %d WHERE uid = %d", time(), $user->uid);
    }
  }

  return TRUE;
}

The question is, in the first if, do we want to already write to the session table? I think not. What would we write, after all?

drumm’s picture

Status: Needs review » Needs work

drupal_anonymous_user() will need some function documentation.

robertDouglass’s picture

FileSize
5.92 KB

Changed the syntax for the parameter validation to something less vertical.

robertDouglass’s picture

!*%$ wrong patch for this issue.... ignore the last one.

robertDouglass’s picture

FileSize
4.7 KB

I added documentation to drupal_anonymous_user and moved it to bootstrap.inc. It doesn't belong in session.inc because that contradicts my goal of making our session handling pluggable (because any implementing session API would have to recreate drupal_anonymous_user as well). So it seems it belongs in bootstrap.

robertDouglass’s picture

Status: Needs work » Needs review
Dries’s picture

Status: Needs review » Reviewed & tested by the community

Tested the last version of this patch (unmodified) and performance increases by 20% when serving cached pages, and by 2% when serving non-cached pages. That's big! :)

We should shif focus to non-cached pages now...

moshe weitzman’s picture

i think the name column for uid=0 is used to give anonymous user a unique name: 'anonymous coward, anonymous hero, etc.' we had a UI for this but i can't find it right now. are we ditching this feature? not sure if we can store this in variables without incurring performance penalty.

Dries’s picture

There is still a setting for the anonymous user name. If you can't find it, we should move it. Where did you look for it initially, or where did you expect to find it?

IIRC, we added the uid=0 column so we can do inner joins rather than left joins.

killes@www.drop.org’s picture

We indeed added uid = 0 for doing inner joins which are faster. The name and all the other fields are not used. We did have a lenghty discussion on the matter, but we never switched away from using the variables we have for this.

moshe weitzman’s picture

the first place i looked was admin/user/settings ... we should set this name in drupal_anonymous_user(), right? then we can remove the variable_get() from theme_username() and anywhere else

Dries’s picture

Status: Reviewed & tested by the community » Needs work

Committed this to CVS HEAD, partially.

I didn't commit the user.module part because there is no good reason to load the anonymous user. Modules that load the anonymous user through user_load() are b0rked.

Marking this 'code needs work' as Moshe's suggestions might need further follow-up.

robertDouglass’s picture

Status: Needs work » Needs review
FileSize
648 bytes

if modules loading the anonymous user through user_load are b0rked, then user.module is guilty of the crime.

m3avrck’s picture

Looks good to me?

robertDouglass’s picture

Status: Needs review » Reviewed & tested by the community
drumm’s picture

Status: Reviewed & tested by the community » Fixed

Committed to HEAD.

Anonymous’s picture

Status: Fixed » Closed (fixed)