node_load_multiple() currently joins on the user table to get name, picture etc. for users. This is crufty. Eventually we should consider a user reference field for author, but since user reference isn't in core yet, that's a long way off.

So to start with I think we should remove the join from node_load_multiple() and add user_node_load() instead.

This costs one extra query per page compared with the join, with no measurable performance impact:

HEAD (front page, 10 nodes):
9.47 / 9.83 [#req/sec]

Patch:
9.73 / 9.66 [#req/sec]
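To make the trade-off concrete, here is a minimal, language-neutral sketch in Python with an in-memory SQLite database. It is not the patch itself: the table layout, column names and function names (`load_nodes_with_join`, `load_nodes`, `attach_authors`) are assumptions made for illustration. It shows the join variant next to the "load nodes, then one extra users query" variant and checks that both produce the same node data.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database (hypothetical schema, for illustration only)."""
    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row
    db.executescript("""
        CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT, picture TEXT);
        CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, title TEXT);
        INSERT INTO users VALUES (1, 'alice', 'a.png'), (2, 'bob', 'b.png');
        INSERT INTO node VALUES (10, 1, 'First'), (11, 2, 'Second'), (12, 1, 'Third');
    """)
    return db


def placeholders(values):
    """Return '?,?,?' for a list of three values."""
    return ",".join("?" * len(values))


def load_nodes_with_join(db, nids):
    """Variant A: fetch author fields in the same query via a join."""
    rows = db.execute(
        "SELECT n.nid, n.uid, n.title, u.name, u.picture "
        "FROM node n INNER JOIN users u ON u.uid = n.uid "
        f"WHERE n.nid IN ({placeholders(nids)})",
        nids,
    ).fetchall()
    return {row["nid"]: dict(row) for row in rows}


def load_nodes(db, nids):
    """Variant B, step 1: load nodes only, with no knowledge of the users table."""
    rows = db.execute(
        f"SELECT nid, uid, title FROM node WHERE nid IN ({placeholders(nids)})",
        nids,
    ).fetchall()
    return {row["nid"]: dict(row) for row in rows}


def attach_authors(db, nodes):
    """Variant B, step 2: one extra query for all authors, then merge into the nodes."""
    uids = sorted({node["uid"] for node in nodes.values()})
    if not uids:
        return
    users = {
        row["uid"]: row
        for row in db.execute(
            f"SELECT uid, name, picture FROM users WHERE uid IN ({placeholders(uids)})",
            uids,
        )
    }
    for node in nodes.values():
        author = users[node["uid"]]
        node["name"] = author["name"]
        node["picture"] = author["picture"]
```

The second variant always issues exactly one additional query regardless of how many nodes are loaded, because the author ids are collected first and fetched in a single `IN (...)` lookup.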

Comment   File                   Size      Author
#8        user_node_load.patch   2.14 KB   catch
          user_node_load.patch   2.15 KB   catch

Comments

moshe weitzman:

Status: Needs review » Reviewed & tested by the community

modularity++

Frando:

We could also give the node_load_multiple query a tag and let user.module alter the query to include the fields. Would save the additional query while providing the same level of modularity, right?
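As a rough illustration of the tag idea (not Drupal's actual query API), here is a minimal Python sketch. The `SelectQuery` class, the `alters` registry and the tag name `node_load_multiple` are all hypothetical names invented for this example. The node code only tags its query; a separate "user" callback adds the join and the extra fields when the tag is present.

```python
ALTER_HOOKS = {}  # tag -> list of callbacks (hypothetical registry)


def alters(tag):
    """Decorator that registers a callback to alter queries carrying `tag`."""
    def register(func):
        ALTER_HOOKS.setdefault(tag, []).append(func)
        return func
    return register


class SelectQuery:
    """Very small stand-in for a query builder, for illustration only."""

    def __init__(self, table, alias, fields):
        self.table = table
        self.alias = alias
        self.fields = [f"{alias}.{field}" for field in fields]
        self.joins = []
        self.tags = set()
        self._altered = False

    def add_tag(self, tag):
        self.tags.add(tag)
        return self

    def to_sql(self):
        # Run the alter callbacks once, so repeated calls give the same SQL.
        if not self._altered:
            for tag in sorted(self.tags):
                for hook in ALTER_HOOKS.get(tag, []):
                    hook(self)
            self._altered = True
        parts = [
            f"SELECT {', '.join(self.fields)}",
            f"FROM {self.table} {self.alias}",
            *self.joins,
        ]
        return " ".join(parts)


@alters("node_load_multiple")
def user_query_alter(query):
    """What a user module might do: add its own join and fields to tagged queries."""
    query.joins.append(f"INNER JOIN users u ON u.uid = {query.alias}.uid")
    query.fields += ["u.name", "u.picture"]
```

With this shape the node query itself never mentions the users table, yet only a single query is executed; the cost is that the join is back, just added from a different place.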

catch:

Status: Reviewed & tested by the community » Needs review

So if we add a tag, then we've got four ways to attach things to the node object in node_load_multiple():

1. hook_query_alter()
2. hook_load()
3. field_attach_load()
4. hook_node_load()

That's not necessarily a bad thing, but it's lots.

Also this reminded me of yhahn's blog post about litenode, where I see you suggested the same thing.

One major caveat about query rewriting, for me, is this section of that post, "Big sets make for slow joins":

"The last and most interesting problem is that while in raw speed the litenode query time is much faster than the regular node load view on small sets, on large sets (we have 13,000+ nodes on our intranet) the total query time can become slower. As the set you query increases in size, the several joins that Views makes to retrieve all the litenode fields slow down."

See also: http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-w...

It looks like this isn't restricted to poorly indexed queries but is more about buffers etc., so if query rewriting becomes a general option for adding stuff to nodes, we should probably get a better handle on where the various trade-offs are. It's not that unusual for Drupal sites to have a few hundred thousand users and 1 million+ nodes, but generating a dataset like that to compare with is a bit of a pain.

Marking back to CNR for more discussion.

moshe weitzman:

I prefer the approach in this patch (remove the join, add a separate users query). The join approach can kill performance if you need to query on conditions in multiple tables. For example, if you need to load all published nodes for non-blocked users, you are in temp table hell with MySQL. Using multiple queries resolves this problem.

Adding a query tag is probably still a good idea, but let's not use it in core.

moshe weitzman:

I think this is RTBC, but catch can undo his needs review change when he is ready.

catch:

Status: Needs review » Reviewed & tested by the community

I guess I can mark this RTBC since I unmarked it before.

Let's add the tag in another issue.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

catch:

Status: Needs work » Reviewed & tested by the community
File size: 2.14 KB

Straight re-roll.

Dries:

Status: Reviewed & tested by the community » Fixed

Committed. Thanks.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.