comment block sql very slow with large numbers of comments [#69111]

Comment	File	Size	Author
#67	comments_speedup_0.patch	2.76 KB	robertDouglass

#65	comments_speedup.patch	2.77 KB	robertDouglass

#62	comment_speedup_10.diff	2.43 KB	pwolanin

#58	comment_speedup_9a.patch	2.46 KB	pwolanin

#57	comment_speedup_9.patch	2.46 KB	pwolanin

#51	comment_speedup_8a.patch	2.48 KB	pwolanin

#50	comment_speedup_8.patch	2.48 KB	pwolanin

#46	comment_speedup_7.patch	2.46 KB	pwolanin

#42	comment_speedup_6.patch	1.91 KB	pwolanin

#39	comment_speedup_5.patch	2.29 KB	robertDouglass

#31	comment_speedup_4.patch	1.86 KB	pwolanin

#29	comment_speedup_3.patch	1.85 KB	pwolanin

#27	comment_speedup_2.patch	1.85 KB	pwolanin

#25	comment_speedup_1.patch	1.3 KB	chx

#24	comment_speedup_0.patch	1.3 KB	chx

#23	comment_speedup.patch	3.72 KB	chx

#8	comment_query_patch_47_8.txt	1.47 KB	pwolanin
#1	database_6.patch	330 bytes	Jax

Comment #1

Jax CreditAttribution: Jax commented 2 August 2006 at 17:25

Status:

Active

» Needs review

File	Size
database_6.patch	330 bytes

The only useful key would be status. So the proposed patch adds a key on status just like it's done for the nodes table.

Log in or register to post comments

Comment #2

Wesley Tanaka CreditAttribution: Wesley Tanaka commented 2 August 2006 at 18:18

Status:

Needs review

» Needs work

status, if i remember right, is the flag that tells you if a comment is published or not. I imagine in most cases, that wouldn't be particularly selective (almost all comments would be published) in which case the database should correctly decide to ignore the key and do the full table scan anyway...

right?

Log in or register to post comments

Comment #3

Jax CreditAttribution: Jax commented 2 August 2006 at 20:21

Do you have any reference documentation that explains why and when a full table scan would be desired? I wasn't aware that that ever would perform better.

Log in or register to post comments

Comment #4

pwolanin CreditAttribution: pwolanin commented 2 August 2006 at 20:58

Does this only display one comment per node, or really the 10 most recent comments?

Could this query be faster to do in two parts:

1) limit the query by finding the 10 nodes with the most recent comment timestamp

2) use these nid values to speed the query of comments.

I'm a little fuzzy on the mysql syntax, since there are notations like:

KEY node_comment_timestamp (last_comment_timestamp)

in the node_comment_statistics table.

Here's the original code (with linebreaks for readibility):

  $result = db_query_range(db_rewrite_sql('SELECT c.nid, c.subject, c.cid, c.timestamp FROM {comments} c 
INNER JOIN {node} n ON n.nid = c.nid WHERE n.status = 1 AND c.status = %d 
ORDER BY c.timestamp DESC', 'c'), COMMENT_PUBLISHED, 0, 10);

Anyhow, how about something like (leaving off the db_rewrite_sql calls for simplicity):


$result1=db_query_range("SELECT nid from {node_comment_statistics} ORDER BY last_comment_timestamp DESC",0, 10)

$recent_nids=array();

while ($cnode=db_fetch_object($result1)) {
  $recent_nids[]=$cnode->nid;
}

$result2= db_query_range('SELECT c.nid, c.subject, c.cid, c.timestamp FROM {comments} c 
INNER JOIN {node} n ON n.nid = c.nid WHERE c.nid IN ('.implode(','$recent_nids).') 
AND n.status = 1 AND c.status = %d ORDER BY c.timestamp DESC',COMMENT_PUBLISHED,0,10);

etc.

Maybe this could also be written as a compound query?

Log in or register to post comments

Comment #5

Wesley Tanaka CreditAttribution: Wesley Tanaka commented 2 August 2006 at 21:31

> Do you have any reference documentation that explains why and when a full table scan would be desired?

web search for index selectivity

Log in or register to post comments

Comment #6

pwolanin CreditAttribution: pwolanin commented 4 August 2006 at 18:35

I tested the code below and it seems to work, but I don't have enough comments to tell if there's a speed difference:

$result1=db_query_range(db_rewrite_sql("SELECT n.nid from {node_comment_statistics} n WHERE n.comment_count > 0 ORDER BY n.last_comment_timestamp DESC"),0, 15);

$recent_nids=array();

while ($cnode=db_fetch_object($result1)) {
  $recent_nids[]=$cnode->nid;
}

$result2= db_query_range('SELECT c.nid, c.subject, c.cid, c.timestamp FROM {comments} c INNER JOIN {node} n ON n.nid = c.nid WHERE c.nid IN ('.implode(',',$recent_nids).') AND n.status = 1 AND c.status = %d ORDER BY c.timestamp DESC',COMMENT_PUBLISHED,0,10);

while ($comment=db_fetch_object($result2)) {
   print $comment->subject."<br>";
}

Log in or register to post comments

Comment #7

Wesley Tanaka CreditAttribution: Wesley Tanaka commented 5 August 2006 at 00:52

Would you be willing to upload that in patch form?

Log in or register to post comments

Comment #8

pwolanin CreditAttribution: pwolanin commented 5 August 2006 at 01:15

File	Size
comment_query_patch_47_8.txt	1.47 KB

Ok, here's a patch against 4.7. Seems to work (at least in as much as the recent comments are displayed).

Log in or register to post comments

Comment #9

killes@www.drop.org CreditAttribution: killes@www.drop.org commented 27 September 2006 at 13:28

Did anybody test this?

I think adding an index on c.status is probably better.

Log in or register to post comments

Comment #10

pwolanin CreditAttribution: pwolanin commented 27 September 2006 at 15:09

@killes - I don't think having an index an c.status will help at all, since most comments wil have the same status.

Log in or register to post comments

Comment #11

killes@www.drop.org CreditAttribution: killes@www.drop.org commented 27 September 2006 at 15:24

well, in any case we'd need some benchmarks.

Log in or register to post comments

Comment #12

pwolanin CreditAttribution: pwolanin commented 29 September 2006 at 16:42

do you know of anyone who has a backup site suitable for testing this? What's the most appropriate way to benchmark this?

Log in or register to post comments

Comment #13

pwolanin CreditAttribution: pwolanin commented 30 September 2006 at 00:05

looking at the exisintg query, maybe it would help to put a new index on c.timestamp?

Log in or register to post comments

Comment #14

pwolanin CreditAttribution: pwolanin commented 30 September 2006 at 01:41

quick test with ~455 nodes and ~4600 comments generated by the devel module.

Local install, MySQL 5.0, PHP 5.1.2, Apache 2.0,

the current core version takes ~120-185 ms to run after a new comment is added.

Substituting the code at the theme level (see below), the 2 queries run in about 3-4 ms each, with the longest I saw of about of 6-7 ms each - so total query time is reduced ~10-fold.

for testing purposes, you can just substitute my version at the theme level:

function phptemplate_comment_block() {
  $result1 = db_query_range(db_rewrite_sql("SELECT n.nid from {node_comment_statistics} n WHERE n.comment_count > 0 ORDER BY n.last_comment_timestamp DESC"),0, 10);
  
  $recent_nids = array();
  
  while ($cnode = db_fetch_object($result1)) {
    $recent_nids[] = $cnode->nid;
  }  
  $result = db_query_range('SELECT c.nid, c.subject, c.cid, c.timestamp FROM {comments} c INNER JOIN {node} n ON n.nid = c.nid WHERE c.nid IN ('.implode(',',$recent_nids).') AND n.status = 1 AND c.status = %d ORDER BY c.timestamp DESC',COMMENT_PUBLISHED,0,10);
  while ($comment = db_fetch_object($result)) {
    $items[] = l($comment->subject, 'node/'. $comment->nid, NULL, NULL, 'comment-'. $comment->cid) .'<br />'. t('%time ago', array('%time' => format_interval(time() - $comment->timestamp)));
  }
  return theme('item_list', $items);
}

Log in or register to post comments

Comment #15

pwolanin CreditAttribution: pwolanin commented 30 September 2006 at 01:44

also, an index on c.timestamp didn't seem to make a difference.

Log in or register to post comments

Comment #16

pwolanin CreditAttribution: pwolanin commented 1 October 2006 at 23:10

Version:	4.7.2	» 4.7.3
Status:	Needs work	» Needs review

since there is an apparent marked gain in speed, this at leat deserves a review ;-)

Log in or register to post comments

Comment #17

FiReaNGeL CreditAttribution: FiReaNGeL commented 4 October 2006 at 05:56

Try the following and report back. Before, add a status|timestamp index (in the same index) for the comments table. I think its better than the original one in terms of speed. Don't have a huge database to test it on though. Also, I don't know if pgsql / mysql 4.0 support this kind of query (i think they should, but we better check).

SELECT DISTINCT (k.nid), k.subject, k.cid, k.timestamp
FROM node n, (
SELECT c.nid, c.cid, c.subject, c.timestamp
FROM comments c
WHERE c.status =0
ORDER BY c.timestamp DESC
) AS k

WHERE k.nid = n.nid
AND n.status =1
LIMIT 10 ;

Log in or register to post comments

Comment #18

FiReaNGeL CreditAttribution: FiReaNGeL commented 4 October 2006 at 05:59

Little follow up. If we assume that the last 1000 comments are in at least 10 separate nodes (not a wild assumption), we can do

EXPLAIN SELECT distinct(k.nid), k.subject, k.cid, k.timestamp FROM node n, (SELECT c.nid, c.cid, c.subject, c.timestamp from comments c where c.status = 0 order by c.timestamp DESC LIMIT 1000) as k WHERE k.nid = n.nid AND n.status = 1 LIMIT 10;

(Notice the additional limit 1000 in the subselect), which is way, way faster than my other query. We could set this limit to 2000, or even 5000 for safety purposes and get fast performance.

Log in or register to post comments

Comment #19

Dries CreditAttribution: Dries commented 12 October 2006 at 19:57

Unfortunately, we can't really rely on assumptions like that. There are sites that might do more comments per node.

Log in or register to post comments

Comment #20

FiReaNGeL CreditAttribution: FiReaNGeL commented 12 October 2006 at 20:09

Yes, I know. But 5000 concurrent, back-to-back comments in less than 10 nodes? And even if that somehow happens, you'll get only those in the 'most recently commented on' block, which would be accurate. But I understand its a less than perfect solution (althought the performance gain from it on large site is crazy, as only the small 5000 comments table need to be file sorted, instead of the whole comments table).

Will try to find something better, but I'll probably use this version on my site. Thanks!

Log in or register to post comments

Comment #21

FiReaNGeL CreditAttribution: FiReaNGeL commented 12 October 2006 at 20:10

BTW, my query in comment 17 should get benchmarked compared to the original one, im pretty sure its better and doesnt make any crazy assumptions :)

Log in or register to post comments

Comment #22

IncrediblyKenzi CreditAttribution: IncrediblyKenzi commented 18 November 2006 at 19:06

We applied the patch in comment #8 on a site with 300,000 comments in the DB.

The original query was causing some significant backlog on the mysql side and a subsequent "too many connections" error from MySQL itself. After applying the patch, the site has been purring along nicely.

Thanks for the help!

Log in or register to post comments

Comment #23

chx CreditAttribution: chx commented 18 November 2006 at 19:56

Version:	4.7.3	» 5.x-dev
Status:	Needs review	» Reviewed & tested by the community

File	Size
comment_speedup.patch	3.72 KB

Then let this get in. I rerolled #8 against HEAD and did nothing else.

Log in or register to post comments

Comment #24

chx CreditAttribution: chx commented 18 November 2006 at 19:58

File	Size
comment_speedup_0.patch	1.3 KB

*SIGH* I have a sloppy day.

Log in or register to post comments

Comment #25

chx CreditAttribution: chx commented 18 November 2006 at 23:38

File	Size
comment_speedup_1.patch	1.3 KB

Wow. How many style errors there can be in five lines? Apparently, tons. I think I catched them all now.

Log in or register to post comments

Comment #26

Dries CreditAttribution: Dries commented 21 November 2006 at 20:36

Status:

Reviewed & tested by the community

» Needs work

Needs a code comment or two.

Log in or register to post comments

Comment #27

pwolanin CreditAttribution: pwolanin commented 22 November 2006 at 03:34

Status:

Needs work

» Needs review

File	Size
comment_speedup_2.patch	1.85 KB

added code comments

Log in or register to post comments

Comment #28

RobRoy CreditAttribution: RobRoy commented 22 November 2006 at 04:56

Status:

Needs review

» Needs work

Comment should be in "This is a sentence form." so

- The header comment should end with a period.
- "//select" should be "// Select" and that should end with a period too, same with the next one.

Log in or register to post comments

Comment #29

pwolanin CreditAttribution: pwolanin commented 22 November 2006 at 12:56

Status:

Needs work

» Needs review

File	Size
comment_speedup_3.patch	1.85 KB

Ok, fixed comment formatting.

Log in or register to post comments

Comment #30

Jax CreditAttribution: Jax commented 22 November 2006 at 14:11

You forgot the space in the comment after the // :)

Log in or register to post comments

Comment #31

pwolanin CreditAttribution: pwolanin commented 22 November 2006 at 14:20

File	Size
comment_speedup_4.patch	1.86 KB

doh!

Log in or register to post comments

Comment #32

robertDouglass CreditAttribution: robertDouglass commented 22 November 2006 at 16:24

If you use the IN syntax, you need to make sure to only do the query if there are actual items to be selected from. Otherwise you generate a SQL error because of IN ().

Why not select into a temporary table?

Log in or register to post comments

Comment #33

robertDouglass CreditAttribution: robertDouglass commented 22 November 2006 at 16:53

I tested #31. The results were not better with 50 nodes and 500 comments.

Before the patch

* comment #313
  9 min 9 sec ago
* comment #330
  9 min 9 sec ago
* comment #331
  9 min 9 sec ago
* comment #332
  9 min 9 sec ago
* comment #333
  9 min 9 sec ago
* comment #334
  9 min 9 sec ago
* comment #335
  9 min 9 sec ago
* comment #336
  9 min 9 sec ago
* comment #337
  9 min 9 sec ago
* comment #338
  9 min 9 sec ago

Total time: 21.153
Average: 2.115
Standard Deviation: 0.083
Count: 10
Function: theme_comment_block
Query: SELECT c.nid, c.subject, c.cid, c.timestamp FROM comments c INNER JOIN node n ON n.nid = c.nid WHERE n.status = D

After the patch

Different comments in a different order are selected:

* comment #499
  12 min 8 sec ago
* comment #484
  12 min 8 sec ago
* comment #439
  12 min 8 sec ago
* comment #426
  12 min 8 sec ago
* comment #356
  12 min 8 sec ago
* comment #337
  12 min 8 sec ago
* comment #272
  12 min 8 sec ago
* comment #478
  12 min 8 sec ago
* comment #359
  12 min 8 sec ago
* comment #267
  12 min 8 sec ago

There are two queries to track, and the first one runs slower than the original query.

Total time: 22.446
Average: 2.245
Standard Deviation: 0.063
Count: 10
Function: theme_comment_block
Query: SELECT c.nid, c.subject, c.cid, c.timestamp FROM comments c INNER JOIN node n ON n.nid = c.nid WHERE c.nid IN (D

Total time: 3.777
Average: 0.378
Standard Deviation: 0.024
Count: 10
Function: theme_comment_block
Query: SELECT n.nid from node_comment_statistics n WHERE n.comment_count > D

Conclusion

This is an important issue to solve but I'm not convinced this is the solution. Why are different comments selected and in a different order? Was the original query wrong?

I'll investigate further and look into using a temporary table as well.

Log in or register to post comments

Comment #34

robertDouglass CreditAttribution: robertDouglass commented 22 November 2006 at 16:55

Status:

Needs review

» Needs work

Log in or register to post comments

Comment #35

robertDouglass CreditAttribution: robertDouglass commented 22 November 2006 at 17:41

I did some more testing with fewer comments (40) and the performance gap widens:

unpatched, the average time is: 0.883
patched, the average time (of the two queries added together) is: 1.631

I was unable to recreate the order problems I had in the first test (where I used devel module to generate the comments), so it is possible that it won't happen under normal circumstances.

Log in or register to post comments

Comment #36

robertDouglass CreditAttribution: robertDouglass commented 22 November 2006 at 17:52

At 20,000 comments, the patched version starts to show an advantage:

unpatched average time: 332.26
patched average time: 68.156

The patched version is around 5 times faster.

Log in or register to post comments

Comment #37

robertDouglass CreditAttribution: robertDouglass commented 22 November 2006 at 18:01

Note that my approach to testing has been inherently flawed in favor of the path because I've been adding the sum of the database queries. The path introduces an extra PHP loop, so I should be testing the return time of the function itself.

Log in or register to post comments

Comment #38

robertDouglass CreditAttribution: robertDouglass commented 22 November 2006 at 18:14

Status:

Needs work

» Reviewed & tested by the community

I don't think db_query_temporary is capable of querying a range, so I'm giving up on trying the same approach with a temporary table. In all, I think the performance gain on larger sites is worth the performance hit on smaller sites. RTBC.

Log in or register to post comments

Comment #39

robertDouglass CreditAttribution: robertDouglass commented 22 November 2006 at 18:18

File	Size
comment_speedup_5.patch	2.29 KB

Oops, I meant to upload this updated version which protects against running the second query (with the dangerous IN()) if there are no comments.

Log in or register to post comments

Comment #40

pwolanin CreditAttribution: pwolanin commented 22 November 2006 at 18:19

@RobertDouglas- the short answer is that I haven't put a lot of effort into perfecting this code, since it's essentially just the first thing that came into my head. Also, I'm not an SQL expert by any stretch.

Ok, just did too much reading on mysql.com and postgresql.org. By temporary table, do you mean like this?

CREATE TEMPORARY TABLE cnids AS SELECT n.nid from `node_comment_statistics` n WHERE n.comment_count > 0 ORDER BY n.last_comment_timestamp DESC LIMIT 10

I think this syntax works for both servers.

~~Is this used anywhere in core?~~ To answer my own question, the DB layer provides db_query_temporary() for making temporary tables, and this is used by the search module.

Log in or register to post comments

Comment #41

robertDouglass CreditAttribution: robertDouglass commented 22 November 2006 at 18:19

Status:

Reviewed & tested by the community

» Needs review

Log in or register to post comments

Comment #42

pwolanin CreditAttribution: pwolanin commented 22 November 2006 at 18:39

File	Size
comment_speedup_6.patch	1.91 KB

looks like we were posting at the same time.

wouldn't it be simpler like the attached?

Log in or register to post comments

Comment #43

robertDouglass CreditAttribution: robertDouglass commented 22 November 2006 at 21:51

@pwolanin: yeah, search is the reason why we got db_query_temporary(), but I am very disappointed that it doesn't seem to be able to support the LIMIT 0, 10 part. As for your alternate flow structure, I'll leave it to the core team to decide which syntax they like best. Some people like to see the returns at the end of the function.

Any more people test this? It'd be good to get more confirmation that the set and order of comments shown in the block are the same as before.

Log in or register to post comments

Comment #44

pwolanin CreditAttribution: pwolanin commented 22 November 2006 at 21:54

Well, we can open a Drupal 6.x feature request for an implementation of db_query_range_temporary(). I'm guessing there is some difference among SQL variants so that db_query and db_query_range need to be kept separate.

Log in or register to post comments

Comment #45

drumm

he/him

NY, US

CreditAttribution: drumm commented 27 November 2006 at 01:01

Status:

Needs review

» Needs work

Since no themes in contrib seem to implement theme_comment_block(), lets go ahead and take all SQL out of that themeable function while we are here.

Log in or register to post comments

Comment #46

pwolanin CreditAttribution: pwolanin commented 27 November 2006 at 01:24

Status:

Needs work

» Needs review

File	Size
comment_speedup_7.patch	2.46 KB

Ok, all SQL moved to a reusable helper function.

Log in or register to post comments

Comment #47

robertDouglass CreditAttribution: robertDouglass commented 27 November 2006 at 19:47

I like this last patch the best! I can imagine doing things like keyword analysis on those last 10 comments and theming the block totally differently, whereas before, it was a pointless theme function.

Log in or register to post comments

Comment #48

chx CreditAttribution: chx commented 27 November 2006 at 19:52

Idea is cool. But recent_comments break Drupal namespacing rules , it must start with comment_

Log in or register to post comments

Comment #49

pwolanin CreditAttribution: pwolanin commented 27 November 2006 at 20:08

Status:

Needs review

» Needs work

ok, for a function name- what would you prefer?

comment_get_recent()
comment_find_recent()
comment_recent_comments()
comment_recent()

looks like X_get_X is pretty common in core now.

Log in or register to post comments

Comment #50

pwolanin CreditAttribution: pwolanin commented 27 November 2006 at 23:59

Status:

Needs work

» Needs review

File	Size
comment_speedup_8.patch	2.48 KB

ok, pciked comment_get_recent()- patch attached. A question, however. Would it make this function more useful to return all the fields form the comment table, rather than just the 4 currently selected?

Log in or register to post comments

Comment #51

pwolanin CreditAttribution: pwolanin commented 28 November 2006 at 00:05

File	Size
comment_speedup_8a.patch	2.48 KB

like this- all the fields

Log in or register to post comments

Comment #52

Wesley Tanaka CreditAttribution: Wesley Tanaka commented 28 November 2006 at 07:24

Selecting more fields is slower than selecting just the fields you need

Log in or register to post comments

Comment #53

m3avrck CreditAttribution: m3avrck commented 4 December 2006 at 06:09

Note, I just applied patch #8 (nice and simple) to fix my 4.7 site that was very sluggish.

Log in or register to post comments

Comment #54

robertDouglass CreditAttribution: robertDouglass commented 4 December 2006 at 08:42

@m3avrck: did you consider #51? It is the one that is currently being considered for core inclusion.

Log in or register to post comments

Comment #55

m3avrck CreditAttribution: m3avrck commented 4 December 2006 at 08:47

I had a look over but I need a 4.7 specific fix and wasn't sure of the state of #51 in being applied to 4.7 I had been referred to this thread to apply patch #8 specifically since my referrer was already using it on a 4.7 site :-)

Seems like the patch in 51 should be even more efficient.

Log in or register to post comments

Comment #56

pwolanin CreditAttribution: pwolanin commented 4 December 2006 at 14:21

Based on the comment in #52, the #50 patch should be faster than the #51. Both are for HEAD, but are essentially the same code as above for 4.7, just refactored into a separate function.

Log in or register to post comments

Comment #57

pwolanin CreditAttribution: pwolanin commented 8 December 2006 at 00:59

File	Size
comment_speedup_9.patch	2.46 KB

Based on the comments, I think this is RTBC (anyone?). I prefer the last patch (#51), since it makes the new function much more general and useful. Since the bottleneck was the query, I don't think returning all the data for 10 rows will cause a measurable speed difference (actual benchmarking appreciated).

Updated patch from #51 attached that applies without offset to current HEAD.

Log in or register to post comments

Comment #58

pwolanin CreditAttribution: pwolanin commented 8 December 2006 at 01:01

File	Size
comment_speedup_9a.patch	2.46 KB

doh- typo in the comments fixed. No change to actual code.

Log in or register to post comments

Comment #59

Wesley Tanaka CreditAttribution: Wesley Tanaka commented 8 December 2006 at 10:21

On a current real database I have, running the c.* query 10000 times took 29 second, and running the query which just pulls the 4 columns 10000 times 16 seconds (that was piping output to a file)

Piping output to /dev/null (which I imagine is the best possible case for the c.* query, since allocating memory in PHP and shuffling the data around for all the extra columns will take extra time), the c.* query took 25 seconds and the 4 column query took 15 seconds.

The difference in speed will change depending on how large the extra columns are (like the comment body, etc)

Maybe the new function could be called "comment_get_recent_headers" or something, if that would match the "getting summary data only" functionality better?

Log in or register to post comments

Comment #60

Wesley Tanaka CreditAttribution: Wesley Tanaka commented 8 December 2006 at 10:47

Minor thing: instead of

$recent = comment_get_recent();
foreach ($recent as $comment) {

you could do:

foreach (comment_get_recent() as $comment) {

Log in or register to post comments

Comment #61

robertDouglass CreditAttribution: robertDouglass commented 8 December 2006 at 10:47

Status:

Needs review

» Reviewed & tested by the community

I'm convinced that the 4 column approach is better. I like this patch. Let's get it in before RC.

Log in or register to post comments

Comment #62

pwolanin CreditAttribution: pwolanin commented 8 December 2006 at 13:23

File	Size
comment_speedup_10.diff	2.43 KB

Ok, attached patch against current HEAD, returns only the 4 columns, fixes type in code comment, and utilzes Wesley's suggestion for compacting the code.

Log in or register to post comments

Comment #63

Dries CreditAttribution: Dries commented 8 December 2006 at 15:41

Status:

Reviewed & tested by the community

» Needs work

But, but ... if I understand the performance results, the benefits are marginal?

- from should be FROM, in that SQL query.
- I'd rename $cnode to $comment.
- Indentation of the PHPdoc looks awkward.
- I think we need to add a little paragraph explaining why we do this in two queries.
- Long code comments should be wrapped.

Log in or register to post comments

Comment #64

pwolanin CreditAttribution: pwolanin commented 8 December 2006 at 16:43

@Dries - the last benchmarking was between two versions of this patch NOT between the current code and patch code. So, the marginal difference in recent comments is for returning only 4 fields from the comment table row, as opposed to returning the entire row. As for the patch speed see #36:

At 20,000 comments, the patched version starts to show an advantage:

unpatched average time: 332.26
patched average time: 68.156

The patched version is around 5 times faster.

More benchmarking would certainly be good. I'll work on the other issues.

Log in or register to post comments

Comment #65

robertDouglass CreditAttribution: robertDouglass commented 8 December 2006 at 17:55

Status:

Needs work

» Needs review

File	Size
comments_speedup.patch	2.77 KB

Patch rerolled according to Dries' suggestions. I'm also benchmarking.

Log in or register to post comments

Comment #66

pwolanin CreditAttribution: pwolanin commented 8 December 2006 at 18:13

Status:

Needs review

» Needs work

two minor points:

1) I personally find the re-use of $comments potentially confusing. I'd rather see $cnode be renamed to $node (or something-node), since actually what we are selecting in the first step are the nodes with the most recent comments.

2) in my code, I think I should not have removed the initialization $items = array(), since otherwise we get the non-E_ALL test if (NULL)

Log in or register to post comments

Comment #67

robertDouglass CreditAttribution: robertDouglass commented 8 December 2006 at 18:27

File	Size
comments_speedup_0.patch	2.76 KB

agree.

Log in or register to post comments

Comment #68

robertDouglass CreditAttribution: robertDouglass commented 8 December 2006 at 19:15

Status:

Needs work

» Needs review

Benchmarks. I tested with ab against the front page with only the recent comments block enabled.

0 nodes, 0 comments

no patch
Requests per second: 42.03 [#/sec] (mean)
Requests per second: 42.36 [#/sec] (mean)
Requests per second: 42.23 [#/sec] (mean)

with patch
Requests per second: 42.13 [#/sec] (mean)
Requests per second: 42.26 [#/sec] (mean)
Requests per second: 42.58 [#/sec] (mean)

5 nodes, 20 comments

no patch
Requests per second: 29.18 [#/sec] (mean)
Requests per second: 29.53 [#/sec] (mean)
Requests per second: 28.55 [#/sec] (mean)

with patch
Requests per second: 27.66 [#/sec] (mean)
Requests per second: 28.80 [#/sec] (mean)
Requests per second: 29.44 [#/sec] (mean)

100 nodes, 1000 comments

no patch
Requests per second: 13.48 [#/sec] (mean)
Requests per second: 13.86 [#/sec] (mean)
Requests per second: 13.74 [#/sec] (mean)

with patch
Requests per second: 13.73 [#/sec] (mean)
Requests per second: 13.68 [#/sec] (mean)
Requests per second: 13.44 [#/sec] (mean)

200 nodes, 2000 comments

no patch
Requests per second: 8.20 [#/sec] (mean)
Requests per second: 8.10 [#/sec] (mean)
Requests per second: 8.04 [#/sec] (mean)

with patch
Requests per second: 12.26 [#/sec] (mean)
Requests per second: 12.28 [#/sec] (mean)
Requests per second: 12.15 [#/sec] (mean)
Requests per second: 12.59 [#/sec] (mean)

2000 nodes, 10000 comments

no patch
Requests per second: 2.12 [#/sec] (mean)
Requests per second: 2.12 [#/sec] (mean)
Requests per second: 2.08 [#/sec] (mean)

with patch
Requests per second: 9.90 [#/sec] (mean)
Requests per second: 9.50 [#/sec] (mean)
Requests per second: 9.70 [#/sec] (mean)

Log in or register to post comments

Comment #69

pwolanin CreditAttribution: pwolanin commented 8 December 2006 at 19:49

Status:

Needs review

» Reviewed & tested by the community

fantastic.

Log in or register to post comments

Comment #70

ChrisKennedy CreditAttribution: ChrisKennedy commented 8 December 2006 at 20:15

My host doesn't have ab, but I eventually figured out how to insert 3400 comments into my test site (only a few nodes). When viewing the front page with comments and devel blocks enabled the patch caused the average execution time to drop by about 50%.

Log in or register to post comments

Comment #71

drumm

he/him

NY, US

CreditAttribution: drumm commented 19 December 2006 at 09:03

Status:

Reviewed & tested by the community

» Fixed

Committed to HEAD.

Log in or register to post comments

Comment #72

pwolanin CreditAttribution: pwolanin commented 19 December 2006 at 15:30

Version:	5.x-dev	» 4.7.x-dev
Status:	Fixed	» Patch (to be ported)

thanks- maybe for 4.7 too?

Log in or register to post comments

Comment #73

chx CreditAttribution: chx commented 21 December 2006 at 00:47

This is an API change. I doubt this needs porting.

Log in or register to post comments

Comment #74

chx CreditAttribution: chx commented 21 December 2006 at 01:07

Status:

Patch (to be ported)

» Reviewed & tested by the community

As ChrisKennedy points out, no modules or themes will break just they won't get faster. And if someone wants to use this function in his 4.7 theme in such a way that's compatible with every 4.7 then it's still possible to do:

if (!function_exists('comment_get_recent')) {
  function _mytheme_comment_get_recent( // here comes the comment_get_recent from 4.7.5
}

function phptemplate_comment_block() {
  $function = function_exists('comment_get_recent') ? 'comment_get_recent' : '_mytheme_comment_get_recent';
  foreach ($function() as $comment) {
....
}

Log in or register to post comments

Comment #75

Zen CreditAttribution: Zen commented 21 December 2006 at 15:23

Version:	4.7.x-dev	» 5.x-dev
Status:	Reviewed & tested by the community	» Needs work

IMO, this speedup patch is incomplete as the first node_comment_statistics query does not use a key.. A compound index for comment_count and last_comment_timestamp ought to be added and an appropriate update will be required.

It would also be nice if core committers decide on a best practice when it comes to dealing with queries using IN ().. Some of you insist that _all_ (including internal) input be vetted using %ds, while others do not care in cases where the input is assuredly safe.

chx: I don't see a 4.7 patch. Which patch are you RTBC-ing?

Thanks,
-K
P.S IMO, however, the recent comment block should live up to its name and only show _recent_ comments (based on a configurable period of time) and not _only_ the last X comments. This would likely make it possible to retrieve everything in one (probably) efficient set of JOINS.

Log in or register to post comments

Comment #76

Wesley Tanaka CreditAttribution: Wesley Tanaka commented 22 December 2006 at 09:42

recent comment block should live up to its name and only show _recent_ comments (based on a configurable period of time) and not _only_ the last X comments.

I disagree with this -- it sounds like a case of changing a UI because of implementation details.

The "latest N comments" behavior is better because the block is the same size even if there has been a recent spate of comments or a recent dearth.

Log in or register to post comments

Comment #77

pwolanin CreditAttribution: pwolanin commented 22 December 2006 at 21:13

@Zen I don't understand your comment- the table node_comment_statistics has defined KEY node_comment_timestamp (last_comment_timestamp).

Log in or register to post comments

Comment #78

FiReaNGeL CreditAttribution: FiReaNGeL commented 23 December 2006 at 00:17

pwolanin : He means a compound index on both keys at the same time.

Log in or register to post comments

Comment #79

pwolanin CreditAttribution: pwolanin commented 25 December 2006 at 02:46

Ok, I see- I guess the real question is whether a compound index would make a difference in real life? Maybe it would for sites with many new nodes where few of those nodes have comments.

Log in or register to post comments

Comment #80

pwolanin CreditAttribution: pwolanin commented 22 January 2007 at 20:02

Version:

5.x-dev

» 6.x-dev

I just discovered there is a minor problem with the code committed to 5.x/HEAD:

This query:

  $result = db_query_range(db_rewrite_sql("SELECT n.nid FROM {node_comment_statistics} n WHERE n.comment_count > 0 ORDER BY n.last_comment_timestamp DESC"), 0, $number);

Actually should (I think) be written something like:

  $result = db_query_range(db_rewrite_sql("SELECT nc.nid FROM {node_comment_statistics} nc WHERE nc.comment_count > 0 ORDER BY nc.last_comment_timestamp DESC", 'nc'), 0, $number);

to conform to the API for db_rewrite_sql

Otherwise, this causes SQL errors in modules that implement hook_db_rewrite_sql: http://drupal.org/node/111421