(follow-up issue from http://drupal.org/node/44276)
that bug fix further slowed down an already slow query. some EXPLAIN SELECT investigation of the queries that drive the "Recent posts" page shows that it's already doing the worst possible kind of query: a filesort on the *entire* {node_comment_statistics} table. :( the new patch doesn't (and can't) fix that, and in fact makes it a little worse.
so, perhaps it'd be nice to speed this query up. here's my initial idea (from http://drupal.org/node/44276#comment-171227):
we'd have to add a new column to the {node} table, sort of like "changed" called something like "updated". we'd need to set it anytime either the node is edited, or a comment is posted. we'd have an index on this. that would avoid:
- the GREATEST()
- ORDER BY on a computed value
- one of the INNER JOIN's (on {node_comment_statistics})
i'm guessing the speedup could be quite huge. but, this will probably be a very contentious DB schema change, and it's *way* too late in D5 for that.
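A minimal sketch of the idea (using sqlite as a stand-in for MySQL, with a hypothetical cut-down schema, not the real Drupal tables): a denormalized `updated` column that both node saves and new comments bump, so the tracker can ORDER BY an indexed value instead of a computed GREATEST():

```python
import sqlite3

# Toy schema: {node} gains an indexed `updated` column alongside `changed`.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, changed INT, updated INT);
CREATE INDEX node_updated ON node (updated);
""")

def node_save(nid, now):
    # node edit: both changed and updated move forward
    db.execute("UPDATE node SET changed = ?, updated = ? WHERE nid = ?",
               (now, now, nid))

def comment_insert(nid, now):
    # new comment: only the denormalized updated column moves
    db.execute("UPDATE node SET updated = ? WHERE nid = ?", (now, nid))

db.executemany("INSERT INTO node VALUES (?, ?, ?)", [(1, 10, 10), (2, 20, 20)])
node_save(2, 40)        # node 2 is edited
comment_insert(1, 99)   # then a comment on node 1 makes it the most recent

# the tracker sort now hits a plain indexed column, no GREATEST(), no join
rows = db.execute("SELECT nid FROM node ORDER BY updated DESC").fetchall()
```

The ordering comes straight off the `node_updated` index; node 1 sorts first because its comment (timestamp 99) outranks node 2's edit (timestamp 40).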
Comment | File | Size | Author |
---|---|---|---|
#269 | tracker.patch | 27.99 KB | catch |
#266 | tracker_105639.patch | 28.08 KB | drewish |
#264 | tracker_105639.patch | 27.8 KB | drewish |
#263 | tracker_105639.patch | 24.55 KB | drewish |
#262 | tracker_105639.patch | 24.55 KB | drewish |
Comments
Comment #1
ChrisKennedy commented:
That would be handy - my one Views patch simulates such a column: http://drupal.org/node/89421
Comment #2
killes@www.drop.org commented:
Great suggestion and surely useful. However, the new column belongs in node_comment_statistics. Maybe we could rename it to node_statistics.
Comment #3
dww commented:
why? then we still have the additional INNER JOIN. is the thought that comments get added faster than node updates, so we'd get into insert/update locking hell again or something? if this value belongs in {node_statistics} (a good name, if that's what it's turning into), what about {node}.changed itself? just a little unclear on the logic for using the separate table, when one of the goals was avoiding the INNER JOIN itself.
thanks,
-derek
Comment #4
geodaniel commented:
Just a note that the INNER JOIN to {node_comment_statistics} is also causing issues in 4.7.x and 5.x when the comment module is not enabled: #110556.
Comment #5
pwolanin commented:
With 5.x and multiple cache tables, why (for 6.x at least) can't the result be cached?
On d.o, the tracker page takes a long time to load. However, maybe cache_clear_all() gets called too often for caching the tracker to be of benefit unless there's a minimum cache lifetime?
Comment #6
pwolanin commented:
The performance of the tracker on d.o has been terrible recently...
Comment #7
mindless commented:
Here's another idea, perhaps a first step on the way to a more involved change. This one doesn't require any db change: it removes the INNER JOIN on the users table and queries for the needed usernames afterwards. Much faster on our system with 59k nodes, 227k comments.
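The shape of that two-query approach, sketched against a toy sqlite schema (table and column names simplified; this is an illustration of the concept, not the actual patch):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INT, changed INT);
CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
""")
db.executemany("INSERT INTO node VALUES (?, ?, ?)",
               [(1, 7, 30), (2, 8, 20), (3, 7, 10)])
db.executemany("INSERT INTO users VALUES (?, ?)", [(7, "alice"), (8, "bob")])

# first query: the heavy tracker query, with no users join at all
posts = db.execute(
    "SELECT nid, uid FROM node ORDER BY changed DESC LIMIT 25").fetchall()

# second query: resolve usernames for just the uids on this one page
# (they are ints straight out of the DB, so a placeholder list is safe)
uids = sorted({uid for _, uid in posts})
marks = ",".join("?" * len(uids))
names = dict(db.execute(
    f"SELECT uid, name FROM users WHERE uid IN ({marks})", uids).fetchall())
```

The second query touches at most one page's worth of uids, so its cost stays small no matter how large the site gets.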
Comment #8
kbahey commented:
@mindless:
Can you please install the devel module and enable the query log and the page generation time, and post the before the patch values, and after the patch?
This will quantify the difference more precisely.
Comment #9
mindless commented:
query time (ms):
tracker/ Before: 5041.671 After: 2828.042
tracker/{uid} Before: 3657.517 After: 1790.737
Comment #10
kbahey commented:
You have my +1.
The fact that no schema changes are required makes it +2 for me.
Comment #11
mindless commented:
Forgot to include the additional query for usernames in the numbers; it adds just 22.692 ms to each "After" value above.
Comment #12
FiReaNGeL commented:
mindless: it's a good improvement, but wouldn't it be better to actually fix the problem, which, as dww suggested, requires a database change? 3-second queries are not what I would call acceptable, even on a 'big' site.
That being said, it's still an improvement. If nobody is up to the task of implementing the required changes (and I'm not such a person, sadly), it's way better than nothing.
Comment #13
kbahey commented:
The issue with this is that now, for each comment that is added, you need to update the node table. This means you are incurring a lock and write on the node table.
On busy sites, this can be a bottleneck, since others cannot read the node table until the lock/write is done.
killes has decoupled a column in the users table and moved it to its own table because of that reason. So now we are moving the other way.
Comment #14
killes@www.drop.org commented:
mindless: make sure to properly use the db API and escape input values.
Comment #15
dww commented:
right, the patch in #7 would allow SQL injection...
Comment #16
mindless commented:
sorry, i was proposing an idea/concept in the form of a patch, that wasn't intended as a patch for review. anyone feel free to take the idea and go with it..
Comment #17
pwolanin commented:
Is there really an SQL injection issue? These uid values all came from the DB.
Attached is a patch against the latest DRUPAL-5, though it should apply fine to HEAD with offset. It's mostly just a rewrite of the patch and approach suggested by mindless. Includes code to escape all the uids in the query, and other minor improvements. In general, this seems like a good approach, since at most we will have 25 uids in this second query.
Note, the patch looks bigger than it really is, since some of the code was wrapped in an if{} block, and hence there is a change in indentation.
Comment #18
pwolanin commented:
Any thought on this? Improving tracker performance on d.o is really critical.
Comment #19
FiReaNGeL commented:
dww's suggestion is, in my opinion, the only viable one long term, and it isn't that hard to implement.
Comment #20
pwolanin commented:
I assume you mean the suggestion at the top of adding a new column or table? Yes, for Drupal 6, that might be best.
However, please take a look at the attached patch in terms of an immediate improvement that might be applied to Drupal 5. It's similar in spirit to the change in code made to the recent comments block to get a speedup.
Comment #21
moshe weitzman commented:
this query is now a smoking gun for drupal.org suckiness. please review and RTBC if it makes sense. Would be nice to test with lots of nodes and comments.
Comment #22
Dries commented:
The patch's code looks good, however, I'd like to see two things happen:
1. benchmarking -- does this really have a performance gain? I think this needs to be benchmarked first. So far, we've only been guessing.
2. documentation -- can we document in the code why we use this approach? Novice developers might otherwise think that this can be optimized by merging the two queries into one query. One or two lines of documentation would be sufficient.
Comment #23
pwolanin commented:
@Dries - I don't have a setup for benchmarking at this point - can you or Moshe recruit someone with the setup and experience?
#2 I'll address tonight (EDT).
Comment #24
mindless commented:
fwiw, http://gallery.menalto.com/tracker is using this patch. Afraid I didn't do any measurements, but when I put it in place the performance of tracker pages definitely improved. Site has 61k nodes and 232k comments.
Comment #25
pwolanin commented:
Ok, new patch attached against the latest DRUPAL-5. I added 2 lines of comment, and switched back to the original, simpler implode for the uids. Since these are ints coming straight out of the DB, it seems silly to run the extra code to escape them. We did the same in comment.module on line 263.
Comment #26
kbahey commented:
Can't we test this on scratch.d.o for a before and after scenario?
This would give us LOTS of nodes and comments, and users and will tell us definitely if it would help d.o or not, and by how much?
Comment #27
merlinofchaos commented:
Subscribing
Comment #28
pwolanin commented:
ok, my uninformed attempt to benchmark this: I inserted 10000 users, 6800 nodes, 18000 comments, then disabled the devel module. Ran some queries to get the right data into {node_comment_statistics} (I think the generate functionality of devel needs work).
Added the line
$uid = mt_rand(1, 10000);
at the top of the function to avoid any caching of the query results.
Serving from my laptop (2.0 GHz Intel, Mac 10.4, Apache 1.33 with mod_php, PHP 4.4.4, MySQL 4.1) across my local network to another computer running ab (so latency ~1 ms).
Hitting /tracker without the patch (and a random uid) I get ~1.05 pages/sec at c=1, or ~0.59/sec at c=5.
Hitting /tracker with the patch (and a random uid) I get ~1.20 pages/sec at c=1, or ~0.76/sec at c=5.
However, in both cases the number of "failed requests" was >95% of the total (due to "Length"; I'm not sure what that means).
So, this should probably be repeated with a larger number of nodes/comments per user. With this ratio, only a few items show up on each user's tracker page.
Comment #29
FiReaNGeL commented:
OK, fiddled around a little with this tonight. The proposed patch in the above comments is only a temporary fix - code freeze is looming for D6, and I'd rather fix the problem than apply a temporary band-aid.
First of all, simply adding a db column to replace the GREATEST value and indexing it, as dww suggested, would help, but isn't sufficient. The second part of the problem lies in the "AND (n.uid = %d OR c.uid = %d)" section of the query; MySQL couldn't use an index on the proposed 'updated or commented' column because of the WHERE clause on another table.
This specific WHERE wants to filter on nodes that are (a) created by the user or (b) ones in which the user commented.
So let's do a UNION of those two things, which is much faster. WARNING: tons of MySQL incoming (even if it's just one query).
I replaced the %d for testing purposes. On my dev box (16000 nodes, not many comments), this query is 3 times faster and SHOULD give the same results as the other one. With the db change and index dww proposed, this should get much faster.
Can anyone bench this query alongside the original one and the proposed patch? Also, is UNION OK with Postgres? I have no idea. I'm also in the process of moving my girlfriend into my apartment, so I don't have time to code this right now.
Also, I think the proposed 'updated or commented' column shouldn't go in 'node' directly, as kbahey pointed out. Maybe we could move 'changed' from node to its own table along with the new 'updated or commented' column.
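The actual query was attached rather than inlined, but the general shape of the UNION idea might look something like this (a hedged sqlite sketch with a cut-down schema; not the posted query): fetch "nodes the user authored" and "nodes the user commented on" as two separately-indexable subqueries, then sort the small merged result:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INT, status INT, changed INT);
CREATE TABLE comments (cid INTEGER PRIMARY KEY, nid INT, uid INT, timestamp INT);
""")
db.executemany("INSERT INTO node VALUES (?, ?, ?, ?)",
               [(1, 5, 1, 100), (2, 9, 1, 200), (3, 9, 1, 300)])
db.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)",
               [(1, 2, 5, 250)])  # user 5 commented on node 2

# each arm of the UNION can use its own simple index; the outer
# GROUP BY / ORDER BY only touches the small combined result
rows = db.execute("""
SELECT nid, MAX(ts) AS ts FROM (
  SELECT n.nid, n.changed AS ts
    FROM node n WHERE n.uid = ? AND n.status = 1
  UNION
  SELECT c.nid, c.timestamp AS ts
    FROM comments c JOIN node n ON n.nid = c.nid
   WHERE c.uid = ? AND n.status = 1
) GROUP BY nid ORDER BY ts DESC LIMIT 10
""", (5, 5)).fetchall()
```

For user 5, node 2 (commented on at 250) outranks node 1 (authored, last changed at 100), and node 3 doesn't appear at all.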
Comment #30
pwolanin commented:
@FiReaNG3L - as with the recent comments query in comment module, I think there are problems, since the LIMIT syntax varies a bit by DB (which is why we have db_query_range()), as does support for compound queries.
So, even with the new column, it seems likely to be necessary to do the second query as in the patch.
Comment #31
kbahey commented:
Looking at an EXPLAIN of this query, I get this:
What this tells me is that it is doing a temporary table for the node_comment_statistics table and doing filesort on it, and that should be the slow part.
All we need from this table are the last_comment_timestamp and comment_count columns. These columns are 1:1 with the node table and, according to normalization rules, can (or should) be in the node table itself. Yes, this will cause an UPDATE on the node table every time a new comment is added or deleted, but it may not be an issue. So,
1. Can someone with ssh access on d.o do an explain on the query on d.o, and paste the output here, to confirm if the access path and contention is indeed in the node_comment_statistics table?
2. If there are no significant objections, I can go ahead and create a patch for this, and we can test it on scratch.d.o for a definitive yes/no.
Comment #32
pwolanin CreditAttribution: pwolanin commentedWhat about the suggested solution of making this a new column in {node_comment_statistics}? Yes, you'll still have the join, but you could index this new column and so the penalty of sorting a computed value would go away.
I think (from comments above) writing the node table for each comment update is a bad idea- that's why {node_comment_statistics} is a separate table.
Comment #33
kbahey commented:
Here are the EXPLAINs again with the correct queries (the first ones were not as per the module).
With the GREATEST and node_comment_statistics:
Without the GREATEST and the node_comment_statistics table at all:
I am getting variances from site to site as to keys and such, but all do a temp and file sort without a where for the first query (the existing one in tracker module).
Can we confirm that this is the same on d.o ... someone with ssh access ... please?
Comment #34
dww commented:
on d.o:
and:
Comment #35
killes@www.drop.org commented:
@Fireangel: Your UNION query takes about the same time as the original one :( It also only returned 6 results.
@kbahey: Exchanging temporary table usage for table locks strikes me as a bad idea.
Comment #36
pwolanin commented:
@killes - Do you see any improvement with just the patch in #25?
Comment #37
kbahey commented:
@dww: thanks
@killes: true, locking is not good, but we don't know how much of that will happen vs. the filesort of 105,000 nodes every time an authenticated users hits tracker.
Also, my suggestion does not avoid the GREATEST problem anyway, which causes the filesort.
Seems the most promising solution (from a performance point of view) is to have a new derived column with GREATEST precalculated in it, as pwolanin said.
Comment #38
killes@www.drop.org commented:
@pwolanin: Yes, #25 helps a bit, but not that much. It brought the query down from 23 to 19 seconds.
Comment #39
killes@www.drop.org commented:
SELECT DISTINCT(t.id) AS nid, t.title, t.type
FROM ((SELECT c.nid AS id, n.title, n.type, c.timestamp AS timestamp
       FROM comments c INNER JOIN node n ON n.nid = c.nid
       WHERE c.uid = 83 AND n.status = 1 ORDER BY timestamp DESC)
      UNION
      (SELECT n2.nid AS id, n2.title AS title, n2.type, c.last_comment_timestamp AS timestamp
       FROM node n2 INNER JOIN node_comment_statistics c ON n2.nid = c.nid
       WHERE n2.uid = 83 AND n2.status = 1 ORDER BY timestamp DESC)
      UNION
      (SELECT n3.nid AS id, n3.title AS title, n3.type, n3.changed AS timestamp
       FROM node n3 WHERE n3.uid = 83 AND n3.status = 1 ORDER BY timestamp DESC)) AS t
ORDER BY t.timestamp DESC LIMIT 0, 10;
I've been playing with the above query, which brought the execution time down to 16 seconds.
Comment #40
pwolanin commented:
@killes: even that much improvement makes it seem worth committing #25 as the basis for further improvement?
Comment #41
killes@www.drop.org commented:
I should mention that my query gets a slightly different set of results. Mainly the order seems to differ.
@pwolanin: It probably won't hurt.
Comment #42
Dries commented:
As the node table and the node_comment_statistics table have a 1-on-1 mapping, we might as well merge both into one table.
Comment #43
merlinofchaos commented:
Before we merge node_comment_statistics into node, we should check into why we even need to.
If there is a 1::1 mapping, node_comment_statistics should be sorting on its primary key. Why would it need a filesort?
Is the problem just that we're not indexing that table properly?
Comment #44
magico commented:
It doesn't need any DISTINCT; there are only INNER JOINs, which should return only one row each. Removing the DISTINCT should give some speedup.
I removed the DISTINCT; the most expensive part, the LEFT JOIN with the "comments" table, is now filtered before the final join.
Cannot test with thousands of records, but the query works.
Comment #45
magico commented:
It's 2 queries:
* the first should have the DISTINCT removed, no more optimization to do except merge the tables schema
* the second, should be applied like it is
Comment #46
moshe weitzman commented:
i already tried to merge node_comment_statistics into node but the initial benchmarks were not promising. folks are welcome to keep trying though. see http://drupal.org/node/75122
Comment #47
merlinofchaos commented:
Removing DISTINCT won't help. node_access will just add it back.
Comment #48
kbahey commented:
@magico: the DISTINCT is unnecessary, I agree. But even if it is removed, I think that db_rewrite_sql() will add it back (a common problem).
@Dries: I suggested the same in #31 (merging the fields back into the node table). The issue is that the node table will then be updated with every comment post and delete, which *might* cause extra locking. This is the best solution from a schema/normalization point of view, but may create bottlenecks.
@merlinofchaos: The main problem is that we use GREATEST(col1, col2), which must do a filesort of the entire table, since it does not know which of the two values is greater per row; hence the performance problem.
Creating a new, indexed column called last_update should solve this bottleneck, but that field is then a derived field that depends on other columns. If a module updates the others but forgets to update the derived field, then we have data inconsistency.
None of the above solutions appeal to me so far.
Comment #49
magico commented:
@kbahey: don't forget the second query, which has a different JOIN.
Could you spend a few minutes and test both queries, running them at the server (if you have access)?
Thanks
Comment #50
FiReaNGeL commented:
@kbahey: 'Creating a new column called last_update and making it an index should solve this bottleneck, but that field is now a derived field that depends on other columns. If a module updates the others, but forgets to update the derived field, then we have data inconsistency.'
I don't see this as a problem. Modules shouldn't update node.changed and comment.last_updated_timestamp directly anyway.
@killes: Did you try it as-is? Because I wrongly put status = 0 (instead of 1) while testing; this could explain why you only got 6 results :) The LIMITs in the subqueries should also go, as this strategy will screw up paging.
I'll rework the UNION query, as I think it's key to optimizing this beast. The reason is that we want a merge of two different things: recent posts by the user, and posts that the user recently commented in (correct me if I'm wrong). I'm sure we can optimize those two parts separately and merge them, and that it would be faster. If we do both in one query, it can't use indexes effectively because of the "AND (n.uid = %d OR c.uid = %d)" (a WHERE clause on two different tables, so it can't optimize for an ORDER BY).
I also think that the joins on users and node_comment_statistics can be LEFT JOINs, because there's a 1:1 mapping and they can't be null.
That being said, the db change (adding a new column) is still needed.
Also, removing the DISTINCT, if it's useless, is needed. We should rework the DB API if it's adding it back uselessly, as DISTINCT can be costly.
We should get the DB change in and optimize the query from there.
Comment #51
dww commented:
to clarify, in english, what this query is trying to accomplish: it lists the nodes a given user either created or commented on, sorted by the most recent activity on each (whether that's the node being edited or a new comment being posted).
for obvious reasons, this "latest activity time" stuff is the killer in this query, which is why, way back in the original post, i suggested we just store that timestamp ourselves, update it whenever either the node is edited or a comment is added, have an index on that field, and (probably) see all of our problems go away. i'm fine with renaming {node_comment_statistics} to just {node_statistics} and putting the column in there, to avoid LOCKs on the {node} table for every new comment...
yes, it's "derived" info, in the sense that it's always equal to either {node}.changed or {node_comment_statistics}.last_comment_timestamp. however, last_comment_timestamp is only set in exactly 2 places in all of core: 1) comment_nodeapi('insert') and 2) _comment_update_node_statistics(). {node}.changed is only touched via node_save(). so, there's only 3 places we'd have to change to support this field. no contribs should ever be updating these columns directly, anyway.
i don't want to spend the time on a patch if it's going to be vetoed outright by the D6 core maintainers, but i've always believed this is the simplest and most effective solution to this problem. should we at least try it and see? the patch should be relatively simple (~100 lines, including the schema upgrade), and we could run it on s.d.o for testing. but, if it's doomed from the start, i don't want to put effort into that approach.
Comment #52
moshe weitzman commented:
@dww - i always thought that was the best solution too. go for it.
the "but it is another derived column" argument is not worth discussing anymore. this problem requires some compromise.
Comment #53
dww commented:
here's a patch for the new column:
- schema upgrade (coded for both mysql and pgsql).
- changes to update the column correctly in comment.module and node.module.
- changes the tracker query to use the new column instead of GREATEST().
i tested the upgrade path on mysql, but haven't had a chance to test pgsql yet. the schema update is probably more complicated than it should be, since it started out renaming the table to just {node_statistics}, but that seemed like a lot more of a diff than i wanted to mess with in this patch. so, we could probably simplify things in there, but it certainly works as-is (and has the benefit of a single query to update all the records, even on mysql 4.0 sites like d.o). ;)
in my testing, this all seems to work great. i don't have a good environment for stress testing and benchmarking, so help with that would be most appreciated.
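One fiddly part of such an upgrade is the backfill step: seeding the new column from the larger of the two existing timestamps in one query. A hedged sketch of what that step looks like (sqlite standing in for MySQL, sqlite's two-argument MAX() playing the role of GREATEST(); the `last_updated` column name is assumed for illustration, not taken from the patch):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, changed INT);
CREATE TABLE node_comment_statistics (nid INTEGER PRIMARY KEY,
                                      last_comment_timestamp INT,
                                      last_updated INT);
""")
db.executemany("INSERT INTO node VALUES (?, ?)", [(1, 50), (2, 300)])
db.executemany("INSERT INTO node_comment_statistics VALUES (?, ?, NULL)",
               [(1, 120), (2, 80)])

# single-query backfill: each row gets the later of "last comment" and
# "last node edit", after which writes keep the column current
db.execute("""
UPDATE node_comment_statistics
   SET last_updated = MAX(last_comment_timestamp,
                          (SELECT changed FROM node
                            WHERE node.nid = node_comment_statistics.nid))
""")
rows = db.execute("SELECT nid, last_updated FROM node_comment_statistics "
                  "ORDER BY nid").fetchall()
```

Node 1's last comment (120) beats its last edit (50); node 2's edit (300) beats its last comment (80).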
Comment #54
FiReaNGeL commented:
Thanks dww for this great initiative! I manually patched my 5.x install and everything is working #1; the upgrade path worked too.
Sadly, I think I have bad news to report; it didn't change a thing on my dev box (performance-wise). For some reason, MySQL refuses to use the new index to sort.
I'd be interested in seeing the results on scratch.d.o.
It would also be interesting to have others bench it.
The EXPLAIN changed a bit
It's strange, because like dww and others I thought that an additional column to get rid of the ORDER BY GREATEST would help things. The problem is that MySQL can't use it.
From: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
So the problem is that we're doing a WHERE on many tables, and MySQL can only use one index at a time, so it cannot sort by our new index.
The following query is 3 times faster when compared to the original query (both pre and post dww's patch, with a proper ORDER BY on the new column and without the GREATESTs) and gives the same results (on my dev box at least). It's better than my last one, which couldn't be paged and gave funky results:
And for the record, I found out that the DISTINCT was needed (for the cases where you commented in your own node). UNION queries deduplicate by default, so no need to add it back.
Comment #55
Dries commented:
I think dww's approach makes sense. Good analysis.
I'd still like to see this combined with http://drupal.org/node/75122 -- maybe that will make the index work!
Comment #56
killes@www.drop.org commented:
@fireangel: I don't get a performance improvement from your new query.
@dww: can you patch s.d.o?
@Dries: I am afraid #75122 will give us some additional table locks. However, my offer to benchmark it still stands if somebody updates the patch.
Comment #57
dww commented:
i patched s.d.o and ran the db update (took just over a minute to run, which is pretty good IMHO). sadly, neither http://scratch.drupal.org/tracker nor http://scratch.drupal.org/tracker/46549 seems any faster. :( stupid mysql.
Comment #58
FiReaNGeL commented:
dww: yeah, it seems consistent with what I saw on my dev box :(
killes: strange; maybe the improvements I see are MySQL 5.1-specific, or the size of the d.o db changes things. Still, I find it strange and disappointing.
Here's another idea: we could restrict "recent posts" to a month, with links to see the past 6 months, 1 year, and since the dawn of (unix) time. At the end of the pager, we could add a link such as 'click here to see more' or something along those lines.
This would definitely be faster. And most of the time, I don't think people are interested in seeing all their post history all the time.
Comment #59
kbahey commented:
Strange and sad.
I was hoping this would improve things.
:-(
Comment #60
pwolanin commented:
Can we combine dww's patch with #25 to avoid the join?
Perhaps also a compound index is needed (at least for MySQL)?
Comment #61
kbahey commented:
I am trying to contact a MySQL expert about this (why there is no improvement with the latest patch).
Will let everyone know if I find something useful.
Comment #62
magico commented:
Can someone instruct me on how I can generate thousands of nodes and comments in a heterogeneous way, so I can test my queries above?
Thanks
Comment #63
dww commented:
@magico: install devel.module and devel_generate.module.
Comment #64
pwolanin commented:
@dww - I think devel module needs a patch for generate - it was not properly populating the {node_comment_statistics} table for me. I'll assume that the current code is just a refactoring of the 5.x scripts, which saved nodes directly to the DB rather than calling node_save().
Comment #65
David Strauss commented:
Here is a patch against Drupal 5.1. I chose to make it against Drupal 5.1 because Drupal.org desperately needs the update. It can easily be adapted to Drupal 6.
This patch splits the bad tracker queries into two queries: one to get recent nodes and another to get recent comments. The results are intelligently combined in PHP. With this change, the queries can now run in logarithmic time.
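The combine-in-the-application step might look something like this (a Python sketch of the concept only; the real patch is PHP and its details may differ): since each query returns rows already sorted by timestamp, the code only has to merge two sorted streams and keep the newest N, never re-sorting the full history:

```python
import heapq
from itertools import islice

# (timestamp, item) rows, as each query would return them: already
# sorted newest-first, and each stream individually LIMIT-ed by SQL
recent_nodes = [(300, "node 3"), (100, "node 1")]
recent_comments = [(250, "comment on node 2"), (90, "comment on node 1")]

# heapq.merge expects ascending inputs, so negate the timestamp key to
# merge the descending streams; islice keeps only one page of results
top = list(islice(
    heapq.merge(recent_nodes, recent_comments, key=lambda r: -r[0]), 3))
```

The merge touches at most one page's worth of rows from each stream, which is where the claimed speedup over a global filesort comes from.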
Comment #66
David Strauss commented:
I've updated the patch to address a problem where you could only see results from nodes you posted.
Comment #67
David Strauss commented:
Ignore the patches I've posted so far. I'm working on a few fixes.
Comment #68
David Strauss commented:
Here's my latest patch, produced in cooperation with killes, who helped by testing on scratch.drupal.org.
This patch slightly changes tracker functionality. The code is optimized around showing a reasonably long history of recent comments and node updates. It doesn't show the entire history if it's really long. (If the history is short, then the functionality should be nearly identical.) This trade off should be alright because any site with enough nodes and comments that the limit affects the results probably has performance issues with the current module's implementation.
The following indexes (shown below in MySQL syntax) need to be added:
ALTER TABLE `comments` ADD INDEX `tracker_user` ( `uid` , `status` , `timestamp` );
ALTER TABLE `comments` ADD INDEX `tracker_global` ( `status` , `timestamp` );
ALTER TABLE `node` ADD INDEX `tracker_user` ( `uid` , `status` , `changed` );
ALTER TABLE `node` ADD INDEX `tracker_global` ( `status` , `changed` );
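As a rough sanity check of what these compound indexes buy (using sqlite rather than MySQL, so the optimizer details differ): a (uid, status, changed) index can satisfy both the equality WHERE clauses and the ORDER BY for the per-user query, so no separate sort pass is needed:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INT, status INT, changed INT);
CREATE INDEX tracker_user ON node (uid, status, changed);
""")

# ask the planner how it would run the per-user tracker query; the
# detail text is always the last column of each plan row
plan = " ".join(str(row[-1]) for row in db.execute(
    "EXPLAIN QUERY PLAN SELECT nid FROM node "
    "WHERE uid = 1 AND status = 1 ORDER BY changed DESC"))
```

The plan text names `tracker_user`, confirming the index drives the query; the equality columns (uid, status) come first in the index so the trailing `changed` column is left in sorted order.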
Comment #69
David Strauss commented:
webchick requested that $sanity_limit pull from a variable (even though there won't be an interface exposed).
Comment #70
Senpai commented:
@#51 by dww:
Why is it necessary for Tracker to display *all* posts and comments by a single user, and then try to sort that mountain of data by most recent activity? Wouldn't it be much more efficient to offer the user a chance to see nodes created by that uid OR comments posted by that uid? I mean no disrespect, but it appears very difficult from a user friendly standpoint to offer a complete list of everything that user has ever done, and lump it into page after page of links. Not to mention that nodes and comments are still two different entities, so why not treat them as such?
What if, since you're going back into Tracker to optimize, you look at splitting the query into two parts? Offer users a tab to see their own comments, or a tab to see their own posts. After all, in all my years of surfing sites, I've never needed to look up all my own comments while searching for content that I created, and I've never needed to see stories I've posted while I was looking for that thread I commented to.
You could cut down on the amount of data retrieved by almost 50% if you went this route. Now, having said that, I have no idea how much it would jack up the Tracker module to attempt this. It might be darn easy, but then again, it could require a complete refactor. I just don't know.
Comment #71
pwolanin commented:
I'm not really liking that you run individual queries to get this data:
Wouldn't it be more efficient to run through the array, collect the nid and uid values and get them in 2 queries?
Also, I don't like the naming of the array as $results. Almost everywhere in core, $results is used as the name of the variable holding the resource returned from a db_query(). How about calling it something like $tracker or just $rows?
Comment #72
David Strauss commented:
@Senpai:
It was one of the first things I asked killes about. Killes thought that it wouldn't be friendly to users. This whole problem becomes quite trivial if you don't combine comments with nodes. As you can see from my patch, I actually split nodes and comments into two separate queries to get the speed increase.
Comment #73
David Strauss commented:
Global comments query:
EXPLAIN SELECT n.nid, n.title, n.type, n.changed, n.uid, c.timestamp AS last_updated FROM comments c INNER JOIN node n ON n.nid = c.nid WHERE n.status = 1 AND c.status = 0 AND 1 ORDER BY last_updated DESC LIMIT 0, 1000;
+----+-------------+-------+------+----------------------------------------------------+----------------+---------+-----------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------------------------------+----------------+---------+-----------------+------+-------------+
| 1 | SIMPLE | c | ref | lid,tracker_global | tracker_global | 1 | const | 359 | Using where |
| 1 | SIMPLE | n | ref | PRIMARY,status,node_status_type,nid,tracker_global | PRIMARY | 4 | fk_portal.c.nid | 1 | Using where |
+----+-------------+-------+------+----------------------------------------------------+----------------+---------+-----------------+------+-------------+
Global nodes query:
EXPLAIN SELECT n.nid, n.title, n.type, n.changed, n.uid, n.changed AS last_updated FROM node n WHERE n.status = 1 AND 1 ORDER BY last_updated DESC;
+----+-------------+-------+------+----------------------------------------+----------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------------------+----------------+---------+-------+------+-------------+
| 1 | SIMPLE | n | ref | status,node_status_type,tracker_global | tracker_global | 4 | const | 222 | Using where |
+----+-------------+-------+------+----------------------------------------+----------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
User nodes query:
EXPLAIN SELECT n.nid, n.title, n.type, n.changed, n.uid, n.changed AS last_updated FROM node n WHERE n.status = 1 AND n.uid = 1 ORDER BY last_updated DESC;
+----+-------------+-------+------+---------------------------------------------------------+--------------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------------------------------------------------+--------------+---------+-------------+------+-------------+
| 1 | SIMPLE | n | ref | status,uid,node_status_type,tracker_user,tracker_global | tracker_user | 8 | const,const | 77 | Using where |
+----+-------------+-------+------+---------------------------------------------------------+--------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
User comments query:
EXPLAIN SELECT n.nid, n.title, n.type, n.changed, n.uid, c.timestamp AS last_updated FROM comments c INNER JOIN node n ON n.nid = c.nid WHERE n.status = 1 AND c.status = 0 AND c.uid = 1 ORDER BY last_updated DESC;
+----+-------------+-------+------+----------------------------------------------------+--------------+---------+-----------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------------------------------+--------------+---------+-----------------+------+-------------+
| 1 | SIMPLE | c | ref | lid,tracker_user,tracker_global | tracker_user | 5 | const,const | 47 | Using where |
| 1 | SIMPLE | n | ref | PRIMARY,status,node_status_type,nid,tracker_global | nid | 4 | fk_portal.c.nid | 1 | Using where |
+----+-------------+-------+------+----------------------------------------------------+--------------+---------+-----------------+------+-------------+
2 rows in set (0.00 sec)
These are running on my company's portal, but the query plan shouldn't differ elsewhere.
Comment #74
David Strauss
@pwolanin
"Wouldn't it be more efficient to run through the array, collect the nid and uid values and get them in 2 queries?"
It would only decrease chatting with the database. If you give MySQL a disjoint set of values to match (like a list of uids), it will basically re-run the query to satisfy each value. Plus, the collection, pre-processing, and post-processing would possibly cancel out the meager reduction in communication with the DB.
Comment #75
webchick
Output from devel module. Values are ms, number of times the query ran, and the query executed.
drupal.org, without the patch:
/tracker:
/tracker/24967:
scratch.drupal.org, with the patch:
/tracker:
/tracker/24967:
Comment #76
Dries CreditAttribution: Dries commented
Might be better to run both on the same database (i.e. run both on scratch.drupal.org) -- it might make a big difference.
That said, this seems like a clear winner. I'll try to look at it more closely tomorrow morning. In the meantime, feel free to work on this more. We might be able to squeeze more out of this.
Comment #77
webchick
Right you are. Here are the numbers from scratch.drupal.org, sans patch, so we're comparing apples to apples:
tracker:
tracker/24967:
Comment #78
FiReaNGeL CreditAttribution: FiReaNGeL commented
Great improvement! Same strategy I tried to apply intra-MySQL with my union query, but this is definitely better! Kudos, David :)
Comment #79
dww
i helped webchick get the numbers on s.d.o, and everyone agreed this looks like a vast improvement. feeling bold, i applied the patch on d.o, and the tracker pages clearly speak for themselves. if we want to tweak this and make it even better, great, but d.o is finally usable again, even on the old db hardware. in order to stop the bleeding, i decided to take bold action...
huge thanks to david strauss for this work!
Comment #80
webchick
drupal.org, post-patch:
tracker:
tracker/24967:
We owe David Strauss, and his company Four Kitchen Studios who paid to let him work on this all day ;), a great deal of gratitude. If possible, please credit both in the commit message.
Btw, it looks like we can tweak this further by getting rid of the username query in /tracker.
Comment #81
catch
on my site. 15,000 nodes, 176,000 comments, 3,750 users
etch vps, 1gb ram, php5.2, mysql5, apache2.2, apc.
before patch:
/tracker
478.79 1 pager_query SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, GREATEST(n.changed, l.last_comment_timestamp) AS last_updated, l.comment_count FROM node n INNER JOIN users u ON n.uid = u.uid INNER JOIN node_comment_statistics l ON n.nid = l.nid WHERE n.status = 1 ORDER BY last_updated DESC LIMIT 0, 25
/tracker/11
1021.4 1 pager_query SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, GREATEST(n.changed, l.last_comment_timestamp) AS last_updated, l.comment_count FROM node n INNER JOIN node_comment_statistics l ON n.nid = l.nid INNER JOIN users u ON n.uid = u.uid LEFT JOIN comments c ON n.nid = c.nid AND (c.status = 0 OR c.status IS NULL) WHERE n.status = 1 AND (n.uid = 11 OR c.uid = 11) ORDER BY last_updated DESC LIMIT 0, 25
739.03 1 pager_query SELECT COUNT(DISTINCT(n.nid)) FROM node n LEFT JOIN comments c ON n.nid = c.nid AND (c.status = 0 OR c.status IS NULL) WHERE n.status = 1 AND (n.uid = 11 OR c.uid = 11)
after patch
/tracker
415.74 1 tracker_page SELECT n.nid, n.title, n.type, n.changed, n.uid, c.timestamp AS last_updated FROM comments c INNER JOIN node n ON n.nid = c.nid WHERE n.status = 1 AND c.status = 0 AND 1 ORDER BY last_updated DESC LIMIT 0, 1000
/tracker/11
338.51 1 tracker_page SELECT n.nid, n.title, n.type, n.changed, n.uid, c.timestamp AS last_updated FROM comments c INNER JOIN node n ON n.nid = c.nid WHERE n.status = 1 AND c.status = 0 AND c.uid = 11 ORDER BY last_updated DESC LIMIT 0, 1000
And a lot of page hits are getting served up by the MySQL query cache (0.35 speeds); not sure if that's relevant, but it gave me a shock when I saw it.
Comment #82
kbahey CreditAttribution: kbahey commented
Excellent work David!
Silly MySQL got fooled by discrete queries vs. one combined. No filesorts anywhere anymore.
I guess we can return the new database server for a refund now ... :-)
Comment #83
dww
@catch are you sure you added the new indexes that the patch relies on? see comment #68 for details. the patch doesn't (yet) include a schema update to add these indexes via update.php so you have to do it manually.
Comment #84
David Strauss
@catch
Your results look like you haven't run the index updates.
Comment #85
webchick
Ok, here is David's patch with the following changes:
1) Rolled against HEAD, rather than 5.x.
2) Minor re-formatting/re-wording of comments to fit coding standards
3) Upgrade path. I included this in a tracker.install file, because I wasn't sure if those indexes slowed down the site if you're not using tracker. I also am not sure if we should have tracker.install adding and removing indexes willy-nilly (on uninstall, etc.)... on scratch.drupal.org this locked the site for over a minute.
Remaining issues:
1) (I think) Global tracker should not be doing the username query.
2) There's a small bug... if you go to http://drupal.org/tracker?page=55, the last entry is an empty post by Anonymous posted 37 years 20 weeks ago.
I will need help to solve these, and also input from someone who knows databases well on where these index changes should go (system.install or tracker.install).
Comment #86
webchick
Sigh. How about a patch? :)
Comment #87
Michelle
I noticed that my recent posts is now a bit odd... The status update post, for example, is showing that it was last updated 1 day 12 hours ago even though there are 7 new posts since I last read it just a few hours ago. I also seem to remember there being other more recent posts at the top, though I can't swear to that.
On the 37-year post in #85, that might not be new. Some time ago... several months at least, maybe even a year or more ago, I got curious and looked at the end of the tracker, and I remember there being some weirdness with impossibly old posts. I know that's vague and not terribly helpful, but I thought I should mention that it might be an existing bug from before this work was done.
Michelle
Comment #88
David Strauss
Replace:
With:
I inaccurately assumed that something would get added every loop, but that's not the case because of duplicates.
Comment #89
David Strauss
Hey, I even messed up the comment:
Comment #90
David Strauss
I keep pasting the wrong thing:
This fixes the phantom post on the last tracker page.
Comment #91
mcurry CreditAttribution: mcurry commented
subscribing
Comment #92
catch
@dww/david strauss: oops.
Won't be able to run again until a bit later but will post back!
Comment #93
xamox CreditAttribution: xamox commented
Thanks! The site seems 10x faster!
Comment #94
catch
OK with patch, same spec as before, with the index update this time:
I love you all.
Comment #95
catch
arrgh, that one was the query cache fooling me. Still about
21.47
which is 20-50 times as fast on 15k nodes, 170k comments, 3.7k users. Amazing work.
Comment #96
Michelle
Ok, I just found another post that is showing up properly as updated in all recent posts and not in my tracker. http://drupal.org/node/112421 is the post. And http://drupal.org/node/144228 is still doing it as well as I reported a few comments back.
Michelle
Comment #97
David Strauss
@Michelle
Node 144228 is the 8th item in your tracker. Node 112421 is on the second page of your tracker.
Comment #98
webernet CreditAttribution: webernet commented
There also seems to be a small bug - the last entry in each tracker is strange.
For example:
Comment #99
David Strauss
@webernet This issue is corrected with the change in #90.
@Michelle I should clarify that there is a change in the method the tracker uses to determine how recent something is. The old tracker used the latest comment timestamp even if it wasn't your comment. My tracker patch uses your latest comment timestamp on a node. I think this is an improvement. Why should a node you commented on last year appear on the first page of your tracker just because someone else commented on it today?
Comment #100
pwolanin CreditAttribution: pwolanin commented
@David - I think ordering should be based on when any comment is made - isn't that the point of the tracker? How else can I see when my comment provoked another comment, or someone else had related information. I think there was a suggestion above for separate tabs with "my recent comments" and "my recent posts", but that's separate.
Comment #101
David Strauss
@pwolanin You can easily see when your (recent) comments have created discussion. Just go through the first few pages of your tracker and look for "X new" next to the items. Moreover, if the goal is to see how your comments have created discussion, then we should create a better, thread-based model for tracking.
Comment #102
pwolanin CreditAttribution: pwolanin commented
@David - I disagree. I already know when I made a comment; the purpose (to me at least) of the tracker is to know when others comment on the same threads.
In any case, this patch should not change the function of the tracker, at least not without more serious discussion.
Comment #103
David Strauss
@pwolanin Always keep in mind that the tracker isn't just for your use. I might want to see nodes you've recently commented on. If you'd like to see discussions you've sparked, I'd prefer to add a tracker tab for that purpose alone.
Comment #104
FiReaNGeL CreditAttribution: FiReaNGeL commented
I agree with pwolanin here; to me the tracker is useful to check for replies to comments I made in the past. As I understand it, this functionality is gone since your (quite excellent) patch. A tradeoff to be made maybe, but I'd like to keep that functionality :)
Comment #105
dww
i agree w/ pwolanin, too. the underlying behavior should stay the same: any edit to the node or comment (regardless of who made it) counts as "activity", and the tracker should be sorted by most recent activity, period.
also, sorry i haven't gotten around to re-patching d.o based on the bug fixes posted here. however, i'd like to just wait for a new patch (a real patch, please) that a) includes the bug fixes and b) corrects the tracker behavior to return it to its original behavior...
thanks,
-derek
Comment #106
Michelle
Just getting back to my computer now. Ok, so the weirdness I've been experiencing is intentional by the sound of it, but I don't agree. I use my tracker all day long to see if any threads I've commented on have been updated. Having to go through every single page of my tracker looking for updated markings every time would suck big time. They really need to float to the top just like they do in the all posts tracker. To answer your question, yes, if someone comments on a thread I participated in a year ago, I want to see it.
Michelle
Comment #107
David Strauss
@dww I'm not sure we can return tracker behavior to the original behavior with acceptable performance. Currently, my tracker code JOINs comments with nodes to produce its results. Listing the tracker items by the latest comment on the node requires another JOIN from nodes to the node_comment_statistics table. Such a JOIN may push the query back into temp_table/filesort territory. I initially tried to optimize the global feed to use the node_comment_statistics table instead of comments, but changing the JOIN sent the query back into using temp_table/filesort because of the condition requiring nodes to be published. MySQL seems to want to apply WHERE restrictions first, even if they're on the node table and very inclusive. Returning tracker to the original behavior may involve larger schema changes than adding a few indexes.
Comment #108
catch
another -1 on the functionality change. If people have to go back through ten pages to find replies to their posts, then we lose the benefit of the query optimisation from all those extra page requests.
Comment #109
catch
Also, "last updated" now displays the last time you posted, not when the node was last updated, which isn't what it says. This is obviously a side-effect of the change in behaviour, but in terms of UI it's a bug since that column isn't there to tell you when you commented.
Comment #110
kbahey CreditAttribution: kbahey commented
David
How about pushing all the fields (or the two we care about) from the {node_comment_statistics} columns back into the {node} table? This will avoid the join you talk about, and it is a 1:1 relationship anyway.
Moshe tried a bit with this. See above for the issue on that.
Comment #111
Dries CreditAttribution: Dries commented
I also use the tracker to look for follow-ups on my posts/comments.
Comment #112
webchick
IMO, splitting these into two tabs would not be the end of the world. It would make it easier for me to find handbook pages I wrote vs. random forum posts where I helped someone (I care a lot more about making sure I'm keeping up with the comments on the former :)). And people who want just a single tab with the old behaviour on their sites can just use Views, which comes with a 'tracker' view for this purpose.
But I didn't quite understand the most recent comments... does that mean we'd actually need *four* tabs? "recent posts", "recent comments", "my recent posts", "my recent comments" ? If so that would be... icky. ;)
Comment #113
David Strauss
@kbahey This would be a good idea. It's the {node_comment_statistics} and {node} JOIN that kills the query. What MySQL does is see the published = 1 restriction on {node} and no restriction on {node_comment_statistics}, so it starts the join from {node}.
@catch We only lose the benefit if they go through about 300 pages. :-) Fixing last updated is easy: we only need to load those values for the current page.
@webchick Along those lines, we could split it into "My posts" and "My conversations." The latter would list nodes where you've commented in descending order of the latest comment by anyone.
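The per-page fix mentioned in #113 could look something like this sketch (the query shape is assumed from the originals earlier in this thread; the nid list is hypothetical):

```sql
-- After the fast query picks the 25 nids for the current page, a second,
-- cheap lookup fetches the real "last updated" values just for those rows.
-- Primary-key lookups on a handful of nids need no filesort.
SELECT n.nid,
       GREATEST(n.changed, l.last_comment_timestamp) AS last_updated,
       l.comment_count
FROM node n
INNER JOIN node_comment_statistics l ON n.nid = l.nid
WHERE n.nid IN (101, 102, 103);  -- the nids already chosen for this page
```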
Comment #114
David Strauss
I'm going to play around with joins to {node_comment_statistics} to see if there's anything that won't require a temp_table.
Comment #115
David Strauss
After some research and experimentation, I've concluded that placing the fields from {node_comment_statistics} in {node} will not help. The problem is that, for the user tracker, we put conditions on {comments} but sort on a different table. This makes sorting using a key impossible, forcing a filesort.
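To illustrate the problem described in #115, here is a sketch (not a query from the patch): even with the {node_comment_statistics} fields folded into {node}, the user tracker filters on one table and sorts on another:

```sql
-- The WHERE restricts {comments} (c.uid, c.status), but the sort key lives
-- on the joined table, so no single index can drive both the filter and
-- the sort; MySQL of this era typically reports
-- "Using temporary; Using filesort" in the Extra column.
EXPLAIN SELECT n.nid, n.changed
FROM comments c
INNER JOIN node n ON n.nid = c.nid
WHERE c.uid = 1 AND c.status = 0
ORDER BY n.changed DESC;
```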
Comment #116
David Strauss
Okay, here's what needs to happen to the database schema to maintain tracker performance while returning it to the original functionality. The last_comment_timestamp column needs to appear in {comments}. (Yes, it's a violation of normalization, but it's the only way to make this work.)
This needs to run every time a comment is posted:
UPDATE {comments} SET last_comment_timestamp = %d WHERE nid = %d (with the obvious values for timestamp and nid)
Comment #117
David Strauss
Updates for system.install:
ALTER TABLE {comments} ADD last_comment_timestamp INT(11) NOT NULL;
ALTER TABLE {comments} DROP INDEX tracker_user, ADD INDEX tracker_user (uid, status, last_comment_timestamp);
This patch may need some revisions, but I'd rather post early so people can have a look.
Comment #118
David Strauss
One more update for system.install (in addition to those in #117):
ALTER TABLE {comments} DROP INDEX tracker_global, ADD INDEX tracker_global (status, last_comment_timestamp);
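With both indexes in place, the WHERE and the ORDER BY of each tracker query can be satisfied from {comments} alone. A sketch of the shape the queries take (assuming the patched queries keep the form shown in #73; the LIMIT values are illustrative):

```sql
-- Global tracker: tracker_global (status, last_comment_timestamp) covers
-- both the filter and the sort, so rows come back pre-ordered.
SELECT n.nid, n.title, c.last_comment_timestamp AS last_updated
FROM comments c
INNER JOIN node n ON n.nid = c.nid
WHERE c.status = 0 AND n.status = 1
ORDER BY c.last_comment_timestamp DESC
LIMIT 0, 25;

-- User tracker: tracker_user (uid, status, last_comment_timestamp) does
-- the same for a single user's comments.
SELECT n.nid, n.title, c.last_comment_timestamp AS last_updated
FROM comments c
INNER JOIN node n ON n.nid = c.nid
WHERE c.uid = 1 AND c.status = 0 AND n.status = 1
ORDER BY c.last_comment_timestamp DESC
LIMIT 0, 25;
```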
Comment #119
catch
happy to test changes, but only have a 5.1 site to test on. Is it cheeky to ask for a 5.1 version of that patch?
Comment #120
David Strauss
Complete patch for 5.1, including schema updates.
Comment #121
David Strauss
This is a reordering of the updates in the 5.1 patch to create the index only after the column gets populated.
Comment #122
David Strauss
Comment #123
catch
@David Strauss, thanks for the 5.1 patch
I've applied that. The database update went fine and the patch applied OK. Attached a new patch which just fixes the typo db_queryd > db_query and removes some folder information to make it apply more easily.
Something's going on with the main tracker query though:
On {comments} I have the indexes tracker_user (uid, status, last_comment_timestamp) and tracker_global (status, last_comment_timestamp).
last_comment_timestamp seems to have been updated OK in the {comments} table as well, so I'm not sure why it's so much slower than tracker/1.
Comment #124
catch
argg, the last patch added my own typo. This one ought to be OK. Going to try to see whether my 2,000-4,000ms /tracker query is a result of the new query or my own incompetence applying patches ;)
Comment #125
catch
alright, for some reason $sanity_limit isn't being applied when the query gets executed:
I hard coded LIMIT 0, 1000 into $comments_sql
which is fine. can confirm that "my recent posts" now functions as it used to as well.
Can't work out why $sanity_limit isn't running though, it seems to be the same as before.
Comment #126
David Strauss
You replaced db_queryd (which was admittedly a typo) with db_query. It should have been replaced with db_query_range. Here's an updated patch.
Comment #127
David Strauss
The LIMIT patch I posted should fix the problem. Because placing a LIMIT on the query speeds it up so much, the 2 to 4 seconds spent on the non-limited query must be due to the sheer quantity of results. Every comment is in that result set when there's no LIMIT.
Comment #128
catch
We have c. 190k comments, I think that's why the LIMIT made such a difference.
Changed to db_query_range (manually, but patch looks fine), took out hard coding:
RTBC?
Comment #129
David Strauss
RTBC?
Comment #130
catch
"ready to be committed" - i.e. it works great, no bugs left that I can see, so it's just time to commit it to core. I'd imagine core committers will want another review or two before it goes in, so I didn't mark it as such.
Comment #131
David Strauss
Considering the necessity of having this patch on Drupal.org and the fact that it's already made for 5.1, I'd like to roll this patch into 5.2 and 6.x-dev.
Comment #132
David Strauss
@catch The patch still needs PostgreSQL schema updates.
Comment #133
catch
I think the way it's done now is committed to 6.x then backported to 5.x-dev (which'd mean it'd get into the 5.2 release when that comes out, assuming a security fix doesn't beat your patch going in).
BTW, I know this was for drupal.org, but it's made a big difference to the load on my own site as well, so thanks again!
Comment #134
kbahey CreditAttribution: kbahey commented
I have serious objections to this.
So far the denormalizations we have been doing were of the benign type, such as having a 1:1 relationship broken into two tables to minimize locks, etc.
This one is far less benign, because of the sheer number of rows involved. We have far more data redundancy.
Before I say more, let us see the impact of this via benchmarks. If it doesn't do much good, then it is dead in the womb. If it does improve things, we can weigh the trade-off of redundancy vs. performance.
Comment #135
David Strauss
@kbahey Object all you want, but maintaining the original tracker interface at a good level of performance requires it. The only way to avoid temporary tables and filesorts is to have the WHERE and ORDER BY criteria in the same table. (If they're not in the same table, an index cannot span them.) Because the WHERE criteria are in the comments table, the ORDER BY criteria must exist there, too. Take a close look at the tracker query for comments, and you'll see what I mean.
The UPDATE query I've written shouldn't have too bad of an effect. It scales linearly with the number of comments on a node. It only takes logarithmic time to identify the comments associated with a node because of the nid index on comments. The UPDATE also won't affect row sizes because we're only updating an integer. Keeping the rows the same size goes a long way to keeping performance up.
{node_comment_statistics} is a bad example of denormalization. I'm working toward removing it from Drupal entirely. A query has to JOIN it with {node} to make it useful. The extra JOIN reverses many of the performance benefits it was designed to provide.
Comment #136
Chris Johnson CreditAttribution: Chris Johnson commented
I tend to agree with David here. Denormalizing by duplicating data to improve performance is a classic case of when denormalization is justified.
Comment #137
David Strauss
One other place I could denormalize instead of {comments} is {history}. By putting an interacted (boolean) and node_last_updated (timestamp) into there, it would be easy to query for nodes that the user had interacted with but that had been updated since the last visit. It would be slightly less duplication of data than in {comments}. It would also allow us to have only one query for tracker because node_last_updated would be updated with a node change or a comment.
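A sketch of what that {history} variant might look like (all column and index names below are hypothetical illustrations of the idea, not code from any patch):

```sql
-- Hypothetical denormalization into {history}:
ALTER TABLE history ADD interacted TINYINT NOT NULL DEFAULT 0;
ALTER TABLE history ADD node_last_updated INT NOT NULL DEFAULT 0;
ALTER TABLE history ADD INDEX tracker (uid, interacted, node_last_updated);

-- The user tracker would then need only one query: nodes the user has
-- touched, ordered by the latest activity from anyone (node edit or
-- comment), with filter and sort both satisfied by the single index above.
SELECT h.nid, h.node_last_updated
FROM history h
WHERE h.uid = 1 AND h.interacted = 1
ORDER BY h.node_last_updated DESC
LIMIT 0, 25;
```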
Comment #139
David Strauss
I'm sending this back to "code needs work" so I can move the denormalization to {history}.
Comment #140
macm CreditAttribution: macm commented
I have tabs created by jQuery in tpl pages.
I need to print tracker_page, "Recent posts of a user", or "My recent posts" into a tab. How can I do this?
I've tried a lot of things but always fail.
So my tab is at /mylanguage/user/1#mytracker
Is there an easy way?
Could I get and render the path 'tracker/' . arg(1) into a tab (/mylanguage/user/1#mytracker)?
Folks, sorry if this isn't the appropriate place, but I couldn't find a better one and this node was in tracker.module.
Feel free to delete it, Drupal webmasters.
I tried adapting this function in my page, but it is very expensive to do with my level of knowledge of Drupal's core.
Anyway, thanks; you core developers are amazing.
Regards
Comment #141
dww
@macm: Please don't hijack issues. This issue is about fixing the performance of the tracker page. Yours is a completely separate support request. So, file your request separately and leave this issue to address the underlying performance problems.
Comment #142
dww
This is David's issue now, not mine, but I can't assign it directly to him using the UI (yet), so I'm just unassigning myself...
Comment #143
David Strauss
This patch is superseded by http://drupal.org/node/148849.
Comment #144
kbahey CreditAttribution: kbahey commented
So, I consulted the folks from MySQLPerformanceBlog and they generously investigated the issue for us free of charge, because we are an Open Source project (thanks Peter, Vadim and Maciej).
They gave a detailed analysis of the queries, and their end advice was to rewrite them using unions.
Attached is a patch to tracker.module for 5.2, for testing on drupal.org.
Here are the results of testing on my development server, with restarting MySQL in between the tests to nullify the effect of the query cache. The tests were done using 10,000 nodes and 10,000 comments.
However, I have been getting a lot of variance on this machine lately, so the results need to be replicated somewhere, preferably directly on d.o (what do we have to lose?)
The schema is the standard Drupal 5.2 schema, so if you have any changes to that, roll them back first.
Before the patch
This is the performance of this query without the MySQL query cache, using the existing 5.2 code.
With patch
Two variants are tested for this patch.
Variant 1: Using the original count query
This variant rewrites the query using a UNION with an ORDER BY, but leaves $sql_count as it is. This proved faster on the test data set described above, which is different from d.o (no users, and fewer rows), but that may not be the case for d.o due to volume and cardinality.
Variant 2: Using a union in the count query
This uses a UNION for both queries. That was slower on my test server, but may be faster on d.o.
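A sketch of what variant 2's count query might look like (illustrative only; the attached patch is authoritative):

```sql
-- Counting distinct nids across the same two UNION branches used for the
-- listing query. UNION already removes duplicates, so COUNT(*) over the
-- derived table yields the distinct-nid total. MySQL 4.1+ supports
-- selecting from a derived table like this.
SELECT COUNT(*) FROM (
  (SELECT n.nid FROM node n WHERE n.status = 1 AND n.uid = 1)
  UNION
  (SELECT n.nid FROM comments c
   INNER JOIN node n ON n.nid = c.nid
   WHERE n.status = 1 AND c.status = 0 AND c.uid = 1)
) AS t;
```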
Comment #145
David Strauss
While those changes win points for less invasiveness, they will never approach the performance of the rewrites I've suggested (and largely finished in my NCS/node merge patch). It seems better than what the base Drupal code has now, but I guarantee it will be far slower than the current patched tracker on Drupal.org.
Comment #146
kbahey CreditAttribution: kbahey commented
David
I recommended to Dries that we disable tracker altogether if it is the only source of contention in the database. At least issue submission, browsing and the rest of the site will be fast, rather than the continued pain of what d.o is today.
Barring that, this patch is not a replacement for whatever schema changes you are doing which are needed to improve d.o overall. But I see those are long term.
This is (at least) a stop gap until we get d.o on D6 or D7 (whichever one has the schema changes). If it proves to be worth keeping, then so be it.
Worth testing at least to see if they are more functional than what is present at the moment on d.o (which does not give people what they are looking for).
Here is the analysis done on a copy of drupal.org by Maciej.
** 1. **
SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name,
GREATEST(n.changed, l.last_comment_timestamp) AS last_updated,
l.comment_count FROM node n INNER JOIN node_comment_statistics l ON n.nid =
l.nid INNER JOIN users u ON n.uid = u.uid LEFT JOIN comments c ON n.nid =
c.nid AND (c.status = 0 OR c.status IS NULL) WHERE n.status = 1 AND (n.uid =
1 OR c.uid = 1) ORDER BY last_updated DESC
********
EXPLAIN
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: n
type: ALL
possible_keys: PRIMARY,node_status_type,nid,uid
key: NULL
key_len: NULL
ref: NULL
rows: 140437
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: l
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: drupal_export.n.nid
rows: 1
Extra:
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: u
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: drupal_export.n.uid
rows: 1
Extra: Using where
*************************** 4. row ***************************
id: 1
select_type: SIMPLE
table: c
type: ref
possible_keys: lid
key: lid
key_len: 4
ref: drupal_export.n.nid
rows: 4
Extra: Using where; Distinct
********
As you can see, the main problem with this query is that it does a full table scan on `node`. This is because you tied together conditions from two different tables in one OR-ed WHERE clause; also, the index that contains `status` cannot be used because almost all rows have this column value equal to 1. This is why the engine cannot filter out any rows from `node` before doing the join. What you can do is split the two OR-ed conditions into a UNION of two SELECTs.
(SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name,
GREATEST(n.changed, l.last_comment_timestamp) AS last_updated,
l.comment_count FROM users u JOIN node n ON n.uid = u.uid JOIN
node_comment_statistics l ON l.nid = n.nid WHERE n.status = 1 AND n.uid = 1)
UNION
(SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name,
GREATEST(n.changed, l.last_comment_timestamp) AS last_updated,
l.comment_count FROM users u JOIN comments c ON c.uid = u.uid JOIN node n ON
n.nid = c.nid JOIN node_comment_statistics l ON l.nid = n.nid WHERE n.status
= 1 AND c.status = 0 AND c.uid = 1)
ORDER BY last_updated DESC;
This query reads all the data efficiently through indexes and then sorts the final result set with filesort. If the returned data can fit into the in-memory temporary table, whose size is limited by the tmp_table_size and max_heap_table_size MySQL parameters, then this query should run decently fast. On my low-end test server the original and rewritten queries took a few minutes and 0.45s respectively (rows examined: 644642 vs 15229). Although 15k rows is still a fairly large number for a busy on-line application, I guess this cannot be avoided without some changes in the schema.
** 2. **
SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name FROM node
n INNER JOIN users u ON n.uid = u.uid LEFT JOIN comments c ON n.nid = c.nid
AND (c.status = 0 OR c.status IS NULL) WHERE n.status = 1 AND (n.uid = 9575
OR c.uid = 9575) ORDER BY n.changed DESC
********
EXPLAIN
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: n
type: ALL
possible_keys: node_status_type,uid
key: NULL
key_len: NULL
ref: NULL
rows: 140437
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: u
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: drupal_export.n.uid
rows: 1
Extra: Using where
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: c
type: ref
possible_keys: lid
key: lid
key_len: 4
ref: drupal_export.n.nid
rows: 4
Extra: Using where; Distinct
********
Again the same problem. So I rewrote the query to:
(SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name FROM
users u JOIN node n ON n.uid = u.uid WHERE n.status = 1 AND u.uid = 1) UNION
(SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name FROM
users u JOIN comments c ON c.uid = u.uid JOIN node n ON n.nid = c.nid WHERE
c.status = 0 AND n.status = 1 AND u.uid = 1) ORDER BY changed DESC;
Statistics are:
Time: 0.58s vs 1 minute
Examined rows: 11522 vs 503170
Comment #147
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commented
@David: Your rewrite will be great for Drupal 6, but we are still in D5 land and I need a solution "now". Your patch that is running on d.o isn't really helping, since the results it provides aren't that great. As soon as we've resolved our performance issues I'll want to revert the patch and have something that provides the original result set.
Comment #148
catch
@Killes: is drupal.org running the patch from #126? iirc it's running an earlier one, and #126 provides a better result set.
Both the UNION and the long term schema changes are obviously better long term, but it'd help until they're sorted.
Comment #149
dww
d.o is still running the patch from #69. Seems like we should use #126, but that includes the schema change for last_comment_timestamp in {comments} (not just some new indexes). So, I'd need to coordinate with killes and make sure he's happy with this, do a DB dump, etc, etc. David: anything else we should be aware of when moving from #69 to #126? I don't have time right now to carefully study every detail of both patches. Thanks.
Comment #150
cog.rusty CreditAttribution: cog.rusty commented
I am getting my user name as the author of every node I have commented on.
When hovering, the uid under the name is the correct node author.
Could this patch be responsible? I don't understand much but here's a theory:
.... UNION (SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, GREATEST(n.changed, l.last_comment_timestamp) AS last_updated, l.comment_count FROM users u JOIN comments c ON c.uid = u.uid JOIN node n ON n.nid = c.nid JOIN node_comment_statistics l ON l.nid = n.nid WHERE n.status = 1 AND c.status = 0 AND c.uid = 1)....
n.uid -- found from the node table
u.name -- found not via the node table but via the comments table with "FROM users u JOIN comments c ON c.uid = u.uid"
Previously it was found using "n.nid = c.nid", while "c.uid" was only in the WHERE clause, not in the JOINs.
Comment #151
cog.rusty CreditAttribution: cog.rusty commented
The above was about the /user/[uid]/track page.
The /tracker/user page displays "anonymous" as the author and has more problems.
Comment #152
kbahey CreditAttribution: kbahey commented
This version contains a fix for the wrong user name displayed.
Gerhard, please install on d.o.
Comment #153
dww
Reviewed #152, backed out #144 from d.o and applied #152.
Confirmed that the user names are now correct in the tracker page.
Comment #154
Ron Williams CreditAttribution: Ron Williams commented
On the user tracker page, clicking the "last" button appears to take users to page "59" unless they have more than 59 pages of postings.
Comment #155
Dries CreditAttribution: Dries commented
It would be great if we could test the performance improvements of this patch in a better way. Applying it on drupal.org is one thing, but it doesn't tell us whether it is actually faster. We've made a dozen other configuration changes to Drupal.org the past couple of days. A more structured approach to testing this patch -- once the patch is in its final state -- would help convince me to commit it once it's ready.
This should be fairly easy to do: just run the queries 20 times from the mysql command line and average the result.
Comment #156
dww
@Dries: I agree. I just want to point out that we're only applying these patches on d.o to stop the bleeding. And, there have been some mysql numbers posted above in the issue (see #75, #77, #80, #144, #146). Killes decided to move d.o to #144 -- I only applied #152 because #144 was buggy and needed a fix.
Given the explain results on #75-#80 vs. #144 and #146, I don't see why Killes switched d.o. Yes, the tracker results weren't ideal for the earlier patch, but the performance was orders of magnitude better. #152 seems like a minor improvement over what's in core already, which requires no schema changes. #126 would be a significant improvement, but requires a schema change. Killes doesn't want to make that change on d.o unless we're sure that schema change will be going into core for D6...
Comment #157
cog.rusty CreditAttribution: cog.rusty commented
I am getting 40 pages of nodes and 21 pages of "No posts available" in my tracker. $sql_count does not seem to agree with the rows returned.
Comment #158
kbahey CreditAttribution: kbahey commented
Here is another version that tries to correct the pager issue.
Note that the code has not changed for /tracker. The changes are effective only when you are using /tracker/UID.
Comment #159
kbahey CreditAttribution: kbahey commented
Attached.
Comment #160
catch
So the API changes aren't going to get into 6.x now, but what about the UNION query?
Comment #161
David Strauss
@catch I support using the UNION query in lieu of the more fundamental changes that are going to have to wait until Drupal 7.
Comment #162
catch
No longer applies - split patch must have gone in since this was rolled.
Comment #163
catch
Re-roll of kbahey's patch from #159 against HEAD. Didn't change anything I hope except some minor punctuation in code comments.
Comment #164
catch
Very quick review.
Patch applies cleanly, tracker works as expected in both /tracker and user/1/track situations - so no real change in functionality.
Still needs some proper benchmarking for the actual performance implications which I'm not really set up to do. Marking as needs review.
Comment #165
dmitrig01 CreditAttribution: dmitrig01 commented
I benchmarked it with lotsa nodes (forgot how many)... here are the results
Without
With
Not much faster but there is at least some gained speed
Comment #166
joshk CreditAttribution: joshk commented
This applies cleanly and causes no errors, but my benchmarking test results aren't quite as expected.
I used devel_generate to add about 2000 nodes, and ran an ab series also with a higher concurrency to see if that brought out a bigger delta on the times. Unfortunately, I got the opposite of the expected result:
Before
After
The same pattern repeated itself, although wasn't as pronounced, at a -c10 level. Could be my system, but...
Comment #167
catch
OK try this baby out. Modified to use UNION ALL instead of plain UNION. UNION ALL is supported by pgsql so shouldn't be an issue.
The MySQL performance blog explains why - it could be six times faster.
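For illustration, a toy sqlite3 sketch of the semantic difference: UNION must deduplicate the combined result (typically via a temporary table and sort), while UNION ALL simply concatenates the branch results as-is:

```python
import sqlite3

# Two branches whose results overlap on the value 2.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (x INTEGER);
CREATE TABLE b (x INTEGER);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (2), (3);
""")

# UNION removes the duplicate row; UNION ALL keeps it.
union = con.execute("SELECT x FROM a UNION SELECT x FROM b").fetchall()
union_all = con.execute("SELECT x FROM a UNION ALL SELECT x FROM b").fetchall()

print(len(union))      # 3 distinct rows
print(len(union_all))  # 4 rows, duplicate kept
```

The flip side is that UNION ALL is only a drop-in replacement when the branches cannot emit the same row twice, or when duplicates are acceptable.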
No change in functionality at my end, but again needs benchmarking which I'm not set up for.
Comment #168
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commented
This patch has an error in the count query. In the second part you need to exclude the nodes that the user authored, otherwise you'll count them twice.
Comment #169
catch
Well I'm just re-rolling and this is hardly my strong point, but here's a stab at it:
Changed:
SELECT COUNT(DISTINCT(n.nid)) AS cnt FROM {users} u JOIN {comments} c ON c.uid = u.uid JOIN {node} n ON n.nid = c.nid WHERE c.status = %d AND n.status = 1 AND u.uid = %d
to this?
SELECT COUNT(DISTINCT(n.nid)) AS cnt FROM {users} u JOIN {comments} c ON c.uid = u.uid JOIN {node} n ON n.nid = c.nid WHERE c.status = %d AND n.status = 1 AND u.uid = %d AND n.uid <> c.uid
Patch just in case I'm not barking up the wrong tree.
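A toy sqlite3 sketch of the double-count killes describes, on a deliberately simplified schema (the real query also filters on the status columns): a node the user both authored and commented on is counted by both branches unless the comment branch excludes the user's own nodes:

```python
import sqlite3

# User 9 authored node 1 and commented on nodes 1 and 2.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER);
CREATE TABLE comments (cid INTEGER PRIMARY KEY, nid INTEGER, uid INTEGER);
INSERT INTO node VALUES (1, 9), (2, 5);
INSERT INTO comments VALUES (10, 1, 9), (11, 2, 9);
""")

# Buggy count: the two branches are summed, so node 1 is counted twice.
buggy = con.execute("""
    SELECT (SELECT COUNT(*) FROM node WHERE uid = 9)
         + (SELECT COUNT(DISTINCT nid) FROM comments WHERE uid = 9)
""").fetchone()[0]

# Fixed count: the comment branch skips nodes the user authored
# (the n.uid <> c.uid condition from the patch).
fixed = con.execute("""
    SELECT (SELECT COUNT(*) FROM node WHERE uid = 9)
         + (SELECT COUNT(DISTINCT c.nid) FROM comments c
            JOIN node n ON n.nid = c.nid
            WHERE c.uid = 9 AND n.uid <> c.uid)
""").fetchone()[0]

print(buggy, fixed)  # 3 vs 2: node 1 was counted in both branches
```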
Comment #170
cosmicdreams CreditAttribution: cosmicdreams commented
From my reading of that altered query, it appears you are excluding the nodes that the author commented on (as well as the comments from that node that were authored by that user).
If you wanted to exclude the records from nodes that the user has authored, you should have this in your WHERE clause:
Comment #171
catch
OK re-roll with that change:
Comment #172
Zothos CreditAttribution: Zothos commented
I would prefer using the != operator. I know they are the same, but != has a greater readability :P
Comment #173
catch
Zothos, I dimly remember a recent development list thread where it was decided to standardise on <>, will leave it like that for now. This could really do with benchmarks on the UNION ALL change.
Comment #174
catch
no longer applies :(
Comment #175
Gábor Hojtsy
There are tab issues in the patch too.
Also moving to the bugs queue, where D6 blockers are waiting in line.
Comment #176
catch
OK I tried to re-roll this again, but because the if ($uid) check has been taken out of tracker.pages.inc somewhere along the line, you get an undefined variable on the tracker/1 page.
So this needs some love to set the $uid somewhere useful, and it still needs review and benchmarking after that. I don't think I'll have time to do that today so it'd be great if someone could take it on.
Comment #177
catch
here's a re-roll, it works.
Still needs review/benchmarks.
Comment #178
catch
OK well my re-roll was no good, but I fixed that in my testing install and did some very primitive profiling with devel module. Overall the main /tracker was slower by about 30-40%, and although /tracker/4 was quicker, it gave slightly different results to the current core query (about 10 nodes more in a 12 page pager).
So I'm going to bump this to D7 since there doesn't appear to be a satisfactory solution without the major schema changes (node activity counter etc.) proposed earlier in this issue. I've attached the devel results for reference.
Comment #179
David Strauss
Implemented in Contrib for Drupal 5: http://drupal.org/project/tracker2
Drupal 6 and 7 should only require simplification from the Drupal 5 version.
Comment #180
chipk CreditAttribution: chipk commented
FWIW - our use of tracker only needs visibility to nodes, not comments. As an interim fix, we modified 'tracker_page()' (i.e. the SQL in both $sql and $sql_count as well as the call to 'page_query()') to eliminate the join and logic related to table 'node_comment_statistics'. On a deployment of 160k+ nodes, one typical benchmark query shrank from 2680ms to 15ms. Code changes as follows:
Comment #181
David Strauss
@chipk: Your implementation may work for your situation, but it undermines a large portion of what Tracker is for, which is tracking comments.
Comment #182
apotek CreditAttribution: apotek commented
Using UNION, we get 2 temporary tables created and have to run filesort twice. Not sure how this would improve things. It takes a heck of a lot longer.
Comment #183
Michelle
Putting the title back. "Using UNION" is not a good title for an issue.
Michelle
Comment #184
catch
Latest approach to this is http://drupal.org/project/tracker2 - this needs a port to Drupal 6, then a port to Drupal 7, then conversion into a patch.
Comment #185
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commented
Considering everything, the tracker query has been proven to not really be salvageable. This issue should thus be marked "won't fix".
Porting tracker2 to core might be an option. I have discussed this with Angie and she did say that this is not a priority for her. I think it is fair to say that you shouldn't use the core tracker for a large site. We might put a warning into the docs and maybe include a link to tracker2.
Comment #186
catch
We should either have tracker2 in core, or remove tracker from core and rename tracker2 to tracker in contrib. Having core modules limping along with site-killing queries release after release doesn't help anyone.
Comment #187
killes@www.drop.org CreditAttribution: killes@www.drop.org commented
Is the query really site killing for your average Drupal site? I doubt it.
Comment #188
dww
I don't understand why we don't just put tracker2 in core. Even if it's not "a priority" for webchick doesn't mean it can't happen. A brief skim of the code looks like there's nothing that fundamentally evil about tracker2, and if anything, it'd be better if it were in core, instead of modifying core's DB schema from the outside. ;) Maybe I didn't fully grasp something about how tracker2 works, but it seems like a case where a little DB de-normalization goes a long way. Why not put it in core?
Comment #189
Damien Tournoud CreditAttribution: Damien Tournoud commented
@dww: I think everyone agrees that the best solution is to bring tracker2 into core. The only remaining question is: who will take the lead on that?
Comment #190
jrabeemer CreditAttribution: jrabeemer commented
+1 for tracker2 in core, i was completely unaware of the slow performance of tracker module in core..
sub
Comment #191
kbahey CreditAttribution: kbahey commented
I can work on porting it to Drupal 7, but only if Angie or Dries say that it would go in 7.x, and not just wasted effort.
Comment #192
jrabeemer CreditAttribution: jrabeemer commented
No work is wasted effort. Worst case, it gets delayed to D8. If you can demonstrate good performance gains, it will most likely go in core.
Comment #193
battochir CreditAttribution: battochir commented
subscribing
Comment #194
swentel CreditAttribution: swentel commented
Ok, attached is a patch which ports tracker2 to D7. I added it with the .txt extension because I don't want the testbot to run.
Weird thing is, I did some benchmarks with 50000 nodes and noticed no real performance difference between tracker 1 & 2 at this point. So basically, try this out, see if it makes any difference and we can discuss further if this is really so critical to fix.
Comment #195
battochir CreditAttribution: battochir commented
Quick question. Is there a working port of tracker2 to D6 anywhere? It's awesome that tracker2 is getting so much love for D7. Somehow, I thought D6 was important. From what I see, many people have not upgraded to D6 from D5 because many modules have not been updated yet or they are waiting for D7 because the update to D6 is so painful. D6 is a big leap from D5. Anyway, D6 tracker still has those expensive queries that hurt big-traffic sites.
edit: Sorry. I just saw that this thread is for D7. I didn't mean to hijack the thread.
Comment #196
Michelle
battochir - You want this issue: #294091: Port tracker2 to 6.x
Michelle
Comment #197
battochir CreditAttribution: battochir commented
Thank you Michelle. That's what I wanted to know, much appreciated.
cheers,
wim
Comment #198
catch
Seems like tracker tests actually pass with this patch. Moving to needs review for a general look at it (re-rolled for some whitespace fixes, needs some comment/variable name tidying and dbtng conversion still).
Comment #199
dww
Very quick skim revealed this:
tracker_track_user() is wrong. hook_menu() is already auto-loading the user object, but then tracker_track_user() tries to reload the user object again using arg(). tracker_track_user() isn't properly inspecting either of the arguments passed into it from the menu system, in fact.
Comment #200
dwwOh, and those should be
return drupal_access_denied();
, etc...
Comment #201
swentel CreditAttribution: swentel commented
Still needs work, but gets rid of the tracker_track_user function, we don't need that at all. One thing that doesn't work yet is the 'my recent posts' tab. I simply can't get it to work, unless I change the user_uid_optional_load function. Not sure if I'm doing anything wrong, or if this is a bug in the load function, I'll try to talk to peter or chx tomorrow. Note, this also doesn't work in the current version of tracker - and D6 also suffers from it.
Edit: that bug is known, so waiting for #298722 to get in
Comment #202
catch
Maybe related to #298722: _menu_translate returns FALSE before to_arg is available ?
Comment #203
David Strauss
I'll take a look at the "my recent posts" problem.
Comment #204
apotek CreditAttribution: apotek commented
@Michelle (#183)
Yes, sorry about that. I'm still not used to the way these issue trackers work. I keep thinking the "Issue title" is the subject of my post, rather than a way to modify the title of the *entire thread*. It's an odd implementation when any logged in user can do that. My mistake, and certainly not intentional.
Comment #205
catch
Here's a re-roll. The bug swentel found is already well documented in HEAD, so there's no reason to slow this up.
It's possible some of this might be centralised in materialized views if/when it gets committed - but might as well keep this one live - we can't ship yet another release with this query.
Comment #207
catch
hmm
Comment #208
David Strauss
This seems to look OK, but I can't RTBC what is mostly my own code.
Comment #209
catch
Patch was missing tracker.pages.inc changes. Also this all needs converting to dbtng. Here's benchmarks:
ab -c1 -n500
HEAD:
/tracker: 6.15 [#/sec]
/uid/track: 1.16 [#/sec]
Patch:
/tracker: 12.75 [#/sec]
/uid/track: 12.86 [#/sec]
Comment #211
catch
OK this one updates 95% of the patch to dbtng. Gabor has indicated he might want to have a backport of this in Drupal 6, so let's get this one in.
A few todos and things needing discussion (and leaving at CNR for the test bot if nothing else).
I don't think we need a UI for the indexing variable - or if we do, it should be in admin/settings/performance instead of a tiny form by itself at admin/settings/tracker
There's a few places where it looks like we could replace some update, select, insert logic with db_merge.
A couple of places need to use db_select() because they're wrapped in db_rewrite_sql(), didn't get to those yet.
And also there's one INSERT INTO .. SELECT FROM which afaik isn't currently handled by db_insert() at all - so while that shouldn't hold this patch up, it'd be good to open another issue so we can do that.
Comment #214
David Strauss
I don't see a reason to "backport" this to Drupal 6. It should probably remain the separate Tracker 2 module.
Comment #215
catch
I didn't think it was on the table - but #362151: Test and compare tracker2, views, Drupal 6 tracker suggests otherwise. If we can do it with no UI or API changes, then there's no particular reason not to simply cp contributions/modules/tracker2 modules/tracker - then we no longer have a pointless slow tracker module in Drupal 6 core.
Would need to co-ordinate the install/upgrade path of course for people upgrading from D6 - D7.
Comment #220
lilou CreditAttribution: lilou commented
Add tag.
Comment #221
Berdir
Ok, I worked a bit on that one. DBTNG has currently no direct support for "INSERT INTO () SELECT ...". Imho, there are 3 options:
- Execute it with db_query and simply replace %d and so on with named parameters
- Do a select and then use multi-insert to insert the values... (That is what I did in the patch)
- Implement support for something like db_insert('tracker_node')->from($query)
There are still some things to improve..
- tracker_batch_node_alter: at least the $op = 'delete' handling is imho unnecessary, that will try to delete the node twice, or not?
- _tracker_add and _tracker_remove query the node table even when not necessary (no need to fetch the information we already have in hook_node_*)
- More tests, for cron stuff, comment handling and so on...
Comment #222
killes@www.drop.org CreditAttribution: killes@www.drop.org commented
"# DBTNG has currently no direct support for "INSERT INTO () SELECT ... ". Imho, there are 3 options
- Execute it with db_query and simply replace %d and so on with named parameters
- Do a select and then use multi-insert to insert the values... (That is what I did in the patch)
- Implement support for something like db_insert('tracker_node')->from($query) "
IMO DBTNG should support INSERT INTO () SELECT, otherwise I'd consider it incomplete.
Failing this, I'd use "Execute it with db_query and simply replace %d and so on with named parameters".
The nice thing about "INSERT INTO SELECT" is that everything is done inside MySQL and PHP does not need to bother with this.
Comment #223
catch
Looks like we'll get INSERT INTO () SELECT via #481288: Add support for INSERT INTO ... SELECT FROM ...
Comment #224
killes@www.drop.org CreditAttribution: killes@www.drop.org commented
Yes, great, we should wait until that has landed and then rework this patch to use it.
Comment #225
Berdir
Updated the patch, note that it will fail without the patch in #481288: Add support for INSERT INTO ... SELECT FROM .... As soon as that one is in, this should pass the tests.
What I did
- converted the insert query to INSERT INTO () SELECT
- fixed the admin/content/node form alter stuff for publish/unpublish, I'm pretty sure that never worked ;)
- Updated the update function to set the tracker_index_nid variable
- Added tests for cron indexing and admin/content/node unpublishing.
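For reference, a toy sqlite3 sketch of the INSERT INTO ... SELECT pattern being discussed (hypothetical, simplified tracker_node columns; the real patch goes through DBTNG), which keeps the whole copy inside the database rather than round-tripping rows through application code:

```python
import sqlite3

# Simplified source and destination tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, status INTEGER, changed INTEGER);
CREATE TABLE tracker_node (nid INTEGER PRIMARY KEY, published INTEGER, changed INTEGER);
INSERT INTO node VALUES (1, 1, 100), (2, 0, 200), (3, 1, 300);
""")

# One statement populates the tracker table from the node table; no row
# data ever passes through the calling code.
con.execute("""
    INSERT INTO tracker_node (nid, published, changed)
    SELECT nid, status, changed FROM node WHERE nid <= ?
""", (3,))

print(con.execute("SELECT COUNT(*) FROM tracker_node").fetchone()[0])  # 3
```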
Comment #227
Berdir
Let's see if this needs a re-roll, the tests should now pass.
Comment #229
Berdir
Ok well, re-roll...
Comment #231
Berdir
Wrong path in the tests...
Comment #233
Berdir
Another path change... re-rolled.
Comment #234
Berdir
I can't see a reason why the special count query on {node} is necessary and it does give false results during indexing... simply removing it seems to work just fine...
Comment #235
catch
1000 nodes, anonymous user, no page caching:
HEAD:
Patch:
This doesn't really need benchmarking though since the same code is responsible for Drupal.org being up at the moment instead of down.
Quick nitpick review on the patch:
Don't think we need this. However maybe indexing one batch of nodes on enable would help to avoid wtfs?
Should say "Create the new tracker_node and tracker_user tables.
ID should be uppercase.
"All content has finished tracker indexing". Not sure what to do about Max node ID here, this is why I hate content/node/post inconsistency.
No phpdoc.
need updating to hook_node_insert(), the function itself is right though.
No phpdoc, same for other private functions here.
indentation is off.
Comment #236
catch
Implemented my own review.
Comment #238
catch
Forgot -N.
Comment #240
catch
re-rolled.
Comment #241
catch
Minus a syntax error.
Comment #243
catch
Can't reproduce locally, trying again.
Comment #244
catch
doh. sorry.
Comment #245
catch
Removing a pointless drupal_set_message() from tracker.install
Comment #246
pwolanin CreditAttribution: pwolanin commented
It's unclear to me what tracker_cron is doing - is this only supposed to run once per node to initially populate the table - i.e. instead of trying to do a large batch operation when the module is installed?
Comment #247
David Strauss
@pwolanin
"is this only supposed to run once per node to initially populate the table - i.e. instead of trying to do a large batch operation when the module is installed?"
Yes. Large sites can't build all trackers all at once.
Comment #248
pwolanin CreditAttribution: pwolanin commented
ok, then please add some code comments to that effect.
Comment #249
drewish CreditAttribution: drewish commented
These seem pointless:
but I guess there's a couple other instances of them in core.
The schema descriptions for the tracker_node and tracker_user tables seem way too terse and don't really explain what they do.
The field descriptions also don't seem to match up with user.install and node.install. Which say things like:
And in the image.module we got told we had to add foreign key info ala:
Incorrect indenting:
These comments need to be proper sentences and end in periods:
I feel like this comment is wrong... it's a callback not a callback argument. Also wouldn't this be better off in .pages.inc or .admin.inc?:
The setting for this either needs to be a select with a fixed list of values (like search module) or it needs a validate function to ensure that they provide a number.
Why is tracker_batch_node_alter() not called tracker_batch_node_submit()?
Not related to this patch specifically but I was trying to figure out if tracker_comment_update() and tracker_comment_publish() were doing the same thing and realized that hook_comment_unpublish() is never getting called. Need to open a new issue for that.
I feel like this comment is pointless and should be dropped, considering it's right before a function named add_css... Or barring that it needs a period:
The closing curly brace and else need to be on separate line:
These comments need to be proper sentences and end in periods:
Initially I'd thought there was duplicate code in this test then realized it just needs comments explaining that you're checking the user's page first then the site wide tracker page.
The comment refers to a "comment" when it should refer to a "node".
Comment #250
drewish CreditAttribution: drewish commented
I started addressing some of the issues I raised but ran out of time. Also removed some junk from tracker_page()—like $header that wasn't used and drupal_add_css() that duplicated the '#attached_css'.
Comment #251
catch
drewish, thanks for the review and the re-roll. I'm moving continent tomorrow so unlikely to be able to look at this again before the end of the week, hoping someone else can jump in on remaining cleanup.
Comment #252
drewish CreditAttribution: drewish commented
Well this has been a nice change from what I've been working on so I'll see if I can't clear up my remaining issues this afternoon.
Comment #253
drewish CreditAttribution: drewish commented
My last patch was missing the .install file... corrected that and fixed all the schema problems I'd identified. I also did some more clean up on the settings form. I fixed a bug in the DBTNG conversion tracker_cron() where it was supposed to be matching all users other than the author.
For some reason I can't do anything with simpletest locally so i'll get some bot feedback.
One other issue I noticed is that we're double processing the comments. We catch them on both insert/update and publish. I don't know if it's a huge deal I just wanted to mention that it seemed a little tacky to me.
Comment #254
catch
Looked over this, I think we should kill the settings page. While we expose a setting for search, I've seen indexing 100 nodes cause 25,000 queries - since it can be an insert for every word, loading thousands of comments etc. - whereas the tracker indexing is a known quantity. We can leave the variable there for sites which need to raise it, but it's unlikely to need lowering.
Spotted a couple of missing periods in tracker.test as well, but otherwise not much to complain about (obviously I didn't spot these when re-rolling myself).
Comment #255
drewish CreditAttribution: drewish commented
Dropped the admin settings page and cleaned up the comment ops where we were doing expensive no-ops.
Comment #256
catch
OK this is looking a lot tidier, but do we need the no-op hook implementation? Seems like that would be better documented in _publish()
Comment #257
drewish CreditAttribution: drewish commented
I'm okay with dropping that but it seemed harder to document it on the other hook. I'm not sure which is a worse example for core to set: not having the operation be clear, risking someone "fixing" it later, or having an empty hook implementation. I'll roll a no-op free version.
Comment #258
drewish CreditAttribution: drewish commented
Comment #259
David Strauss
Is tracker_form_node_admin_content_alter() still necessary? Don't D6 and later call normal node_save() stuff even for batches in the admin area?
$comment = (array) $comment;
should go. It was in my code because the hook variously got objects and arrays. We don't have that problem anymore.
Comment #260
Berdir
I tested the content_alter thingy and afaik, it is still required for publish/unpublish, I removed it for the delete.
Comment #261
drewish CreditAttribution: drewish commented
had a diversion to get devel_generate working again then was able to test the bulk updates. looks like publish/unpublish now calls the full node_save() so we can drop tracker_form_node_admin_content_alter() and tracker_batch_node_submit() since they're just doing double processing.
Comment #262
drewish CreditAttribution: drewish commented
Realizing our test coverage sucks. Two big fatal errors that went uncaught:
- tracker_comment_delete() was calling _tracker_delete() which isn't even a real function.
- _tracker_remove() was referencing {comments} which has been renamed to {comment}.
So things that need to be tested before this is committed:
- Adding unpublished comment: Commenter should not be in tracker.
- Adding published comment: Commenters and owner should be in tracker.
- Publishing a comment: Commenters and owner should be in tracker.
- Deleting a comment: Owner still in tracker, commenter no longer in tracker.
- Need better testing of the per user tracker.
Comment #263
drewish CreditAttribution: drewish commented
this fixes up some of the comment tests.
Comment #264
drewish CreditAttribution: drewish commented
that was clearly the same version of the patch.
Comment #265
TheRec CreditAttribution: TheRec commented
Single quotes should be used to be consistent with the rest of the file (2 occurrences of the exact same line in the patch)
Missing a comma at the end (2 occurrences of the exact same line in the patch)
Single quotes should be used to be consistent with the rest of the file.
Single quotes should be used to be consistent with the rest of the file.
Beer-o-mania starts in 3 days! Don't drink and patch.
Comment #266
drewish CreditAttribution: drewish commented
corrected the issues TheRec raised. also cleaned up the tests a bit more. this is pretty close aside from publishing/unpublishing comments.
Comment #267
catch
Since this is a schema change, I think we're better off trying to get it in before freeze, and adding those final tests in a followup. If you beat webchick or Dries to a commit, please leave at RTBC.
Comment #268
webchick
Just glanced through this patch quickly, but...
that's gonna be a problem. ;)
Beer-o-mania starts in 2 days! Don't drink and patch.
Comment #269
catch
We removed the settings page, so this no longer needs a permission to administer it, doh.
No other changes.
Comment #270
pwolanin CreditAttribution: pwolanin commented
add tag
Comment #271
Dries CreditAttribution: Dries commented
Reviewed this patch and it looks good. Thanks!