Support from Acquia helps fund testing for Drupal Acquia logo

Comments

qgil’s picture

At the same time, admin/node reflects the correct list, puttin updated nodes on the top when they are edited.

drewish’s picture

playing bug bingo when i ended up here. i just wanted to point out that you'll probably have better luck getting a response if you can verify that the bug still exists in the most current 4.6 release... i know that i'm not too interested in tracking down bugs in old releases.

magico’s picture

Version: 4.6.4 » 4.6.9

Still happening?

dww’s picture

Assigned: Unassigned » dww
Status: Active » Needs review
FileSize
3.15 KB

yes it's still this way in 4.6, 4.7 and HEAD. foolishly, i didn't see this issue when i recently submitted a patch for this. http://drupal.org/node/79611. i've got patches in that issue for 4.7 and HEAD. here's a patch for 4.6 to fix this. not sure if it'll be considered a bug fix or a feature, but i hope it's consided a bug and gets in...

dww’s picture

here's a patch for 4.7

dww’s picture

and here's HEAD (see http://drupal.org/node/79611#comment-126559 for more on why this patch uses ! not @ for the placeholder in the call to t()).

webchick’s picture

Status: Needs review » Reviewed & tested by the community

Tested, works. RTBC.

dww’s picture

Version: 4.6.9 » x.y.z

in the off chance that someone considers this a feature, not a bug, i'm moving the version to "cvs" to give this enough visibility to make it in before the next code freeze. i'd still like to see it back-ported to 4.7.x and 4.6.x (patches already attached).

drumm’s picture

Category: bug » feature

I'd consider this a feature.

drumm’s picture

Status: Reviewed & tested by the community » Needs work

This is taking an order by that was a regular field and changing it to order on a computed value. I'd expect this to be slower. I'd like to see some benchmarks.

dww’s picture

Category: feature » bug
Status: Needs work » Needs review

by request, some ab-generated benchmarks on a totally clean HEAD install with bogus content generated via recommendations in http://drupal.org/node/79237.

test machine specs:
g4 powerbook laptop, 1gig ram, macos 10.4.7
Apache/1.3.33
mysql 4.1.19
PHP 4.4.2

pre-patch:

Document Path:          /drupal-cvs/?q=tracker
Document Length:        10002 bytes

Concurrency Level:      1
Time taken for tests:   221.887 seconds
Complete requests:      100
Failed requests:        87
   (Connect: 0, Length: 87, Exceptions: 0)
Broken pipe errors:     0
Total transferred:      1050436 bytes
HTML transferred:       1002136 bytes
Requests per second:    0.45 [#/sec] (mean)
Time per request:       2218.87 [ms] (mean)
Time per request:       2218.87 [ms] (mean, across all concurrent requests)
Transfer rate:          4.73 [Kbytes/sec] received

Connnection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0     0    0.1      0     2
Processing:  1886  2219  279.8   2159  3462
Waiting:     1885  2218  279.7   2159  3462
Total:       1886  2219  279.8   2159  3462

post-patch:

Document Path:          /drupal-cvs/?q=tracker
Document Length:        10023 bytes

Concurrency Level:      1
Time taken for tests:   235.479 seconds
Complete requests:      100
Failed requests:        86
   (Connect: 0, Length: 86, Exceptions: 0)
Broken pipe errors:     0
Total transferred:      1050386 bytes
HTML transferred:       1002086 bytes
Requests per second:    0.42 [#/sec] (mean)
Time per request:       2354.79 [ms] (mean)
Time per request:       2354.79 [ms] (mean, across all concurrent requests)
Transfer rate:          4.46 [Kbytes/sec] received

Connnection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0     0    0.4      0     5
Processing:  1995  2354  231.8   2294  3015
Waiting:     1995  2354  232.1   2293  3014
Total:       1995  2355  231.8   2294  3015

summary
pre:
Requests per second: 0.45 [#/sec] (mean)
Time per request: 2218.87 [ms] (mean)

post:
Requests per second: 0.42 [#/sec] (mean)
Time per request: 2354.79 [ms] (mean)

therefore, the time per request is 135.92 ms slower (mean), or 135.92/2218.87 = 6% slower with the patch.

however, before anyone gets all bent out of shape about a 6% slow-down, i want to remind everyone that:

  1. correctness is more important than performance -- *every* drupal site i've ever setup has solicited complaints from users about this bug. everyone expects that when you edit a page, it should show up at the top of the tracker.
  2. this is just 1 possible benchmarking installation, and the numbers might be a lot different in more real environments

if it was 10% or more, i'd be more worried about this, but ~6% doesn't seem like the end of the world to me -- a small price to pay for correctness and a more intuitive UI. ;)

oh, and i've already gotten killes, webchick, and a few others to agree this is a bug. furthermore, killes said he'd apply the 4.7-specific patch as soon as this hits the trunk. ;) so, i'm moving this back to being a bug report.

anything else i can do to shepard this along, just let me know...

thanks,
-derek

chx’s picture

If the tracker misses new content then it's a bug. And editing means new content. Hence, it's a bug.

dww’s picture

not sure this is CHANGELOG-worthy, but if so, here's the same patch along with a draft of the CHANGELOG entry (which i put under the "content system" blurb, instead of adding a new section w/ 1 entry for "tracker module").

Steven’s picture

Whether an edit constitutes 'new content' is open for discussion, but IMO depends mostly on the site's purpose.

There is already the option of modifying the published date (either manually, or by emptying it and thus setting it to the submission time) to force a post to show up as newer.

qgil’s picture

In Drupal the tracker module is defined as "The tracker module displays the most recently added or updated content to the website". This was the behaviour of this module a while ago, until it stopped listing updated content. Therefore this is a bug.

If it's not a bug but a feature request, then the defintion of the Tracker module at http://drupal.org/handbook/modules/tracker needs an update. Put in perspective, though, this would be a loss of a (useful) feature.

Maybe some sites don't need this but then it could be a feature request to make this behaviour optional and selectable.

notabenem’s picture

Yes, make it selectable please, then you have the best of both words.

dww’s picture

in IRC, steven said i hadn't presented convincing enough arguments. here goes... ;)

just to clarify the existing behavior and what happens with my fix, imagine this scenario:

  • you've got 300 nodes on your site
  • people are posting new nodes and comments on the more recent nodes
  • the 1st page of the tracker for a given user shows nodes 300-280 as "new", and in the "replies" column there would be the little "N new" link on the ones they've already read that have had new comments posted since they read the original.
  • someone edits node 15...

given the current behavior, node 15 would now be marked as "updated", but would still be burried on page 16 of your tracker. if the site was 30,000 nodes instead of 300, you'd basically never have any idea this content was added to the site, since you'd never bother looking through 100s of pages of tracker output, scanning for little red "update" markers that could be anywhere.

with my patch, node 15 would show up on page 1, along with all the other "recent" content on your site. it'll still be marked as "updated" (not "new", unless the user had never read the original), but at least they'd be able to find it. true, there's no specific comment that would be marked with the little 'new' decoration, there's no other UI to immediately mark/indicate the "diff" from the previous revision of the node (unless someone's using the diff.module and/or has access to the revisions tab), but i still believe visitors to the page about "recent posts" should know that this post was recently changed.

as another example, the admin/content/node page (the primary "posts" administrative UI) sorts by n.changed, so when you edited node 15, it would show up at the top of your posts page, even if it was buried on page 16 (or 537) of your tracker output. in fairness, the posts page *only* sorts on n.changed, so if you comment on an issue that hasn't been added or edited in a while, you won't see that at the top of the posts page, only at the top of the comments page (admin/content/comment). however, this is the detailed admin UI. the end-user UI for "recent posts" should show you the merged view of all "recent content", as the name suggests and users expect (at least given the handful of sites i've setup, with very different user bases). that's why the computed value comparing both n.changed and the last_comment_timestamp is the right value to sort by, even if it's slightly more expensive. the detailed UI for admins makes sense, but i think end users want to see a simple, unified view of new content. the existing UI is already good about indicating if it's a new comment (in the replies column) a change to the node itself (the "updated" in the "post" column) or an entirely new post ("new"), even if the ordering of all 3 is interwoven on the same page. the only thing missing from the current UI is that "updated" content can be burried anywhere, and is therefore impossible to find.

re: make it selectable -- adding an administrative UI and setting to make this behavior configurable would be a needless complication. we'd have to introduce a settings page for the tracker (which currently doesn't require any settings). ultimate flexibility would be selecting if you wanted to sort by last_comment_timestamp (last post) n.changed (last edit) or both. however, at that point, you might as well just install views and configure it exactly how you want. ;) let's keep the core tracker simple, but intuative and powerful, and have it sort by both.

re: "[it] depends mostly on the site's purpose." -- that's true. so, for sites that never edit old content, they'd never see any difference in behavior. however, for sites that ever edit existing content, those edits won't be impossibly hard to find by keeping an eye on the "recent posts" page. think d.o handbook pages -- don't we want site visitors to know when we updated the handbook page about the coding conventions, or how to best generate patches, etc? of course, on d.o, the scale is so huge that just watching recent posts you'd be overwhelmed with new issues and forum posts, anyway, but if you could sort the tracker by node type (http://drupal.org/node/83190) then i'd certainly want the "tracker/book" page to show me the most-recently-edited pages, not just the pages people most recently added comments to.

RTBC? ;)

thanks,
-derek

drumm’s picture

Version: x.y.z » 6.x-dev
Status: Needs review » Needs work

patching file modules/tracker/tracker.module
Hunk #1 succeeded at 77 with fuzz 2 (offset -8 lines).
Hunk #2 FAILED at 108.
1 out of 2 hunks FAILED -- saving rejects to file modules/tracker/tracker.module.rej
patching file CHANGELOG.txt
Hunk #1 succeeded at 59 (offset 11 lines).

dww’s picture

Version: 6.x-dev » 5.x-dev
Status: Needs work » Needs review
FileSize
4 KB

re-rolled for current HEAD. it's still a bug in 5.x (and 4.7.x for that matter), so i'm not sure why you bumped this to 6.x-dev. ;) hoping you'll reconsider. killes already said he'd backport to 4.7.x once it hits HEAD.

drumm’s picture

6.x is my default choice when the patch doesn't apply and I haven't done any other review.

dww’s picture

ahh, gotcha. ;) so there's still hope for this patch in 5.0 afterall... cool.

thanks,
-derek

Dries’s picture

Status: Needs review » Needs work

Looks like a bug to me.

-1 on the CHANGELOG.txt.

Do we still need to select l.last_comment_timestamp ? I think the query can be shortened.

dww’s picture

Status: Needs work » Needs review
FileSize
3.24 KB

ahh, good point. we need it in GREATEST(), but not as a separate field in the SELECT. new patch fixes that, and leaves out the CHANGELOG.txt change completely.

i won't bother supplying an updated 4.7.x backport of this until it's committed to HEAD, in case there are any other final adjustments that need to be made.

thanks,
-derek

dww’s picture

by request, some more SQL benchmarks on these 2 queries. b/c of the query cache, i ran the same query on the scratch DB with a different LIMIT value a bunch of times, and inter-wove calls to each query a little (e.g. 2 or 3 from 1 style, then switch to the other). not sure what else to do to keep it roughly accurate. anyway, here are the results:

old:

100 rows in set (3.74 sec)
101 rows in set (4.45 sec)
102 rows in set (4.10 sec)
103 rows in set (5.27 sec)
104 rows in set (4.79 sec)
105 rows in set (4.19 sec)
106 rows in set (3.84 sec)
20 rows in set (5.75 sec)

avg: 4.51 secs/query

new:

100 rows in set (6.84 sec)
101 rows in set (5.47 sec)
102 rows in set (6.56 sec)
103 rows in set (5.00 sec)
104 rows in set (7.04 sec)
105 rows in set (6.33 sec)
106 rows in set (4.04 sec)
20 rows in set (4.13 sec)

avg: 4.51 secs/query

so, by these (potentially flawed) numbers, the new query is more like 25% slower. i still don't think that's a deal breaker. again, correctness is more important than performance.

finally, the EXPLAIN output:

old:

mysql> EXPLAIN SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name, l.last_comment_timestamp
AS last_post, l.comment_count FROM node n INNER JOIN users u ON n.uid = u.uid
INNER JOIN node_comment_statistics l ON n.nid = l.nid WHERE n.status = 1 ORDER BY last_post DESC LIMIT 20;
+-------+--------+------------------------------+---------+---------+-------+--------+---------------------------------+
| table | type   | possible_keys                | key     | key_len | ref   | rows   | Extra                           |
+-------+--------+------------------------------+---------+---------+-------+--------+---------------------------------+
| l     | ALL    | PRIMARY                      | NULL    |    NULL | NULL  | 101720 | Using temporary; Using filesort |
| n     | ref    | PRIMARY,uid,node_status_type | PRIMARY |       4 | l.nid |      1 | Using where                     |
| u     | eq_ref | PRIMARY                      | PRIMARY |       4 | n.uid |      1 | Using where                     |
+-------+--------+------------------------------+---------+---------+-------+--------+---------------------------------+

new:

mysql> EXPLAIN SELECT DISTINCT(n.nid), n.title, n.type, n.changed, n.uid, u.name,
GREATEST(n.changed, l.last_comment_timestamp) AS last_updated, l.comment_count FROM node n
INNER JOIN users u ON n.uid = u.uid INNER JOIN node_comment_statistics l ON n.nid = l.nid
WHERE n.status = 1 ORDER BY last_updated DESC LIMIT 20;
+-------+--------+------------------------------+---------+---------+-------+--------+---------------------------------+
| table | type   | possible_keys                | key     | key_len | ref   | rows   | Extra                           |
+-------+--------+------------------------------+---------+---------+-------+--------+---------------------------------+
| l     | ALL    | PRIMARY                      | NULL    |    NULL | NULL  | 101720 | Using temporary; Using filesort |
| n     | ref    | PRIMARY,uid,node_status_type | PRIMARY |       4 | l.nid |      1 | Using where                     |
| u     | eq_ref | PRIMARY                      | PRIMARY |       4 | n.uid |      1 | Using where                     |
+-------+--------+------------------------------+---------+---------+-------+--------+---------------------------------+

surprisingly, even the old query has to do an entire filesort walk through the {node_comment_statistics} table, as does the new query. so this isn't going to be orders of magnitude different, no matter how bad the ORDER BY clause is. ;)

there, is it RTBC yet? ;)

thanks,
-derek

dww’s picture

whoops, typo in the avg's.
old: 4.51 sec
new: 5.67 sec
diff: 1.16 sec
% change: ~25%

chx’s picture

Status: Needs review » Reviewed & tested by the community

If there is no serious speed regression (ie. a new filesort) then I have no complaints. (Correctness indeeds overrules speed concerns but not when it kills the scalability -- however this is not the case here.)

drumm’s picture

Status: Reviewed & tested by the community » Needs work

Lets keep string changes out of this patch and keep "Last post" the same.

Anyone have speed improvements? I don't think it is a dealbreaker, but they are always nice to have.

dww’s picture

Status: Needs work » Needs review
FileSize
2.85 KB

here's one that keeps "Last post" (even though i think that's misleading, and worse from a usability standpoint, though i know we were hoping for a string freeze). however, i'm still having to change one of the t() calls anyway, since there's a minor bug (using @ instead of ! for something that doesn't need check_plain()). so, if we're going to break a string anyway, why not break 2 and use the more correct name for the UI (#23)? *shrug*

dww’s picture

for good measure (and so this never has to go back to "needs work" again...) here's yet another option. ;) this one doesn't change any t() calls at all, so we retain the ! vs. @ bug i mentioned before (which isn't a security hole, just bloat/cruft).

dww’s picture

i think the only way to improve the speed of this would be a DB schema change, which is a huge no-no at this point in the release. we'd have to add a new column to the {node} table, sort of like "changed" called something like "updated". we'd need to set it anytime either the node is edited, or a comment is posted. we'd have an index on this. that would avoid:

- the GREATEST()
- ORDER BY on a computed value
- one of the INNER JOIN's (on {node_comment_statistics})

i'm guessing the speedup could be quite huge. but, as i said, probably a very contentious DB schema change, and it's *way* too late in D5 for that.

otherwise, i don't know what we could do to speed this up any more, but i'd be happy to be wrong about that if someone has any bright ideas. ;)

thanks,
-derek

Dries’s picture

Maybe it is worth adding a TODO to the code? It would explain how we could speed things up in future versions of Drupal.

I'm OK with this patch, and with changing those two strings. Correctness is more important that freezing the strings.

dww’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
3.32 KB

i created http://drupal.org/node/105639 (no sense leaving a mini-book as a comment in the code).

new patch with both of the more accurate (but changed) t() calls restored, and this:
// TODO: These queries are very expensive: http://drupal.org/node/105639a

i'll be bold: RTBC. ;)

thanks,
-derek

dww’s picture

(hehe, sorry about that "a" at the end of the link -- it's just in the post, not the patch... still RTBC) ;)

drumm’s picture

Status: Reviewed & tested by the community » Fixed

Committed to HEAD.

dww’s picture

Version: 5.x-dev » 4.7.x-dev
Status: Fixed » Patch (to be ported)

cool, thanks!

i'll work on a 4.7 backport (probably with the frozen string for the table column header) when i have a spare moment...

dww’s picture

Status: Patch (to be ported) » Needs review
FileSize
3.17 KB

here's a version that changes the column header so we continue to be accurate and match HEAD. otherwise, identical query and comment as the version from HEAD. tested locally on a 4.7.x-dev test site, just to be sure. RTBC, except for the issue of the string freeze...

dww’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
2.92 KB

and here's the one with the string freeze. either of these 2 are RTBC, i'll leave it to dries/killes to decide which one to commit. ;)

thanks,
-derek

dww’s picture

whoops, a few extra spaces in the string_freeze version.

killes@www.drop.org’s picture

Status: Reviewed & tested by the community » Fixed

applied

Anonymous’s picture

Status: Fixed » Closed (fixed)