node_tag_new() currently does a merge query on every authenticated page view, to update the {history} table.
afaik history is used for only three purposes:
1. To determine if a node is 'new' for that user or not - whether they ever visited it.
2. To determine if a node has new comments since the last time the user visited it.
3. To determine if the node was updated since the last time the user visited it.
It seems like we could check the node updated and last comment timestamp times, then compare them to the history table, then only do an insert, if one of those three conditions is met - and possibly also every few hours regardless of how many times they've visited the node. This would remove the insert on any repeat page views, and it's not likely to have any real noticeable cost when the value actually needs to be updated. This would be a similar approach to that taken for updating user_access().
| Comment | File | Size | Author |
|---|---|---|---|
| #16 | drupal-node_tag_new-667152-16.patch | 1.19 KB | igor.ro |
| #14 | drupal-node_tag_new-667152-14.patch | 1.12 KB | rjbrown99 |
| #11 | node_tag_new.patch | 1.86 KB | catch |
| #9 | node_tag_new.patch | 1.63 KB | catch |
| #8 | node_tag_new.patch | 1.36 KB | catch |
Comments
Comment #1
catchHere's a patch. We only update history if it's > 30 minutes since the last time we updated, if the node was updated, or if a comment was posted. That should avoid any hammering from people hitting refresh but still keep it accurate enough.
Completely unrealistic benchmarks but they show the difference from removing the insert, benchmarked with auth session cookie. However they're very disappointing. The query takes around 7ms on my system, which ought to be in the region of 10% of request time, but I'm only seeing a couple percent improvement. It may be the benchmarks are wrong though?
HEAD:
Patch:
Comment #2
catchHmm, profiling shows it's about 3% unpatched and 0.3% patched - that at least means the node_last_viewed() and node_load() are very little overhead, however that means the benchmarks are about right, at least with a 1 node, 2 users data set.
However I also realised we have #165061: New indexes on flood.timestamp and history.timestamp for cron runs. open, which if we patch it (and we probably should for stability) will mean slower inserts on this table. i benchmarked with an index on history.timestamp though, and got very similar results to above, so I dunno if this is worth it really. It's relatively harmless but might be a no-op.
Comment #3
dries commentedNo patch?
Comment #4
catchArgh.
Comment #5
alexanderpas commentedspelling
Comment #6
dries commentedThis makes sense and I support it. It sounds a little complex. Everywhere we call
node_tag_new()we call it usingnode_tag_new($node->nid);so it seems like we could pass in $node instead of $nid and simplify the logic drastically. I support us doing that, even at this stage.node_tag_new()is not really called by other modules, AFAIK.Comment #7
catchYeah that would massively simplify the patch, and I agree it doesn't really get used as an API function. Will also see about generating large numbers of dummy rows in {history} to see if I can get those inserts dragging some more for benchmarks. CNW for now.
Comment #8
catchHere it is with the param change.
Comment #9
catchNow with the threshold as a variable - that way sites which need this 100% accurate, or sites which want to update it less frequently can do so without hacking core. We don't need a UI for the variable though. I haven't tested this in all possible situations yet.
Comment #10
dries commentedThis looks good, but let's add a small code comment to explain why we are doing this.
Comment #11
catchPut the comment back in from previous patches, sloppy work on my part :(
Comment #12
dries commentedCommitted to CVS HEAD. Thanks.
Comment #14
rjbrown99 commentedI like avoiding INSERT/UPDATE queries so I rolled a slightly different backport for D6. Instead of modifying node_tag_new (which would break a number of widely used contrib modules such as ctools) I added the avoidance logic directly to node_show(). It won't catch all cases but should work for the biggest offender.
Doubtful this ever makes its way into D6 but I thought it was worth posting back in case someone else has a desire to use it.
Comment #15
igor.ro commentedI suggest to move update request to the shutdown function.
This will make bulk update in once.
@Dries what do you think about this?
Comment #16
igor.ro commentedComment #18
arunvs commented#16: drupal-node_tag_new-667152-16.patch queued for re-testing.
Comment #20
ianthomas_ukThe patch on 7.x-dev was rolled back in https://drupal.org/node/704362
I can't see this being patched in 6.x at this stage, so setting version to 7.x-dev, although 8.x might be more appropriate. The equivalent function in 8.x is history_write() in history.module.
Comment #21
catchYes this would need to go into 8.x first.
Comment #22
xjm(Merging "node system" and "node.module" components for 8.x; disregard.)
Comment #41
acbramley commentedThis was committed but the issue was never closed. Marking as fixed (although the function no longer exists)