After #1991684: Node history markers (comment & node "new" indicator, "x new comments" links) forces render caching to be per user, node history markers require an HTTP request in order to get the last viewed timestamp for the current user of the nodes on the page.

Towards the end of that issue we started discussing alternatives to this, but since that patch in itself blocks entity render caching, opening this as a follow-up instead.

Very quick summary of possible approaches:

Instead of fetching recently read nids on demand, pre-load at least some of them (the last 20 the user read, all of them?)

This causes a problem where users could potentially read (or 'read') hundreds of nodes on a site in a month - think social networking sites, forums, Drupal.org issues. HISTORY_READ_LIMIT could be made configurable, or we could try to come up with some other approach like loading the last x nodes, then fetching remaining nids via HTTP request if that's not enough.

.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

catch’s picture

Another idea:

Select the oldest timestamp timestamp from the history table for a user on every request and stick it in drupalSettings.

Include the updated timestamp and last_comment_timestamp in the markup. If it's older then the oldest timestamp for that user, nothing is going to be new or updated for that user so there's no point looking it up.

thedavidmeister’s picture

just going to mention that all the fetched nids, whether pre-loaded or pulled in via a HTTP request "on demand" are cached in localStorage.

For users with hundreds of nodes in their history that are "new", there probably needs to be a cap on how many can be pulled into localStorage, or considered "new" at all, regardless of what we do to avoid extra requests.

Wim Leers’s picture

#1: that's a good additional heuristic, except that it will almost always exceed the "30 days" mark, beyond which we always just mark nodes/comments as read.

I still don't see how prefetching NIDs "last read timestamps for the X most recently read nodes" is going to have a big impact; it may have an impact, but it might just as well not. It'd be much better to just know which nodes are on the current page (and as said in #1991684-40: Node history markers (comment & node "new" indicator, "x new comments" links) forces render caching to be per user, we already know that!) and then just #attach all necessary "last read timestamps" data in such a way that it won't end up in the render cache. I.e. the #uncached_attached that I was talking about.

catch’s picture

@Wim it won't exceed the 30 days mark because entries older than that get cleared out on hook_cron(). It's true though that a frequent visitor to a site is always going to have a timestamp that's 29-30 days though.

Thinking about #uncached_attached. Could we not #attach the nid + last updated/commented time which is fine in the render cache, then at the end of request/page building (same spot as js/css is collected), collect the nids and query history for those in one go. That'd be one query so better than the per-node query in Drupal 7 and fine with caching no? That would save a new #attached, but we'd need a generic mechanism to access stuff in #attached so history module can access it at the right point.

Wim Leers’s picture

No, attaching the NID and whatnot is pointless, it's not the problem that we need to solve. The #cache array is built for each page load anyway, in order to calculate the cache key!

The problem is: if you want to embed the current user's last read time stamp, it must happen through drupalSettings and to do that you must use #attached, and if you do that, then it will end up in the render cache!

(I agree with the overall concept, but that's also what I've been saying; the problem is that with our current infrastructure we can't do it, hence my proposal for something like #uncached_attached.)

thedavidmeister’s picture

I'd be interested in a formal proposal for more fine-grained cache settings for #attached (why just cached and uncached? why not support page/user/role granularities too?) that is well thought out and has some proof of concept patches. I am a little bit concerned that there's only one use-case (AFAIK) for the addition, but I'm sure we could find more examples if we wanted to look.

That said, the pre-fetching/cache warming approach is a potential fix for the exact problem with node history markers outlined here and in other issues and it doesn't require API modifications/additions, so in the end it might just be easier to get over the line for D8.

Wim Leers’s picture

#6: the reason cached vs. uncached makes sense is because it's the simplest possible thing that could work: when uncached, use logic to determine what should be attached.

If you're going to define "this is attached for roles X and Y", then … indeed you're entering tricky territory, because it's nigh impossible to anticipate all possible contexts. Just "uncached" is free of context.

"Prefetching the data for the most recently read X nodes" is flawed, as I've explained numerous times by now, in this issue and in the other. Cache warming is something else, and that'd be even more flawed, because for some things you will inherently have to do requests because you can't be certain that the client-side cached data is the latest. So the benefit would be negligible.
Embedding the metadata on the page rather than doing HTTP requests is only going to work if you embed all the required data. Otherwise, you're still going to end up doing HTTP requests for the exact same scenarios that #1991684: Node history markers (comment & node "new" indicator, "x new comments" links) forces render caching to be per user is already making requests for. The only benefit would be to not have a HTTP request to "warm the cache", after that it'd be exactly the same as what's already being done in that issue.

You may want to read #1991684-40: Node history markers (comment & node "new" indicator, "x new comments" links) forces render caching to be per user, thedavidmeister.

thedavidmeister’s picture

reading now :)

thedavidmeister’s picture

sorry, I didn't read anything. I got hijacked in IRC. I will go over this again as soon as I get the chance.

catch’s picture

@Wim I mean collecting the nids at the very end of drupal_render_page() and then putting into drupalSettings then, nothing gets into cache then (unless it's the full page cache but that's fine).

Wim Leers’s picture

#10: How would you implement that cleanly? Could you implement that cleanly with the APIs we have today? I don't yet see how.

Wim Leers’s picture

andypost’s picture

Component: base system » history.module

If we going to have #1029708: History table for any entity then improvement should affect all entities so probably better to wrap history logic into #2081585: [META] Modernize the History module so allow contrib to easily swap the service with more performant implementation that could be specific to target site.

The HISTORY_READ_LIMIT also should be defined as class constant to be overridable

Wim Leers’s picture

Assigned: Unassigned » Wim Leers
Status: Active » Postponed
Issue tags: +Performance, +D8 cacheability
FileSize
9.5 KB

This is now blocked on #2118703: Introduce #post_render_cache callback to allow for personalization without breaking the render cache. The discussion about potential solutions that we had in this issue in comments #1#11 is incorporated there.

Once that is in, this patch solves this issue.

Wim Leers’s picture

Issue summary: View changes
Status: Postponed » Needs review
Issue tags: +Spark, +sprint
FileSize
12.59 KB
4.15 KB

#post_render_cache/#2118703: Introduce #post_render_cache callback to allow for personalization without breaking the render cache landed a while ago. Several big performance patches already leveraged it. Now let's do this one :)

While rerolling, I found a bug in the above patches: the comment ID instead of the node ID was being used to retrieve the last read timestamp.

And I also found a subtle bug for #post_render_cache usage, despite the battery of tests that was added by #2118703: #post_render_cache callbacks for child elements are only processed when #cache is set. This broke the attaching of the last read timestamp as drupalSettings on individual comments: the callback would never be called.

Wim Leers’s picture

Priority: Normal » Critical

Since this gets rid of one HTTP request per page — on many pages, for all authenticated users on most sites, this is at least major, probably critical.

larowlan’s picture

+++ b/core/modules/comment/comment.module
@@ -1731,3 +1739,54 @@ function comment_library_info() {
+function comment_attach_new_comments_link_metadata(array $element, array $context) {

My only observation is perhaps this could be a static method on the CommentViewBuilder object for the sake of lazy loading (ie instead of having it in .module and loaded all the time).

Wim Leers’s picture

FileSize
12.74 KB
4.84 KB

#17: good point. Done.

I was tempted to do the same for history_attach_timestamp(), but it feels wrong to have a public static attachTimestamp() method on HistoryController. What do you think?

The last submitted patch, 18: history_personalization_http_reqs-2073535-18.patch, failed testing.

Wim Leers’s picture

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

I think we are good here.

nod_’s picture

Issue tags: +JavaScript
catch’s picture

Status: Reviewed & tested by the community » Needs work

This uses history_read() for each node.

Could we not use history_read_multiple() somehow? Won't help much with a page of comments, but node teasers/views listings that need last read timestamp I'd expect it to help - we actually added that for the original issue moving this to a separate request iirc.

Wim Leers’s picture

Status: Needs work » Reviewed & tested by the community

#23: each "thing" on the page declares which #post_render_cache callbacks it wants, with which context. To be able to use one history_read_multiple() call for *all* the "things" on the page that say they want some history read, we need to make the code responsible for invoking #post_render_cache callbacks smart enough to be able to merge multiple contexts together. That is not trivial.

It might be possible, but it would require a significant change in how #post_render_cache works, and might require additional metadata to be declared ("is mergeable"). Therefor, that is significantly out of scope.

If you think it's crucial, then we should probably postpone this issue on getting support for that in #post_render_cache. Your call. Back to RTBC for now.

moshe weitzman’s picture

Assigned: Wim Leers » catch
catch’s picture

Status: Reviewed & tested by the community » Fixed

There's no regression in the number of queries compared to D7, so let's stick with history_read(). Committed/pushed to 8.x.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Wim Leers’s picture

Assigned: catch » Wim Leers
Issue tags: -sprint