Node history markers require an HTTP request [#2073535]

After #1991684: Node history markers (comment & node "new" indicator, "x new comments" links) forces render caching to be per user, node history markers require an HTTP request in order to get the last viewed timestamp for the current user of the nodes on the page.

Towards the end of that issue we started discussing alternatives to this, but since that patch in itself blocks entity render caching, opening this as a follow-up instead.

Very quick summary of possible approaches:

Instead of fetching recently read nids on demand, pre-load at least some of them (the last 20 the user read, all of them?)

This causes a problem where users could potentially read (or 'read') hundreds of nodes on a site in a month - think social networking sites, forums, Drupal.org issues. HISTORY_READ_LIMIT could be made configurable, or we could try to come up with some other approach like loading the last x nodes, then fetching remaining nids via HTTP request if that's not enough.

Comment	File	Size	Author
#18	interdiff.txt	4.84 KB	Wim Leers
#18	history_personalization_http_reqs-2073535-18.patch	12.74 KB	Wim Leers
#18
#15	interdiff.txt	4.15 KB	Wim Leers
#15	history_personalization_http_reqs-2073535-15.patch	12.59 KB	Wim Leers
#15
#14	history_personalization_http_reqs-2073535-14.patch	9.5 KB	Wim Leers
#14

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Comment #1

catch

he/him

English

CreditAttribution: catch commented 25 August 2013 at 07:37

Another idea:

Select the oldest timestamp timestamp from the history table for a user on every request and stick it in drupalSettings.

Include the updated timestamp and last_comment_timestamp in the markup. If it's older then the oldest timestamp for that user, nothing is going to be new or updated for that user so there's no point looking it up.

Comment #2

thedavidmeister CreditAttribution: thedavidmeister commented 25 August 2013 at 09:25

just going to mention that all the fetched nids, whether pre-loaded or pulled in via a HTTP request "on demand" are cached in localStorage.

For users with hundreds of nodes in their history that are "new", there probably needs to be a cap on how many can be pulled into localStorage, or considered "new" at all, regardless of what we do to avoid extra requests.

Comment #3

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers commented 26 August 2013 at 06:39

#1: that's a good additional heuristic, except that it will almost always exceed the "30 days" mark, beyond which we always just mark nodes/comments as read.

I still don't see how prefetching NIDs "last read timestamps for the X most recently read nodes" is going to have a big impact; it may have an impact, but it might just as well not. It'd be much better to just know which nodes are on the current page (and as said in #1991684-40: Node history markers (comment & node "new" indicator, "x new comments" links) forces render caching to be per user, we already know that!) and then just #attach all necessary "last read timestamps" data in such a way that it won't end up in the render cache. I.e. the #uncached_attached that I was talking about.

Comment #4

catch

he/him

English

CreditAttribution: catch commented 26 August 2013 at 10:13

@Wim it won't exceed the 30 days mark because entries older than that get cleared out on hook_cron(). It's true though that a frequent visitor to a site is always going to have a timestamp that's 29-30 days though.

Thinking about #uncached_attached. Could we not #attach the nid + last updated/commented time which is fine in the render cache, then at the end of request/page building (same spot as js/css is collected), collect the nids and query history for those in one go. That'd be one query so better than the per-node query in Drupal 7 and fine with caching no? That would save a new #attached, but we'd need a generic mechanism to access stuff in #attached so history module can access it at the right point.

Comment #5

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers commented 26 August 2013 at 11:01

No, attaching the NID and whatnot is pointless, it's not the problem that we need to solve. The #cache array is built for each page load anyway, in order to calculate the cache key!

The problem is: if you want to embed the current user's last read time stamp, it must happen through drupalSettings and to do that you must use #attached, and if you do that, then it will end up in the render cache!

(I agree with the overall concept, but that's also what I've been saying; the problem is that with our current infrastructure we can't do it, hence my proposal for something like #uncached_attached.)

Comment #6

thedavidmeister CreditAttribution: thedavidmeister commented 26 August 2013 at 11:41

I'd be interested in a formal proposal for more fine-grained cache settings for #attached (why just cached and uncached? why not support page/user/role granularities too?) that is well thought out and has some proof of concept patches. I am a little bit concerned that there's only one use-case (AFAIK) for the addition, but I'm sure we could find more examples if we wanted to look.

That said, the pre-fetching/cache warming approach is a potential fix for the exact problem with node history markers outlined here and in other issues and it doesn't require API modifications/additions, so in the end it might just be easier to get over the line for D8.

Comment #7

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers commented 26 August 2013 at 11:59

#6: the reason cached vs. uncached makes sense is because it's the simplest possible thing that could work: when uncached, use logic to determine what should be attached.

If you're going to define "this is attached for roles X and Y", then … indeed you're entering tricky territory, because it's ~~nigh~~ impossible to anticipate all possible contexts. Just "uncached" is free of context.

"Prefetching the data for the most recently read X nodes" is flawed, as I've explained numerous times by now, in this issue and in the other. Cache warming is something else, and that'd be even more flawed, because for some things you will inherently have to do requests because you can't be certain that the client-side cached data is the latest. So the benefit would be negligible.
Embedding the metadata on the page rather than doing HTTP requests is only going to work if you embed all the required data. Otherwise, you're still going to end up doing HTTP requests for the exact same scenarios that #1991684: Node history markers (comment & node "new" indicator, "x new comments" links) forces render caching to be per user is already making requests for. The only benefit would be to not have a HTTP request to "warm the cache", after that it'd be exactly the same as what's already being done in that issue.

You may want to read #1991684-40: Node history markers (comment & node "new" indicator, "x new comments" links) forces render caching to be per user, thedavidmeister.

Comment #8

thedavidmeister CreditAttribution: thedavidmeister commented 26 August 2013 at 12:36

reading now :)

Comment #9

thedavidmeister CreditAttribution: thedavidmeister commented 26 August 2013 at 14:12

sorry, I didn't read anything. I got hijacked in IRC. I will go over this again as soon as I get the chance.

Comment #10

catch

he/him

English

CreditAttribution: catch commented 27 August 2013 at 08:30

@Wim I mean collecting the nids at the very end of drupal_render_page() and then putting into drupalSettings then, nothing gets into cache then (unless it's the full page cache but that's fine).

Comment #11

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers commented 27 August 2013 at 16:28

#10: How would you implement that cleanly? Could you implement that cleanly with the APIs we have today? I don't yet see how.

Comment #12

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers commented 11 September 2013 at 16:38

#1991684: Node history markers (comment & node "new" indicator, "x new comments" links) forces render caching to be per user landed, so we can now continue here.

Comment #13

andypost

he/him

Russian

CreditAttribution: andypost commented 15 October 2013 at 10:40

Component:

base system

» history.module

If we going to have #1029708: History table for any entity then improvement should affect all entities so probably better to wrap history logic into #2081585: [META] Modernize the History module so allow contrib to easily swap the service with more performant implementation that could be specific to target site.

The HISTORY_READ_LIMIT also should be defined as class constant to be overridable

Comment #14

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers commented 23 October 2013 at 15:56

Assigned:	Unassigned	» Wim Leers
Status:	Active	» Postponed
Issue tags:		+Performance, +D8 cacheability

File	Size
history_personalization_http_reqs-2073535-14.patch	9.5 KB

This is now blocked on #2118703: Introduce #post_render_cache callback to allow for personalization without breaking the render cache. The discussion about potential solutions that we had in this issue in comments #1–#11 is incorporated there.

Once that is in, this patch solves this issue.

Comment #15

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers commented 6 December 2013 at 15:23

Issue summary:	View changes
Status:	Postponed	» Needs review
Issue tags:		+Spark, +sprint

File	Size
history_personalization_http_reqs-2073535-15.patch	12.59 KB

interdiff.txt	4.15 KB

1 file was hidden/shown/deleted

File	Size
history_personalization_http_reqs-2073535-14.patch	9.5 KB

#post_render_cache/#2118703: Introduce #post_render_cache callback to allow for personalization without breaking the render cache landed a while ago. Several big performance patches already leveraged it. Now let's do this one :)

While rerolling, I found a bug in the above patches: the comment ID instead of the node ID was being used to retrieve the last read timestamp.

And I also found a subtle bug for #post_render_cache usage, despite the battery of tests that was added by #2118703: #post_render_cache callbacks for child elements are only processed when #cache is set. This broke the attaching of the last read timestamp as drupalSettings on individual comments: the callback would never be called.

Comment #16

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers commented 9 December 2013 at 09:29

Priority:

Normal

» Critical

Since this gets rid of one HTTP request per page — on many pages, for all authenticated users on most sites, this is at least major, probably critical.

Comment #17

larowlan

🇦🇺🏝.au GMT+10

CreditAttribution: larowlan commented 9 December 2013 at 22:47

+++ b/core/modules/comment/comment.module
@@ -1731,3 +1739,54 @@ function comment_library_info() {
+function comment_attach_new_comments_link_metadata(array $element, array $context) {

My only observation is perhaps this could be a static method on the CommentViewBuilder object for the sake of lazy loading (ie instead of having it in .module and loaded all the time).

Comment #18

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers commented 3 January 2014 at 15:13

File	Size
history_personalization_http_reqs-2073535-18.patch	12.74 KB

interdiff.txt	4.84 KB

2 files were hidden/shown/deleted

File	Size
history_personalization_http_reqs-2073535-15.patch	12.59 KB

interdiff.txt	4.15 KB

#17: good point. Done.

I was tempted to do the same for history_attach_timestamp(), but it feels wrong to have a public static attachTimestamp() method on HistoryController. What do you think?

Comment #19

3 January 2014 at 15:22

The last submitted patch, 18: history_personalization_http_reqs-2073535-18.patch, failed testing.

Comment #20

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers commented 3 January 2014 at 16:34

18: history_personalization_http_reqs-2073535-18.patch queued for re-testing.

Comment #21

moshe weitzman CreditAttribution: moshe weitzman commented 3 January 2014 at 19:22

Status:

Needs review

» Reviewed & tested by the community

I think we are good here.

Comment #22

nod_

French

Lille

CreditAttribution: nod_ commented 5 January 2014 at 12:21

Issue tags:

+JavaScript

Comment #23

catch

he/him

English

CreditAttribution: catch commented 6 January 2014 at 12:06

Status:

Reviewed & tested by the community

» Needs work

This uses history_read() for each node.

Could we not use history_read_multiple() somehow? Won't help much with a page of comments, but node teasers/views listings that need last read timestamp I'd expect it to help - we actually added that for the original issue moving this to a separate request iirc.

Comment #24

Wim Leers

Ghent 🇧🇪🇪🇺

CreditAttribution: Wim Leers commented 10 January 2014 at 11:51

Status:

Needs work

» Reviewed & tested by the community

#23: each "thing" on the page declares which #post_render_cache callbacks it wants, with which context. To be able to use one history_read_multiple() call for *all* the "things" on the page that say they want some history read, we need to make the code responsible for invoking #post_render_cache callbacks smart enough to be able to merge multiple contexts together. That is not trivial.

It might be possible, but it would require a significant change in how #post_render_cache works, and might require additional metadata to be declared ("is mergeable"). Therefor, that is significantly out of scope.

If you think it's crucial, then we should probably postpone this issue on getting support for that in #post_render_cache. Your call. Back to RTBC for now.