Updated: Comment #0

Problem/Motivation

If we want to improve the page cache by removing the "content" cache tag (see #2124957: Replace 'content' cache tag with 'rendered' and use it sparingly), then we must ensure that cache tags exist for all user-modifiable pieces of content on the page. Including any entities that displayed in a View but rendered using the Entity/Field API render pipeline. (i.e. using "Content" as the thing to "Show", not "Fields" — that's a completely different problem space since that uses Views' render pipeline).

Entities rendered by the Entity/Field API render pipeline do contain the proper cache tags, but they're lost somewhere in Views' rendering process. Fixing that is what this issue is about.

Steps to reproduce

  1. Install Drupal 8 in the Standard profile.
  2. Enable page caching (go to admin/config/development/performance, check the "Use internal page cache" setting and set the "Page cache maximum age" setting to e.g. 3 minutes).
  3. Create one node (with a title, a taxonomy term or two and something in the body field), that is visible on the front page.
  4. Visit the front page as the anonymous user

Expected: page cache entry for the front page. This is the case.

Also expected: the page cache entry has the appropriate page cache tags: for the node entity, for the node's author's user entity, for the node's tags' taxonomy term entities. This is not the case.

Proposed resolution

I started debugging this, and thought I found why cache tags were not being bubbled up. StylePluginBase::renderGroupingSets() only called drupal_render() on the render array of the entity, without collecting the cache tags. That was easy enough to solve. The problem is that this is unfortunately too late…

That is, HtmlPageController::content() invokes ViewPageController->handle(), which after several layers of calls, results in a call to DisplayPluginBase::render()… which simply does this:

  public function render() {
    $element = array(
      '#theme' => $this->themeFunctions(),
      '#view' => $this->view,
    );
    $element['#attached'] = &$this->view->element['#attached'];

    return $element;
  }

At this point, amazingly, the damage is done! Because HtmlPageController::getContentResult() will receive that render array (which, as you can clearly see, will never contain cache tags), which HtmlPageController::content() will turn into a HTML fragment, which will cause drupal_render() to be invoked on that render array. Doing so, will cause the appropriate theme function to be called, which will do the entire rendering process… but theme functions must by definition return only a HTML string and not a render array (which could contain cache tags!).

I think that if you look at this call stack, it's pretty clear it's simply impossible to bubble up the cache tags. Now, I don't know the theme system that well, and I don't know the Views render pipeline at all. But AFAICT… if we want cache tags to be bubbled up by Views correctly, we need to refactor the way a View is rendered quite significantly.

Remaining tasks

User interface changes

None.

API changes

TBD.

Comments

wim leers’s picture

Issue tags: +Spark, +sprint
wim leers’s picture

dawehner’s picture

+++ b/core/modules/views/lib/Drupal/views/Plugin/views/style/StylePluginBase.php
@@ -467,10 +467,14 @@ public function renderGroupingSets($sets, $level = 0) {
+            // Collect cache tags for each row render array, bubble them up.
+            $cache_tags = drupal_render_collect_cache_tags($render, $cache_tags);

I try to figure out the difference between these cache_tags and the one coming from CachePluginBase::getCacheTags ... are they fundamentally different?

wim leers’s picture

#4: yes. See #2183039: Views' cache tag-based caching only sets cache tags for related entities based on Views relationships, should also do so for entity reference fields for details about that. In a nutshell: the cache tags generated by Views are only for the entities they render, not for the entities referenced by fields *in* that entity, nor for "meta" things like the cache tag for the text format used by the body field.

The patch is a first (but very, very incomplete) step at getting those cache tags. But they're not yet being bubbled up, and that's a big, big problem.

wim leers’s picture

Priority: Major » Critical

Marking critical since the parent issue is critical and this blocks the parent.

dawehner’s picture

+++ b/core/modules/views/lib/Drupal/views/Plugin/views/style/StylePluginBase.php
@@ -467,10 +467,14 @@ public function renderGroupingSets($sets, $level = 0) {
             $set['rows'][$index] = drupal_render($render);
...
+            // Collect cache tags for each row render array, bubble them up.
+            $cache_tags = drupal_render_collect_cache_tags($render, $cache_tags);
             // Row render arrays cannot be contained by style render arrays.

Is there any reason we could not / should not drop the use of the drupal_render() call?

On top of that some kind of testcoverage would certainly be nice here.

wim leers’s picture

Yes, "tests" are listed in the remaining tasks. Totally agreed.

To answer your question: are you suggesting we don't call drupal_render() and simply bubble up the render array? I'd totally be in favor of that.
But IIRC that is not possible because higher levels in the Views render pipeline assumes that HTML strings are passed around, not render arrays.

This patch is not at all a proposed solution, it's just a first simple step that I could make, to demonstrate the problem :)

jibran’s picture

damiankloip’s picture

As mentioned a few times now, and in light of stuff like #2222847: Simplify Views cache tags to just a list tag per entity type. IMO we definitely do not want this. The general rule I feel we should go with now is 'Do not do anything based on rows for cache tags in views'.

I think we want to push on with #2221433: Clean up views rendering. Move stuff from template_preprocess_views_view(), into a #pre_render callback, then add #cache to the render array there. This should consist of tags for the views, and our entity type/bundle list tags.

Thoughts? Leaving the issue properties as they are for now.

wim leers’s picture

#10: Can you please elaborate? It's not fully clear to me what you're saying.

Especially 'Do not do anything based on rows for cache tags in views' is confusing to me, because if plugins responsible for rendering rows cannot declare which cache tags they contain, then how will we ever be able to tag a View with e.g. the taxonomy terms that are referenced by a node that is rendered in a View?

damiankloip’s picture

how will we ever be able to tag a View with e.g. the taxonomy terms that are referenced by a node that is rendered in a View?

That is precisely where we will get many issues I think. If you do that, you will only have stuff tagged for e.g. page 1 of your view. There are many many things that could change the order of that, or mean the data should be invalidated. If it is not invalidated, you are stuck with a cached page. Any change could potentially change the result set of the view, so all cache items for that view would need to be invalidated anyway?

moshe weitzman’s picture

cache tags aren't supposed to be perfect. it is OK if we sometimes serve stale pages. i expect that folks are going to have to do time based expiration but our best effort on cache tags will let them use long expiration times, instead of short. so, i think it is perfectly fine to only care about the rows appearing on page 1.

wim leers’s picture

I think we should distinguish between two "types" of cache tags in the case of Views:

  1. Invalidation because the entities matching the View's query have changed. This is what #2222847: Simplify Views cache tags to just a list tag per entity type was about, and it should be sufficient.
    Example: If a View lists the 5 last nodes by authors that live in Belgium, then it should also tag the cache entry with user cache tags (even if the author isn't rendered by the nodes at all).
    I.e.: cache tags based on information in the query.
  2. Invalidation because the rendered representation of the entities matching the View's query render something that the View couldn't possibly have known about
    Example: If a View lists the 5 last nodes by authors that live in Belgium, and it renders the taxonomy terms associated with each node, then it should also tag the cache entry with the taxonomy term cache tags. Why? Because when a taxonomy term's label changes, that doesn't invalidate the query results, nor does it change the listing. It only means that the rendered View is now invalid.
    I.e.: cache tags based on the rendered output.

I hope that's a sufficiently clear example.

damiankloip’s picture

If it was fine to only care about what happens on page 1 we could have just left in the cache tags implementation we had before :)

I don't think we can really ship this saying "might work, might not". If you only care about page 1 you could get a stale version of that, go to page 2 then see some of the same results you see on the first page.

To take your example above, we should try to care about the fact it is using users but not actually add cache tags like user:1 user:2, that IMO will just not work.

berdir’s picture

I still can't really follow your explanations :)

As I understand it, we will add these cache tags *additionally* to the list tag. So anything that affects the list directly can still clear that cache tag. But by letting the cache tags from the contained entries/rows bubble up, we can also react to changes that are not directly related to that view, like displayed tags or a change to the text format used to render a text field and so on.

Yes, there are many examples, like the one from WimLeers where changing a user to no longer be from belgium would affect the order of all nodes, we we have to clear the list tag. But I don't get how this issue can make anything worse than before?

The only thing I'm worried about is the amount of cache tags and that we need to think about if certain cache tags are really worth it. Let's assume that you have a large frontpage with a set of views with different content that display 50 to 100 nodes, a bunch of blocks, and so on.. how many cache tags will you have? Is there a limit to what we're able to handle (how big is the cache tag column? what happens if it gets too large?) So maybe you don't want to let cache tags for all 100 nodes on that page bubble up into the page cache as you know that saving any of them will also clear the list cache anyway which is enough for all relevant pages to be rebuilt...

Another example, being able to re-act to image style changes. What's worse, having to do a full cache clear when you actually do change an image style or the need to check a cache tag or a set of cache tags more likely on every single page, when image styles are never going to change outside of config syncs on production systems anyway (which does a dfac())?

dawehner’s picture

Let's be clear, we don't have a clue about the list result invalidation yet On the other hand though I do agree that single element tags
would help to improve the lifetime of a time based cache entry.

In general I think we should try to not expose a cache configuration without time, at least for views, as this is not really what people want.

damiankloip’s picture

Let's assume that you have a large frontpage with a set of views with different content that display 50 to 100 nodes

This is going to generate some massive cache tag entries for a view. If we are talking about having the entity type, bundle, id, any related entities like user/author, taxonomy terms. Then x that by amount of rows. You end up with a silly amount of cache tags.

As I understand it, we will add these cache tags *additionally* to the list tag

.. But I don't get how this issue can make anything worse than before?

Yes, these will be additional to the list tags but I think we will basically need the list tags all the time for this to function remotely as expected. So why bother with these?

So maybe you don't want to let cache tags for all 100 nodes on that page bubble up

So, if you don't let them all bubble up, or even if you do (you wouldn't have all of them anyway, not for all potential pages of a view) you could still get incomplete tag data...

Say your first page has no basic pages on, because you have promoted lots of articles to the front page. That view will then have no idea if contains page nodes..until it hits a page that has some on it. Then only that cache entry will have tags for that type. So then invalidating by the page bundle from that point of view will not clear out all of your cache items for that view. Same goes for IDs, related terms - whatever it is. Not sure if that makes any more sense ?! :)

catch’s picture

This is going to generate some massive cache tag entries for a view. If we are talking about having the entity type, bundle, id, any related entities like user/author, taxonomy terms. Then x that by amount of rows. You end up with a silly amount of cache tags.

It's a silly amount of cache tags. But an array of 500 or 1000 items or more is still smaller than the individual render array for a single node in some cases. Or for that matter a View object with multiple displays (if you're using blocks with #pre_render instead of views cache plugins you don't even need to get the view at all).

We'll also have backends (at least one of Varnish, Redis, MongoDB) where the tag invalidations don't have to be loaded each request, because they can clear directly. In the case of something like MongoDB/Redis there'd be no need necessarily to load the cache tags with the cache item on get() at all - it can just store when it's set + delete on invalidation.

We should absolutely try to set up an extreme example of say ten views blocks with ten items each referencing ten more items and see what happens. That's going to be an IN() query with more than 1,000 items, but we already generate similar queries on huge pages via path alias pre-loading and it's not been the bottleneck compared to everything else that's happening. Good case for #2241377: [meta] Profile/rationalise cache tags anyway.

wim leers’s picture

#19: documented that test case in #2241377: [meta] Profile/rationalise cache tags.


It's been almost a month. Any news? :)

damiankloip’s picture

I'm still not convinced there is any sense in doing this. I still think any tags we do need to generate should be based on entity meta data and not rows.

wim leers’s picture

Can you then please explain how you see that yielding correct cache invalidation results?

Let's take the front page view as an example. It just lists all published nodes, in descending order according to publication date, paginated. Each entity (cfr. "entity metadata") is rendered in the "teaser" view mode. That means that e.g. the "tags" field will also be rendered. If a taxonomy term is renamed, then the render cache item of the view (or the containing page) should be invalidated. AFAICT that won't be the case unless the cache tags for each rendered entity (cfr. "rows") are bubbled up.

P.S.: just fixed a HTML typo in #14, which prevented the majority of the text there from being rendered… That essentially contains the same message as the one here: #14.1 corresponds to "entity metadata" and #14.2 corresponds to "rows".

damiankloip’s picture

What you are talking about is still per entity, which is exactly what I am saying is not right. We should maybe have another call about this, because I am bored of going over the same stuff over and over :) Plus I know I am not communicating what is in my mind as well as I could.

So, in the above example, your view cache entry will only get invalidated if there is a cache entry with that term on already. So if that was E.g. on page 3 then page 3 gets invalidated, and you are left with cached versions of page 1 and page 2 still. So either way, all cache items related to a view need to be cleared. Partially clearing odd pages is not a good option IMO.

By entity metadata, I mean entity info. We need to calculate the entity types/filters etc.. in a view and use that to generate list tags. In my mind, that is about as good as it's going to get. Plus I am still not convinced at all about storing a crazy amount of tags for each item. Peoples databases will fill up if we start doing this everywhere.

wim leers’s picture

So, in the above example, your view cache entry will only get invalidated if there is a cache entry with that term on already. So if that was E.g. on page 3 then page 3 gets invalidated, and you are left with cached versions of page 1 and page 2 still. So either way, all cache items related to a view need to be cleared. Partially clearing odd pages is not a good option IMO.

If one the entities being listed themselves is modified (in this case: nodes), then I agree, partial clearing is not a good option. That's exactly why we have the "entity list cache tags".

In this case, though, there is zero risk of harm when clearing only page 3 if only page 3 contains this modified term.

By entity metadata, I mean entity info. We need to calculate the entity types/filters etc.. in a view and use that to generate list tags. In my mind, that is about as good as it's going to get.

If I understand you correctly, then you're saying "entity list cache tags are not enough, because they're not specific enough and will hence cause too many invalidations; therefore we need entity filter/query-specific cache tags". To which I say: YES! But only as a next step! Baby steps!
Because even if we have that, the "taxonomy term reference field formatter" use case from above is still not going to be solved by this.

Plus I am still not convinced at all about storing a crazy amount of tags for each item. Peoples databases will fill up if we start doing this everywhere

Please, no, let's not reiterate this again. catch has put it very clearly: let's first put cache tags on everything, and *then* worry about performance. We already have an issue for that, see #2241377: [meta] Profile/rationalise cache tags.

damiankloip’s picture

What if your view has a filter on it that is using the taxonomy term field? You will then either see duplicates or you will not see what you are expecting on the page you are expecting it.

wim leers’s picture

#25: AFAICT that's wrong, both in the current scenario and the one you're envisioning.

But first: if you modify a taxonomy term, then that won't modify its ID, so if you're filtering based on the taxonomy term that you're modifying, nothing changes in terms of which entities are shown and where they are shown; only their rendered output is affected.

Second, assuming you actually meant "modifying a node to reference a different taxonomy term", which is something very different than what I said, because it's a case of #14.1 (list changes), while I was talking about #14.2 (rendered output changes):

  1. Current scenario ("entity list cache tags"): every page in the view is tagged with the entity list cache tag, so everything will be invalidated. Hence: behaves as expected.
  2. Your envisioned scenario ("entity filter/query-specific cache tags", let's call it EQCT): every page in the view is tagged with this EQCT. If a node's taxonomy term reference changes, then the EQCT should be invalidated, if not, then the EQCT is broken. Hence: behaves as expected.
damiankloip’s picture

You could be doing something by term name. Not necessarily the id.

And yes, I think point 2 is more correct. What I'm saying is simple. Let's not do anything that is row based, whether it is render or not.

wim leers’s picture

True, if you're filtering by term *name*, changing any term would need to invalidate the EQCT. In the current scenario ("entity list cache tags", #26.1), that would mean that you'd need to tag the view not only with the list cache tag of the entity type you're listing (nodes), but also the list cache tags of the entity types you're sorting or filtering by (taxonomy terms, in this case). That'd solve the problem.
I guess this is the part you've been trying to get across, but that I was overlooking? :)

#26.2 is not "more correct" (at least with the above addendum: include list cache tags of sort/filter entity types). Cache invalidation is either correct or it isn't, in that it either results in serving the correct information to end users or it doesn't. Point 2 is just more efficient: it minimizes unnecessary cache invalidation, whereas point #26.1 is also correct, but invalidates many more cache items unnecessarily.

What I'm saying is simple. Let's not do anything that is row based, whether it is render or not.

What I'm saying is also simple: what you suggest would be fundamentally broken. It would lead to incorrect results! Anything that you're not filtering/sorting the View on, but that you are rendering, would not be captured by the EQCT!

moshe weitzman’s picture

Status: Needs review » Needs work
wim leers’s picture

wim leers’s picture

dawehner’s picture

wim leers’s picture

Hurray! We actually already fixed this… by accident! :)

From #2378789-10: Views output cache is broken:

After thoroughly reading through #2183017, reading the Views render pipeline code, stepping through it with a debugger and verifying it through testing, I'm now absolutely certain that #2273277 fixed it!

It's even easy to convince yourself of this as well. Just install a fresh Drupal 8, create an article node that's promoted to the front page, and load the front page, you should see the node:1 cache tag (amongst others, such as the associated taxonomy term's cache tag, the node owner's cache tag, etc.) listed in the X-Drupal-Cache-Tags header.

If there's any lingering doubt, give it a try on simplytest.me, with the last commit before #2273277 and then the commit that landed #2273277:

BUT! But, the above only verifies that it's working for uncached Views. Since Views has "output caching", which is its own form of render caching, that also needs to support cache tags. So far that wasn't the case. Hence it's not working for cached Views. This issue fixes that too! (Simply by supporting the bubbling of all render metadata, just like drupal_render() bubbles it.)

Conclusion:

  1. Views already bubbles up rendered entities' cache tags when rendering an uncached view
  2. Views does not yet do that for cached views, but with this patch, it does

Therefore, once this issue lands, #2183017: Views doesn't bubble up rendered entities' cache tags is fully fixed, and #2350551: Views fields that have attached assets are lost when Views output caching is enabled already is!