Attached patch implements ideas I described at http://cyrve.com/node/7. The patch is incomplete - Id like some feedback soon though.
I introduce a simple pattern that drupal_render() elements may use to skip any complex building that they might have to do. drupal_render() itself is not changed. We merely take advantage of the existing #pre_render and #post_render callbacks to retrieve from cache and save to cache as needed.
The patch converts the 'Recent blog topics' block to this new pattern (recall that block content may now be a renderable array). In practice, this blog block is quite fast so I will soon convert the intensive forum blocks instead of blog block. I am blocked by #394116: DBTNG Forum.module.
One major advantage to this cache strategy is that works even for sites that use a node_access module. Thats possible because our cache key actually contains the query that generates the node listing. Users who are in different groups or in different roles or whatever your node access criteria are, will see a different query and thus their cache keys will differ. This terrific idea is borrowed from eaton's new caching plugin for Views.
This patch mimics standard block cache in that the cache is expired whenever a new node is posted. It is really simple to let admin override that logic with a time based expiration. I will add that option on the the block configure page.
I decided to add a new block cache constant called BLOCK_CACHE_CUSTOM which behaves just like BLOCK_CACHE_NONE in every way. It is just a signal to code readers and DB table readers that a given block implements its own render cache.
TODO:
- The cache key needs to include language and theme. Possibly move _block_cache_id() from block module to common.inc and reuse that.
- Refactor selectQuery::execute method so I can get $query after it has been altered but before it has been run.
- Revert blog changes and instead convert both forum blocks. These blocks will have tests as part of #394116: DBTNG Forum.module so no new tests are needed.
- Refactor the block cache so it uses this new pattern. This is a minor consistency win so we will see if I can get to it.
Comment | File | Size | Author |
---|---|---|---|
#61 | cache.patch | 627 bytes | catch |
#55 | forum.diff | 1.2 KB | moshe weitzman |
#51 | bc3.diff | 25.44 KB | moshe weitzman |
#46 | bc3.diff | 25.42 KB | moshe weitzman |
#43 | bc3.diff | 25.36 KB | moshe weitzman |
Comments
Comment #1
David StraussTagging.
Comment #2
catchToo sick to review but subscribing.
Comment #3
moshe weitzman CreditAttribution: moshe weitzman commentedChanged from blog to forum blocks and refactored some block cache id logic to common.inc so it can be reused. See drupal_render_cid_parts().
However, I have uncovered a problem with DBTNG that is causing many cache misses. Our SQL is not constant but rather the placeholder names change on every page view depending on how many placeholders are on that particular page. The placeholder names are not unique per query but rather unique to whole page. This is also a problem for Views caching and for hook_query_alter(). A little sleuthing brought me to http://drupal.org/node/225450#comment-797859 where this was discussed. I don't see any open issue for fixing this. Any thoughts on this from Crell or bjaspan or other?
This will be ready once the DBTNG cache id issue is resolved.
Comment #4
Crell CreditAttribution: Crell commentedRegarding #2, do you need the query with values replaced or without? If you just want the prepared statement itself, that's easy. Just cast the query object to a string. With, well, doesn't exist.
Actually now that I think about it, you're right, hook_query_alter runs in execute so it won't actually affect just a straight cast. Hmmm... We'll need some sort of check to ensure that a query object doesn't get altered multiple times.
Comment #5
moshe weitzman CreditAttribution: moshe weitzman commentedAdded issues for the current blockers:
#496500: Remove global placeholder counter
#496516: Move query_alter() into a preExecute() method
Comment #6
bjaspan CreditAttribution: bjaspan commentedsubscribe
Comment #7
yched CreditAttribution: yched commentedVery interesting.
As a pattern, it's still a little complex to implement, though. I'm wondering how much of this could be tied directly into drupal_render() and triggered by a couple #-properties ?
+ minor : having the query built in forum_block_view() makes it easily overlooked that it's only actually executed in the case of a cache miss.
[edit: OK, except that the query generates the cache cid - nevermind]
Comment #8
FiReaNGeL CreditAttribution: FiReaNGeL commentedThank you for working on this! I can barely contain my excitement! Marking http://drupal.org/node/300935 as duplicate.
Can you confirm that this system also would allow to display dynamic content to anonymous users when cache is enabled? Like a 'this post was posted 5 minutes ago' depending on their local time, or choose to display advertisements or not depending on their provenance (might not want to display ads to users who access the site directly and have a cookie showing they frequent the site often, for example)?
The implications for this patch are huge for Drupal adoption by complex websites!
Comment #9
moshe weitzman CreditAttribution: moshe weitzman commentedNo, it does not do that at all. Requests serviced by the page cache do not run through drupal_render(). It might be nice to have a hook that receives the output of a cached pages so that php can replace tokens and such. Seems like hook_exit is well positioned for that. Anyway, not relevant to this issue.
Comment #10
FiReaNGeL CreditAttribution: FiReaNGeL commentedI thought that you could cache all elements that are non-dynamic in the pre render phase as you suggest, leaving only the other ones being calculated at page view without page cache.
Comment #11
Dries CreditAttribution: Dries commentedGood work, Moshe.
If this is to go in, we'll want to clean-up it up some more, make it easier to use, and make sure we can accommodate different scenarios such as those outlined by Fireang3l. It has to become a nice simple pattern that is optional.
That said, I think it OK to have this code in 'prototype stage' for a while. Let's not get obsessed with corner cases or developer experience just yet. First, we have to proof that it is worth it, and that it is better than the block caching we have. Let's be numbers driven so we can explore the upside -- and then worry about other things.
Comment #12
webchickQuestion. If we go this route, do we any longer need #186636: Block caching when node access modules are enabled.? Should I bump that one down to 6.x?
Comment #13
catchIf this gets in, then we just need to remove that check entirely I think. Although it'll have to be clear that modules specifying cache per page/role without using this system are going to break node_access.
Comment #14
kbahey CreditAttribution: kbahey commentedAwesome! Subscribing.
Comment #16
moshe weitzman CreditAttribution: moshe weitzman commentedThis patch persists any css or js which was attached to the element ... We really really need to encourage folks to use #attached_css and #attached_js instead of drupal_add_css() and drupal_add_js(). Those functions are side effects that are not easily cached.
The two patches in #5 are blockers for this issue. Help wanted.
Comment #17
David StraussThis looks incomplete:
Comment #18
David StraussI'm concerned about this current approach because it doesn't give us flexibility to expand this to support edge-side inclusion (ESI), which is becoming a necessary feature for items like blocks on sites with lots of pages and traffic.
The basic idea of ESI is that, instead of including content directly on a page, we provide a tag with the URL, which an ESI-compliant proxy server uses to download the actual content. ESI-supporting proxy servers send headers letting us know when this is possible, so we can intelligently choose between sending ESI tags or the actual content.
The key advantage to ESI is decoupling the caching of block content from general page content, which ties directly into the goals of this issue. Currently, if a site wants a block on every page to show the latest forum topics, the cache lifetime for all pages is limited to the maximum lifetime of that block. As more blocks get added, the page cache lifetime becomes a race to the bottom. With ESI, the page and block cache lifetimes have no bearing on each other. Thousands of cached pages can have long lifetimes even if the forum topics block gets updated every five minutes. ESI also synchronizes the block content across all pages, avoiding the awkwardness of seeing a block in various stages of staleness from page to page (e.g. 30 sec old on one page and 10 min old on another).
Now, we could make developers manually implement ESI as they need to, but we have a great opportunity here to integrate (or provide the foundations to integrate) ESI support into our general content/block rendering system.
What we need to support this is an architecture that allows generation of cached content purely on the basis of the cache key. This cache key must contain (in readable, non-hashed form) the basic arguments required to identify and start generating the content (e.g. a block with module X and delta Y) and relevant key/value pairs for the session (e.g. role M and user ID N).
By carefully defining the cache keys, edge servers (e.g. reverse proxy caches) can request content that Drupal can generate with no other context than the key.
The other advantage is that it opens the door for preemptive refreshing of the cache, like my Preempt module did. For blocks taking 5+ seconds to render, it's not very nice to make random visitors wait around to regenerate cache items. Preemptive caching has identical architecture requirements to ESI support.
I realize I'm suggesting a radical departure from what's being done here, but I want to lay foundations if we're going to bother making a change this big in the API.
Finally, I think the change suggested here is mostly an attempt to layer caching on the current API rather than create an inherently cache-friendly API. There's nothing inherently wrong with layering, but I think this API is worse for that design trait. This feels like the kind of cache support a contrib module would awkwardly provide.
Comment #19
moshe weitzman CreditAttribution: moshe weitzman commented@David - thanks for the info about ESI. It does sound quite powerful. It also seems really hard to implement in a dynamic system like Drupal. I'll need specific help from you about how to implement better cache key for the forum blocks. We need to work on a specific use case IMO. Ping me on IRC.
Until then, this patch has plenty of merit. We bring block caching even when node access modules are enabled. More generically, site builders have another tool where they can cache using a recognized pattern. And they still maintain full control of the cache key and expiry which is more than block cache system provides.
Anything we do here is easily changed in hook_page_alter(). Modules can remove our caching entirely, or change our key our expiry info. Further, the cache subsystem is swappable in case thats a helpful tool.
I do intend to polish this API a bit. I'm just waiting for the dependant patches to land.
Comment #20
Anonymous (not verified) CreditAttribution: Anonymous commentedsubscribe
Comment #21
David StraussDiscussed with Moshe on IRC. I'll be spending tonight banging my head against the problem of an API that supports ESI without too much absurdity. It's a rather interesting architecture problem.
Comment #22
webchickWhere is the documentation on #attached_js and #attached_css? I've had people asking me for it and am at a loss to help them. It also seems like this paradigm shift was introduced without the knowledge of the majority of Drupal front-end developers, because they've been working on making drupal_add_js() and drupal_add_css() better (alterable, weightable, able to pull in external JS, etc.).
Getting everyone on the same page here seems like a pretty critical issue.
Comment #23
catchIt went in with vertical tabs, which the same people working on drupal_add_js() and drupal_add_css() changes worked on. Also documented at http://drupal.org/node/224333#attached_js with a comment covering the caching issue. I think we're going to need a section in the module updates to make it clear this should be used for any renderable thing though.
Comment #24
Owen Barton CreditAttribution: Owen Barton commentedOooh yes, I like, I like!
Comment #25
David StraussI'm dedicating tonight to working on an ESI-compatible version of this that works for at least anonymous users. (That should not require anything special to work with Squid and Varnish.) Really, my final goal of ESI support for even authenticated users is too far down the road to worry about now.
Comment #26
scroogie CreditAttribution: scroogie commentedsubscribing
Comment #27
David StraussI'm starting to lose faith that it's possible to build a widely compatible ESI-based cache. If anyone has any great ideas for this issue (including Moshe's original one), don't let my investigations hold you up.
Comment #28
moshe weitzman CreditAttribution: moshe weitzman commentedFrom my perspective, folks who want to see progress here should focus on clearing the blockers I listed on #5.
Comment #29
catchChanging status.
Comment #30
moshe weitzman CreditAttribution: moshe weitzman commentedDoes it always follow that issues with dependencies are marked as postponed? Thats pretty draconian IMO. Furthermore, reviews on the architecture here are welcome.
Comment #31
catchI normally mark them postponed to make it obvious that they're blocked by the other issues, which is then a good reason to mark the blocking issues as critical, but maybe that's just me.
Comment #32
FiReaNGeL CreditAttribution: FiReaNGeL commented@davidstrauss: if you ever find the time, I think it would be a good thing to document (in another issue) the problems you encountered trying to implement an ESI-compatible version.
Comment #33
Frando CreditAttribution: Frando commentedBump. The patches in #5 both landed recently. Do you have a current patch, moshe?
Comment #34
sun.core CreditAttribution: sun.core commentedComment #36
moshe weitzman CreditAttribution: moshe weitzman commentedWorking on it.
Comment #37
moshe weitzman CreditAttribution: moshe weitzman commentedToo tired to keep going. Anyone is welcome to take this and run. I'm in real danger of simply not delivering before 9/1.
Comment #38
Frando CreditAttribution: Frando commentedOK. Attached is a new, improved patch, building on #16 but containing some new stuff.
The caching is not anymore based on a special #pre_render callback, but is native to drupal_render. All cache related properties are in a #cache property of an element that is an associative array.
Quoting from the documentation I added to drupal_render():
So: Elements do normally not specify a cache id directly but set cache 'keys' (for blocks that usually is array('module', 'delta')) and optionally 'granularity'. The latter is exactly the same as in the current block cache. For that to work out I moved the block cache constants to common.inc but did not yet rename them. How should they be named? DRUPAL_CACHE_PER_ROLE etc? This is an easy change, though.
Block cache is still unmodified and in place (but uses some code from common.inc now to create the IDs). I propose to just leave it in place.
Not that with this system, it is REALLY easy for contrib to change cache behaviour in hook_page_alter()!
All elements that load their actual data in a #pre_render hook can be made cacheable by just setting the #cache property on them in hook_page_alter(). It's also easily possible to modify caching behaviour for individual sites by modifying the 'granularity' and 'keys' properties of individual elements in hook_page_alter(). Would of course be nice to move more of core towards this system, but in this patch it's still only forum module blocks. I'd like to keep the rest to follow up patches.
Now, I will leave for vacation tomorrow morning so I won't be able to keep pushing this. Anybody interested? Would be really great to still get this in. We can still move more stuff towards this system after the freeze. But getting the base system in would be awesome.
Comment #39
Frando CreditAttribution: Frando commentedThis patch is the same as above but renames the block cache constants (so BLOCK_CACHE_PER_ROLE is now DRUPAL_CACHE_PER_ROLE etc) as they're in common.inc by now and used by drupal_render() caching as well. No other changes to #38.
Note that the patch appears much larger than it is, the actual changes are just about 100 LOCs in common.inc and a few lines in forum.module, the rest is just moving things around and some renames.
Comment #41
Frando CreditAttribution: Frando commentedForgot --no-prefix in the last patch. This one is better. The one in #38 is fine as well.
Comment #42
moshe weitzman CreditAttribution: moshe weitzman commentedHeroic work by Frando again. Many thanks. Your patch is such a great improvement.
This one is based on #39. I added a call to Query->preExecute() in the forum_block_view() so that we use a new cache key for every unique *altered* query. Now we properly respect node access.
Comment #43
moshe weitzman CreditAttribution: moshe weitzman commentedThis one only fetches from cache for GET and HEAD requests. Thats what block cache does and it is safer, without adding significant slowness.
Comment #44
moshe weitzman CreditAttribution: moshe weitzman commentedOops. Set the wrong status on last follow-up. Fortunately, bot already green-lit the latest patch.
I'm in IRC with Frando now, and we agree with RTBC so I am setting accordingly. I wrote half of this patch and he wrote the other. With this RTBC, I am approving his code and he is approving mine.
Comment #45
moshe weitzman CreditAttribution: moshe weitzman commentedmandatory text when i just want to add tags. i feel oppressed.
Comment #46
moshe weitzman CreditAttribution: moshe weitzman commentedMissed cache_get() in last patch when I restricted caching to just GTE and HEAD requests.
Comment #47
Dries CreditAttribution: Dries commentedI like this patch, but before this gets committed, I'd really like to see some benchmark results just to make sure it works and that we don't have any regression. Looking at the code, I don't expect any regression.
Comment #48
catchI don't believe just the active forum blocks being cached this way makes anything 10% faster, but there's certainly no regression:
Comment #49
moshe weitzman CreditAttribution: moshe weitzman commentedAs far as proving that it works,
* enable those forum blocks and post a couple forum nodes.
* enable devel module and enable the query log so it shows at bottom of page
* truncate cache table.
* refresh page once. search for the word 'forum' untl you see it in the query log. this happens because we had cache misses for the forum blocks
* refresh again. note that there are now no queries to the forum table. we had cache hits.
I tested all this and if I got it wrong, we could certainly do a bug fix patch after freeze. Just suggesting a way to save time for the committers and reviewers.
Comment #50
catchFound out why it's faster, PDO exception when forum block + block cache is enabled which bails everything out before rendering. Shoddy forum tests fail us again.
Comment #51
moshe weitzman CreditAttribution: moshe weitzman commentedFixing that exception
Comment #52
catchWas just a missing global $user so easy fix. I verified the caching is working with devel query log, and re-ran benchmarks to confirm no other issues. Back to RTBC.
Comment #53
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks!
Comment #54
Pasqualledocumentation
Comment #55
moshe weitzman CreditAttribution: moshe weitzman commentedWe didn't include the language and theme in cache keys. Attached patch fixes it. Also fixes an indentation snafu.
Comment #56
moshe weitzman CreditAttribution: moshe weitzman commentedComment #57
yched CreditAttribution: yched commentedMakes sense.
Comment #58
webchickShouldn't we add tests for this?
Comment #59
moshe weitzman CreditAttribution: moshe weitzman commentedFor me, thats a bit much. There are already tests for cache_set() and cache_set(). We would basically be testing that when we put in a series of keys into cache_set(), that the cached data is returned for same key but not for other key.
Comment #60
moshe weitzman CreditAttribution: moshe weitzman commentedWhat I meant to say, is that a test of this patch would be redundant with the cache_set/get tests.
Comment #61
catch#55 is still RTBC, but in profiling I noticed we're calling drupal_render_cache_get() and drupal_render_cid_create() for every element passed to drupal_render() whether it's cacheable or not - on a node with 50 comments this is ~ 1,000 function calls.
So this patch just adds an isset($elements['#cache']) to the cache check.
Comment #62
moshe weitzman CreditAttribution: moshe weitzman commentedcatch's patch loooks good. so, there are now 2 patches in this issue which are RTBC. the 2 are not especially related so i think it is OK to keep separate. both are very small.
Comment #63
Dries CreditAttribution: Dries commentedCommitted both patches. I still think it would be useful to have tests because this bug fixed in #55 is testable... marking to 'code needs work' for tests.
Comment #64
webchickTagging.
Comment #65
moshe weitzman CreditAttribution: moshe weitzman commentedI can't think of significant holes in test coverage. The block is tested in forum.test and the cache get/set system has own tests. Similarly, drupal_render tests #pre_render feature. We could add a test for granularity function but thats just a simple array builder.
Comment #66
catchDowngrading from critical, see #653074: Review test coverage / outstanding tests before release - I also don't think there's a particular hole in test coverage here - or at least, we have much more pressing gaps in forum.module specifically if we were to add it to there.
Comment #67
sun.
Comment #68
effulgentsia CreditAttribution: effulgentsia commentedAm I right in understanding that if drupal_add_*() is called from within a theme function or preprocess function, then this is incompatible with the render cache? And that therefore, if theme_foo() outputs markup that requires a CSS file, and if the way theme_foo() is called is via a structure like:
Then every caller of the above needs to add a #attached? That really breaks encapsulation, doesn't it? I think contrib will really struggle with this.
And related, if you have:
And if the parent is render cached, that too fails to load the CSS file, right?
Comment #69
moshe weitzman CreditAttribution: moshe weitzman commentedYou are correct on both counts. I think a good guideline is that direct calls to drupal_add_js() and drupal_add_css() are now discouraged. We prefer that folks use #attached instead. Can't think of a better resolution.
Your second point has an open issue that needs love - #859602: #cache does not record the #attached declared in child elements
Comment #70
moshe weitzman CreditAttribution: moshe weitzman commentedIMO render cache is not for contrib to worry about. It’s a tool for developers to speed up their own sites. Only they know what is personalized and what isn’t and thus what is cacheable and which key to use. So, the developer might have to use hook_page_alter() to add some #attached in your example. Ultimately, we need to get this right but I think we are OK ground for D7.
Comment #71
effulgentsia CreditAttribution: effulgentsia commentedHmmm. I'm troubled by the lack of encapsulation in #68.1. If foo.tpl.php outputs markup requiring certain CSS, how is every caller of it supposed to know that? I wonder if we can do something like add 'attached' to the theme registry, and then have drupal_render() transfer that info to $element['#attached']. The only trouble I see with that is that sometimes #theme is an array (or a string with "__" that's an implicit array), and drupal_render() shouldn't have to duplicate everything theme() does to handle that. But, the 2nd param to theme() can't be by reference, because sometimes, code calls theme('foo', array(...)), and PHP would choke on that if theme() tries to take $variables by reference. Does it make sense to add a 3rd param to theme that could be used for theme() to inform drupal_render() on what needs to be added to #attached? I know we'll fix this somehow in D8, but do you think anything along these lines is possible for D7? I've been starting to look at several contrib modules, and this problem is surfacing a lot. I don't think contrib module maintainers will move calls to drupal_add_*() out of the theme layer without some reasonable place to move them to.
Comment #72
moshe weitzman CreditAttribution: moshe weitzman commentedThat third param to theme() sounds really clever to me. We'd have to pass it along to the preprocess/process and theme hook/template that ultimately gets called.
Can you give a few examples of theme hooks that add css/js?
Comment #73
jhodgdonIs the "needs documentation" tag on this issue still relevant?
Comment #74
podaroksubscribe
Comment #75
catchDocumentation here: https://www.drupal.org/developing/api/8/cache