Cache node_load, clear on node_save. I am not too interested in this patch, I just wanted to get the ball rolling.

The status 'code needs work' would be adequate, but then people would assume I will work on it.

Comment | File | Size | Author
#196 | cache_node_load.patch | 28.67 KB | catch
#195 | benchmarks.txt | 6.23 KB | catch
#191 | cache_node_load.patch | 28.96 KB | catch
#187 | cache_node_load.patch | 29.48 KB | catch
#183 | cache_node_load_28.patch | 29.48 KB | Berdir
#181 | cache_node_load.patch | 28.79 KB | catch
#179 | cache_node_load.patch | 28.86 KB | catch
#176 | cache_node_load.patch | 28.91 KB | catch
#174 | cache_node_load.patch | 28.85 KB | catch
#173 | cache_node_load.patch | 28.82 KB | Anonymous (not verified)
#170 | cache_node_load.patch | 28.85 KB | Anonymous (not verified)
#168 | cache_node_load.patch | 28.86 KB | catch
#163 | cache_node_load.patch | 29.32 KB | catch
#158 | cache_node_load.patch | 28.48 KB | catch
#156 | cache_node_load.patch | 28.48 KB | catch
#153 | cache_node_load.patch | 28.48 KB | catch
#151 | cache_node_load.patch | 29.13 KB | catch
#148 | cache_node_load.patch | 29.13 KB | catch
#141 | cache_node_load.patch | 28.52 KB | catch
#137 | cache_node_load.patch | 22.29 KB | catch
#134 | cache_node_load.patch | 21.42 KB | catch
#133 | cache_node_load.patch | 20.21 KB | chx
#131 | cache_node_load.patch | 21.43 KB | catch
#129 | cache_node_load.patch | 20.84 KB | catch
#124 | cache_node_load.patch | 23.79 KB | catch
#122 | cache_node_load.patch | 23.8 KB | riccardoR
#121 | cache_node_load.patch | 23.8 KB | catch
#119 | cache_node_load.patch | 23.59 KB | riccardoR
#116 | cache_node_load.patch | 23.79 KB | catch
#116 | cache_node_load.patch | 23.79 KB | catch
#110 | cache_node_load.patch | 22.75 KB | catch
#107 | cache_node_load.patch | 21.12 KB | catch
#105 | cache_node_load.patch | 17.83 KB | catch
#98 | cache_node_load.patch | 17.71 KB | catch
#92 | mw.patch | 18.21 KB | moshe weitzman
#88 | mw.patch | 14.52 KB | moshe weitzman
#84 | mw.patch | 14.38 KB | moshe weitzman
#82 | mw.patch | 11.23 KB | moshe weitzman
#75 | node_cache_3.patch | 10.58 KB | Anonymous (not verified)
#74 | node_cache_2.patch | 10.68 KB | Anonymous (not verified)
#73 | node_cache-111127-73.patch | 11.21 KB | floretan
#72 | mw.patch | 11.75 KB | moshe weitzman
#71 | node_cache.patch | 10.05 KB | Anonymous (not verified)
#67 | node_cache.patch | 9.54 KB | Anonymous (not verified)
#62 | node_load_rollback_1.patch | 12.17 KB | Gábor Hojtsy
#59 | no_load_cache.patch | 585 bytes | chx
#42 | node_load_rollback.patch | 9.97 KB | chx
#40 | node_load_18.patch | 1.21 KB | Gábor Hojtsy
#39 | node_load_17.patch | 3.01 KB | Gábor Hojtsy
#38 | node_load_16.patch | 1.73 KB | chx
#30 | node_load_15.patch | 9.43 KB | chx
#28 | node_load_14.patch | 11.94 KB | pwolanin
#24 | node_load_13.patch | 8.72 KB | chx
#19 | node_load_test.tgz.txt | 440 bytes | chx
#18 | node_load_12.patch | 8.35 KB | chx
#15 | node_load_11.patch | 9.13 KB | yched
#14 | node_load_10.patch | 8.43 KB | chx
#10 | node_load_9.patch | 10.11 KB | chx
#9 | node_load_8.patch | 3.06 KB | chx
#3 | node_62_0.patch | 1.42 KB | killes@www.drop.org
node_62.patch | 1.39 KB | chx

Comments

killes@www.drop.org’s picture

some discussion revealed that:

poll.module needs to be patched to do user specific stuff in hook_view not hook_load. I already discussed this with Steven.

comment.module will need to invalidate the node cache if a comment is added.

This patch is likely to give us gains for complicated node types a la cck. I am not sure how much it duplicates cck's inbuilt cache.

The update path is missing too.
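A minimal sketch of the comment invalidation being described, assuming the patch's cache_node bin (where exactly to hook this in, e.g. on comment save, is left open here):

// When a comment is saved against node $nid, drop the cached copy of that
// node so the next node_load() rebuilds it with fresh comment data.
cache_clear_all($nid, 'cache_node');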

killes@www.drop.org’s picture

For an earlier attempt to get this into core and related discussion see here:

http://drupal.org/node/8506

killes@www.drop.org’s picture

FileSize
1.42 KB

patch needed to be fixed wrt cache usage.

killes@www.drop.org’s picture

I have done some testing with simple node types (story, forum, page) and there was no performance improvement. This was expected since there are hardly any extra queries performed when loading these node types.

moshe weitzman’s picture

i would recommend looking into this in the context of moving field modules into core (i.e. CCK). there isn't much point without those, and CCK has thought about this a lot already.

m3avrck’s picture

Would it make sense to move this caching into a *node specific* cache table?

It seems a site like d.o, which has 100,000+ nodes, could fill up this cache really fast. Having it in its own table might be more efficient.

killes@www.drop.org’s picture

Good morning, Ted:

+ if (!isset($nodes[$param]) && ($data = cache_get($param, 'cache_node'))) {

:)

m3avrck’s picture

Err, yes yes! I was looking in the patch for the install file changes to create that table, doh :-)

Splendid patch though.

chx’s picture

FileSize
3.06 KB

Hmm, this is a bigger boost than I imagined. I benchmarked this on NowPublic, where we saw 50% and even 75% lower page load times, but note that NP uses memcached.

chx’s picture

FileSize
10.11 KB

I upped the wrong file -- again! -- and also note that since NP is a D5 site, it is of course not this exact patch that was tested but our variant of it. The concept is the same, though.

chx’s picture

The user module changes are feature creep. Most of them are whitespace trims, plus one actual bugfix: user_edit clears the cache but user_delete does not; that's fixed now.

Gábor Hojtsy’s picture

1. The node schema has one too many empty lines now with this patch.
2. You repeat the is_object()... line twice. I would do:

+      if (!isset($nodes[$param])) {
+        if ($cache = cache_get($param, 'cache_node')) {
+          $nodes[$param] = $cache->data;
+        }
+      }
+      // Either the node was already cached or we loaded it in:
+      if (isset($nodes[$param])) {
+        return is_object($nodes[$param]) ? drupal_clone($nodes[$param]) : $nodes[$param];
+      }

3. The system update does not look like being up to schema API standards to say the least.

That's it. Otherwise it would be nice to see a simple benchmark, after which I could say: well done :)
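For context, a Schema API-style definition of the new cache table might look roughly like this (a sketch mirroring core's standard cache bins; the actual patch may differ in details):

// Sketch of a Schema API definition for the new bin.
$schema['cache_node'] = array(
  'description' => 'Cache table for fully built node objects.',
  'fields' => array(
    'cid' => array('type' => 'varchar', 'length' => 255, 'not null' => TRUE, 'default' => ''),
    'data' => array('type' => 'blob', 'not null' => FALSE, 'size' => 'big'),
    'expire' => array('type' => 'int', 'not null' => TRUE, 'default' => 0),
    'created' => array('type' => 'int', 'not null' => TRUE, 'default' => 0),
    'headers' => array('type' => 'text', 'not null' => FALSE),
    'serialized' => array('type' => 'int', 'size' => 'small', 'not null' => TRUE, 'default' => 0),
  ),
  'indexes' => array('expire' => array('expire')),
  'primary key' => array('cid'),
);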

moshe weitzman’s picture

sounds reasonable to me. we need to add code that flushes the cache when a module is enabled/disabled, or does the install system do that already?

chx’s picture

FileSize
8.43 KB

Reroll.

yched’s picture

FileSize
9.13 KB

Rerolled with Gabor's comments in #12 fixed.
So all this needs now is a benchmark ?

chx’s picture

yes please!

yched’s picture

PS: moshe's comment #13 was addressed by chx's latest patch.

I'm not really equipped to benchmark this myself. Anyone ?

chx’s picture

FileSize
8.35 KB

Same as yched's patch, a typo in a comment is fixed (thanks pwolanin) and -p added.

chx’s picture

FileSize
440 bytes

A little test module for benching.

pwolanin’s picture

quick bench using ab -n100 -c5 http://127.0.0.1/drupal6/
Server Software: Apache/1.3.33, PHP 4.4.7, MySQL 5.0.45 , Mac 10.4.9

hitting the front page (/node) with 30 nodes of a mix of types (book, forum, page, etc).

without patch:
Requests per second: 5.66 [#/sec] (mean)
Requests per second: 5.62 [#/sec] (mean)
Requests per second: 5.62 [#/sec] (mean)

with patch:
Requests per second: 6.44 [#/sec] (mean)
Requests per second: 6.40 [#/sec] (mean)
Requests per second: 6.25 [#/sec] (mean)

node load/edit/save works fine.

pwolanin’s picture

note - these benchmarks are for the 2nd and later times running ab. There is, of course, a pump-priming effect so the first time running it with the patch is about the same or even slower than without the patch.

pwolanin’s picture

enabled modules:
core-req + actions, book, color, comment, dblog, forum, help, menu, taxonomy, update, devel, devel_generate, simpletest

Note, with the patch simpletest finds a warning when running either the "Page node creation" or "Unauthorized page view" tests:

Unexpected PHP error [mysql_real_escape_string() expects parameter 1 to be string, array given] severity [E_WARNING] in [drupal6/includes/database.mysql.inc line 350]

pwolanin’s picture

this error seems to be generated by calling node load like:

 $node = node_load(array('title' => $edit['title']));

chx’s picture

FileSize
8.72 KB

Fixed. I used $cachable to decide on caching but that's not enough. I now cache only node_load($nid), which is 99% of the cases anyway. It's not really feasible to database-cache node_load(array()) because it can be dependent on user, time, whatever. Thanks for the report and the benchmark.
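Roughly the guard being described, as a sketch (variable names are illustrative, and the revision check is an assumption):

// Only plain node_load($nid) calls go through the persistent cache;
// node_load(array(...)) results can depend on the user, time, etc.
$cachable = is_numeric($param) && empty($revision);
if ($cachable && !isset($nodes[$param]) && ($cache = cache_get($param, 'cache_node'))) {
  $nodes[$param] = $cache->data;
}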

pwolanin’s picture

one more fix: the relevant node cache needs to be cleared in this function:

http://api.drupal.org/api/function/book_outline_form_submit/6

in fact, it should probably call cache_clear_all() as well.
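Something along these lines in that submit handler, assuming the patch's cache_node bin (a sketch, not committed code):

// The outline change affects this node's cached node_load() data...
cache_clear_all($node->nid, 'cache_node');
// ...and the rendered page/block caches, as suggested above.
cache_clear_all();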

yched’s picture

There is still a TODO in translation.module ?

pwolanin’s picture

In addition to the book module fix-ups, I think some of the function calls are missing a parameter:
cache_clear_all('*', 'cache_node');

should probably be: cache_clear_all('*', 'cache_node', TRUE);

pwolanin’s picture

FileSize
11.94 KB

revised patch - adds the last param to cache_clear_all() and more cache clearing in book.module and menu.inc. Removes spurious change to index.php.

Please review.

Gábor Hojtsy’s picture

- how are the menu module changes related?
- how are the poll module changes related?
- as yched said, there is still a TODO in translation module

chx’s picture

FileSize
9.43 KB
  • They are not, I removed 'em. Another patch they will be.
  • it was doing logic in load which belongs to view. Outdated little module, poll is.
  • Added better TODO.

Gábor Hojtsy’s picture

Status: Needs review » Fixed

Great, committed! (I think this will be loved by people with lots of "xmas tree hangings" on their nodeapi load hooks).

eaton’s picture

I'm very concerned that this change will have a HEAVY impact on large, high-traffic sites. If I understand it correctly, it will be saving to the cache table *every time a node is loaded*, and clearing the node cache *every time a node is saved*, correct?

Unless I'm missing something, this will have a detrimental effect on high-traffic sites unless the 'only clear the cache of nodes that actually changed' code goes in.

moshe weitzman’s picture

"every time a node is loaded" will be infrequent because of this very patch. we are replacing the node load queries with a cache_get().

we already wipe most caches when any node is saved. not sure why that would be worrisome. please elaborate.

eaton’s picture

I think this was a bit of a false alarm; I just noticed that the code in node_save() only clears the individual node. This needs to be documented VERY clearly, though, or we'll experience a world of pain as modules convert over.

Repeat: this is HUGE in terms of unpredictable, unexpected behavior. node_load() was previously fresh per page load, and now it's not. Red flashing lights!

Gábor Hojtsy’s picture

Status: Fixed » Needs work

Yes, that definitely needs to be documented in the update docs at least. Excuse me for not setting this to "needs work" (as usual after a core commit when the update docs need updating).

eaton’s picture

we already wipe most caches when any node is saved. not sure why that would be worrisome. please elaborate.

My concern was more serious before I realized that only multi-node operations cleared the ENTIRE node cache. Basically, node_load() has always been a 'read-only' operation. While there may be lots of queries, at least we knew that the flow of data was only going in one direction.

This patch ensures that any fresh node_load() will go in TWO directions. It will pull in data, and write to the cache. That makes node loads even MORE expensive, but the payoff is that the next load for that node will be faster -- unless it gets cleared. This means that the frequency with which the node cache is cleared becomes CRITICAL. If it's cleared too frequently, we never get the benefits of the cached loads, and everything just gets more expensive. In addition, if I understand properly, anyone who's trying to do a master/slave setup has to deal with *loads* writing to the master DB.

None of these are obvious showstoppers, but I worry that we don't understand the impact of the change on large sites with high traffic. One contrib module that flushes this cache relatively frequently could bring a site to a crawl.

pwolanin’s picture

I agree with eaton about the frequency of clearing the node cache. So, a related question - is it worth running extra queries to determine exactly which nodes need to be cleared (for example in book module or the translation module) rather than doing a blanket clear of the node cache? It seems running an extra query and then potentially deleting a bunch of single items from the node cache is going to be pretty expensive too.

Also, the issue for the menu.inc part of the patch I rolled (which chx cut out as unrelated) - indeed it relates to clearing the block and page cache, not the node cache: http://drupal.org/node/170514

chx’s picture

Status: Needs work » Needs review
FileSize
1.73 KB

Followup patch, reported by Gabor. Translations fixing.

Gábor Hojtsy’s picture

Status: Needs review » Needs work
FileSize
3.01 KB

Great. Modified the patch to get it documented. Now we have the translation set caches flushed only if we update a node in the set, not all cached node_loads. (I added some documentation and fixed one indentation problem I found on the way).

Setting it to needs work, so it gets documented.

Gábor Hojtsy’s picture

FileSize
1.21 KB

chx and I discussed this and came up with some documentation for the node_load cache. I also ended up documenting the internal cache, so people know what the effect of both is, how they are invalidated and what to put into the load operations. Committed the attached patch.

It still needs to be documented in the update docs, but I leave this to others :) Also, anyone is welcome to improve the documentation further.

pwolanin’s picture

I'd still like some feedback in terms of book module and whether it makes sense to make the cache clearing there more targeted too.

chx’s picture

Title: Cache node_load » Roll back cache node_load
Category: task » bug
Priority: Normal » Critical
Status: Needs work » Reviewed & tested by the community
FileSize
9.97 KB

OK this needs more consideration. Hope next time folks will review the patch BEFORE it's in.

chx’s picture

Status: Reviewed & tested by the community » Needs review

Well, let's set this to review and see what people have on it.

chx’s picture

Motivation for rollback: obviously the huge performance benefit is not enough to get people to move their dynamic code from load to view.

seanr’s picture

I'm the type of person who really needs the performance benefit. Shouldn't we just put a big bold item in the update docs saying "Put dynamic content in hook_view"? Don't at all agree this should be rolled back. Which modules are currently suffering as a result of this patch? I'd suggest we help roll patches for them instead of rolling this back.

chx’s picture

Status: Needs review » Needs work

Back to original -- i.e. needs documentation. I was never serious about rolling back.

Morbus Iff’s picture

Hrm. Say I had a poll.module of my own (not core, but this example exists merely because everyone knows what a poll is). Said poll has 45 votes on it, and is growing rapidly. Are those 45 (46, 47, 48, etc.) votes "data" about the node that should be loaded in accurately, or merely visual garbage to be added at view time? I argue that they're data that should always be accurate at node_load() time - I would hate to have to sit on a 'view' to find out how many votes have been cast when I may not actually care about displaying the node at all. How does this cache handle this consideration?

Gábor Hojtsy’s picture

Title: Roll back cache node_load » Cache node_load

Morbus, first sentence in the issue: Cache node_load, clear on node_save.

Morbus Iff’s picture

Title: Cache node_load » Roll back cache node_load

I've also some general concerns about parent/child relationships between different node types. I've built various internal node types that have specific relations to each other - a Card is a child (subclass, etc.) of an Expansion. As such, any time a Card is loaded, I also add to the $card_node the results of an $expansion_node node_load() - in almost every use case, a Card required the Expansion data as well, so I wanted to save duplicate code by loading the Expansion in always. With the above cache, Expansion data could get stale within the cached Cards - I'd have to include a cache clear on Cards for every Expansion save (or else, a node_load then node_save). Not as worrisome as "I consider vote counts to be node_load data, not node_view" (above) certainly, but another scenario that may be valid for the forthcoming documentation.

Morbus Iff’s picture

Gábor: irrelevant. A poll vote is user-contributed supplementary data to the poll - it has a similar data "feeling" as a user comment. It doesn't change the bulk of the node itself - it doesn't change the body, or the question, or the options, or the title of the node. It is merely a statistical increment about the node itself. I would hate to spend a whole node_save to increment a poll vote by one (any more than we should see a full node_save to increment, say, the node "Views" statistic by one, or to issue a node_save when one comments).

Morbus Iff’s picture

Title: Roll back cache node_load » Cache node_load

Our comments crossed. Resetting title back.

Morbus Iff’s picture

Gábor: incidentally, voting in core's own poll.module doesn't use node_save either (see poll_vote for the direct updates) - it instead issues a clear against the entire cache. Is that the suggestion for designs like the above (both the poll incrementing and the parent/child relationship)? Fine, if it is, but it should be part of the forthcoming documentation.

Morbus Iff’s picture

We probably want to change the cache_clear_all() inside poll_vote() to something like cache_clear_all($node->nid, 'cache_node'), maybe? I'm not set up in any way for Drupal 6 development, so can't test or create a patch. It would /seem/ to make sense, however, based on the existing changes this patch added.

drewish’s picture

i think this patch is what broke upload: http://drupal.org/node/168813
nodeapi op 'prepare' is being called after 'view'. i'm not up to speed on this enough to know what the right way to fix that would be.

pwolanin’s picture

@Morbus Iff : cache_clear_all() is called by node_save(), comment_save() and lots of other places - it clears the page and block cache NOT the node, menu, filter, etc caches. It's called anytime anything changes (pretty much)

Crell’s picture

As just mentioned on the dev list before I saw this thread...

I'm finishing up a site now where most of my node_load() calls are NOT for displaying the node directly. They're loaded for their data. Data should not be loaded in op view. op view is for viewing. op load is for loading. There is no reason why you will always be viewing after loading. The site I'm on now loads all the time without doing a view op. Saying "well just do your load in the view" is a bad answer, since that's confusing two decidedly different pieces of the system (data vs. rendering).

moshe weitzman@drupal.org’s picture

The motivation here is to create a cache layer for often-static stuff. We have chosen the 'load' step for that. I think that's reasonable.

Then we said that if you have dynamic info in $node you need to set that later. But the only later opportunity that we offer is 'view' which is not good, as you point out.

So, perhaps we need a new nodeapi op that happens after load. The results of that are not cached. In a way, that's a lot like the D5 menu system, which caches $_menu but gives a chance to change it during the !$may_cache phase.

An alternative solution would be to check for the presence of $node->node_nocache property and avoid caching if it is TRUE. Modules with dynamic load info could choose to set that.

chx’s picture

Then, I think we really need to roll this back. A new nodeapi op or a new column in node definitely does not fit into the freeze. Of course, we might want to say "hey, the performance boost is worth it" and then I will be happy to roll something -- but I smell D7 here. Rollback patch above.

chx’s picture

Status: Needs work » Needs review
FileSize
585 bytes

Then I am wrong, says Moshe. One can set a $node->no_load_cache property in hook_load or nodeapi op load and then dynamic stuff won't be saved.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

yes, short and sweet.

Dries’s picture

Personally, I'd roll back this patch and postpone it to Drupal 7. IMO, it should not have been committed in the first place.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Fixed
FileSize
12.17 KB

Agreed. This needs more discussion and breathing room to get accepted or massaged into a state where it is applicable.

Rolled back the patch with chx's patch, slightly modified to also roll back the node_load doc block I added about this, and reworded the system update removal notice so it does not suggest this *will* be part of Drupal 7, but only that it is up for discussion. (Actual committed patch attached.)

Let me know if there is anything remaining which was not rolled back, but should have been.

Anonymous’s picture

Status: Fixed » Closed (fixed)

catch’s picture

Version: 6.x-dev » 7.x-dev
Status: Closed (fixed) » Needs work

I don't think this should've been closed - it was the rollback that was fixed rather than the original issue; reset if I'm wrong.

moshe weitzman’s picture

Would be terrific if someone resubmits this patch. It has decayed a little but not too badly. I think my suggestion to add a $node->no_load_cache property (see #59) addresses the mentioned concerns.

Anonymous’s picture

subscribing.

Anonymous’s picture

Assigned: Unassigned »
Status: Needs work » Needs review
FileSize
9.54 KB

here's a first cut at reviving this patch for drupal7 (including moshe's suggestion at #59).

tested this lightly on a fresh install (create/update/edit nodes and comments), and it seems to work as advertised. i haven't tested the translation code.

not 100% sure i've updated this faithfully, but i'm happy to keep working on this and see it through until it's committed/rejected.

moshe weitzman’s picture

Thanks much for updating this. I had a read through the patch and found minor doc issues:

- "The cache API based cache stores all nodes throughout requests," -- how about "The cache API based cache persists loaded nodes across page requests"?
- I prefer positive names, so I would rename $node->no_load_cache to $node->load_cache and abort caching if $node->load_cache === FALSE
- $node->load_cache needs to be added to the doxygen for node_load()

I will try to do a functional review soon.

pwolanin’s picture

Do we want to postpone this until we have a sense of how or whether node fields will be in core? That would seem like it might have substantial impact on whatever caching approach we want to take.

moshe weitzman’s picture

This is a small patch that can be changed by the fields patch if that materializes and gets committed. I don't believe in postponing useful patches in favor of non-existent, not-yet-ready patches.

Anonymous’s picture

FileSize
10.05 KB

updated patch as per moshe's comments in #68 attached.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
11.75 KB

I created and edited nodes and used books and added comments and could not get this to misbehave. Update path works too.

I only added two cache clears, when submitting or cancelling a poll vote. Code looks good. RTBC.

To summarize, any module can disable caching of any given node as needed. That addresses the one significant objection so far. Benchmarks have shown that this is a small performance boost for a plain core site. It will be much more impressive for a fully loaded site with oodles of contrib modules that perform various nodeapi load operations.
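For contrib authors, a rough sketch of how a module might opt its nodes out of the persistent cache under this approach (the helper mymodule_count_votes() is a made-up placeholder; the flag name follows moshe's rename above, earlier patches used no_load_cache instead):

// Illustrative only: a node-type module whose nodes carry per-user data.
function mymodule_load($node) {
  $additions = new stdClass();
  // Hypothetical per-user value that must never be served from the cache.
  $additions->votes = mymodule_count_votes($node->nid);
  // Opt this node out of the persistent node_load() cache.
  $additions->load_cache = FALSE;
  return $additions;
}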

floretan’s picture

FileSize
11.21 KB

The doxygen comment for translation_node_get_translations() at the very end of this patch seems to be some unwanted leftover from the many patches/rollbacks, the changed sentence is false. Same patch as above, but with that last change removed.

Anonymous’s picture

FileSize
10.68 KB

updated patch to keep up with HEAD, no functional changes.

Anonymous’s picture

FileSize
10.58 KB

updated patch to keep up with HEAD, no functional changes.

anyone?

moshe weitzman’s picture

Category: bug » feature
Priority: Critical » Normal

Rerolled for 2 failed hunks and retested. Works fine.

moshe weitzman’s picture

Status: Reviewed & tested by the community » Needs work

This failed the revisions simpletest and the problem is in the patch. I will investigate.

moshe weitzman’s picture

Status: Needs work » Needs review

rerolled with revisions fix and a book outline fix. unit tests are super useful (if you can stand to wait for them to finish)

moshe weitzman’s picture

trying to get patch to attach

moshe weitzman’s picture

I uploaded patch to http://tejasa.com/mw.patch

R.Muilwijk’s picture

patch is not working on your site, moshe

moshe weitzman’s picture

FileSize
11.23 KB

Here is as a local upload.

R.Muilwijk’s picture

Status: Needs review » Needs work

Unfortunately this breaks some tests. I ran all the node-related module tests. Please fix; until then, 'code needs work'.

It breaks:
Poll vote
Taxonomy nodeapi
Translation nodeapi

moshe weitzman’s picture

Status: Needs work » Needs review
FileSize
14.38 KB

Fixed the test failures. The remaining failures happen before this patch as well.

R.Muilwijk’s picture

Status: Needs review » Needs work

The tests are fine now... I do however have some comments about the book part

book/book.module:489 and book/book.pages.inc:216 We just have one node here? Why clear the whole cache?

R.Muilwijk’s picture

Status: Needs work » Needs review

Hmmz, I did some more testing and see why you do the cache_clear_all() now. You need to clear all the parents and the book too.
Just hope we don't add/delete a lot of book pages then, because we'd have to build the cache all over again.

The tests are all working and I can't find any strange stuff in the patch. Looks RTBC to me. If no one reviews in a few days I'll set it to RTBC.

EDIT: It might be good though to comment the cache_clear_all()'s in book so everybody understands them and your code stays readable.

catch’s picture

I also ran through all tests and had a quick look through the code - everything looks good. Probably worth putting an extra comment in book to make it clear what's happening, so RTBC with those - just needs a quick re-roll to put them in.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
14.52 KB

Added comments to book module's cache flush.

catch’s picture

quick note to confirm this applies cleanly, causes no test failures etc.

R.Muilwijk’s picture

Applied fine, tests are good, RTBC

Morbus Iff’s picture

For what it's worth, the load_cache added here, and the poll submission changes, address all my concerns (#49, #50, #52), so I have no further qualms about the implementation.

moshe weitzman’s picture

Assigned: » moshe weitzman
FileSize
18.21 KB

Fixed a new failure with tracker.test and reworked poll a bit to fix a menu bug that I had introduced in the last version of this patch.

sun’s picture

Status: Reviewed & tested by the community » Needs work

Once the static flag $node->load_cache is set, a node load will not be cached, even if it is just one property that cannot be cached. Now consider a contrib module that acts on all nodes and cannot cache its properties. This flag reminds me of the 'no_cache' op for filters, which is equally inflexible (but reasonable there). However, I don't see why we do not want to allow e.g. 90% of a node to be cached, and only the rest (if at all) uncached - by caching op 'load' and loading any non-cacheable data in a subsequent op 'after_load' (or 'load_uncached' or similar).

catch’s picture

Status: Needs work » Needs review

@sun, while that sounds like a good idea - is it not something which could be done in a followup patch?

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

@Sun - that's out of scope for this patch. We don't do partial caching of objects anywhere I can think of... The prior status was RTBC.

chx’s picture

+  // Subtract from the votes.
+  poll_allow_votes($node);

This together does not mesh, as the function does not subtract anything. Looks like a copy-paste error.

poll_allow_votes should return (besides setting) ->allowvotes so the menu access callback can be return user_access($perm) && ($node->type == 'poll') && !$inspect_allowvotes || poll_allow_votes($node));.

-  $header[] = array('data' => t('Vote'), 'field' => 'pc.weight');

Why is this here?

chx’s picture

Status: Reviewed & tested by the community » Needs work

forgot to update status.

catch’s picture

Category: feature » task
Status: Needs work » Needs review
FileSize
17.71 KB

Now that #324313: Load multiple nodes and terms at once is in I revisited this. Overall it takes much less special casing to add the extra hook (provisionally hook_nodeapi_post_load()) so I went with sun's suggestion from #93 and implemented the hook in poll.module to load user-specific information. Seems like you might want that kind of information when you're not actually viewing a node, so worth loading it earlier on in the process, closer to how it's done at present.

This incorporates #333171: Optimize the field cache as well - meaning the front page will load everything with one cache_get_multiple().

Probably still some tidying up to do, but enough there for review/benchmarking to see if it's in the right direction.
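For reference, an implementation of the provisional hook might look roughly like this (a sketch; _poll_current_user_can_vote() is a made-up placeholder, not a real poll.module function):

/**
 * Sketch of hook_nodeapi_post_load(): add per-user data after the
 * persistent cache, so it is never written to {cache_node}.
 */
function poll_nodeapi_post_load($nodes) {
  global $user;
  foreach ($nodes as $node) {
    if ($node->type == 'poll') {
      // Placeholder helper: whether this user may still vote on this poll.
      $node->allowvotes = _poll_current_user_can_vote($node, $user);
    }
  }
}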

catch’s picture

Benchmarked, looks like it's about 12% improvement on a bloated front page, and the same for an individual node:

ab -c1 -n500 with 5000 nodes, 5000 taxonomy terms, 30 nodes on the front page.

/node
HEAD:

Requests per second: 5.27 [#/sec] (mean)
Time per request: 189.764 [ms] (mean)
Requests per second: 5.35 [#/sec] (mean)
Time per request: 187.017 [ms] (mean)

Patch:
Requests per second: 5.93 [#/sec] (mean)
Time per request: 168.534 [ms] (mean)
Requests per second: 5.89 [#/sec] (mean)
Time per request: 169.785 [ms] (mean)

/node/10
HEAD:
Requests per second: 13.45 [#/sec] (mean)
Time per request: 74.339 [ms] (mean)
Requests per second: 13.66 [#/sec] (mean)
Time per request: 73.210 [ms] (mean)

Patch:
Requests per second: 15.21 [#/sec] (mean)
Time per request: 65.766 [ms] (mean)
Requests per second: 15.32 [#/sec] (mean)
Time per request: 65.274 [ms] (mean)

David Strauss’s picture

I like this patch. As a note to people about the cache warming question, in most cases, when a user saves a node, they go to the viewing page for that node. This viewing takes care of warming the cache.

It might not be a bad idea to only cache nodes that are published. Unpublished nodes should not be kicking published ones out of a cache that has finite space.

moshe weitzman’s picture

Code looks good. Note that creating the new cache_node table can be briefer. Here is how block.install does it:


$schema['cache_block'] = drupal_get_schema_unprocessed('system', 'cache');
$schema['cache_block']['description'] = 'Cache table for the Block module to store already built blocks, identified by module, delta, and various contexts which may change the block, such as theme, locale, and caching mode defined for the block.';

David Strauss’s picture

Status: Needs review » Needs work

Per Moshe's post, please replace the cache table creation code with the brief version.

catch’s picture

Status: Needs work » Needs review

Moshe, the full schema definition is just in the update function - looks like cache_block also does that in update_fix_d6_requirements(). If we can load the schema in the update function itself that'd be great though - is this what you meant?

I noticed that cache_get() returns the full row from the cache_foo table, whereas my conversion to multiple only really wants $item->data. We'll have to get the whole lot then pull ->data out I guess, so marking to needs work until I can re-roll for this. If someone can clarify cache/schema updates that'd be great - I guess I could just try it and see if it works though.

catch’s picture

Status: Needs review » Needs work

crossposted.

catch’s picture

Status: Needs work » Needs review
FileSize
17.83 KB

Re-rolled with brief update function and the cache_get changes mentioned earlier. Also re-ordered some stuff in node_load_multiple so that we can do node_load_multiple(array($nids), array('type' => 'article')) and still grab stuff from the persistent cache (also saves one foreach loop).

I'm getting very, very odd failures with path.test - says that node isn't being created, but manual testing works fine, as does the identical call to createNode() earlier in the test - marking as needs review to run past the test bot.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

FileSize
21.12 KB

Replaced path.test's home grown createNode function with drupalCreateNode() and it works fine now. Also manually replicated the test and everything's working as you'd expect.

The extra hook is now called hook_nodeapi_load_alter() - 'Act on nodes after they've been loaded from the database' (and the database is now either the cache_node table or direct query) - and poll.module implements it. I think this is fairly clear in what it does and what it should be used for, but needs review obviously.

Also incorporated David Strauss' suggestion to only cache published nodes.

Could do with some independent benchmarks and another close look through.

catch’s picture

Status: Needs work » Needs review

swentel’s picture

At this point, benchmarks make my setup slower, which isn't our goal I guess ..

2 other things:
- shouldn't we clear the cache_node table when we install a new module (especially if it has a nodeapi_load implementation)?
- drupal_flush_all_caches() doesn't flush cache_node, I'd put that there too

catch’s picture

FileSize
22.75 KB

- added cache_node to drupal_flush_all_caches()
- added node_modules_enabled() and node_modules_disabled() - these only clear cache if either hook_load() or hook_nodeapi_load() is implemented.
- fixed a silly bug where I wasn't actually unsetting the $nids for nodes loaded from the persistent cache - so it was going off and grabbing them from the database too.

swentel’s picture

Ok, benchmarks now show an increase of performance (even on a very old laptop)!
Done with 10 nodes on front page, comments & url aliases, no caching enabled.

Without
Requests per second: 15.09 [#/sec] (mean)
Time per request: 66.276 [ms] (mean)
Requests per second: 15.37 [#/sec] (mean)
Time per request: 65.063 [ms] (mean)

With
Requests per second: 16.48 [#/sec] (mean)
Time per request: 60.679 [ms] (mean)
Requests per second: 16.38 [#/sec] (mean)
Time per request: 61.046 [ms] (mean)

Very interesting patch for logged-in users too, and when you have lots of nodeapi_load hooks it will surely make your database much happier.

catch’s picture

For those who read through the whole issue, here's a quick summary of stuff that might need clarifying:

The big performance gains shown in initial benchmarks won't be as much now since multiple load saved us quite a lot on the front page - however - it's still 10+% gain on the front page (and likely extra with memcache), and we get around 10% on node/1 too - whereas multiple load didn't buy anything there. Still very good, but important to note that node_load got faster rather than caching getting slower here :)

Eaton's concerns about the extra expense of an uncached node_load() are partially dealt with by multiple load - an uncached node_load() is less expensive than it used to be. Perhaps in a followup patch we could look at multi insert cache_set() as well... not sure how much support there is for that in dbtng yet.

The extra hook added was suggested several times from around #47 onwards but was too late for D6 code freeze, I think Morbus' points about things we might not want to cache persistently in the node object still being node data - votingapi, flag etc. still stand. And hook_nodeapi_load_alter() seems a reasonably named hook for modules to do that sort of thing in.

The current patch still does a cache_clear_all when saving a book outline - pwolanin, do you have ideas on how we could make that less disruptive? Seems like a select query would be a lot less work than re-building the cache.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review

Resetting due to bot issues.

catch’s picture

Status: Needs review » Needs work

pwolanin informs me that SELECT nid from book WHERE bid=%d will work fine for getting all nids in a book - so will re-roll for that change, hopefully tomorrow. CNW for now.

catch’s picture

Status: Needs work » Needs review
FileSize
23.79 KB
23.79 KB

Now with more intelligent book cache clearing. No other changes - that's the main small thing that needed fixing up from earlier patches, so it's ready for proper review now I think. There's one kitten in there with the test which I'll try to split off later if I get time.

catch’s picture

Path test cleanup is over at #348742: Path test should use drupalCreateNode() now.

riccardoR’s picture

FileSize
23.59 KB

I think there are two typos at lines 231 and 242 of the patch within the two new functions node_modules_enabled() and node_modules_disabled()

+      cache_clear_all('*', 'node', TRUE);

should be:

+      cache_clear_all('*', 'cache_node', TRUE);

The patch applies with offsets.
Rerolled it against HEAD with the two corrections above.

riccardoR’s picture

cache_get_multiple() returns an array of items indexed by cid, whose value can either be cache data or FALSE, as returned by _cache_prepare_item()

node_load_multiple(), however, skips database loading for all items returned by cache_get_multiple() without checking if they contain cache data or not.

  // Load any available nodes from the persistent cache.
  $cached_nodes = array();
  if ($nids && !$vid && !$conditions) {
    $cache = cache_get_multiple($nids, 'cache_node');
    foreach ($cache as $item) {
      $cached_nodes[$item->cid] = $item->data;
    }
    // Remove any already loaded nodes from the $nids left to load.
    $nids = array_keys(array_diff_key(array_flip($nids), $cached_nodes));
  }

BTW, the Doxygen comment for cache_get_multiple() says that it returns an array in the form array('hits' => $cache, 'misses' => $cids).

HTH, riccardo
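One way the check might look (a sketch based on the snippet above; variable names follow the quoted code):

$cached_nodes = array();
foreach (cache_get_multiple($nids, 'cache_node') as $cid => $item) {
  // cache_get_multiple() indexes by cid and returns FALSE for misses.
  if ($item && isset($item->data)) {
    $cached_nodes[$cid] = $item->data;
  }
}
// Only the genuine misses still need a database load.
$nids = array_diff($nids, array_keys($cached_nodes));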

catch’s picture

FileSize
23.8 KB

riccardo, thanks for the review. I've fixed the docs on cache_get_multiple and it now only returns items which aren't FALSE.

riccardoR’s picture

FileSize
23.8 KB

you are welcome, catch. system_update_7016() just got committed. Rerolled.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

FileSize
23.79 KB

Re-roll.

This has had lots of reviews from various people, the benchmarks are good, anyone want to give it a final look over?

catch’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch failed testing.

Damien Tournoud’s picture

Index: includes/cache.inc
===================================================================
RCS file: /cvs/drupal/drupal/includes/cache.inc,v
retrieving revision 1.27
diff -u -p -r1.27 cache.inc
--- includes/cache.inc	12 Oct 2008 04:30:05 -0000	1.27
+++ includes/cache.inc	8 Jan 2009 11:51:44 -0000
@@ -15,9 +15,50 @@
  * @return The cache or FALSE on failure.
  */
 function cache_get($cid, $table = 'cache') {
-  global $user;
-
   // Garbage collection necessary when enforcing a minimum cache lifetime
+  _cache_garbage_collection($table);
+  $cache = db_query("SELECT data, created, headers, expire, serialized FROM {" . $table . "} WHERE cid = :cid", array(':cid' => $cid))->fetchObject();
+  return _cache_prepare_item($cache);
+}

- I know this is not touched by that patch, but there is a dot missing at the end of the garbage collection comment.
- The query should be dynamic (again, not touched by this patch, but you are writing a dynamic one in cache_get_multiple())...

+
+/**
+ * Return data and cache misses from the persistent cache when given an array
+ * of cache IDs.

This code comment should be in one line.

+ *
+ * @param $cids
+ *   An array of cache IDs for the data to retrieve.
+ * @param $table
+ *   The table $table to store the data in. Valid core values are
+ *   'cache_filter', 'cache_menu', 'cache_page', or 'cache' for
+ *   the default cache.
+ * @return
+ *   An array of the items successfully returned from cache indexed by cid.
+ */
+function cache_get_multiple($cids, $table = 'cache') {

You can type-hint $cids as an array.

[...]

+/**
+ * Utility function for preparing a cached item.
+ *
+ * @param $cache
+ *   An item loaded from cache_get or cache_get_multiple
+ *
+ * @return
+ *   The item with data unserialized as appropriate.
+ */
+function _cache_prepare_item($cache) {

You can also type-hint $cache as a StdClass.

--- modules/book/book.module	29 Dec 2008 16:03:56 -0000	1.483
+++ modules/book/book.module	8 Jan 2009 11:51:47 -0000
@@ -530,12 +530,29 @@ function _book_update_outline(&$node) {
         ->execute();
     }
     else {
-      if ($node->book['bid'] != db_query("SELECT bid FROM {book} WHERE nid = :nid", array(
-          ':nid' => $node->nid,
-        ))->fetchField()) {
+      $old_bid = db_query('SELECT bid from {book} WHERE nid = :nid', array(':nid' => $node->nid))->fetchField();
+      if ($node->book['bid'] != $old_bid) {
         // Update the bid for this page and all children.
         book_update_bid($node->book);
       }
+      else {
+        // If the node hasn't moved between books, ensure we don't clear the
+        // cache twice for no reason.
+        unset($old_bid);
+      }
+    }
+    // Clear the node_load() cache for any nodes in the original book if the
+    // node has moved to a new book.
+    if (!empty($old_bid)) {
+      $result = db_query('SELECT nid FROM {book} WHERE bid = :bid', array(':bid' => $old_bid));
+      foreach ($result as $record) {
+        cache_clear_all($record->nid, 'cache_node');
+      }
+    }
+    // Clear the node cache for all nodes in the current book.
+    $result = db_query('SELECT nid FROM {book} WHERE bid = :bid', array(':bid' => $node->book['bid']));
+    foreach ($result as $record) {
+      cache_clear_all($record->nid, 'cache_node');
     }
 
     return TRUE;

I'm not sure this micro-cache-expiration approach is really worth it. Just truncating the table should probably be a faster operation, especially on a big book hierarchy. Has this been thought about?

--- modules/book/book.pages.inc	20 Dec 2008 18:24:35 -0000	1.13
+++ modules/book/book.pages.inc	8 Jan 2009 11:51:47 -0000
@@ -214,6 +214,8 @@ function book_remove_form_submit($form, 
       ->condition('nid', $node->nid)
       ->execute();
     drupal_set_message(t('The post has been removed from the book.'));
+    // Completely clear the node_load() cache since this change can affect many nodes.
+    cache_clear_all('*', 'cache_node', TRUE);
   }
   $form_state['redirect'] = 'node/' . $node->nid;
 }

Oh, here we simply clear the whole cache_node table. Why two different approaches?

[...]

--- modules/node/node.module	4 Jan 2009 19:56:51 -0000	1.1009
+++ modules/node/node.module	8 Jan 2009 11:51:51 -0000
@@ -152,6 +152,28 @@ function node_cron() {
 }
 
 /**
+ * Implementation of hook_modules_enabled().
+ */
+function node_modules_enabled($modules) {
+  foreach ($modules as $module) {
+    if (function_exists($module . '_load') || function_exists($module . '_nodeapi_load')) {
+      cache_clear_all('*', 'cache_node', TRUE);
+    }
+  }
+}

- Those should be drupal_function_exists() calls.
- I get the "_nodeapi_load" part, but why is the "_load" part required?

[...]

 @@ -782,20 +805,21 @@ function node_load_multiple($nids = arra

- Why only caching published nodes ($node->status > 0) ?

--- modules/path/path.test	5 Dec 2008 22:18:45 -0000	1.5
+++ modules/path/path.test	8 Jan 2009 11:51:51 -0000

Nice clean-ups!

catch’s picture

Dynamic query - yes it should be one - but don't you have a patch somewhere for that already?

Book cache expiration - this was requested earlier in the issue, so I added it in. I think on sites with only a few book nodes and lots of other content, it's nice to not clear the entire cache unless necessary - but then it's a lot simpler/more elegant if we just do a cache_clear_all. The two different approaches - most likely because I'm an idiot and forgot the other one ;). Would be good to get some more feedback on whether this is worth doing or not, I don't think either option is ideal to be honest.

nodeapi_load() and _load() - afaik it's quite possible for a module to be disabled, but the nodes to persist in the database - say forum topics or something - forum.module doesn't do an automatic node delete for all forum topics, even on hook_uninstall(), and nor should it. afaik poll is the same - but the cache definitely needs to be invalidated in those cases.

Only caching published nodes was David's suggestion - the idea is to conserve space in memcache (or wherever) by not taking up space with nodes which will be loaded rarely. The only people who should ever be viewing un-published nodes are administrators, and we don't really need the extra 5-10% performance (or for that matter extra cache_set()s on stuff which might not get viewed again) just for them.

Path test cleanup has its own issue I think, would be nice to disentangle that from here if possible.
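The publish-status guard described above, roughly (a sketch; the real patch differs in details):

// Only cache published nodes when not loading a specific revision;
// unpublished nodes are rarely viewed and would crowd out useful entries.
if (!$vid) {
  foreach ($queried_nodes as $node) {
    if ($node->status) {
      cache_set($node->nid, $node, 'cache_node');
    }
  }
}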

catch’s picture

Status: Needs work » Needs review
Issue tags: +Performance
FileSize
20.84 KB

Re-roll.

Path test changes got committed, so those are out.

Type hinted cache_get_multiple() to array.

Did not type hint _cache_prepare_item() because a record returned from the IN() query can be FALSE as well as an stdClass.

Reverted back to a straight cache_clear_all() in book.module - it's risky foreaching like that compared to a straight clear, unless we add an array option to cache_clear_all() and DELETE FROM all at the same time - but then that's adding complexity for not necessarily much benefit.

Fixed some comments.

That query in cache_get() is impossible to convert at the moment - per #344088: cache.inc cannot be fully converted to dbtng - we shouldn't try to fix that here.

I'm getting some test failures, but also seem to be getting those with HEAD (and in suspicious places like aggregator and profile tests with no node interaction, as well as book and translation which are touched by the patch), so uploading and setting back to CNR to see what the bot thinks.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
21.43 KB

system_update_N() conflict.

Here's another re-roll - I'm getting four exceptions from the new node submitted test - which manually posts a node and then runs it through theme('node_submitted'). There's nothing in there which should break, but it appears drupalGetNodeByTitle() returns FALSE for the second test function (it's fine for the first one, and elsewhere in node.test). This doesn't break without node_load() caching, or when running the same steps manually, so I'm not sure what's happening there. Marking to CNR only to get the test bot to run it.

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
20.21 KB

Union fixes in the patch. Instead of += we now have $nodes = $nodes_to_process + $nodes; and $node_cache = $nodes_to_process + $node_cache;.

catch’s picture

FileSize
21.42 KB

Removed some unrelated whitespace which had crept in, along with some variations in phrasing in the book cache_clear_all() inline comments.

chx’s picture

Status: Needs review » Reviewed & tested by the community

The testing bot likes it. The code is solid. Let's rock.

sun’s picture

Status: Reviewed & tested by the community » Needs work

Is there a reason why we use a static query in cache_get() and not a dynamic one like in cache_get_multiple() ?

+function _cache_garbage_collection($table) {
+  global $user;
   $cache_flush = variable_get('cache_flush', 0);

Missing blank line.

+ * @param $cache
+ *   An item loaded from cache_get or cache_get_multiple

References to functions should use a trailing "()" to make them obvious.

+ * Act on nodes after they've been loaded from the database.

Please try to avoid abbreviations in PHPDoc at all cost.

+ * @see poll_nodeapi_load_alter()
+ */
+function hook_nodeapi_load_alter($nodes) {
+}

Instead of the @see, we want to provide actual example code here. Easiest way: Copy directly from poll.module.

+  // Pass all nodes loaded from the persistent cache or the database through
+  // hook_nodeapi_load_alter() to allow modules to add information which can
+  // not be persistently cached.

Albeit correct, it would be good if we could use "cannot" here, since this comment reads like the opposite when quickly scanning it.

@@ -865,25 +889,55 @@ function node_load_multiple($nids = arra
...
-  return $nodes;
+  return $nids;

Are we returning nids or nodes?

+ * Clear the node_load() cache for all nodes in the translation set, so we
+ * have the proper translation set information in every node.

First PHPDoc summary line must be on one line.

Aside from those minor points, good job, catch!

Definitely RTBC afterwards.

catch’s picture

Status: Needs work » Needs review
FileSize
22.29 KB

- Is there a reason why we use a static query in cache_get() and not a dynamic one like in cache_get_multiple() ?

@sun Yes, a dynamic query brings the universe crashing in on itself - see #344088: cache.inc cannot be fully converted to dbtng (Damien asked the same question above).

edit: also I left the @see for poll in, in case someone changes the documentation example later. But copied it in for now too.

Otherwise all great changes. Here's a re-roll, passes all tests locally.

webchick’s picture

Could someone please summarize what this patch does? (And remove this tag after.) I actually did read the entire issue, but it's not clear to me exactly how this patch has progressed since its first iteration that was rolled back and had people freaking out about it.

catch’s picture

Here we go.

The rollback happened for two reasons:
1. eaton's concerns about cache clearing; although some were a false alarm, it's worth going over them.

When nodes are saved, we only clear the cache for that individual node. When translations are saved, we clear the cache for the nodes in the translation set. When any multi-node operation happens (moving book pages, deleting content types), we clear the whole cache.

In the case of book module, there's the potential for some micro-clearing (see #124), but I removed that for simplicity, and because without cache_clear(array(cid1, cid2)) we'd have to do a DELETE FROM for each record, which on Drupal.org handbook pages could get nasty.

In terms of priming the cache, since node_load() itself is less expensive in many cases, in the worst case we're adding 30 inserts to an operation we've just removed 100+ SELECTs from. Since those 30 will be served from one query (or memcache) the next time they're requested, I think any concerns about re-priming the cache are less compelling than they were - although the performance increase is also a bit less impressive (still around 10% with core modules though and potentially a lot more on real sites).

Two potential followup patches to enable some more fine grained clearing would be:
cache_clear_all(array $cids)

And if it's possible to do multi-update/insert with the new database layer, potentially cache_set_multiple() as well - so a single insert when caching every node on a listings page.

2. Potential for stale data

The original patch which was rolled back left any uncached operations entirely to node_view - Morbus had good reasons in #45 why this kind of information is still 'data' and doesn't belong in hook_nodeapi_view().

Later iterations of the patch added a 'don't cache me' flag to nodes, but this means that one naughty module could effectively disable node caching for the entire site. Adding a new nodeapi $op wasn't an option late in the D6 freeze, hence postponing this rather than a quick re-roll.

So, I added hook_nodeapi_load_alter()

The order of execution now looks like this:

Fetch from the static cache if possible.

Fetch anything remaining from {cache_node}.

Fetch anything /still/ remaining from {node}, hook_load() and hook_nodeapi_load().

Store the fully built node objects with cache_set()

Invoke hook_nodeapi_load_alter() to allow current user/language/vote-type data to be added - data which can't be cached persistently and doesn't belong in hook_nodeapi_view(). poll_nodeapi_load_alter() is the implementation of this hook in core.

Add the fully loaded and altered node to memory so that we don't run poll_nodeapi_load_alter() 13 times on a page request.

hook_nodeapi_load_alter() will have exactly the same performance as the current hook_nodeapi_load() in HEAD - you can still load multiple items in one query etc., so if a module needs to use it, it doesn't lose any performance compared to what we have now, and can't drag down everything else with it like the NO_CACHE flag would have allowed.
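Put as code, the flow reads roughly like this (a condensed sketch; the two underscore-prefixed helpers are placeholders, not real functions in the patch):

function node_load_multiple_sketch(array $nids) {
  $nodes = array();

  // 1. Per-request static cache.
  $nodes += _sketch_static_cache_get($nids);
  $nids = array_diff($nids, array_keys($nodes));

  // 2. Persistent cache ({cache_node}, memcache, ...).
  foreach (cache_get_multiple($nids, 'cache_node') as $cid => $item) {
    if ($item) {
      $nodes[$cid] = $item->data;
    }
  }
  $nids = array_diff($nids, array_keys($nodes));

  // 3. {node} query plus hook_load()/hook_nodeapi_load(), then cache the
  //    fully built objects for the next request.
  foreach (_sketch_load_from_database($nids) as $nid => $node) {
    cache_set($nid, $node, 'cache_node');
    $nodes[$nid] = $node;
  }

  // 4. Per-request data (current user, language, votes) that must never be
  //    cached persistently.
  module_invoke_all('nodeapi_load_alter', $nodes);

  // The altered objects also go into the static cache so the alter hook only
  // runs once per page request (omitted here).
  return $nodes;
}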

I think that covers all the new stuff.

webchick’s picture

Status: Needs review » Needs work

Yay! Thanks for the useful summary! :)

+ *   The table $table to store the data in. Valid core values are
+ *   'cache_filter', 'cache_menu', 'cache_page', or 'cache' for
+ *   the default cache.

And yet, from drupal_flush_all_caches():

+  $core = array('cache', 'cache_block', 'cache_filter', 'cache_node', 'cache_registry', 'cache_page');

Why are cache_block, cache_node, and cache_registry not in the PHPDoc? Are they special in some way or is this just an omission?

The various cache_clear_all('*', 'cache_node', TRUE); lines in book.*.* make me very nervous, given that Drupal.org has a very active documentation team whose long-term plans are set to involve storing new handbook pages in a "test" area of sorts and selectively moving these into the "real" handbook hierarchy by volunteers... so this entire cache is probably going to get cleared multiple times per day. I voiced these concerns on IRC, but catch said (paraphrasing) that clearing the cache in a more fine-grained way was a hard problem to solve and best pushed to a follow-up patch.

I'm not sure of the name hook_nodeapi_load_alter(). I'm wondering if hook_nodeapi_postload() or hook_nodeapi_load_uncached() or something makes more sense, given that we also have hook_nodeapi_presave() and hook_nodeapi_alter(). Let's bicker about it on IRC. ;D

+ * This hook should be used for altering or adding information to the node
+ * object which can not be persistently cached. For example depending on the
+ * current user or language. The results of this hook are cached during the
+ * course of a page request, so you should not alter fields returned from the
+ * {node} table here.

"For example" needs a comma. I'd maybe rephrase this to something like "For example, node properties which are dependent on the current user or language." as currently it doesn't really read as a complete sentence.

I don't understand what "The results of this hook are cached during the course of a page request" has to do with "you should not alter fields returned from the {node} table here"... feels like I'm missing a part of the discussion that went on. Could we expand this out so it's a bit more clear?

+function hook_nodeapi_load_alter($nodes) {
+    global $user;

global $user is indented two too many spaces.

+      $node->allowvotes = FALSE;

We don't smoosh words together. $node->allow_votes. (Bah. Nevermind. The code that you're copy/pasting from already has it that way. Curses. Follow-up patch. :P)

-    // Add nodes to the cache if we're not loading a revision.
+
+    // Cache published nodes.
     if (!$vid) {
-      $node_cache += $queried_nodes;
+      foreach($queried_nodes as $node) {

I feel like we lost some clarity in the comments here. Unless I'm mis-reading, we still don't cache nodes if we're loading a revision.

Also, I don't see any new tests added with the addition of cache_get_multiple() and similar. Seems like it would be a good idea to add some, no?

That's about all I can see. I'll also ping Dries to take a look at this, since he's committed most of the other node performance-related patches, so I'd feel better w/ his okay before committing.

catch’s picture

FileSize
28.52 KB

We discussed the naming of the new hook in #drupal a lot a few weeks ago and I was originally in favour of hook_nodeapi_post_load() so I've changed it back to that, and chx +1ed on irc.

Book caching: I've made a change to cache_clear_all() to allow it to take an array of cache IDs as the first parameter - this was already discussed in IRC last night and I wanted to defer it, but it actually simplifies things a fair bit. This is useful for taxonomy and translation as well (at least if you had translations in 50 languages or something).
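To illustrate what the array form buys us, here's a rough sketch (not the patch itself - the function name is made up, and the pre-DBTNG query style and the {term_node} table are just assumptions for the example):

function example_clear_term_node_cache($tid) {
  // Collect every node associated with the term.
  $nids = array();
  $result = db_query('SELECT nid FROM {term_node} WHERE tid = %d', $tid);
  while ($row = db_fetch_object($result)) {
    $nids[] = $row->nid;
  }
  if ($nids) {
    // One call, one DELETE ... WHERE cid IN (...), instead of a separate
    // cache_clear_all() call per node.
    cache_clear_all($nids, 'cache_node');
  }
}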

Also added unit tests for both cache_get_multiple() and the new cache_clear_all() behaviour. cache_clear_all() could do with splitting out into some different functions, since it does a lot of argument checking internally already, but that's definitely not for this issue and shouldn't postpone it either.

However, I then realised that _book_update_outline() is called on every node save/update, and doesn't call node_save() for child nodes either - which means I was wrong earlier about the frequency of cache clears. This means that in the case of book itself, I've moved book_nodeapi_load() to book_nodeapi_post_load() - which isn't cached persistently at all, so there's no chance of meltdowns on Drupal.org. {menu_links} pretty much acts as a cache anyway since it's all denormalized. Since the actual menu tree shuffling is dealt with outside of hook_nodeapi_save(), this is the proper use for that new hook.

Since it's feasible that the array sent to cache_clear_all() could contain a few thousand cids, I've added a safety valve to cache_clear_all() to avoid max_allowed_packet errors: we always send the same data into cache_clear_all(), but if the array is over a certain threshold, we clear the entire cache instead. I've made this a variable and set it to 1000 to be conservative. Deleting a term is a relatively rare occurrence on most sites, particularly for terms associated with thousands of nodes, so I don't think the book issues apply to the same extent, and anything up to 1000 nodes by default will be cleared in a single DELETE FROM as opposed to either looping through the array or clearing the whole lot.
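Roughly, the safety valve amounts to something like this inside cache_clear_all() - just a sketch to show the idea; the variable name and exact structure are assumptions, not the actual patch code:

  // Inside cache_clear_all(), once we know $cid is an array of cache IDs:
  if (count($cid) > variable_get('cache_clear_threshold', 1000)) {
    // Too many cids to delete in a single query without risking
    // max_allowed_packet errors - wipe the whole bin instead.
    cache_clear_all('*', $table, TRUE);
  }
  else {
    db_delete($table)
      ->condition('cid', $cid, 'IN')
      ->execute();
  }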

Also went through and cleared up the comment issues identified by webchick.

catch’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review

Broken HEAD.

moshe weitzman’s picture

Looks really strong. Minor nits only:

  1. We have an abbreviated table list in doxygen for cache_get_multiple()
  2. Doxygen for _cache_prepare_item() is not so descriptive
  3. Sorry to pester about this, but hook_nodeapi_after_load() is slightly better than hook_nodeapi_post_load(); the word 'post' has an ambiguous meaning here.
  4. Perhaps add a comment to hook_nodeapi_post_load about why we are not caching node_load for books
  5. typo: hook_modules_disabed
  6. Olympic gymnastics medal for catch: - array_keys(array_diff_key(array_flip($nids), $cached_nodes));
catch’s picture

3. I can get behind hook_nodeapi_after_load() - but webchick also mentioned hook_nodeapi_uncached(). I'm a bit worried about using uncached, because it is statically cached for a page request - but that's also its purpose. We should probably hash this out one more time in irc.

6. helper function maybe? I want to do this for more bits of the multiple load functions - but there's enough variation in the objects we're dealing with to make it hard - again I'll try to grab you in irc to see if it can be rationalised a bit.

Will re-roll when these two are resolved.

webchick’s picture

after_load() is fine with me. It's a little awkward paired with "pre-", but it's true that "post" is *extremely* ambiguous.

LOL @ Olympic gymnastics medal.

catch’s picture

FileSize
29.13 KB

Alrighty, this should address all those issues apart from the gymnastics.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

Thanks catch. The gymnastics is a compliment - no need to change the code.

Let's move this along - RTBC.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
29.13 KB

system_update_7018() was committed. No other changes.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
28.48 KB

Re-rolled for whitespace changes.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Since there were no changes since this was RTBC, and the re-roll didn't introduce any errors, bumping this back up.

catch’s picture

FileSize
28.48 KB

re-rolled for offset/fuzz.

wedge’s picture

-    // Add nodes to the cache if we're not loading a revision.
+
+    // Only published nodes and active revisions are cached since these are
+    // not loaded sufficiently often to justify the additional space and
+    // complexity required to do so.
+    if (!$vid) {
+      foreach($queried_nodes as $node) {
+        if ($node->status) {
+          cache_set($node->nid, $node, 'cache_node');
+        }
+      }
+    }

Minor nit but is the "not" in the new comment really supposed to be there?

catch’s picture

FileSize
28.48 KB

Well spotted, re-worded that comment.

chx’s picture

Whether duplicating field caching is a problem or not -- I would say no. Storage is, for all practical purposes, free. As long as there is hook_nodeapi, caching nodes is necessary, and bothering with excluding anything is just that: bothering, and unnecessary.

Page caching and filter caching already duplicate data and no one is screaming.

Dries’s picture

I'm trying to figure out where exactly the field API caching happens. Can you point me to the code?

Storage might be free, but checking the cache (and updating the cache) requires SQL queries.

David Strauss’s picture

Regardless of Field API caching, caching entire loaded nodes is a good idea. We should strive to cache as late in processing as possible, even if there are earlier caches.

catch’s picture

FileSize
29.32 KB

The cache_set()/cache_get() calls are in _field_attach_load(), which also checks whether the object type is cacheable or not.
http://api.drupal.org/api/function/_field_attach_load/7

So, in node_fieldable_info() there's this

function node_fieldable_info() {
  $return = array(
    'node' => array(
      'name' => t('Node'),
      'id key' => 'nid',
      'revision key' => 'vid',
      'bundle key' => 'type',
      // Node.module handles its own caching.
      // 'cacheable' => FALSE,
      // Bundles must provide human readable name so
      // we can create help and error messages about them.
      'bundles' => node_get_types('names'),
    ),
  );
  return $return;
}

If we uncomment 'cacheable' in node_fieldable_info(), then fields loaded and saved from node objects won't be cached in {cache_field}, due to the checks in _field_attach_load() - this means zero duplication between {cache_node} and {cache_field}. Attached patch does this ;).

Also, note that on a node listing, every cached node in the listing is retrieved in a single query (via cache_get_multiple()), and we avoid hitting the database entirely if all nodes are cached and we're using memcache as a back end.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Reviewed & tested by the community

All tests pass locally for me.

catch’s picture

Did some more benchmarks over at #369011: Performance: do check_markup() in text_field_load() - we get about 8% improvement with this, even with only one hook_nodeapi() module enabled and 10 nodes on the front page. That 8% is well worth having, regardless of whether every hook_nodeapi_load() gets converted to Field API.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
28.86 KB

Status: Needs review » Needs work

The last submitted patch failed testing.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
28.85 KB

getting back to this.

updated patch, conflict with update_N function only, no other changes.

Anonymous’s picture

test nitpick:

+    cache_clear_all(array('test_cid_clear1', 'test_cid_clear2', 'test_cid_clear3'), $this->default_table);
+    $this->assertFalse($this->checkCacheExists('test_cid_clear1', $this->default_value)
+                       || $this->checkCacheExists('test_cid_clear2', $this->default_value)
+                       || $this->checkCacheExists('test_cid_clear3', $this->default_value),
+                       t('All cache entries removed when the array exceeded the cache clear threshold.'));
+  }

Don't we need another entry, or to ask for fewer than the 3 items to be deleted? You're creating three items, then asking that all three be deleted, so I'm not sure how that tests the threshold.
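Something like this (an untested sketch, reusing the helpers from the quoted test) would actually exercise the fallback path, by forcing the threshold below the number of cids being cleared:

variable_set('cache_clear_threshold', 2);
cache_set('test_cid_clear1', $this->default_value, $this->default_table);
cache_set('test_cid_clear2', $this->default_value, $this->default_table);
cache_set('test_cid_clear3', $this->default_value, $this->default_table);
// Three cids with a threshold of two: the whole-bin clear (or the chunked
// delete suggested below) should kick in rather than a plain IN() delete.
cache_clear_all(array('test_cid_clear1', 'test_cid_clear2', 'test_cid_clear3'), $this->default_table);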

Also, on the threshold functionality, I'm wondering if we shouldn't just chunk the queries? That seems less surprising. Something like (not tested):

      // Delete the cids in chunks no bigger than the threshold, so a single
      // DELETE ... IN (...) never exceeds max_allowed_packet.
      $cache_clear_threshold = variable_get('cache_clear_threshold', 1000);
      if (count($cid) > $cache_clear_threshold) {
        do {
          // Pull the next chunk off the front of the array and delete it.
          $cids = array_splice($cid, 0, $cache_clear_threshold);
          db_delete($table)
            ->condition('cid', $cids, 'IN')
            ->execute();
        }
        while (count($cid));
      }
Anonymous’s picture

Status: Needs review » Needs work

Discussed the chunking idea with catch in #drupal and he's OK with it; will reroll shortly.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
28.82 KB

updated patch attached.

catch’s picture

FileSize
28.85 KB

Discussed this with Peter and Damien: $cids is now passed by reference to cache_get_multiple(), so the gymnastics stays in cache.inc.
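For illustration, the pattern looks roughly like this (a sketch only - the real code lives in cache.inc and node.module, and the function name here is made up):

function example_load_cached_nodes(array $nids) {
  $cids = $nids;
  $nodes = array();
  // cache_get_multiple() takes $cids by reference: it returns the cache
  // hits and removes their ids from $cids, so whatever is left afterwards
  // is exactly what still needs to be loaded from the database.
  foreach (cache_get_multiple($cids, 'cache_node') as $cid => $cache) {
    $nodes[$cid] = $cache->data;
  }
  // $cids now only contains the uncached nids; a real implementation would
  // query {node} for these and cache the results.
  return $nodes;
}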

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
28.91 KB

This time with less fail.

Anonymous’s picture

Status: Needs review » Reviewed & tested by the community

My only issues with this patch have been addressed and the test bot is happy, so I think this is RTBC.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
28.86 KB

re-rolled for hook_nodeapi. No other changes.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
28.79 KB

rickrolled.

someone needs to put this issue out of its misery, either way.

Status: Needs review » Needs work

The last submitted patch failed testing.

Berdir’s picture

FileSize
29.48 KB

The function translation_node_get_translations() has an internal static cache (not sure why - is it called that often?) and is called multiple times, so when it was called from the new translation_clear_node_cache() it didn't contain the latest translation, because an array of translated nodes had already been cached earlier in the request.

Added a $reset parameter to that function and set it when calling from translation_clear_node_cache(). All tests should now pass again.
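For reference, the static-cache-with-reset pattern being described looks roughly like this - a simplified sketch, not the actual function body (the real query selects more columns and checks access):

function translation_node_get_translations($tnid, $reset = FALSE) {
  static $translations = array();
  if ($reset) {
    // Forget everything cached earlier in this request, so callers such as
    // translation_clear_node_cache() see fresh data.
    $translations = array();
  }
  if (!isset($translations[$tnid])) {
    $translations[$tnid] = array();
    $result = db_query('SELECT n.nid, n.language FROM {node} n WHERE n.tnid = %d AND n.status = 1', $tnid);
    while ($node = db_fetch_object($result)) {
      $translations[$tnid][$node->language] = $node;
    }
  }
  return $translations[$tnid];
}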

Berdir’s picture

Status: Needs work » Needs review

That form doesn't like me. I'm sure I updated the status...

catch’s picture

Status: Needs review » Reviewed & tested by the community

That change looks fine - we can look at whether translation_node_get_translations() actually needs a static cache in a separate patch.

Taking advantage of not being the last patch author and marking back to RTBC. Would really appreciate a review from Dries here since it's been 7 weeks since his last questions were answered and it seems webchick isn't going to decide whether this gets committed or not.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
29.48 KB

System update collision, no other changes.

bjaspan’s picture

I hate it when people parachute into my issues late in the game, drop a bomb, and then run away very fast. But people do it to me all the time lately, Dries asked me to comment on this patch, and I can't handle reading 187 comments at the moment, so now I'm going to do it, too. I'm sorry, really.

1. Field API caches field loads if asked. Do we *really* need a separate node cache? As more things become fields, it seems like the complexity of the node cache will become less worthwhile, but will still stay with us. Also, Field API leads Drupal in a direction in which nodes are not the only things that are content. Why not cache user_load? Or comment_load? Or every_other_fieldable_type_load? Yes, obviously node is the thing we have the most of and that is viewed most often, but this seems to be adding more special-case logic to Drupal instead of removing it.

2. This really, really disturbs me:

@@ -391,6 +393,9 @@ function taxonomy_term_delete($tid) {
 
       $term = taxonomy_term_load($tid);
 
+      // Clear the node cache for all nodes associated with this term.
+      taxonomy_clear_node_cache($term->tid);

So, now taxonomy_term_delete() is required to know that node module maintains a load cache which taxonomy module is required to clear under certain circumstances. Special cases like this are going to pop up everywhere and make Drupal harder to work with. This is anti-modular programming. It feels a lot like having to know you need to clear node_load()'s static cache after calling node_save() which we *finally* just got away from.

Can you guarantee that there will never be a field type that, when saved, requires the node cache to be cleared? If not, you are creating an impossible situation, because Field API will not hard-code logic like this for the field module if I have anything to say about it.

Running away now...

catch’s picture

Grabbing your shirt before you get away...

#1: We'll always get more benefit caching at the highest level possible because the most work has been done. So if we can cache fieldable objects, we probably should, and the field API allows for this to happen currently.

With database caching, the field cache requires a round trip to the database for every object (+ the node_load and hook_node_load() round trips we already have) - because it retrieves from cache on a per-object basis. With this patch, it's a single query for the entire array of node objects (see cache_get_multiple()).

So as far as I understand it, the body and teaser as fields patch, on a front page with 30 nodes, will actually cause an additional 30 cache_gets() on top of the node_load_multiple() query - that's going to slow things down on a default install. I'll come into your issue and drop some benchmarks soon, don't worry :)

#2: this could be done in node_taxonomy_term_delete() :p I should note that taxonomy module is tied into node module in far worse ways than this. When and if taxonomy vocabularies are fully converted to a field, this special casing will go with all the rest of it. That's still a long way off though, as is the conversion of other core hook_node implementations to fields.

#3: Fields are saved as part of object saving - which clears the cache for the node anyway. I think we're going to end up with a use case for field_attach_load_uncached() (similar to hook_node_post_load()) anyway - and field API shouldn't prevent any fieldable object from being persistently cached, should it? If that worked like hook_node_post_load(), we'd still get one extra query per list of fieldable objects (per uncacheable field) instead of 10-30 with the field cache by default.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
28.96 KB

dbtng conversion conflicts.

Nearly three weeks since the last update here. There are benchmarks in #20, #99 and #111, and there's an issue summary in #139. David Strauss gave much the same answer to Dries regarding the field cache back in February in #162 as I did in my response to Barry three weeks ago; no-one's actually addressed what's been said on that point. If this is "won't fix" it'd be nice to know, instead of having it sit in the RTBC queue for another three months.

Dries’s picture

Status: Reviewed & tested by the community » Needs work

Like Barry in #188, I agree that things get messy when everyone and his dog needs to know when to clear the node cache - something which doesn't seem to be necessary with field-level caching. It seems better abstracted, and eventually it will do the same job -- especially with some of the fields-related follow-up patches in the issue queue.

I'm tempted to mark this postponed until we dealt with some of the fields API patches. At this point, it seems better to revisit this patch after we drove those home. I'm worried that this patch adds complexity on top of an already new and complex (albeit properly designed) field API. Combined with the reasons outlined above, I'd prefer to keep things simple for the time being. Can we agree to revisit this patch later?

Also, the hack for "max_allowed_packet" in _cache_prepare_item() is a no go.

catch’s picture

I just posted benchmarks for the first core field conversion at http://drupal.org/node/372743#comment-1493570

That's a 30% slowdown vs. a 10% improvement here. We already have a valid comparison in terms of performance impact for object level caching vs. the field cache - and that's on vanilla HEAD with only comment and taxonomy modules enabled (and no comments or taxonomy terms attached to the nodes in those benchmarks either).

Deferring this until comment, taxonomy, poll and the other hook_node_load() implementations are converted means we'll end up with the same situation we had 170 posts ago - it'll be too close to code freeze and it'll need to be postponed until Drupal 8. And meanwhile we'll have made core xx% slower again.

Obviously the final decision is yours, but unless someone's got another way to claw back our very poor performance in HEAD I'm concerned that keeping things simple is going to mean everyone has to run memcache and materialized view API on their Drupal 7 sites just to keep up with stock Drupal 5.

yched’s picture

Do you think node-level caching would increase performance of 'body as field' (or 'X as field')? From a first look at #438570: Field API conversion leads to lots of expensive function calls, it looks like the PHP overhead is added by the whole field workflow, and probably more by the 'display' part (theme(), drupal_render(), check_plain()...). I don't see how node cache vs. field cache on node load would make things better or worse here.

Other than that, the road is indeed very long until all current core node additions have been migrated to fields... There's no guarantee we'll get there in D7.

catch’s picture

FileSize
6.23 KB

Yeah I did the profiling because 40% of a request seemed like far too much for 20 cache_gets. I think the extra database access from the field cache as it stands now is responsible for 3/40 of that hit compared to the status quo - going by the results of #333171: Optimize the field cache at least.

However, I just benchmarked 10 nodes again, this time with the comment and taxonomy modules disabled:
HEAD:
11.80 / 11.79 [#/sec]

Patch from #191:
12.82 / 12.78 reqs/sec

That's an 8% improvement with neither field_attach_load() nor hook_node_load() being called at all. When nodeapi and/or field API are added on top, we'll see additional improvements from caching either of them - but the baseline of 8% is still going to be there.

Benchmarks attached.

catch’s picture

FileSize
28.67 KB

In response to this:

I agree that things get messy when everyone and his dog need to know when to clear the node cache, something which doesn't seem to be necessary with the field level caching.

I'm not sure how the field cache differs in this respect. For simple fields like number, it's nice and easy to keep things self-contained since field API is the only entry point - and that's equally easy for object-level caching.

However, any field which references something that has its own independent existence isn't going to be as clean and self-contained.

Here's a very simple example with nodereference:
1. Create node A
2. Create node B - reference node A
3. Delete node A

The {cache_content} entry for node B still contains the reference to node A. If you go back to edit node B and then save it, you'll get a validation error, because the deleted node is still a value displayed in the form (edit: this is actually because the field table doesn't get updated either, which looks like a straight bug).

The only reason that stale data doesn't get displayed when viewing node B is because the actual node titles are queried individually by the formatter on every request.

/**
 * Helper function for formatters.
 *
 * Store node titles collected in the current request.
 */
function _nodereference_titles($nid, $known_title = NULL) {
  static $titles = array();
  if (!isset($titles[$nid])) {
    $title = $known_title ? $known_title : db_result(db_query("SELECT title FROM {node} WHERE nid=%d", $nid));
    $titles[$nid] = $title ? $title : '';
  }
  return $titles[$nid];
}

This takes the nids, which may or may not have been deleted, then issues a query per title to actually display them. Non-existing nids get an empty string. This code hasn't changed between D6 and D7.

If the same model were used for taxonomy terms in core, you're looking at objects * terms queries just to get the term names - a considerably worse performance situation than either D6 or HEAD for taxonomy module, where it's been a query per node in D6 and two queries per page in D7. And you'd still have stale cache entries.

Additionally, whereas it's pretty easy to get the nids attached to a term and do cache_clear_all($nids, 'cache_node'), for fields everything is in one big cache table, and it's not going to be possible to generate a list of cids from a single query, since the keys aren't just an object type and id. So you'd have to clear the entire field cache potentially every time a node or term is updated if using nodereference and term_field modules - something which this patch has taken care not to require.

So to me it looks like this is currently an unsolved problem in both D6 CCK and field API, and I find it odd that claims are being made to the contrary. In fact there's both stale data being left around, and potentially expensive stuff that doesn't get cached at all. So it's going to be messy one way or another; it's more a case of whether it's going to be messy and fast or messy and slow.

Attached patch removes the max_allowed_packet workaround from cache_clear_all().

catch’s picture

Status: Needs work » Needs review
yched’s picture

The argument could be: if you implement a field type that is a reference to some entity (taxo term, file, node...), then it's up to you to clear the field cache in hook_[entity]_delete(). A field type module requiring a field cache clear is less spaghetti than taxonomy.module requiring a node cache clear.

It also accounts for the fact that the module defining entity X can live in core while the module defining the 'X-reference' field type lives in contrib - or that both can be different contrib modules.

Referential integrity is and has always been a complex issue for 'reference' fields. Right now, if taxonomy were a field ('term_reference field'), field API would offer no easy way to do the equivalent of taxonomy.module's current 'on term delete, remove all node associations for that term'. Current reference fields (noderef / userref) do their validity checks at display time.

sun’s picture

I agree with catch. Drupal should cache as much as possible. The lack of caching is one of the main reasons why Drupal can't compete with several other systems.

However, it seems that (somehow? suddenly? after very much work has been done?) there are now worries about the invalidation of this additional cache. In an attempt to at least allow node_load caching, would it be possible to add the bare minimum to core, so that people like catch and me (who prefer fast Drupal sites) can use a node_cache contrib module without requiring core hacks?

Basically, what I mean is to turn our object storage process and arbitrarily invoked cache* functions into a real Cache API, (also) allowing modules to implement hooks to act upon creation/manipulation/deletion of objects _and_ cache entries. Definitely a separate issue.

I can feel your pain, catch.

catch’s picture

The argument could be : if you implement a field type that is a reference to some entity (taxo term, file, node...), then it's up to you to clear the field cache in hook_[entity]_delete(). A field type module requiring a field cache clear is less spaghetti than taxonomy.module requiring a node cache clear.

Less complexity, but it means the entire field cache would be cleared every time any object which has a reference field gets updated or deleted. On some sites that's going to be every second if they have nodereference and lots of node updates - even if they have a million nodes and only five references.

If we don't clear the cache (leaving aside the referential integrity issues, although those only apply to deletes), then we're introducing exactly the same issues which resulted in an early version of this patch being rolled back in 2007 - stale data. The complexity added to taxonomy module in this patch is precisely to avoid stale data and overly aggressive clearing of caches; this would be considerably harder to implement with a field-level cache for the reasons given above - a single bin for all fieldable objects, and cache IDs that are complex to generate.

It seems worth mentioning at this point that it was stale data which caused the D6 patch to be rolled back 150-odd follow-ups ago, and most of the work since then was to deal with those issues. That extra work is now being turned around as a reason to defer the patch in favour of an architecture which has even more severe issues on this point - and, given how nodereference avoids using the field cache for anything other than nids, considerably worse performance, even if we got all the conversion done in time to use it 'fully'.

@sun - that's somewhat where I've been planning to head with multiple load - although at this stage still a pattern rather than an API because an API takes us into ORM land, hence updating this patch rather than going for a completely new approach. For an abstracted way to do intelligent cache clearing, I've added some notes over at #439186: Cacheable object API.

yched’s picture

If taxonomy and file uploads can be attached to nodes only, then I'm not overly shocked that they need to be aware of a node cache and sometimes have to manually clean it when terms or files are deleted. Similarly, if we move to 'taxo as term_reference field' or 'uploads as file_reference field' (and I think we know we'd like that), I think it's alright that those field types are aware of the field cache and manually clean it when needed. Where the node cache is problematic, though, is that an X_reference field type then also needs to be aware of the node cache, or any other entity-level cache, and act on it. This breaks a pattern. As a field type, you shouldn't even know about entity types.

I see the node cache as a (hopefully) temporary optimization while we move towards fields as the primary, cross-entity 'data attach' mechanism - which can take some time. In the end, we should aim for field API and the field-level cache being flexible and performant enough not to require fieldable entities to maintain their own object-level cache. The discussion here shows, for instance, that we need to define a strategy for cache and referential integrity issues. But it's possible that when D7 freezes, some but not enough of the current node additions will have been migrated to fields, and that we won't be able to say that the performance gain once identified from node caching is satisfactorily provided by the field cache. Then we might want to ship with both, and the node cache becomes a special case every X_reference field needs to care about.

HEAD being sort of in-between right now, it's a tough call. I guess my feeling would be that we keep the patch ready just in case, focus on moving ahead with the 'X as field' tasks, and make sure they are as performant as they can be. The discussion here, the spin-off issues and the benchmarks catch has already run on 'body as field' are valuable in this regard.

Status: Needs review » Needs work

The last submitted patch failed testing.

joshk’s picture

@sun & @catch:

attempt to at least allow for node_load caching, would it be possible to add the bare minimum to core? So people like catch and me (who prefer fast Drupal sites) can use a node_cache contrib module without requiring core hacks?

++ On this. I apologize if this seems like "parachuting in," but if we want Drupal to perform well, the ability to cache object-level data is critical. We should at the very least be able to do this without patching core. Maybe, as Dries suggests, we return to this after the Field stuff is worked out, but I see the field work as being a long road, and I'm concerned that we won't get all the way there and will thus miss out on the much lower-hanging fruit.

@bjaspan:

Field API caches field loads if asked. Do we *really* need a separate node cache?

If you want Drupal to be fast and support very large use cases, then yes, we do. Object-level caching is a best practice for scalability and high-performance architecture. Field-level caching presents just as much complexity, but with less performance gain because of increased PHP overhead. Why? It's not uncommon for node types to have a lot of fields. Is it faster to get and assemble 15 caches or 1? And what about all the things that are critical application-level node data, but not quite "content" (e.g. comment statistics, etc.)?

Also, Field API leads Drupal in a direction in which nodes are not the only things that are content. Why not cache user_load? Or comment_load?

You bet we should! The Advcache module does just these things. Having a general cache API for the important objects in core (which contrib modules can then implement as needed) would be a huge win. Catch's 8% number only hints at the difference this will make with an optimized caching system.

So, now taxonomy_term_delete() is required to know that node module maintains a load cache which taxonomy module is required to clear under certain circumstances.

Seems like the thing a well-structured field-API would handle? The glue between a field and object should be easy to use (automagic) for clearing a necessary cache when a delete operation happens.

CorniI’s picture

subscribe

yched’s picture

re joshk #203:

Field-level caching presents just as much complexity, but with less performance gain because of increased PHP overhead. Why? It's not uncommon for node types have a lot of fields. Is it faster to get and assemble 15 caches or 1?

You've got the field cache wrong: the field cache stores all fields for a given object in one cache entry.

"So, now taxonomy_term_delete() is required to know that node module maintains a load cache which taxonomy module is required to clear under certain circumstances".
Seems like the thing a well-structured field-API would handle? The glue between a field and object should be easy to use (automagic) for clearing a necessary cache when a delete operation happens.

A taxonomy *field* would know that *field API* maintains a cache and clear it when needed. I think Field API is actually fairly well structured in that regard: fields do not know about nodes or users specifically, or care about their specific internals.

catch’s picture

I think it's alright that those field types are aware of the field cache and manually clean it when needed. Where the node cache is problematic, though, is that a X_reference field type then also needs to be aware of node cache, or any other entity-level cache, and act on it. This breaks a pattern. As a field type, you shoudn't even know about entity types.

In this case though, as mentioned already, X_reference field types are going to require clearing the entire field cache every time the thing they reference is saved - or they have to rely on formatters to load data with direct database queries and validate it on every request, as happens currently. In either case you're spending a lot of time loading things manually or rebuilding the cache over and over - and each of these already has a solution here. Dealing with those cases in field API is going to introduce exactly the same level of complexity.

For example, if the taxonomy_field formatter had to go and fetch term names for every term on a page with 10 nodes and 5 distinct terms each (not that many) - in the same way that nodereference currently does - then you're looking at 50 additional queries on that page. Add a nodereference field to a site and you're clearing the field cache every time a node is updated or deleted. Even if the field cache moves to per-fieldable-object bins, you've still got to find all the object types with X_reference fields attached and clear their caches.

One way to avoid the knowledge of referenced objects required in either case would be to do exactly what joshk says - cache all objects consistently.

That way, the node IDs in a node reference field are cached persistently with the object - but no data about the nodes themselves. Then we load all the referenced nodes in a single query (or from memcache) via cache_get_multiple() for that request - and the formatters just do $node = node_load($nid); $node->title instead of db_result(db_query('SELECT title')); - which by that point is more or less free. Even with ten nodes, each with five term, user and node references, you're looking at 4-5 queries per page, or 1 query + memcache, instead of 150 with the way field formatters currently work.
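As a sketch of what that frees formatters to do (hypothetical function name; assumes the referenced nodes are already warm in the node cache by this point in the request):

function example_nodereference_formatter_title($nid) {
  // node_load() is now effectively free for anything already pulled from
  // cache_node earlier in the request, so there's no need for a separate
  // SELECT title query per reference.
  $node = node_load($nid);
  return $node ? check_plain($node->title) : '';
}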

yched’s picture

Here's a proposal for the 'reference field' pattern (noderef, taxo as field, filefield...) which tries to address:
- performance issues
- stale cache issues
- object-level caching on some object types with a consistent pattern for 'who knows the internals of who'
Taking 'taxonomy as field' as an example. Replace [taxonomy field, term, tid] with [noderef field, node, nid], [file field, file, fid], etc...

a) We don't clean up existing field values when a term is deleted. Too much potential cost.
(this is how existing D6 noderef and filefield work)

b) Instead, during field_attach_load(), hook_taxonomy_field_load() massages field values coming from the raw storage:
remove references to tids that don't exist anymore, add in base properties from the term object (term name, etc...).
(this is how D6 filefield works, but not noderef)
Once #362024: Field Attach: Make hook_field_load() be 'multiple' like field_attach_load() lands, this is one additional query per field of type 'taxo field' even on multiple object load.
The results of this get cached in the field cache or in the object cache if any.

c) When a term is updated or deleted, we need to clean the field and object cache entries for each object that has $tid as a value in one of its fields of type 'taxo field'.
This could be done by a function provided by Field API (function name obviously debatable). Taxonomy doesn't need to know that node objects are cached globally while users are not. Field API, OTOH, knows that already.

function field_clean_cache_references($field_type, $value) {
  foreach ($field of type $field_type) {
    // get all the objects who have $value as a value for $field :
    field_attach_query($field, $value); // 1 query per field of the given type
  }
  // merge results into a list of [objects type] => [object ids]

  foreach ($object_type) {
    if ($object_type is cached at object level) {
      // Build list of object-cache cids - requires we know the cid pattern ([id] ?) and table name of the object cache (cache_[obj_type] ?)
      cache_clear_multiple($cids, $table); // 1 (sliced) query per object type with object-level cache
    }
    else {
      // Build list of field-cache cids
    }
  }
  cache_clear_multiple($cids, 'cache_field'); // 1 (sliced) query
}

As outlined above, the only requirement *if we want to allow object-level cache* is that, in addition to whether an object type maintains its own object-level cache, Field API needs to know the name of the cache table (or assume 'cache_[obj_type]' ?) and the structure of the cids (or assume [id] ?).

[edit: also, note that you don't need to run field_attach_extract_ids() on the results of field_attach_query(), which has a 'return raw ids' mode]

Thoughts?

catch’s picture

Looks great to me, and similar to what I was thinking in #439186: Cacheable object API, except put much better.

yched’s picture

Note that the proposal in #207 critically requires we make some progress on designing #415044: Indexes for field storage.

catch’s picture

Status: Needs work » Postponed

Multiple cache getting is now over at #333171: Optimize the field cache. And a spin off of that #369011: Performance: do check_markup() in text_field_load().

frando just posted #460320: Standarized, pluggable entity loading (nodes, users, taxonomy, files, comments) which would give us a clean way to do this for nodes (and users, and taxonomy terms, and etc.) from contrib. Marking this postponed on that issue.

catch’s picture

Version: 7.x-dev » 8.x-dev
catch’s picture

greenc’s picture

subscribe

Xano’s picture

Status: Postponed » Closed (won't fix)

We handle this at the entity level now.