With #324313: Load multiple nodes and terms at once on the horizon, I was thinking about revisiting #111127: Cache node_load (since the multiple load has no effect, or a very small cost, for single loads). However, doing N cache_gets when otherwise you could load the nodes in a single query per hook seems a bit much.

So, here's an initial patch which just copies cache_get to a new function which accepts and returns an array instead of a single cid and moves some code around. Moves what would otherwise be entirely duplicated code to a shared private function.

Completely untested. I'm going to bask while t.d.o does it for me ;)

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

moshe weitzman’s picture

nice.

If you are going to loop over each result anyway, then don't do fetchAllAssoc('cid'). Just foreach the resultset.

catch’s picture

FileSize
2.49 KB

I wanted the results indexed by cid though, although a typo in the foreach didn't make it clear why that was useful. I'm not sure how expensive fetchAllAssoc is though - if it's doing a lot of work, then we should probably key the results in that same foreach loop.

New patch, changes the return to return array('cached' => $cache, 'misses' => $cids);

So in the node multiple load implementation it'd look something like this:

$result = cache_get_multiple($nids, 'cache_node');
$nodes += $result['cached'];
$nids = $result['misses'];
// grab what's left of $nids with db_select()

This would save doing array_diff_key(array_flip($cids), $cache); whenever you care about the stuff that wasn't fetched from cache, which is presumably all the time.

catch’s picture

FileSize
2.93 KB

Removed the fetchAllAssoc() and created a new array rather than unsetting stuff from the result set. Also moved garbage collection into a shared function to remove the remaining code duplication.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
2.93 KB

re-roll for the test bot.

yched’s picture

+  return array('cached' => $cached,  'misses' => $cids);

looks like you mean:

+  return array('cached' => $cache,  'misses' => $cids);

Rerolled for that. No credits plz :-)

lilou’s picture

Coding issue : 2 spaces

+  return array('cached' => $cache,  'misses' => $cids);
catch’s picture

Status: Needs review » Postponed

I've rolled this into #111127: Cache node_load - but don't want to close it out as duplicate yet since there's potentially other use cases (although this patch is wrong, fixes are in the patch for the other issue). So marking postponed pending what happens with node_load().

drewish’s picture

subscribing

David Strauss’s picture

subscribing

catch’s picture

FileSize
2.81 KB

Here's current work from the cache node_load() patch broken out, but this would all happen a lot quicker if more people could review that other patch instead of subscribing to this postponed one, I'm hoping we can mark as a duplicate when node load goes in.

catch’s picture

Title: Load multiple items from cache » Multiple cache getting and clearing (w/ field API conversion)
Status: Postponed » Needs review
FileSize
9.46 KB

Postponed for almost six months months is long enough.

Here's a refreshed version of the patch from node_load() which tidies up the way the return value from cache_get_multiple is handled and adds cache clearing with an array of cids. Both cache_get_multiple() and cache_clear_all($array_of_cids) have unit tests.

I also modified field_attach_load() to use cache_get_multiple(). Since we don't actually have any multiple loaded objects with fields attached to them in core yet, this'll take a bit of setup to benchmark.

catch’s picture

Status: Needs review » Needs work
FileSize
10.71 KB

Seems I hadn't actually saved field.attach.inc and that bit doesn't work yet. Have to go out, but here's a snapshot with that bit so far.

catch’s picture

Status: Needs work » Needs review
FileSize
10.73 KB

Patch works now.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
5.9 KB

Benchmarked HEAD with http://drupal.org/files/issues/body-as-field-372743-120.patch, 10 nodes on front page.
7.40 reqs/second

Same again with this patch applied:

7.58-7.62 reqs/second

Saves 9 queries and about 3%.

yched’s picture

Minor remarks about field_attach_load() :
- If an object is not $cacheable, we shouldn't need to add it to the list of queried $cids
- In the second foreach, "field:$obj_type:$id:$vid" could be factorized to a $cid variable ?

catch’s picture

FileSize
11.06 KB

Added a check before we add an item to the $cids array and consolidated the $cid variable in the foreach. Also made the tests pass.

yched’s picture

Dries doesn't seem to like the 'cache_clear_threshold' bit.
I'm not sure whether he doesn't like the slicing itself, or the new variable, though.

Other than that, this looks ready to me.

catch’s picture

I'm also not sure if it's the slicing or just the new variable which was the problem, I'm hoping just the new variable - since the slicing is awesome (I can say this because it was Justin's update, not mine, which introduced it).

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
11.35 KB

Re-rolled, removed the variable.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

I reviewed each line and found no issues. Time for this to finally land.

catch’s picture

For what it's worth, I did some additional benchmarks / devel logging with this over at http://drupal.org/node/369011#comment-1536282 to see how much we can squeeze out of it.

sun’s picture

Status: Reviewed & tested by the community » Needs work
  * @return The cache or FALSE on failure.
  */
 function cache_get($cid, $table = 'cache') {

While being there, it would be nice to fix the coding style of this @return.

  * @param $table
  *   The table $table to store the data in. Valid core values are
- *   'cache_filter', 'cache_menu', 'cache_page', or 'cache' for
- *   the default cache.
+ *   'cache_block', 'cache_filter', 'cache_menu', 'cache_page',
+ *   'cache_registry', or 'cache' for the default cache.
...
+ * @param $table
+ *   The table $table to store the data in. Valid core values are
+ *   'cache_block', 'cache_filter', 'cache_menu', 'cache_node', 'cache_page',
+ *   'cache_registry', or 'cache' for the default cache.

Hm. Obviously, those were never updated. Of course, because new cache tables are defined/added elsewhere. Instead of fixing them, I'd propose to remove the entire Valid core values are ... part here.

+ * @param $table
+ *  The table being requested.

Wrong indentation here.

+ * Utility function for preparing a cached item.

Why not simply "Prepare a cached item." ?

+ * Checks that items are permanent or have not expired and unserializes
+ * data as appropriate.

Hm. I needed to read that 3 times - maybe replacing "have not expired" with "did not expire," helps?

+  // If the data is permanent or we're not enforcing a minimum cache lifetime

Please avoid abbrevations in comments.

+      // Chunk deletes when passed a large array.

How about "Delete in chunks when large array is passed." ?
Does this value of 1,000 already account for well-known max_allowed_packet_size issues?

+  function setUp() {
+    $this->default_table = 'cache_page';
+    parent::setUp();
+  }
+  function testCacheMultiple() {
+    $item1 = $this->randomName(10);

Missing blank line between functions.

+    $this->assertTrue($this->checkCacheExists('test_cid_clear1', $this->default_value)
+                      && $this->checkCacheExists('test_cid_clear2', $this->default_value)
+                      && $this->checkCacheExists('test_cid_clear3', $this->default_value),
+                      t('Three cache entries were created.'));

Can we rewrite this to:

+    $exists = $this->checkCacheExists('test_cid_clear1', $this->default_value);
+    $exists = $exists && $this->checkCacheExists('test_cid_clear2', $this->default_value);
+    $exists = $exists && $this->checkCacheExists('test_cid_clear3', $this->default_value);
+    $this->assertTrue($exists, t('Three cache entries were created.'));
catch’s picture

Status: Needs work » Needs review
FileSize
11.22 KB

Thanks for the review, attached patch addresses the feedback, slightly improves the doc for $table (we're getting data, not storing it, in cache_get) and also removes an unnecessary line break before a return elsewhere. Didn't change the $exists check because we have an entire cache test using this style and I kept it consistent -not planning to make the patch twice as big changing the rest of the test.

max_allowed_packet defaults to 1MB on most systems I think - MySQL docs say it's 16MB now. 1,000 ought to be conservative even with long cache IDs.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review

test bot meltdown.

David Strauss’s picture

MySQL docs say 16MB limit for the client and 1MB limit for the server. This technically means max accepted packet size, not generated.

catch’s picture

gotcha, a straight delete from with 1000 cache ids isn't going to hit 1mb - much more likely to hit it with a straight cache_set on a large object with lots of text in it.

sun’s picture

Status: Needs review » Reviewed & tested by the community

cid == 255 chars max.
* 1,000
+ remaining query ~= 50 max.

:= 255,050 chars max. < 1 MB?

Sorry, I'm not good at such calcs.

catch’s picture

catch’s picture

FileSize
11.23 KB

Rerolled for getInfo() static method changes.

catch’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
11.38 KB

Changes in field_attach_load() made this no longer apply. Re-rolled but was getting a lot of exceptions / fails - however fields loaded via the CCK UI work fine here, so not sure why. Uploading to see how bad it really is.

Status: Needs review » Needs work

The last submitted patch failed testing.

yched’s picture

catch : you might want to wait until #456488: cache field data only for the current revision is in

catch’s picture

Status: Needs work » Needs review
FileSize
11.68 KB

And it is :)

Patch does exactly the same thing, but there's enough changes to make it work with the new field_attach_load() that it could do with another look over before it's back to rtbc.

moshe weitzman’s picture

Status: Needs review » Reviewed & tested by the community

Looks proper to me. Bot is happy. Adds nice tests. RTBC.

yched’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
11.28 KB

Reworked the field_attach_load() part to be more inline with the recent changes.

Back to CNR so that catch can review and the bot can retest.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Those changes look good. Test bot is happy.

catch’s picture

Assigned: Unassigned » catch

Bumping to add a summary - I wrote up loads of summaries on node load caching but there's no proper explanation of what the patch does here.

At the moment the field cache does a cache_get() for every object loaded on a page. With database caching this means a query per object - which makes it the exception in terms of how our fieldable objects are loaded.

The field and filter caches are currently the only places in HEAD where we have per-object queries in listings. When combined with #369011: Performance: do check_markup() in text_field_load() we should have exactly the same number of database queries on a page regardless of how many objects are loaded.

So the patch itself adds cache_get_multiple() - which takes an array of cids, and gives you an array of cache records back (and your $cids array reduced to the items you still need for convenience). With node or profile listings, this reduces the field cache lookups from 10 (or 30, or 300) to 1.

With the body as fields patch, this saves us 9 queries and gives us an extra 3% requests per second on the default front page with 1 node (see #16). If we ever make taxonomy terms fieldable, it'll save us hundreds of lookups to the field cache on node listings - because we load full term objects there which would call field_attach_load() - and issue a cache_get() for every term on the page.

Swappable cache systems should only have to do:

foreach ($cids as $cid) {
  $cache[$cid] = cache_get($cid);
}

And not a lot else if there's no benefit to re-implementing the multiple getting. There might be situations where a pluggable system supports something like this and could use it though - to reduce impact of network latency for example.

The patch also adds the facility to clear cache ids using an array of ids in addition to the current cid/wildcard/everything options - this should allow for more intelligent cache clearing when that kind of information is available. This was something necessary for the node_load() caching, but might be considered kitten killing as part of this patch. I think it'll be useful in core when combined with things like nodereference #392494: Fields API needs to allow searching of scalars though so leaving it in until I'm told otherwise. That bit also comes with tests.

yched’s picture

multiple cache clear is definitely needed for stuff like 'clear (node|file|...)-reference field cache entries when a referenced object is deleted', so I think it belongs in there.

catch’s picture

Now body as field is back to CNR, it's possible to benchmark this usefully.

Front page, 30 nodes. Devel output and ab -c1 -n500

HEAD:
Executed 101 queries in 72.98 milliseconds
3.42 [#/sec]

Body as field:
Executed 73 queries in 71.71ms
2.86 [#/sec]

Body as field + latest patch on this issue:
Executed 43 queries in 58.79 milliseconds.
3.03 [#req/sec]

catch’s picture

Issue tags: +Performance

tagging.

catch’s picture

With #413192: Make taxonomy terms fieldable the gains are even more obvious - this is with a completely empty field cache. Once we have thousands of records in there the individual queries will take a bit longer. Combined with body as field, half the front page queries will be from the field cache.

Devel, 30 nodes on front page, each with 5 terms attached, works out as 50-ish unique terms on the page:

HEAD + fieldable terms.
Executed 115 queries in 55.51 milliseconds.

Patch:
_get_multiple() + fieldable terms
Executed 72 queries in 50.18 milliseconds.

catch’s picture

Priority: Normal » Critical

fieldable terms and body as field are in HEAD, so new benchmarks. Can't upgrade a D6 site to get large amounts of content because that's broken at the moment, so one node with 20-odd freetagging terms created manually. Since we're now getting dozens of database hits from cache_get() on node listings with lots of unique taxonomy terms, bumping to critical.

One node, 20 odd terms.
HEAD:
Executed 62 queries in 44.97 milliseconds.
8.31 [#/sec]

Patch:
Executed 45 queries in 32.31 milliseconds.
8.60 [#/sec]

chx also pointed me to http://us3.php.net/manual/en/function.memcache-get.php which takes an array of cache keys - so it looks like this would help some alternative drivers rather than make things harder at least in some cases

catch’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
12 KB

Re-rolled after #482622: Make cache friendlier for alternative implementations.

Not sure about _garbageCollection() and _prepareItem() but it all works.

chx’s picture

Instead of function _garbageCollection you can just use protected function garbageCollection. Although a private function would work too but there is absolutely no reason to hide the function from another implementation that wants to build on the default.

catch’s picture

FileSize
12.01 KB

Thanks chx. Re-rolled with those changes.

catch’s picture

Title: Multiple cache getting and clearing (w/ field API conversion) » Optimize the field cache
FileSize
16.54 KB

Spoke to chx and bjaspan in irc (and had discussed briefly with yched before) - here's a slight extension of the patch.

We now create a cache table for each fieldable entity - i.e. cache_field_node, cache_field_user etc. This means table size (and hence overhead, key sizes etc.) will be a lot smaller, you'd be able to set up seperate cache bins in memcache per entity - good for when you have millions of records , and also backends like memcache which don't handle wildcard cache clearing very well. It also means we can use object id as the cache key, which should also make lookups a tiny bit faster than using composite strings like field:$obj_type:$id

Couple of very small kittens - hook_modules_installed() isn't called in install.inc, and hook_modules_uninstalled() is currently called after a module and all it's tables have been removed - not very useful if you need to check things like hook_fieldable_info() for the module being uninstalled.

yched’s picture

The patch doesn't take care of the cache clears with respect to the new 'bins'.

I'm OK with the idea, but do we really need to do this here ? The patch lingered as RTBC for weeks now, I'm not sure this will help it get committed...

catch’s picture

FileSize
16.54 KB

yched - if we do two separate patches, then they'll break each other, I'm fed up re-rolling this one every couple of weeks and don't want to compound that by maintaining two conflicting patches in tandem.

Updated field_cache_clear(), thought I'd already done that, but apparently not...

yched’s picture

Status: Needs review » Needs work

Same patch as in #52 ;-)
I was not only thinking of field_cache_clear(), but also the various cache_clear_all() calls - mainly in field_attach_*() functions, I think.

One patch / several patches: OK, your call :-)

catch’s picture

Status: Needs work » Needs review
FileSize
17.82 KB

Doh. Right patch this time.

Status: Needs review » Needs work

The last submitted patch failed testing.

chx’s picture

Could $cids = array_keys(array_diff_key(array_flip($cids), $cache)); this be changed to $cids = array_diff($cids, array_keys($cache))? I am just wondering.

catch’s picture

Status: Needs work » Needs review
FileSize
12 KB

Modules loaded in install is sufficiently broken that the table-per-entity part of this is going to end up postponed on four or five other patches.

So splitting that out to #496344: Create a cache table per entity.

This patch just makes the field_attach_load() changes and adds cache_get_multiple() and yes chx, that array_diff works fine. Note to self, listen to yched more.

yched’s picture

catch: heh, no pb, I'm not feeling frustrated ;-)

The Field bits are OK for me, but I'm not enough familiar with the new cache backends to RTBC myself.
chx ?

catch’s picture

FileSize
12 KB

nudging test bot.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
sun’s picture

Status: Needs review » Reviewed & tested by the community
webchick’s picture

Status: Reviewed & tested by the community » Fixed

I spent about a half hour reading through this issue and the associated code. Looks like a good change that's compatible with similar API improvements we've made in this release, and the performance benefits are demonstrated well. Committed to HEAD!

Status: Fixed » Closed (fixed)
Issue tags: -Performance

Automatically closed -- issue fixed for 2 weeks with no activity.