memcached supports data values up to 1M but not larger. Views, for example, regularly tries to cache values larger than 1M, and this breaks. Size detection should be implemented so that we never try to write something to memcached that is too large.

Comment | File | Size | Author
#128 | 435694-127-d8-memcache-large-items.patch | 11.04 KB | japerry
#9 | memcache_watchdog_errormsg-435694-9.patch | 644 bytes | achton
#10 | memcache_watchdog_errormsg-435694-10.patch | 632 bytes | glennpratt
#13 | memcache_watchdog_errormsg-435694-13.patch | 722 bytes | glennpratt
#14 | memcache_watchdog_errormsg-435694-14.patch | 814 bytes | pdrake
#18 | memcache_watchdog_errormsg-435694-18.patch | 795 bytes | pdrake
#25 | memcache-dmemcache-set-warning-435694-25.patch | 2.99 KB | markpavlitski
#29 | memcache-large-write-support-435694-29.patch | 6.63 KB | markpavlitski
#34 | memcache-large-write-support-435694-34.patch | 8.74 KB | markpavlitski
#39 | memcache-large-write-support-435694-39.patch | 8.54 KB | wwhurley
#40 | memcache-large-write-support-435694-40.patch | 8.21 KB | markpavlitski
#42 | memcache-large-write-support-435694-42.patch | 8.2 KB | arantxag
#44 | memcache-large-write-support-435694-42.patch | 8.2 KB | Elijah Lynn
#44 | interdiff-435694-42-43.txt | 4.58 KB | Elijah Lynn
#44 | memcache-large-write-support-435694-43.patch | 10.07 KB | Elijah Lynn
#68 | 435694.patch | 14.72 KB | Jeremy
#70 | interdiff-435694-44-68.patch | 12.92 KB | Jeremy
#74 | interdiff-435694-68-74.patch | 827 bytes | Jeremy
#74 | 435694-2.patch | 14.8 KB | Jeremy
#74 | test-script.php_.txt | 567 bytes | Jeremy
#78 | interdiff-435694-74-78.patch | 4.59 KB | Jeremy
#78 | 435694-3.patch | 15.24 KB | Jeremy
#81 | interdiff-435694-79-81.patch | 5.24 KB | Jeremy
#81 | 435694-4.patch | 16.26 KB | Jeremy
#83 | interdiff-81-83.patch | 5.3 KB | Jeremy
#83 | 435694-5.patch | 18.83 KB | Jeremy
#84 | interdiff-81-84.patch | 7.24 KB | Jeremy
#86 | interdiff-84-86.patch | 3.9 KB | Jeremy
#86 | 435694-6.patch | 19.19 KB | Jeremy
#90 | diffstat-86-89.patch | 5.44 KB | Jeremy
#90 | 435694-7.patch | 19.28 KB | Jeremy
#93 | interdiff-90-92.patch | 1 KB | Jeremy
#93 | 435694-8.patch | 19.53 KB | Jeremy
#104 | 435694-104-d8-memcache-large-items.patch | 5.32 KB | rbayliss
#105 | 435694-105-d8-memcache-large-items.patch | 6.04 KB | rbayliss
#105 | interdiff-104-105.txt | 1.43 KB | rbayliss
#106 | interdiff-105-106.txt | 2.62 KB | rbayliss
#106 | 435694-106-d8-memcache-large-items_0.patch | 6.92 KB | rbayliss
#111 | 435694-111-d8-memcache-large-items.patch | 5.49 KB | markus_petrux
#114 | 435694-114-d8-memcache-large-items.patch | 5.32 KB | markus_petrux
#115 | 435694-115-d8-memcache-large-items.patch | 9.02 KB | solution33r
#118 | 435694-118-d8-memcache-large-items.patch | 14.5 KB | rhristov

Comments

catch’s picture

Category: bug » task

I'm not sure it's possible to measure the size of a variable accurately - or at least I've not seen code that does that.

IMO this is a bug in Views (and CCK, theme registry, variables, Field API and all the other places that have massive individual cache items), but it'd be good to document it at a minimum.

catch’s picture

http://groups.drupal.org/node/116764

http://groups.google.com/group/google-appengine-python/browse_thread/thr...

I'd like to see us catch when objects are too large (memcache should throw some kind of error), and log this. If we throw enough errors around, people should then run memcached with -I, or file bug reports against the modules with too-large items.

achton’s picture

Subscribing.

mikeytown2’s picture

Any way for memcache to catch the error and drop it in watchdog? Have a special call just for this occasion, kinda like the one I made in #1011000: Catch & report errors thrown by $memcache->connect.

achton’s picture

I'd love to see this happen, as I think >1MB cache sets can potentially be a nasty performance hit, memcache or not (it was for us).

Although I cannot replicate the use case scenario for #1011000: Catch & report errors thrown by $memcache->connect. and don't quite have that patch running as intended (no error message is returned in the Exception object), my initial feeling is that this will be difficult to do.
As far as I can tell, dmemcache_set() calls the set() class method, which maps to a C equivalent. I skimmed the C code in php_memcached.c, and AFAICT the related function (php_memc_handle_error) either returns 0 or -1, so no specific error message will ever bubble up to PHP and Drupal for us to use in watchdog.

Besides this, my tests show that the memcached log only logs a message about the item size being too large if you start the daemon with -vv. It then writes something like "Writing an error: Too large."

catch’s picture

Right, memcache fails silently at least as far as PHP is concerned for this.

I have seen posts around the internet where people run strlen() on the serialized value and throw an error if it's over a certain length. I could possibly live with this as a debug option, but not as standard behaviour; I'm not sure how that would play with the fact that we try to compress the items too, though.
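A minimal sketch of that idea, assuming D6/D7-era APIs (the threshold and variable names are illustrative, not the module's actual code):

// Warn when a serialized value exceeds memcached's default 1M item limit.
$serialized = serialize($value);
if (strlen($serialized) > 1024 * 1024) {
  watchdog('memcache', 'Oversized cache item @key: @size bytes.', array('@key' => $key, '@size' => strlen($serialized)), WATCHDOG_WARNING);
}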

I would rather see people's effort going into fixing some of the root causes of this - there are pending patches at http://drupal.org/project/issues/search/drupal?issue_tags=memory - many of those have not had much review.

podarok’s picture

subscribe

memcache does not store large items even if the memcached daemon is started with -I 32M (to store items under 32MB), so it is a code behaviour.

podarok’s picture

Title: >1M data writes incompatible with memcached » >1M data writes incompatible with memcached (memcached -I 32M -m32)
Version: 6.x-1.x-dev » 7.x-1.x-dev

the same on 7.x

achton’s picture

Version: 7.x-1.x-dev » 6.x-1.x-dev
Status: Active » Needs work
FileSize
644 bytes

In accordance with #6, I've started issuing watchdog warnings on my installation to help me identify when cache sets fail. See attached patch for 6.x.

There is still the mentioned issue about cache item size when using compression, which I have not addressed.

This procedure may be the best we can get, since memcached does not let us know specifically when a cache_set fails due to item size, just that it fails.
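In outline, the approach looks something like this (a sketch with assumed variable names; the message format is modeled on the log output quoted in the next comment):

// Log a watchdog warning whenever the memcache set returns FALSE.
if (!$mc->set($full_key, $value, MEMCACHE_COMPRESSED, $exp)) {
  watchdog('memcache', 'Cache set attempt failed. Key: @key. Cache bin: @bin. Item size (uncompressed): @size KB.', array('@key' => $key, '@bin' => $bin, '@size' => round(strlen(serialize($value)) / 1024)), WATCHDOG_WARNING);
}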

glennpratt’s picture

Thanks for the patch achton. Works for me:

http://www.localhost:8080|1312474790|memcache|127.0.0.1|http://www.localhost:8080/devel/php|http://www.localhost:8080/devel/php|1||Cache set attempt failed. Key: cache_size_test. Cache bin: cache. Item size (uncompressed): 3385 KB.

Re-roll without trailing whitespace attached. Also, I bumped the severity to error; it seems like an error when a cache set fails, though I guess that could be debated.

glennpratt’s picture

Just as an FYI, this patch has been very informative. We were sort of assuming cache_set was working most of the time, but our logs are piling up with failures, some of which don't appear to be caused by the object size.

glennpratt’s picture

Note to self, update patch to report error code when possible:

http://www.php.net/manual/en/memcached.getresultcode.php

glennpratt’s picture

Re-rolled patch, changed watchdog call to look more like other calls in core, removed em tags, etc.

The primary change added the error message when we have Memcached.

pdrake’s picture

Depending on where we are in the bootstrap, "t()" may not yet be defined. I re-rolled the watchdog error message patch to check whether "t()" exists before using it.
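A minimal sketch of that guard (the patch's exact wording may differ):

// Fall back to untranslated English when t() is unavailable during
// early bootstrap.
$message = function_exists('t') ? t('Cache set attempt failed.') : 'Cache set attempt failed.';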

catch’s picture

Let's drop the t() usage altogether here and just log in English. I don't see this ever getting translated. Alternatively could use get_t().

pdrake’s picture

Unfortunately, this apparently can occur at times when module.inc hasn't been included yet, so it errors when watchdog tries to run module_implements(). Two options: don't log if the module_implements() function does not exist, or require_once module.inc so that we have the required functions. I'm leaning toward the latter - do you have a preference, catch?

mikeytown2’s picture

You can use a shutdown function for watchdog. This patch was committed to memcache in #1011000-9: Catch & report errors thrown by $memcache->connect.
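The pattern looks like this (message and variables are illustrative):

// Defer logging until shutdown, when the full bootstrap guarantees that
// watchdog() and module_implements() are available.
register_shutdown_function('watchdog', 'memcache', 'Cache set attempt failed. Key: @key.', array('@key' => $full_key), WATCHDOG_ERROR);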

pdrake’s picture

mikeytown2, thanks for the suggestion. updated patch attached.

bibo’s picture

Subscribing.

If we could log when the 1M limit is restricting caching, it just might solve (or help pinpoint) the huge performance problems that I'm seeing on a complex site (or several).

It seems in our case the problems are caused by a few key cache components that never make it to memcache (and, as you know, the database cache is not used either). It's slowing the site more than I like to admit, and has been for a long while.

Several serialized cache blobs are over 1M, probably even when gzipped. For example:

cache
***********
theme_registry:main_theme_name [BLOB - 8.1 MiB] 0 1316556483 1
theme_registry:admin_theme_name [BLOB - 7.8 MiB] 0 1316553845 1

cache_menu
***********
links:admin_menu:tree-data:abbcf0407c9c6cc16599da6... [BLOB - 1.5 MiB] 0 1316534351 1

cache_content
***************
content_type_info:fi [BLOB - 3.4 MiB] 0 1316538815 1

fgm’s picture

The patch adds these warnings within dmemcache_set and lets it return the error status, but the problem is exacerbated by the fact that set() discards that status by invoking dmemcache_set() without assigning its result anywhere. The commit mentioned in #1011000: Catch & report errors thrown by $memcache->connect. appears not to touch this part of the code.

drewish’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev

Marked #1706172: Object to big to be cached, memcache does not inform properly as a duplicate.

This is actually still a problem in the 7.x branch. I found my site was regenerating Views' field list on every page load because the cache value was greater than 1MB and it was hitting the floor with no notice. It seems like back in the day there was a watchdog message, but it was firing too many of them: #338522: watchdog() in dmemcache_set is dangerous!. We should do something like log one watchdog message at the end of the request with all the failed keys.

mgifford’s picture

Is this what the error would look like?

Notice: MemcachePool::set() [memcachepool.set]: Server memcache1 (tcp 1213, udp 0) failed with: SERVER_ERROR object too large for cache (3) in dmemcache_set() (line 44 of dmemcache.inc).

helmo’s picture

@mgifford: Yes: that's what it looks like for me.

Adding -I 5M solved it for me.

I already got this message via the watchdog so I'm not sure the patch is still needed.

Phizes’s picture

Sorry to pollute the issue queue, but using PECL Memcached with igbinary to serialize data results in the stored data being up to 18% smaller than using PHP's serialize(). I'm not sure how this plays with compression, but it may help someone if they are close to the 1MB mark. I can't find my benchmarks on this and it was a while ago that I checked, so there may be a problem with my recall. You would need to benchmark for your specific use case.

(As I recall, there was also a small improvement in performance with igbinary, though this was on PHP 5.3; I have not compared them on PHP 5.4.)
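For anyone wanting to try this, a sketch using the PECL Memcached API directly (assumes the extension was compiled with igbinary support):

// Opt in to igbinary serialization for this connection.
$memcached = new Memcached();
if (Memcached::HAVE_IGBINARY) {
  $memcached->setOption(Memcached::OPT_SERIALIZER, Memcached::SERIALIZER_IGBINARY);
}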

markpavlitski’s picture

Status: Needs work » Needs review
FileSize
2.99 KB

PECL::Memcache (3.0.6) produces a notice when dmemcache_set() fails.

The attached patch throws a warning when dmemcache_set() fails using PECL::Memcached.

Result codes were lifted from here:
http://stackoverflow.com/questions/9042883/php-memcached-extension-resul...
http://bazaar.launchpad.net/~tangent-trunk/libmemcached/1.2/view/head:/l...

johan.gant’s picture

Regardless of how much a module/developer tries to throw into a single cache item, I do think that if memcache isn't able to successfully store that value, it should complain. I hit this issue on the 6.x version and found it was silently failing.

Would be lovely if this could be addressed.

markpavlitski’s picture

@johan.gant - Hopefully the patch in comment #25 will address this (for D7 at least). Once a solution is approved we can look at backporting it to D6.

sd46’s picture

I've had a look at the patch in #25. The main issue I see is that logging of these messages was originally included in the module for the Drupal 5 version, but was subsequently removed for spamming the database with hundreds of messages; see: watchdog() in dmemcache_set is dangerous!. Would it be sensible to be able to turn logging of these messages on or off, or to have some other method for restricting repeated messages?

markpavlitski’s picture

Category: task » bug
FileSize
6.63 KB

@FlakMonkey46 You're right, the patch in #25 introduces a regression and results in a large number of watchdog calls if the memcached instance is unavailable.

I'd consider this a bug since D7 core caching doesn't have this issue and memcache is supposed to be a drop-in replacement.

How about the attached patch which adds chunking support for large objects if the initial write fails?
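Roughly, the chunking idea works like this (a sketch; the piece-key format and header fields are modeled on the later patches in this issue, not this exact one):

// Split the serialized value into pieces that each fit under the limit,
// then store a small header item recording how many pieces to reassemble.
$pieces = str_split(serialize($value), $max_length);
foreach ($pieces as $id => $piece) {
  dmemcache_set('_multipart_' . $id . ':' . $key, $piece, $exp, $bin);
}
$header = new stdClass();
$header->multi_part_data = TRUE;
$header->piece_count = count($pieces);
dmemcache_set($key, $header, $exp, $bin);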

arantxag’s picture

Subscribing.

Phizes’s picture

There is no need to post 'Subscribing.'-type comments any more; there is a follow button in the top right-hand area of the issue summary. For more info, see: Stop subscribing, start following.

arantxag’s picture

I'm stupid. I was looking for this button for some minutes before writing "subscribing"...
Thank you :)

marcingy’s picture

Status: Needs review » Needs work

Quick scan, and the initial thing that springs to mind is: don't we also need to deal with cache clears for 'multi-part' keys?

markpavlitski’s picture

Status: Needs work » Needs review
FileSize
8.74 KB

@marcingy Good catch. While #29 would have still functioned on a cache clear, there could be large cache entries left sitting around in memcache.

This patch cleans up these extra chunks of data when deleting a key or clearing the cache.

I've also added a test case to check that large values work with cache_get()/cache_set().

SocialNicheGuru’s picture

The patch no longer applies to the May 24, 2013 dev version.

markpavlitski’s picture

@SocialNicheGuru The patch in #34 still appears to cleanly apply to the latest 7.x-1.x-dev branch. Please note it's a -p1 patch. Can you retry with the latest dev release?

SocialNicheGuru’s picture

Thanks for the tip. Can someone share what the difference is between 'git apply' and 'patch -p1'?

anavarre’s picture

#37 https://drupal.org/patch/apply should answer your question.

wwhurley’s picture

Attached is a re-rolled patch to apply to the most current dev version.

markpavlitski’s picture

Status: Needs review » Needs work
FileSize
8.21 KB

The patch in #39 doesn't apply (looks like a -p7 patch?).

@wwhurley have a look at the patch creation guide https://drupal.org/node/707484 for how to create a -p1 patch.

markpavlitski’s picture

Status: Needs work » Needs review
arantxag’s picture

The patch in #40 has errors: it doesn't change the variable name from $result to $rc. Here you can find the patch reviewed.

Jeremy’s picture

Issue tags: +7.x-1.3 blocker
Elijah Lynn’s picture

Here is a patch on #42 that improves the docblocks to better meet our Comment Standards (https://www.drupal.org/coding-standards/docs).

Edit: Sorry for adding #42 again, cannot delete it.

Jeremy’s picture

Status: Needs review » Needs work

The main thing I'd like to see added here is logging, so we know when this is happening. Generally it's a bad sign that we're caching items >1M, and something that should be fixed in whatever module is doing it. So, ideally we'd log it and chunk it both (perhaps making both configurable, so you could do either/or/both).

Elijah Lynn’s picture

Does anyone have a View they could paste that generates >1MB objects? I would like to test this patch more fully and help add logging. Maybe I can figure it out on my own, but if you have one please let me know. I will post back if I come up with one.

mdupont’s picture

For information, I've seen objects > 1MB on complex sites with a lot of fields and database tables. The field_info:instances cache and schema cache were therefore ignored by memcache, impacting performance.

Elijah Lynn’s picture

Hiding an accidental patch from #44.

bibo’s picture

I'm not sure if people are generally aware of it, but the memcached developers introduced a new configuration option some time ago for increasing the maximum item size limit (MAXITEMSIZE). This was introduced in memcached 1.4.2.

More info: http://www.alphadevx.com/a/387-Changing-the-maximum-item-size-allowed-by...

Having the memcache module inform about problems with data values over 1MB would be a nice feature. Maybe it could also live as a manually triggered check in memcache_admin(?). Raising the limit by altering the memcached configuration is luckily not too difficult anymore.

SocialNicheGuru’s picture

When installing memcache I get the following:

"t/item_size_max.t .... 1/7 Item max size cannot be less than 1024 bytes.
t/item_size_max.t .... 2/7 Cannot set item size limit higher than 128 mb.
t/item_size_max.t .... 3/7 WARNING: Setting item max size above 1MB is not recommended!
Raising this limit increases the minimum memory requirements
and will decrease your memory efficiency.
WARNING: Setting item max size above 1MB is not recommended!
Raising this limit increases the minimum memory requirements
and will decrease your memory efficiency."

SocialNicheGuru’s picture

The solution in #49 did not work for me; the variable was not defined. I am on Ubuntu 14.04 with PHP 5.5.9.

When I disable memcache, caching is done by Drupal.
I took a look at my database and I see these as the major cache culprits:
cache 10 MB
cache_advagg_info 1.6 MB
cache_bootstrap 2 MB
cache_field 14 MB
cache_form 1.5 MB
cache_rules 1.5 MB
cache_views 11.6 MB

Checking memcached's help output gives:
'-I Override the size of each slab page. Adjusts max item size (default: 1mb, min: 1k, max: 128m)' (note it's an uppercase 'i', not an 'L')

I need to use the -I option for each instance in my config file.

mikeytown2’s picture

@SocialNicheGuru
The thing to look for is a row in the database that is over 1MB in size, not the total table size. So, for example, the cache_advagg_info table has a lot of rows, but each row should be less than 1KB in size.

achton’s picture

@Elijah #46 (and others): To generate >1MB objects easily, install Panels and insert a view (or something else) which generates a large bunch of HTML on a Panels page. Then enable page caching for that Panels page, making it cache the generated HTML directly in the default cache bin. That should make it easy to test.

Side note: this will also expose issues with your MySQL server if it has the default max_allowed_packet size of 1MB (when caching to the database).

torgosPizza’s picture

I have this issue in D7 on our homepage and catalog pages, which are Search API-Solr-Views and Panels/Panelizer based. (Using a view mode for our product entities that's been Panelized, and the homepage is a Page Manager page built in Panels). I would consistently get WSOD errors when attempting to load either of these pages after the panels cache had expired.

After using this patch, this does not seem to be the case - the pages load without any WSOD and relatively quickly (especially considering they would often not load at all). This is of course only observational data but the patch seems promising for those of us who don't want to sacrifice performance for an increase in memory allocation.

Thanks!

killes@www.drop.org’s picture

Jeremy, I wonder why you want to have logging here. There is no way to really "fix" the issue.

As an alternative to splitting up the cached item, one could let the memcache extension handle this.

http://php.net/manual/en/memcache.setcompressthreshold.php

memcache_d_ does not seem to have a similar setting, however.
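For reference, the PECL memcache threshold is set like this (values are illustrative):

// Compress values of 20000 bytes or more, but only keep the compressed
// copy when compression saves at least 20%.
$memcache = new Memcache();
$memcache->addServer('127.0.0.1', 11211);
$memcache->setCompressThreshold(20000, 0.2);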

I've seen this issue not only for views but also for cache-schema and cache_bootstrap-hook_info, so a fix would need to be in core; but I am not sure what it should look like, and I don't think it should be dealt with in core, since the size limit is a memcache issue.

Jeremy’s picture

Logging will be configurable, so it will be possible to disable it. The reason for the logging is that when cache items grow this big it's almost certainly a bug in and of itself (throwing everything and the kitchen sink into the cache all in one big lump is generally not an efficient choice). That said, the reality is there are many instances of cache items getting this big, so we need to properly support it. While compression is also an option that can help, I fully intend to add support for auto-splitting the cache item in the next release.

joelpittet’s picture

Here's a SQL command to find those big cache items:

SELECT cid, LENGTH(`data`)/1024 AS `size` FROM cache WHERE LENGTH(`data`)/1024 > 300 ORDER BY `size` DESC;

And some values (sizes in KB):

cache
  features_module_info  962.7969
  schema  630.5576

cache_bootstrap
  variables  398.6250

cache_field
  field_info:instances  391.4385

cache_menu
  links:management:tree-data:en-US:ec99d3452fef1ede622e66c68ba908b1dad455aa71f5e68648aeec6488b89c88  3303.6953

cache_rules
  data:en  367.9102
  data:en-US  367.7236

cache_update
  update_project_data  368.8066

cache_views
  views_data:en  1823.4355
  views_data:en-US  1823.4355

I'm going to try the patch in #44. @Jeremy, is that the one you are looking at adding with the configuration?

joelpittet’s picture

I did get this nice little error after applying the patch in #44:

PHP Notice: unserialize(): Error at offset 1048047 of 1329813 bytes in memcache/dmemcache.inc on line 292.

adammalone’s picture

Following on from the comment in #57, here's a small drush script that has been used before to find those same large cache objects in the database: https://gist.github.com/typhonius/a3c52dab92bf07b7ce34

Fabianx’s picture

Priority: Normal » Critical

I learned at DrupalCon that this took a BIG Drupal site down during their launch.

With Drupal 7 sites having grown to >300 modules, we cannot expect this to work out-of-the-box anymore, and as increasing the memcached item size limit is not always an option, this is critical.

Also, cache_get() silently fails for critical cache entries (because the cache_set() failed), which is a big concern, too.

mikeytown2’s picture

I do agree with the Critical here; this issue can be deadly. We've actually stopped using memcache because of this issue.

Jeremy’s picture

Assigned: Unassigned » Jeremy
Priority: Critical » Major
Issue tags: -7.x-1.3 blocker +7.x-1.4 blocker

Does not affect everyone, so dropping to major, but tagging as a 7.x-1.4 blocker (it didn't actually get into 7.x-1.3, which was released tonight), and I fully plan to respect that. Assigning to myself as this is one of my top priorities with this module.

Jeremy’s picture

(removing duplicate)

joelpittet’s picture

While it didn't outright solve the problem I had, it at least made it much less explosive, and it was helpful in tracking down those large cache items, which I moved to Redis. A module_implements() bug in core was uncovered as well because of this patch in its current state.

Thanks for adding it as a 1.4 blocker @Jeremy.

mikeytown2’s picture

@joelpittet

a module_implements bug in core was uncovered as well

Link to issue?

joelpittet’s picture

@mikeytown2 whoops, sorry: #496170-34: module_implements() cache can be polluted by module_invoke_all() being called (in)directly prior to full bootstrap completion

So far so good since removing that patch from core on my system. I'm fairly sure the patch is not the cause of the problem, but it does certainly contribute to exposing it.

mcdruid’s picture

https://www.drupal.org/project/memory_profiler now includes some code which can be used to identify objects which are too large for cache without having to move cache bins into the database:

http://cgit.drupalcode.org/memory_profiler/tree/README_memcache_profiler...

It's mostly harmless to run on a prod site for a short while to identify the problematic objects, but almost certainly better to use it with syslog rather than dblog.

Jeremy’s picture

Status: Needs work » Needs review
FileSize
14.72 KB

I've done some initial testing of the latest patch, with the intention of committing this functionality soon. I've confirmed it works as designed, but still need to do some additional profiling before I'll commit it. I've made a few modifications, including:

  • Renamed "chunk" to "piece" to avoid confusion with memcached's own use of the term "chunk"
  • Added and documented a memcache_data_max_length tunable to support server configurations with larger or smaller pages (defaults to 1 MiB)
  • By default, log when we split objects into pieces
  • Added and documented a memcache_log_data_pieces tunable to only log to watchdog when an object is split into this many or more pieces (defaults to 2)
  • Minimal cleanup for consistency

Feedback and additional testing welcome.
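For reference, the new tunables would go in settings.php along these lines (the memcache_log_data_pieces line matches the README example below; expressing the 1 MiB default in bytes is my assumption of the format):

$conf['memcache_data_max_length'] = 1048576;
$conf['memcache_log_data_pieces'] = 2;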

joelpittet’s picture

I was still having some issues with the previous patch, though trivial compared to running without it. Sometimes the pieces/chunks wouldn't match sizes on these larger items. Maybe someone else who is running this patch could keep an eye on their logs?

@Jeremy any chance you could toss up the interdiff as well? Maybe the size issue was resolved in your cleanup.

Jeremy’s picture

FileSize
12.92 KB

@joelpittet interdiff attached, sorry for the oversight. My cleanup would not have fixed your problem; it didn't change the functionality in question.

Can you more fully describe the problem you've had? Do you get any error messages? Or what sorts of problems do you see?

I will tweak my test script to make smaller steps in object size to see if perhaps there's a small window where we are still generating pieces that are too large. Alternatively: we currently allow an estimated 512 bytes for metadata; perhaps we need to be more precise or add a larger margin.

joelpittet’s picture

@Jeremy it's a bit tricky to remember as I switched quickly to store those larger bins in Redis to mitigate any further issues.

All I remember is that the sizes or number of chunks didn't match up and it was throwing a notice or error message in the logs. I didn't notice any errors on the page.

Thanks for the interdiff, it would be nice to have those semi-automatic #2233521: Auto-generate interdiffs :)

This patch looks like quite the improvement. Hopefully someone can drop it in and give it a test.

torgosPizza’s picture

I'll be testing this on production today and over the weekend. Thanks for the updated patch!

catch’s picture

This looks encouraging to me with the logging. I was concerned about us silently working around the limit before, but we silently fail now, so working + logging sounds great.

Didn't review the patch in lots of depth yet but also didn't spot anything obvious.

Jeremy’s picture

@joelpittet with additional testing I did find some edge cases where the earlier patch would fail. I tracked this down to the memcached server reporting "error: MemcachePool::set(): Server n.n.n.n (tcp 11211, udp 0) failed with: SERVER_ERROR out of memory storing object". I've updated the attached patch to handle this gracefully and now all my tests pass consistently.

For what it's worth, I'm using the attached quick test script to test various sized cache items -- I've been tuning the $minimum and $maximum values and the block size (bs=), then running it with "drush scr test-script.php". It was with a block size of 512 and a min/max of 4095 that I first duplicated this bug. If you're not familiar with dd, I recommend you don't experiment with this test script, as it has the potential to delete your hard drive if you misuse it.

Jeremy’s picture

I've run some load tests comparing the before/after of applying this patch. Overall performance isn't changed much, however there is a huge spike in "delete_misses" and some penalty from this. The misses are expected from the new code in dmemcache_delete() required to best-effort cleanup multi-piece cache items, but I'd like to consider methods for avoiding this additional network traffic if possible.

This is latest -dev without this issue's patch applied (the average of 3 load tests):

"delete_hits"	21,987.3
"delete_misses"	63.3

And this is with the patch applied:

"delete_hits"	21,635.0
"delete_misses"	21,698.3

The load test processes 21.1 pages per second in either case, so this may be acceptable overhead.

Fabianx’s picture

Status: Needs review » Needs work
  1. +++ b/README.txt
    @@ -294,18 +297,44 @@ unless you know what you're doing.
    +
    +$conf['memcache_log_data_pieces'] = 2;
    

    This is great!

  2. +++ b/dmemcache.inc
    @@ -46,9 +48,32 @@ function dmemcache_set($key, $value, $exp = 0, $bin = 'cache', $mc = NULL) {
    +      global $_memcache_errormsg;
    +      $_memcache_errormsg = '';
    

    I shuddered at using a global here at first, but given that a static, a drupal_static(), or a static class member are all kind of a global too, and this is a very limited use case, this is fine.

  3. +++ b/dmemcache.inc
    @@ -60,6 +85,101 @@ function dmemcache_set($key, $value, $exp = 0, $bin = 'cache', $mc = NULL) {
    +  $cache = new stdClass;
    ...
    +  $cache->data = new stdClass;
    +  $cache->data->serialized = $serialized;
    +  $cache->data->piece_count = $piece_count;
    +  $cache->multi_part_data = TRUE;
    

    This has the potential that all LARGE multi-part objects end up fighting for expiration with small objects, which might be unimportant.

    What I mean is:

    They end up in the same slab as the small ones.

    Perhaps adding some random / configurable padding would solve that?

  4. +++ b/dmemcache.inc
    @@ -60,6 +85,101 @@ function dmemcache_set($key, $value, $exp = 0, $bin = 'cache', $mc = NULL) {
    +
    +    $result &= dmemcache_set($cache->cid, $cache, $exp, $bin, $mc);
    +  }
    

    Yeah, it is too bad we don't have set_multiple in the PECL memcache extension ...

  5. +++ b/dmemcache.inc
    @@ -60,6 +85,101 @@ function dmemcache_set($key, $value, $exp = 0, $bin = 'cache', $mc = NULL) {
    +    global $timers;
    +    unset($timers['memcache_split_data']);
    

    Is there no timer_stop or timer_reset function that could be used to do the same here?

  6. +++ b/dmemcache.inc
    @@ -147,6 +270,60 @@ function dmemcache_get($key, $bin = 'cache', $mc = NULL) {
    +function _dmemcache_get_piece_key($key, $id) {
    +  return '_multipart_' . (string)$id . ':' . $key;
    +}
    

    This is not protected against max-length; it should run through dmemcache_key() again or have the same logic.

  7. +++ b/dmemcache.inc
    @@ -234,6 +415,24 @@ function dmemcache_delete($key, $bin = 'cache', $mc = NULL) {
    +      do {
    +        // Generate the cid of the next data piece.
    +        $piece_key = _dmemcache_get_piece_key($key, $next_id);
    +        $full_key = dmemcache_key($piece_key, $bin);
    +        $next_id++;
    

    As discussed, having delete misses for every non-multi item is not a good trade.

    We discussed (and Jeremy brought up these ideas):

    a) making this at least configurable and maybe turning off by default

    b) Storing all multi keys in a global variable, similar to wildcard cache clearing.

    In any case we should ensure that supporting multi keys, while helpful in not bringing a site down, does not affect the normal operation of non-multi keys.

Reviewed; looks great already! Some pointers to discuss / change, but overall almost RTBC :).

catch’s picture

+++ b/dmemcache.inc
@@ -60,6 +85,101 @@ function dmemcache_set($key, $value, $exp = 0, $bin = 'cache', $mc = NULL) {
+      register_shutdown_function('watchdog', 'memcache', 'Spent !time ms splitting !bytes object into !pieces pieces, cid = !key', array('!time' => timer_read('memcache_split_data'), '!bytes' => format_size(strlen($data)), '!pieces' => $piece_count, '!key' => dmemcache_key($key, $bin)), WATCHDOG_WARNING);

Instead of watchdog, could we trigger_error()? Assuming sites have their error reporting set up correctly, developers would see that directly, and production just gets watchdog anyway (but as a PHP notice instead).

Jeremy’s picture

@Fabianx, thanks for the thorough review! Very helpful.

#2: I've moved the global into a static.
#3: It will be interesting to see if this is ever a problem. For now, I'm opposed to adding additional network traffic and cache data just to try and load balance our slabs.
#5: This is the only way I've found to re-use the same timer name without getting an accumulative time.
#6: Thanks, fixed.
#7: I'll work on this in the next patch, but I felt the above changes were sufficient for re-review, and best to see alone.

@catch, I think this applies to ALL (or at least most) of the 'watchdog' calls in dmemcache.inc, as they are all errors. As such, this is best handled in a separate followup issue that addresses them all for consistency.

Fabianx’s picture

+++ b/dmemcache.inc
@@ -59,10 +59,9 @@ function dmemcache_set($key, $value, $exp = 0, $bin = 'cache', $mc = NULL) {
+      drupal_static_reset('memcache_errormsg');

@@ -87,10 +87,21 @@ function dmemcache_set($key, $value, $exp = 0, $bin = 'cache', $mc = NULL) {
+  if (!isset($drupal_static_fast)) {
+    $drupal_static_fast = &drupal_static(__FUNCTION__, array('memcache_errormsg' => NULL));
+  }
+  $memcache_errormsg = &$drupal_static_fast['memcache_errormsg'];

Unfortunately, this implementation is not correct.

I think it should be just a normal static.

But to fix it, it would need to:

pass the function name to drupal_static_reset();

use the pattern exactly as used in e.g. module_implements();

and there is no need to return data if it's stored in the static anyway.

Jeremy’s picture

Status: Needs work » Needs review

Good point, the error handler is hardly a high-performance function. Okay, I've simplified it by making it a normal static.

I've also implemented logic to avoid the extraneous DELETEs. I've run a load test to confirm they're gone, and profiled to confirm it properly fires when it's supposed to. During implementation I realized there's potential for a slow memory leak here, so I added some garbage collection. Typically this would live in hook_cron(), but we don't generally require people to enable memcache.module, so instead I fire this off at most once every 24 hours when deleting another multi-piece cache item -- this will prevent the garbage collection from firing at all if there are no large items in the cache.

Jeremy’s picture

It's a little easier to review if I attach the patches.

joelpittet’s picture

@Jeremy because it's somewhat related to your last patch and because it's fun: https://t.co/rC8gbo8Jis - gc performance improvements and lots of animated gifs :)

Jeremy’s picture

FileSize
5.3 KB
18.83 KB

During an IRC discussion @catch pointed out that my previous patch would lead to an infinite loop if the variable array got larger than 1MB. So, instead we maintain the piece_cache in memcache itself (accepting that it's possible to lose it, in which case memcache will eventually auto-expire the orphaned pieces).
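In outline, a sketch of the approach (names modeled on the review comments below; details assumed):

// The piece cache maps cid => expiration for each multi-piece item, and
// itself lives in memcache so it cannot overflow the variables array.
$piece_cache = dmemcache_get('__dmemcache_piece_cache', $bin);
if (!is_array($piece_cache)) {
  $piece_cache = array();
}
// When deleting a multi-piece item, drop entries whose pieces have
// already expired in memcached.
foreach ($piece_cache as $cid => $expires) {
  if ($expires && $expires < REQUEST_TIME) {
    unset($piece_cache[$cid]);
  }
}
dmemcache_set('__dmemcache_piece_cache', $piece_cache, 0, $bin);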

Jeremy’s picture

FileSize
7.24 KB

Whoops, accidentally uploaded an earlier interdiff. Attaching the proper one.

catch’s picture

Only found a couple of issues:

  1. +++ b/dmemcache.inc
    @@ -234,6 +419,49 @@ function dmemcache_delete($key, $bin = 'cache', $mc = NULL) {
    +        // Perform garbage collection no more than once per day for keys
    

    I'm not sure this needs to be restricted to once per day. It's happening only when multi-piece items are deleted, and then the additional cost is only retrieving the item from drupal_static(), the foreach, and the timestamp comparison (if we only logged when something is actually garbage collected) - unless something is actually garbage collected, which should be infrequent. That seems very cheap and would save the variable_set(). Also, while I don't think it's a problem due to the 24-hour timer, this is another case where it would be possible to trigger a variable_set() from a variable_set() - since variable_set() does a cache clear and the variable cache could be multi-part.

  2. +++ b/dmemcache.inc
    @@ -234,6 +419,49 @@ function dmemcache_delete($key, $bin = 'cache', $mc = NULL) {
    +          foreach ($piece_cache as $k => $e) {
    

    Minor, but I'd rather see $e as $expires. When I see $e I always think of an Exception.

Jeremy’s picture

FileSize
3.9 KB
19.19 KB

Thanks @catch! I've fixed both, and also confirmed that this doesn't break when the __dmemcache_piece_cache itself grows to >1MiB.

mikeytown2’s picture

I like where this is headed - not using variable_set().

killes@www.drop.org’s picture

Tested this a bit and saw watchdog entries like:

Spent 751.16 ms splitting 10.42 MB object into 11 pieces, cid = stage-cache_views_data-file_view%3Apanel_pane_5%3Aresults%3A9fb2367a069c76985dc346ff2c36a5ed

for pages where I'd get the mempool warning before.

Fabianx’s picture

+++ b/dmemcache.inc
@@ -441,24 +442,21 @@ function dmemcache_delete($key, $bin = 'cache', $mc = NULL) {
+        timer_start('memcache_gc_piece_cache');
...
+        if ($gc_counter) {
+          register_shutdown_function('watchdog', 'memcache', 'Spent !time ms in garbage collection cleaning !count stale keys from the dmemcache piece_cache.', array('!time' => timer_read('memcache_gc_piece_cache'), '!count' => $gc_counter), WATCHDOG_WARNING);

Not a blocker, but this timer_start() / timer_read() pair is not protected against the timer not being reset when timer_start() is called again.

---

Besides this little nit, this is RTBC from my side, and killes has tested :).

Jeremy’s picture

FileSize
5.44 KB
19.28 KB

Two changes:

  • If __dmemcache_piece_cache itself grows larger than the maximum size memcache supports, gracefully report an error and avoid a recursive loop
  • Fix use of timers (no need to _reset if you don't _stop)

@fabianx I realized that if I don't call timer_stop and just call timer_start again there's no accumulation of time, so I cleaned up the timer logic accordingly. I would appreciate a review of the recursion handling, especially if you have better ideas on how to solve it.

@killes thanks for testing! Please be sure to update to this latest patch, which protects against the edge case of __dmemcache_piece_cache growing too large (which would require a massive number of very large cache items, and is therefore rather unlikely).

killes@www.drop.org’s picture

The new patch works just as well as the previous one for me.

  • Jeremy committed 2245e77 on 7.x-1.x
    Issue #435694 by Jeremy, catch, Fabianx, markpavlitski, Elijah Lynn,...
Jeremy’s picture

Version: 7.x-1.x-dev » 6.x-1.x-dev
Status: Needs review » Patch (to be ported)
FileSize
1 KB
19.53 KB

I ran another load test today and found that while recent changes solved the problems with excessive DELETE misses, on websites with no over-sized cache pieces it was adding a significant number of GET misses. Fortunately, this was easily solved by initializing the __dmemcache_piece_cache variable the first time we try to load it and find it empty. Load tests confirm that there is now no measurable overhead with this patch on websites that don't have over-sized cache pieces.

@killes, thanks for testing this on a real-world website!

@markpavlitski, thanks for the original patch to split large items into pieces!

The patch has landed:
http://cgit.drupalcode.org/memcache/commit/?id=2245e77

Next, it needs to be ported to the 6.x branch, then we'll look at merging it into the 8.x branch.

Fabianx’s picture

Checked the recursion handling: That is fine.

Good catch on the cache get misses, testing helps :).

Congratulations everyone!

jonloh’s picture

Will this be backported to Drupal 6 anytime soon? ;)

  • Jeremy committed d9ac0f3 on 6.x-1.x
    Port #435694 to D6, split large objects
    
Jeremy’s picture

Version: 6.x-1.x-dev » 8.x-2.x-dev
Issue tags: -7.x-1.4 blocker

Backported to Drupal 6:

Needs to be ported to D8, if not already done.

ressa’s picture

Was this patch ever ported to D8?

Andre-B’s picture

@ressa It doesn't look like it, at least not the README parts. But this shouldn't stop you from using larger cache bins (at least not by looking at the D8 code; I still have to test this though...).

fuzzy76’s picture

The D6 backport in #96 is broken. The constant REQUEST_TIME is not defined in the D6 branch. 6.x-1.11 gives me these:

Notice: Use of undefined constant REQUEST_TIME - assumed 'REQUEST_TIME' in dmemcache_piece_cache_set() (line 797 of /var/www/html/sites/all/modules/contrib/memcache/dmemcache.inc)
Notice: Use of undefined constant REQUEST_TIME - assumed 'REQUEST_TIME' in _dmemcache_set_pieces() (line 158 of /var/www/html/sites/all/modules/contrib/memcache/dmemcache.inc)
Notice: Use of undefined constant REQUEST_TIME - assumed 'REQUEST_TIME' in _dmemcache_set_pieces() (line 170 of /var/www/html/sites/all/modules/contrib/memcache/dmemcache.inc)

Unsure if this should affect the status of this issue or not, so I'm not changing it.

  • Jeremy committed 42bad33 on 6.x-1.x
    Issue #435694 by Jeremy, fuzzy76: >1M data writes incompatible with...
Jeremy’s picture

@fuzzy76 -- thanks for reporting this! 6.x fix committed.

solution33r’s picture

Has this been ported to D8?

rbayliss’s picture

Status: Patch (to be ported) » Needs review
FileSize
5.32 KB

Here's a first pass at a D8 implementation. A lot has changed since the D7 version, so the approach is slightly different here: it creates a single parent item that references N child entries (see the sketch after the list below).

Currently missing from this patch:
* Ability to configure the chunk size (it's currently a constant on the class).
* Logging of set operations for items that are too large.
* Detection of "item too large" errors - right now we're assuming that any item that failed to set is failing because it's too large.
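A hypothetical sketch of that layout (MultipartItem and MAX_CHUNK_SIZE are names from the patch; the other names are illustrative):

// Split the serialized item and store each chunk under its own CID.
$chunks = str_split(serialize($item), static::MAX_CHUNK_SIZE);
$childCids = [];
foreach ($chunks as $i => $chunk) {
  $childCid = $cid . ':chunk:' . $i;
  $this->memcache->set($childCid, $chunk);
  $childCids[] = $childCid;
}
// The parent entry stores only the list of child CIDs to fetch on get().
$this->memcache->set($cid, new MultipartItem($childCids));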

rbayliss’s picture

We've been using the patch from #104 in production for a while now on a large D8 site. It works very well, except that we've seen some strange behavior occurring around deployments. The problems all have the same pattern: missing Views field handlers and/or missing metatags, occurring during peak-traffic deployments, which can be resolved by clearing caches. Our biggest cache entries are the views_data and token_info caches.

I'm hypothesizing that we've got a race condition occurring where two SET operations are happening on a single large cache item at once, causing an inconsistency between the child entries (e.g. the first child entry and the second child entry belonging to different versions of the same parent entry). I'm adjusting the patch to add a unique identifier to the child CIDs so each multipart write is more atomic. This shouldn't affect GET performance or behavior at all. The worst-case scenario is that we end up with extra data in the cache, but that should be flushed out eventually anyway.

rbayliss’s picture

Code style cleanups.

Josh Waihi’s picture

$dataParts = $this->memcache->getMulti($childCIDs);

Do we need to do any validation that the items come back in the same order as requested in $childCIDs?

The 7.x version also checks for a response of MEMCACHED_E2BIG (37) on a failed attempt, whereas this patch just assumes it's too big if an item failed to be inserted. This approach potentially creates one additional request to memcache if the failure wasn't due to the item being too big. This could be avoided by testing that the array count from splitItems() is > 1.

rbayliss’s picture

Thanks for the review! That's a great question about the order of the items. It actually looks like both the Memcache and Memcached backends guarantee order - Memcached through the use of Memcached::GET_PRESERVE_ORDER, Memcache through some additional logic in DrupalMemcache::getMulti(). From observing this on our site, I'm not seeing unserialize errors, so this seems to be working well.
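For reference, this is the PECL memcached flag in question (a sketch):

// Result order matches $childCIDs; missing keys come back as NULL.
$dataParts = $memcached->getMulti($childCIDs, Memcached::GET_PRESERVE_ORDER);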

I'm going to think a little on the way we should deal with failures - the method you've mentioned should work, but the cleanest way might be checking for MEMCACHED_E2BIG.

Josh Waihi’s picture

Status: Needs review » Reviewed & tested by the community

Ensuring the order is preserved was the big one, and it sounds like it's a non-issue. At this stage, it's probably a bigger issue that this is not in the 8.x module than it is to modify the failure behaviour further. IMO, we could address that in a follow-up issue. RTBC'ing.

markus_petrux’s picture

Status: Reviewed & tested by the community » Needs work

Does not apply. Will post an updated patch for alpha7 ASAP.

markus_petrux’s picture

Status: Needs work » Needs review
FileSize
5.49 KB

Patch updated

Josh Waihi’s picture

@markus_petrux, why were testLargeItem() and testLargeItemFailsGracefullyWhenChunkDeleted() removed?

markus_petrux’s picture

@Josh: Since the location of the tests has changed, I forgot to adapt that part of the patch. Sorry for that. I do not have an environment to test this right now.

markus_petrux’s picture

FileSize
5.32 KB

Patch updated. No tests included; I do not really know how to add them. Sorry.

solution33r’s picture

We have addressed the problems presented in comment 114 - thanks @markus_petrux for your work!

So, here was the main issue: after much testing, we determined that the chunk size of split items was too large to fit inside slab 42 (which should accept items up to 1MB). This is due to some per-item overhead. As we wrote in the comments:

+   * Memcached is about balance. With this area of functionality, we need to
+   * minimize the number of split items while also considering wasted memory.
+   * In Memcached, all slab "pages" contain 1MB of data, by default. Therefore,
+   * when we split items, we want to do so in a manner that comes close to
+   * filling a slab page with as little remaining memory as possible, while
+   * taking item overhead into consideration.
+   *
+   * Our tests concluded that Memcached slab 39 is a perfect slab to target.
+   * Slab 39 contains items roughly between 385-512KB in size. We are targeting
+   * a chunk size of 493568 bytes (482KB) - which will give us enough storage
+   * for two split items, leaving as little overhead as possible.
+   *
+   * Note that the overhead not only includes metadata about each item, but
+   * also allows for compression "backfiring" (under some circumstances,
+   * compression actually enlarges some data objects instead of shrinking
+   * them).
+   */

This patch worked for us locally, but we would appreciate the maintainers' input. This is a very important feature to add to the memcache module, as many sites use a combination of views, multilingual, and search facets - all of which have the potential to create rather large cache objects.
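In code, the target described above reduces to a single constant (a sketch; the actual declaration in the patch may differ):

// Two 482KB chunks plus per-item overhead nearly fill a 1MB slab page.
const MAX_CHUNK_SIZE = 493568;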

ndobromirov’s picture

I do not like the commented-out code in the patch.

Either add the service in the class and always log DEBUG-level messages, or have a config option that can be overridden from settings.php to enable/disable this logging. It is just not feasible for this to live in the code that way; it makes it VERY impractical if you need to debug in production, since you would have to hack a contributed module and redeploy to start debugging.

ndobromirov’s picture

Status: Needs review » Needs work
rhristov’s picture

Status: Needs work » Needs review
FileSize
14.5 KB

Attaching a new patch created from #115, with the added logging functionality suggested in #116.

vurt’s picture

I did some tests. The patch applied fine, but it does not seem to do anything. Or maybe I do not understand what it should do.

At the end of MemcacheBackend::set(), splitItem() should be called and a new MultipartItem created. But I did not find any conditional in the code testing whether the data is bigger than MAX_CHUNK_SIZE.

In my tests with big data (that should be split), the split was never called, because the memcache->set() before it was successful and so the function returned before reaching the split part.

The code seems to rely on the memcache server returning an error - which mine did not do...

When I built in a size test of the data length and omitted the normal call to memcache->set() when the data is bigger than MAX_CHUNK_SIZE, it seems to work. At least it writes this in the watchdog:
debug memcache Split item views_data:de into 8 pieces

vurt’s picture

Just wanted to add that I did tests with a large Drupal site, and there are not only big Views cache entries:

debug  memcache  Split item route_provider.route_load:... into 2 pieces
debug  memcache  Split item tree-data:... into 3 pieces
debug  memcache  Split item tree-data:main:... into 2 pieces
debug  memcache  Split item token_info_sorted:en into 2 pieces
debug  memcache  Split item theme.active_theme.... into 2 pieces

I guess the memcache compression saves many sites from a bigger impact.

janusman’s picture

Here is a small testing script you can run from the shell: https://github.com/janusman/memcachetesting

It does test the splitting of large objects.

  • Run drush-tests.sh from within a Drupal installation to run the tests there (e.g. cd to your Drupal install and then type bash /path/to/the/drush-test.sh to run the script).
  • memcache-analyzer.sh will dump the memcache contents into a file and try to analyze the results.
vurt’s picture

Sorry for the delay...

I used your nice test script, and all tests run fine when the patch is applied.

I even added a content check to the tests to be sure. I extended the return line in storeAndFetchTest():

return !empty($fetched) && ($fetched->data == $value);

I don't know what was wrong with my last setup - the patch works as intended!

ndobromirov’s picture

Status: Needs review » Reviewed & tested by the community

Based on the above comments, moving up to RTBC.

viappidu’s picture

Shouldn't this one be committed? (Trying to clean up my composer...)

japerry’s picture

Status: Reviewed & tested by the community » Needs work

Marking needs work. From what I see here, if you use memcache only as your database backend, a new release of memcache would break your site since you're missing the service.

so there are four options I potentially see here:
1.) Make a 3.x version of memcache so customers can manually update their settings.php to work with the memcache service correctly
2.) Inject the settings config option statically (IE bad practice containers not being done via dependency injection)
3.) Figure out how to include settings.php automatically through the module.
4.) Remove the debuging option and make it a separate issue

I'm looking at option 2 so we don't have to roll out a new major version... or take the debugging piece out and make a new issue for it.

MiroslavBanov’s picture

I am confused about comment 125 above, but I guess it is about the introduction of a new required argument to the MemcacheBackend constructor?

Though I still don't get what this means:

if you use memcache only as your database backend

Normally you use memcache as a cache backend, right?

And this isn't the first time an argument has been added with no version bump - see for example #2996620: Fix clock skew tolerance in deleteAll; make it configurable [Follow-up: Add mechanism to support invalidateAll and deleteAll]

I also don't know exactly in what scenarios you would have issues upgrading. To my knowledge, it's when you don't install the module but add the cache backend to the container through a code snippet in settings.php. There may be other scenarios I am not aware of. If so, I would like to know about them :)

japerry’s picture

Status: Needs work » Needs review

This patch puts the debug components in a static call. While I see that the constructor was changed in that patch, as more and more places are using memcache, the likelihood of sites breaking due to a required settings.php change - especially just for debugging - isn't worth it.

I also added a TODO to change this to be injected properly when a 3.x version is created.

japerry’s picture

  • japerry committed b9e4ad6 on 8.x-2.x
    Issue #435694 by Jeremy, markpavlitski, rbayliss, Elijah Lynn,...
japerry’s picture

Status: Needs review » Fixed

Since no one had an issue with the debug helper function, I'm declaring this issue committed!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.