Problem description
This module uses the Redis KEYS command to perform cache wildcard deletions. As the docs for the KEYS command state, it should not be used in production environments. On a Redis instance holding around 1.5M keys, we are experiencing website freezes on cache_menu bin flushes with key counts ranging from 500K to 800K. This also happens when flushing other cache bins, for example when running database updates. Redis freezes while executing KEYS to fetch the matching keys for deletion, and the entire site slows to a crawl or even halts completely if the freeze lasts long enough (we've seen freezes lasting from 20 to 40 seconds).
It's also the first item under the @todo section in lib/Redis/Cache/Base.php :)
Solution
The alternative is to use the SCAN command, which returns a cursor that you pass back on successive calls to iterate over the matching keys in batches. Done this way, Redis can keep accepting commands and does not block. However, it has some differences:
The operation is not atomic: SCAN provides a cursor, and the eval'ed script loops until no matching keys are left. It can take slightly more time under heavy load, but Redis no longer freezes while it enumerates the keys.
To be replication-safe, it requires switching from whole-script replication to script *effects* replication. This is available in Redis >= 3.2. More info in Antirez's post.
Warning: this method might also cause trouble when script runs last longer than lua-time-limit (which defaults to 5s), because Redis answers 'BUSY' to other clients past that limit. This is still being worked out.
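To make the difference concrete, here is a minimal Python simulation of the cursor-based deletion loop described above. It is illustrative only: the real implementation is a Lua script run via EVAL inside Redis, and all names here are hypothetical.

```python
# Simulate the SCAN/DEL loop: walk the keyspace in batches of `count`,
# deleting matches as we go, instead of collecting every match up front
# the way KEYS does. Between batches, a real Redis server is free to
# serve other clients, which is the whole point of the approach.

def scan_delete(keys, prefix, count=100):
    """Delete all keys starting with `prefix` from the set `keys`,
    iterating with a cursor in steps of `count` (mimics SCAN + DEL)."""
    deleted = 0
    cursor = 0
    items = list(keys)  # snapshot of the keyspace to iterate over
    while cursor < len(items):
        batch = items[cursor:cursor + count]  # one SCAN step
        cursor += count                       # advance the cursor
        for key in batch:
            if key.startswith(prefix):
                keys.discard(key)             # DEL for each match
                deleted += 1
    return deleted

keyspace = {f"cache_menu:{i}" for i in range(250)} | {"cache_page:x"}
removed = scan_delete(keyspace, "cache_menu:")  # removes the 250 cache_menu keys
```

With KEYS, the equivalent operation would match all 250 keys in one atomic, blocking call; here the work is split into batches of `count`.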
Implementation
It has been implemented alongside the current behaviour to preserve compatibility. By default the module will work as before, unless you explicitly turn this feature on.
Also, if you accidentally turn the feature on for a server that does not support effects replication (< 3.2), it will simply be ignored: the script always checks the server version first and falls back to the normal behaviour.
These are the main changes that add the feature:
- Add the feature toggle option in the configuration page.
- Add a new parameter to the Lua script which uses the new SCAN code for deletion.
- Pass the scan on/off parameter in the eval() call in the cache backend implementation, according to the configuration.
The Lua script has grown to accommodate the following new capabilities:
- Checks the second parameter (ARGV[1]) to decide whether to enable SCAN.
- Checks for server support for switching to effects replication with the redis.replicate_commands() call.
- Loops performing SCAN/DEL until no more keys match.
- Script supports specifying SCAN's COUNT parameter as ARGV[2], which defaults to 100. You can change this value from the configuration page.
- It outputs information and events to Redis' logs. You can silence them by raising the server's minimum log level. Doing all logging at the LOG_DEBUG level would render it useless on high-volume setups, since it would be lost among the many items logged per second. Information sent to the log:
- Script invocation started, with requested keys argument (notice).
- If SCAN mode requested, say so, along server's returned version string and COUNT parameter (verbose).
- If SCAN mode requested and not supported, emit a warning message (warning).
- If using KEYS command, say so before and after calling KEYS to allow timing (notice).
- Finished message with mode (SCAN/KEYS) and deleted keys count (notice).
- Minor: use the variable name convention "_" instead of "i" for unused variables.
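The version check and fallback behaviour described in the list above can be sketched in Python. This is a hedged sketch with illustrative names, not the module's actual code; the real check runs inside the Lua script via redis.replicate_commands().

```python
# Decision logic: use SCAN only when it was requested AND the server can
# switch to effects replication (Redis >= 3.2); otherwise fall back to
# the original KEYS behaviour, emitting a warning when the feature was
# requested but is unsupported.

def choose_mode(scan_requested, server_version):
    """Return the deletion mode given the feature toggle and the
    server version string (e.g. "3.2.0")."""
    major, minor, _patch = (int(p) for p in server_version.split("."))
    supports_effects = (major, minor) >= (3, 2)
    if scan_requested and supports_effects:
        return "SCAN"
    if scan_requested:
        # Requested but unsupported: the script logs a warning
        # and silently falls back, as described above.
        return "KEYS (fallback)"
    return "KEYS"
```

The same check runs on every invocation, which (as discussed later in this thread) is deliberate: the server may be upgraded between calls.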
Comment | File | Size | Author |
---|---|---|---|
#21 | redis-delete-by-prefix-using-scan-2851625-21.patch | 3.79 KB | showrx |
#20 | redis-delete-by-prefix-using-scan-2851625-20.patch | 5.4 KB | showrx |
#15 | interdiff--6-15.txt | 8.45 KB | amontero |
#15 | 2851625--add-scan-delete-by-prefix--15.patch | 11.86 KB | amontero |
#6 | interdiff--1-6.txt | 1 KB | amontero |
Comments
Comment #2
amontero
Comment #3
amontero
Comment #4
amontero
Comment #5
memtkmcc CreditAttribution: memtkmcc at Omega8.cc commented
This works absolutely amazingly, thank you!
We have already implemented this patch in the BOA stack:
https://github.com/omega8cc/boa/commit/5fdc45f28c21360a63b0036e01426c5dd...
https://github.com/omega8cc/redis/commit/fceaef574a629fb716ef6c91db5eba6...
Comment #6
amontero
@memtkmcc: Glad to hear that! I assume you mean in production already. Could you please give some figures on your setup? (instances, keys, etc.)
Also, a patch reroll with minor fixes; interdiff also attached.
Comment #7
memtkmcc CreditAttribution: memtkmcc at Omega8.cc commented
@amontero We have added this to BOA head and it will be included in the upcoming stable release this week, so we don't have data from production servers yet. We stress-tested this on typically slow mass-cache-clear tasks on sites managed in Aegir, and it worked beautifully. We are happy to share the results from production machines once it is deployed.
Comment #8
cafuego CreditAttribution: cafuego as a volunteer commented
Thank you @amontero!
We installed this patch on our production systems yesterday and cron stopped failing! Yay!
Previously, a cache clear in cron would error out after a minute and the drush cron task (and connection to redis) would then hang forever.
With the patch, cron takes a few seconds and no longer errors out.
Additionally, because of the errors previously, we ended up with a redis filled with 700k mostly stale entries, whereas now we have 250k of current-ish ones. I'm sure that will help performance too (no stats on that yet, though).
Comment #9
attiks CreditAttribution: attiks at Attiks commented
#6 Nice job! Works like a charm.
Comment #10
pounard
Why didn't I see that in my issue list? Seems good, I will review and test this when I have some spare time, thank you very much!
Comment #11
memtkmcc CreditAttribution: memtkmcc at Omega8.cc commented
After some testing in production we have mixed results. On at least four different servers, SCAN instead of KEYS managed to almost freeze the Redis server at totally random moments (typically every few hours). It delayed connections to the point where Redis basically stopped working, triggering our fallback to the SQL cache, which of course causes high load spikes, exhausts the allowed DB connection limits, etc. Note that we use a fairly moderate configuration, where all cache tables are delegated to Redis and the allowed memory limit per Redis instance is only 2 GB. Really weird, but using SCAN on some systems actually made things much, much worse. Still investigating why exactly this happens.
Comment #12
pounard
One very good reason for NOT applying the patch right now. Please, if you have any more details, share them!
Comment #13
kirkby CreditAttribution: kirkby commented
After applying the patch in production, we have had the same experience as @memtkmcc.
The slowlog would reveal lots of entries with response times around 0.9 seconds per delete, leaving the system completely unresponsive.
The bigger the database, the longer the response time. A flushdb would temporarily solve the problem. After disabling the feature, response times were reduced by a factor of 8 for these deletes (still pretty slow...). Of course, having a lot of "Delete by prefix" operations is probably not a good idea under any circumstances; however, this patch made things worse for us.
Comment #14
amontero
In order to address pounard's legitimate concerns, I'll clearly label the feature as experimental to warn people, and put a link here in the description field to get as much feedback as possible. UX suggestions welcome.
Now into @memtkmcc and @kirkby issues:
Although longer deletion times are expected, since the Lua script yields CPU to Redis' main event loop to let it continue serving requests, Redis freezing at all is not expected. The problem I'm trying to address here is cache wildcard deletes with huge key counts (800K). A frozen Redis means that all webserver threads stall in the bootstrap phase and an avalanche of web requests starts to pile up. In those scenarios, it's better for the deleting request to take longer but allow the others to proceed, even at a slower speed, to keep web requests from being killed by PHP's max_execution_time.
It is normal for the SCAN method to trigger SLOWLOG entries, since Redis has a low (configurable) threshold for it, as opposed to KEYS, which is an atomic, stop-the-world operation that does not trigger slowlogs, IIRC. Can you please provide machine specs, versions, configuration (at least anything changed from the defaults) and logs of delete operations? With Redis' logs we'll be able to diagnose how much time per key count it's taking, with both the KEYS and SCAN commands. Increase Redis' server logging level to LOG_DEBUG, if possible. I also recommend monitoring Redis' performance metrics to check if there are hidden metrics that can give us any clue to the cause (I recommend Redsmin).
Possible causes of varying results which come to mind (suggestions welcome):
I'll be rolling out a new patch shortly. Please give me as much info as you can and I'll do my best to find where the problem is for those of you for whom it is not working; if new information appears which I can use to improve the code, I'll happily include it.
@pounard: Perhaps not all scenarios can take advantage of it (especially those with short deletion times and shorter Redis freezes, where it would not make as much difference, other than taking longer to delete the keys). How about the approach of releasing the feature incrementally in the 7.x-dev branch, *clearly* labeling it as experimental? Some people can already benefit from it, and we can put it in the hands of more people brave enough to "try devel" to get more feedback.
Comment #15
amontero
Due to lack of feedback, I've added (well, actually exposed) a new configuration setting which might help solve the issues found.
The SCAN command accepts a COUNT argument specifying the number of elements returned by each SCAN call. Being able to fine-tune this setting may change behaviour, be it by increasing the value (thus needing fewer calls to SCAN and hopefully finishing sooner) or lowering it (perhaps yielding more time to Redis' main loop, at the cost of more calls and more time to complete). Although the Lua script already had this parameter, there was no way to change it outside the code. This patch exposes the configuration setting and passes it when calling the Lua script.
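As a rough, back-of-the-envelope illustration of the trade-off COUNT controls (this is not module code, and COUNT is only a hint to Redis, so real iteration counts vary):

```python
# Higher COUNT means fewer SCAN round trips to cover the keyspace;
# lower COUNT means more frequent yields back to Redis' event loop.

def scan_calls_needed(total_keys, count):
    """Approximate number of SCAN iterations to cover the keyspace."""
    return -(-total_keys // count)  # ceiling division

# For the ~800K-key worst case mentioned in the issue summary:
calls_default = scan_calls_needed(800_000, 100)   # ~8000 iterations
calls_tuned = scan_calls_needed(800_000, 1000)    # ~800 iterations
```

Fewer iterations usually means a shorter total run, but each longer SCAN step holds the script (and thus Redis) a bit longer before yielding.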
Since the Redis service might be configured differently on each setup, your mileage may vary, and AFAIK there is no rule of thumb that applies to everyone, so making the setting tunable is the best solution I can come up with for now. I have no other info to figure out the culprit for the problems experienced here. Feedback is much appreciated.
Patch reroll. Changes:
Comment #16
kirkby CreditAttribution: kirkby commented
Hi again, I didn't notice your request for feedback until now.
It will take me some time to collect systematic feedback, but I will try to get around to it.
However, one thing I should mention right away is that our redis instance was running on an absurdly overpowered physical host, with the recommended THP setting and no persistence, which suggests this is not related to insufficient specs on the host.
Another quick observation: we actually re-patched the patch to remove the version checking in the lua script. This part:
Since we are already running the required redis version, this code seems rather inefficient. There is no reason to check the redis version on every single call - it will probably not change between calls. Unfortunately, I have no data on the effect of this change.
I will try to collect some more data for you, and maybe publish our own patch for inspiration.
Right now, we are not using the patch in production. We eliminated most of the delete-by-prefix calls from Drupal, so things are running pretty stably right now.
Thanks for your effort.
Comment #17
pounard
That's fine for me. I'll have to do some review of the patch, and I'll try it too. You should also add the corresponding unit tests (no need to write new tests, just declare a new class with the correct parameters to have the tests run with it).
Comment #18
amontero
News on the possible causes of varying results:
As the inline comments in the included Redis 3.2 configuration file state, if the deletion script runs for longer than the 'lua-time-limit' config setting, Redis will start answering other calling processes with a 'BUSY' error. Although it might accept connections and commands, this causes them to fail and most surely throw an exception. Try issuing 'CONFIG SET lua-time-limit 10000' to raise the limit to 10s, or whatever suits you.
As I see now, the possible solutions are:
Any other? Which one sounds better to you?
@kirkby: Checking the server version on each call is necessary because the script would break when doing a SCAN/DEL on versions prior to 3.2. Since Redis can be upgraded at any time, it's the most cautious way to proceed, IMO. Having to configure your Redis version, or relying on the site admin not to toggle the feature on servers that don't support it, adds complexity and opens the door to trouble. I haven't measured it, but the check should run very fast anyway. I don't think the execution time saved by removing it justifies the added hassle of patching the module.
@pounard: Happy to hear that. Note taken for the tests, but postponing until @kirkby (or anyone else) sends a bit more feedback.
Comment #19
Fabianx CreditAttribution: Fabianx at Tag1 Consulting commented
---
While I 100% agree that this needs to be fixed and that KEYS is problematic, the SCAN implementation in a script has many drawbacks as well.
I am just outlining an idea that I had in hopes that others may find it useful:
If the list of cache entries to be wildcard-expired is known (cache_rules:*, cache:entity_info:*, etc.), then it is possible to create a LOOKUP table within redis.
The trick is to use redis ZSETs.
Once the cache gets cleared for a known prefix (a single KEYS command usually doesn't lock things up; multiple in a row do), the following happens in pseudo-code - shown here for cache_rules:
- clear("cache_rules:*")
- ZSET cache_cids_cache_rules => [ 'REDIS_INTERNAL_INIT' => 1 ]
Now a cache_set is happening:
cache_set("cache_rules:foo", "bar");
internally after the cache_set (or before) we do:
ZSET cache_cids_cache_rules => [ 'foo' => 1 ]
Our set now contains:
cache_cids_cache_rules => [ 'REDIS_INTERNAL_INIT' => 1, 'foo' => 1 ]
For a longer key, we would store the whole cid-suffix:
cache_set("cache_rules:bar:baz:foo", "bar");
ZSET cache_cids_cache_rules => [ 'bar:baz:foo' => 1 ]
Now when the cache_clear_all wildcard clear comes again:
- clear("cache_rules:*")
Within the script we check for an entry in cache_cids_cache_rules, and if 'REDIS_INTERNAL_INIT' is set to 1, we get the remaining keys out of the entry, clear only those, and reset the entry back to just the INIT marker.
If the 'REDIS_INTERNAL_INIT' is not set to 1 (e.g. it expired or was never called), then KEYS is used as fallback like now.
The advantage of that approach is that it is "learning" by observing frequent cache_clear_all's, and it could also be threshold-based (e.g. only start recording after 5 cache_clear_all's) - though a whitelist would also work.
This would not solve the occasional cache_clear_all for e.g. files after a file deletion, but would solve the very frequent ones and for the others like file_deletion a static configuration map could work - as we know core's cache_clear_all in general.
This has more effort on SET for the high frequency cache_clear_all cache IDs, but way less effort on expiration while still having the same atomicity level as reached now.
The reason this works is that the ZSET is either fully expired or not at all, and as such we are able to know (compared to memcache) whether the system is in a coherent state.
--
One of the best properties of that system is that if cache_clear_all() is called in really quick succession - e.g. on entity_defaults_rebuild(), which rebuilds all rules and clears the cache some 200 times in a row - the effort remains minimal.
And thus the cases that lock down redis are efficiently solved, while the effort for non-locking occasional clears remains the same as before.
--
We can certainly open another issue for that, but I wanted to present this solution as an alternative to the above here for all to see.
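The lookup-table idea from #19 can be simulated in plain Python to make the mechanics concrete. This is illustrative only: the class, its dicts, and the sets below are stand-ins for the Redis keyspace and the per-prefix ZSET; the real thing would use ZADD/ZRANGE inside a Lua script.

```python
# Simulation of the "learning" lookup table: after the first wildcard
# clear arms the index with an INIT marker, every cache_set records its
# cid suffix; the next wildcard clear deletes exactly the recorded cids
# instead of scanning the whole keyspace.

INIT = "REDIS_INTERNAL_INIT"

class PrefixIndex:
    def __init__(self):
        self.cache = {}   # stands in for the Redis keyspace
        self.index = {}   # stands in for the per-prefix ZSET

    def set(self, prefix, cid, value):
        self.cache[f"{prefix}:{cid}"] = value
        if INIT in self.index.get(prefix, set()):
            self.index[prefix].add(cid)        # record the cid suffix

    def clear(self, prefix):
        members = self.index.get(prefix)
        if members and INIT in members:
            for cid in members - {INIT}:       # targeted delete, no scan
                self.cache.pop(f"{prefix}:{cid}", None)
        else:
            # Index expired or never armed: fall back to a full scan,
            # standing in for the KEYS/SCAN fallback described above.
            for key in [k for k in self.cache if k.startswith(prefix + ":")]:
                del self.cache[key]
        self.index[prefix] = {INIT}            # re-arm with just the marker

idx = PrefixIndex()
idx.clear("cache_rules")                       # first clear arms the index
idx.set("cache_rules", "foo", "bar")
idx.set("cache_rules", "bar:baz:foo", "bar")
idx.clear("cache_rules")                       # deletes only recorded cids
```

The second clear touches exactly the two recorded cids, regardless of how many unrelated keys the instance holds, which is the property that makes repeated clears cheap.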
Comment #20
showrx CreditAttribution: showrx as a volunteer commented
We're running this patch against 7.x-3.17. SCAN is used when redis_delete_prefix_scan_enabled is true and "script effects" replication is available (Redis 3.2.0+). There's also an optional redis_delete_prefix_scan_count to adjust the SCAN COUNT.
Comment #21
showrx CreditAttribution: showrx as a volunteer commented
#20 doesn't do it, since EVAL itself is blocking, so I rewrote it using the Redis client's scan() function...
- Added deleteByPrefixUsingScan(). To enable it, set redis_delete_scan_enabled to TRUE.
- Two optional settings: redis_delete_scan_count (default 100) and redis_delete_batch_size (default 20). redis_delete_batch_size is the pipelined delete batch size.
- Also eliminates duplicate deletions within a single execution, using static variable caching.
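The client-side approach in #21 can be sketched as a Python simulation. This is a hedged sketch, not the patch itself: the real code uses the PHP Redis client's scan() plus pipelined DELs, and the function, its parameters, and the sample keys below are illustrative.

```python
# Client-side SCAN with pipelined batch deletes: iterate the keyspace in
# SCAN-sized steps, buffer matching keys, and flush them in pipeline
# batches of `batch_size` so no single round trip does too much work.

def delete_by_prefix_using_scan(keyspace, prefix, scan_count=100, batch_size=20):
    pending, deleted = [], 0
    snapshot = list(keyspace)
    for start in range(0, len(snapshot), scan_count):  # client-side SCAN steps
        for key in snapshot[start:start + scan_count]:
            if key.startswith(prefix):
                pending.append(key)
            if len(pending) >= batch_size:             # flush one pipeline
                for k in pending:
                    keyspace.discard(k)                # batched DEL
                deleted += len(pending)
                pending = []
    for k in pending:                                  # flush the remainder
        keyspace.discard(k)
    return deleted + len(pending)

keys = {f"cache_views:{i}" for i in range(57)} | {"cache_bootstrap:a"}
n = delete_by_prefix_using_scan(keys, "cache_views:")
```

Splitting the deletes into pipeline batches keeps any single command small, which is what avoids the blocking behaviour of a long-running EVAL.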
Comment #22
djdevin
Thanks for the patch, this helped out a lot.
With 500k+ keys we had to increase the scan count to 1000 to prevent cache check/clears from taking much longer than they should. Increasing it to 10000 caused slowlog SCAN entries to appear. But at least now it wasn't blocking anything except the current request.
So it's not a perfect solution, but it's much better. As said in #19, without some sort of lookup cache, Drupal has to scan over all those keys to delete them. In our case we have a lot of sites using one big Redis instance, and from what I've read this isn't best practice. But there's no way I'm spinning up that many Redis instances on AWS, because it would be insanely cost-prohibitive. So memcache is unfortunately looking like a better option for purely cache purposes, even though I much prefer Redis clustering.
Comment #23
pounard
We use the Redis module configured for proxy-assisted sharding, even when there's a single instance, and we configure the Redis server itself to evict keys using its own LRU mechanism. Basically, if you don't know, the Redis module sets all of its own keys as volatile, with a lifetime of a few months for permanent keys, allowing Redis to evict them when it needs to reclaim memory. This way we don't have to clear caches at all. If you're working with many sites sharing the same Redis instances, use the module this way; explicit cache clears have never been recommended with high data volumes.
Comment #24
cafuego CreditAttribution: cafuego as a volunteer commented
Putting the version back to 7.x-3.x, as you always patch against HEAD and not a release. The patch still applies to 7.x-3.18, but I think it may need a logic check so it keeps working as advertised with the changes from https://www.drupal.org/node/3074189
Comment #25
t14 CreditAttribution: t14 as a volunteer commented
The patch from comment #21 worked very well in production on a high-traffic site with a larger than normal number of keys.
The key (no pun intended) to getting this right is adjusting the configurable options that come with the patch: 'redis_delete_batch_size' and 'redis_delete_scan_count'.
'redis_delete_scan_count' is, in my experience, the most important. The default value of 10 will probably fix your problem; adjusting it to a bigger count will help improve the speed of cache clears and also improve performance, especially if your PHP max_execution_time is low. If you set the value too high, you will end up with the same issues as when using the KEYS command.
I used the Lua script below to fill my redis db with wildcard keys (matching the volume I have in prod) and then ran patch #21, tweaking the config values "redis_delete_scan_count" and "redis_delete_batch_size" until I found good values.
Assuming you save the Lua script above with the file name redis_import_keys.lua, you can run the following command to fill your redis db:
redis-cli --eval redis_import_keys.lua
I think this should be merged. I'm surprised that this module relies on the KEYS command when the Redis docs clearly state that you shouldn't use KEYS in production. (Other than that, thanks for all the hard work - I know that most maintainers are volunteering their own time :) )
Comment #26
ron_s CreditAttribution: ron_s commented
Patch #21 still applies cleanly to the latest 7.x-3.19 version.
Comment #27
powrsurg CreditAttribution: powrsurg commented
I want to put out there that the patch in #21 dramatically helped us, for the most part.
I've reviewed how many keys we've had open, and I've seen us in the 2.5 million key range. Our system was always locking up with Redis warnings (Redis here being an AWS ElastiCache managed service, not something running on localhost) until the flush completed whenever we flushed the Drupal cache. This solution helped us for the most part.
We found that even after upping the number of deletions and scans, it was taking a long time and consuming a ton of memory - more than our typical server needs. We had to add some lines so that deleteByPrefixUsingScan could extend its allocated memory, and set drupal_set_time_limit to 0 so that it had unlimited time to delete everything.
With this, the site never became unusable, even while we were flushing caches. Overall this was an improvement for us.
Comment #28
powrsurg CreditAttribution: powrsurg commented
UPDATE: Even after applying this update, updating something in Views and saving the View seemed to cause a total Redis meltdown. I'm not sure whether Views doesn't go through this code path or what.