Related to #1555634: Improve performance of event cache by moving to a whitelist
The rules_event_whitelist variables appears to be a value that the rules module can determine itself over a number of requests rather than a value that should be set by the user (I don't see a UI option for it). The risk here begin that this can cause a variable_set to be called on a simple page request for an application, rather than explicitly in admin pages only which messes with the intended use of the variables APi in Drupal. As a result, we've observed this taking down a production website.
Instead, this value would be better suited in cache as the rules module is well capable of rebuilding the whitelist if it needs too.
I've attached a patch to push this over into the cache system instead. This prevents variable cache rebuilds from simple page requests.
Comment | File | Size | Author |
---|---|---|---|
#36 | 2190775-36-rules-event-whitelist-cache-fix.patch | 4.86 KB | davidwhthomas |
#33 | interdiff.txt | 1.57 KB | dawehner |
#33 | 2190775-33.patch | 6.37 KB | dawehner |
#28 | 2190775-28-rules-event-whitelist-cache-fix.patch | 6.33 KB | davidwhthomas |
#26 | 2190775-26.patch | 6.34 KB | dawehner |
Comments
Comment #1
davidwhthomas CreditAttribution: davidwhthomas commentedAgreed. This cache update is an important patch, particularly for high traffic sites using multiple servers.
We've had this issue causing lock wait timeouts on variable_set - taking down a clustered production site for short intervals.
Errors such as:
PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction: INSERT INTO {variable} (name, value) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1);
I've fixed up the above patch, namely
1. Fixed fatal "Cannot use object as array" error (using $cache vs $cache->data)
2. Use
rules_get_cache
to correctly rebuildrules_event_whitelist
cache3. Fix update hook to clear relevant caches on update.
I've tested it locally and Rules are correctly triggered and evaluated.
Marking major as it causes Rules to take down production sites.
Patch attached.
Comment #2
davidwhthomas CreditAttribution: davidwhthomas commentedI noticed Rules originally allowed the rule to execute if the whitelist was empty. This was to optimistically allow the rule to execute and the whitelist to be rebuilt.
I've adjusted the patch to match. If the whitelist is completely empty, the reaction rule will still execute - as per original approach there.
Once the whitelist is populated (normally the case in all my tests), the reaction rule event will be matched accordingly.
Patch attached. Supersedes previous.
Comment #4
davidwhthomas CreditAttribution: davidwhthomas commentedLatest patch passed testing.
Comment #5
Jim Kirkpatrick CreditAttribution: Jim Kirkpatrick commentedWell done David, this is killing us too...
I wonder if you can tell me if you've had your patch in a production/high load environment and it is therefore proven to work?
We're evaluating solutions to locking/load caused by variables and would like to understand the maturity of this patch, thanks.
Comment #6
davidwhthomas CreditAttribution: davidwhthomas commented@Jim Kirkpatrick
Hi Jim!
We're using it in production on a major site with ~3 million visitors per month ( approx ~100,000 avg uniques per day) without issue.
This patch really needs committing to the Rules project. Without it, the constant variable cache rebuilds / locks were killing the site.
Recommended!
Cheers,
David
Comment #7
joelpittetThanks @davidwhthomas I believe this to be taking down a production site of mine as well, it's shown up in the logs on a few occasions where the site has had a burst of heavy load.
Thanks for the patch, I'm deploying this immediately to production.
Comment #8
joelpittet@davidwhthomas this RTBC patch #2324587: Rules might be triggered too early in the bootstrap conflicts ever so slightly with your patch. Curious if that one gets in first will you need to change your approach any?
Comment #9
davidwhthomas CreditAttribution: davidwhthomas commented@joelpittet, there shouldn't be any conflict with the issue you mentioned, the main change is to use the cache bin instead of the variable table which should be available at that point in the bootstrap.
Cheers,
David
P.S 8 months later, would be good to get some movement on this patch, makes a big different on high-traffic sites.
Comment #10
joelpittet@davidwhthomas the hunks that failed were in rules_invoke_event and rules_invoke_event_by_args:
defined('MAINTENANCE_MODE'), which doesn't exist.
Here's a re-roll. Please let me know if I got the hunks as you intended re-rolled correctly.
PS. I guess nobody runs high-traffic sites:P
Comment #11
joelpittetI'm still getting these error after the patch:(
Tagging some related issues to link them all up, all trying to solve the issue in similar but different ways.
Comment #12
joelpittetI'm now getting this:
with the patch applied.
Comment #13
joelpittetSo similar but different error. Lock and rules_event_whitelist related.
Comment #14
Fabianx CreditAttribution: Fabianx commentedThis is critical, variable_set should never ever be used for something that could be triggered on every page request.
It is triggered for me on every page request on a big prod site ...
This can indeed bring BIG sites down ...
Comment #15
Silicon.Valet CreditAttribution: Silicon.Valet commentedAm seeing the same issue, but uncomfortable applying patch against dev version due to the nature of our packaging system. Will backport to 7.x-2.7 when I get a block of time for it but we are seeing this affect us as well at scale and has SIGNIFICANT impact.
Comment #16
Fabianx CreditAttribution: Fabianx commented@Silicon.Valet: Pro-Tip call system_cron only every 24 hours.
Suggested module: elysia_cron
Comment #17
davidwhthomas CreditAttribution: davidwhthomas commentedSo, is it just that my original patch doesn't apply against the current rules version?
If there's a new error, please include the relevant debug information, such as error message etc...
Comment #18
Fabianx CreditAttribution: Fabianx commentedThis should use:
rules_cache_set();
else the change within
rules_cache_get() will lead to the $cache[$cid] still being empty and such the rebuildEventCache() will be called again later.
Comment #19
dawehnerAdressed the feedback from fabian and improved a bit the documentation to explain WHY we change this.
Comment #24
dawehnerI really don't understand why, but let's see whether including all those files manually fixes the issues also on the testbot.
Comment #25
Fabianx CreditAttribution: Fabianx commentedIt passed, can we remove debug code?
Comment #26
dawehnerAlright, I still don't understand it completly, but well, here is the version without the debug statements.
It's to bad, that the
debug()
statements are not shown on the testbot.Comment #27
Fabianx CreditAttribution: Fabianx commentedCode needs work, #18 got missing from the patch again.
Comment #28
davidwhthomas CreditAttribution: davidwhthomas commentedHey guys, thanks for picking this one up,
I'm attaching an adjusted patch that includes rules_set_cache as per the request in #27
Cheers,
David
Comment #29
Fabianx CreditAttribution: Fabianx commentedRTBC :), thanks!
Comment #31
joelpittetMay not matter in tests but __DIR__ was introduced in PHP 5.3. Likely want to use dirname(__FILE__) for 5.2 compat?
Comment #32
Fabianx CreditAttribution: Fabianx commentedI personally don't think it matters as tests >= PHP 5.3, which is EOL anyway (and 5.2 kinda unsupported by now) should not be a problem.
Comment #33
dawehner@davidwhthomas
Oh thank you for fixing it!!
Afaik this is up to the maintainer to decide, but not really worth to fight for.
It could happen that d.o at some point decides to actually test in multiple (all supported) php versions.
Comment #34
Fabianx CreditAttribution: Fabianx commentedStill **RTBC**
Comment #35
fagoThanks, patch generally looks good except for those changes:
This file is there to *test* automatic file inclusion by Rules, thus should not be pre-included. If it's necessary now I suppose the patch breaks things :/
Comment #36
davidwhthomas CreditAttribution: davidwhthomas commented@fago, thanks for the review!
I'm attaching an adjusted patch that removes from rules.test the line:
include_once dirname(__FILE__) . '/rules_test.test.inc';
The new patch therefore makes no changes to rules.test
Cheers,
David
P.S Rules rocks :)
Comment #37
fagothanks, patch looks great now. I've committed it to 7.x-2.x-dev (after the 2.8 release).