Related to #1555634: Improve performance of event cache by moving to a whitelist

The rules_event_whitelist variables appears to be a value that the rules module can determine itself over a number of requests rather than a value that should be set by the user (I don't see a UI option for it). The risk here begin that this can cause a variable_set to be called on a simple page request for an application, rather than explicitly in admin pages only which messes with the intended use of the variables APi in Drupal. As a result, we've observed this taking down a production website.

Instead, this value would be better suited in cache as the rules module is well capable of rebuilding the whitelist if it needs too.

I've attached a patch to push this over into the cache system instead. This prevents variable cache rebuilds from simple page requests.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

davidwhthomas’s picture

Priority: Normal » Major
Status: Active » Needs review
FileSize
4.58 KB

Agreed. This cache update is an important patch, particularly for high traffic sites using multiple servers.

We've had this issue causing lock wait timeouts on variable_set - taking down a clustered production site for short intervals.

Errors such as:

PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction: INSERT INTO {variable} (name, value) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1);

I've fixed up the above patch, namely

1. Fixed fatal "Cannot use object as array" error (using $cache vs $cache->data)
2. Use rules_get_cache to correctly rebuild rules_event_whitelist cache
3. Fix update hook to clear relevant caches on update.

I've tested it locally and Rules are correctly triggered and evaluated.

Marking major as it causes Rules to take down production sites.

Patch attached.

davidwhthomas’s picture

I noticed Rules originally allowed the rule to execute if the whitelist was empty. This was to optimistically allow the rule to execute and the whitelist to be rebuilt.

I've adjusted the patch to match. If the whitelist is completely empty, the reaction rule will still execute - as per original approach there.

Once the whitelist is populated (normally the case in all my tests), the reaction rule event will be matched accordingly.

Patch attached. Supersedes previous.

The last submitted patch, 1: 2190775-2-rules-event-whitelist-cache-fix.patch, failed testing.

davidwhthomas’s picture

Latest patch passed testing.

Jim Kirkpatrick’s picture

Well done David, this is killing us too...

I wonder if you can tell me if you've had your patch in a production/high load environment and it is therefore proven to work?

We're evaluating solutions to locking/load caused by variables and would like to understand the maturity of this patch, thanks.

davidwhthomas’s picture

@Jim Kirkpatrick

Hi Jim!

We're using it in production on a major site with ~3 million visitors per month ( approx ~100,000 avg uniques per day) without issue.

This patch really needs committing to the Rules project. Without it, the constant variable cache rebuilds / locks were killing the site.

Recommended!

Cheers,
David

joelpittet’s picture

Thanks @davidwhthomas I believe this to be taking down a production site of mine as well, it's shown up in the logs on a few occasions where the site has had a burst of heavy load.

Thanks for the patch, I'm deploying this immediately to production.

joelpittet’s picture

@davidwhthomas this RTBC patch #2324587: Rules might be triggered too early in the bootstrap conflicts ever so slightly with your patch. Curious if that one gets in first will you need to change your approach any?

davidwhthomas’s picture

@joelpittet, there shouldn't be any conflict with the issue you mentioned, the main change is to use the cache bin instead of the variable table which should be available at that point in the bootstrap.

Cheers,
David

P.S 8 months later, would be good to get some movement on this patch, makes a big different on high-traffic sites.

joelpittet’s picture

@davidwhthomas the hunks that failed were in rules_invoke_event and rules_invoke_event_by_args:

defined('MAINTENANCE_MODE'), which doesn't exist.

patching file includes/rules.core.inc
patching file includes/rules.plugins.inc
patching file rules.install
patching file rules.module
Hunk #1 FAILED at 328.
Hunk #2 succeeded at 445 (offset 25 lines).
Hunk #3 FAILED at 981.
Hunk #4 FAILED at 1010.
3 out of 4 hunks FAILED -- saving rejects to file rules.module.rej

Here's a re-roll. Please let me know if I got the hunks as you intended re-rolled correctly.

PS. I guess nobody runs high-traffic sites:P

joelpittet’s picture

I'm still getting these error after the patch:(

Tagging some related issues to link them all up, all trying to solve the issue in similar but different ways.

joelpittet’s picture

I'm now getting this:

Cache rebuild lock hit: rules_event_whitelist

with the patch applied.

joelpittet’s picture

So similar but different error. Lock and rules_event_whitelist related.

Fabianx’s picture

Priority: Major » Critical

This is critical, variable_set should never ever be used for something that could be triggered on every page request.

It is triggered for me on every page request on a big prod site ...

This can indeed bring BIG sites down ...

Silicon.Valet’s picture

Am seeing the same issue, but uncomfortable applying patch against dev version due to the nature of our packaging system. Will backport to 7.x-2.7 when I get a block of time for it but we are seeing this affect us as well at scale and has SIGNIFICANT impact.

Fabianx’s picture

@Silicon.Valet: Pro-Tip call system_cron only every 24 hours.

Suggested module: elysia_cron

davidwhthomas’s picture

So, is it just that my original patch doesn't apply against the current rules version?
If there's a new error, please include the relevant debug information, such as error message etc...

Fabianx’s picture

Status: Needs review » Needs work
+++ b/includes/rules.plugins.inc
@@ -809,7 +809,7 @@ class RulesEventSet extends RulesRuleSet {
     // calls. See rules_invoke_event().
-    variable_set('rules_event_whitelist', array_flip(array_keys($sets)));
+    cache_set('rules_event_whitelist', array_flip(array_keys($sets)), 'cache_rules');
   }

This should use:

rules_cache_set();

else the change within

rules_cache_get() will lead to the $cache[$cid] still being empty and such the rebuildEventCache() will be called again later.

dawehner’s picture

Status: Needs work » Needs review
FileSize
4.81 KB
1.25 KB

Adressed the feedback from fabian and improved a bit the documentation to explain WHY we change this.

Status: Needs review » Needs work

The last submitted patch, 19: 2190775-19.patch, failed testing.

Fabianx queued 19: 2190775-19.patch for re-testing.

The last submitted patch, 19: 2190775-19.patch, failed testing.

dawehner’s picture

Status: Needs work » Needs review
FileSize
6.97 KB

I really don't understand why, but let's see whether including all those files manually fixes the issues also on the testbot.

Fabianx’s picture

It passed, can we remove debug code?

dawehner’s picture

FileSize
6.34 KB

Alright, I still don't understand it completly, but well, here is the version without the debug statements.
It's to bad, that the debug()statements are not shown on the testbot.

Fabianx’s picture

Status: Needs review » Needs work

Code needs work, #18 got missing from the patch again.

davidwhthomas’s picture

Status: Needs work » Needs review
FileSize
6.33 KB

Hey guys, thanks for picking this one up,

I'm attaching an adjusted patch that includes rules_set_cache as per the request in #27

Cheers,
David

Fabianx’s picture

Status: Needs review » Reviewed & tested by the community

RTBC :), thanks!

The last submitted patch, rules-7.x-2.x-dev-rules_event_whitelist.patch, failed testing.

joelpittet’s picture

+++ b/tests/rules.test
@@ -946,6 +948,8 @@ class RulesTestDataCase extends DrupalWebTestCase {
+    include_once __DIR__ . '/rules_test.test.inc';

May not matter in tests but __DIR__ was introduced in PHP 5.3. Likely want to use dirname(__FILE__) for 5.2 compat?

Fabianx’s picture

I personally don't think it matters as tests >= PHP 5.3, which is EOL anyway (and 5.2 kinda unsupported by now) should not be a problem.

dawehner’s picture

FileSize
6.37 KB
1.57 KB

@davidwhthomas
Oh thank you for fixing it!!

I personally don't think it matters as tests >= PHP 5.3, which is EOL anyway (and 5.2 kinda unsupported by now) should not be a problem.

Afaik this is up to the maintainer to decide, but not really worth to fight for.
It could happen that d.o at some point decides to actually test in multiple (all supported) php versions.

Fabianx’s picture

Still **RTBC**

fago’s picture

Status: Reviewed & tested by the community » Needs work

Thanks, patch generally looks good except for those changes:

   function setUp() {
+    include_once dirname(__FILE__) . '/rules_test.test.inc';
+
     parent::setUp('rules', '

This file is there to *test* automatic file inclusion by Rules, thus should not be pre-included. If it's necessary now I suppose the patch breaks things :/

davidwhthomas’s picture

Status: Needs work » Needs review
FileSize
4.86 KB

@fago, thanks for the review!

I'm attaching an adjusted patch that removes from rules.test the line:

include_once dirname(__FILE__) . '/rules_test.test.inc';

The new patch therefore makes no changes to rules.test

Cheers,
David

P.S Rules rocks :)

fago’s picture

Status: Needs review » Fixed

thanks, patch looks great now. I've committed it to 7.x-2.x-dev (after the 2.8 release).

  • davidwhthomas authored a891438 on 7.x-2.x
    Issue #2190775 by dawehner, davidwhthomas, joelpittet, Josh Waihi,...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.