Patch for D7 is available.

We need locking around the css and js aggregation. We added locking to other subsystems in Drupal 7, and this one was just missed. Without locking, multiple processes try to build and rebuild the css and js. On our site, this puts a high load on the file system, which causes apache to throw 503 errors.

Also, if you simply update the css or js, the files will not rebuild. Say you upgrade a module, and it adds 5 lines to it's css, or say you modify your theme and change things around, if the bundle is already built, and if no new files were added to the page, then the bundle will not rebuild. I've introduced a "version" for this. The intent of the "version" is that some contrib module can be written to allow incrementing the version either from an admin page or via drush, and that the site developers will know when this needs to be done, and will simply increment the version, forcing a rebuild.

An additional problem surfaced while trying to solve this, which is that if the creation of the css file fails, we currently return FALSE; however the function that calls drupal_build_css_cache() doesn't interprete the result, and simply adds the empty string as a css file. This is an obvious bug. We need to handle the FALSE return in css, just like we do in js.

So my solution is to use locks so that we generate each bundle in only one process (allowing multiple bundles to build simultaneously, but making sure only one process builds each bundle), to use a global lock when saving the map, returning stale bundles during the rebuild or on error.

Anyone new to this issue can problem skip ahead to #30 or so.

CommentFileSizeAuthor
#114 css-stampede-886488-114.patch14.91 KBDarren Oh
#100 css-stampede-886488-100.patch14.55 KBMiroslavBanov
#88 css-stampede-886488-86.patch14.26 KBmarcingy
#86 css-stampede-886488-86.patch14.26 KBmarcingy
#81 886488.stampede.patch14.41 KBAnonymous (not verified)
#80 886488-68.patch14.38 KBcarlos8f
#71 886488.patch14.66 KBdouggreen
#68 886488-68.patch14.38 KBcarlos8f
#65 stampede.patch14.05 KBAnonymous (not verified)
#56 stampede.patch14.05 KBAnonymous (not verified)
#55 css.stamped.patch13.69 KBAnonymous (not verified)
#53 css.stamped.patch13.65 KBAnonymous (not verified)
#50 css.stamped.patch13.65 KBAnonymous (not verified)
#48 886488-2-debug.patch.txt13.81 KBJeremy
#47 886488-2.patch12.86 KBJeremy
#46 886488.patch14.12 KBJeremy
#46 886488-debug.patch.txt15.49 KBJeremy
#46 886488-nopatch-debug.patch.txt1.76 KBJeremy
#42 886488.patch13.74 KBnnewton
#37 886488.patch13.25 KBdouggreen
#34 886488.patch13.23 KBdouggreen
#30 886488.patch13.23 KBdouggreen
#28 886488.patch12.51 KBdouggreen
#26 886488.patch12.41 KBdouggreen
#24 886488.patch10.12 KBdouggreen
#23 886488.patch10.09 KBdouggreen
#21 886488.patch9.74 KBdouggreen
#20 886488.patch10.11 KBcatch
#19 886488.patch9.77 KBdouggreen
#18 886488.patch9.75 KBdouggreen
#17 886488.patch9.16 KBdouggreen
#13 886488.patch7.71 KBdouggreen
#12 886488.patch7.59 KBdouggreen
#11 css_js_stampede.patch6.3 KBcatch
#10 css_js_stampede.patch6.23 KBcatch
#8 css.patch8.23 KBcatch
#5 css_js_stampede.patch6.16 KBcatch
#3 css_stampede.patch6.17 KBcatch
css_stampede.patch4.34 KBcatch

Issue fork drupal-886488

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Status: Needs review » Needs work

The last submitted patch, css_stampede.patch, failed testing.

catch’s picture

Status: Needs work » Needs review

css_stampede.patch queued for re-testing.

catch’s picture

FileSize
6.17 KB

Adding js.

Status: Needs review » Needs work

The last submitted patch, css_stampede.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
6.16 KB

minus syntax error.

douggreen’s picture

Status: Needs review » Needs work

Please initialize $lock_acuire_attempts = 0; in both cases. Also, the logic for $lock_acquired seems to be wrong, or rely on some mystical operator precedence. I'd change this to

if ($lock_acquired = lock_acquire($key)) {
  ...
}
else {
  lock_wait($key);
}
catch’s picture

Status: Needs work » Needs review
FileSize
8.23 KB

updated patch. Some of these fixes are direct from doug, so please credit him in any commits.

Status: Needs review » Needs work

The last submitted patch, css.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
6.23 KB

Should hopefully apply this time, also this time with a global lock instead of per file.

catch’s picture

FileSize
6.3 KB

One more time, use a variable for $lock_name.

douggreen’s picture

FileSize
7.59 KB

The previous patch by catch still does not solve our cache stampede problem. Forcing everything on the site to rebuild simultaneously causes too long of a wait. This new approach uses a single lock for css and another for js, and let's the system rebuild them one at a time, serving up the older files when it can't get a lock.

douggreen’s picture

FileSize
7.71 KB

And a fix to that release, that unlocks the right key (catches last patch did this right, but mine did not), and also unlocks the key on write failure.

catch’s picture

Write through seems the right way to go for this, however it looks like the only window for that to work is between the variable_set() and the file deletion, if that's right is there a way for us to do the delete itself a bit later?

catch’s picture

Doug pointed out that drupal_delete_if_stale defaults to 30 days, so there's a reasonable chance of them existing, and you can set things higher if you need it.

douggreen’s picture

The only problem I see with this approach is that the maps might contain entries that are obsolete. When you push new code that requires a js or css rebuild, other things probably changed and the hash's on some pages might change. We probably want to occasionally remove stale map entries, maybe on cron, that exist in the map, but have an older time than the clear. I'll add that later today.

douggreen’s picture

FileSize
9.16 KB

New patch
* adds map cleanup on system cron
* only sets $map['clear'] if there's something in it

douggreen’s picture

FileSize
9.75 KB

One small tweak, to unset 'clear' if it's no longer needed.

Also, I see an additional problem, which is that the map is retrieved before we get a lock. It's probably that if we don't get the lock on the first attempt, that another process will update the map, and that once this process gets the lock, it will revert anything another process did to the map. It's even possible that this was the root cause to needing a global lock (and a global lock just minimized the time that this was an issue), and that after this change, we might be able to go back to a lock on the individual keys.

douggreen’s picture

FileSize
9.77 KB

... and now, only re-read the map if file_unmanaged_save_data() succeeds, ... we probably never hit this condition, but it saves us from re-reading the variables on failure.

catch’s picture

FileSize
10.11 KB

lock_wait() needs the name of the lock as an argument.

douggreen’s picture

FileSize
9.74 KB

Updated patch always rebuilds file if it is missing, or being cleared, extra check for file existence was preventing rebuilds on clear, and legacy check from old code, new code should only get into this if the file doesn't exist or it needs to be cleared.

Status: Needs review » Needs work

The last submitted patch, 886488.patch, failed testing.

douggreen’s picture

FileSize
10.09 KB

oops, I did same thing last time

douggreen’s picture

FileSize
10.12 KB

We're still having problems. Updated patch file makes sure to atomically change the map file. We're also trying a version that uses cache_set/get, but will post that next if it appears to work.

douggreen’s picture

Status: Needs work » Needs review
FileSize
12.41 KB

Fixing patch file again, and updating to head

Status: Needs review » Needs work

The last submitted patch, 886488.patch, failed testing.

douggreen’s picture

Status: Needs work » Needs review
FileSize
12.51 KB

Add missing contents line, that was lost in the merge.

Status: Needs review » Needs work

The last submitted patch, 886488.patch, failed testing.

douggreen’s picture

Status: Needs work » Needs review
FileSize
13.23 KB

Updated version checks if the map exists before writing it, which is the root cause of the last testbot failure.

Updated version returns the $uri on file_unmanaged_save_data write failures. It returns the $uri because we have one, and it's more likely that returning a stale version is better than returning no version. Updated version also duplicates the js code css for handling a FALSE returned by drupal_build_css_cache, by simply not putting it into the css group. Surprisingly, if you do put it into the group, with a value of FALSE, what gets generated is a css link to the root of your site.

douggreen’s picture

Priority: Normal » Critical

This issue hasn't gotten much attention yet. I just rewrote the description, and would like to make a case for this being a Drupal 7 critical issue. It's amazing we've done so well on Drupal 7, which speaks to how mature it really was, even pre-launch. But this issue, the one outlined here, has been our biggest cause of downtime. Without this patch, we can't change our theme, and we can

I'm not convinced yet that this is the final solution. I'm advocating that others look at the problem, and help solve it.

I'm overstepping a bit by making this critical, but I'd like to make sure that some key people look.

Status: Needs review » Needs work

The last submitted patch, 886488.patch, failed testing.

Anonymous’s picture

subscribe.

douggreen’s picture

Status: Needs work » Needs review
FileSize
13.23 KB

ugh, I can't believe I did that, fixing variable name...

carlos8f’s picture

subscribing

JeremyFrench’s picture

Priority: Critical » Major

While I think this is a great change, and much needed for D7 I don't think it is a critical (which doesn't mean it won't make D7, just that it won't block the release).

Put simply
It doesn't break the API (although it does add some clean up functions so could count as an API change)
It doesn't contain any strings
It doesn't stop Drupal working in a fundamental way.

See #974066: Friendly reminder: on critical issues and RCs for information on criticals.

If someone else disagrees withe me please bump again.

douggreen’s picture

Priority: Major » Critical
FileSize
13.25 KB

This is a new feature to D7 that prevents a high usage site from updating their code without downtime, that to me stops D7 from working in a critical way, consider security updates. No disrespect Jeremy, I'm sure you've done great work in Drupal. I hope you agree it would be nice to let a few more people see this before lowering the priority. I work with chx and catch, and both those guys know about this issue. I know that either of them (or any of a dozen or more others) will lower the priority when they want to get Drupal out the door or if they agree with you, and I'll accept it then.

Here's one more update. Adding version to the filename, so that upping the version, which already forces a new bundle to build, also updates the filename, forcing CDN's to clear. I thought that was already in here, but wasn't. It's kinda a key component to the version idea.

Status: Needs review » Needs work

The last submitted patch, 886488.patch, failed testing.

catch’s picture

The return FALSE bug should probably be it's own issue, that's quite serious just by itself if you run into it and it's not directly related to the stampede.

I think this is strictly speaking 'major' in the sense that it will only occur with a certain degree of traffic (or possibly on a site that's i/o constrained with medium traffic), however I don't think we should release with both this issue and #973436: Overzealous locking in variable_initialize() unresolved - since to me the combination of the two is critical (the css/js stampede can itself lead to a variable_set() stampede since that's where the map is stored).

These are extremely hard issues to track down, hard to fix, and there are currently limited sites which can be used as test-beds - which means it makes sense to fix it before people's sites start going down all over the place rather than after when we're flooded with obscure support requests.

webchick’s picture

Priority: Critical » Major

Let's stop with the status ping-pong. But I'm happy to take a look at this if it gets RTBC in time.

nnewton’s picture

RE: The patch in #37

I've been testing this patch today on an EC2 VM. It is fairly easy to generate a css/js stampede simply using ab. You don't even need to simulate NFS-level IO conditions (although one could argue that EC2 is fairly close to that at times), which makes me wonder how long this has been a problem. I forward ported the patch in 37 to work for the latest head and tested it against a vanilla head. The results were not encouraging, it basically had no impact but for slowing down page load due to the locks. I added some logging and I think I know what is going on. While this patch introduces a lock and makes only a single process build a given css group at a time, the function never checks to see if another process has built the CSS group its looking for after waiting for the lock (unless I'm missing something). Thus, all you really end up doing is forcing multiple processes to repeatedly build the same css bundles in sequence.

I've run out of time today, but will continue with this tomorrow and post something that checks for new entries in the $map after waiting for a lock.

nnewton’s picture

FileSize
13.74 KB

Attached is a slight edit of the patch in #37. As a note, I don't think this is an acceptable way to do this as it makes the logical flow of this function just weird and introduces another break in the loop. However, I wanted to post it to allow for testing of this method of checking the map.

With this patch, after clearing the js/css and hitting a site with 50 concurrent requests we see the following:

[23-Nov-2010 13:18:29] Got the css lock, building
[23-Nov-2010 13:18:29] Saving file public://css/css_nTykqH7_P-Sl6RvZSU5FglBKJax_EHxtkLwXS_CelI8.css
[23-Nov-2010 13:18:29] Did not get the lock and mtime is not set, waiting
[23-Nov-2010 13:18:29] Got the css lock, building
[23-Nov-2010 13:18:29] Saving file public://css/css_UZZSfikU7d17FOgy4Ft2z8A59kA-jeQ0IbTgYg1SDQ0.css
[23-Nov-2010 13:18:29] Got the css lock, building
[23-Nov-2010 13:18:29] Did not get the lock and mtime is not set, waiting
[23-Nov-2010 13:18:29] Did not get the lock and mtime is not set, waiting
[23-Nov-2010 13:18:29] Saving file public://css/css_h0Hq534EpnZFH3ps1FOXyyFyDR5sn8iiIadImLqr0FE.css
[23-Nov-2010 13:18:29] Did not get the lock and mtime is not set, waiting
[23-Nov-2010 13:18:29] Got the css lock, building
[23-Nov-2010 13:18:29] Saving file public://css/css_1Uc_CBofbQGqeIKgDSH-akZMSYiWg_pKxfxa4ffiBCk.css
[23-Nov-2010 13:18:29] Saving file public://js/js_VF53tELsw3kPql1FJD633rc2wNa0fCZV5Foz0aUfhvo.js
[23-Nov-2010 13:18:29] Saving file public://js/js_VF53tELsw3kPql1FJD633rc2wNa0fCZV5Foz0aUfhvo.js.gz

As compared to this with just #37:


[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css_f8Gzuz3XG20eax5GvJQHwYj--jYXFIfSk4pvMCGjzjc.css
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css_f8Gzuz3XG20eax5GvJQHwYj--jYXFIfSk4pvMCGjzjc.css
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css_f8Gzuz3XG20eax5GvJQHwYj--jYXFIfSk4pvMCGjzjc.css.gz
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css_451-qZiWBOIVu7qDk0wv_5WnfXM-OdrdCzYy8zvxbhU.css
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css_451-qZiWBOIVu7qDk0wv_5WnfXM-OdrdCzYy8zvxbhU.css.gz
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css_451-qZiWBOIVu7qDk0wv_5WnfXM-OdrdCzYy8zvxbhU.css.gz
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css_451-qZiWBOIVu7qDk0wv_5WnfXM-OdrdCzYy8zvxbhU.css.gz
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css__ZKx9t7pbm1l_4rHrOG_PayYpFTwjzZSwPKhiaWl-0w.css
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css__ZKx9t7pbm1l_4rHrOG_PayYpFTwjzZSwPKhiaWl-0w.css.gz
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css__ZKx9t7pbm1l_4rHrOG_PayYpFTwjzZSwPKhiaWl-0w.css.gz
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css_2THG1eGiBIizsWFeexsNe1iDifJ00QRS9uSd03rY9co.css
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css_2THG1eGiBIizsWFeexsNe1iDifJ00QRS9uSd03rY9co.css.gz
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://css/css_2THG1eGiBIizsWFeexsNe1iDifJ00QRS9uSd03rY9co.css.gz
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://js/js_VF53tELsw3kPql1FJD633rc2wNa0fCZV5Foz0aUfhvo.js
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://js/js_VF53tELsw3kPql1FJD633rc2wNa0fCZV5Foz0aUfhvo.js
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://js/js_VF53tELsw3kPql1FJD633rc2wNa0fCZV5Foz0aUfhvo.js.gz
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://js/js_VF53tELsw3kPql1FJD633rc2wNa0fCZV5Foz0aUfhvo.js.gz
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://js/js_VF53tELsw3kPql1FJD633rc2wNa0fCZV5Foz0aUfhvo.js.gz
[22-Nov-2010 16:55:07] Running file_unmanaged_save_data with dest public://js/js_VF53tELsw3kPql1FJD633rc2wNa0fCZV5Foz0aUfhvo.js.gz
[22-Nov-2010 16:55:48] Running file_unmanaged_save_data with dest public://css/css_nTykqH7_P-Sl6RvZSU5FglBKJax_EHxtkLwXS_CelI8.css
[22-Nov-2010 16:55:48] Running file_unmanaged_save_data with dest public://css/css_UZZSfikU7d17FOgy4Ft2z8A59kA-jeQ0IbTgYg1SDQ0.css
[22-Nov-2010 16:55:48] Running file_unmanaged_save_data with dest public://css/css_h0Hq534EpnZFH3ps1FOXyyFyDR5sn8iiIadImLqr0FE.css
[22-Nov-2010 16:55:48] Running file_unmanaged_save_data with dest public://css/css_nTykqH7_P-Sl6RvZSU5FglBKJax_EHxtkLwXS_CelI8.css
[22-Nov-2010 16:55:48] Running file_unmanaged_save_data with dest public://css/css_1Uc_CBofbQGqeIKgDSH-akZMSYiWg_pKxfxa4ffiBCk.css
[22-Nov-2010 16:55:48] Running file_unmanaged_save_data with dest public://js/js_VF53tELsw3kPql1FJD633rc2wNa0fCZV5Foz0aUfhvo.js
[22-Nov-2010 16:55:48] Running file_unmanaged_save_data with dest public://js/js_VF53tELsw3kPql1FJD633rc2wNa0fCZV5Foz0aUfhvo.js.gz
[22-Nov-2010 16:55:49] Running file_unmanaged_save_data with dest public://js/js_VF53tELsw3kPql1FJD633rc2wNa0fCZV5Foz0aUfhvo.js
[22-Nov-2010 16:55:50] Running file_unmanaged_save_data with dest public://css/css_nTykqH7_P-Sl6RvZSU5FglBKJax_EHxtkLwXS_CelI8.css
[22-Nov-2010 16:55:50] Running file_unmanaged_save_data with dest public://js/js_VF53tELsw3kPql1FJD633rc2wNa0fCZV5Foz0aUfhvo.js
douggreen’s picture

Great progress Narayan :) Yes, I see where the extra variable_initialize and map re-check is required only while were rebuilding the bundle. I'm somewhat concerned about the extra variable initialize after the lock_acquire because of issues raised by catch, and would like to hear what he has to say, but comforted that we only do this during the time that we try to rebuild the bundle.

Anonymous’s picture

this looks to be badly needed.

+        //We may have just waited, if so we need to recheck the map
+        $vars = variable_initialize(array()); 
+        $map = isset($vars['drupal_css_cache_files']) ? $vars['drupal_css_cache_files'] : array();

i think the intent is to make sure we get the updated map if another thread already did the work? so, can we add a comment about why we need to use variable_initialize() instead of variable_get() here? also, the comment needs a space at the start and a full-stop at the end, and there's no need to pass the 'array()' to variable_initialize, as that's the default.

we could really use #973436: Overzealous locking in variable_initialize() here...

catch’s picture

Yes this issue is a big part of why I opened the variable_set() one. Narayan's fix looks right to me pending Justin's comments.

Jeremy’s picture

Updated patch to address Justin's comments.

Does this patch intentionally remove support for gzip'ing the cache files? That's a critical bug with this patch imo, unless I'm missing where this logic was moved to.

Also, when we find that a CSS file has been built while we were waiting for a lock we simply exit out of the loop, leaving the lock around. I suppose it doesn't much matter, anyone already waiting on a lock will detect the new map and break, and anyone that comes along afterwards won't bother trying to grab the lock.

I'm also attaching two debug patches. The "nopatch-debug" patch simply adds a sleep (to increase the race time) and debug output to Drupal 7 without adding locking -- with this patch applied enable/disable a theme or otherwise trigger a CSS rebuild in one browser, and then load a page in another browser. This easily illustrates the race condition this patch is trying to fix as we end up building multiple copies of the CSS cache (the same is true for the JS cache though this patch doesn't show that).

The "-debug" patch adds sleep and debut output to Drupal 7 along with the new locking logic -- with this patch applied enable/disable a theme or otherwise trigger a CSS rebuild in one browser, and then load a page in another browser. You can then confirm that the lock is preventing multiple copies of the CSS cache from being built. (Navigate a second page to see the drupal_set_message output...) This also shows how necessary #973436: Overzealous locking in variable_initialize() is -- loading the page with a number of browsers while the CSS cache is being rebuilt we reload variables from the database an excessive number of times (an additional sleep could make that more obvious).

Jeremy’s picture

Status: Needs work » Needs review
FileSize
12.86 KB

I took another shot at this, trying to simplify the patch a bit. I removed the $version stuff as it's a new feature and not directly relevant to this bug. I also removed the patch to the system module, as I couldn't find anywhere that $map['clear'] is actually being set in the latest version of the patch. I also removed the special handling for the first time we fail to grab a lock as it was uncommented and I wasn't certain I understood why we do that, though perhaps that should be left in with comments added?

Otherwise, I restored the gzip functionality, perform the atomic map update for css and js caching inline, and added more comments to make things more readable. Sorry if this patch is a functional regression, but I was aiming to be sure there's nothing unnecessary in there that would prevent this from getting merged into core. All it's attempting to do is properly protect against the js/css stampede.

Jeremy’s picture

FileSize
13.81 KB

And the debug version in case it's helpful to anyone trying to test this.

Jeremy’s picture

#46: 886488.patch queued for re-testing.

Anonymous’s picture

FileSize
13.65 KB

i agree with jeremy's approach here - i think we need to change as little as possible at this stage.

attached patch is an attempt to improve on #47.

* don't need the $mtime stuff at all, we're only interested in whether an old file exists for $key
* don't lock around the gzip file, because the rewrite rules will handle a missing gzip file, so there's no need to make other process wait for that to hit the disc
* try harder to return the old $uri if we have issues saving the newly generated css data to disc
* lock around variable_del('drupal_css_cache_files') to avoid the race where we read the old map, delete, then write the old values again

i've only done the CSS part, but i think js should have similar treatment, pending feedback on the implementation.

Anonymous’s picture

oh yeah, this is another issue that would actually be testable if we had #850782: allow testing lock code via async http calls.

Status: Needs review » Needs work

The last submitted patch, css.stamped.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
13.65 KB

this time without the syntax errors....

Status: Needs review » Needs work

The last submitted patch, css.stamped.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
13.69 KB

initialise lock_attempts.

Anonymous’s picture

FileSize
14.05 KB

ok, js now gets the same treatment.

we really need tests of this functionality (locking aside). the patch at #55 was completely broken for the gzip stuff, but all tests passed :-(

Status: Needs review » Needs work

The last submitted patch, stampede.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review

#56: stampede.patch queued for re-testing.

Anonymous’s picture

can't reproduce that fail locally...

Status: Needs review » Needs work

The last submitted patch, stampede.patch, failed testing.

Anonymous’s picture

Status: Needs work » Needs review

blurgh! not having much luck with the bot.

Anonymous’s picture

Issue tags: +Performance, +scalability

tagging.

carlos8f’s picture

I need to review this tonight. The bot should be in a better mood today.

carlos8f’s picture

#56: stampede.patch queued for re-testing.

Anonymous’s picture

FileSize
14.05 KB

updates to keep up with HEAD only.

carlos8f’s picture

Status: Needs review » Needs work

The $mtime stuff already has its own issue: #678292: Add a development mode to allow changes to be picked up even when CSS/JS aggregation is enabled so I'm OK with removing that from the patch.

Starting with #50 though it looks like we are introducing a silly regression: we are re-aggregating on every request! This should be put at the top:


if (!empty($map[$key]) && file_exists($map[$key])) {
  return $map[$key];
}

I'm still digesting the rest of the patch, but we at least need to address the obvious :)

carlos8f’s picture

Assigned: Unassigned » carlos8f

Working on some cleanups.

carlos8f’s picture

Status: Needs work » Needs review
FileSize
14.38 KB
  • Fixed bug -- drupal_build_*_cache() should aggregate only if $map[$key] file doesn't exist.
  • Fixed bug -- lock not released in if ($lock_acquired) { block of the CSS function before returning.
  • Fixed bug -- js_cache lock not respected in drupal_clear_js_cache().
  • Style -- simplified while () conditions a bit. Now does 5 tries instead of 4.
  • Style -- define $data closer to where it's used.
  • Style -- rewrote some parts of the comments.

Additionally, drupal_build_*_cache() are virtually identical and tough to update since changes need to be made to both js and css functions in separate parts of the behemoth common.inc. I think we should merge these into one and have a $type = 'css' or 'js' for sanity. That doesn't have to happen now though.

Anonymous’s picture

thanks for the review! you've caught some nice bugs there.

i'll work on some tests today.

carlos8f’s picture

Assigned: carlos8f » Unassigned

.

douggreen’s picture

FileSize
14.66 KB

I've restored a somewhat simplified version system for the bundles. I think we need the version system for high volume sites because even with the race fixes here, some processes are going to have to wait, and our current wait is 1 second.

With a version system, someone can write a contrib module that (1) pre-generate new bundles before upping the version, and thus completely by-pass the contention, and (2) create new bundles after an update to your code, without breaking cached pages to the old bundles.

douggreen’s picture

Why'd you change this to 5 tries instead of 4? If we had #802856: Make lock_wait() wait less I wouldn't mind bumping the number even more; but without it, on our site, every second is 25-50 processes, and an extra second adds too many waiting processes.

carlos8f’s picture

@douggreen, I changed to 5 tries to make the code more readable, so the "< [number]" matches the number of tries. Otherwise you might think there are 5 tries when there are actually 4 due to the ++ being before the variable. And because I couldn't figure out why you would want 4 tries instead of 5 :) That it certainly negotiable.

Anonymous’s picture

maybe i'm missing something, but i don't see why versioning is necessary.

in the case where the file exists, but the cache hasn't been cleared, you could write a drush command to generate new cache files when you know your code has changed. the filename is based off of contents, so there are no caching issues.

a call to drupal_build_*_cache() writes the new file and updates the map, and new requests will just serve the existing file.

so, i think we should proceed with #68.

catch’s picture

I'm in a rush but last time I looked at this, I also thought the filename was based on contents, but it's not, it's based on the filenames only. This is a bug in itself, but afaik we can't fix it without loading all the files into php on every request which seems silly.

catch’s picture

Will try to review this more in depth later but the recent patches look a lot more straightforward.

Anonymous’s picture

filename is based on contents:

+  $filename = 'css_' . drupal_hash_base64($data) . '.css';
carlos8f’s picture

Re: #75,

- $key is based on the list of files to aggregate.
- The value of $map[$key] ($uri) is a filename hashed with the file's contents.
- If the $uri file exists, no action is taken (true in HEAD and in #68).
- If it's desired that $uri automatically changes with the file contents, mtime() would work for that, which already has an initiative in #678292: Add a development mode to allow changes to be picked up even when CSS/JS aggregation is enabled.

Patch #71 changes $key to take a $version into account. As it stands, it looks like that idea needs more hashing out (pun intended). Setting a variable to denote a "version" is messy, not documented, and would be better suited as an API function that auto-increments (but we're beyond the point of new features like that).

Re: @justinrandell #74, for that idea to work you would need a cache of all the css/js groups or "bundles" to re-aggregate. I don't think that exists currently. So with those things in mind, I think #68 can stand on its own without a versioning scheme, and versioning can be attacked in a different issue.

catch’s picture

Justin/Carlos thanks - makes sense with $key vs. $filename, I agree with moving discussion of that to a separate issue (or the existing mtime() one).

carlos8f’s picture

FileSize
14.38 KB

Re-upload of #68, because versioning is a feature and should have its own issue.

Anonymous’s picture

FileSize
14.41 KB

updated #80 to fix patch noise, no code changes.

catch’s picture

Been thinking about this recently and realised two things:

1. Drupal currently saves a gzipped copy of all css/js aggregates in addition to the .css/.js version - this will double i/o load. Any site using a cdn and/or varnish can afford to use mod_deflate on those files instead of having Drupal do it. Can save around 4-500ms per request.

2. We could also prevent the write stampede, or most of it at least, by not trying to have every request build the file themselves, instead have imagecache style callbacks that do the generation - see #1014086: Stampedes and cold cache performance issues with css/js aggregation. This would be an even bigger change than the patch here, and I don't have code for it yet though.

catch’s picture

Issue tags: +i/o

Tagging.

Wim Leers’s picture

Subscribing.

sun’s picture

Version: 7.x-dev » 8.x-dev
Category: bug » task
Issue tags: +Needs backport to D7

Nice improvement for high-performance sites, but only borderline a bug.

marcingy’s picture

FileSize
14.26 KB

Just a reroll to convert to git patch.

Status: Needs review » Needs work

The last submitted patch, css-stampede-886488-86.patch, failed testing.

marcingy’s picture

Status: Needs work » Needs review
FileSize
14.26 KB

Hopefully this is better

xjm’s picture

Tagging issues not yet using summary template.

bryancasler’s picture

Bump

cyberswat’s picture

fwiw we are running this code and have enabled php-fpm during a site crash to help with debugging. Turns out the usleep being used here was stacking up with multiple requests causing the site to crash:

[13-Feb-2012 17:47:22] [pool www] pid 23743
script_filename = /var/www/html/index.php
[0x0000000015a27a90] usleep() /var/www/html/sites/all/modules/contrib/memcache/memcache-lock-code.inc:106
[0x0000000015a25c00] lock_wait() /var/www/html/includes/common.inc:3464
[0x0000000015a25580] drupal_build_css_cache() /var/www/html/includes/common.inc:3136
[0x0000000015a23ac8] drupal_aggregate_css() /var/www/html/includes/common.inc:3222
[0x0000000015a22ba8] drupal_pre_render_styles() /var/www/html/includes/common.inc:5829
[0x0000000015a223f0] drupal_render() /var/www/html/includes/common.inc:2968

Seems that using any type of application code level locking to achieve this is a recipe for disaster on high performance sites. This type of processing should probably be queued in some way so that the end result is simply copying the rendered files into place vs locking.

catch’s picture

While it's not queuing, there is #1014086: Stampedes and cold cache performance issues with css/js aggregation and http://drupal.org/project/agrcache for D7 that takes this out of the page rendering pipeline. No locking, but the potential for stampedes ought to be reduced since it waits until the very last minute to generate files, and a single very popular aggregate that takes a long time to generate won't hold up any other processing at all.

Mark Theunissen’s picture

I agree that the lock mechanisms/logic in this patch are problematic. Was going to try it but after reviewing it and reading other comments decided against it. catch's suggestion for an alternative approach seems more sensible to me.

Anonymous’s picture

just pointing out #1014086: Stampedes and cold cache performance issues with css/js aggregation is active, and will make this easier.

YesCT’s picture

Status: Needs review » Needs work
Issue tags: +Performance, +Needs issue summary update, +scalability, +i/o, +Needs backport to D7

The last submitted patch, css-stampede-886488-86.patch, failed testing.

YesCT’s picture

Issue tags: -i/o +i-o

(the slash in the i/o tag breaks the autocomplete from adding new tags)

YesCT’s picture

should this maybe be postponed on one of those issues?
or maybe just start with
reroll (http://drupal.org/patch/reroll)

could use an issue summary update. tips to do an issue summary: http://drupal.org/node/1427826

catch’s picture

Priority: Major » Normal
Status: Needs work » Postponed

Yes postponing this on #1014086: Stampedes and cold cache performance issues with css/js aggregation - that massively reduces the window and also avoids the page requests themselves being blocked on asset generation.

MiroslavBanov’s picture

I got a deadlock on a high profile (Drupal 7) site. The patches do not apply any more, so I re-rolled the latest. This applies to 7.34.

mikeytown2’s picture

Version: 8.0.x-dev » 7.x-dev
Status: Postponed » Needs review

Testing #100

mikeytown2’s picture

Version: 7.x-dev » 8.0.x-dev
Status: Needs review » Postponed

Back to D8 postponed. #100 passed the tests.

Just a heads up that AdvAgg offers this for D7 without hacking core.

Status: Postponed » Needs review

Status: Needs review » Needs work

The last submitted patch, 100: css-stampede-886488-100.patch, failed testing.

anavarre’s picture

Status: Needs work » Postponed
Issue tags: +Needs reroll

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Darren Oh’s picture

Darren Oh’s picture

Version: 8.9.x-dev » 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.2.x-dev » 9.3.x-dev
nod_’s picture

Version: 9.3.x-dev » 10.0.x-dev
Status: Postponed » Needs work
catch’s picture

Status: Needs work » Closed (duplicate)

The approach is different, but it's trying to resolve the same issue, and (hopefully) #1014086: Stampedes and cold cache performance issues with css/js aggregation can land before too much longer.

Darren Oh’s picture

Version: 10.0.x-dev » 7.x-dev
Status: Closed (duplicate) » Needs review
bburg’s picture

Patch in #114 appears to work so far at least on this one high traffic site I was managing. CSS was randomly generating as empty files, which would happen intermittently. I had been pulling my hair out over this for months until I came across this issue and deployed this patch last week. I'll follow up if it regresses again, but fingers crossed! This really should be applied to core.

Darren Oh’s picture

Darren Oh’s picture

Status: Needs review » Reviewed & tested by the community
joseph.olstad’s picture

Issue tags: -Needs reroll
joseph.olstad’s picture

Title: Add stampede protection for css and js aggregation » [D7] Add stampede protection for css and js aggregation
Issue summary: View changes
Status: Reviewed & tested by the community » Needs work
Issue tags: -Needs issue summary update, -Needs backport to D7 +Needs tests
joseph.olstad’s picture

D10 test looks something like:

new file core/tests/Drupal/FunctionalTests/Asset/AssetOptimizationTest.php

@@ -0,0 +1,203 @@
<?php

namespace Drupal\FunctionalTests\Asset;

use Drupal\Component\Utility\UrlHelper;
use Drupal\Tests\BrowserTestBase;

// cspell:ignore abcdefghijklmnop

/**
 * Tests asset aggregation.
 *
 * @group asset
 */
class AssetOptimizationTest extends BrowserTestBase {

  /**
   * {@inheritdoc}
   */
  protected $defaultTheme = 'stark';

  /**
   * {@inheritdoc}
   */
  protected static $modules = ['system'];

  /**
   * Tests that asset aggregates are rendered and created on disk.
   */
  public function testAssetAggregation(): void {
    $this->config('system.performance')->set('css', [
      'preprocess' => TRUE,
      'gzip' => TRUE,
    ])->save();
    $this->config('system.performance')->set('js', [
      'preprocess' => TRUE,
      'gzip' => TRUE,
    ])->save();
    $user = $this->createUser();
    $this->drupalLogin($user);
    $this->drupalGet('');
    $session = $this->getSession();
    $page = $session->getPage();

    $elements = $page->findAll('xpath', '//link[@rel="stylesheet"]');
    $urls = [];
    foreach ($elements as $element) {
      if ($element->hasAttribute('href')) {
        $urls[] = $element->getAttribute('href');
      }
    }
    foreach ($urls as $url) {
      $this->assertAggregate($url);
    }
    foreach ($urls as $url) {
      $this->assertAggregate($url, FALSE);
    }

    foreach ($urls as $url) {
      $this->assertInvalidAggregates($url);
    }

    $elements = $page->findAll('xpath', '//script');
    $urls = [];
    foreach ($elements as $element) {
      if ($element->hasAttribute('src')) {
        $urls[] = $element->getAttribute('src');
      }
    }
    foreach ($urls as $url) {
      $this->assertAggregate($url);
    }
    foreach ($urls as $url) {
      $this->assertAggregate($url, FALSE);
    }
    foreach ($urls as $url) {
      $this->assertInvalidAggregates($url);
    }
  }

  /**
   * Asserts the aggregate header.
   *
   * @param string $url
   *   The source URL.
   * @param bool $from_php
   *   (optional) Is the result from PHP or disk? Defaults to TRUE (PHP).
   */
  protected function assertAggregate(string $url, bool $from_php = TRUE): void {
    $url = $this->getAbsoluteUrl($url);
    $session = $this->getSession();
    $session->visit($url);
    $this->assertSession()->statusCodeEquals(200);
    $headers = $session->getResponseHeaders();
    if ($from_php) {
      $this->assertEquals(['no-store, private'], $headers['Cache-Control']);
    }
    else {
      $this->assertArrayNotHasKey('Cache-Control', $headers);
    }
  }

  /**
   * Asserts the aggregate when it is invalid.
   *
   * @param string $url
   *   The source URL.
   *
   * @throws \Behat\Mink\Exception\ExpectationException
   */
  protected function assertInvalidAggregates(string $url): void {
    $session = $this->getSession();
    $session->visit($this->replaceGroupDelta($url));
    $this->assertSession()->statusCodeEquals(200);

    $session->visit($this->omitTheme($url));
    $this->assertSession()->statusCodeEquals(400);

    $session->visit($this->setInvalidLibrary($url));
    $this->assertSession()->statusCodeEquals(200);

    $session->visit($this->replaceGroupHash($url));
    $this->assertSession()->statusCodeEquals(200);
    $headers = $session->getResponseHeaders();
    $this->assertEquals(['no-store, private'], $headers['Cache-Control']);

    // And again to confirm it's not cached on disk.
    $session->visit($this->replaceGroupHash($url));
    $this->assertSession()->statusCodeEquals(200);
    $headers = $session->getResponseHeaders();
    $this->assertEquals(['no-store, private'], $headers['Cache-Control']);
  }

  /**
   * Replaces the delta in the given URL.
   *
   * @param string $url
   *   The source URL.
   *
   * @return string
   *   The URL with the delta replaced.
   */
  protected function replaceGroupDelta(string $url): string {
    $parts = UrlHelper::parse($url);
    $parts['query']['delta'] = 100;
    $query = UrlHelper::buildQuery($parts['query']);
    return $this->getAbsoluteUrl($parts['path'] . '?' . $query . '#' . $parts['fragment']);
  }

  /**
   * Replaces the group hash in the given URL.
   *
   * @param string $url
   *   The source URL.
   *
   * @return string
   *   The URL with the group hash replaced.
   */
  protected function replaceGroupHash(string $url): string {
    $parts = explode('_', $url, 2);
    $hash = strtok($parts[1], '.');
    $parts[1] = str_replace($hash, 'abcdefghijklmnop', $parts[1]);
    return $this->getAbsoluteUrl(implode('_', $parts));
  }

  /**
   * Replaces the 'libraries' entry in the given URL with an invalid value.
   *
   * @param string $url
   *   The source URL.
   *
   * @return string
   *   The URL with the 'library' query set to an invalid value.
   */
  protected function setInvalidLibrary(string $url): string {
    // First replace the hash, so we don't get served the actual file on disk.
    $url = $this->replaceGroupHash($url);
    $parts = UrlHelper::parse($url);
    $parts['query']['libraries'] = ['system/llama'];

    $query = UrlHelper::buildQuery($parts['query']);
    return $this->getAbsoluteUrl($parts['path'] . '?' . $query . '#' . $parts['fragment']);
  }

  /**
   * Removes the 'theme' query parameter from the given URL.
   *
   * @param string $url
   *   The source URL.
   *
   * @return string
   *   The URL with the 'theme' omitted.
   */
  protected function omitTheme(string $url): string {
    // First replace the hash, so we don't get served the actual file on disk.
    $url = $this->replaceGroupHash($url);
    $parts = UrlHelper::parse($url);
    unset($parts['query']['theme']);
    $query = UrlHelper::buildQuery($parts['query']);
    return $this->getAbsoluteUrl($parts['path'] . '?' . $query . '#' . $parts['fragment']);
  }

}
catch’s picture

This isn't going to be testable, or not easily, because it's specifically dealing with a race condition where multiple requests hit the same URL at the same time.

The Drupal 10 issue landed, but took an entirely new approach [#3301716].

joseph.olstad’s picture

Status: Needs work » Reviewed & tested by the community
Issue tags: -Needs tests

ok sounds good, removing needs tests.

joseph.olstad’s picture

Issue summary: View changes