Since we keep missing each other on irc I thought an issue might work instead :)
These projects are quite different in their details, but at a broad level we're both keen to get rid of file stats when rendering CSS/JS aggregates, and more generally to improve aggregation from contrib.
My plan with agrcache is to get as much of the work as possible back into core for Drupal 8, and hopefully Drupal 7. If that happens, the module might even be retired; if not, it'll need to be maintained until it does.
Overall I'd like css/js aggregation to look like this:
1. Use the agrcache method of having page callbacks generate CSS/JS aggregates. This saves up to one second on requests that actually have to build the aggregates, and even though the CSS/JS is served from PHP when first requested, there's still a measurable ~25% improvement in full page render time (per the Chrome developer tools Network tab). It also removes the file stats on every request.
2. Make it as easy as possible for modules like core library (and there were some Drupal 5/6 modules with different approaches, like configurable bundles) to change the aggregation mechanism. This might have to happen case by case via contrib, but it would be interesting to make this more pluggable in Drupal 8. I don't have any plans to introduce this kind of thing to agrcache at the moment.
3. Have a central system for things like js minification, but also other tricks like encoding images directly in css. Somewhere on an issue (which I can't find now), it was discussed having a system similar to the text format / filter approach - so:
- have a central module that provides a method for other modules to process the JS/CSS aggregates as they're built.
- have an API/UI to define how the different processors get chained
This would allow us to have separate modules providing JSMin or base64 encoding of images, so people can install what they want, while the actual altering and processing workflow is handled centrally by one module.
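To make the filter-chain idea concrete, here is a minimal sketch in plain PHP. Everything here is hypothetical - `register_processor()`, `run_processors()`, and the weights are illustrative, not a real Drupal API:

```php
<?php
// Hypothetical processor chain for aggregates, modeled on text formats
// and filters: modules register processors, and a central module runs
// them in weight order over the aggregate's contents.

$processors = [];

function register_processor(string $name, int $weight, callable $fn): void
{
    global $processors;
    $processors[] = ['name' => $name, 'weight' => $weight, 'fn' => $fn];
}

function run_processors(string $contents): string
{
    global $processors;
    // Run in weight order, like filters in a text format.
    usort($processors, fn($a, $b) => $a['weight'] <=> $b['weight']);
    foreach ($processors as $p) {
        $contents = $p['fn']($contents);
    }
    return $contents;
}

// A hypothetical "jsmin" module might strip line comments...
register_processor('strip_comments', 0,
    fn($c) => preg_replace('~//[^\n]*~', '', $c));
// ...and another module might trim whitespace and drop blank lines.
register_processor('trim_lines', 10,
    fn($c) => implode("\n", array_filter(
        array_map('trim', explode("\n", $c)),
        fn($l) => $l !== '')));

echo run_processors("var a = 1; // set a\n\n  var b = 2;\n");
```

An admin UI would then only need to manage registration order and per-processor settings, much like the text format configuration screen.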
When starting agrcache I didn't have any intentions regarding #3, but that's something I'd consider working on.
Just ideas for now but wanted to get the discussion started.
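As a concrete (if simplified) illustration of point 1, here is a hedged sketch of the imagecache-style flow: page rendering only computes a hash, and a callback builds the file on the first miss. The names, paths, and fake "aggregation" are all illustrative, not agrcache's actual code:

```php
<?php
// Sketch of lazy aggregate generation: rendering emits a URL derived
// from a hash of the file list, with no filesystem access. A menu
// callback (hit only when the file is missing, i.e. on a 404) builds
// the aggregate once; afterwards the web server serves it directly.

function aggregate_uri(array $files): string
{
    // Rendering only computes a hash; no file I/O happens here.
    return 'files/css/' . substr(hash('sha256', implode(',', $files)), 0, 16) . '.css';
}

// The callback behind files/css/%, hit only on a cache miss.
function deliver_aggregate(string $uri, array $files, string $dir): string
{
    $path = $dir . '/' . basename($uri);
    if (!file_exists($path)) {
        // Stand-in for real concatenation/aggregation of the files.
        $contents = '';
        foreach ($files as $f) {
            $contents .= "/* $f */\n";
        }
        file_put_contents($path, $contents);
    }
    return file_get_contents($path);
}

$files = ['system.css', 'node.css'];
echo deliver_aggregate(aggregate_uri($files), $files, sys_get_temp_dir());
```

The real implementation also has to map the hash back to a file list server-side; that lookup is what the later comments on this issue debate.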
Comments
Comment #1
pounard: In my opinion, aggregation should remain as static and simple as possible. When I look at the current core, the whole mechanism is a mess - way too dynamic.
My current effort is purely optimization and does not provide any kind of API, but you are going that way and it seems good.
Totally agree about CSS/JS aggregation/minification pluggable backend and more API oriented. It should also be easily configurable via settings.php for future sysadmins.
The good things in my current implementation are the I/O reduction and the aggressive group merging for CSS and JS aggregation. I also provide a UI that lets the site integrator manually choose exactly which files should and should not be aggregated (and I think that is something core should allow).
One last thing: in a good future API, core should force modules to define their CSS and JS files within a hook (like the library handling), which would allow site admins to build advanced CSS/JS profiles on a per-page, per-theme, or per-site basis.
EDIT: If core could control the full CSS and JS library list, it would never again have to do any I/O at runtime.
Re-EDIT: If modules were able to statically declare all their files, integrators could configure an inclusion policy, selecting which file(s) should *always* be aggregated whether or not they are being used. All other (dynamically included) files would then be served as-is, without any aggregation/minification (that is a runtime job and really costs a lot in terms of performance).
File existence should never be checked at runtime. Module developers should be aware that a non-existent file or a wrong path IS A BUG, and core's graceful degradation IS NOT A FEATURE but an ugly failsafe mechanism for sloppily developed modules.
Comment #2
catch: So at the moment there are three things agrcache does:
1. Registers menu callbacks to serve CSS/JS if the file doesn't exist (same as imagecache). This means we never have to do a runtime file_exists() - that only happens in the files/css/% and files/js/% callbacks, and those only get hit on cache misses.
2. Slightly simplifies the CSS aggregation to avoid having to process files before building the hash, which saved quite a bit of time in testing.
3. Instead of calling variable_set() for every aggregate on the page like core does, calls it once for all CSS aggregates and once for all JS aggregates. I'm pretty sure some of the stampede problems sites have with CSS/JS aggregation are actually variable_set()/variable_initialize() issues.
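Point 3 might be sketched like this; `variable_set_stub()` stands in for Drupal's variable_set(), and the function names are hypothetical rather than agrcache's real API:

```php
<?php
// Sketch of batching aggregate bookkeeping: collect all of a page's
// aggregate filenames, then do one write per type (CSS, JS) instead of
// one variable_set() per aggregate.

$recorded = [];

// Stand-in for Drupal's variable_set(): one storage write per call.
function variable_set_stub(string $name, $value): void
{
    global $recorded;
    $recorded[$name] = $value;
}

function record_aggregates(string $type, array $new_files): void
{
    global $recorded;
    $known  = isset($recorded["{$type}_aggregates"]) ? $recorded["{$type}_aggregates"] : [];
    $merged = array_values(array_unique(array_merge($known, $new_files)));
    // One write covers every aggregate of this type on the page.
    variable_set_stub("{$type}_aggregates", $merged);
}

record_aggregates('css', ['css_abc.css', 'css_def.css', 'css_ghi.css']);
record_aggregates('js', ['js_123.js']);
```

Recording three CSS aggregates costs one write instead of three, which also means one variables-cache invalidation per type rather than one per aggregate.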
If I read the description of core_library correctly, you're able to skip the I/O, but only once the admin changes a setting. I think core_library could use the agrcache approach for generating the bundles while it's learning, so the I/O would always be saved - and you'd still have the different aggregation algorithm in there.
settings.php definitely, or possibly exportables support if we need settings and a weight for each processor and settings.php starts getting unwieldy.
Also while I remember, I'm collecting core i/o issues at http://drupal.org/project/issues/search/drupal?text=&assigned=&submitted...
Short list at the moment but only started that tag a short while ago.
Comment #3
catch: Also:
Fully agree with this. Core is not forgiving about putting crap into function arguments, but it's very forgiving about stuff like this. Not enough people are profiling or doing straces :(
Comment #4
pounard: We totally agree then :)
Comment #5
catch: Moving this to core, since we're more or less discussing core patches here, even if they're done in agrcache first or backported there eventually.
Comment #6
pounard: As I said quickly on IRC, this issue opens up another one: CDN integration. Since you use the same mechanism (a menu callback) to serve static CSS and JS files as the image handling does, this mechanism could even be generalized (it would fit gracefully into the file stream handling way of doing things).
EDIT: CSS and JS could declare schemes such as css:// and js:// and then benefit from the stream wrapper interfaces while still using their custom implementations, greatly reducing the overall codebase.
The menu callback for accessing files should be generalized to all files: a file would be served statically if it exists, or else a stream wrapper interface method (perhaps something like generateFile()) would be called, invoking the scheme-specific implementation behind it.
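As a hedged illustration of the css:// scheme idea, here is a minimal standalone stream wrapper that builds the file on first access; the class name, scheme-to-path mapping, and generated contents are all hypothetical:

```php
<?php
// Hypothetical "css://" stream wrapper that lazily builds an aggregate
// the first time it is read, so callers never need a file_exists()
// check at render time. Not a real Drupal API.
class LazyCssStreamWrapper
{
    public $context;
    private $handle;

    // Map css://HASH to a real path in a writable directory.
    private function realPath(string $uri): string
    {
        $hash = substr($uri, strlen('css://'));
        return sys_get_temp_dir() . '/css-' . $hash . '.css';
    }

    public function stream_open($uri, $mode, $options, &$opened_path)
    {
        $path = $this->realPath($uri);
        // The generateFile() step: build the aggregate only if missing.
        if (!file_exists($path)) {
            file_put_contents($path, "/* aggregate for $uri */\n");
        }
        $this->handle = fopen($path, $mode);
        return $this->handle !== false;
    }

    public function stream_read($count) { return fread($this->handle, $count); }
    public function stream_eof()        { return feof($this->handle); }
    public function stream_close()      { fclose($this->handle); }
    public function stream_stat()       { return fstat($this->handle); }

    public function url_stat($uri, $flags)
    {
        $path = $this->realPath($uri);
        return file_exists($path) ? stat($path) : false;
    }
}

stream_wrapper_register('css', LazyCssStreamWrapper::class);
echo file_get_contents('css://abc123');
```

Drupal 7's own stream wrapper interface is richer than this, but the shape is the same: the scheme hides whether the file already exists.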
Comment #7
mfer: When it comes to #1, how are you taking the hash and deriving the files that should be included in the response? The idea of doing the JS/CSS in this form is something a number of people have been talking about for some time. It's nice to have a place to start discussing that idea.
Pluggable JS was started over on #352951: Make JS & CSS Preprocessing Pluggable. It stalled waiting for a common plugin mechanism that never materialized in D7. The same concept could apply to CSS.
Comment #8
catch: @mfer:
I'll answer at #1014086: Stampedes and cold cache performance issues with css/js aggregation which is the core issue for #1.
Thanks for cross-linking the pluggable issue, it was that which got me thinking about #3 more.
Comment #9
mfer: I was thinking a little further on #1. If we take a look at what espn.go.com does, we see something like:
We could do something like:
The hash would be built like it is now, and the combiner would know which files to use. This could easily be cached by the browser, and we could check that the files generate the right hash for security. I even like that it documents which files are included in the hash, since that can be a pain to discover when you do need to know.
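One way to make a file-list-in-the-URL scheme tamper-proof - a hedged sketch with hypothetical function names, not a worked-out proposal - is to sign the list with a site-private key and refuse any request whose token doesn't match:

```php
<?php
// Sketch: sign the file list with a site-private key so the combiner
// can verify that a requested list was actually issued by the site,
// rather than trusting arbitrary GET parameters.

function aggregate_token(array $files, string $secret): string
{
    // Hash the sorted file list with a site-private key.
    sort($files);
    return hash_hmac('sha256', implode(',', $files), $secret);
}

function is_valid_aggregate_request(array $files, string $token, string $secret): bool
{
    // Constant-time comparison to avoid timing leaks.
    return hash_equals(aggregate_token($files, $secret), $token);
}

$secret = 'site-private-key';
$files  = ['misc/drupal.js', 'misc/jquery.js'];
$token  = aggregate_token($files, $secret);

// A URL like js/<token>/misc/jquery.js,misc/drupal.js documents its
// contents while staying verifiable.
var_dump(is_valid_aggregate_request($files, $token, $secret));      // true
var_dump(is_valid_aggregate_request(['evil.js'], $token, $secret)); // false
```

This addresses the obvious objection to file names in URLs: without the private key, an attacker can't construct a valid token for an arbitrary file list, so the combiner can refuse unknown requests outright.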
Comment #10
pounard: I'm against exposing the real file names in the URL. The combiner shouldn't take its parameters from a GET request; that's unsafe. It means anyone could DDoS your site by making that request with arbitrary parameters, and it could even lead to security issues where people get other arbitrary files aggregated into the final output.
It should remain a generated hash that exists in a variable somewhere server-side.
Comment #11
mfer: @pounard You can do it in a safe and secure manner.
Comment #12
catch: We should discuss this on #1014086: Stampedes and cold cache performance issues with css/js aggregation.
Comment #13
Wim Leers: I agree that avoiding file stats at all costs is important.
Since I haven't been involved in the latest front-end performance initiatives, I apologize in advance for not knowing the current patterns being applied. But I do have the following question:
I know this does not directly affect core (yet), but it's an important best-practice pattern to be agreed upon and then documented.
CDN module
I'm currently working on a patch for the CDN module that adds this: #974350: Far Future setting for Origin Pull mode. However, it focuses only on adding Far Future Expires functionality; it does not yet avoid file stats. Essentially, it does two things:
1) Detect the last modification time of a file and use this to generate a unique file URL:
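The sort of helper that pattern implies might look like this - a hedged sketch, not the CDN module's actual code; the function name and URL prefix are illustrative:

```php
<?php
// Sketch of mtime-based unique file URLs: embed the file's last
// modification time in the URL so the URL changes whenever the file
// does, making Far Future Expires headers safe to send.

function farfuture_url(string $path, string $base = '/cdn/farfuture'): string
{
    // One filemtime() per file per page: cheap individually, but it
    // adds up on very large sites (point 3 in the summary below).
    $mtime = is_file($path) ? filemtime($path) : 0;
    return $base . '/' . $mtime . '/' . ltrim($path, '/');
}

$file = tempnam(sys_get_temp_dir(), 'css');
file_put_contents($file, 'body { color: red; }');
echo farfuture_url($file), "\n";
```

Any edit to the file bumps the mtime, producing a fresh URL that busts both browser and CDN caches.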
2) Then there is a menu callback defined at cdn/farfuture (meaning that any additional arguments will be passed to the menu callback function). This function then exhibits all the most optimal behavior: optimized caching headers for file URLs, gzip support, range request support (see cdn_basic_farfuture_download() in http://drupal.org/files/issues/974350-12.patch), and correct replies for If-Modified-Since requests.
I know it's silly to implement all of this in PHP instead of at the web server level, but that is the only realistic way to guarantee it works: cheaper hosting accounts don't allow all the required functionality to be defined through .htaccess files anyway. In the future, I can still provide an .htaccess file that expert users can enable (yes, the goal is to make the CDN module usable for everybody; and yes, CDNs are affordable already).
The file_exists() calls in this function don't matter much, since they'll only be called very rarely, and only by the CDN.
Summary
So, in summary:
1) I have a working solution for unique file URLs, to allow CDNs to be used with Far Future Expires headers
2) this solution does not work without a CDN for large sites: the load would grow too fast due to the file_exists() calls
3) for very large sites, it still doesn't work quite well because of the filemtime() call
It is point 3 in the summary that is problematic and for which I'd like to see a best-practice pattern emerge.
Thoughts?
Comment #14
mikeytown2 CreditAttribution: mikeytown2 commentedif your not opposed to 2 database tables & one cache table you could use what I got in here.
http://drupal.org/project/advagg
does imagecache generation.
uses zero stat calls when serving a page; even with imagecache style generation off.
hooks for implementing things like jquery cdn, csstidy, jsminplus.
gzip (but this is in D7 now).
uses locking to prevent multiple threads from doing the same thing.
Looking at advagg's issue queue, there are still a lot of issues to work out, but the core part of it is working correctly.
Comment #15
pounard: @#14 Thanks for sharing.
Comment #16
catch: As well as the pluggable JS issue, there are #494498: Add CSS parsing capability to Drupal and #352951: Make JS & CSS Preprocessing Pluggable around CSS aggregation.
Here's how I'd break down the issues:
- removing file_exists() checks during page rendering and speeding up aggregate cache misses - this makes sense on any site and there aren't many trade-offs to worry about. #1014086: Stampedes and cold cache performance issues with css/js aggregation. Doing it as a stream wrapper sounds good as well for D8.
- minification: our JS aggregation is literally .= with nothing else, and our CSS preprocessing has lots of issues. JS preprocessing afaik only comes down to the various kinds of minification (and since there's no clear winner at the moment, we should make it pluggable). CSS preprocessing is a bit more tricky - there's de-duplication, inline encoding of images, and a few other operations that aren't mutually exclusive or necessarily desirable on all sites. For that, something more like text formats and filters might be better: at least allow more than one operation to run on a file, and allow them to run sequentially rather than just being swapped out - more like check_markup()/text formats/filters, although it should be simpler than that.
- grouping logic. D7 introduced radically different logic to D6, you can swap it out (bundlecache and core_library both change the logic) but there's a lot of boilerplate code required to do this, these decisions can be quite site-specific so again it'd be good to have a system where those implementations can be swapped in or out.
- expiration of aggregates. D7 core is better than D6 for this; I need to look at the code here. IMO this is like the actual aggregate generation mechanism - we ought to be able to come up with something that works for every site without major side-effects, and site admins shouldn't have to worry about it.
- script loaders. LABjs is the most popular, but I've seen at least two more mentioned as well. One of these may win out, but until it does this is another thing it'd be good to be able to support changing around as transparently as possible.
- gzipping. Apart from what are relatively minor issues at #1040534: Rewrite rules for gzipped CSS and JavaScript aggregates cause lots of lstats for files that will never exist this is fine I think.
So for file generation, expiration, and gzipping, I think we mainly need to settle on one good way to do each in core, then where possible backport it to previous versions.
For grouping logic and minification, I don't think core will ever be able to find something that works for every site, so these should be pluggable - but IMO they are two different tasks. I should be able to swap minification logic out separately from grouping logic, and contrib modules should be able to tackle different approaches to one without having to re-implement the other. It would be good to support this properly in D8. For D7 it'd be good to have a module that just allows other modules to take over these things more easily without necessarily providing its own implementation, or possibly to backport new hooks if we could get away with that.
Comment #17
RobLoach: Hitting the good ol' non-existent subscribe button.
Comment #18
pounard: @#16 Agree about a lot of things, especially with having more flexible grouping logic.
Comment #19
Wim Leers: I agree with everything outlined in #16. The only problem is time: I won't have any until this summer at the earliest (i.e. after this semester and my master's thesis). If we get this right, Drupal will lead in the WPO realm.
Also see http://drupal.org/project/wpo.
Comment #20
cweagans: Subscribe.
Comment #21
Jeremy: Subscribing.
Comment #22
nod_: From the summary, points 2 and 3 are what #1751602: Use Assetic to package JS and CSS files aims to solve.
Comment #23
catch: Assetic doesn't help with #2; that's really about changing the bundling logic.
Comment #24
nod_: Assetic can help with bundling, http://symfony.com/doc/2.0/cookbook/assetic/asset_management.html#combin... and since Assetic is pluggable, I'm pretty sure we can work something out. For example, I just ran into this: https://github.com/smoya/AssetManagementBundle. I wouldn't rule it out just yet.
Comment #25
catch: The Assetic code isn't really bundling; it's just concatenation once you already have a bundle, isn't it?
The second link does look more like what we're dealing with.
Comment #26
pounard: If you preprocess stuff, you need to dump the files somewhere, which partially fixes #2 if you make the preprocessing step mandatory. Concatenation is enough preprocessing for that.
EDIT: Yes, the second link adds the logic we would need on our side in order to achieve this fully; I tested the ZF2 equivalent a few months ago.
Comment #27
Wim Leers: #352951: Make JS & CSS Preprocessing Pluggable landed today.
Having reread this issue, #16 captures the gist of what needs to happen. Thanks to #352951, we can now swap out:
- asset.js.optimizer
- asset.(css|js).collection_grouper
- asset.(css|js).collection_optimizer
- asset.js.collection_renderer
So, it is now a matter of deciding what we still want to do for Drupal 8 core. I think removing file_exists() calls is definitely a good idea. It can be achieved by fixing our current CssCollectionOptimizer and JsCollectionOptimizer: it won't break D8 APIs, so we can still do it. Shall we get this done over at #1014086: Stampedes and cold cache performance issues with css/js aggregation?
What else should we do for Drupal 8?
P.S.: inlining of background images in CSS files is now possible using hook_file_url_alter(), thanks to #1961340: CSS aggregation breaks file URL rewriting because it abuses it.
Comment #28
Wim Leers: .
Comment #40
Schoenef: This one can be closed, no?
Comment #41
catch: Yes, there are still issues to work on, but this one is very outdated.