With services like AWS CloudFront having a minimum cache lifetime of 1 hour (and ignoring the query string), and also to provide a failsafe way of invalidating cached objects, I think it would make sense for the CDN module to support versioned assets.

I.e., every time the application is released/tagged/deployed, the CDN module changes the links (for example using the SVN version number or release date), with mod_rewrite converting the links back to point at their original locations. E.g. foobar.com/1111/sites/all/assets/file.jpg is rewritten to foobar.com/sites/all/assets/file.jpg




Wim Leers’s picture

Assigned: Unassigned » Wim Leers
Status: Active » Closed (works as designed)

No, this won't be supported in the manner you're describing it.

The CDN module will not provide file versioning and whatnot. The CDN module will focus solely on integrating with a CDN, nothing else.

Instead, the functionality that you're looking for is/will be included in the BundleCache module. It is capable of generating bundles of JS/CSS files that have a unique version string in them. Give that module a try :)

bendodd’s picture

The feature I'm talking about seems to sit nicely right between these 2 modules. Although I would suggest that cache invalidation is a key part of using a CDN, especially those which don't accurately adhere to the expires headers.

What do you think? Create another module, or extend BundleCache to apply to all CDN/cached files? The latter would also seem a corruption of the module's real purpose.

Wim Leers’s picture

Status: Closed (works as designed) » Postponed

The BundleCache module creates altered versions of existing files. The CDN module does not. I may reconsider, but in any case that won't be supported any time soon — unless somebody contributes it maybe.

I see that I entered the wrong status. My apologies.

bendodd’s picture

I am looking at contributing this functionality, as I think for us to work with CloudFront, we'll need it.

Do you have any thoughts, given your knowledge of your CDN module, what the best approach would be? I'm thinking mod_rewrite will convert the file URIs back as the request comes in, but how to change the URIs in the first place...

Wim Leers’s picture

Why would you need it any more for CloudFront than for any other CDN?

bendodd’s picture

It was just an example of a CDN with a minimum expiry (of 1 hour?). This would mean we'd not be able to affect the cached objects for that hour, so post deploy we'd be unsure what people were seeing.

bendodd’s picture

2.11 KB

Is this a potential solution? Although would need finessing...

For local files you would add a prefix following the domain which would be based on the timestamp of the file e.g. /cdn/1288569600/a/file.css. Then a mod_rewrite to remove it, and allow the server to produce the correct file:

#Allow for revisioned assets
RewriteRule ^cdn/[0-9]*/(.*) /$1 [L]

This would allow you to set the cache expiry times to a couple of years and properly leverage the upstream caching, both of the CDN and the ISPs? I'm primarily concerned with origin pull CDNs for this project, so appreciate this would not be applicable for files not on the local file system (which your module also caters for).
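The prefix-and-strip idea above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not code from the attached patch: the function names `versioned_url` and `strip_version` are hypothetical, and the regular expression simply mirrors the RewriteRule quoted above (mod_rewrite matches the path without its leading slash).

```python
import os
import re
import tempfile

# Python equivalent of: RewriteRule ^cdn/[0-9]*/(.*) /$1 [L]
VERSION_PREFIX = re.compile(r"^cdn/[0-9]*/(.*)")


def versioned_url(docroot: str, rel_path: str) -> str:
    """Prefix a local file path with that file's modification time."""
    mtime = int(os.path.getmtime(os.path.join(docroot, rel_path)))
    return f"/cdn/{mtime}/{rel_path}"


def strip_version(url_path: str) -> str:
    """Map a versioned URL path back to the real file path."""
    match = VERSION_PREFIX.match(url_path.lstrip("/"))
    return "/" + match.group(1) if match else url_path


with tempfile.TemporaryDirectory() as docroot:
    os.makedirs(os.path.join(docroot, "a"))
    css = os.path.join(docroot, "a", "file.css")
    with open(css, "w") as handle:
        handle.write("body { margin: 0 }")
    # Force the timestamp used in the example above.
    os.utime(css, (1288569600, 1288569600))
    url = versioned_url(docroot, "a/file.css")
    print(url)                 # /cdn/1288569600/a/file.css
    print(strip_version(url))  # /a/file.css
```

Because the prefix changes whenever the file's mtime changes, the CDN sees a new URL and fetches the new file, while the old URL can keep a very long cache lifetime.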

Wim Leers’s picture

Title: Versioned cache objects to aid object invalidation » Support versioned files (to aid object invalidation), served by Drupal, with configurable cache lifetimes
Version: 6.x-2.x-dev » 6.x-2.0
Status: Postponed » Needs work

Upon re-reading this, I'm wondering: what the hell was I thinking? *Of course* this is absolutely necessary! I must've been overloaded with work or something. My sincere apologies!

I've looked at your patch. Looks good. I too would have used filemtime() to make the filenames unique (I've done some basic research on filemtime()'s performance impact and that seems to be negligible — for max performance, they'll have to use BundleCache in the future).

However, I'd prefer to make the set up as easy as possible. So I'd like to avoid requiring users to alter their .htaccess file. I'd just use Drupal's menu system. After all, the CDN will only occasionally come and fetch the file. So the performance impact of that should be negligible.
A nice consequence is that we can then define cache lifetime from within Drupal, instead of having to update the .htaccess file. We can then even create a UI that allows this to be altered.

Working on a revised patch.

Wim Leers’s picture

Title: Support versioned files (to aid object invalidation), served by Drupal, with configurable cache lifetimes » Far Future setting for Origin Pull mode
Status: Needs work » Needs review
42.42 KB

I've decided to not create a UI. For one, it's a lot more work and more code to maintain (and port!). But also because far future expiration headers are the industry best practice. So I've created just one new setting for Origin Pull mode: "Far Future expiration". As soon as you enable this setting, all files with the following extensions will be served with far future expires headers: .css, .js, .svg, .ico, .gif, .jpg, .jpeg, .png, .otf, .ttf, .eot, .woff, .flv, .swf. And files with one of the following extensions will also automatically be gzipped when the client requests it: .css, .js, .svg, .otf, .ttf, .eot. This will only be done for requests made by the CDN — if the file won't end up on the CDN, then it won't get the far future expiration headers (for scalability purposes: the file serving is done through PHP, and if that needs to happen for thousands of page loads, your server will get overloaded, but for the occasional request from the CDN, this is perfectly acceptable).
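As a rough illustration of the behaviour described above (not the patch's actual PHP), the header selection could look like the sketch below. The extension lists are the ones given in this comment; the `max-age` value is the one REDbot reports further down this thread; the function name `response_headers` and the exact set of headers are assumptions.

```python
import os

# Extension lists as given in the comment above.
FAR_FUTURE_EXTENSIONS = {
    ".css", ".js", ".svg", ".ico", ".gif", ".jpg", ".jpeg", ".png",
    ".otf", ".ttf", ".eot", ".woff", ".flv", ".swf",
}
GZIP_EXTENSIONS = {".css", ".js", ".svg", ".otf", ".ttf", ".eot"}

# Value reported by REDbot later in this thread.
MAX_AGE = 290304000


def response_headers(path: str, accept_encoding: str = "") -> dict:
    """Return the caching/compression headers for a file requested by the CDN."""
    ext = os.path.splitext(path)[1].lower()
    headers = {}
    if ext not in FAR_FUTURE_EXTENSIONS:
        return headers  # not eligible: no far future headers
    headers["Cache-Control"] = f"max-age={MAX_AGE}, no-transform, public"
    # Only compress text-like formats, and only when the client asks for it.
    if ext in GZIP_EXTENSIONS and "gzip" in accept_encoding.lower():
        headers["Content-Encoding"] = "gzip"
        headers["Vary"] = "Accept-Encoding"
    return headers


print(response_headers("sites/all/themes/x/style.css", "gzip, deflate"))
print(response_headers("sites/default/files/photo.png", "gzip"))
```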

Patch attached. Please review. This patch is applied to http://driverpacks.net, where this new functionality is thus now running live.

I've used http://REDbot.org to verify that it's been configured optimally and working properly.

Vacilando’s picture

Great, guys! Subscribing, will test if I manage to find the time.

Wim Leers’s picture

And of course, mere days after I write this, http://calendar.perfplanet.com/2010/easy-cache-headers/ is published. It indicates that what I've written so far is perfect header-wise, as far as I can tell. The author of that article also has an optimized "drop-in .htaccess" file: https://github.com/sergeychernyshev/.htaccess/. Maybe we can borrow some ideas from that.

I just realized something thanks to that article: it's more accurate *and* faster to use MD5 hashes of the files in the file URLs instead, and store (file path, md5 hash, last time hashed) tuples in the database (or maybe in a separate SQLite database?). We can then allow for automatic re-hashing every X minutes, so that new files will be picked up within minutes. And we can also offer the ability to disable that and provide a "Rehash all files" button.
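A minimal sketch of the (file path, md5 hash, last time hashed) idea, assuming an SQLite table and a fixed rehash interval. The table name `file_hashes`, the `REHASH_INTERVAL` setting and the function names are hypothetical and only illustrate the proposal; they are not part of any patch in this issue.

```python
import hashlib
import os
import sqlite3
import tempfile
import time

REHASH_INTERVAL = 300  # seconds; hypothetical "every X minutes" setting

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE file_hashes (path TEXT PRIMARY KEY, md5 TEXT, hashed_at REAL)"
)


def file_md5(path):
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def cached_md5(path, now=None):
    """Return the stored hash, rehashing only when the entry is stale."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT md5, hashed_at FROM file_hashes WHERE path = ?", (path,)
    ).fetchone()
    if row and now - row[1] < REHASH_INTERVAL:
        return row[0]  # fresh enough: no filesystem read
    digest = file_md5(path)
    db.execute(
        "INSERT OR REPLACE INTO file_hashes VALUES (?, ?, ?)", (path, digest, now)
    )
    return digest


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "style.css")
    with open(target, "w") as handle:
        handle.write("a")
    first = cached_md5(target, now=0)
    with open(target, "w") as handle:
        handle.write("b")
    still_cached = cached_md5(target, now=10)   # within the interval
    refreshed = cached_md5(target, now=400)     # interval elapsed: rehashed
```

The trade-off is visible in the demo: a changed file keeps its old URL until the interval elapses, which is why a manual "Rehash all files" action is also suggested.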

What do you think?

I'll leave this issue at "needs review" to get feedback on what I've posted in #9.

Wim Leers’s picture

42.43 KB

Patch from #9 updated: .ico's should also be gzipped. Teeny, tiny change.

Vacilando’s picture

Status: Needs review » Reviewed & tested by the community

This seems to work all right. My CDN-enabled URLs look like this:


and http://redbot.org/ reports:

Cache-Control: max-age=290304000, no-transform, public

which corresponds to over 9 years.
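For reference, the arithmetic behind that figure (a small Python check; the 365-day year is an assumption):

```python
MAX_AGE = 290304000                    # seconds, from the Cache-Control header above
SECONDS_PER_WEEK = 7 * 24 * 60 * 60    # 604800
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31536000, ignoring leap years

weeks = MAX_AGE / SECONDS_PER_WEEK
years = MAX_AGE / SECONDS_PER_YEAR
print(weeks)            # 480.0
print(round(years, 2))  # 9.21
```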

Vacilando’s picture

Status: Reviewed & tested by the community » Needs work

Correction; after some hours playing with it today I am finding a flaw.

The system works fine if the files called exist. But that's not enough in some cases. Sometimes the file is generated on the fly by a module. I have been testing with galleries generated by Brilliant Gallery in Picasa mode.
There, an img tag on the page contains path like this:
But at that moment there is nothing in folder /sites/vacilando.net/files/fm/files/picasacache/
At the moment of the request it is created (from a large original taken from Picasa) and the resized image is put in that path.

Now, when using vanilla CDN 6.x-2.0, the path is http://d26engbtkhfc1t.cloudfront.net/sites/vacilando.net/files/fm/files/picasacache/bg_cached_resized_93ac3ac2a0d90c3b4da5195d13b40616.jpg and it works fine.

But after using that version patched using code from #12, and if the cache folder is empty (something I failed to test yesterday!), then the path is http://d26engbtkhfc1t.cloudfront.net/cdn/farfuture//sites/vacilando.net/files/fm/files/picasacache/bg_cached_resized_93ac3ac2a0d90c3b4da5195d13b40616.jpg.
So the number generated by filemtime() is missing. Which is natural, because the file is not there yet.

As a solution, I recommend you to consider using md5() of the path (in the example of above "/sites/vacilando.net/files/fm/files/picasacache/bg_cached_resized_93ac3ac2a0d90c3b4da5195d13b40616.jpg" would generate "cdcc0eac85201e75eeeb948f3d81e1dd").

I do not attach a patch since the suggested change is so simple, besides maybe you will have a better solution (md5 strings are unsightly long and the full URL length might still possibly pose problems for some browsers?)

Wim Leers’s picture

43.23 KB

Excellent find!

I've re-rolled the patch with a work-around for this problem.

Wim Leers’s picture

Status: Needs work » Needs review
Vacilando’s picture

Status: Needs review » Reviewed & tested by the community

A clever fix! Applied the patch and everything works nicely -- the filemtime() number is there in all cases even if the files do not exist physically prior to the request. The patch seems ready to be included in the CDN module!

One question related to setting expiration to far future -- when do the unused files actually get erased from the CDN (or CloudFront specifically)? Does the CDN take care of it, or is it done by this module?

Wim Leers’s picture

The CDN takes care of it. A CloudFront edge server keeps files on the CDN for 24 hours and then comes back to the origin server to get a new version.

reisler’s picture


Wim Leers’s picture

Status: Reviewed & tested by the community » Needs work

Erh, the patch I posted seems to be buggy. I apparently uploaded an older patch, with debug leftovers and without the actual changes. So now I've lost the working patch. Strange that it works for you, vacilando.

I'll fix this soon.

Vacilando’s picture

Hmm; not sure what happened. I had briefly scanned your patch and found you used drupal_http_request to generate the file if it does not exist yet, I applied the patch without errors against fresh 6.x-2.0, cleared all caches, and checked that both real and to-be-created files were served fine from CDN. Let me know here or via PM if I should look at anything else.

Wim Leers’s picture

Exactly! There was a drupal_http_request() call in there! So it was drupal.org that somehow messed up, not me. Very strange. Do you still have that patch? If you do, please upload it here. Thanks!

Wim Leers’s picture

Status: Needs work » Reviewed & tested by the community

Seems to be just fine again now. Must've been a browser cache issue (despite pressing cmd+r repeatedly).

Currently awaiting feedback from/in #1048316: [meta] CSS and Javascript aggregation improvement.

Wim Leers’s picture

Version: 6.x-2.0 » 6.x-2.1
Status: Reviewed & tested by the community » Needs work

But there's still some ugly leftovers in there that need to be cleaned up.

bendodd’s picture

Removed. Duplicate submission :(

bendodd’s picture

Based on my experience over the past few months (using the CDN module on http://www.rednoseday.com) and the comments above I would add the following:

1. Versioned object identifiers

The approach I have devised is to provide a hook interface that will allow the CDN module, along with other modules, to provide versioned object "identifiers" (ways of detecting a change to an asset/object that requires a change in the URL). My customised CDN module provides the defaults discussed above: filemtime & md5 with the addition of Drupal version. I also have a custom module, implementing the hook, which uses our deployment metadata to get an app version (based on the SVN tag and revision)

For example, the "Drupal Identifier" allows core assets to be versioned without the need for filesystem scans and results in a URL like:

This example uses our deployment interface to get the application version (again not needing to interact with the actual file):

2. Referenced versioned objects

There is an issue when using versioned objects, specifically with CSS files that reference other CDN'ed files. If a CSS file contains a relative reference, the identifier used to request the referenced image, for example, is that of the CSS file, not the image. This means that when the image changes, the change will not be reflected until the CSS file is updated:

For example, this file (with an identifier based on filemtime):
Contains relative references to images:
overflow:visible; background:url(../../../../../sites/all/assets/images/home/bg_john_alex.jpg) right bottom no-repeat}
Which gives the image a versioned URL of:
Importantly, if the image is updated a new version will not be requested by the CDN as the identifier is that of the referencing CSS file, not the image.
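The effect can be reproduced with plain URL resolution, which is what a browser (or CDN) does with relative references in a stylesheet. The URLs below are hypothetical placeholders, not the ones from our site; the `cdn/farfuture/<identifier>` layout is taken from the URLs quoted earlier in this thread.

```python
from urllib.parse import urljoin

# Hypothetical versioned URL of a CSS file; 1288569600 is the CSS file's mtime.
css_url = (
    "http://cdn.example.com/cdn/farfuture/1288569600"
    "/sites/all/themes/mytheme/css/style.css"
)
# Relative reference as it might appear inside that CSS file.
reference = "../images/bg.jpg"

# The image inherits the CSS file's identifier, not its own.
resolved = urljoin(css_url, reference)
print(resolved)
```

The resolved image URL still carries 1288569600, so replacing the image alone never produces a new URL.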

3. Using Drupal to remove the identifier prefix

Bootstrapping Drupal for every request to the server is not viable for us, especially for a large site like RND.com. I imagine the load would be significant. I appreciate there is a desire for this module to be plug-and-play, but I need to squeeze every last drop of performance. How about an approach to allow for .htaccess mod_rewrites and use the module as a fallback?

4. GZIP Compression

Using the CDN module to provide compression is not something we would use and is often implemented elsewhere for large sites. This is something that we do at the Apache layer and would not want Drupal doing it for us. This is also a lot of code, and given the comments at the beginning of this thread I was surprised that it was included.

Wim Leers’s picture

I've started to contract for a fairly big website, to help them with WPO. That includes installing this module with this functionality. But, indeed, as I've said in #1048316-13: [meta] CSS and Javascript aggregation improvement, it's important to first work around the file_exists() call.

This is what I proposed to them:

A possible solution to avoid file system calls (and the implied bottleneck) is the following:
1) We assume that only CSS and JS files change. We therefore assume that images, Flash files, fonts and videos *never* change.
2) We can then *always* assign Far Future expires headers to flash, fonts and videos.
3) We can allow for a variable that assigns a unique ID per deployment. This way, we only change file URLs when there is an actual change, and new files will be picked up *immediately* by browsers. These files will then again be cached infinitely by browsers.

1. Sounds great!
2. This is a valid issue. The only way around it, is by updating the file URLs in the CSS file. This is a low-priority issue though IMO, and a non-issue in some set-ups: since an updated image implies a new deployment, the CSS file's deployment ID will change, and thus result in a new file URL, and thus result in the new image file to be downloaded.
3 + 4. If you're using a CDN, this is a non-issue. Yes, performance will be poor when compared with a raw HTTP server, but at the cost of plug-and-play. Of course, I'll happily welcome patches that add the same functionality for Apache and other webservers directly. (And FWIW, gzip compression support actually requires very little code. It's mostly the setting of headers that is "a lot" of code.)

Looking forward to your patch! :)

mikeytown2’s picture

@Wim Leers
advagg doesn't use file_exists unless it needs to. Takes care of cloudfront issues and sets expiration to 1 year in the future.

bendodd’s picture

15.94 KB

Here is a 'preview' of a patch, I'm sure it needs to be finessed to meet coding standards, but I'm a bit busy at the moment. The patch is made to be applied to cdn-6.x-2.0 (which is the version we're using and not to the module with the other patches in this thread applied).

Let me know what you think?

Wim Leers’s picture

I will let you know! Probably tomorrow :) It's okay that it's not up to coding standards yet — but is all code in there? From a quick glance at it, it seemed to be the case :)

Thanks for sharing!

bendodd’s picture

Yes, I believe so...although I must admit to not testing it with a clean drupal install. Let me know if you have any problems, I'm very keen to provide a contribution to your already excellent module.

Wim Leers’s picture

Status: Needs work » Needs review
76.13 KB
16.51 KB
15.99 KB

The work related to this comment was sponsored by Belgian news site DeWereldMorgen.be, to make their site faster and to start using Amazon CloudFront as their CDN.

@mikeytown2: how does it take care of CloudFront issues (what issues?). 1 year into the future is not sufficient, since browsers don't obey far future expires headers perfectly, they rely on heuristics for which they use the Expires, Cache-Control and Last-Modified headers (they'll also issue If-Modified-Since requests).
I'm very eager to hear how you're instantaneously serving changed files if you're caching MD5 hashes/mtimes.

@bendodd: I've reviewed your code. It was in the right direction, but it was indeed not yet quite compliant with Drupal's coding standards. I took your work and merged the most significant part of it (the support for unique file identifiers, with custom unique file identifiers addable through a hook) with the rest of this patch.

• Unique file identifiers (UFIs), defined through hook_cdn_unique_file_identifier_info (following UFIs included: md5_hash, mtime, perpetual, drupal_version, deployment_id). The latter is defined through a CDN_DEPLOYMENT_ID define, and thus can be defined in code, in a VCS-agnostic manner (this is what DeWereldMorgen.be will use for their code and theme deployments). 'perpetual' is useful for e.g. videos and Flash.
• A simple UI similar to the CDN mapping UI to define which UFI should be used when. Comes with a sensible default, too:

sites/*|.avi .flv .m4v .mov .mp4 .wmv|perpetual

And when used for high-traffic websites, the following addition is recommended:
• A default UFI in case none is defined through the above rules: mtime (configurable through the CDN_BASIC_FARFUTURE_UNIQUE_IDENTIFIER_DEFAULT define).

Not included from your patch:
• UFI for svn: your code references it, but does not contain the actual code. If it's small, I'd be happy to ship it with the CDN module. Otherwise, you could create a small contributed module that adds this.
• .htaccess support, since DeWereldMorgen.be uses nginx and not Apache for their web servers, and thus they wouldn't benefit from this. I'd still love to see this added though. Could you maybe contribute it in a follow-up patch? :) I'd like to see it similar to how the Boost module does it (see boost.admin.inc boost_admin_htaccess_page() and boost_admin_generate_htaccess())
• Your additional blacklist/whitelist stuff. It wasn't very clear to me what the added value was.
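To make the UFI idea from the top of this comment concrete, here is a language-neutral Python sketch of a registry of identifier methods. It is not the patch's PHP and not the real hook signature: the constant values, the "forever" token for `perpetual` and the function names are assumptions for illustration only.

```python
import hashlib
import os
import tempfile

DEPLOYMENT_ID = "release-42"  # hypothetical stand-in for CDN_DEPLOYMENT_ID
DRUPAL_VERSION = "6.22"       # hypothetical stand-in for the core version


def ufi_md5_hash(path):
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def ufi_mtime(path):
    return str(int(os.path.getmtime(path)))


# Method name -> callable returning the identifier for a file.
# Only md5_hash and mtime touch the filesystem.
UFI_METHODS = {
    "md5_hash": ufi_md5_hash,
    "mtime": ufi_mtime,
    "perpetual": lambda path: "forever",  # assumed constant token
    "drupal_version": lambda path: DRUPAL_VERSION,
    "deployment_id": lambda path: DEPLOYMENT_ID,
}


def unique_file_identifier(method, path):
    return UFI_METHODS[method](path)


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "logo.png")
    with open(sample, "wb") as handle:
        handle.write(b"demo")
    os.utime(sample, (1288569600, 1288569600))
    identifiers = {
        name: unique_file_identifier(name, sample) for name in UFI_METHODS
    }
print(identifiers)
```

Other modules would extend such a registry through the hook; the three constant-valued methods show why `deployment_id` and `drupal_version` avoid filesystem hits entirely.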

Known issues:
• Unique file identifiers (UFIs) that require filesystem access (md5 hash, mtime, in the future possibly svn revision or git commit) are not yet cached, meaning that they will result in filesystem hits. Depending on the set-up, this may not be a problem at all though. I'm looking forward to hearing from mikeytown2 how he added caching while still letting changed files be served instantaneously and automatically.
• In cdn_file_url_alter():

      // If the file does not yet exist, perform a normal HTTP request to this
      // file, to generate it. (E.g. when ImageCache is used, this will
      // generate the derivative file.)
      if (!file_exists($path)) {
        drupal_http_request(url($path, array('absolute' => TRUE)));
      }

This will cause problems when there are many ImageCache files that are yet to be generated: either the page will time out or it will be slow.

Please review! Attached:
• patch
• screenshot when Far Future expiration is disabled
• screenshot when Far Future expiration is enabled

mikeytown2’s picture

@Wim Leers
AdvAgg uses a database table to record all files that are included inside an aggregated CSS/JS file (advagg_bundles). It uses another DB table to keep track of the individual filename and mtime/md5 of those files (advagg_files). It uses a 3rd table (cache_advagg_files_data) that is a general storage for any additional info about a file; CSS Embedded Images uses this to keep mtime/md5 of the background images included in that CSS file (not aggregate). AdvAgg will also store if the aggregate has been created in a 4th table (cache_advagg table). Long story short, when you issue a flush cache command it will check the checksum (mtime/md5) of all files that have been used in any aggregate and if a file has changed it will increment the counter by one on all aggregates containing that file; thus the next (uncached) page load requesting that aggregate will get the new version of it. The md5 never changes in the filename, just the counter; thus if an aggregate is missing advagg can lookup what files are contained in it and generate that file; AdvAgg can use this to create the aggregates in a background process by requesting that file during the page load, thus speeding up page generation times.

In terms of CloudFront, if you try to push the same filename out, it can take 24 hours for that change to appear. Thus each version of a file needs to be unique; I can't use the md5 only, I need an md5 and a counter.
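The md5-plus-counter scheme described above can be sketched roughly as follows. This is a simplified Python illustration, not AdvAgg's code: the in-memory dictionaries stand in for the database tables, and the filename pattern `css_<md5>_<counter>.css` is an assumption.

```python
import hashlib


def bundle_key(files):
    """Stable md5 of the list of member files (not of their contents)."""
    return hashlib.md5("\n".join(files).encode("utf-8")).hexdigest()


def flush(bundles, stored, current, counters):
    """On cache flush, bump the counter of every bundle with a changed file."""
    changed = {f for f, checksum in current.items() if stored.get(f) != checksum}
    for key, files in bundles.items():
        if changed.intersection(files):
            counters[key] = counters.get(key, 0) + 1
    stored.update(current)
    return changed


def bundle_filename(key, counters):
    return f"css_{key}_{counters.get(key, 0)}.css"


files = ["misc/a.css", "misc/b.css"]
key = bundle_key(files)
bundles = {key: files}
stored = {"misc/a.css": "mtime-1", "misc/b.css": "mtime-1"}
counters = {}

before = bundle_filename(key, counters)
flush(bundles, stored, {"misc/a.css": "mtime-2", "misc/b.css": "mtime-1"}, counters)
after = bundle_filename(key, counters)
print(before)
print(after)
```

The md5 part stays the same across versions, so a missing aggregate can always be traced back to its member files; only the counter makes each version's filename unique for the CDN.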

Also look at how I use drupal_http_request in AdvAgg; calling the IP and setting the host in the header gives much more flexibility when dealing with multiple webheads: you can send all requests to one box or to the exact same box. Also I think Imageinfo Cache will take care of the cdn_file_url_alter issue.
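The "call the IP, set the Host header" technique looks roughly like this when expressed with Python's standard library instead of drupal_http_request. The IP address, host name and path are hypothetical, and the request is only built here, not sent.

```python
import urllib.request


def build_origin_request(ip, host, path):
    """Address one specific webhead by IP while naming the site via Host."""
    return urllib.request.Request(f"http://{ip}{path}", headers={"Host": host})


request = build_origin_request(
    "10.0.0.5",         # hypothetical webhead IP
    "www.example.com",  # hypothetical site host name
    "/sites/default/files/imagecache/thumb/a.jpg",
)
print(request.host)                # 10.0.0.5
print(request.get_header("Host"))  # www.example.com
# urllib.request.urlopen(request) would send it to that exact box.
```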

PS check out the bundler sub module in advagg... pretty sweet stuff for not having a GUI.

naeluh’s picture

@Wim Leers

I applied this patch - 974350-farfuture-32_0.patch - and it resulted in a broken site and this fatal error:

Fatal error: require() [function.require]: Failed opening required './modules/cdn/cdn.basic.farfuture.inc' (include_path='.:/usr/share/php:/usr/share/pear') in /var/www/modules/cdn/cdn.module on line 516
Wim Leers’s picture

54.39 KB

The file cdn.basic.farfuture.inc simply was not yet included. Updated patch attached.

bendodd’s picture

  1. I think there is too much documentation in the control panel. I would have thought a basic syntax reference, and full documentation in a README or on Drupal.org?
  2. There should be a list of available unique file identifiers (UFIs) provided by the hook.
  3. The expanding element based on a checkbox was unusual and unexpected; this could be personal preference/experience.
  4. I would need a way of turning compression OFF; I do this in Apache.
  5. I would like a way of controlling the expiry times or turning them OFF.
  6. I would also like the large arrays of MIME types etc. to be in configuration, not in the code. (I'm not sure what pattern Drupal allows for separating configuration from code without it being in the database.)
  7. The new UI is simpler (and an improvement), but would need a way of overriding the coded default via the UI. Or are you suggesting: "*|perpetual" on the final line?
  8. It was also important to allow a cascade when selecting the method of UFI. I.e. When looping through the UFI config list and a UFI matched, the loop was exited and the following UFI config entries ignored; I'm not sure this is the case with this code. E.g. Given the file /test/ben/file.jpg and a list of

    only ufI_method1 should be matched.

  9. If we can agree that Memcache/d is popular, can we introduce a way of caching the file stats? We could stat whole directories at a time and store the results in memcache-backed caches. I'm not sure there would be any benefit in storing the results in DB-backed caches.
  10. There is the issue of none of this working with CSS aggregation turned on
  11. There is the issue of none of this working when referencing assets via CSS files (discussed in #974350-26: Far Future setting for Origin Pull mode). I don't have a solution for either of these.
  12. I would not want Drupal to have to be bootstrapped to allow for UFI decoding; I will do ANYTHING to stop Drupal having to bootstrap. Of course, we need to allow this module to work in a plug and play mode, but I also need to be able to override this functionality with more suited technologies (assuming we can agree Apache Web Server is more suited to path rewriting and compression). I had the following:

    #Allow for revisioned assets stored on a CDN
    RewriteRule ^cdn/([0-9a-z])*/(.*) /$2 [L]
    #Prevent people browsing the site via the CDN reference domains *origin.examplesite.com
    #Allows files with acceptable files extensions or the files in the specified dirs (sites etc)
    RewriteCond %{HTTP_HOST} origin.examplesite.com$ [NC]
    RewriteCond %{REQUEST_URI} !^/(cdn/[0-9a-z]*/)?(sites|modules|misc|themes|external)(.*)
    RewriteCond %{REQUEST_URI} !\.(css|js|gif|jpeg|jpg|png|xml|mp4|mov|ico|ttf|swf)$
    RewriteRule ^(.*)$ / [NC,L,F]
    # Redirect non existing files back to the root domain allowing Image_cache, for example, to work
    RewriteCond %{HTTP_HOST} origin.examplesite.com$ [NC]
    RewriteCond %{SERVER_NAME} !origin.examplesite.com$ [NC]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.*)$ http://%{SERVER_NAME}/$1 [P]
Peter Bowey’s picture

Refer to #35

This patch (http://drupal.org/files/issues/974350-farfuture-35.patch) is not compatible with advagg (http://drupal.org/project/advagg). It alters the needed 'advagg' menu hook for 'fast 404 missing.css status'

bendodd’s picture

How do we move this forward? I'd love to get this into the module...

Wim Leers’s picture

In the next week or so, I should be able to start working on the CDN module intensively again, much like I did for Hierarchical Select in the past month. I'd have preferred to work on the CDN module, but several clients sponsored some HS work, hence the delay.

This project isn't dead, although it has been in a coma for quite some time now. My apologies. Crazy times.

Wim Leers’s picture

81.17 KB


1. I agree: it's very overwhelming. But this is to prevent excessive amounts of support requests here on d.o. Maybe I should use the advanced_help module?
2. This is already provided in the documentation at admin/settings/cdn/details:

sets the unique identifier method that should be applied to the aforementioned directories, and only to (optionally) the listed file types. Available methods are:

  • MD5 hash (md5_hash): MD5 hash of the file.
  • Last modification time (mtime): Last modification time of the file.
  • Perpetual (perpetual): Perpetual files never change (or are never cached
    by the browser, e.g. video files).
  • Drupal version (drupal_version): Drupal core version — this should only be applied
    to files that ship with Drupal core.
  • Deployment ID (deployment_id): A developer-defined deployment ID. Can be an
    arbitrary string or number, as long as it uniquely
    identifies deployments and therefore the affected
    files. Define this deployment ID in any enabled module or
    in settings.php as the CDN_DEPLOYMENT_ID
    constant, and it will be picked up instantaneously.

3. This makes the UI contain slightly less documentation.
4. I agree. I wrote this in #2:

.htaccess support, since DeWereldMorgen.be uses nginx and not Apache for their web servers, and thus they wouldn't benefit from this. I'd still love to see this added though. Could you maybe contribute it in a follow-up patch? :) I'd like to see it similar to how the Boost module does it (see boost.admin.inc boost_admin_htaccess_page() and boost_admin_generate_htaccess())

5. Why? Far future is far future, and this is simply the best practice. If we have unique file URLs, we might as well take advantage of them this way.
6. I'm not sure what you'd like to see different then? I've moved it to a separate .inc file, like Drupal 7's: cdn.mimetypes.inc.
7. Yes, it's possible to override the default that way.
8. You're absolutely right. I added this in the attached patch. A tricky thing about specificity is the priority of directory tree depth vs. file extensions. I gave the highest priority to directory specificity. Otherwise, it would potentially be possible for sites/*|.jpg|perpetual to override sites/default/*|.jpg|mtime.
9. Let's talk about that later. Memcache may be popular, but not on small (<1M pageviews/month) VPS-hosted sites such as the ones I own. E.g. http://driverpacks.net greatly benefits from the CDN module and this patch, but does not need nor use memcached.
10. Why wouldn't this work in that case? It works just fine. At least with the defaults. sites/*|mtime ensures the last modification time is checked, so it works just fine!
11. This needs further attention indeed.
12. This is basically point 4.
As explained before: "This will only be done for requests made by the CDN — if the file won't end up on the CDN, then it won't get the far future expiration headers (for scalability purposes: the file serving is done through PHP, and if that needs to happen for thousands of page loads, your server will get overloaded, but for the occasional request from the CDN, this is perfectly acceptable).". This is perfectly acceptable for many sites and it has the added benefit of requiring no further tweaking.
However, I obviously understand that you have different requirements and want to support them. I stand by what I said in point 4 though, IMO it should be done that way.
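The rule selection discussed in point 8 can be sketched as below. This is a Python illustration under stated assumptions, not the patch's PHP: the rule syntax (`directory|extensions|method`, with the extensions part optional) is taken from the examples in this thread, the `mtime` fallback is the default mentioned earlier, and measuring directory specificity by counting slashes is an assumption about how "directory tree depth" might be compared.

```python
import os
from fnmatch import fnmatch


def parse_rule(line):
    """Split 'directory|.ext .ext|method' (extensions are optional)."""
    parts = line.split("|")
    directory, method = parts[0], parts[-1]
    extensions = parts[1].split() if len(parts) == 3 else []
    return directory, extensions, method


def specificity(rule):
    directory, extensions, _ = rule
    # Directory depth has the highest priority; listed extensions break ties.
    return (directory.count("/"), 1 if extensions else 0)


def select_method(path, rule_lines, default="mtime"):
    rules = sorted(
        (parse_rule(line) for line in rule_lines), key=specificity, reverse=True
    )
    ext = os.path.splitext(path)[1].lower()
    for directory, extensions, method in rules:
        if fnmatch(path, directory) and (not extensions or ext in extensions):
            return method  # first (most specific) match wins
    return default


rules = [
    "sites/*|.jpg|perpetual",
    "sites/default/*|.jpg|mtime",
    "sites/*|.avi .flv .m4v .mov .mp4 .wmv|perpetual",
]
print(select_method("sites/default/files/a.jpg", rules))  # mtime
print(select_method("sites/all/themes/x/a.jpg", rules))   # perpetual
print(select_method("misc/drupal.js", rules))             # mtime (default)
```

With this ordering, sites/default/*|.jpg|mtime is tried before sites/*|.jpg|perpetual regardless of the order in which the rules were entered.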

#37: It's not clear to me why this is incompatible. We don't alter any menu. Do you understand, @bendodd?

Wim Leers’s picture

55.87 KB

Woops, that patch is relative to the previous patch. This one is relative to 6.x-2.x-dev. Much better.

bendodd’s picture

To clarify the aggregation issue and its lack of compatibility with far future CDN: given this file:

There are entries which are like this (having been rewritten to reference the root): background:url(/profiles/basic/modules/ctools/images/status-active.gif) which means the actual file is loaded from:

This will work, but the prefix has been removed and will not be invalidated when the reference is updated in the html.

bendodd’s picture

I very much like the idea of using Advanced Help, along with an alert, as in Views, to make people aware of its existence:
"If you enable the advanced help module, Views will provide more and better help. Hide this message."

Wim Leers’s picture

Well, to make that work, we'd either need to collaborate with advagg, or just incorporate most if not all of advagg… That implies: far broader scope and most importantly: boatloads of extra code. Which we need to maintain. Sigh.

So, in conclusion: all your points are addressed, with the following todo's remaining/resulting:
- use advanced_help
- incorporate extra code to deal with links in CSS files
- incorporate code to generate .htaccess files that can serve the files correctly *without* Drupal being involved at all

But: how to address the performance impact of some UFI's, such as mtime or md5? I think something like @mikeytown2 described in #33 is unavoidable.

mikeytown2’s picture

@Wim Leers
Let me know what you need from me. I will say that in the 2.x series of advagg, the counter should be an md5 of the files' md5s and never use mtime or a bundle change counter. That would make advagg's code simpler and would give the same filename across different multisites, thus allowing for one advagg folder for 100s of multisites.

Wim Leers’s picture

@mikeytown2: I haven't had the time yet to look into how advagg works exactly or *what* it does (its scope). (Lots of preparations for my Facebook internship still ongoing.) Do you think it makes sense to merge advagg + cdn, in part or in full? If not, how should we make sure they collaborate well?

mikeytown2’s picture

The two projects' scopes are far enough apart that merging doesn't make sense. Merging AdvAgg & BundleCache makes a lot more sense. AdvAgg is a replacement for core's CSS/JS aggregation that adds in a lot of hooks.

I'm using it with CDN perfectly fine.

naeluh’s picture


I am trying to use the Far Future setting for Origin Pull mode patch.
I have successfully patched the module, but I am having trouble with the URL rewriting: it writes a URL that looks like -


which is a 404.

Do I have to wait for CloudFront to create the new version before it appears? (See #17 and #18.)

I used the patch from #41 on CDN 6.x-2.1. Should I be using the dev version for this to work?

from #41 - Woops, that patch is relative to the previous patch. This one is relative to 6.x-2.x-dev. Much better.

Should I switch to dev using #41, or repatch CDN 6.x-2.1 using #40?

Also, was this patch supposed to create a folder to store the files?

Sorry if this seems rudimentary.



Wim Leers’s picture

#47: You can take over BundleCache if you want. I thought I was going to have a lot of time, but with my Facebook internship, that's now far from reality.

The thing is that I want to make the CDN module work as expected *without* other dependencies. What's your view on this?

mikeytown2’s picture

Things like HeadJS require AdvAgg; I really don't think CDN and AdvAgg belong in the same module. People are using it in their themes as well (#1282154: using advagg_css_extra_alter() in fusion_core template.php for IE's conditionals). The end goal of AdvAgg is to get the base module into core for D8. I'm still developing AdvAgg so I haven't even considered D7, let alone a D8 patch. Getting things working correctly and handling almost all the edge cases takes time. Long story short, the goal is to have advagg functionality built into D8 core so it can be utilized elsewhere. Unless you can come up with a compelling argument, I'm going to have to pass on the merge here.

In terms of "One module that does it all" see W3 Total Cache. Thing is, I'm actually taking the opposite approach of making each component stand on its own if possible. The HTTP Parallel Request Library's code is in a bunch of my modules in one form or another & I plan on simplifying my modules by making them require it. The 2.x branch of Imageinfo Cache was the first one of my modules that uses it. The 2.x branch of AdvAgg will use it as well.

I believe providing a single file download that includes all required modules is the solution to this problem; this would be solved at the d.o packaging level.

bendodd’s picture

I have 3 branches of CDN (branched from 6.x-2.x): https://github.com/bendodd/drupal-6-cdn

  1. Moved help copy to advanced_help to clean up UI: https://github.com/bendodd/drupal-6-cdn/tree/help
  2. Added HTTPS mapping option (as well as help): https://github.com/bendodd/drupal-6-cdn/tree/help_https
  3. Added small fix for ctools dependency - #dependency_count (Far Future mapping not hidden correctly when switching mode): https://github.com/bendodd/drupal-6-cdn/tree/far_future_ctools

They need some additional work, but I think they are all valid improvements. The only issue I'm having with moving help is the loss of dynamic content, e.g. lists of supported extensions or UFIs derived from hooks. For this I've moved all static help to advanced help; the dynamic help stays, but in a collapsible fieldset.


bendodd’s picture

I was interested in the time it takes for PHP to serve the assets as opposed to Apache. It seems from my basic tests that it takes significantly longer: 37.483 seconds vs 2.769 seconds

Server Software:        Apache
Server Hostname:        drupal6-origin.local
Server Port:            80

Document Path:          /cdn/farfuture/drupal-6.22/misc/jquery.js
Document Length:        31028 bytes

Concurrency Level:      10
Time taken for tests:   37.483 seconds
Complete requests:      15000

Server Software:        Apache
Server Hostname:        drupal6-origin.local
Server Port:            80

Document Path:          /misc/jquery.js
Document Length:        31028 bytes

Concurrency Level:      10
Time taken for tests:   2.769 seconds
Complete requests:      15000

I'm not sure what I'm suggesting, but I'm struggling to find a simple alternative way to apply the same expiry headers via Apache (we currently apply them in Varnish). http://stackoverflow.com/questions/7947906/add-expiry-headers-using-apac...

My concern is that under extreme load, if an update or a deployment is made, it could adversely affect the platform.
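
To put the two runs side by side, here is a small Python calculation using only the figures from the output above (15000 requests, 37.483 s and 2.769 s). The labels are mine; which layer actually served each path is as described in this comment, not something the script measures.

```python
# Totals taken from the two ab runs quoted above.
REQUESTS = 15000
RUNS = {
    "/cdn/farfuture/drupal-6.22/misc/jquery.js (via PHP)": 37.483,
    "/misc/jquery.js (static, via Apache)": 2.769,
}


def requests_per_second(total_seconds: float, requests: int = REQUESTS) -> float:
    """Throughput over the whole run."""
    return requests / total_seconds


def ms_per_request(total_seconds: float, requests: int = REQUESTS) -> float:
    """Mean wall-clock time per request across all concurrent requests."""
    return total_seconds / requests * 1000.0


for label, seconds in RUNS.items():
    print(f"{label}: {requests_per_second(seconds):.1f} req/s, "
          f"{ms_per_request(seconds):.3f} ms/request")

slowdown = 37.483 / 2.769
print(f"far-future path is about {slowdown:.1f}x slower")
```

That works out to roughly 13.5 times slower for the far-future path, although at 15000 requests the absolute cost per request is still small.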

mikeytown2’s picture

I've been thinking about this issue after seeing the benchmarks and I think I've come up with a better way of doing this. We use .htaccess rules. Let's say we have a picture sites/default/files/pic.jpg. We embed it in the HTML as sites/default/files/pic_{timestamp}.jpg and then use regular expressions to remove "_{timestamp}" at the Apache level so it then points to the file; thus we do not need PHP in order to get the file like the current patch requires. The timestamp is stored in the files table so it should be available to us fairly cheaply (already loaded); and keeping old versions of the same file is not needed at all, like it might be needed for CSS/JS.

htaccess rules to test this theory out. This assumes the timestamp is 10 digits in length and any file extension will be between 2 and 5 characters in length.

  RewriteCond %{REQUEST_URI}  ^(.*/sites/default/files/.+)_[0-9]{10}(\.[a-z0-9]{2,5})$
  RewriteRule .* %1%2 [L]

This takes care of the imagecache issue as well since anything matching "_1234567890.jpg" at the end of the requested file will be transformed to ".jpg" before it is passed to PHP.
/sites/default/files/pic_1234567890.jpg => /sites/default/files/pic.jpg
/sites/default/files/imagecache/small/pic_1234567890.jpg => /sites/default/files/imagecache/small/pic.jpg
Or if we wish to use MD5 instead of a timestamp

  RewriteCond %{REQUEST_URI}  ^(.*/sites/default/files/.+)_[0-9a-f]{32}(\.[a-z0-9]{2,5})$
  RewriteRule .* %1%2 [L]

/sites/default/files/pic_08d15a4aef553492d8971cdd5198f314.jpg => /sites/default/files/pic.jpg
/sites/default/files/imagecache/small/pic_08d15a4aef553492d8971cdd5198f314.jpg => /sites/default/files/imagecache/small/pic.jpg
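
The two patterns can be checked outside Apache. This is a Python sketch that reuses the same regular expressions with the `re` module; `strip_suffix` is a hypothetical helper name, and Python's regex engine is only assumed to agree with mod_rewrite for patterns this simple.

```python
import re

# Same expressions as the RewriteCond lines above.
TIMESTAMP_RULE = re.compile(
    r"^(.*/sites/default/files/.+)_[0-9]{10}(\.[a-z0-9]{2,5})$")
MD5_RULE = re.compile(
    r"^(.*/sites/default/files/.+)_[0-9a-f]{32}(\.[a-z0-9]{2,5})$")


def strip_suffix(request_uri: str) -> str:
    """Return the URI with a trailing _{timestamp} or _{md5} removed.

    Mirrors 'RewriteRule .* %1%2': the two captured groups are glued
    back together. URIs that match neither rule are returned unchanged.
    """
    for rule in (TIMESTAMP_RULE, MD5_RULE):
        match = rule.match(request_uri)
        if match:
            return match.group(1) + match.group(2)
    return request_uri


for uri in (
    "/sites/default/files/pic_1234567890.jpg",
    "/sites/default/files/imagecache/small/pic_1234567890.jpg",
    "/sites/default/files/pic_08d15a4aef553492d8971cdd5198f314.jpg",
    "/sites/default/files/pic.jpg",
):
    print(uri, "=>", strip_suffix(uri))
```

One caveat the sketch makes visible: a file whose real name happens to end in an underscore plus ten digits (say report_2011103100.pdf) would also be rewritten, so the naming scheme has to avoid that.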

bendodd’s picture

I have been playing with this too, and I'm not sure how concerned I am; These assets would be requested very infrequently and would be cached by the CDN. That is on top of the fact I had to make 15,000 requests to expose the issue.

We also use .htaccess and mod_rewrite (http://drupal.org/node/974350#comment-4474326) although this requires some configuration and a higher level of Apache knowledge. I think I agree with Wim, who wants to provide a plug-n-play version with the option of a .htaccess version that can be layered on top (similar to Boost's .htaccess generate tool).

Currently I'm having issues with applying expiry headers to mod_rewrite rules (http://stackoverflow.com/questions/7947906/add-expiry-headers-using-apac...) and also some issues with Image Cache.

Wim Leers’s picture

#41 is finally committed.

#50: ok. So I guess we should include a recommendation to install AdvAgg to prevent this problem. Not all sites are affected by this, but many obviously are, and it's necessary to make sure things "just work". Would you mind drafting a recommendation line to include that briefly yet completely describes why this is necessary?

1. Not yet committed. Rerolled version attached. There were several formatting issues in the help: you had renamed "File Conveyor mode" to "Origin Pull mode" in the UI, but had not reflected this in the help. This is also incorrect, since File Conveyor mode can also be used for Origin Pull mode. The section you were referring to in cdn_help_index_page() no longer existed. I've fixed all these problems, except for one: the lack of variable expansion in advanced_help. Hence you see "!extensions" at the Far Future Expiration help page. This is not acceptable. Surely this must be supported somehow. Please create a new issue for this.
2. Depends on 1, hence also not yet committed.
3. Committed — searched for the changes you made manually since you had just merged the patch in #41 with your changes in a single commit :P :)

#52/#53/#54: I agree with bendodd's assessment in #54: I want to make things plug 'n play, but happily welcome patches to improve performance in demanding set-ups. That includes .htaccess optimizations. Note that the suggestions in #53 will also not result in stellar performance, although better performance of course. Also note that .htaccess set-ups may be problematic (as demonstrated in #54), may not always work (if .htaccess support is disabled to improve Apache performance) and it only works with Apache, not with nginx/lighttpd. That's why I've committed #41 as is. But, again, I'd happily welcome patches that add .htaccess support.

Wim Leers’s picture

Status: Needs review » Needs work

I won't close this issue until I've incorporated mikeytown2's AdvAgg reference/recommendation.

mikeytown2’s picture

Go below

  # If your site is running in a VirtualDocumentRoot at http://example.com/,
  # uncomment the following line:
  # RewriteBase /

But above

  # Rewrite URLs of the form 'x' to the form 'index.php?q=x'.
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

The htaccess rules:

  <IfModule mod_headers.c>
    # Transform /cdn/farfuture/***/sites/default/files to /sites/default/files
    RewriteCond %{REQUEST_URI} ^/cdn/farfuture/([0-9a-zA-Z])*/(.+)$
    RewriteRule .* /%2 [L,E=FARFUTURE_CDN:1]

    # Set a far future Cache-Control header (480 weeks), which prevents
    # intermediate caches from transforming the data and allows any
    # intermediate cache to cache it, since it's marked as a public resource.
    Header set Cache-Control "max-age=290304000, no-transform, public" env=FARFUTURE_CDN
    Header set Cache-Control "max-age=290304000, no-transform, public" env=REDIRECT_FARFUTURE_CDN
    # Set a far future Expires header. The maximum UNIX timestamp is somewhere
    # in 2038. Set it to a date in 2037, just to be safe.
    Header set Expires "Tue, 20 Jan 2037 04:20:42 GMT" env=FARFUTURE_CDN
    Header set Expires "Tue, 20 Jan 2037 04:20:42 GMT" env=REDIRECT_FARFUTURE_CDN
    # Pretend the file was last modified a long time ago in the past, this will
    # prevent browsers that don't support Cache-Control nor Expires headers to
    # still request a new version too soon (these browsers calculate a
    # heuristic to determine when to request a new version, based on the last
    # time the resource has been modified).
    # Also see http://code.google.com/speed/page-speed/docs/caching.html.
    Header set Last-Modified "Wed, 20 Jan 1988 04:20:42 GMT" env=FARFUTURE_CDN
    Header set Last-Modified "Wed, 20 Jan 1988 04:20:42 GMT" env=REDIRECT_FARFUTURE_CDN
    # Do not use etags for cache validation.
    Header unset ETag env=FARFUTURE_CDN
    Header unset ETag env=REDIRECT_FARFUTURE_CDN
  </IfModule>

mikeytown2’s picture

This should be included as well: #1070938-17: Support hook_cdn_blacklist() and hook_cdn_blacklist_alter() to deal with same-origin policy more flexibly

AdvAgg Recommendation:
If you've ever had any issues with CSS or JS files not behaving as desired, check out AdvAgg. The "Advanced CSS/JS Aggregation" module solves all issues that arise from having CSS/JS served from a CDN. Keeping track of changes to CSS/JS files, smart aggregate names, 404 protection, on-demand generation, support for the private file system, Google CDN integration, CSS/JS compression, Gzip compression, caching, and smart bundling are some of the things AdvAgg does. It's also faster than core's file aggregation. Also, if you are using AdvAgg, there is the "Parallel CSS - AdvAgg Plugin" module. It can alter the url()'s in CSS files so they reference CDN domains.

How does this sound?

Wim Leers’s picture

#57: WOW!

#58: Sounds good! I'm going to bulletize it probably and am going to include a line along the lines of "The CDN module aims to do only one thing and do it well: altering URLs to point to files on CDNs. But in some cases, simply altering the URL is not enough, that's where the AdvAgg module comes in."

I can't wait to get working on the CDN module more actively again after my internship :)

bendodd’s picture

@Wim Are you waiting for me to do anything around the advanced_help work? Like learn how to use Git properly? Or learn how to create a patch properly?

@mikeytown2 I love it! Want the Stack Overflow points? http://stackoverflow.com/questions/7947906/add-expiry-headers-using-apac...

bendodd’s picture

There is a view (seemingly shared by Google) that expiry should not be set more than one year in the future:

"To mark a response as "never expires," an origin server sends an Expires date approximately one year from the time the response is sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future."

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html - 14.21 Expires

"Set Expires to a minimum of one month, and preferably up to one year, in the future. (We prefer Expires over Cache-Control: max-age because it is is more widely supported.) Do not set it to more than one year in the future, as that violates the RFC guidelines."


Do we agree?
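
For reference, the arithmetic behind the two options, as a Python sketch. The 480-week figure is the max-age from the rules in #57; the one-year header set and the `one_year_headers` name are only an illustration of what the RFC wording would allow, not something the module does.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

SECONDS_PER_WEEK = 7 * 24 * 60 * 60
FAR_FUTURE_MAX_AGE = 480 * SECONDS_PER_WEEK  # value used in the rules in #57
ONE_YEAR_MAX_AGE = 365 * 24 * 60 * 60        # upper bound suggested by RFC 2616


def one_year_headers(now: datetime) -> dict:
    """Hypothetical header set that stays within one year of 'now' (UTC)."""
    expires = now + timedelta(seconds=ONE_YEAR_MAX_AGE)
    return {
        "Cache-Control": f"max-age={ONE_YEAR_MAX_AGE}, no-transform, public",
        # usegmt=True emits the 'GMT' suffix that HTTP dates require.
        "Expires": format_datetime(expires, usegmt=True),
    }


now = datetime(2011, 11, 1, tzinfo=timezone.utc)  # arbitrary example date
print(FAR_FUTURE_MAX_AGE, ONE_YEAR_MAX_AGE)
print(one_year_headers(now))
```

So 480 weeks is a bit over nine years, well past the one-year recommendation, and a fixed Expires date in 2037 is further out still.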

rjbrown99’s picture

I'm not sure I fully understand the tie-in between this issue and AdvAgg. I do understand the version/filename issues for things like images, but CSS+JS isn't making sense. Specifically, here's my understanding of how Advagg works.

1) Enable AdvAgg for CSS+JS, let's say we use MD5 and 4 CSS and 4 JS per bundle
2) We enable CDN module, using CloudFront in origin pull mode
3) CSS+JS URLs are automatically rewritten and served from the CDN
4) Profit!

If a CSS or JS script changes, its MD5 would change and the bundles would either update or increment by a number, so we're not serving old assets as long as any page caches are expired. The only real issue here is to make sure to clean out your page cache so static content isn't served with old bundles. Since the bundles changed, the new ones get served up and the old ones just time out in the CDN cache.

I did have an issue with imagefield/filefield/imagecache in this regard, and my answer was to use the filefield_paths module. I created a custom token based upon the md5 of the file and the current time, and I rename the file if it changes (image is rotated, cropped, or replaced). It winds up looking like originalfilename_07374f06cd28e1cfc37bc1fdd4df17a9.jpg, and this takes care of any upstream issues with the CDN.

Sorry to ramble, but can anyone point out the issue with the current advagg+cdn module that this fixes? Thanks!

rjbrown99’s picture

For image handling relating to the CDN, my token is simply this, in case anyone is interested:

function mymodule_token_values($type, $object = NULL) {
  $values = array();
  switch ($type) {
    case 'node':
      // MD5 of the current time at the moment the token is evaluated.
      $values['mycustom-timestamp'] = md5(time());
      break;
  }
  return $values;
}

function mymodule_token_list($type = 'all') {
  $tokens = array();
  if ($type == 'node' || $type == 'all') {
    $tokens['mymodule']['mycustom-timestamp'] = t('MD5 hash of Current timestamp.');
  }
  return $tokens;
}

...and my filefield_paths filename is set to:

mikeytown2’s picture

Reason why AdvAgg is needed: In core, after flushing caches 20 times you run out of unique file names for CSS/JS files and farfuture files out there will not get replaced. AdvAgg gets around this issue by using a counter instead of repeating the same thing after 20 times. In core, clearing the CSS/JS cache removes all CSS/JS files thus 404's for things like CSS and JS files can happen. AdvAgg gets around this by not removing older files right away and allowing CSS/JS files to be generated on demand preventing a 404 from occurring and ensuring that the website displays correctly.

drupal_get_css() and drupal_get_js() both have the same logic when it comes to file naming.

  // A dummy query-string is added to filenames, to gain control over
  // browser-caching. The string changes on every update or full cache
  // flush, forcing browsers to load a new copy of the files, as the
  // URL changed. Files that should not be cached (see drupal_add_js())
  // get time() as query-string instead, to enforce reload on every
  // page request.
  $query_string = '?' . substr(variable_get('css_js_query_string', '0'), 0, 1);
    // Prefix filename to prevent blocking by firewalls which reject files
    // starting with "ad*".
    $filename = 'js_' . md5(serialize($files) . $query_string) . '.js';

css_js_query_string is built like so:

/**
 * Helper function to change query-strings on css/js files.
 *
 * Changes the character added to all css/js files as dummy query-string,
 * so that all browsers are forced to reload fresh files. We keep
 * 20 characters history (FIFO) to avoid repeats, but only the first
 * (newest) character is actually used on urls, to keep them short.
 * This is also called from update.php.
 */
function _drupal_flush_css_js() {
  $string_history = variable_get('css_js_query_string', '00000000000000000000');
  $new_character = $string_history[0];
  // Not including 'q' to allow certain JavaScripts to re-use query string.
  $characters = 'abcdefghijklmnoprstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
  while (strpos($string_history, $new_character) !== FALSE) {
    $new_character = $characters[mt_rand(0, strlen($characters) - 1)];
  }
  variable_set('css_js_query_string', $new_character . substr($string_history, 0, 19));
}

I've answered that question, thanks for pointing it out!
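
To make the reuse problem concrete, here is a Python simulation of the rotation logic in _drupal_flush_css_js(). It is a re-implementation for illustration only (the `flush` and `simulate` names and the fixed seed are mine), and it ignores everything else a cache flush does.

```python
import random

# Same alphabet as core: no 'q'.
CHARACTERS = "abcdefghijklmnoprstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def flush(history: str, rng: random.Random) -> str:
    """One cache flush: pick a character not in the 20-character history."""
    new_character = history[0]
    while new_character in history:
        new_character = rng.choice(CHARACTERS)
    return new_character + history[:19]


def simulate(flushes: int, seed: int = 0) -> list:
    """Return the character actually used in URLs after each flush."""
    rng = random.Random(seed)
    history = "0" * 20
    used = []
    for _ in range(flushes):
        history = flush(history, rng)
        used.append(history[0])
    return used


used = simulate(200)
print(len(CHARACTERS), "possible characters,", len(set(used)), "seen in 200 flushes")
```

A character never repeats within the previous 20 flushes, but it can come back from the 21st flush onward, and with only 61 characters a repeat is unavoidable over a long enough run. Once a character comes back, a far-futured aggregate filename can come back with it.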

rjbrown99’s picture

Thanks, I do get the idea of why advagg is needed - wouldn't live without it :) My broader question was why would you need any of the far future or file handling at all if you are using advagg?

IE, assuming someone is using advagg today why would #57 be needed?

mikeytown2’s picture

For files that are not named uniquely. Not everyone has set up something similar to a "[mycustom-timestamp]" token; thus a file like pic.jpg that gets deleted and then replaced by an upload with the same name will cause issues if using far future cache. The #41 patch (what has been committed) addresses this by prefixing the file with "/cdn/farfuture/*MD5*". This makes files served via the /cdn/ path require a full bootstrap now if they are not on the CDN. #57 addresses this issue by removing the far future prefix and adding in the correct headers for browser caching at the Apache level, thus making the /cdn/ path fairly quick again. If you're not using Apache rules, the performance will be similar to the private file system, as PHP transfers the file.

In your case, you don't need anything from this patch; unless you change your theme's image files as these will be outside of your filename token trick.
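
The prefix-stripping step from #57 can be tried out in Python with the same expression. This is only a sketch: `strip_prefix` and `BROADER_RULE` are hypothetical, the named group is added for readability, and the benchmark path earlier in the thread is used as a test input.

```python
import re
from typing import Optional

# Pattern from the RewriteCond in #57 (path group named for clarity).
RULE_FROM_57 = re.compile(r"^/cdn/farfuture/([0-9a-zA-Z])*/(?P<path>.+)$")

# Hypothetical variant: accept any identifier without a slash in it.
BROADER_RULE = re.compile(r"^/cdn/farfuture/[^/]+/(?P<path>.+)$")


def strip_prefix(request_uri: str, rule=RULE_FROM_57) -> Optional[str]:
    """Return the real file path, or None if the rule does not apply."""
    match = rule.match(request_uri)
    if match is None:
        return None
    return "/" + match.group("path")


md5_uri = "/cdn/farfuture/08d15a4aef553492d8971cdd5198f314/sites/default/files/pic.jpg"
core_uri = "/cdn/farfuture/drupal-6.22/misc/jquery.js"

print(strip_prefix(md5_uri))
print(strip_prefix(core_uri))
print(strip_prefix(core_uri, BROADER_RULE))
```

With the pattern as posted, an alphanumeric identifier such as an MD5 is stripped, but an identifier containing "-" or "." (like drupal-6.22 in the benchmark) does not match, so such a request would still fall through to PHP. Whether the broader character class is safe for every UFI type is something I have not checked.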

rjbrown99’s picture

That makes a lot of sense, thank you. When I originally read the thread, it seemed like the new htaccess rules had more of a relation to advagg - IE, if you were using advagg those rules would be beneficial from a caching standpoint.

You have helped quite a bit by clarifying that advagg is one of the potential solutions to the caching problem. So if I could sum up a few of the recommendations:

CSS+JS Caching and the CDN
1) If you are using stock Drupal CSS+JS aggregation, #57 and the .htaccess rules are going to be really important to you.
2) If you are using AdvAgg for CSS+JS, #57 really has no impact either way for CSS+JS.

Other static content and the CDN
1) For pretty much anything to do with images or other static content, #57 is going to help you (unless you solve the problem in some other way, similar to my filefield approach.)

As always thanks for the feedback, and hopefully this summary will help others assuming I have it right.

Wim Leers’s picture

#60: I'm waiting for fixes for the problems I mentioned in my comment.

#62: AdvAgg is necessary if you want to make sure that images referenced in CSS files are also versioned. I.e. right now, if a CSS file remains the same but the image changes, the URL of the image won't change and thus the old image will still be used.

mikeytown2’s picture

@Wim Leers
AdvAgg does not do exactly what you're hoping it does; at least not yet. I can create a sub module that will do it. There is support for it built in, just it hasn't been taken advantage of yet (in AdvAgg core).
Desired work flow md5/mtime:
css aggregate file css_abcd1234_0.css references /image.png
image.png is far-futured so url is now /cdn/farfuture/12345/image.png
image.png is changed thus url is now /cdn/farfuture/123456/image.png
css aggregate file is now css_abcd1234_1.css due to image.png changing.

Desired work flow version number:
css aggregate file css_abcd1234_0.css references /image.png
image.png is far-futured so url is now /cdn/farfuture/1/image.png
version number changed thus url is now /cdn/farfuture/2/image.png
css aggregate file is now css_abcd1234_1.css due to the version number changing.

What I need:
A way to check that the css url() reference will be a far-future CDN url & what type it will be (version number, core, md5, mtime).
API to see if that file has changed according to the rules; I will store the previous state, so doing a string comparison of the url is all I need most likely.
A way to have CDN output relative URLs instead of absolute URLs (nice to have, also help with string comparison). I just might strip off the hostname and call it good for string comparison.
API call to make sure the url for image.png is changed when it gets processed in the CDN module (if using caching) (md5 or mtime mode).
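
A minimal Python sketch of the desired workflow, assuming the CDN module could hand back the far-future URL for a referenced file. `farfuture_url`, the `Bundle` class and the way UFIs are passed in are all hypothetical; the point is only the "store previous URL, compare strings, bump counter" logic.

```python
def farfuture_url(path: str, ufi: str) -> str:
    """Hypothetical stand-in for the URL the CDN module would generate."""
    return f"/cdn/farfuture/{ufi}/{path.lstrip('/')}"


class Bundle:
    """A CSS aggregate that remembers the URLs of the files it references."""

    def __init__(self, base_name: str, references: dict):
        self.base_name = base_name
        self.counter = 0
        # path -> far-future URL seen when the bundle was last built
        self.references = dict(references)

    @property
    def filename(self) -> str:
        return f"{self.base_name}_{self.counter}.css"

    def refresh(self, current_ufis: dict) -> bool:
        """Bump the counter once if any referenced URL has changed."""
        changed = False
        for path in self.references:
            new_url = farfuture_url(path, current_ufis[path])
            if new_url != self.references[path]:  # plain string comparison
                self.references[path] = new_url
                changed = True
        if changed:
            self.counter += 1
        return changed


bundle = Bundle("css_abcd1234", {"/image.png": farfuture_url("/image.png", "12345")})
print(bundle.filename, bundle.references["/image.png"])
bundle.refresh({"/image.png": "123456"})  # image.png changed, so its UFI changed
print(bundle.filename, bundle.references["/image.png"])
```

The same logic covers the version-number workflow: pass "1" and then "2" as the UFI and the aggregate name moves from _0 to _1 in the same way.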

bendodd’s picture

I'm having trouble applying the patch in #55 to the current 6.x-2.x branch. Could you re-roll?

Also, should we move the help and https requests to their own ticket?

philsward’s picture

I don't know if I should lump this issue in here, or create a new ticket...

When enabling "Far Future", I get the following error as an anonymous user, viewing a page with an image captcha:

warning: filemtime() [function.filemtime]: stat failed for image_captcha/24659/1326877486 in /home/user/public_html/sites/all/modules/cdn/cdn.basic.farfuture.inc on line 232.

This appeared with the original rollout for far future in the dev released in Dec (I think...?) and I was hoping the dev update (around the 16th of jan) would solve it, but alas, it's still there.

I'm also wondering if it's related to the image_captcha issue?

I'm using captcha v2.4

Wim Leers’s picture

Status: Needs work » Patch (to be ported)

#36 & #57 follow-up: #1413156: .htaccess rules for Far Future expiration: make it possible to use the Far Future feature directly in Apache, avoiding PHP is a follow-up issue to this one for a .htaccess generator.

#56: AdvAgg reference/recommendation included in the README (http://drupalcode.org/project/cdn.git/commit/c951155) and on the project page. Related to this: request to merge the parallel_css module into the CDN module: #1410318: Merge into CDN module?.

#69: Doesn't the parallel_css module handle this? Anyway, let's continue in #1413176: Provide API so other modules can hook in/depend on the CDN module to discuss what you need exactly. (I want to give you what you need!)

#70: rerolling at #1413162: Advanced Help support

#71: #970632: "image_captcha" path in blacklist does not work was fixed a few days ago (the day before you made this comment, actually). If that doesn't help, please create a new issue.

I don't want to see anything new in this issue. This issue is now only open to do the Drupal 7 port of this patch. Remaining stuff has been split off into other issues. Bugs about Far Future expiration should also go into new issues.

Wim Leers’s picture

I just committed a follow-up patch to the D6 branch: http://drupalcode.org/project/cdn.git/commit/7b38b56. This ensures the ETag header is also stripped if it's set by default. It replaces the use of split() with explode(), because the former is deprecated in PHP 5.3. And it does a minor variable name cleanup.

Wim Leers’s picture

Version: 6.x-2.1 » 7.x-2.x-dev
Status: Patch (to be ported) » Fixed

Better yet, I just committed the D7 port of this patch! :) That means there is now full feature parity (in no small part thanks to the boatload of time I invested in this module during the past week).

I'm also happy to report that the port of this functionality was sponsored by a company for the first time in the history of this module: it was sponsored by ONE Agency, http://www.one-agency.be!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.