We need to take some steps before we can integrate a CDN with Drupal.org:

  • Choose a CDN provider: EdgeCast
  • Pick a site or assets to push to CDN: site status being tracked @ http://goo.gl/vQ1ESZ
  • Test, make any necessary configuration updates to Nginx/Apache/PHP/Drupal/etc.: testing complete for subsites
  • Plan statistics gathering / stats counter for updates.drupal.org: updates.d.o will be configured to always pass to origin

Comments

tvn’s picture

Issue tags: -prague2013 +security2013

err

killes@www.drop.org’s picture

We plan to use a CDN for drupal.org and may want to use it for updates.drupal.org.

We need to investigate if we can still preserve the usage stats counter.

basic’s picture

Title: Integrate CDN with updates.drupal.org » [Meta] Integrate CDN with *.drupal.org

Update title and summary

mgifford’s picture

So what options are being considered? I'm assuming we'd be using the CDN module.

I've experimented lately with Amazon S3 integration. It would be useful to know the direction that d.o goes for this.

Are you looking at putting theme & user images, as well as cached CSS/JS there? Lots of options.

basic’s picture

We are primarily concerned with the network-level security that a CDN will provide Drupal.org. The CDN will allow us to restrict access to our origin servers and disallow direct connections to origin web nodes (which are currently possible). The two big advantages are:

  1. Accelerate cacheable content (static assets, static pages, etc.)
  2. Let us easily manage network access and put a very large network in front of ours to absorb some levels of attack

Some examples of how the CDN will help Drupal.org:

  • About 4 months ago we had an issue with a JS file on drupal.org. It was having routing issues to Europe, and people were complaining about drupal.org stalling during page load. There was basically nothing we could do but wait for the route to get better. That should never be a problem again.
  • We constantly get reports of updates.drupal.org being blacklisted because a ton of traffic is coming from/going to a few IPs. That should also stop happening, because that traffic will be distributed across the CDN's network.
  • 8 months ago we were under a consistent attack from Chinese IPs that was sub-HTTP and was taking up bandwidth. We can now block that at the CDN, which will barely notice that level of traffic. We will have a huge network in front of us that can 'take the beating'.
basic’s picture

CDN Deployment 4/18/2014

  • localize.drupal.org

CDN Deployments 4/16/2014

  • groups.drupal.org
  • portland2013.drupal.org
  • prague2013.drupal.org

CDN Deployments 4/15/2014

  • www.drupalcon.org
  • boston2008.drupalcon.org
  • chicago2011.drupal.org
  • cph2010.drupal.org
  • dc2009.drupalcon.org
  • denver2012.drupal.org
  • london2011.drupal.org
  • munich2012.drupal.org
  • paris2009.drupalcon.org
  • saopaulo2012.drupal.org
  • sf2010.drupal.org
  • sydney2013.drupal.org
  • szeged2008.drupalcon.org
  • barcelona2007.drupalcon.org

CDN Deployments 4/14/2014

  • api.drupal.org
  • infrastructure.drupal.org
basic’s picture

CDN Deployments 4/24/2014

  • association.drupal.org
  • security.drupal.org

CDN Deployments 4/21/2014

  • staging.devdrupal.org
basic’s picture

CDN Deployments 4/28/2014

  • austin2014.drupal.org
  • amsterdam2014.drupal.org

amsterdam2014.drupal.org had a 24h TTL on the DNS record, so it may not be "live" until 4/29/14 for some people.

basic’s picture

CDN Deployments 4/30/2014

  • qa.drupal.org

Added an /etc/hosts entry to the testbot Puppet configuration to route testbot traffic directly to the load balancers (bypassing the CDN) in case of issues.
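
For readers unfamiliar with this kind of override: a hosts entry is a single line of the form "IP hostname [aliases...]", and it takes precedence over DNS on that machine. Below is a minimal sketch of generating such a line. The address 192.0.2.10 comes from the reserved documentation range and the hostname is a placeholder; neither is the real load balancer address or the entry actually deployed.

```python
import ipaddress


def hosts_entry(ip: str, *hostnames: str) -> str:
    """Build one /etc/hosts line that pins the given hostnames to an IP."""
    # Raises ValueError if the address is malformed.
    ipaddress.ip_address(ip)
    if not hostnames:
        raise ValueError("at least one hostname is required")
    return f"{ip}\t{' '.join(hostnames)}"


# Hypothetical values: 192.0.2.10 stands in for a load balancer address.
print(hosts_entry("192.0.2.10", "qa.drupal.org"))
```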

The remaining sites are drupal.org and updates.drupal.org.

basic’s picture

CDN Deployments 5/7/2014

  • updates.drupal.org
  • Live @ May 7 15:50:00 UTC 2014

basic’s picture

www.drupal.org rename completed, scheduling www.drupal.org CDN switch for 6/25 @ 1PM PDT, 20:00 UTC

YesCT’s picture

The switch from drupal.org to www.drupal.org broke dreditor... https://github.com/dreditor/dreditor/pull/207

nnewton’s picture

While unfortunate, we cannot control third party tools and this should be a very easy fix.

basic’s picture

updates.drupal.org traffic is now being cached by the CDN. http://ow.ly/i/60idL shows a drastic decrease in origin traffic which should result in decreased network saturation during git clones and traffic spikes.

webchick’s picture

Wow, impressive. :)

basic’s picture

We have decided to push the CDN deployment for Drupal.org to Wednesday 7/2/14 @ 1PM PDT, 20:00 UTC to give more notice on Twitter prior to the switch.

dstol’s picture

Just curious about why the switch from drupal.org to www.drupal.org?

YesCT’s picture

The switch from drupal.org to www.drupal.org also affected simplytest.me: #2289145: drupal.org is now www.drupal.org - patch file URLs do not work

nnewton’s picture

While I appreciate you keeping track of this, there is not much we can do about third party services. We have no integration with these teams. They don't talk to us before setting up these services, and we sometimes don't even know they are scraping us until we find them in the logs. Historically, many of these services have actually caused us significant problems when their scraping goes awry. It would be nice to have a better way to communicate with them and some policies for 'third party integrators', but we currently don't and don't really have the admins to deal with it if we did.

We are obviously aware that third party services may break when we change something, but we can't just not change things. Thus, there isn't much actionable about knowing a change broke a third party service.

markhalliwell’s picture

Agreed. 3rd party tools are just that, 3rd party. They have no [real] bearing or influence on these types of changes and have to roll with the punches. FWIW, I am personally very happy to see so much progress, regardless of the extraneous consequences.

"It would be nice to have a better way to communicate with them and some policies for 'third party integrators', but we currently don't and don't really have the admins to deal with it if we did."

I know this isn't the "policy", but having #1710850: Deploy RestWS for D7 project issue JSON would certainly help remove/standardize a lot of the "scraping" that we have to now do manually.

nnewton’s picture

That really does seem like it could be a good improvement.

basic’s picture

The CDN deployment on www.drupal.org is now live. DNS resolvers that respect the TTL for www.drupal.org should now be pointing at the CDN.

www.drupal.org is an alias for cs73.wac.edgecastcdn.net.
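
The line above has the format that the `host` utility prints for a CNAME record. As a rough sketch, assuming that output format, a check that a name points at the CDN could look like this. The function names and the `.edgecastcdn.net` suffix test are illustrative choices, not part of the actual deployment tooling.

```python
import re

# Matches lines such as "name is an alias for target." printed by `host`.
ALIAS_LINE = re.compile(r"^(\S+) is an alias for (\S+)$")


def alias_target(host_output: str):
    """Return the CNAME target from a `host`-style alias line, or None."""
    match = ALIAS_LINE.match(host_output.strip())
    if match is None:
        return None
    # Drop the trailing dot of the fully qualified name.
    return match.group(2).rstrip(".")


def points_at_cdn(host_output: str, cdn_suffix: str = ".edgecastcdn.net") -> bool:
    """True if the alias target ends with the (assumed) CDN domain suffix."""
    target = alias_target(host_output)
    return target is not None and target.lower().endswith(cdn_suffix)


line = "www.drupal.org is an alias for cs73.wac.edgecastcdn.net."
print(alias_target(line))   # cs73.wac.edgecastcdn.net
print(points_at_cdn(line))  # True
```
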
basic’s picture

Status: Active » Fixed
webchick’s picture

Any purdy graphs to show the impact this had? :)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

basic’s picture

@webchick https://assoc.drupal.org/blog/basic/why-we-moved-drupal.org-cdn has the origin server bandwidth graph

There was also some interesting data from webpagetest.org and general "it's faster over here now" responses:

From France / webpagetest.org (drupal.org/home):
Pre-CDN results: first page load=4.387s, repeat view=2.155s
Post-CDN results: first page load=3.779s, repeat view=1.285s
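
To put those numbers in relative terms, here is a small calculation of the reduction in load time as a percentage of the pre-CDN time. It uses only the four figures quoted above; the function name is just for illustration.

```python
def percent_faster(before_s: float, after_s: float) -> float:
    """Reduction in load time as a percentage of the original time."""
    return (before_s - after_s) / before_s * 100


first_view = percent_faster(4.387, 3.779)
repeat_view = percent_faster(2.155, 1.285)

print(f"first page load: {first_view:.1f}% faster")  # first page load: 13.9% faster
print(f"repeat view: {repeat_view:.1f}% faster")     # repeat view: 40.4% faster
```

So the single France test run shows roughly a 14% improvement on first load and about 40% on repeat view.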

Perignon’s picture

Is there any place where the actual details of how D.O was put onto a CDN are documented? I'm a co-maintainer of the EdgeCast CDN module and also use the standard CDN module, which handles just parts and pieces of the website. I am wondering how the entire Drupal installation got put behind an origin pull CDN and how stuff like form posts work.

Vacilando’s picture

I've got the same question as Perignon: is there a document or page with a technical explanation of putting the whole of d.o on a CDN? The CDN module does bits and pieces but not pages as such.

nnewton’s picture

At some point we may do a write-up, but there isn't anything unique about it. Many large Drupal sites are behind CDNs in a similar fashion. Basically, this method requires that the CDN have a rules engine, and it then works very similarly to how Varnish works with cookie/session analysis.
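
As a rough illustration of what "cookie/session analysis" can mean in practice, here is a sketch of the kind of decision such a rules engine makes. It is not the actual drupal.org rule set. It assumes Drupal's convention of session cookie names starting with SESS (or SSESS over HTTPS), and the function names are hypothetical.

```python
# Assumption: Drupal session cookies are named SESS<hash> or SSESS<hash>.
SESSION_PREFIXES = ("SESS", "SSESS")
CACHEABLE_METHODS = {"GET", "HEAD"}


def cookie_names(cookie_header: str) -> list:
    """Extract cookie names from a raw Cookie request header."""
    names = []
    for part in cookie_header.split(";"):
        name = part.split("=", 1)[0].strip()
        if name:
            names.append(name)
    return names


def should_pass_to_origin(method: str, cookie_header: str = "") -> bool:
    """Decide whether a request must skip the cache and go to origin."""
    # Form posts and other non-idempotent requests are never served from cache.
    if method.upper() not in CACHEABLE_METHODS:
        return True
    # Requests carrying a session cookie belong to a logged-in user.
    return any(name.startswith(SESSION_PREFIXES) for name in cookie_names(cookie_header))


print(should_pass_to_origin("GET", "has_js=1"))               # False
print(should_pass_to_origin("GET", "has_js=1; SESSabc=xyz"))  # True
print(should_pass_to_origin("POST"))                          # True
```

Under these assumptions, anonymous page views can be served from the edge, while form posts and logged-in traffic always reach the origin servers.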

If there is significant interest in this, I can write up something at some point. I've worked on several sites with different CDNs doing something similar.

-N

nnewton’s picture

Actually, what might be useful is something explaining the differences between the modules you discuss and fronting with a pull-based CDN... hmm. I will probably write something up :). Hopefully it is useful to someone at some point.

@Perignon, have you had issues with purges going through on EdgeCast with the module you co-maintain? In a previous deployment when I used purging extensively, it was a pretty shaky system. That was years ago though, so perhaps it has gotten better. I have not used purging extensively recently.

-N

Vacilando’s picture

Great, looking forward to that, @nnewton!

Vacilando’s picture

@nnewton, re #34: still looking forward to at least a cursory write-up of the approach taken to implement origin pull for the whole of www.drupal.org (forms, expiration, ...). What's been done for our community website might also be very interesting for the community members. Thanks!