DJANGO DEVOPS

Deploying your Django static files to AWS (Part 2)

Request/responses for fetching static files through a CDN

02/01/2019


This is part 2 of a 3-part series of posts on how to properly deploy your static files to production in a Django app. In the first part, I explained the principles of the Django staticfiles app and how the various settings affect the way static files are served. We ended up with a simple one server setup, where static files are served by nginx and stored locally on the EC2 instance's root EBS volume. In this part, we'll add a Content Delivery Network (CDN) to speed up delivery of static files.

 10 min read

Part 2: What's a CDN and why do we need it?

After part 1, our static files are served by our nginx web server directly from our local file system. This isn't optimal for two reasons:

  • Every time someone visits our website for the first time, all images, css files and javascript files need to be downloaded from our site. Even though nginx is pretty good at this, we're wasting valuable bandwidth and processing power that could be better used to just run our Django app.
  • Also, all these files have to travel the full route from our EC2 instance to the user, which might be unnecessarily long and laggy. Caching our static files somewhere closer to the user would be much more efficient.

That's what a Content Delivery Network (CDN) is for. Instead of telling our users to download the static files directly from our EC2 instance, we point them to the CDN. If the CDN has seen this file before, it will be cached at edge locations spread around the world. The user's request then only needs a few network hops to reach the resource, which should be much faster than fetching it from our EC2 instance.

Creating a CloudFront Distribution

  1. Go to CloudFront and click on Create Distribution.
  2. Select a Web distribution, click on Get Started.
  3. For Origin Domain Name, type the domain name of your web server (www.django_website.com).
  4. Leave Origin Path empty as we're going to append the /static/ path directly to the domain.
  5. Choose an Origin ID, e.g. static-production.django_website.com, that will uniquely identify this origin.
  6. Leave Custom Headers blank.
  7. For the default behavior, you want to restrict access; we'll set up a separate behavior for just /static. The reason is that we don't want users to be able to access the actual pages of your website via the CDN, which would be the case if you didn't restrict default access. So, leave all the values as they are and just change Restrict Viewer Access to Yes.
  8. Finally, you can leave the remaining settings as they are, though I would advise setting Logging to On (you'll need to specify an S3 bucket for this).
  9. Click on Create Distribution.
  10. Now, select your new distribution and go to the tab Behaviors.
  11. Click on Create Behavior.
  12. As Path Pattern, enter "static/*".
  13. Select the origin you created for the distribution; there should be just one.
  14. Select HTTP and HTTPS or HTTPS only depending on whether your nginx server is set up for HTTPS. I would advise using HTTPS only.
  15. For Cache Based on Selected Request Headers, choose Whitelist. This opens the option to whitelist the following headers: Origin, Access-Control-Allow-Headers, Access-Control-Allow-Methods, Access-Control-Allow-Origin and Access-Control-Max-Age. You need to type these headers yourself and add them using Add Custom >>.
    CloudFront origin settings
    CloudFront origin settings for HTTP headers
  16. For Object Caching, select Use Origin Cache Headers. This tells CloudFront to just obey the headers set by our server.
  17. This time, make sure Restrict Viewer Access is No (default) and click Save.

Your behaviors tab should look similar to the screenshot below. Make sure the static/* pattern comes before the Default (*) one.
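To see why that ordering matters, here is a rough Python sketch of how behaviors are evaluated top to bottom (illustrative only — this is not CloudFront's actual matching code):

```python
from fnmatch import fnmatch

def match_behavior(request_path, patterns):
    """Return the first path pattern that matches the request,
    the way CloudFront walks its behaviors from top to bottom."""
    path = request_path.lstrip("/")  # CloudFront path patterns have no leading slash here
    for pattern in patterns:
        if pattern == "Default (*)" or fnmatch(path, pattern):
            return pattern
    return None

# With static/* listed before the default behavior, static requests hit
# the unrestricted behavior and everything else stays restricted:
print(match_behavior("/static/js/app.49ec9402.js", ["static/*", "Default (*)"]))  # static/*
print(match_behavior("/accounts/login/", ["static/*", "Default (*)"]))            # Default (*)
```

If the order were reversed, every request would match the default behavior first and the static behavior would never apply.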

Two behaviors for your CloudFront distribution

Now you can test your distribution: Open a page of your site and look up a static resource loaded on that page. Let's say your page is using the file /static/js/app.49ec9402.js. If you go back to the General tab of your CloudFront distribution, you'll see the Domain Name: e.g. d2sj0abvvbje4e.cloudfront.net. Combine this domain with the path to your static file to form the full URL https://d2sj0abvvbje4e.cloudfront.net/static/js/app.49ec9402.js and insert this in your browser's address bar. It should display the plain javascript file if everything went well.
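If you prefer scripting that check, here's a minimal sketch (using the example domain above — substitute your own distribution's Domain Name):

```python
from urllib.parse import urljoin

CDN_DOMAIN = "d2sj0abvvbje4e.cloudfront.net"  # your distribution's Domain Name

def cdn_url(static_path):
    """Build the full CloudFront URL for a static resource path."""
    return urljoin(f"https://{CDN_DOMAIN}", static_path)

print(cdn_url("/static/js/app.49ec9402.js"))
# https://d2sj0abvvbje4e.cloudfront.net/static/js/app.49ec9402.js

# To actually fetch it and see CloudFront's cache status (needs network access):
# from urllib.request import urlopen
# with urlopen(cdn_url("/static/js/app.49ec9402.js")) as resp:
#     print(resp.status, resp.headers.get("X-Cache"))
```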

What happened when you requested the static file?

This was the first time CloudFront was asked to fetch the resource /static/js/app.49ec9402.js. So it looked in its behaviors and saw that you created a behavior for static/*.

That behavior is attached to the origin with ID static-production.django_website.com, which points to the domain name www.django_website.com with no path component. So CloudFront went ahead and requested the resource /static/js/app.49ec9402.js from host www.django_website.com over HTTPS. The nginx server on your EC2 instance got the request, saw that it starts with /static and therefore served the file js/app.49ec9402.js directly from the /static directory in your project folder, adding an expiry header of 1 year.

CloudFront then received the response from nginx, kept the headers set by the origin (your nginx server) and served it back to your browser. It also cached a copy at the edge location that handled your request; other edge locations will cache it as they receive requests for it.

If you now open another browser and fetch the same file, CloudFront will return the file directly from the closest edge location to you, since it knows the file is still valid for 1 year. Your nginx server never sees that second or any subsequent request for that file.

Tell Django about the CDN

Ok, we can fetch our files from CloudFront, but my webpage is still showing things like

<img src="/static/images/homepage/logo.839d9a933.png">

so how do users get to use my CDN setup? Remember the STATIC_URL setting? Well, that's the one that tells Django what to put in your templates when it sees a {% static ... %} tag. Go into your settings/production.py settings file (or whatever it's called in your project) and modify:

STATIC_URL = "https://d2sj0abvvbje4e.cloudfront.net/static/"  # or http

# In my case I keep the host in an environment variable, so I have
STATIC_HOST = get_env_variable('CDN_static')
STATIC_URL = STATIC_HOST + "/static/"

That's all it takes to make Django embed the correct URLs into your webpages.
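Conceptually, the {% static %} tag now just prefixes the (hashed) relative path with STATIC_URL. A simplified sketch — the manifest dict below stands in for Django's staticfiles.json, and the file names are hypothetical:

```python
STATIC_URL = "https://d2sj0abvvbje4e.cloudfront.net/static/"

# Stand-in for the staticfiles.json manifest mapping original names
# to their MD5-hashed versions (hypothetical entries):
manifest = {"images/homepage/logo.png": "images/homepage/logo.839d9a933.png"}

def static(path):
    """Roughly what {% static path %} renders in a template."""
    return STATIC_URL + manifest.get(path, path)

print(static("images/homepage/logo.png"))
# https://d2sj0abvvbje4e.cloudfront.net/static/images/homepage/logo.839d9a933.png
```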

Adding Django Whitenoise for smarter caching

Now you have a pretty decent setup, with CloudFront fetching your static files only once, nginx serving them from your file system and the end-user browsers caching the files for a year.

But there is no cleverness here: all files are treated the same, because nginx doesn't know which files can be cached for a long time and which should be cached briefly. Imagine you forget to use {% static %} somewhere, or a javascript file references another file by its original name. Then the user gets the original filename instead of the MD5-hashed one, and since everything is cached for 1 year, no one will see any change to that file.

Also it would be nice to compress our files with gzip so that we send smaller files down the wire to the users.

You could configure nginx to handle all these cases with some clever regular expressions, but it's complicated. That's where WhiteNoise comes in. It moves the serving of static files back to your WSGI process (instead of nginx), but it does so quite efficiently, bypassing most of the Django stack, and since we're behind a CDN, most requests for static files will never reach your server anyway.

Handing this task to WhiteNoise is very easy (make sure you pip install whitenoise on your production instance):

  1. Remove the special /static location from your nginx configuration file. We're now passing all requests to /static through the default catch-all / location (which proxies all requests to our gunicorn socket).
  2. Add the following lines to your settings file settings/production.py:

     MIDDLEWARE = [
         'django.middleware.security.SecurityMiddleware',
         'whitenoise.middleware.WhiteNoiseMiddleware',
         # ... the rest of your middleware
     ]
     STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'

  3. Run collectstatic to make WhiteNoise create compressed versions of the files, then restart gunicorn.
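Under the hood, the storage backend writes a content-hashed copy of each file plus a gzipped twin. A rough sketch of that step (the file name below is made up; Django's real implementation lives in the staticfiles storage classes and currently uses the first 12 hex characters of the MD5 of the file's contents):

```python
import gzip
import hashlib
from pathlib import Path

def hashed_and_gzipped(path: Path):
    """Sketch of what CompressedManifestStaticFilesStorage produces for one
    file: a content-hashed copy plus a gzipped twin of that copy."""
    content = path.read_bytes()
    digest = hashlib.md5(content).hexdigest()[:12]
    hashed = path.with_name(f"{path.stem}.{digest}{path.suffix}")
    hashed.write_bytes(content)
    gzipped = hashed.with_name(hashed.name + ".gz")
    gzipped.write_bytes(gzip.compress(content))
    return hashed.name, gzipped.name

# Example with a hypothetical file:
src = Path("app.js")
src.write_bytes(b"console.log('hello');")
print(hashed_and_gzipped(src))  # e.g. ('app.<12-hex-chars>.js', 'app.<12-hex-chars>.js.gz')
```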

If you now look into your collected files, you'll see two versions of each file: a normal one (with the MD5 hash appended) and a gzipped one, also with the MD5 hash. Depending on the Accept-Encoding header the user's browser sends, CloudFront will be served the gzip version if supported, otherwise the uncompressed one. Both will be cached in the CDN.
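The variant selection is plain HTTP content negotiation: the .gz twin is served only when the request's Accept-Encoding advertises gzip. A simplified sketch of that decision (not WhiteNoise's actual code):

```python
def pick_variant(filename, accept_encoding):
    """Serve the pre-compressed .gz twin only when the client
    advertises gzip support in its Accept-Encoding header."""
    encodings = [e.strip().split(";")[0] for e in accept_encoding.split(",")]
    if "gzip" in encodings:
        return filename + ".gz", "gzip"
    return filename, None

print(pick_variant("app.49ec9402.js", "gzip, deflate, br"))  # ('app.49ec9402.js.gz', 'gzip')
print(pick_variant("app.49ec9402.js", "identity"))           # ('app.49ec9402.js', None)
```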

To show you what WhiteNoise does, try the following:

  • Load one of your static files that has the MD5 hash in a browser window for which the developer tools are open. Go to the Network tool and look at the Response headers:
    Response headers for immutable static file
    You'll see something like the above. The Cache-Control header shows a max-age set to 10 years. WhiteNoise saw this is a versioned file that will never change, so it instructed CloudFront to cache it forever.
  • Now load the unversioned file (remove the MD5 hash). You should see a much shorter max-age value. If you didn't configure anything, it should be 60 (seconds). You can configure this value with the WHITENOISE_MAX_AGE setting, which I have set to 3600 (1 hour), leading to the following response from CloudFront:
    Response headers for static file that may change
    In the above example, you can see the value of x-cache is "Miss from CloudFront" meaning the resource was fetched from my server since it didn't exist (or was expired) on CloudFront. The previous one showed "Hit from CloudFront" which indicated a file found and cached on CloudFront and therefore my server never saw the request.
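The caching policy WhiteNoise applies can be sketched like this (the regex for spotting a hash is my own simplification, not WhiteNoise's actual code):

```python
import re

WHITENOISE_MAX_AGE = 3600  # seconds; the value used in this post

def cache_control(filename):
    """Sketch of WhiteNoise's policy: versioned (hash-stamped) files never
    change, so they can be cached for ~10 years; everything else only for
    WHITENOISE_MAX_AGE seconds."""
    ten_years = 10 * 365 * 24 * 60 * 60
    if re.search(r"\.[0-9a-f]{8,12}\.", filename):  # e.g. app.49ec9402.js
        return f"max-age={ten_years}, public, immutable"
    return f"max-age={WHITENOISE_MAX_AGE}, public"

print(cache_control("js/app.49ec9402.js"))  # max-age=315360000, public, immutable
print(cache_control("js/app.js"))           # max-age=3600, public
```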

You'll also notice that the Content-Encoding is gzip: my css file was sent in compressed form to the browser.

Summary of our current setup

We're now serving all our static files behind a CDN. The vast majority of requests for our static files are now served directly by CloudFront, which improves the speed of our website a lot! The diagram below shows that only the first request for a static file will hit our EC2 instance; subsequent ones are resolved at an edge location near the user.

Architecture of serving static files with CDN

With WhiteNoise, we're able to tell CloudFront (and our users' browsers) to cache the files for a period of time that is appropriate for the file. Versioned files are cached forever, non-versioned files are cached for 60 seconds. I changed this to 3600 seconds because almost all my files are versioned, except for a very few files referenced in one javascript to create an animation.

This is a very good setup to have when you launch and don't yet have too much traffic or a requirement for 100% availability. You can still afford a little downtime to change things, but you want your users to get a good experience.

Fetching a webpage with static files via a CDN

How do I deploy in this setup?

Let's briefly discuss what a deployment to production looks like in this one-instance setup. Let's assume you changed some of your static files (e.g. the CSS of your site and a script) and added a couple of new images.

  1. Deploy your code to your production instance. At this point, users are still getting the old site (and everything should be running smoothly) because you didn't restart gunicorn.
  2. Migrate your database if you have migrations. Still, users hitting your site are getting the old one. Normally a migrated database shouldn't break a site, unless you removed/renamed columns or tables, in which case your site would be crashing now (giving 500 errors to your users because the old code is trying to pull data that doesn't exist).
  3. Run collectstatic. At this point, your changed files get a new version, and references to them point to the new versions in the "staticfiles.json" mapping file. However, since you haven't restarted gunicorn, the old mappings are still used, so the old versions of the static files are still served. Your site should still be running fine (on the previous version).
  4. Now is the time to restart gunicorn. From now on, the new code is running, the new staticfiles.json mappings are loaded and thus the templates are rendered with the new versions of your CSS and javascript files, and include your new images.
  5. You're done. When you now check your website, you are effectively requesting your new and changed static files for the first time, so CloudFront will fetch and start caching them. Your users won't have to make that first request and will get a fast website. It's a good habit to make the first requests yourself so all other users get the optimal experience.
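The steps above boil down to a strict ordering. As a sketch, here they are as a small Python driver (command and service names are illustrative — adapt them to your own setup):

```python
import subprocess

# The deployment sequence from the list above, in order. Step 5 (warming
# the CDN by requesting your pages yourself) is done manually afterwards.
DEPLOY_STEPS = [
    ["git", "pull"],                                        # 1. deploy code
    ["python", "manage.py", "migrate", "--noinput"],        # 2. migrate DB
    ["python", "manage.py", "collectstatic", "--noinput"],  # 3. new hashed files
    ["sudo", "systemctl", "restart", "gunicorn"],           # 4. switch over
]

def deploy(run=subprocess.run):
    for step in DEPLOY_STEPS:
        run(step, check=True)

# deploy()  # run on the production instance
```

The `run` parameter is injectable so the sequence can be tested without touching a real server; the point is that collectstatic must complete before gunicorn restarts.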

You've just made an almost zero downtime deployment. If you didn't make breaking changes to your database, your site was only down for the time it took to restart gunicorn. That's a couple of seconds. In the worst case, a user would have seen an error page (nginx error page because the socket wasn't there) but a “refresh page” after that would probably have worked.

So how is this going to scale?

Even if you're still at the beginning of your website and your EC2 instance can easily handle all the traffic to your website (even better so now with CloudFront handling the static files traffic), you'll soon want to go to a more robust setup that allows multiple EC2 instances to run in parallel:

  • One reason is fast recovery in case of a failure. One EC2 instance could be hosted in an availability zone (AZ) that faces problems, like network connectivity. When that happens, AWS will not try to recover your instance in a different AZ. Having a standby instance in a different AZ that can be started immediately would help.
  • Also, you might want to have real zero-downtime when deploying. Right now, restarting gunicorn gives a second or two of downtime.
  • Another reason: to quickly create a new instance in case of emergency, I always want to have a current AMI at hand with the latest deployment. However, to ensure 100% integrity of the AMI, you should create it with the "reboot" option ticked, so the instance is effectively stopped while the snapshot is created, leading to downtime for your website.
  • Finally, you want to be able to add an instance in case your website traffic suddenly surges.

So in part 3, we'll set up an auto-scaling group with our single instance that will automatically scale up when needed, and will also allow us to deploy and create an AMI of our latest deployment without any downtime.

Follow me on Twitter @dirkgroten to find out when parts 3 and 4 are published and ask me questions.


Dirk Groten is a respected tech personality in the Dutch tech and startup scene, running some of the earliest mobile internet services at KPN and Talpa and a well known pioneer in AR on smartphones. He was CTO at Layar and VP of Engineering at Blippar. He now runs and develops Dedico.

Things to remember

  • When adding a CDN, restrict access to your static files so the rest of your resources cannot be accessed via the CDN
  • Let the CDN use the headers set by your origin, in our case WhiteNoise
  • Use Django WhiteNoise to facilitate the task of setting the correct headers and for compressing our static files
  • Use your browser developer tools to check that everything is set up correctly
