CloudFront custom origin server with Rails, Jammit, and Apache

This is the first in a series of more in-depth technical posts about OpenGovernment. Our project is open source, so feel free to check us out over at GitHub.

Here’s what you want to be able to do:

  • Serve your CSS, JavaScript, and images from a CDN that’s near your users,
  • Serve gzipped versions of CSS and JS files if possible,
  • Make these files as small as possible by minifying them,
  • Serve them via SSL when desired,
  • Have them cached on the browser for as long as possible,
  • And serve as few of these files as possible (as Steve Conover at Pivotal Labs put it: “We send down exactly one .js and one .css file. If you are sending down more than one of each of these to the browser, you have a performance problem.”)

Our history with S3

On OpenCongress, we’d been using S3 for content distribution for a while, but we weren’t following many of these rules until pretty recently. We weren’t compressing our .css and .js; in some cases we weren’t minifying them either. The result was slow page loads and an unnecessary premium on our S3 bills.

My first step in speeding things up was to install Jammit, the excellent asset packager gem. Jammit reduces the number of CSS and JS files you need to serve, and it minifies and gzips them for you. Unfortunately, at the time I installed Jammit there wasn’t any support for S3, so I had to adjust our already band-aided S3 sync tool to handle Jammit’s packaged assets.

Why did it need patching? Well, Jammit creates minified assets with extensions .css, .css.gz, .js, and .js.gz. Because S3 does not do any content negotiation, if you put these files on S3 and link to the .css file in your stylesheet tag, even browsers that accept gzip will never receive the gzipped stylesheet.

Our initial fix for that was to change which assets we referred to based on whether the browser loading our site supported gzip or not. So if we saw an Accept-Encoding: gzip header, we’d serve up, say, default.css.gz as the stylesheet. Of course, we also had to make sure that S3 would deliver the .gz file with Content-Encoding: gzip, so browsers would know to unpack it before interpreting it.
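For illustration, the header check described above might look like the helper below. It’s a minimal sketch, not the code we actually ran: `gzip_accepted?` and `stylesheet_for` are hypothetical names, and it assumes you pass in the raw header value (in a Rails controller, `request.env['HTTP_ACCEPT_ENCODING']`). It honors q-values, so `gzip;q=0` counts as a refusal.

```ruby
# Hypothetical helper: does this Accept-Encoding header value allow gzip?
# q-values are honored, so "gzip;q=0" counts as a refusal.
def gzip_accepted?(accept_encoding)
  return false if accept_encoding.nil? || accept_encoding.strip.empty?

  qualities = {}
  accept_encoding.split(",").each do |part|
    coding, *params = part.strip.split(";").map(&:strip)
    next if coding.nil? || coding.empty?

    q = 1.0
    params.each do |param|
      key, value = param.split("=", 2).map(&:strip)
      q = value.to_f if key == "q" && value
    end
    qualities[coding.downcase] = q
  end

  # An explicit gzip entry wins over the "*" wildcard.
  q = qualities["gzip"] || qualities["x-gzip"] || qualities["*"]
  !q.nil? && q > 0
end

# Hypothetical: pick which packaged stylesheet a page should link to.
def stylesheet_for(package, accept_encoding)
  gzip_accepted?(accept_encoding) ? "#{package}.css.gz" : "#{package}.css"
end
```

In a view you’d call something like `stylesheet_for("default", request.env['HTTP_ACCEPT_ENCODING'])`.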

Unfortunately, even with the proper Content-Encoding header, Safari barfs at this arrangement because of the .gz extension, and the result is a mangled page without any styles or javascript at all.

So the final fix was to rename the files before uploading them to S3. If they’re called .cssgz and .jsgz, they’re handled fine by browsers. The .css and .js versions still exist, of course.
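A sketch of the renaming step, plus the headers each uploaded object would need. The helper names are hypothetical, as is the text/javascript content type; substitute whatever your own sync tool and S3 client expect.

```ruby
# Hypothetical helpers for the S3 workaround described above.

# Safari rejected .css.gz / .js.gz names, so upload them as .cssgz / .jsgz.
def s3_key_for(filename)
  filename.sub(/\.(css|js)\.gz\z/, '.\1gz')
end

# Headers each object needs on S3. The JavaScript type is an assumption.
def upload_headers_for(key)
  headers = {}
  case key
  when /\.css(gz)?\z/ then headers["Content-Type"] = "text/css"
  when /\.js(gz)?\z/  then headers["Content-Type"] = "text/javascript"
  end
  # Tell browsers to gunzip before interpreting the file.
  headers["Content-Encoding"] = "gzip" if key =~ /\.(css|js)gz\z/
  headers
end
```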

Our site loaded much faster, so we had a working-yet-kludgy solution in place.

Fast forward a few months. Prior to the launch of OpenGovernment.org, I looked at the situation again and discovered that we probably should have been using CloudFront all along (it’s faster than S3), and that a new feature on CloudFront presented a much simpler solution.

The way better solution: CloudFront custom origin server

CloudFront now supports a Custom Origin Server option. With this option, instead of having CloudFront serve from an S3 bucket, you can have it mirror your own server. So if we reference http://ourcloudfronthost.cloudfront.net/assets/common.css and a visitor’s browser requests it, CloudFront will issue the same request to our server if it doesn’t already have a cached copy, cache the result, and send it back down to the browser.

This actually simplifies a lot of things. It means that if you can get the headers right on your server, as they should be, then CloudFront will serve up the same things under the same circumstances. No extra gems are needed, and there’s no need to ever upload or sync things with S3.
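To make the read-through behavior concrete, here’s a toy model of an edge cache sitting in front of a custom origin. It’s a simplification, assuming a fixed TTL (it ignores Cache-Control and everything else CloudFront actually considers), and `ReadThroughCache` is a made-up class. Note that once a path is cached, a deploy on the origin doesn’t change what the cache returns until the entry expires.

```ruby
# Toy model of a CDN edge acting as a read-through cache for a custom origin.
# Assumes a fixed TTL per object; real CloudFront also honors origin headers.
class ReadThroughCache
  Entry = Struct.new(:body, :expires_at)

  attr_reader :origin_requests

  def initialize(ttl_seconds, clock: -> { Time.now }, &fetch_from_origin)
    @ttl = ttl_seconds
    @clock = clock
    @fetch_from_origin = fetch_from_origin
    @entries = {}
    @origin_requests = 0
  end

  # Serve from cache while fresh; otherwise fetch from the origin and cache it.
  def get(path)
    now = @clock.call
    entry = @entries[path]
    return entry.body if entry && now < entry.expires_at

    @origin_requests += 1
    body = @fetch_from_origin.call(path)
    @entries[path] = Entry.new(body, now + @ttl)
    body
  end
end
```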

Setting this up on CloudFront is pretty easy, but you can’t do it via the AWS Management Console. You need to send an XML request to AWS. Download cfcurl.pl and create a file called ~/.aws-secrets with your AWS credentials, e.g.:

%awsSecretAccessKeys = (
 # PPF on AWS
 'ppf' => {
     id => 'your aws key',
     key => 'your aws secret',
 },
);

Now you’ll need a little XML file. Here’s the one I used to create OpenCongress’s CloudFront:

<?xml version="1.0" encoding="UTF-8"?>
<DistributionConfig xmlns="http://cloudfront.amazonaws.com/doc/2010-11-01/">
<CustomOrigin>
<DNSName>www.opencongress.org</DNSName>
<OriginProtocolPolicy>http-only</OriginProtocolPolicy>
</CustomOrigin>
<Comment>OpenCongress Remote Origin</Comment>
<Enabled>true</Enabled>
<CallerReference>20110210135532</CallerReference>
</DistributionConfig>

Use cfcurl.pl to send this to AWS using your keys:

perl cfcurl.pl --keyname ppf -- -X POST -H "Content-Type: text/xml;charset=utf-8" --upload-file opencongress_cf.xml https://cloudfront.amazonaws.com/2010-11-01/distribution

You should get a response back with your new CloudFront distribution information. You should also see the new distribution in the AWS management console.
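If you’d rather generate the XML than write it by hand, something like the following would produce an equivalent document with Ruby’s bundled REXML library. It’s a sketch assuming the same 2010-11-01 schema as the file above; `distribution_config_xml` is a hypothetical helper. CloudFront requires a unique CallerReference for each create request, so it uses a timestamp in the same format as the example.

```ruby
require "rexml/document"

# Hypothetical helper: build a custom-origin DistributionConfig like the one
# above (2010-11-01 schema). CallerReference is a UTC timestamp so each
# create request gets a unique value.
def distribution_config_xml(dns_name, comment, time = Time.now.utc)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new("1.0", "UTF-8")

  config = doc.add_element("DistributionConfig",
                           "xmlns" => "http://cloudfront.amazonaws.com/doc/2010-11-01/")
  origin = config.add_element("CustomOrigin")
  origin.add_element("DNSName").text = dns_name
  origin.add_element("OriginProtocolPolicy").text = "http-only"
  config.add_element("Comment").text = comment
  config.add_element("Enabled").text = "true"
  config.add_element("CallerReference").text = time.strftime("%Y%m%d%H%M%S")

  xml = +""
  doc.write(xml)
  xml
end
```

For example, `File.write("opencongress_cf.xml", distribution_config_xml("www.opencongress.org", "OpenCongress Remote Origin"))`, then send the file with cfcurl.pl as before.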

Now you’re ready to add your new CloudFront hostname to your production.rb. This is ours:

# Enable serving of images, stylesheets, and javascripts from CloudFront
config.action_controller.asset_host = Proc.new { |source, request|
  "#{request.ssl? ? 'https' : 'http'}://d20tbjzc77cxpv.cloudfront.net"
}

Using CloudFront as your asset host does introduce a couple of problems, and we’ll cover them next.

Cache Busting

CloudFront and S3 ignore Rails’ typical cache-busting strategy of appending a timestamp as a URL parameter on asset URLs. Normally with S3 sync you’d be able to simply upload new files and those would be reflected more or less immediately via CloudFront. But when you deploy under this new Custom Origin strategy, your changes will stay cached on CloudFront for up to 24 hours unless you give your assets new pathnames.

So I set up a new cache-busting strategy that uses an Apache rewrite rule and a subtle adjustment to Rails’ asset_path setting. In httpd.conf:

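# Cache-busting rule for CloudFront: strips the /r-<revision> prefix,
# e.g. /r-<revision>/images/logo.png -> /images/logo.png.
# [^/]+ keeps the revision from matching past its own path segment.
RewriteEngine on
RewriteRule ^/r-[^/]+/(images|javascripts|stylesheets|system|assets)/(.*)$ /$1/$2 [L]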
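The rule can be approximated in Ruby to check which paths it rewrites. This sketch uses `[^/]+` for the revision segment rather than a greedy `.+`, which on a path like `/r-3f9c2e1/assets/images/x.png` would backtrack to the last matching directory name and produce `/images/x.png`. The revision value and the `rewrite` helper are made up for illustration.

```ruby
DIRS = "(images|javascripts|stylesheets|system|assets)"
STRICT = %r{\A/r-[^/]+/#{DIRS}/(.*)\z}
GREEDY = %r{\A/r-.+/#{DIRS}/(.*)\z}

# Apply a RewriteRule-style substitution: /$1/$2 on match, else unchanged.
def rewrite(path, pattern = STRICT)
  m = pattern.match(path)
  m ? "/#{m[1]}/#{m[2]}" : path
end
```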

And in production.rb:

# Use the first 7 characters of the git revision of this release
RELEASE_NUMBER = %x{cat REVISION | cut -c -7}.rstrip

config.action_controller.asset_path = proc { |asset_path|
   "/r-#{RELEASE_NUMBER}#{asset_path}"
}

This assumes you’re deploying with Capistrano, which writes a REVISION file containing the release’s git revision into your application’s root folder.
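To see how the two production.rb settings combine, here they are modeled outside Rails. It’s a sketch: `RequestStub` is a made-up stand-in for the Rails request (only `ssl?` is modeled), the revision string is invented, and `cloudfront_asset_url` is a hypothetical helper rather than anything Rails provides.

```ruby
# Models the asset_host and asset_path settings above outside Rails.
# RequestStub is a made-up stand-in for the Rails request object.
RequestStub = Struct.new(:ssl) do
  def ssl?
    ssl
  end
end

# Same result as `cat REVISION | cut -c -7` followed by rstrip, for a one-line file.
def release_number_from(revision_file_contents)
  revision_file_contents.lines.first.to_s[0, 7].rstrip
end

# Hypothetical helper combining both procs: scheme from the request,
# CloudFront host, then the /r-<revision> prefix from asset_path.
def cloudfront_asset_url(source, request, release_number,
                         host = "d20tbjzc77cxpv.cloudfront.net")
  scheme = request.ssl? ? "https" : "http"
  "#{scheme}://#{host}/r-#{release_number}#{source}"
end
```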

Deflate static assets

That covers the CloudFront side of things. The only issues left are on the Apache side: making sure that we’re gzipping content when possible (using content negotiation or mod_deflate), and setting far-future expiration dates.

Apache and Jammit don’t play well together when it comes to content negotiation. With MultiViews turned on, Apache only looks for variants when the requested file itself is not found. So if there is a common.css, for example, Apache will always serve it. I’ve found that only if I rename common.css to common.css.css will Apache present the gzipped variant properly when common.css is requested.

Another option is to leave common.css and common.css.gz alone, but change the stylesheet link tag to refer simply to “/assets/common”. This is more fragile than the first option, though, because when /assets/common is requested, Apache will serve the smaller of common.css and common.js if both files exist.
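The behavior described above can be summarized as a rough model. This is a simplification of Apache’s negotiation algorithm, assuming only gzip encoding and file size matter (no media types, languages, charsets, or q-values); `Variant` and `negotiate` are hypothetical names, and the sizes in the examples are invented.

```ruby
# Rough model of MultiViews as described above. Simplified: only gzip
# encoding and file size are considered.
Variant = Struct.new(:filename, :size, :gzip)

def negotiate(requested, files, accepts_gzip)
  # An existing file is served as-is; no negotiation happens.
  exact = files.find { |f| f.filename == requested }
  return exact.filename if exact

  candidates = files.select { |f| f.filename.start_with?("#{requested}.") }
  gzipped, plain = candidates.partition(&:gzip)
  # Prefer an encoding the client accepts; otherwise fall back to unencoded.
  pool = accepts_gzip && gzipped.any? ? gzipped : plain
  return nil if pool.empty?

  # Remaining ties are broken by picking the smallest file.
  pool.min_by(&:size).filename
end
```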

So the solution was to set up our deploy scripts to rename the files after Jammit runs. In deploy.rb:

desc 'Compile CSS & JS for public/assets/ (see assets.yml)'
task :jammit do
  run "cd #{current_release}; bundle exec jammit"

  # For Apache content negotiation with MultiViews, we need to rename .css files to .css.css and .js files to .js.js.
  # They will live alongside .css.gz and .js.gz files and the appropriate file will be served based on the Accept-Encoding header.
  run "cd #{current_release}/public/assets; for f in *.css; do mv \"$f\" \"${f%.css}.css.css\"; done; for f in *.js; do mv \"$f\" \"${f%.js}.js.js\"; done"
end
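If you’d prefer Ruby to the shell loop, the same renaming could be written as below. It’s a sketch, not what we deployed; `double_extensions!` is a hypothetical name. Unlike the loop, it skips files that already have the doubled extension, so running it twice is harmless. Since Capistrano’s `run` executes shell commands on the server, you’d have to call this from something that runs there, such as a rake task of your own.

```ruby
# Pure-Ruby version of the rename loop: foo.css -> foo.css.css and
# foo.js -> foo.js.js. The *.css / *.js globs don't match .gz files,
# so the gzipped variants are left alone. Returns the new names, sorted.
def double_extensions!(dir)
  renamed = []
  %w[css js].each do |ext|
    Dir.glob(File.join(dir, "*.#{ext}")).each do |path|
      # Already renamed on a previous run.
      next if path.end_with?(".#{ext}.#{ext}")

      target = "#{path}.#{ext}"
      File.rename(path, target)
      renamed << File.basename(target)
    end
  end
  renamed.sort
end
```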

And, of course, in httpd.conf:

<Directory "/web/opengovernment.org/current/public/assets"> 
  # MultiViews allows gzipped assets to be delivered
  Options MultiViews
</Directory>

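Also in httpd.conf, the stock configuration includes AddType entries for .gz and other compressed files (such as AddType application/x-gzip .gz .tgz). You’ll want to comment these out, because they override the content negotiation behavior we want. Make sure an AddEncoding line for .gz (such as AddEncoding x-gzip .gz .tgz) is active instead, so Apache treats the .gz variants as gzip-encoded.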

Deflate dynamic content

Our static assets are now gzipped when possible, but the pages themselves are not. Assuming you’ve compiled mod_deflate into Apache, this is a trivial step. In httpd.conf:

# Deflate whatever hasn't already been deflated via MultiViews.
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/javascript text/css application/x-javascript
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

DeflateFilterNote Input instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio ratio

… and if you want logging:

LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
CustomLog logs/opengovernment.org-deflate_log deflate
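To estimate the numbers that log will report before changing Apache, you can gzip a response body with Ruby’s Zlib. The ratio follows the same output/input × 100 definition as DeflateFilterNote Ratio; Apache’s compression level and rounding may differ, so treat the result as an approximation. `deflate_stats` is a hypothetical helper.

```ruby
require "zlib"

# Hypothetical helper: gzip a body in memory and report sizes in the same
# shape as the deflate log line (output/input and ratio in percent).
def deflate_stats(body)
  input = body.bytesize
  output = Zlib.gzip(body).bytesize
  ratio = input.zero? ? nil : (output * 100.0 / input).round
  { input: input, output: output, ratio: ratio }
end
```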

Far-future expiration

The last piece we want is to set the Expires: headers so that browsers will cache our assets without the hassle of issuing a request and getting back a 304 Not Modified. In httpd.conf:

ExpiresActive On
<Directory "/web/opengovernment.org/current/public">
   # We're already invalidating assets via the above cache-busting rule, so no ETags.
   FileETag None

   # Far future expiration, made possible via cache-busting as well.
   ExpiresDefault "access plus 1 year"
</Directory>
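Once this is in place, a quick script can confirm that responses actually carry a far-future Expires header and no ETag. A sketch under assumptions: `far_future_cacheable?` and `fetch_headers` are hypothetical helpers, the one-day slack is arbitrary, and the URL you pass would be one of your own CloudFront or origin asset URLs.

```ruby
require "time"
require "net/http"
require "uri"

ONE_DAY = 24 * 60 * 60
ONE_YEAR = 365 * ONE_DAY

# Hypothetical check: Expires roughly a year out and no ETag.
# Expects a Hash of lowercase header names to values.
def far_future_cacheable?(headers, now = Time.now)
  expires = headers["expires"]
  return false if expires.nil? || headers.key?("etag")

  Time.httpdate(expires) - now > ONE_YEAR - ONE_DAY
rescue ArgumentError
  # Unparseable Expires value.
  false
end

# Fetch response headers for a URL (needs network access).
def fetch_headers(url, accept_encoding: "gzip")
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    response = http.get(uri.request_uri, "Accept-Encoding" => accept_encoding)
    response.each_header.to_h
  end
end
```

Usage would be along the lines of `far_future_cacheable?(fetch_headers(url))` for an asset URL from one of your pages.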

Finally, we have a setup that meets the basic goals, and I’m happy to report that our Google Page Speed score is over 90/100 for most pages on OpenGovernment. Yes, we still have some inefficient CSS rules to fix, but that’s a story for another day.

Caveats and room for improvement…

  • You need to use image_path or image_tag throughout your project, never plain old <img> tags. You also need to refer to images via relative URLs throughout your CSS, e.g. background: url(../images/bg.gif) instead of /images/bg.gif, because browsers resolve relative URLs against the stylesheet’s own cache-busted URL, while root-relative ones skip the /r- prefix. If you notice that some images aren’t being cache-busted properly, this is probably the reason.
  • It would be nice if the cache were only busted when the individual files changed. In the above configuration, every deployment forces all of our regular users to reload every asset once, even if those assets haven’t changed. Having the asset_path reflect each file’s own git revision rather than the repository’s revision would fix this for images but not for Jammit assets. Using the file’s modification time doesn’t work because Capistrano seems to always update mtimes. So for now I don’t see a feasible way to do this. Do you?
  • File attachments and other items in /public/system are not accounted for here. You may need an exception in your asset_host and/or Expires settings for these items. Or use Paperclip’s S3 support, for example.
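The relative-URL caveat comes down to how browsers resolve URLs inside a stylesheet, which Ruby’s URI.join reproduces. In this sketch the revision in the stylesheet URL is invented and `resolve_css_url` is a hypothetical wrapper.

```ruby
require "uri"

# Resolve a url(...) reference from a stylesheet the way a browser would:
# relative to the URL the stylesheet itself was loaded from.
def resolve_css_url(stylesheet_url, css_reference)
  URI.join(stylesheet_url, css_reference).to_s
end

stylesheet = "https://d20tbjzc77cxpv.cloudfront.net/r-3f9c2e1/stylesheets/common.css"

# Relative reference keeps the cache-busting prefix:
puts resolve_css_url(stylesheet, "../images/bg.gif")
# → https://d20tbjzc77cxpv.cloudfront.net/r-3f9c2e1/images/bg.gif

# Root-relative reference drops it, so a stale cached copy may be served:
puts resolve_css_url(stylesheet, "/images/bg.gif")
# → https://d20tbjzc77cxpv.cloudfront.net/images/bg.gif
```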

Thanks to Jeff Cleary and James Abley for their earlier posts on CloudFront and Rails.


6 Responses to CloudFront custom origin server with Rails, Jammit, and Apache

  1. Pingback: Cache busting with Rails and Cloudfront | deadkarma

  2. Vitaly Gorodetsky says:

    Amazing post!!
    What is missing here is how do you upload assets to CloudFront.

  3. Carl Tashian says:

    Vitaly, you don’t need to upload your assets. CloudFront acts as a read-through cache in this setup.

  4. Vitaly Gorodetsky says:

    Thank you

  5. BE007 says:

    Cloud front custom origin service is awesome i am using it using Bucket Explorer….Great article to describe Amazon cloud front services….
