The CloudFront index.xml Challenge

2014-December-29
Jim Derry

balthisar.com is now served via CloudFront with an S3 backend. The transition has been fairly simple, but not without its caveats, including handling the RSS .xml feeds (the topic of this article), dealing with a circular reference brain fart, and deciding how to handle “cool URLs”.

Although S3 doesn’t support .htaccess, it provides three tools that cover much of what one might normally use .htaccess for: Index Document Support, Redirection Rules, and Web Page Redirects.

The Issue

With .htaccess, you can specify multiple directory indices, and httpd will try them one at a time, in order, until it finds a suitable document. For example:

DirectoryIndex index.html index.xml index.php index.htm

If you request a directory, say /rss/, httpd will first try to find and return index.html. If that doesn’t exist, httpd will try index.xml, and so on.
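To make the fallback order concrete, here is a small Python model of that lookup. It is only a sketch and not how httpd’s mod_dir is actually implemented: the set of existing files stands in for the filesystem, and the names DIRECTORY_INDEX and resolve_directory are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Same order as the DirectoryIndex line above.
DIRECTORY_INDEX = ["index.html", "index.xml", "index.php", "index.htm"]


def resolve_directory(request_dir: str, existing_files: set[str]) -> str | None:
    """Return the first index document present in request_dir, or None.

    existing_files stands in for the filesystem, e.g. {"/rss/index.xml"}.
    """
    for name in DIRECTORY_INDEX:
        candidate = str(PurePosixPath(request_dir) / name)
        if candidate in existing_files:
            return candidate
    return None
```

With only /rss/index.xml on disk, resolve_directory("/rss/", {"/rss/index.xml"}) skips index.html and returns /rss/index.xml, which is exactly the behavior S3 cannot reproduce.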

Index Document Support

Although S3 supports index documents, it supports only one index document per bucket. Because www.balthisar.com uses “cool URLs” for its RSS feeds, which are .xml documents, as well as for all other site links, there is no way to specify index.html for some directories and index.xml for others in S3 (and remember, S3 directories aren’t really directories).
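The limitation can be sketched as a simple key mapping. This is a simplified Python model, assuming a bucket configured with index.html as its single index document suffix; it ignores S3’s handling of requests without a trailing slash, and the names INDEX_SUFFIX and s3_key_for are hypothetical.

```python
# Assumption: the one index document suffix configured for the bucket.
INDEX_SUFFIX = "index.html"


def s3_key_for(path: str) -> str:
    """Map a request path to the object key a website endpoint would look up.

    Any "directory" request gets the same single suffix appended, so there
    is no second candidate such as index.xml to fall back on.
    """
    key = path.lstrip("/")
    if key == "" or key.endswith("/"):
        key += INDEX_SUFFIX
    return key
```

Under this model /blog/rss/ always maps to blog/rss/index.html, no matter whether blog/rss/index.xml exists.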

Of S3’s three options, then, Index Document Support isn’t an immediate solution to the problem.

Redirection Rules

S3 supports bucket-level redirection rules. For example:

<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>blog/rss</KeyPrefixEquals>
      <HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals>
    </Condition>
    <Redirect>
      <HostName>www.balthisar.com</HostName>
      <ReplaceKeyWith>blog/rss/index.xml</ReplaceKeyWith>
      <HttpRedirectCode>302</HttpRedirectCode>
    </Redirect>
  </RoutingRule>
</RoutingRules>

However, these rules don’t support wildcards, and the matches are greedy (a prefix of blog/rss also matches every key beneath it). I could build a long version of this XML file, with one rule for every blog RSS feed, but it would become a management problem whenever I add blog keywords or tags. Currently www.balthisar.com has 24 separate blog feeds, and although only five are promoted on the website, they all really exist.
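For a sense of what that long version would involve, here is a Python sketch that generates one rule per feed with xml.etree.ElementTree. The feed paths are hypothetical, and sorting the longest prefixes first assumes S3 applies the first rule that matches. Even generated, the output would still have to be regenerated and re-applied to the bucket every time a tag is added, which is the maintenance burden described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed directories; the real site has 24 of them.
FEEDS = ["blog/rss", "blog/rss/tag/middleman", "blog/rss/tag/s3"]


def build_routing_rules(feeds, host="www.balthisar.com"):
    """Build a RoutingRules document with one 404 fallback rule per feed."""
    root = ET.Element("RoutingRules")
    # Longest prefixes first, assuming first-match-wins evaluation, so that
    # the greedy "blog/rss" prefix does not swallow the more specific feeds.
    for feed in sorted(feeds, key=len, reverse=True):
        rule = ET.SubElement(root, "RoutingRule")
        cond = ET.SubElement(rule, "Condition")
        ET.SubElement(cond, "KeyPrefixEquals").text = feed
        ET.SubElement(cond, "HttpErrorCodeReturnedEquals").text = "404"
        redirect = ET.SubElement(rule, "Redirect")
        ET.SubElement(redirect, "HostName").text = host
        ET.SubElement(redirect, "ReplaceKeyWith").text = f"{feed}/index.xml"
        ET.SubElement(redirect, "HttpRedirectCode").text = "302"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_routing_rules(FEEDS))
```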

Because of my laziness and my potential to forget about managing these redirects, Redirection Rules aren’t a suitable solution for my problem. However, I do use the single routing rule shown earlier as a fallback for users: if someone tries to dig into /blog/rss and requests a key that returns a 404, they will receive the whole-site RSS feed instead, so the Redirection Rules feature is still useful.
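The fallback behavior can be checked with a rough Python model that parses the routing rule shown earlier and evaluates it against a request. This is a sketch, not S3’s implementation: it handles only the elements used in that rule, assumes HostName is present, and assumes an http protocol for the Location.

```python
import xml.etree.ElementTree as ET

# The routing rule from earlier in this article.
ROUTING_RULES = """
<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>blog/rss</KeyPrefixEquals>
      <HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals>
    </Condition>
    <Redirect>
      <HostName>www.balthisar.com</HostName>
      <ReplaceKeyWith>blog/rss/index.xml</ReplaceKeyWith>
      <HttpRedirectCode>302</HttpRedirectCode>
    </Redirect>
  </RoutingRule>
</RoutingRules>
"""


def route(key, status, rules_xml=ROUTING_RULES):
    """Return (redirect_code, location) for the first matching rule, or None.

    Only KeyPrefixEquals, HttpErrorCodeReturnedEquals, HostName,
    ReplaceKeyWith and HttpRedirectCode are modeled.
    """
    for rule in ET.fromstring(rules_xml.strip()).iter("RoutingRule"):
        cond = rule.find("Condition")
        if cond is not None:
            prefix = cond.findtext("KeyPrefixEquals", default="")
            code = cond.findtext("HttpErrorCodeReturnedEquals")
            if not key.startswith(prefix):
                continue
            if code is not None and int(code) != status:
                continue
        redirect = rule.find("Redirect")
        host = redirect.findtext("HostName")
        new_key = redirect.findtext("ReplaceKeyWith", default=key)
        # S3's default redirect code is 301 when none is given.
        http_code = int(redirect.findtext("HttpRedirectCode", default="301"))
        return http_code, f"http://{host}/{new_key}"
    return None
```

In this model a 404 for a hypothetical key such as blog/rss/tag/s3/index.xml comes back as a 302 to the whole-site feed, while a successful request is left alone.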

Web Page Redirect

S3 allows web publishers to upload a dummy document and then set a Website Redirect Location to another URL. However, as with Redirection Rules, setting these up and maintaining them takes a lot of manual work. If S3 wants me to upload a dummy file anyway, I might as well implement the solution I ultimately chose.
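For illustration, the dummy upload can be expressed as the keyword arguments for boto3’s S3 client put_object call, whose WebsiteRedirectLocation parameter sets the redirect metadata. The helper name and the bucket name my-bucket are hypothetical, and the sketch only builds the arguments rather than contacting S3.

```python
def redirect_object_params(bucket: str, feed_dir: str) -> dict:
    """Build put_object keyword arguments for an empty redirecting object.

    The object body is empty; the WebsiteRedirectLocation metadata is what
    makes the S3 website endpoint answer with a redirect to the feed.
    """
    return {
        "Bucket": bucket,
        "Key": f"{feed_dir}/index.html",
        "Body": b"",
        "ContentType": "text/html",
        # Must be a path starting with "/" or a full http(s) URL.
        "WebsiteRedirectLocation": f"/{feed_dir}/index.xml",
    }


# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("s3").put_object(**redirect_object_params("my-bucket", "blog/rss"))
```

One such call per feed directory is exactly the kind of manual bookkeeping described above.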

Final Solution

Because I use Middleman to generate the website, it was trivial to have an index.html file generated alongside each index.xml during the RSS building phase. If a user visits /blog/rss, Index Document Support will serve up /blog/rss/index.html, which contains a standard HTML meta redirect to /blog/rss/index.xml. Now users’ RSS readers won’t be broken.
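The generated page can be very small. The following Python sketch is a stand-in for the Middleman build step rather than the code the site actually uses; the function names and the build directory layout are assumptions.

```python
from html import escape
from pathlib import Path


def meta_redirect_page(target: str) -> str:
    """Return a minimal HTML page that immediately meta-refreshes to target."""
    url = escape(target, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="utf-8">\n'
        f'<meta http-equiv="refresh" content="0; url={url}">\n'
        "<title>Redirecting</title>\n"
        "</head>\n"
        "<body>\n"
        # Plain link for clients that ignore the meta refresh.
        f'<p><a href="{url}">Continue to the feed</a></p>\n'
        "</body>\n"
        "</html>\n"
    )


def write_redirect_pages(build_dir, feed_dirs):
    """Write an index.html next to each feed's index.xml under build_dir."""
    for feed in feed_dirs:
        page = Path(build_dir, feed, "index.html")
        page.parent.mkdir(parents=True, exist_ok=True)
        page.write_text(meta_redirect_page(f"/{feed}/index.xml"), encoding="utf-8")
```

Calling write_redirect_pages("build", ["blog/rss"]) would produce build/blog/rss/index.html pointing at /blog/rss/index.xml.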

Although I haven’t done so, now would be a good time to set up Web Page Redirects using metadata. Because the index.html files are already in place and unlikely to change, some of the management concerns mentioned above are alleviated. The index.html will still perform a meta redirect as a fallback (e.g., should I forget to add the metadata), while files that have the metadata will simply be 301 redirected without the browser having to load the file first. I think this solution is win-win.
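The two layers can be modeled together in a few lines of Python. This is a simplified sketch of how a website endpoint would treat each object, not S3’s code: the field website_redirect_location mirrors the x-amz-website-redirect-location metadata, every stored object is assumed to be an HTML page, and the class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StoredObject:
    body: str
    # Mirrors the x-amz-website-redirect-location metadata; None if unset.
    website_redirect_location: str | None = None


def serve(key: str, bucket: dict[str, StoredObject]) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) for a request to key.

    Objects with redirect metadata get a 301 and no body; objects without
    it are returned as-is, so the index.html meta refresh still does the job.
    """
    obj = bucket.get(key)
    if obj is None:
        return 404, {}, ""
    if obj.website_redirect_location:
        return 301, {"Location": obj.website_redirect_location}, ""
    return 200, {"Content-Type": "text/html"}, obj.body
```

An index.html carrying the metadata yields a 301 straight to the feed, while one without it is delivered normally and the browser follows its meta refresh.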