balthisar.com is now served via CloudFront with an S3
backend. The transition has been fairly simple but not without its caveats, including handling
.xml feeds (the topic of this article), dealing with a
circular reference brain fart,
and deciding how to handle “cool URLs”.
Although S3 doesn’t have support for
.htaccess, it provides a few tools for the redirection and
index-document handling that one might normally configure there:
- Index Document Support
- Redirection Rules (also inconsistently called Routing Rules)
- Web Page Redirect
With .htaccess you can specify multiple directory indices, and httpd will try them one
at a time, in order, until a suitable document is found. For example:
DirectoryIndex index.html index.xml index.php index.htm
If you request a directory, say,
/rss/, then httpd will first try to find and return
index.html. If it doesn’t exist, then httpd will try to find and return
index.xml, and so on down the list.
Index Document Support
Although S3 supports index documents, it supports only one index document per bucket. Because
www.balthisar.com uses “cool URLs” for RSS feeds (which are
.xml documents) as well as for
all other site links, there is no way to specify both index.html and index.xml as index documents for
different directories in S3 (and remember, S3 directories aren’t really directories).
Out of the three redirection options above, then, Index Document Support wouldn’t be an immediate solution to the problem.
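For context, the bucket-wide index document is set in the bucket’s website configuration, which accepts only a single suffix. A minimal sketch of that configuration (the suffix value shown is illustrative):

```xml
<WebsiteConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <!-- Only one Suffix may be specified, and it applies to the whole bucket. -->
  <IndexDocument>
    <Suffix>index.html</Suffix>
  </IndexDocument>
</WebsiteConfiguration>
```

There is no way to add a second IndexDocument element for index.xml, which is the crux of the problem.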
Redirection Rules
S3 supports bucket-level redirection rules. For example:
<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>blog/rss</KeyPrefixEquals>
      <HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals>
    </Condition>
    <Redirect>
      <HostName>www.balthisar.com</HostName>
      <ReplaceKeyWith>blog/rss/index.xml</ReplaceKeyWith>
      <HttpRedirectCode>302</HttpRedirectCode>
    </Redirect>
  </RoutingRule>
</RoutingRules>
However, these rules don’t support wildcards, and all of the matches are prefix-based and somewhat
greedy. This means I could build a long version of this
.xml file, with one rule for every blog RSS feed, but then
it becomes a future maintenance problem if I ever add blog keywords or tags. Currently
www.balthisar.com has 24 separate blog feeds, and although only five are promoted on the
website, they all really do exist.
Because of my laziness and my potential to forget about managing these redirects, Redirection Rules
aren’t a suitable solution to my problem. However, the rules above are still used as a fallback for
users: if someone tries to dig into
/blog/rss, they will receive the whole-site RSS feed, and
so the Redirection Rules feature remains useful.
Web Page Redirect
S3 allows web publishers to upload a dummy document and then set its Website Redirect Location metadata to another URL. However, as with Redirection Rules, this requires a lot of manual work to set up and maintain. And as long as S3 wants me to upload a dummy file anyway, I might as well implement the solution I finally settled on.
Because I use Middleman to generate the website, it was trivial to cause an
index.html file to be generated alongside each
index.xml during the RSS building
phase. If a user visits
/blog/rss, then Index Document Support will serve up
/blog/rss/index.html, which contains a standard HTML meta redirect to the adjacent index.xml.
Now users’ RSS readers won’t be broken.
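Such a generated index.html can be as simple as the following sketch (the target path shown is for the whole-site feed; each generated file would point at its own sibling index.xml):

```html
<!DOCTYPE html>
<html>
  <head>
    <!-- Redirect immediately (0 seconds) to the sibling RSS feed. -->
    <meta http-equiv="refresh" content="0; url=/blog/rss/index.xml">
    <title>Redirecting</title>
  </head>
  <body>
    <!-- Fallback link for clients that ignore meta refresh. -->
    <p>This feed lives at <a href="/blog/rss/index.xml">/blog/rss/index.xml</a>.</p>
  </body>
</html>
```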
Although I haven’t done so yet, now would be a good time to set up Web Page Redirects using metadata.
Because the
index.html files are already in place and not likely to change, some of the
management concerns mentioned above are alleviated. Each
index.html will perform a meta
redirect as a fallback (e.g., should I forget to add the metadata), and files that do have the metadata
will simply be 301 redirected without the browser having to load the file first. I think
this solution is win-win.
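Setting that metadata can be done at upload time, for example with the AWS CLI’s --website-redirect option. A sketch, with illustrative paths, assuming AWS credentials are already configured:

```shell
# Upload the dummy index.html and attach the Website Redirect Location
# metadata; S3's website endpoint will then answer requests for this key
# with a 301 redirect to the feed instead of serving the file.
aws s3 cp index.html s3://www.balthisar.com/blog/rss/index.html \
    --website-redirect /blog/rss/index.xml
```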