What to expect when you're expecting (a Kubernete)

Hello! Hope you’re all having a great start to your new year. As I hinted in my recap post, there are a lot of exciting things happening over here. In this post, I want to talk about one of them: a fun project I worked on over the holidays1 and in the early part of the year to create ACRL’s very first permanently-running Kubernetes cluster2!
I’m expecting that you’re all having one of two distinct reactions to the above statement: a) “You’ve been in business for 2.5 years as Kubernetes experts and you don’t even have a Kubernetes cluster? What kind of hackjobs are you?”, or b) “Don’t you barely even do any real work? Why on earth would you need a Kubernetes cluster?”
My answers to those questions are, respectively, “The best kind,” and “We were getting bored over here.” Anyways, in the remainder of this post, I’m going to pull back the curtain on some of our internal infrastructure and talk about how we got here.
No, but really–why do you need a Kubernetes cluster?
There’s an oft-repeated adage of sorts in the industry that “most companies” don’t actually need Kubernetes, and that trying to run a k8s cluster just adds a bunch of complexity that will distract from your core business offering when you’re small. I actually don’t totally disagree with this viewpoint, and have repeated it myself from time to time, but there are two specific reasons why I decided to ignore this advice for ACRL:
We actually are supposed to be Kubernetes experts, and it is a good use of our time to make sure we stay up-to-date with how to run and operate the technology we are supposedly expert at.
As we’ve grown, I’ve built up a lot of “bespoke” services, and there are a bunch more that I want to run: for example, my website exists as a janky mess of Docker Compose files, and it’s always a little nerve-wracking to make changes to it. I also want to start hosting things like “an internal message/chat app” or “VictoriaLogs, because we already generate more logs than is humanly feasible to comprehend”. And what I’ve realized is that I can have N sources of complexity to manage each of these N services/applications in slightly different bespoke ways, or I can have 1 source of complexity that manages N services for me in a uniform way. It’s the classic “pets vs cattle” problem; and while ACRL is admittedly still quite small, it turns out that it doesn’t take a very large number of pets before you have too many pets. Kubernetes provides a well-supported platform for turning all of your dogs, cats, fish, and turtles into cows3.
So, given those two things, I’ve wanted to get this cluster up and running for a while, and finally decided it was time to take the plunge.
So what’s your architecture, bro?
Now, I knew going into this that running a Kubernetes cluster is not for the faint of heart. It does have a huge amount of complexity, and it’s almost guaranteed to cost you an arm and a leg4. So my goal when setting the cluster up was to do it as cheaply, and with as little human intervention, as possible. In this section we’ll talk about my architectural choices so that everybody on the Orange Site can mock and ridicule me for doing things the wrongest possible way.
What Kubernetes distribution?
There are a LOT of different options for “how to run Kubernetes” ranging from “just do the steps in Kelsey’s book, don’t worry, you’ll have a working cluster in 99 years” all the way to “just pay out the nose to run on EKS”. I landed somewhere in the middle: I don’t want to pay out the nose to AWS, and I want/need access to the control plane (which you don’t get with EKS), but I also want something that does “most” of the work for you. For this purpose I elected to use k3s, which is a lightweight Kubernetes distribution that bundles all of the components (API server, controller-manager, scheduler, kubelet) into a single statically-linked binary that you can “just run”5.
Using k3s makes setting up the cluster itself extremely easy: ship a binary somewhere and run it, done. The default configuration is even pretty sane out of the box6! The bigger question is how to set it up in a persistent, reliable, automated, and inexpensive way. And it turns out that this is where all the complexity lies.
A descent into Moria
Those of you who know me know that I’m a huge Lord of the Rings geek. So when I initially set up all my internal infrastructure tooling at ACRL, I didn’t have to reach far to come up with names for my repositories. We have two internal repos for doing “infrastructure as code” and “configuration management”. The first repo is named “moria”, and handles all of our IaC needs using Pulumi7. Moria is where all of our AWS config is managed: S3 buckets, EC2 instances, etc. The second repo, named “isengard”, uses Ansible8 for configuration management9: in other words, what software, tools, and configuration files should be installed on my hosts.
So, when I started getting k3s set up, the path seemed10 pretty straightforward: launch an EC2 instance in moria, and install k3s on it via isengard. The first obstacle appeared when I started looking at costs: running a single EC2 instance with 2 CPUs and 8GiB of RAM would cost me ballpark $60/month (and that doesn’t include storage). And I want (eventually) a whole cluster of these things! So I made the design decision to run the entire cluster (including the control plane) on spot instances11. Doing this lets me run a single node for ~$10/month (depending on spot price fluctuations). That’s much better!
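For the curious, the spot-instance part of this is only a few lines of Pulumi in moria. Here’s a rough sketch (the resource name, AMI ID, and instance type are illustrative, not our actual config):

```python
# Hedged sketch of the moria-side launch template; names and sizes are made up.
import pulumi_aws as aws

k3s_launch_template = aws.ec2.LaunchTemplate(
    "k3s-control-plane",
    image_id="ami-0123456789abcdef0",  # placeholder (the AMI story comes later)
    instance_type="t3a.large",         # 2 vCPUs / 8GiB RAM, burstable
    instance_market_options=aws.ec2.LaunchTemplateInstanceMarketOptionsArgs(
        market_type="spot",
        spot_options=aws.ec2.LaunchTemplateInstanceMarketOptionsSpotOptionsArgs(
            spot_instance_type="one-time",  # let AWS reclaim (and later replace) the node
        ),
    ),
)
```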
The only problem is that spot instances can be taken away literally at any time, and I want my cluster to be (somewhat) resilient to disruption. Obviously the “best” way to do this would be to run etcd, a distributed datastore that is designed for resilience, but that would add a whole bunch of complexity and expense that I’m not ready for yet, so I took a middle ground: the k3s data volume would be a persistent EBS volume that gets automatically re-attached any time the instance restarts. So add another line of code to Pulumi and off we go!
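That “one more line” of Pulumi is, give or take, something like this (resource name, availability zone, and tag are placeholders):

```python
# Hedged sketch of the persistent k3s data volume; details are illustrative.
import pulumi_aws as aws

k3s_data_volume = aws.ebs.Volume(
    "k3s-control-plane-data",
    availability_zone="us-east-1a",  # has to match wherever the node launches
    size=100,                        # GiB
    type="gp3",
    tags={"Name": "k3s-control-plane-data"},
)
```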
Oops! Turns out that storage is expensive. A 100GB EBS volume (probably the minimum of what would be “acceptable” for this use case, once I get things actually running on the cluster) costs $8/month, and a 30GB root volume adds another $2.40. Welp, my costs just doubled. ALSO, I just made life harder for myself, because there’s no built-in way to re-attach EBS volumes to hosts in the event of disruption.
No big deal though, we can just write some more isengard code to handle it. Each k3s host is tagged with the ID of the EBS volume that “belongs” to it; a couple of systemd scripts run on instance startup to look up the tag, attach the volume, format it if necessary, mount it in the right place, and then start k3s. Easy peasy12!
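In rough pseudo-boto3 form, the startup logic looks something like the sketch below; the tag key, device name, and mountpoint here are made up, and the real thing is the systemd-driven scripts living in isengard.

```python
# Hedged sketch of the attach-the-data-volume-on-boot step.
import subprocess

import boto3
import requests

IMDS = "http://169.254.169.254/latest"
DEVICE = "/dev/xvdk"                  # illustrative device name
MOUNTPOINT = "/var/lib/rancher/k3s"   # k3s's default data dir

def imds_get(path: str) -> str:
    # IMDSv2: grab a session token, then read the metadata path.
    token = requests.put(f"{IMDS}/api/token",
                         headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"}).text
    return requests.get(f"{IMDS}/meta-data/{path}",
                        headers={"X-aws-ec2-metadata-token": token}).text

instance_id = imds_get("instance-id")
ec2 = boto3.client("ec2", region_name=imds_get("placement/region"))

# Find the EBS volume that "belongs" to this instance via its tag.
tags = ec2.describe_tags(Filters=[
    {"Name": "resource-id", "Values": [instance_id]},
    {"Name": "key", "Values": ["k3s-data-volume-id"]},  # hypothetical tag key
])["Tags"]
volume_id = tags[0]["Value"]

# Attach the volume, format it on first use, and mount it; a systemd
# After=/Requires= dependency then starts k3s once this unit succeeds.
ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=DEVICE)
ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
blkinfo = subprocess.run(["file", "-s", DEVICE], capture_output=True).stdout
if b"filesystem" not in blkinfo:
    subprocess.run(["mkfs.ext4", DEVICE], check=True)
subprocess.run(["mount", DEVICE, MOUNTPOINT], check=True)
```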
The next problem we need to deal with is that the API servers need to have a stable network address if we’re planning to access them from anywhere “external”. My first thought was “DNS is for lusers, and also it’s always DNS”, so I tried to just assign a static private IP address to the control plane instance. However, because we’re running the control plane node as a spot instance, we have to stick it inside an ASG if we want the instance to automatically re-create itself on disruption, which means that—even though the ASG has a max size of 1—we can’t assign a static private IP address to it. So, back to DNS it is.
But now we have a problem: how do we update the A record for the instance? Eh, no big deal, we’ll just throw more money at AWS and use a private hosted zone in Route53, plus an ASG lifecycle hook and a Lambda function to update the A record on instance launch. Net additional cost: a few cents/month.
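If you’ve never written one of these, the Lambda half is only a handful of lines. Here’s a hedged sketch (the hosted zone ID and record name are placeholders, not our actual values), assuming the lifecycle hook’s “EC2 Instance-launch Lifecycle Action” event is routed to the function via EventBridge:

```python
# Hedged sketch of the A-record-updating Lambda; zone ID and record name are made up.
import boto3

ec2 = boto3.client("ec2")
r53 = boto3.client("route53")
asg = boto3.client("autoscaling")

HOSTED_ZONE_ID = "Z0123456789EXAMPLE"  # the private hosted zone's ID
RECORD_NAME = "k3s.acrl.internal."     # hypothetical; whatever the kubeconfig points at

def handler(event, context):
    # EventBridge delivers the ASG lifecycle event with the instance ID in the detail.
    detail = event["detail"]
    instance_id = detail["EC2InstanceId"]

    resp = ec2.describe_instances(InstanceIds=[instance_id])
    private_ip = resp["Reservations"][0]["Instances"][0]["PrivateIpAddress"]

    # UPSERT the A record so it always points at the current control plane node.
    r53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": RECORD_NAME,
                "Type": "A",
                "TTL": 60,
                "ResourceRecords": [{"Value": private_ip}],
            },
        }]},
    )

    # Let the ASG finish bringing the instance into service.
    asg.complete_lifecycle_action(
        LifecycleHookName=detail["LifecycleHookName"],
        AutoScalingGroupName=detail["AutoScalingGroupName"],
        LifecycleActionToken=detail["LifecycleActionToken"],
        LifecycleActionResult="CONTINUE",
    )
```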
AMI dreaming?
The last step in the process is answering the question, “how do we get k3s re-installed on the nodes when they’re disrupted?” The naïve solution is “just re-run the Ansible playbook on boot”, but this is obnoxious because a) it uses up a bunch of compute credits for my burstable EC2 instance, and b) it takes even longer for the control plane to become available after an interruption. So the solution here is to use Packer13 to bake an AMI14 with all the required software packages installed. This part was actually pretty easy, thanks to all the hard work Ian has been doing to create a SimKube AMI for GitHub Actions. It took a half-hour or so to modify the AMI baking pipeline to make it more generic, and now we’re baking an AMI for k3s as well! We do this weekly, so we can pick up security updates and other package updates to the underlying OS.
Of course, we don’t have anything to actually clean up stale/old AMIs, and it turns out that these snapshots are also kind of expensive, so currently I log into our AWS console every couple of weeks and delete all the old ones, a process which is completely sustainable from now until the end of eternity. Net additional cost: another $5/month or so, probably.
We do want the k3s node(s) to reference the most recent AMI when new ones are launched or old ones are disrupted, and we also don’t want to have to manually update the ASG Launch Template for k3s to point to the new AMI. Fortunately, AWS provides a key-value store (SSM Parameter Store) that you can reference inside a launch template so that it always points to the latest AMI. All we have to do is update the value stored there whenever a new AMI bake is complete, and problem solved. And because AWS is so benevolent, this service is free.
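The trick, roughly, is that the launch template’s ImageId can be the literal string resolve:ssm:&lt;parameter-name&gt;, which EC2 resolves to an AMI ID at launch time, so the bake pipeline only has to update the parameter when it finishes. A hedged sketch of that last step (the parameter name is made up):

```python
# Hedged sketch of the "publish the new AMI" step; the parameter name is hypothetical.
import boto3

AMI_PARAM = "/acrl/ami/k3s-node"  # the launch template's ImageId would be "resolve:ssm:/acrl/ami/k3s-node"

def publish_ami(ami_id: str) -> None:
    """Point the parameter at the freshly-baked AMI so new instances pick it up."""
    boto3.client("ssm").put_parameter(
        Name=AMI_PARAM,
        Value=ami_id,
        Type="String",
        Overwrite=True,
    )
```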
Cool Kubernetes cluster, bro; what’s it do?
So there you have it! ACRL is running a (one-node) Kubernetes cluster for around $35/month. Not too bad, if I do say so myself.
What? What’s that you say? Is it actually running any services or applications? Lmao, of course not. That stuff costs money! Also I got nerd-sniped this week into solving another totally different problem which will absolutely become its own blog post sometime in the future. So, yes, we are in fact spending $35/month for nothing.
Also, there’s the little, small, tiny—minuscule, really—problem that the cluster is living inside our private VPC, which means it’s not actually accessible to anybody from the outside. My current solution to this is to use SSH forwarding and a manual entry in /etc/resolv.conf to point to the internal VPC DNS resolver. This is… not actually sustainable, and isn’t even worthy of being called an “interesting choice”; it’s just dumb.
I think the solution here is to set up Tailscale, but that’s another $6/month and a whole bunch more configuration that I don’t have any time for right now. So instead, we’re just gonna keep running a pointless Kubernetes cluster that does nothing for the foreseeable future. But maybe at least you can use this article to inspire some “interesting” architectural decisions of your own.
As always, thanks for reading!
~drmorr
In the previous two years, I spent a tremendous amount of time over the holidays preparing grant proposals, because that’s when they were due. I decided this year, based on the general state of *gestures despairingly at everything* that applying for grants was a waste of my time, which meant I got to do fun things like “making my monthly AWS bill go up”.
Well, “permanently” running is maybe a stretch. My SLA for the cluster is about one “5”.
OK this analogy got weird.
Another interesting fact about k3s is that it doesn’t—by default—use etcd, it instead runs an embedded SQLite database. You can configure it to use etcd or Postgres if you want, but I do find it really fascinating that the single foundational building block of Kubernetes that supposedly enables all its fancy features (watches and updates and blah blah blah)… isn’t actually necessary.
The only change I made was to disable Helm, because, well, fuck Helm.
I can already hear some of my former coworkers saying “That’s an interesting choice.” My main reasons for using Pulumi were a) I’ve done a lot with Terraform and wanted to try something different, and b) I like Python better than HCL.
“That’s an interesting choice.”
Having now used all three of the big “configuration management” tools—Puppet, Chef, and Ansible—I can say definitively that “they all suck equally, just in different ways”. I picked Ansible because I like Python and Jinja templates.
Foreshadowing, anybody?
Say it with me now: “That’s an interesting choice.”
It was not, in fact, “easy peasy” – the sticking point, naturally, was AWS IAM permissions. I asked both Claude and ChatGPT to help me write the IAM policies, because I’ve done enough of that by hand for one lifetime, and I expected that to be one of the tasks that the chatbots would actually be good at. Well, it turns out that they are not, in fact, good at it.
Looks like I wasn’t able to escape HCL after all.
Pronounced Ayy Emm Eye, never “ahh-mee”.


