ACRL is a CA now
“CA” is one of the most overused acronyms in tech. It means, among other things, “Cluster Autoscaler” (i.e., the thing that scales your Kubernetes cluster up and down), “Corrective Action” (that is, a post-incident task you do to ensure that the incident or outage doesn’t happen again), and “Certificate Authority” (the process or system that cryptographically signs all of your TLS certificates so that all your HTTPS traffic can be encrypted). That’s not to mention the more pedestrian meanings of “California” and “Canada”.
Put all this together, and things get really confusing really fast, especially when your CA engineer needs to work with his CA counterpart to write a CA because the CA couldn’t get new credentials from the CA, which is a thing I have definitely never had happen to me.
All this is to say that ACRL has been a CA for a while: we’re a California-based company that has definitely autoscaled some clusters, and given the circumstances of my departure from my previous gig, you could argue that ACRL is also a corrective action. But there’s one CA that we haven’t been before, which is a Certificate Authority; however, late last year that changed! In this post I’m going to talk about the why and how of it all.
Why on earth do you need to issue certificates?
OK, so let’s take a step back: what is a certificate[1][2]? I alluded to this briefly in the introduction, but a certificate is a cryptographic tool for performing encryption and authentication. It is built around a key pair: a “public key” and a “private key”. The public key, as you might expect, can be shared publicly (it’s the part that actually lives inside the certificate), but the private key needs to be kept secret. There’s a lot of math involved which I won’t go into, but the basic idea is that you give your public key to someone else, and then you can prove to them that you are the owner of the private key by decrypting a bit of data that they encrypted with the public key. This is one of the underpinnings of the modern internet; most websites these days use public key cryptography to provide a secure connection to their site, and in fact many browsers will flash scary warnings at you if you visit a site that doesn’t use this security measure.
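To make the “prove you own the private key” idea concrete, here is a toy sketch in Python using textbook RSA with comically small primes. The numbers and function names are made up for illustration; real systems use vetted libraries, padding, and keys thousands of bits long, and they usually prove key ownership with signatures rather than raw decryption.

```python
# Toy "prove you hold the private key" exchange using textbook RSA.
# Illustration only: tiny primes, no padding, not secure.

# Key generation with hypothetical toy values.
p, q = 61, 53
n = p * q                  # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

public_key = (n, e)
private_key = (n, d)


def encrypt(message: int, key: tuple) -> int:
    """Encrypt a small integer with the public key."""
    modulus, exponent = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: tuple) -> int:
    """Recover the integer with the private key."""
    modulus, exponent = key
    return pow(ciphertext, exponent, modulus)


# The verifier picks a challenge and encrypts it with the public key.
challenge = 65
ciphertext = encrypt(challenge, public_key)

# Only the holder of the private key can recover the challenge.
response = decrypt(ciphertext, private_key)
print(response == challenge)  # prints True
```

Anyone can run the `encrypt` step, since it only needs the public key; being able to produce the right `response` is what demonstrates possession of the private key.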
It turns out that this is just the beginning; for a long list of reasons, we’ve established a chain of trust with these certificates, so that very often you’re no longer verifying that you have the private key, you’re verifying that you have the private key and that private key is trusted by some third party. The third party has their own certificate, which is trusted by another third party, and so on and so forth. These third parties are called “Certificate Authorities”, and one of the reasons why they exist is to make certificate revocation easier.
Imagine, for example, that you accidentally shared your private key on GitHub. Whoops! Now anybody who happened to look at your GitHub repo while it was there has your private key, and they can pretend to be you! They can send encrypted messages as you or pretend to be you when visiting websites. You can take the private key down, but it would be really great if there was some way to signal to the entire world that if anybody ever uses that private key again, they are a bad person and should feel bad. That’s (one of) the functions of a Certificate Authority: they maintain revocation lists where you can look up and see whether a certificate is still “trusted”.
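Here is a deliberately tiny model of the chain-of-trust-plus-revocation idea, in the same handwavy spirit as the rest of this section. The certificate names and the dictionary layout are hypothetical; real certificates are X.509 documents, and each hop in the chain is checked with a cryptographic signature rather than a dictionary lookup.

```python
# Toy model of a chain of trust with a revocation list.
# Each "certificate" here just records who issued it.
ISSUED_BY = {
    "client-cert": "signing-ca",
    "signing-ca": "root-ca",
    "root-ca": "root-ca",  # root certificates are self-signed
}

TRUSTED_ROOTS = {"root-ca"}


def is_trusted(cert: str, revoked: set) -> bool:
    """Walk from cert up to a trusted root, rejecting any revoked link."""
    seen = set()
    while cert not in seen:
        seen.add(cert)
        if cert in revoked or cert not in ISSUED_BY:
            return False
        if cert in TRUSTED_ROOTS:
            return True
        cert = ISSUED_BY[cert]
    return False  # looped without ever reaching a trusted root


print(is_trusted("client-cert", revoked=set()))            # prints True
print(is_trusted("client-cert", revoked={"client-cert"}))  # prints False
```

Note that revoking something in the middle of the chain (say, `signing-ca`) also cuts off everything that was issued beneath it.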
Again, all very cool technology, based on a lot of interesting math, but why is ACRL a Certificate Authority now? Well, the answer is simple: SimKube.
Oh come on. You’re telling me your open-source Kubernetes simulator needs to be able to issue certificates?
Well, not exactly. See, SimKube itself isn’t very useful on its own; if you wanted to, for example, compare Cluster Autoscaler and Karpenter, there are a lot of extra components that you might want to install to make that comparison easier, and some of those components might be hosted in a private container registry on AWS. So you need some way to give folks access to that container registry. Now, AWS has a way to manage permissions and authentication already: it’s called IAM (Identity and Access Management), and it’s some of the most arcane nonsense you will ever have to deal with[3][4].
Fortunately for us, AWS provides a way to make it even more arcane: IAM Roles Anywhere. The sales pitch for IAM Roles Anywhere is, essentially, “What if we took an incredibly complex permissions management system and hot-glued an incredibly complex cryptographic identity scheme on top?”
The nice (????) thing about this, and the reason why ACRL is now a CA, is that it gives us a one-step process to grant access to parts of our private AWS account. If a client needs some internal tool or component to make simulation easier, all I have to do is send them a certificate. They don’t even need to have their own AWS account! They just install the cert and they’re off and running. And then, at some later point when they no longer need access to ACRL’s AWS account, I just revoke the certificate and AWS won’t let them in anymore. Cool beans!
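What “install the cert” looks like on the client side depends on the setup. One plausible version, assuming the client uses AWS’s `aws_signing_helper` credential helper through the `credential_process` setting in their AWS config file, is sketched below. The profile name, file paths, and ARNs are all placeholders I made up, not ACRL’s actual values.

```python
import configparser
import io


def roles_anywhere_profile(name, cert_path, key_path,
                           trust_anchor_arn, profile_arn, role_arn):
    """Render an AWS config profile that fetches credentials via a certificate."""
    command = " ".join([
        "aws_signing_helper", "credential-process",
        "--certificate", cert_path,
        "--private-key", key_path,
        "--trust-anchor-arn", trust_anchor_arn,
        "--profile-arn", profile_arn,
        "--role-arn", role_arn,
    ])
    config = configparser.ConfigParser(interpolation=None)
    config[f"profile {name}"] = {"credential_process": command}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder values only; substitute whatever the certificate issuer sends you.
print(roles_anywhere_profile(
    name="acrl-demo",
    cert_path="/path/to/client-cert.pem",
    key_path="/path/to/client-key.pem",
    trust_anchor_arn="arn:aws:rolesanywhere:REGION:ACCOUNT_ID:trust-anchor/TRUST_ANCHOR_ID",
    profile_arn="arn:aws:rolesanywhere:REGION:ACCOUNT_ID:profile/PROFILE_ID",
    role_arn="arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME",
))
```

The printed stanza would go in the client’s AWS config file, after which tools that read that profile can request temporary credentials using the certificate instead of an IAM user.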
I’m still not convinced that any of this is necessary, but OK, at least tell me how you did it.
The process of setting up a certificate authority that works with AWS IAM is non-trivial; there’s quite a lot you need to think about, especially if you want to do it securely. Fortunately for us, someone else already did all the hard work! A German company called Q-Solution has published an open-source Terraform module for creating a certificate authority and easily using that authority to generate public/private keys.
The steps for setting it up were relatively straightforward[5]:
First, I created a second AWS account for the CA; these certificates aren’t guarding anything particularly sensitive right now, but they definitely could in the future, and if an attacker somehow gains access to my primary AWS account, I don’t want them to be able to muck around with certificates. This is probably overkill, honestly, but as one of my trusted colleagues has repeatedly told me, this whole CA scheme is insane and ridiculous overkill, so why not go all in on it?
Once I had the second AWS account, I really wanted to make sure that I got notified whenever anybody used the account for anything. For this purpose, I created a CloudTrail log (you get one free!), and then set up EventBridge to send notifications from CloudTrail to the Simple Notification Service (SNS), which emails me. Note that CloudTrail is configured in the main account (aka the management account), but it actually aggregates events from all the different accounts.
This was incredibly annoying to get working: ACRL is using AWS Single Sign-On (SSO) to access both the main account and the CA account; when you sign in to an account with AWS SSO, it logs a Federate event to CloudTrail. But users can also access the CA account from the CLI, which doesn’t go through SSO, but instead uses a GetRoleCredentials call. So I needed to monitor two separate types of events, neither of which, confusingly, is an “AWS API Call”; instead, each is an “AWS Service Event via CloudTrail”[6]. And lastly, it turns out that GetRoleCredentials is a read-only event, which isn’t tracked in EventBridge by default; instead, you have to turn on tracking of read-only events in EventBridge, and the process to do this is a semi-secret flag that you can only set from the command line, and not through the AWS console[7].
Anyways, once I got all that set up, I decided I hadn’t gone deep enough down the rabbit hole, and also made this janky “login monitoring system” alert me if anyone logs in using the AWS root user. Because why not. But finally, we can move on to the last step:
Point the certificate authority Terraform module at my brand-spanking-new AWS account, run it, and immediately get a million emails saying “Someone just logged into your AWS account!!!111!1!11one”
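For the curious, the login-monitoring rule from the second step might look roughly like the sketch below. The detail-type string and the rule state come from the description above; the `eventSource` value and the rule name are my assumptions, so check them against real events in your own trail. The dictionary has the shape of EventBridge’s PutRule request, but nothing here actually talks to AWS, and the `matches` helper is only a tiny local stand-in for EventBridge’s real matching logic.

```python
import json

# Event pattern for the two SSO sign-in events described above.
# The eventSource value is an assumption; verify it in your own CloudTrail logs.
SIGN_IN_PATTERN = {
    "detail-type": ["AWS Service Event via CloudTrail"],
    "detail": {
        "eventSource": ["sso.amazonaws.com"],
        "eventName": ["Federate", "GetRoleCredentials"],
    },
}

# Arguments in the shape of EventBridge's PutRule request.
rule_request = {
    "Name": "notify-on-sso-sign-in",  # hypothetical rule name
    "EventPattern": json.dumps(SIGN_IN_PATTERN),
    # Needed so read-only events such as GetRoleCredentials are delivered.
    "State": "ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS",
}


def matches(pattern: dict, event: dict) -> bool:
    """Small subset of EventBridge matching: exact values and nested dicts."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


sample_event = {
    "detail-type": "AWS Service Event via CloudTrail",
    "detail": {"eventSource": "sso.amazonaws.com", "eventName": "Federate"},
}
print(matches(SIGN_IN_PATTERN, sample_event))  # prints True
```

A typo in the detail-type string makes the pattern silently match nothing, which is exactly the failure mode that makes this so annoying to debug.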
The certificate authority module is pretty nice actually; when you first set it up, it runs a bunch of AWS Lambda functions to generate your root certificate and signing certificates, and then any time I need to generate a “client” certificate, I can just use the provided Python script to do so. Managing the revocation list is also not too bad: any time I need to revoke a certificate, I just add its SHA to the revocation list and trigger the “revocation Lambda”, and voila, nobody can use that certificate to access my account anymore. Pretty neat!
Huh. I guess that is kinda cool.
Thank you. I’m glad you finally agree. Anyways, the best part of all this is that now any time anybody wants to know if ACRL is a CA, I can confidently answer “Yes”, regardless of what definition of CA they are referring to[8].
As always, thanks for reading.
~drmorr
[1] Feel free to skip this section if you’re a computer security guru, because it’s full of handwaving and statements that maybe aren’t outright lies, but they’re definitely not true either.
[2] On the other hand, if you aren’t a security expert and want to know more about all this stuff, the Wikipedia article on PKI isn’t a bad place to start.
[3] Pray that you never have to.
[4] Even Claude and ChatGPT are bad at IAM policies, which is saying something. I’m not sure what it’s saying, but it’s something.
[5] I’m using “relatively straightforward” to mean the same thing as “the proof of this theorem is trivial” in your college math textbook.
[6] You must use this string exactly when you set it up, and if you use the wrong string or make a typo, nobody will tell you, but none of your events will get delivered.
[7] Using the cleverly-named EventBridge parameter, ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS.
[8] Except for Canada, I guess. I don’t think ACRL will ever be Canada.


