Doing Kubernetes Configuration Good
Disclaimer: in this post I’m going to be talking about something I don’t have very much experience with. It is highly likely that I’m re-inventing one or more wheels here, but I had fun doing it, and that’s what matters, right? Anyways, if you know of better options for doing this type of stuff, feel free to leave a comment :)
Kubernetes Manifest Generation
As I’ve been getting into my various projects at Applied Computing, I’ve just been making up hand-crafted artisanal YAML files as I went along. This has been working well enough for me thus far, and honestly, I could probably continue doing that for quite some time, but I didn’t like it. At previous companies, we’ve had bespoke YAML generators that let you write something a little nicer than raw Kubernetes YAML (one of those projects was PaaSTA, which I still think hit the sweet spot between expressiveness and simplicity). But I didn’t have anything like that for myself. I was expecting to have to get into helm at some point, which I was kind of dreading for no real well-defined reason, just that YAML templating… sucks.
The other project I’m aware of in this space is kustomize, which is the Kubernetes SIG-CLI-sponsored way of building applications. It’s a “YAML-and-filesystem” language: essentially, you can write your Kubernetes YAML with shared bits in a base YAML file, and per-environment customizations or patches (called overlays) that get applied based on your filesystem structure. It’s a “pure-YAML” approach, so there’s no templating or code involved. It has the nice feature that it’s built into kubectl, so it provides a somewhat “native” developer experience.1
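To give a rough idea (I haven’t actually used kustomize, so take this as a sketch of the usual conventions rather than gospel), a kustomize project tends to be laid out something like this, with each overlay’s kustomization.yaml pulling in the base and applying its own patches on top:

myapp/
├── base/
│   ├── deployment.yaml
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    │   └── kustomization.yaml
    └── prod/
        └── kustomization.yaml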
However, a few weeks ago on Mastodon, I saw someone mention a project called cdk8s and that it was easy to use—so I thought this might be a good opportunity to a) simplify my life a bit, and b) explore some projects that other folks are doing in this space.
So, what is cdk8s? It’s a Cloud Native Computing Foundation sandbox project built by AWS that is designed for… You know what, here, I’ll just quote the project:
cdk8s is an open-source software development framework for defining Kubernetes applications and reusable abstractions using familiar programming languages and rich object-oriented APIs.
Translation: we generate Kubernetes YAML from code. In this post, I’ll go into a little bit of my experiences with the project, and also introduce a wrapper that I wrote around it to make it even easier (imo) to use.
cdk8s: digging in
First of all, I had to install the app, which is done through npm. Once installed, the docs say to run cdk8s init <app-type> in your chosen directory. The supported options (at time of writing) are Python, Go, Java, JavaScript, and TypeScript, though it seems fairly clear that TypeScript is the “recommended” option. However, I don’t know TypeScript and wanted to do some more Python development, so I picked Python:
> cdk8s init python-app
Initializing a project from the python-app template
[snip--lots of output]
=====================================================================
Your cdk8s Python project is ready!
cat help         Prints this message
cdk8s synth      Synthesize k8s manifests to dist/
cdk8s import     Imports k8s API objects to "imports/k8s"
Deploy:
kubectl apply -f dist/
=====================================================================
Ok, let’s see what it created for us:
> ls
cdk8s.yaml dist help imports main.py Pipfile Pipfile.lock
As promised, the help file contains the help message that it printed at the end. This feels a little half-baked to me, but ok, moving on. What’s in that cdk8s.yaml file?
> cat cdk8s.yaml
language: python
app: pipenv run python main.py
imports:
- k8s
This seems to be instructions to the cdk8s CLI on how to actually synthesize your code into Kubernetes YAML. Which means, if I’m reading this right, we don’t actually need to run cdk8s synth; we can just run pipenv run python main.py. And in fact, that turns out to be correct — but we’ll get to that a bit later on in the post. Let’s move on.
The dist directory is just the output location for the manifests. Let’s see what’s in that imports directory…
> ls -T imports
imports
└── k8s
├── __init__.py
├── _jsii
│ ├── __init__.py
│ └── k8s@0.0.0.jsii.tgz
└── py.typed
Ok, so this is presumably where the Kubernetes API definitions are… I wonder what JSII is? Oh, this is also an AWS project:
jsii allows code in any language to naturally interact with JavaScript classes. It is the technology that enables the AWS Cloud Development Kit to deliver polyglot libraries from a single codebase!
Wow. There must be some arcane stuff going on inside that codebase. I… don’t think I want to investigate that any further. But hey, it looks like we have type hints, which is awesome! I basically refuse to do any Python development without mypy.2
Lastly, we’ve got main.py, which holds the actual code, and a Pipfile and accompanying lockfile — pretty standard stuff. So let’s see what it’s like to actually use the library!
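(For reference, the main.py that the python-app template generates looks roughly like the following; I’m paraphrasing from memory, so the details may differ slightly. Note the app.synth() call at the bottom: that’s why just running the script is equivalent to cdk8s synth, as promised above.)

#!/usr/bin/env python
from constructs import Construct
from cdk8s import App, Chart


class MyChart(Chart):
    def __init__(self, scope: Construct, id: str):
        super().__init__(scope, id)

        # define your Kubernetes resources here


app = App()
MyChart(app, "my-project")
app.synth()  # writes the manifests to dist/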
cdk8s: generating YAML
The first thing I tried to do was to create a simple nginx pod, which is pretty straightforward:
from constructs import Construct
from cdk8s import Chart

from imports import k8s  # the bindings generated by cdk8s import


class Simkube(Chart):
    def __init__(self, scope: Construct, namespace: str, id: str):
        super().__init__(scope, id)

        label = {"app": "nginx"}
        k8s.KubeDeployment(
            self, "deployment",
            spec=k8s.DeploymentSpec(
                selector=k8s.LabelSelector(match_labels=label),
                template=k8s.PodTemplateSpec(
                    metadata=k8s.ObjectMeta(labels=label),
                    spec=k8s.PodSpec(
                        containers=[
                            k8s.Container(
                                name="nginx",
                                image="nginx:latest",
                                ports=[k8s.ContainerPort(
                                    container_port=8080)],
                            )]))))
So, already we can see some cool things: defining the label as a variable ahead of time lets us re-use it later on, and you can imagine how you might start to build up some functions that, for example, create common sidecar containers that you want injected into every pod (there’s a rough sketch of what that might look like below). Then, when you run cdk8s synth, it does a bunch of type-checking and schema matching to make sure you’ve filled things in correctly, which is awesome. I don’t know how much time I’ve wasted on subtle type errors in my Kubernetes manifests, but this means I don’t have to worry about that particular frustration anymore.
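For example, a helper along these lines (entirely hypothetical: the image, port, and names here are made up for illustration) could be shared across every chart you write:

def with_logging_sidecar(containers: list[k8s.Container]) -> list[k8s.Container]:
    # Hypothetical helper: append a (made-up) log-shipping sidecar to whatever
    # containers a pod already defines, so every chart gets it for free.
    sidecar = k8s.Container(
        name="log-shipper",
        image="fluent/fluentd:latest",
        ports=[k8s.ContainerPort(container_port=24224)],
    )
    return containers + [sidecar]

Then the PodSpec above could just say containers=with_logging_sidecar([...]) and every pod would pick up the sidecar automatically.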
The generated output is what you’d expect, so I’m not gonna post it here. My one minor concern is that resource names are pretty verbose. Every resource name has the following format: <chart-name>-<resource-name>-<hash>, where the hash is a unique-but-stable identifier to prevent name collisions. So, for example, the nginx deployment I have above would look like nginx-deployment-abcd123. This means that, by the time you get to the actual pods in the deployment, the name has 3 different sets of random character strings attached: nginx-deployment-abcd123-wqer42-m4n31, which starts to get pretty cumbersome. At first I didn’t really like this, but I’ve more-or-less come to terms with it.3
While we’re on the subject of the generated output, another cool feature of cdk8s is the ability to add dependencies between objects. It’s always frustrating to have your pod fail to start because you forgot to create the ConfigMap that it depends upon; cdk8s lets you specify that dependency so that the ConfigMap is always created before the pod. I didn’t need it in the example above, but it looks roughly like the sketch below.
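A minimal sketch of what that might look like (this would live inside the chart’s __init__, like before; the ConfigMap contents are placeholders, and while I believe add_dependency is the right call, treat the details as approximate):

label = {"app": "nginx"}
config_map = k8s.KubeConfigMap(
    self, "nginx-config",
    data={"index.html": "<h1>hello</h1>"},
)
deployment = k8s.KubeDeployment(
    self, "nginx-deployment",
    spec=k8s.DeploymentSpec(
        selector=k8s.LabelSelector(match_labels=label),
        template=k8s.PodTemplateSpec(
            metadata=k8s.ObjectMeta(labels=label),
            spec=k8s.PodSpec(containers=[
                k8s.Container(name="nginx", image="nginx:latest"),
            ]),
        ),
    ),
)

# Ensure the ConfigMap is emitted (and applied) before the Deployment.
deployment.add_dependency(config_map)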
But, to be honest, this isn’t actually that much better than raw YAML. It’s still pretty verbose, and it requires a lot of looking things up in the API documentation to remember how they’re structured and how to fill things in.
And speaking of the documentation, the cdk8s docs need some work. All of the examples in the docs are presented in TypeScript, and although there is API documentation for the other languages, there is not currently any API documentation for the Kubernetes objects themselves! This is a bit problematic, especially because in Python, the various field names are given in snake_case, whereas Kubernetes uses camelCase. In most cases I was able to intuit what the right Python name should be for a field, but there were a few cases I couldn’t, and I didn’t have anywhere to look it up.4 It also leads to some weird idiosyncrasies where the named function arguments use one style, but (e.g.) dict keys that get passed through to Kubernetes use a different style.
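To illustrate with a couple of fields I happen to know off-hand (assuming the generated imports from earlier):

from imports import k8s

# Python side: snake_case keyword arguments...
k8s.PodSpec(
    service_account_name="nginx",
    node_selector={"kubernetes.io/os": "linux"},
    containers=[
        k8s.Container(
            name="nginx",
            image="nginx:latest",
            ports=[k8s.ContainerPort(container_port=8080)],
        ),
    ],
)

# ...which corresponds to camelCase fields in the synthesized YAML:
#   serviceAccountName: nginx
#   nodeSelector:
#     kubernetes.io/os: linux
#   containers:
#   - name: nginx
#     image: nginx:latest
#     ports:
#     - containerPort: 8080

Note that keys in plain dicts (like node_selector here) get passed through verbatim, so you end up mixing the two styles in one file.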
So, my verdict is: some rough edges, but overall, cdk8s seems pretty cool! I’m pretty excited to use it more. But surely we can do better? What I really want is a nice, high-level wrapper around “common” things that I want to do, one that doesn’t require me to go look up/remember which values for which fields I need to set every single time.
Introducing: 🔥Config
It turns out that cdk8s actually has a higher-order wrapper called cdk8s+, which does something very close to what I want. Unfortunately, a few things turned me off from it. The documentation for cdk8s+ is even more sparse and even more TypeScript-centric than cdk8s’s, and I found it fairly opaque to figure out how to use it beyond the few examples they give.5 I’m sure I could have figured it out, but I actually wanted to try my hand at building my own wrapper which would support a few of the operations that I’ve always wanted in this type of tool.
Firstly, I really wanted something that uses the builder pattern. Kubernetes manifests seem tailor-made for that type of design, which seeks to “separate the construction of a complex object from its representation”. Secondly, I really wanted a tool that keeps track of all the “fiddly bits” of Kubernetes objects. And, lastly, I wanted something that was a bit more Python-centric than cdk8s+ was.
Two examples of “fiddly bits” I wanted to support are VolumeMounts and Services. I am constantly struggling with these two objects in particular, because they require you to sync up your config in two or more different places: for a Volume, you first have to define the volume in the PodSpec, and then you have to reference the same volume in the ContainerSpec (and if your volume comes from, say, a ConfigMap, then you additionally have to make sure the volume definition is synced with the ConfigMap definition). If you make a typo or change a name in one place and forget to change it everywhere, then you have a lot of frustrating and time-consuming kubectl apply cycles awaiting you. The same is true for Services, where you have to make sure that the Pod port and the Service port match up, and the Service selector matches the Pod label selector, etc. etc. etc.
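To make the Volume case concrete, here’s roughly what the wiring looks like with the plain cdk8s API (names made up): “node-skeleton” has to match between the Volume and the VolumeMount, the ConfigMap’s name has to match in a third place, and nothing warns you if they drift apart.

from imports import k8s

# The volume name must agree in two places, and the ConfigMap name in a third.
volume = k8s.Volume(
    name="node-skeleton",
    config_map=k8s.ConfigMapVolumeSource(name="simkube-configmap"),
)
mount = k8s.VolumeMount(name="node-skeleton", mount_path="/config")

pod_spec = k8s.PodSpec(
    volumes=[volume],
    containers=[
        k8s.Container(
            name="simkube",
            image="localhost:5000/simkube:latest",
            volume_mounts=[mount],
        ),
    ],
)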
So I built a thing: 🔥Config is a Python wrapper around cdk8s that does all this stuff for you. It is extremely alpha at this point, so there are no docs or tests, and the API may change at a moment’s notice, so please don’t depend on this library for anything critical right now. But, lemme show you what it looks like!
class Simkube(Chart):
    def __init__(self, scope: Construct, namespace: str, id: str):
        super().__init__(scope, id)

        cm = k8s.KubeConfigMap(
            self, "configmap",
            metadata={"namespace": namespace},
            data={"node.yml": NODE_YML}
        )

        volumes = (fire.VolumesBuilder()
            .with_config_map("node-skeleton", "/config", cm))
        env = (fire.EnvBuilder()
            .with_field_ref("POD_NAME", DownwardAPIField.NAME))

        container = (fire.ContainerBuilder(
                name="simkube",
                image="localhost:5000/simkube:latest",
                command="/simkube",
                args=[
                    "--node-skeleton",
                    volumes.get_path_to("node-skeleton"),
                ],
            ).with_env(env)
            .with_volumes(volumes)
            .with_security_context(Capability.DEBUG))

        (fire.DeploymentBuilder(
                namespace=namespace,
                selector={"app": "simkube"},
            ).with_service_account_and_role_binding(
                'cluster-admin', is_cluster_role=True,
            ).with_containers(container)
            .with_node_selector("type", "kind-worker")
            .with_dependencies(cm)
        ).build(self)
Don’t worry too much about what this pod is doing right now; that’ll be a subject for a different blog post. But you can see that, with only a few more lines of code than in our previous example, we’re accomplishing quite a bit more. We’re still creating a cdk8s Chart object, and we’re just using the vanilla cdk8s API to create a ConfigMap. The next block of code is where things get cool. First, we create a VolumesBuilder object, and we tell it to create a volume based off the ConfigMap that we just built, and give the volume the name node-skeleton. If we had other volumes we wanted to add, we could do so with additional .with_X statements.
But take a look at where we’re using those volumes. First, we pass them into the ContainerBuilder with a .with_volumes() method call. In this example, we only have one volume to mount, but a common scenario is that you have multiple containers and multiple volumes, and each container needs a different subset of volumes mounted. The .with_volumes() method on the ContainerBuilder takes a list of volume names as an optional second argument, and it’s smart enough to only attach the specified volumes to the containers (see the one-liner below).
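In other words, presumably something like this (a hypothetical call, going off the description above rather than any real docs):

# Only mount the "node-skeleton" volume in this container, even if
# `volumes` defines others.
container = container.with_volumes(volumes, ["node-skeleton"])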
Next, note the arguments list to the container: another really common operation is that some file in the mounted volume gets passed in as an argument to the program running in the container, so if you change the name of that file but forget to change the argument, your program won’t work. But here, we just call the .get_path_to() method and it will do the right thing.
There’s some other cool stuff happening in the ContainerBuilder which I think is mostly self-explanatory, so I’m going to skip ahead to the DeploymentBuilder object. The main requirement for a Kubernetes Deployment is a label selector so that it knows what pods to watch, so we specify that in the constructor. Note that we don’t have to specify that label anywhere else! The DeploymentBuilder will automatically inject the chosen label selector into the PodTemplateSpec that it manages; we don’t have to manually set the label on the created pods. (We can set other labels on the underlying pods by calling .with_pod_label(), or we could set a label on the Deployment itself by calling .with_label().)
Next up, we associate a service account and a role binding with the Deployment, which is another one of my “fiddly bits”. The DeploymentBuilder is smart enough to create these objects and set up all the dependencies between them without me having to do any work. Hooray!
The next step is to inject the containers into the pod with the .with_containers() method; if we had sidecar containers, we could either call .with_containers() multiple times, or we could just list them all out in a single call. Note that we just pass in the ContainerBuilder object; we don’t actually call .build() on the containers. The DeploymentBuilder will call .build() on all its subresources once we call its .build() method.
Lastly, we set up a node selector for the pods and add the ConfigMap as an explicit dependency so that cdk8s creates it before the Deployment, and then we call .build(self), which constructs the appropriate cdk8s object(s) that know how to synthesize themselves into YAML. Pretty slick!
“But what about the volumes?” I hear you ask. We’ve created the volume mounts in the container, but we still need to specify the actual volume configs in the PodSpec. Have no fear! The DeploymentBuilder will automatically detect which volumes are used in the containers, and create the requisite configs in the PodSpec.
This post is getting long already, but one last thing I’ll point out is that 🔥Config embeds the cdk8s imports directory, so you actually don’t have to install the cdk8s CLI to use it. You can just specify 🔥Config as a dependency in your Python project, and it takes care of the rest.
Wrapping Up
Ok, so there you have it: cdk8s and 🔥Config. Both projects are definitely on the “young” side and have some rough edges, but I’m really excited about where cdk8s is going! I think they’re doing a lot of good things, and I’m sure the rough edges and documentation issues will improve with time. What about 🔥Config? Well… I’m planning to keep trying to use it personally, but it’s definitely not ready for prime time, and it may never be. It’s more of an “aspirational” project than anything else—i.e., this is the tool that I wish I’d had all along, and maybe someday it will be usable by someone besides me. But then again? Maybe not! Who knows.
Thanks for reading,
~drmorr
1. I just want to reiterate here that I’ve used neither helm nor kustomize up till now in my career, and what I wrote in this post is based on a very cursory and limited understanding of how the projects work, so… don’t flame me too hard in the comments section, plz.
2. Excuse me, sir? Your Rust is showing.
3. Tools like kubectx and kube-ps1 help a lot here, and if you really don’t like the hash appended to the resource name by cdk8s, you can turn it off.
4. There’s actually one place you can look these up, which is what I ended up doing. Remember imports/k8s/__init__.py? All of the Kubernetes API objects are defined in this file. It is a little horrifying.
5. Actually, one thing I still don’t understand about cdk8s and cdk8s+ is that cdk8s+ is versioned: e.g., if your cluster runs Kubernetes 1.27, you must use cdk8s+ 1.27. However, cdk8s itself is unversioned, which doesn’t make sense to me… Shouldn’t cdk8s also be versioned based on the Kubernetes API version it’s targeting? Is it just always targeting the latest version? I don’t know.