Naked pods are Weird, Man

I’ve been doing some work with SimKube for a client recently and have run into (and fixed) a bunch of issues with SimKube’s handling of bare Kubernetes pods. Bare pods are kind of an unusual special case for Kubernetes, so I thought it might be interesting and/or instructive to see some of the problems I’ve been having and what the fixes have looked like. But before I dive into the details, let me give a quick reminder of what we’re all doing here[1]:
A Quick Reminder Of What We’re All Doing Here ™️
SimKube is a “record-and-replay” simulation environment for Kubernetes that we’ve been building for the last couple of years; it works by recording a “trace” of events that occurred in a production Kubernetes environment, and then replaying those same events in a simulated environment. In the simulator, we use KWOK to mock out the application pods and the compute nodes they’re running on, while still running all of the real control plane components. This lets us simulate extremely large Kubernetes clusters on your local dev machine[2].
Under “normal” operation, the trace file collects changes over time to “higher-level” Kubernetes resources, like Deployments, CronJobs, StatefulSets, or any custom resources that you’ve installed. The simulation then uses real controllers to manage those objects and create the actual (simulated) pod objects that they own. This way you can be sure that the controllers are running in the same way that they might in a “real” environment[3].
However, in some cases, you might not want to (or might not be able to) watch the higher-level resources, but you still want to understand some properties of your production cluster (like autoscaling or scheduling behaviour). In this case, you can configure SimKube to watch “bare pods”—that is, the raw pod objects themselves, independent of their owning resources. In principle this will work just as well in SimKube; however, because it’s a bit of an edge case, it hasn’t been as well tested or supported during development, so we’ve encountered some problems in the last few weeks. What kind of problems? I’m so glad you asked!
Bare pods problem #1: Pod Spec Template Paths
The “replay” component of SimKube is called the SimKube driver, which is a pod that sits in your simulation Kubernetes environment; it is responsible for a) replaying all of the events in the collected trace file, and b) intercepting the controller-created pod objects to re-target them to the fake KWOK nodes. In order to apply the correct node selectors, labels, and annotations to the simulated pods, the driver needs to know what these pods “look like”[4]. All of the core Kubernetes resources (Deployments, etc.) and most third-party/custom resources do this via a “pod template”, where users actually embed the pod YAML into the higher-level resource and the owning controller will use that template to “stamp out” copies of the pods it should create. The SimKube driver modifies the contents of this pod template to make sure the pods run in the right place, which means it needs to know “where” in the higher-level manifest that template is defined. SimKube does this through a podSpecTemplatePaths field defined in its configuration[5], where the path is represented in JSON pointer notation. For example, the podSpecTemplatePath for a Deployment is
/spec/template
whereas for a CronJob, it is
/spec/jobTemplate/spec/template
But what should the podSpecTemplatePath for a bare pod be? It’s kind of a weird question because there’s no other controller that is “stamping out” copies of that pod, it’s just… the pod. But it turns out not to matter, because the SimKube driver can still do all the modifications it needs in order to run the sim. So, what is it?
If you answered "/", you would be incorrect, but you’d be wrong in good company because both Ian and I thought that’s what it should be at first. However, it turns out that SimKube prepends the podSpecTemplatePath value to a JSON pointer suffix in several places, and using "/" for bare pods results in a double-slash, which is incorrect and caused everything to crash:
//metadata/labels # This is not correct
The fix for this was extremely simple.
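To make the double-slash problem concrete, here is a minimal sketch in Python. It uses a small hand-rolled RFC 6901 resolver rather than SimKube's actual code, and the names `resolve_pointer`, `labels_pointer`, and `SUFFIX` are made up for illustration. Under RFC 6901 the empty string refers to the whole document, while "/" refers to a key whose name is the empty string, so an empty prefix is one way to avoid the double slash (I'm not claiming that is exactly what the real fix does).

```python
def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON pointer against nested dicts and lists."""
    if pointer == "":
        # The empty pointer refers to the whole document.
        return doc
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON pointer must start with '/'")
    current = doc
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" means "/", "~0" means "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Hypothetical suffix appended to the configured template path.
SUFFIX = "/metadata/labels"


def labels_pointer(template_path):
    """Build the pointer to the pod labels by plain string concatenation."""
    return template_path + SUFFIX


pod = {"metadata": {"labels": {"app": "demo"}}, "spec": {"containers": []}}
deployment = {"spec": {"template": pod}}

# For a Deployment, the prefix points at the embedded pod template.
deployment_labels = resolve_pointer(deployment, labels_pointer("/spec/template"))

# For a bare pod, an empty prefix leaves the suffix untouched.
bare_pod_labels = resolve_pointer(pod, labels_pointer(""))

# A "/" prefix yields "//metadata/labels", whose first token is the empty
# key "", which does not exist in the pod manifest.
broken_pointer = labels_pointer("/")
```

Resolving `broken_pointer` against `pod` raises a `KeyError` in this sketch, because the resolver goes looking for a top-level key named "" before it ever reaches `metadata`.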
Bare pods problem #2: The pod’s already been scheduled
The second “bare pods” problem we ran into was around scheduling; the normal flow of operations in Kubernetes is:
1. Something (usually a human operator) creates a resource (like a Deployment).
2. The Kubernetes controller that owns that resource (e.g., controller-manager) does some stuff that eventually results in pods getting created.
3. The Kubernetes scheduler tries to assign each pod to a node, or reports that no node is available.
4. If the pod was scheduled to a node, the kubelet starts the containers in the pod and kicks off the application code.
The part we’re interested in here is in step 3; see, in the pod specification, there is a nodeName field which indicates what node the pod has been assigned to. However, this field is immutable; once the pod object has been created, you can’t change it later. Instead, you have to use the bind API to bind the pod to the node (I wrote an entire blog post about this wayyyyyyyy long ago in the early days of ACRL).
Again, this is all fine and dandy normally, but if your trace file contains bare pods, it is likely that these pods have already been scheduled to some node in the cluster that the trace came from, which means that the nodeName field will be set to some node that doesn’t even exist in your simulation environment! This clearly will cause problems; the fix here is again extremely simple.
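One plausible way to handle this, sketched in Python: clear the stale node assignment from the traced pod before it is created in the simulation, so the scheduler in the sim gets to place it. This is an assumption about the general shape of such a fix, not a copy of what SimKube actually does, and `strip_node_assignment` is a hypothetical helper name.

```python
import copy


def strip_node_assignment(pod):
    """Return a copy of a pod manifest with spec.nodeName removed.

    The input is left unmodified so the recorded trace data stays intact.
    """
    cleaned = copy.deepcopy(pod)
    spec = cleaned.get("spec")
    if isinstance(spec, dict):
        # Drop the node chosen in the original cluster; it does not exist
        # in the simulation environment.
        spec.pop("nodeName", None)
    return cleaned


# A bare pod as it might appear in a trace (values are made up).
traced_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "bare-pod", "namespace": "default"},
    "spec": {
        "nodeName": "prod-node-17",
        "containers": [{"name": "app", "image": "nginx"}],
    },
}

sim_pod = strip_node_assignment(traced_pod)
```

Working on a deep copy keeps the original trace object unchanged, which is handy if the same trace is replayed more than once.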
Bare pods problem #3: My pods are running forever!
My initial thinking when tracking bare pods was that they would work just like any other Kubernetes resource: we track when they’re created and we track when they go away, and everything else will work as normal. Pods are weird, though, because they can finish without being deleted. This is normal in the case of Jobs and CronJobs; it turns out to also be normal in the case of tracking bare pods.
Fortunately, SimKube has already built in primitives to handle this for Jobs and CronJobs, using the lifecycle annotation capabilities of KWOK. Basically, in addition to watching changes to “higher-level”[6] Kubernetes resources, the SimKube tracer can also be configured to watch for pod lifecycle events (e.g., when the pod containers start and finish). It records all this information inside the trace so these pod lifecycles can be replayed. In the sim, whenever we detect matching pod lifecycle data in the trace, we set the following annotations on the simulated pods:
simkube.kwok.io/stage-complete=true
simkube.kwok.io/stage-complete-time="2026-06-01 11:00:00"
We then create a KWOK stage such that when the indicated completion time has passed, KWOK will move the pod from the “Running” phase to the “Succeeded” phase. Cool! We should just be able to use this for bare pods, right?
Turns out, yes you can! Say it with me now: “The fix is extremely simple.”
HAHAHAHAHAHAHA I tricked you good. The fix, while being relatively small, isn’t simple.
There are two problems that you run into: the first is that, in order to limit the amount of data in a trace file, we only record the pod lifecycle data for pods that are owned by a resource that the tracer is tracking. For bare pods, there is no such owner! It turns out that we can just set the pod to be its own owner inside the trace file; it’s a little weird, it’s arguable whether this is correct or not, but it does work without needing to do a bunch of special-casing.
The second problem is that you need to know whether the pod finished running before the simulation started or not. Remember: the pod object can still exist in etcd, long after all of the containers in the pod have completed running. If you don’t know when the containers stopped relative to the start of the simulation, it’s going to throw off all your analysis[7][8][9][10].
Unfortunately there’s no good way to track this without adding some kind of new field into the trace file, which I was reluctant to do. It’s not impossible, and I made some changes with SimKube 2.0 so that I can make these changes in a backwards-compatible way, but I still wanted to avoid it if possible. Then I came up with a ~dirty hack~ genius solution: instead of modifying the trace file format, I can instead just modify the Kubernetes resources in the trace. Now the tracer inserts two annotations on every object it tracks indicating the original creation time and deletion time of the resource.
Most of the time, this information is just ignored; however, if the resource in the trace is a bare pod, the simulation driver will see that these values are set on the pod, and it can then use that to calculate the remaining lifecycle time for that pod object, relative to the starting timestamp on the simulation.
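The arithmetic behind "remaining lifecycle time" can be sketched roughly as follows. This is a simplified Python illustration under a few assumptions: the pod's containers are treated as starting at the pod's original creation time, the recorded running duration comes from the pod lifecycle data, and `remaining_runtime` is a hypothetical helper, not a SimKube function.

```python
from datetime import datetime, timedelta, timezone


def remaining_runtime(created_at, run_duration, sim_start):
    """How much longer a pod should run, measured from the simulation start.

    created_at:   when the pod was originally created (assumed to be when
                  its containers started running)
    run_duration: how long the containers ran in the original cluster
    sim_start:    the timestamp in the original timeline at which the
                  simulation begins
    """
    finished_at = created_at + run_duration
    if finished_at <= sim_start:
        # The pod had already completed before the simulation window began.
        return timedelta(0)
    # Time the pod had already spent running before the simulation started;
    # zero if the pod was created after the simulation start.
    already_elapsed = max(sim_start - created_at, timedelta(0))
    return run_duration - already_elapsed


sim_start = datetime(2026, 6, 1, 10, 45, tzinfo=timezone.utc)

# Created 45 minutes before the sim starts, ran for an hour in total.
partially_done = remaining_runtime(
    datetime(2026, 6, 1, 10, 0, tzinfo=timezone.utc),
    timedelta(hours=1),
    sim_start,
)

# Finished long before the sim starts, even though the object still exists.
already_done = remaining_runtime(
    datetime(2026, 6, 1, 9, 0, tzinfo=timezone.utc),
    timedelta(minutes=30),
    sim_start,
)
```

In this sketch, a result of zero means the pod should show up as already completed rather than as a running pod, which is the case that inflates the running-pod count if it is missed.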
I have no idea if that solution is going to bite me in the future, but for now it seems to work OK.
Also, we’re writing some tests to check for regressions in the bare pod behaviour, cuz those things are *#%&ing weird, man.
~drmorr
[1] Hi new readers from Reddit!
[2] Or anywhere else you might want, like, say, in your CI pipeline.
[3] Have I used enough “air” “quotes” for you in this post so far? Don’t worry, I’m sure there’ll be more!
[4] Some people have complained about the number of semicolons and footnotes that I use in my blog posts, so I thought I’d branch out and start adding more quotation marks.
[5] You might be wondering why this is plural; some third-party Kubernetes resources create multiple different types of pods, which means they have multiple different pod templates defined in different locations in the YAML.
[6] There’s those darned quotes again.
[7] Indeed, I discovered this problem initially because I was comparing simulation results to the actual cluster, and discovered that there were more than twice as many running pods in the simulation as there “should have been”.
[8] Boom, quotes in footnotes; now I just need a semicolon—and maybe an em-dash.
[9] This is how you know I’m not writing this with AI. If I ever write a post without footnotes, you’ll know that I’ve sold out and just asked ChatGPT to write something for me.
[10] Sadly, Substack doesn’t let you put footnotes in footnotes, which is incredibly annoying.


