Writing Tools that Don't Suck
Ok, I know I promised more KubeCon and/or NSDI content, but that'll have to wait until next week. I've got a topic on my mind that I can't get rid of until I write about it: namely, software tooling. This is inspired by a few different things: first, one of the topics I've had on my backlog of blog post ideas is "useful tips and tools"; when I first put this on there, I thought it might just end up being a laundry list dump of "hey, did you know that eza is better than ls?", which, frankly, sounds about as boring to me to write as it sounds for you all to read, so it's just been hanging out on the backlog.
But, recently, I've been doing a bunch of work with ansible to try to automate some of my infrastructure needs, and also I've been writing a bunch of tooling to help me with my SimKube data analysis—more on this in a bit. Additionally, Hillel Wayne just wrote a very timely post on software friction, which has been resonating with me a lot, and lastly I've been doing some reflection recently on previous experiences where, not only was writing tooling not prioritized, it was actively discouraged. All of these musings have led me to this post. I'm pretty sure I'm not saying anything new, unusual, or controversial in here, and I've seen plenty of other blogs around the Internet saying the same things, but people (apparently) aren't listening to those posts so I’mma say it again.
What's tools, precious??
You have a job to do. That job is to write software, and (hopefully, if you are lucky) get that software in front of some users. And, (if you are exceptionally lucky) that software will actually do the thing it is supposed to do without making the user's life noticeably worse. This post is not about any of that. This post is about the things you need in order to make your primary job easier.
Every profession needs tools. Some tools are more-or-less required (you can't get the job done if you don't have them). If you're a carpenter, you probably need a saw. If you're a coder, you need something that's approximately in the shape of a compiler[1][2]. Other tools are optional: you don't have to have them, but they're sure gonna make your life easier if you do have them. A drill with a screwhead bit is a lot easier to use than a screwdriver. An automated variable rename/refactor utility in your IDE is a lot easier (and potentially safer) than doing a find-and-replace by hand.
There's also a time-honored tradition in carpentry of making your own tools, and there's a reasonably well-known book called The Pragmatic Programmer which makes the same comparison to carpentry that I'm making here[3]. The point is, in carpentry, you know how to make stuff with your hands, so if a tool doesn't exist that's gonna make your life easier, you can just make it. Same thing with software: as a programmer, you know how to make computers do things, so if a computer isn't doing a thing for you and having the computer do the thing for you will make your life easier, you can just tell the computer to do the thing[4].
In this essay I will
Go slow to go fast—and safe
(I hope you all appreciated the joke at the end of the last section)[5]
One of my guiding principles at ACRL is that if I'm going to be effective, I need to be fast, but not in a "move fast and break things" kind of way. I would rather "move fast and do things right", and, a bit counter-intuitively, the only way to do that is by starting slowly. See, there's this weird tradeoff in writing tools: writing the tools themselves takes time, and that's time that you could be spending working on your product. You could, for example, spend half a day pushing buttons on the AWS console getting some EC2 instances deployed. If you just have to do this once, maybe it's not worth writing the tool, but if you are constantly spinning up new infrastructure, or making changes to your existing infrastructure, you're probably going to want some automation to help. Not only is it faster, it's also safer. You never have to worry about forgetting to push the "Don't charge me $10b/day" button that's hidden in the Special Operations dropdown of the Advanced Configuration settings for the EC2 Instance Launch Configuration Generator Generator[6].
So when should you create a tool? When the absence of that tool will a) make you go meaningfully slower, or b) make you significantly less safe. In other words, when there's friction in your process[7]. The obvious thing about friction is that it slows you down, but the less obvious thing about friction is that it also makes things dangerous. Why is a dull knife worse than a sharp knife? It's not just because it's worse at cutting, it's because dull knives make accidents more likely to happen while you're cutting. Friction means you need to apply more force to get the thing done that you want to get done, and the more force you have to apply, the more likely you are to break something.
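For the "slower" half of that question, a back-of-the-envelope calculation is usually enough. Here's a minimal sketch in Python; the function name and every number in it are made up for illustration, and it deliberately ignores safety, which is often the better reason to automate anyways.

```python
def break_even_runs(build_minutes: float, manual_minutes: float, tool_minutes: float) -> float:
    """How many runs it takes before the time spent building the tool is paid back."""
    saved_per_run = manual_minutes - tool_minutes
    if saved_per_run <= 0:
        # On time alone, this tool never pays for itself.
        return float("inf")
    return build_minutes / saved_per_run


# Hypothetical numbers: half a day (240 min) of console button-pushing per
# deploy, two days (960 min) to write the automation, 10 min per automated deploy.
runs = break_even_runs(build_minutes=960, manual_minutes=240, tool_minutes=10)
print(f"Tool pays for itself after about {runs:.1f} deploys")
```

With those made-up numbers the answer comes out a little over four deploys. If you only ever deploy once, skip the tool; if you deploy every week, you were late to write it.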
How to make a good tool
I don't think it's controversial that tools are important. So in the rest of this post, I instead want to talk about how I think about building tools (and I'm specifically talking about software tools, aka automation here, but I think some of these principles also apply to other types of tools as well). See, the thing about tools that is different from other types of software is that in many cases, the users of the tools are a relatively small group of people. Maybe you're the only user. Maybe it's just your team. Maybe it's you and a couple random folks on the Internet that you've never met. But in general, you don't have to spend time making your tool super robust as long as the failure modes are minor. Your CLI tool crashes when you supply the wrong sequence of input parameters that you know you will never supply? Don't waste time fixing or testing for that case! Just make your tool do the thing you need it to do, and no more.
This, too, is non-controversial in theory, and extremely hard to do in practice. Here are three guiding principles I use to help me build useful (but not too useful) tools:
Understand the problem the tool needs to solve. I've been working on a collection of tools to make data analysis on Kubernetes clusters easier. I'm doing a bunch of work with pandas and related data analysis/visualization libraries that have a lot of inherent friction to them. The immediate temptation is to dive in and write a giant pandas wrapper framework that abstracts away all the arcane syntax and solves all the problems you can think of with the library. Do not do this. If you don't intimately understand what the friction looks like (in a safe environment), the tool that you write will not make it better, and it probably will make it worse.
In my case, what this means is that I've spent a lot of hours in a Jupyter notebook, staring at "native" Pandas code, reading the Pandas documentation, and trying to get the results that I want without any extra tooling at all. It's been very painful and very frustrating at times, but this is a critical part of the process. You can't remove friction if you don't understand it.
Start small and iterate. I try hard to do this in general anyways, but I think it's extra important for building tools. The immediate temptation (again) is to try to build the most general interface you can think of that might solve a problem that you don't have now, but know you will have in the future. Again, do not do this. When I started looking into ansible for automating some of my infrastructure, the first task I wrote was essentially
- name: Create an EC2 instance
  amazon.aws.ec2_instance:
    name: "my-instance"
I wrote that and then immediately said, "That's stupid. Why do I have a one-line function to create an EC2 instance? I can't even do anything with that instance, it won't have the right SSH keys or networking config or anything." But I knew I was going to need to create EC2 instances of some kind, and I wanted to solve the basic problems with doing so before figuring out all the hard stuff. In this case, I realized that I didn't have the correct AWS roles configured to even be able to run an instance, so I had to spend a bunch of time setting that up. If I had been trying to solve a more general problem (like, say, creating an entire bespoke Kubernetes cluster across multiple regions) there would be so many problems to debug that I likely wouldn't know where to start.
Recognize when the tool has exceeded its scope. It's a well-known problem in the industry: if your tool is good, people will see you using it and want to know how to use it too. Maybe they'll discover (or contribute!) a way to use the tool that you didn't expect or intend[8], but now the tool is better! Yay! Now more people are using it. Whoops! Someone just added your tool to a production system, and then your tool broke because you didn't bother making it robust, and now everyone is mad at your tool[9].
Sound familiar? It happens all the time. At some point, one of your tools is going to get depended upon in a way that is a) mission-critical, b) unexpected, and c) brittle. In that case, you should celebrate! You made a thing that other people like and are using! This is incredibly validating. Then, once you've done that, you have a choice to make: you can either step up and turn your tool into a proper product (not necessarily one that you sell, but definitely one that adheres to best practices for production code, e.g., maybe write some tests for the damn thing), or you can throw it away.
Either option is a valid choice, depending on your circumstances. Some tools outlive their usefulness, and that's totally fine! Take pride in building a thing that was useful for a time. Other tools are so important that they become de facto required in order to actually do the job. It's really up to you to decide which route you want to take here, but make sure you're deliberate and up-front about it. If you're going to throw the tool away, make sure you tell your users, "I am no longer supporting this tool, if you continue to use it in these ways, please be aware that it may break at the worst possible time." Telling them this won't make them any less upset when they ignore you and then it breaks anyways, but if you put it in writing at least you can tap the sign and say "I told you so," which might give you a modicum of satisfaction for being right.
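If you go the throw-it-away route, one cheap way to "put it in writing" is to have the tool itself say so every time it runs. Here's a minimal sketch, assuming a Python CLI; the tool name and function names are hypothetical, and the wording of the notice is just the one from the paragraph above.

```python
import sys

TOOL_NAME = "my-tool"  # hypothetical name, swap in your own

NOTICE = (
    f"{TOOL_NAME} is no longer supported. If you continue to use it, "
    "please be aware that it may break at the worst possible time."
)


def print_unsupported_notice(stream=None) -> None:
    """Write the end-of-support notice to stderr (or to the given stream)."""
    print(f"WARNING: {NOTICE}", file=stream if stream is not None else sys.stderr)


def main() -> int:
    # Say it first, on every run, before doing anything else.
    print_unsupported_notice()
    # ... whatever the tool actually does would go here ...
    return 0


if __name__ == "__main__":
    main()
```

Printing to stderr keeps the notice out of any output that people are piping into other things, which matters if the tool has quietly ended up in somebody's scripts.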
So anyways, those are my thoughts on writing tools, thrown out into the void in a hopefully timely manner. Tune back in next week for some more coverage of KubeCon and/or NSDI and/or whatever the heck else I happen to feel like writing about at the time.
Thanks for reading!
~drmorr
[1] I refuse to get into a debate about interpreters, compilers, transpilers, etc. If you would like to get in this debate, do it somewhere else.
[2] "But what if you just write in assembly?" I hear you asking. For one thing, stop. For another, you still need something to turn that assembly into machine code. "But what if—" No. Shut up.
[3] I haven't read it yet, but I think I need to.
[4] Note that this is not an argument in favor of LLMs.
[5] See there's this thing on the internet where people will write a hot take about a thing that doesn't matter and then follow it up with "In this essay I will" but then they just stop there and don't actually write the essay, but this is a clever subversion of expectations because I actually wrote the essay. Also explaining the joke is an effective way to make it funnier.
[6] Seriously, y'all, this is a really important button, I can't believe AWS doesn't make it more obvious.
[7] In Hillel Wayne's blog post, he articulates a number of responses to friction in process, only one of which is automation, i.e., building software tools. And he's totally right! There can be a tendency in engineering towards automation when it isn't necessary, but I might argue that another way to look at this is that you still need tools to handle the friction, but only a subset of your tools are software. Social tools are tools too!
[8] See also: Hyrum's Law.
[9] Definitely not mad at you, though, we practice a blame-free culture around here, remember?