SimKube: Part 1 - Why do we need a simulator?

Aug 25, 2023

Alright, it’s time for my first multi-part series on this blog!

2 Comments

Aug 30, 2023

Well, Part I was a real flashback. I've worked on distributed systems in one way or another for 40 years. Granted, a very different scale, but so were the tools and capabilities and training, so I suspect the challenges were comparable. The notions of simulating failure modes, the challenges of formal proofs - all stuff we wrestled with in building banking systems, from the pre-Internet era (dial-up modems) to ATMs to Internet banking.

ATMs were particularly interesting in that they are physical/electro-mechanical devices, with their own collections of failure modes. Many of those failures were not readily reproducible with physical manipulation, and even for those that were, working through a complete test scenario manually was time-intensive. Hence the need for simulators to "play the role" of some piece of the system, whether a cash dispenser, a card reader, or a bank transaction system that has to correctly track whether I have the money for a withdrawal, and credit that money where it should be once we know the currency delivery results.

Not life-or-death, but stuff you can't afford to get wrong. We also had to deal with timing issues - minor, non-deterministic variations in message flow and hardware activity could trigger unexpected results. In one case, one of our great testers could only reproduce a failure by performing a specific action, then counting to 10 in Vietnamese, his native language, and then doing some other action. Simulators could help with things like that, but there's a cost to building and maintaining them!

Very much appreciated the footnoted details - good humor, and they tighten the focus around what you're really trying to address.



drmorr

Aug 31, 2023

Really interesting -- my dad worked in hardware design for many years, and of course simulation is a critical part of that workflow. Don't want your $20-bajillion ASIC to come back dead because of some unanticipated failure mode.

Another friend also pointed out that there are a lot of similarities between simulation/testing and software fuzzing to find security vulnerabilities, specifically on the "how do you generate realistic-looking inputs to the simulator" side of things.
