2 Comments

Well, Part I was a real flashback. I've worked on distributed systems in one way or another for 40 years. Granted, at a very different scale, but the tools, capabilities, and training were at a different scale too, so I suspect the challenges were comparable. The notions of simulating failure modes, the challenges of formal proofs - all stuff we wrestled with in building banking systems, from the pre-Internet era (dial-up modems) to ATMs to Internet banking.

ATMs were particularly interesting in that they are physical, electro-mechanical devices with their own collection of failure modes. Many of those failures were not readily reproducible with physical manipulation, and even for those that were, working through a complete test scenario manually was time-intensive. Hence the need for simulators to "play the role" of some piece of the system, whether a cash dispenser, a card reader, or a bank transaction system that has to correctly track whether I have the money for a withdrawal, and credit that money where it should be once we know the currency delivery results.

Not life-or-death, but stuff you can't afford to get wrong. We also had to deal with timing issues - minor, non-deterministic variations in message flow and hardware activity could trigger unexpected results. In one case, one of our great testers could only reproduce a failure by performing a specific action, then counting to 10 in Vietnamese, his native language, and then doing some other action. Simulators could help with things like that, but there's a cost to building and maintaining them!

Very much appreciated the footnoted details - good humor, and they tighten the focus around what you're really trying to address.
