ICYMI: Live-blogging "Towards Modern Development of Cloud Applications"
If you don’t follow me on Mastodon, I occasionally read through papers and live-blog my thoughts. Last week I read "Towards Modern Development of Cloud Applications", written by a bunch of folks at Google[1] and published earlier this year in HotOS ’23. I got a bunch of really positive responses to my live-blogging attempts, so I thought I would repost here. I don’t know if this is the kind of content people are interested in on this blog, so let me know your thoughts in the comments!
From the abstract: "With our approach, developers write their applications as logical monoliths, offload the decisions of how to distribute and run applications to an automated runtime, and deploy applications atomically."
First thoughts: oh bloody hell, I had this idea years ago and haven't had the time or resources to investigate it.
I guess that's #research life.
They surveyed some teams and found that #microservices improve performance, fault tolerance, abstraction boundaries, and deployment/rollouts.
They also found that #microservices degrade performance, have worse fault tolerance, more rigid/less flexible abstractions, and have worse rollouts.
Why don't microservices always deliver on their promise?
1) they require developers to manually determine the network topology, which is then hard to change.
2) application binaries are individually and continually released into production, making it hard to change the cross-binary protocol.
In this paper they propose a different design pattern:
1) write apps as a #monolith
2) use a runtime to split up the apps
3) deploy atomically
The monolithic application is broken up into "components", which, superficially, sound like microservices. They have an abstraction layer defined through an interface instead of an API. The runtime then splits components across physical workers.
Component API calls turn into network calls if different components are on different hardware, but they remain local if they are co-located on the same hardware.
The runtime can move components around and scale them independently.
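To make the local-vs-remote idea concrete for myself, here is a minimal sketch in plain Go. To be clear, this is not the paper's runtime or the Service Weaver API: `Greeter`, `localGreeter`, `remoteGreeter`, `serve`, and `place` are all names I made up, JSON stands in for whatever wire format a real runtime would use, and the "network" is just a function call. The point is only that the caller codes against an interface and never learns which side of the wire the implementation lives on.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Greeter is a hypothetical component interface. Callers depend only on this.
type Greeter interface {
	Greet(name string) (string, error)
}

// localGreeter is the real implementation. Co-located callers invoke it directly.
type localGreeter struct{}

func (localGreeter) Greet(name string) (string, error) {
	return "Hello, " + name + "!", nil
}

// remoteGreeter stands in for a stub the runtime would supply when the
// component lives elsewhere. The transport is a plain function here; a real
// runtime would send the bytes over the network.
type remoteGreeter struct {
	transport func(request []byte) ([]byte, error)
}

func (r remoteGreeter) Greet(name string) (string, error) {
	req, err := json.Marshal(name)
	if err != nil {
		return "", err
	}
	resp, err := r.transport(req)
	if err != nil {
		return "", err
	}
	var out string
	if err := json.Unmarshal(resp, &out); err != nil {
		return "", err
	}
	return out, nil
}

// serve is the receiving half: decode the request, call the implementation,
// and encode the reply.
func serve(impl Greeter) func([]byte) ([]byte, error) {
	return func(req []byte) ([]byte, error) {
		var name string
		if err := json.Unmarshal(req, &name); err != nil {
			return nil, err
		}
		out, err := impl.Greet(name)
		if err != nil {
			return nil, err
		}
		return json.Marshal(out)
	}
}

// place plays the role of the runtime's placement decision: the caller gets
// either the implementation itself or a stub that goes through the transport.
func place(colocated bool) Greeter {
	impl := localGreeter{}
	if colocated {
		return impl
	}
	return remoteGreeter{transport: serve(impl)}
}

func main() {
	for _, colocated := range []bool{true, false} {
		g := place(colocated)
		msg, err := g.Greet("HotOS")
		fmt.Println("colocated:", colocated, "reply:", msg, "err:", err)
	}
}
```

The calling code in `main` is identical in both cases, which is what would let a runtime change its mind about placement without touching application code.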
"There are many ways to implement a runtime. The goal of this paper is not to prescribe any particular implementation. Still, it is important to recognize that the runtime is not magical."
Whew. I guess there's still some work to be done, lol.
How do applications learn about the environment they're running in? Via a small library called the proclet, which is linked into the binary.
The proclet knows how to talk to Kubernetes, or to a multiprocessing/threads environment, or to an SSH-based environment, etc.
Updates are done atomically: once traffic hits a particular version of the application, all subsequent communication is restricted to that version. Then it's the traffic that is gradually shifted over from the old version to the new one, rather than the mix of running instances.
I think this is a very important point; many API boundary decisions are made early on in a product's development, and those decisions become extremely difficult to untangle later on when you learn why they were wrong.
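Here is a rough sketch of how I picture the version pinning, again in plain Go with invented names (`Rollout`, `Admit`, `Call`) that do not come from the paper. I am assuming a request gets stamped with a version at the front door and that every later hop refuses replicas of any other version. The modulo on the request ID is only there to keep the example deterministic.

```go
package main

import "fmt"

// Version identifies one atomically deployed build of the whole application.
type Version string

// Request carries the version it was pinned to when it entered the system.
type Request struct {
	ID      uint64
	Version Version
}

// Rollout describes an in-progress deployment: NewPercent is the share of
// incoming traffic (0 to 100) admitted to the new version.
type Rollout struct {
	Old, New   Version
	NewPercent uint64
}

// Admit pins an incoming request to exactly one version. Splitting by ID
// modulo 100 is a stand-in for a real traffic-splitting policy.
func (r Rollout) Admit(id uint64) Request {
	v := r.Old
	if id%100 < r.NewPercent {
		v = r.New
	}
	return Request{ID: id, Version: v}
}

// Call models one component-to-component hop. It fails if the target replica
// runs a different version from the one the request is pinned to.
func Call(req Request, component string, replica Version) (string, error) {
	if replica != req.Version {
		return "", fmt.Errorf("request %d is pinned to %s, refusing %s@%s",
			req.ID, req.Version, component, replica)
	}
	return fmt.Sprintf("%s@%s", component, replica), nil
}

func main() {
	// Shift traffic in steps; every admitted request stays on its version.
	for _, pct := range []uint64{0, 25, 100} {
		r := Rollout{Old: "v1", New: "v2", NewPercent: pct}
		onNew := 0
		for id := uint64(0); id < 100; id++ {
			req := r.Admit(id)
			if _, err := Call(req, "cart", req.Version); err != nil {
				panic(err)
			}
			if req.Version == r.New {
				onNew++
			}
		}
		fmt.Printf("rollout %d%%: %d of 100 requests pinned to %s\n", pct, onNew, r.New)
	}
}
```

The thing to notice is that the rollout knob is the traffic share, not the replica mix: a request never sees two versions of the code talking to each other.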
This model adds a bunch of opportunities: the runtime has a "birds-eye view" of the application, and can use that information to make better decisions for scaling, routing, testing, etc.
A prototype implementation of their runtime is available at https://github.com/ServiceWeaver
Most of the performance improvements for their prototype come from a highly-optimized network serialization engine, which can make a bunch of assumptions about the serialization format because everything is running the same version.
This paper doesn't address or solve a lot of problems in #distributedsystems, it's just trying to start a discussion about a different way to do things.
"We argue that developers should write their application as a single binary using our proposal and decide later whether they really need to move to a microservices based architecture. By postponing the decision of how exactly to split into different microservices, it allows them to write fewer and better microservices."
In summary: a very cool paper that echoes a lot of what I've been thinking about over the past few years. There are definitely still a lot of problems to solve and work to be done, but I think this is moving in the right direction!
Would definitely encourage you all to read the paper if you’re interested in more details, and also would be great to know your thoughts on the Masto→Substack format! Do you want to see more of this type of content?
[1] Sanjay Ghemawat, Robert Grandl, Srdjan Petrovic, Michael Whittaker, Parveen Patel, Ivan Posva, and Amin Vahdat