Live-blogging "A Survey of Multitier Programming"
It’s been a while since I’ve done one of these live-blogging things, and I’ve gotten a whole bunch of new followers since my last one, so I figured I’d try it again. I also didn’t get a “real” post written for today, but I did write a whole bunch of documentation for SimKube!
Also I’m just copy-pasting the same stuff I wrote down on Mastodon, so if you follow me over there this will be a repeat. Sorry about that. Nobody’s ever told me whether they like seeing this content over here or not, so until somebody does I’m just gonna keep doing it. Anyways, here goes.
This is "A Survey of Multitier Programming" by Weisenburger, Wirth, and Salvaneschi.
https://decomposition.al/CSE290S-2023-01/readings/multitier-survey.pdf
First, what even is multitier programming? First sentence of the abstract: "Multitier programming deals with developing the components that pertain to different tiers in the system (e.g., client and server), mixing them in the same compilation unit."
This is something I've been thinking about for Kubernetes for a while so I'm interested to learn more!
More details right out the gate:
"The code for different tiers is then either generated at run time or it results from the compiler splitting the codebase into components that belong to different tiers based on user annotations, static analysis, types, or a combination of these."
And, "the goal of the multitier approach is to improve program comprehension, simplify maintenance and enable formal reasoning about the properties of the whole distributed application."
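Not from the paper, but here's a toy sketch of what "splitting the codebase based on user annotations" could look like. Everything in it is made up for illustration: the `tier` decorator, the `TIERS` registry, and the `split` function are hypothetical names, and a real multitier compiler would generate separate deployable components rather than just sorting function names into buckets.

```python
from collections import defaultdict

# Hypothetical registry: tier name -> {function name: function}.
TIERS = defaultdict(dict)

def tier(name):
    """Hypothetical annotation: record which tier a function belongs to."""
    def decorate(fn):
        TIERS[name][fn.__name__] = fn
        return fn
    return decorate

# One "compilation unit" that mixes client and server code.
@tier("client")
def read_input():
    return "hello"

@tier("server")
def echo(msg):
    return msg.upper()

@tier("client")
def show(msg):
    return f"server said: {msg}"

def split():
    """Return the function names that would land in each tier's component."""
    return {name: sorted(fns) for name, fns in TIERS.items()}
```

Calling `split()` here groups `read_input` and `show` under the client and `echo` under the server, which is the (very simplified) job the paper assigns to the compiler.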
No way! "Developing distributed systems is widely recognized as a complex and error-prone task."
Problems include concurrent execution, different programming environments between e.g., client and server, synchronization, data serialization, etc etc etc.
There are lots of "individual" solutions that try to target these problems in isolation. Multitier programming instead says "let's try to solve them all at the same time" which sounds harder but maybe is actually easier?
This paper is abbreviating multitier as MT and I don't like that.
But anyways, lots of different options out there for multitier programming, the success of multitier programming is highly domain-specific thus far, and this paper is a survey paper to try to help people understand what the various techniques are and how they might apply to different domains.
The standard "multitier" architecture has 3 tiers, or layers: presentation layer, processing layer, and data layer.
What I find interesting is that many companies still think their architecture uses these three tiers, but in a SOA world, we actually have N tiers, where N is the number of microservices that you are running.
The paper makes a distinction between "homogeneous" multitier, in which tiers have approximately the same computation model, and "heterogeneous" multitier, where tiers have different computation models. An example of heterogeneous is when your data layer is a SQL database, because SQL is a significantly different computation model than C. This paper only focuses on homogeneous multitier.
Translation: running distributed datastores is a pain in the ass, we ain't touching that with a 10-foot pole.
Table 1 is a list of 29 "multitier" languages and a short description of each. None of these are "mainstream" programming languages, although there is an unfortunately-named language called "Swift". I spent a while trying to figure out if this is the same "Swift" as Apple's and I believe the answer is "no it's not" but I'm only like 98% sure of that.
Next up is a comparison of the "echo" program in a few representative languages. I'm kinda skimming this bit, but there's at least one language that uses embedded XML! Wild!
However, most of the languages they highlight require the programmer to annotate which tier each block of code is supposed to run on. Which is fine if you have a 2 or 3 tier system, but quickly becomes untenable if the number of tiers is a very large N.
One system they highlight that doesn't require this is Distributed Orc, which tries to split apart a program onto a distributed system to minimize the communication overhead.
(reference: https://dl.acm.org/doi/10.1145/2957319.2957370)
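To get a feel for what "place the code to minimize communication overhead" means, here's a brute-force sketch. To be clear, this is not Distributed Orc's actual algorithm (I haven't read that paper yet); it's just the naive version of the idea. The function names, the call counts, and the notion of "pinned" functions are all assumptions I made up for the example.

```python
from itertools import product

def best_placement(funcs, calls, pinned, tiers=("client", "server")):
    """Brute-force the placement with the fewest cross-tier calls.

    funcs:  list of function names
    calls:  dict mapping (caller, callee) -> number of calls (assumed known)
    pinned: dict of functions that must run on a specific tier
    """
    free = [f for f in funcs if f not in pinned]
    best, best_cost = None, None
    # Try every assignment of the unpinned functions to tiers.
    for choice in product(tiers, repeat=len(free)):
        placement = dict(pinned, **dict(zip(free, choice)))
        # A call only costs something if it crosses a tier boundary.
        cost = sum(n for (a, b), n in calls.items()
                   if placement[a] != placement[b])
        if best_cost is None or cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost

# Hypothetical program: the UI calls validate a lot, validate rarely hits storage.
funcs = ["ui", "validate", "store"]
calls = {("ui", "validate"): 10, ("validate", "store"): 1}
pinned = {"ui": "client", "store": "server"}
placement, cost = best_placement(funcs, calls, pinned)
```

With these numbers, `validate` ends up on the client because that leaves only the single `validate` to `store` call crossing the network. The search is exponential in the number of unpinned functions, so this only shows the shape of the problem and wouldn't scale to the "very large N" case.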
The paper spends a bunch of time discussing "placement axes" which basically are just "how does the multitier system know how to split your code up and where does it run the code that does get split apart?"
After "placement" comes "communication", which is "how do the different tiers talk to each other?"
All your standard "network communication" players that you'd expect are here: RPC, message-passing, pub-sub, etc.
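As a rough illustration of the RPC flavor: a multitier system can make a cross-tier call look like a local function call by generating a stub that serializes the arguments and ships them to the other side. This is a minimal in-process sketch, assuming JSON as the wire format; `SERVER_FUNCS`, `server_handle`, and `remote` are hypothetical names, and the "network" is just a direct function call.

```python
import json

# Functions that live on the "server" tier (hypothetical).
SERVER_FUNCS = {"echo": lambda msg: msg.upper()}

def server_handle(raw):
    """Server side: decode the request, run the function, encode the reply."""
    req = json.loads(raw)
    result = SERVER_FUNCS[req["fn"]](*req["args"])
    return json.dumps({"result": result})

def remote(fn_name, transport=server_handle):
    """Client side: build a stub that looks like an ordinary local function."""
    def stub(*args):
        raw = json.dumps({"fn": fn_name, "args": list(args)})
        # In a real system `transport` would send `raw` over the network.
        return json.loads(transport(raw))["result"]
    return stub

echo = remote("echo")
reply = echo("hi")
```

The client code just calls `echo("hi")`, and the serialization and dispatch happen behind the stub. Swapping `transport` for something that talks HTTP or a message queue would change the communication style without touching the calling code.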
And after "communication" comes "formal methods". Definitely some really cool stuff here but also like, we can't even formalize non-distributed-systems!
(hyperbole for effect)
Unsurprisingly (or maybe this is actually surprising? I don't know), of the 29 multitier languages described in Table 1, only 6 of them have any sort of formal reasoning built in.
Now we're at the "Discussion and Outlook" section. First sentence is my main takeaway from the article:
"A significant limitation of most existing MT research languages...is that they do not address generic distributed systems but consider only the client–server architecture with clients of the same kind, mostly in the limited setting of web applications."
Fault tolerance/resiliency is another big area for future research that they highlight. Scalability is another -- most multitier systems don't have any support for modularization or other helpful abstractions.
And that's a wrap! It's a great article with lots of really helpful references -- 7.5 pages of them, to be precise -- so seems like a good place to start learning more about the subject. I flagged a couple references that I want to read next.
In terms of my takeaways, I think there are two really interesting things that the article didn't discuss at all.
1) I really want to know if anyone has tried building a multitier language that "compiles" to Kubernetes. E.g., you write your code and it generates all the appropriate Kubernetes primitives to run on a normal cluster.
2) I also really want to know if anyone has tried to take a more traditional language like C++ or Golang and make it multitier. The paper hints at this approach, and the recent Google paper in HotOS also describes this type of approach, but it still seems like a problem which is far out of reach from the "mainstream".
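For the first idea (compiling to Kubernetes), here's a very hand-wavy sketch of what the output step might look like: take a description of the tiers and emit one Deployment manifest per tier. The `deployment_for` and `compile_to_k8s` functions, the tier names, and the image names are all hypothetical. The manifests use the standard `apps/v1` Deployment fields, but a real system would also need Services, config, and so on.

```python
def deployment_for(tier_name, image, replicas=1):
    """Build an apps/v1 Deployment manifest (as a plain dict) for one tier."""
    labels = {"app": tier_name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": tier_name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{"name": tier_name, "image": image}],
                },
            },
        },
    }

def compile_to_k8s(tiers):
    """Hypothetical 'backend': one Deployment per tier in the program."""
    return [
        deployment_for(name, cfg["image"], cfg.get("replicas", 1))
        for name, cfg in tiers.items()
    ]

# Made-up tiers and images, purely for illustration.
manifests = compile_to_k8s({
    "frontend": {"image": "example/frontend:dev", "replicas": 2},
    "backend": {"image": "example/backend:dev"},
})
```

Each dict in `manifests` could be dumped to YAML or JSON and handed to `kubectl apply -f`. The hard part, which this skips entirely, is deciding what goes in each tier's image in the first place.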
And if we combine those two ideas, then we get an approach that automagically decouples your monolith! Just write one giant codebase and then have a system that deploys it in a distributed fashion and you don't have to think about it at all.
Wouldn't that be nice?
Anyways, that’s all for this week! Thanks for reading my phoned-in stream of consciousness. Next week I’m gonna write a post about design docs, so that ought to be fun.
~drmorr