The plain fact is that no matter how hard one entity is to deal with, it’s disproportionately harder to deal with more than one. Everything is more complex when you go from one thing to more than one thing. The additional complexity may not be visible, or detectable, or appear to cause problems, but it’s still there and needs attention. At some point, when scaling begins to bite, problems inevitably appear.
There are times during consulting engagements when complexity is all around you, and managing it is a never-ending challenge. It’s not just that you have two computers instead of one: you also have the communication between them to consider. In fact, in this engagement in particular, one and one make three. The three are not identical things (two are computers and one is a communications channel), but nevertheless, adding the connection between machines has created an additional entity that needs to be managed. As an unwelcome side effect, you also now have two different kinds of entities in your system (PCs and comms), requiring different skill sets and knowledge to manage. In a way, it’s a bit like having kids: when you have two (or more) you not only have to deal with each of them individually, you also have to sort out any problems that occur when they interact, and the interactions can take more effort to manage than the individual kids. With as few as five kids, there are ten different ways fights are possible and fifty ways in which the group can fragment into warring factions.
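For anyone who wants to check the arithmetic, here is a minimal Python sketch of one plausible way to arrive at those figures. The function names are my own, and the interpretation of "warring factions" is an assumption: I count every way of splitting the group into subgroups, then discard the two cases that aren't really factions (everyone united, and every kid on their own).

```python
from math import comb


def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n entities: n * (n - 1) / 2."""
    return comb(n, 2)


def bell_number(n: int) -> int:
    """Number of ways to split n distinct entities into non-empty groups.

    Computed with the Bell triangle: each new row starts with the last
    value of the previous row, and each later entry is the sum of the
    entry to its left and the entry above that one.
    """
    row = [1]
    for _ in range(n - 1):
        next_row = [row[-1]]
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return row[-1]


def faction_splits(n: int) -> int:
    """Assumed reading of 'warring factions': all groupings except
    'everyone together' and 'everyone alone'."""
    return max(bell_number(n) - 2, 0)


if __name__ == "__main__":
    for kids in range(2, 8):
        print(kids, pairwise_links(kids), faction_splits(kids))
```

The pair count grows quadratically, but the number of possible groupings grows much faster than that, which is why interactions come to dominate as the head count rises.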
It’s not just the extra complexity that generates problems. To really get to grips with distributed systems (systems whose constituent parts are physically separated but work together), you have to start to think in a completely new way. To be honest, this has been the challenge with this particular engagement. I have stated many times that you can’t make the same set of assumptions that you might for monolithic systems. You need a wholly new set, one that is diametrically opposed to the original in both thinking and practice. In other words, you need a set of reversed assumptions. Understanding this reversal is critical to changing the way you think about systems: it shifts your perspective so that you see the system as a whole and not just as a collection of individual machines.
I love these reversed assumptions, and they have come in handy when explaining many things. Enjoy.
Traditional: One
Reversed: Many
This is the first, and most basic, change. You can no longer assume from the start that there’s ever going to be only one of anything again; you must start with the assumption that there is more than one. How many? That depends on how many you need. How many will you need in the future? How long do you want your system to last, and what’s a safe growth rate to assume? These are all questions you need to find answers to for each and every kind of thing represented in your system. That’s a lot of questions, but they have to be asked; get the answers wrong, and you will have a Type 3 Failure (scaling) on your hands. Regardless of the scaling issues, you must first and foremost assume a plurality of everything you have in your system.
So, the first reversed assumption is really about scaling in all things, and the need to understand both the scaling trends and the breakpoints (where one method of doing things stops working efficiently once a certain size has been reached, and another method is required). The most important breakpoint, where things change most radically, is between having one item and two items, because the whole nature of the problem changes.
The point here is that with the traditional assumption (of a single entity), the quantity is known and, perhaps more importantly, also fixed (at one). This has the advantage of making the passing of time irrelevant, for if the number of entities doesn’t change over time, it can’t be a consideration, so your solution is automatically future-proof.
However, the reversed assumption implies that the quantity may vary, so even if you know its value now, you may not know it in the future. You have, essentially, just acquired a new variable in your calculations, a new degree of freedom to play with, and a scaling problem that you didn’t have before. If there were many singular things in your world, then reversing this assumption gives you many new degrees of freedom, all potentially independent of each other. It wouldn’t be unreasonable to describe this flexibility as an embarrassment of riches.
To further complicate matters, different parts of a system may scale at different rates. This is generally not too bad. If you know about a system’s scaling requirements, you can track them and throw hardware at the problem when it’s needed. However, it’s generally a good idea to avoid being caught out by a scaling issue that you can’t handle without rewriting chunks of your system’s software. Such rewrites are likely to take time and money, which leads us to the second lesson: modularity. Make sure your system can be pulled to bits so that replacement parts can be inserted with ease. Once again, my design principles come into play.
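As a rough illustration of what "pulled to bits" can look like in code, here is a small Python sketch. Everything in it is hypothetical (the SessionStore interface, the two implementations, and the remember_login caller are invented for this example); the point is only that the caller depends on an interface, so a single-instance part can be swapped for a many-instance part without a rewrite.

```python
import zlib
from typing import Dict, List, Optional, Protocol


class SessionStore(Protocol):
    """Hypothetical interface: the only thing callers are allowed to know."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class SingleStore:
    """Traditional assumption: exactly one store."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class ShardedStore:
    """Reversed assumption: many stores, chosen by a stable hash of the key."""

    def __init__(self, shard_count: int) -> None:
        self._shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]

    def _shard_for(self, key: str) -> Dict[str, str]:
        index = zlib.crc32(key.encode("utf-8")) % len(self._shards)
        return self._shards[index]

    def put(self, key: str, value: str) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._shard_for(key).get(key)


def remember_login(store: SessionStore, user: str, token: str) -> None:
    """Caller code: unchanged whichever implementation is plugged in."""
    store.put(user, token)
```

In a real system the shards would live on separate machines rather than in one process, but the shape of the seam is the same.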
In order to deal effectively with scaling issues, your project will need some good capacity-planning skills. With good design and appropriate testing, you can have confidence that your system will overcome scaling problems simply by adding more or faster hardware, which is the easiest, quickest, and often cheapest way of upgrading a system.
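To make the capacity-planning questions concrete, here is a minimal Python sketch. It assumes simple compound annual growth and identical hardware units, and every name and figure in it (the growth rate, the per-unit capacity, the 30 percent headroom) is illustrative rather than taken from any real engagement.

```python
import math
from typing import Optional


def projected_load(current: float, annual_growth: float, years: int) -> float:
    """Load after `years` of compound growth (0.5 means 50% per year)."""
    return current * (1 + annual_growth) ** years


def units_needed(load: float, capacity_per_unit: float,
                 headroom: float = 0.3) -> int:
    """Hardware units required to carry `load` while keeping a fraction
    `headroom` of each unit's capacity in reserve."""
    usable = capacity_per_unit * (1 - headroom)
    return math.ceil(load / usable)


def years_until_breakpoint(current: float, annual_growth: float,
                           max_load: float, horizon: int = 50) -> Optional[int]:
    """First whole year in which projected load exceeds `max_load`,
    or None if that does not happen within `horizon` years."""
    for year in range(horizon + 1):
        if projected_load(current, annual_growth, year) > max_load:
            return year
    return None


if __name__ == "__main__":
    load_now = 1000.0   # e.g. requests per second today (assumed figure)
    growth = 0.5        # assumed 50% growth per year
    for year in range(6):
        load = projected_load(load_now, growth, year)
        print(year, round(load), units_needed(load, capacity_per_unit=500))
    print(years_until_breakpoint(load_now, growth, max_load=5000))
```

Running the same projection separately for each kind of entity in the system shows which one reaches its breakpoint first, and therefore where the next round of hardware (or redesign) will be needed.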
Now, fold in the physical fact that two things can’t be in the same place at the same time. Whereas one thing can obviously only be in one place, many things must be in different places, geographically speaking, whether spread worldwide, adjacent on a desk, or as neighboring chips on a circuit board. But the distance is irrelevant: it’s the multiplicity of locations that defines a system as distributed.
I’ll cover the reversed assumption of monolithic/distributed systems in my next installment of Notes from the Field.