Reasoning and Choice

One of the most important properties of any software system is that you can understand what it is going to do without having to run it. This is usually called the ability to “reason about the system.” Basically, you want to be able to make statements about the structures, actions, and results of the system without first having to see them in action.

To see why this matters, imagine a system with a hundred different pieces. To keep this simple, let’s pretend it’s an actual physical system, and not a computer: an automated plant that produces cars, with 100 steps from raw materials to finished car. Each machine in the plant makes some change to its input materials to produce an output product. There are various ways we could configure this system and each of its pieces:

We could make each machine capable of multiple actions, so that depending on which action it takes, its output goes to a different next machine. For example, let’s say we are converting metal into circular rods. Each car needs a different number of circular rods, and our rods could be made out of 5 different kinds of metal. So the machine has a program that decides, each time it gets a bar of metal, which rod it will make, and that decision changes depending on the time of day and the current demand for our cars. Then, depending on which rod was made, that rod goes to one of five different next machines.

Now imagine that every single machine in the entire system was like that: it took a complex set of inputs and produced a complex set of possible outputs, which went to a complex set of possible next machines. Not only would it be impossible for a human being to make statements about (i.e., reason about) the exact behavior of the whole system at any given time, it would even be difficult to reason about the behavior of the individual pieces.

Now imagine a different setup, where each machine takes one input, provides one output, and each machine only “talks” to one other machine (that is, its input always comes from one, specific machine and its output always goes to another single machine). Although it might be hard to think about the whole system all at once, because it’s still 100 machines, it’s easy to look at each individual piece, and from there, reason about both the individual pieces and the logical behavior of the whole system.
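The simple, one-input/one-output factory can be sketched in code as a straight pipeline of functions. This is only a minimal illustration under our own assumptions: the stage names `cut_bar`, `round_rod`, and `inspect_rod`, and the `run_line` helper, are hypothetical and not taken from any library. Because each stage takes one input and feeds exactly one next stage, you can read any single function and predict its effect on the whole line without running it.

```python
from typing import Any, Callable, Sequence

# Hypothetical stages of the "simple" factory. Each takes exactly one input
# and produces exactly one output for exactly one next stage.
def cut_bar(length_cm: int) -> int:
    """Cut a raw bar down to a rod blank of at most 50 cm."""
    return min(length_cm, 50)

def round_rod(length_cm: int) -> str:
    """Turn a rod blank into a labeled circular rod."""
    return f"round rod, {length_cm} cm"

def inspect_rod(rod: str) -> str:
    """Mark the rod as inspected."""
    return rod + " (inspected)"

def run_line(stages: Sequence[Callable[[Any], Any]], item: Any) -> Any:
    """Pass one item through the stages in order, each stage feeding the next."""
    for stage in stages:
        item = stage(item)
    return item

print(run_line([cut_bar, round_rod, inspect_rod], 80))  # round rod, 50 cm (inspected)
```

Contrast this with the first design, where both a machine's output and its next destination would depend on metal type, time of day, and demand: predicting the result there would mean tracing every branch.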

This is a core part of simplicity: the ability to reason about systems like this. When you look at any individual piece of a software system, you should be able to make statements about its behavior, guarantees, structure, and potential results without having to run that piece. It should be clear exactly how that piece can interface with the rest of the system: either we should know exactly what calls into it and what it calls, or we should understand the structure that sets the boundaries of how the piece can be used. For example, this is why the concepts of “private” and “public” functions in many programming languages make it easier to reason about the system: they are boundaries that tell us what can and can’t possibly happen. And when you look at the actual implementation of a function or class, it should be easy to understand the actions it’s taking by reading the code and comments. This is, for example, why naming is so important for functions and variables: good naming allows the reader to reason about the behavior and boundaries of the system.
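Here is a small sketch of how a public/private boundary and clear names support that kind of reasoning. It is written in Python, where a leading underscore only marks an attribute as internal by convention (languages such as Java or C++ enforce `private` at compile time), and the `RodInventory` class is a hypothetical example. Because only `add_rods` writes to `_counts`, and it rejects negative counts, a reader can conclude without running anything that `rods_available` never returns a negative number, provided callers respect the underscore convention.

```python
class RodInventory:
    """Hypothetical example: the public methods are the only intended way in."""

    def __init__(self) -> None:
        # Leading underscore: internal state by Python convention (not enforced).
        self._counts: dict[str, int] = {}

    def add_rods(self, metal: str, count: int) -> None:
        """Record that `count` rods made of `metal` were produced."""
        if count < 0:
            raise ValueError("count must be non-negative")
        self._counts[metal] = self._counts.get(metal, 0) + count

    def rods_available(self, metal: str) -> int:
        """Return how many rods made of `metal` are in stock (never negative)."""
        return self._counts.get(metal, 0)


inventory = RodInventory()
inventory.add_rods("steel", 3)
inventory.add_rods("steel", 2)
print(inventory.rods_available("steel"))   # 5
print(inventory.rods_available("copper"))  # 0
```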

Choice

There is another very important component to enabling systems to have this quality, though. To explain this part, imagine that each of the machines in our imaginary car factory was not automated, but was instead run by a person. This is closer to what a software engineer does: typing actual code and “running” the machines of their IDE, computer, compiler, programming language, and so on.

In our first example, where complex machines make complex decisions, imagine that a human being now has to make every choice the automated machine used to make. That is, every time a piece of metal comes into our machine, a person has to look at it, decide what type of metal it is, and decide what rod to make, all based on looking up the current demand for cars and noting the time of day. Now, in a real factory, some of that might actually be acceptable; it does at least create an interesting job for a person to do. But even there, you can see that you would be opening the door to a lot of mistakes and bad results.

Compare that to our second example, where simple machines have simple inputs and outputs. They would be so easy for a person to operate that one person could probably run several of them, and you would eliminate almost all potential for mistakes or bad results.

Now take into account that in programming, the programmer is often operating tens or hundreds of these “machines” in the form of the classes and functions that they maintain. So a better analogy for the complex car factory is having one person run all one hundred machines. As you can see, if each part of the system presents too many decisions for the operator to make, creating our “car” quickly becomes impossible. Even if you could do it, you would be manufacturing cars tremendously slowly and burning out the people operating your machines. And lo and behold, that is exactly what happens to teams that have to maintain software systems with that level of complexity.

What was the key thing we introduced, though, when we added human beings to our “factory”? We introduced the factors of decision (something a human being does with their mind) and choice (options that are presented to a human being).

There are some schools of thought that say that all developers should be empowered to make every possible decision about their software system, at all times. This sounds great, because it sounds like it’s providing intellectual freedom to intelligent people—something that we all want. However, if you take this principle too far, you actually end up creating the complex car factory for your developers—a system where there are so many choices to make that they either become paralyzed, are guaranteed to do it wrong, or develop wildly inconsistent systems that others can’t easily make heads or tails of.

So what’s the solution here? Is it to remove all choice from everybody and make them into mindless automatons carrying out the will of your Chief Architect? Well, I’m sure there are some software architects out there who would like that, but it’s a bit of an extreme solution. The answer is instead to recognize which choices are important for a developer to be able to make, and which are unimportant.

This differs depending on who you are in a software team and what point you’re at in the lifecycle of your software. For example, if you’re just starting up a new company and you’re the first developer, it’s important that you be able to choose almost everything about the basic platform your company will run on: the language you’re using, the frameworks, the libraries, etc. But even then, you don’t want those frameworks and libraries to present you with decisions you don’t need to be making. Imagine if a compiler stopped and asked you exactly how it should optimize each piece of code. Would that improve your productivity? Would that actually be a net benefit for your company or the goals you’re trying to achieve? I don’t think so.
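The compiler point applies to everyday library calls too. As one concrete illustration, Python’s built-in `sorted()` never asks the caller which sorting algorithm or memory strategy to use; it exposes only the choices that usually matter to the caller, such as `key` and `reverse`. The `orders` data below is made up for the example.

```python
# Made-up demand figures: (car model, units ordered).
orders = [("sedan", 3), ("truck", 7), ("coupe", 5)]

# The only decisions we make are the ones that matter to us:
# what to sort by, and in which direction.
by_demand = sorted(orders, key=lambda order: order[1], reverse=True)
print(by_demand)  # [('truck', 7), ('coupe', 5), ('sedan', 3)]
```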

Then, at a different point in the lifecycle of a project, once you have standardized on a language and a framework, you usually wouldn’t want to allow a random junior developer to choose a different language or framework for their part of your codebase. It’s a decision that they don’t need to spend time making; it’s more productive for them to just go with the flow. Even if there is a better language or framework they could be using, rewriting your entire system just to implement this junior developer’s one feature doesn’t seem like a good use of your resources.

In the aggregate, if you can remove enough choices that developers don’t need to have, you can actually save quite a bit of developer time across the scope of an entire company. Imagine if every team in your company had to spend two weeks going through a review of different frameworks before they could start developing their system. Now imagine that you standardized on a framework that was good (that is, it was capable of fulfilling all the business needs of everybody who was going to use it) even if not perfect, and nobody had to make that decision anymore. How much engineering time would you have saved the whole company? That’s huge–bigger than almost any other productivity improvement you could make, in the long term.

Now, it is important to keep in mind that there are decisions that developers need to make. They absolutely need to be able to decide how the business logic of their system functions; that’s the core requirement for them to be able to do their jobs. There have been frameworks and libraries in the past that simply didn’t allow people to write the systems they needed, and that level of restriction is detrimental to productivity. For example, imagine that your company standardized on a framework that supported HTTP but somehow fundamentally could not support SSL (that is, no HTTPS). That would be disastrous as soon as you needed to encrypt your connections for security purposes, so it would be a very bad restriction.
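One way to draw that line in code is an in-house wrapper that makes the infrastructure decisions once and leaves the business logic entirely to each team. This is a minimal sketch under our own assumptions: `build_service`, `Request`, `Response`, `STANDARD_HEADERS`, and `quote_rods` are hypothetical names, not a real framework’s API, and the centrally decided behavior here (standard headers and a 400 response for bad input) stands in for whatever your company actually standardizes on.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    path: str
    body: str


@dataclass
class Response:
    status: int
    body: str
    headers: dict[str, str] = field(default_factory=dict)


Handler = Callable[[Request], Response]

# Hypothetical company-wide decisions, made once so no team has to revisit them.
STANDARD_HEADERS = {"Content-Type": "text/plain; charset=utf-8"}


def build_service(business_logic: Handler) -> Handler:
    """Wrap a team's business logic with the standard, already-decided behavior."""
    def service(request: Request) -> Response:
        try:
            response = business_logic(request)
        except ValueError:
            # Error policy is decided centrally, not per team.
            response = Response(400, "bad request")
        # Standard headers are applied last, so they always win.
        response.headers = {**response.headers, **STANDARD_HEADERS}
        return response
    return service


# The part each team still decides: how its own business logic works.
def quote_rods(request: Request) -> Response:
    rods = int(request.body)  # raises ValueError on non-numeric input
    return Response(200, str(rods * 12))


service = build_service(quote_rods)
print(service(Request("/quote", "4")).body)       # 48
print(service(Request("/quote", "four")).status)  # 400
```

The team writing `quote_rods` never has to decide on headers or error policy, but nothing stops it from implementing whatever pricing logic the business needs.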

This is a very tricky line to walk, sometimes, but in general I have found that erring on the side of deleting choices actually makes developers happier in the long run, because it makes them more productive. This is very tough at first when you take away certain choices from people, because they feel like you are impacting their personal freedom. And in a way, in the short term, you are. But the truth of the matter is that you’re trying to provide much more freedom to create—the freedom that that developer actually wants, fundamentally. The purpose of restricting choice should always be to improve the ability to create systems. You’re not killing production, you’re deleting distractions, barriers, and confusions in the form of choices that somebody simply doesn’t need to be making.

-Max

3 Comments

  1. In practice, there is almost always some dependency on opaque legacy subsystems that makes both the reasoning and choice objectives impractical to fully achieve. Nice objectives in theory, but you need enough fault tolerance so that not completely meeting those objectives does not threaten to lead you into deceptive reasoning or too much rigidity in your choice policy.

    As far as reasoning goes, I would rather have a well-organized set of automated tests to understand than the entire code base. Automated tests make it trivial to actually validate the presumptions that your reasoning is based on, instead of being bitten by too-clever developers who may be long gone or by hidden dependencies on opaque legacy subsystems.

    • I understand what you’re saying, but I do think that it’s often within the power of the individual developer to actually make the part of the system that they are working on simple enough to reason about. It’s true that the complexities of underlying platforms can make this very challenging, especially when they weren’t designed with these principles in mind. But in actual practical application, one can design systems where the pieces of those systems can be reasoned about with reasonable accuracy.

      Even the point of fault-tolerance ideally involves the ability to reason about the system–its behavior under error conditions.

      It’s fine to be able to reason about a system via its tests, that’s a very helpful tool, I would agree. And they also do a great job of validating the reasoning that you have, that’s true. But I would hope that the code itself, when you look at it, is something that the expected reader would be able to reason about, and that when integrated into the larger system, doesn’t present to the user choices that they don’t need to be making.

      -Max

  2. However, everything is being automated nowadays. I wondered, what do ENGINEERS do then? This blog is very helpful in showing me that technology grows through human creativity. Thanks for sharing the knowledge. It really benefits software developers with mechanical knowledge.
