There is a key rule that I personally operate by when I’m doing incremental development and design, which I call “two is too many.” It’s how I implement the “be only as generic as you need to be” rule from the Three Flaws of Software Design.
Essentially, I know exactly how generic my code needs to be by noticing when I’m tempted to cut and paste some code; instead of cutting and pasting it, I design a generic solution that meets just those two specific needs. I do this as soon as I’m tempted to have two implementations of something.
For example, let’s say I was designing an audio decoder, and at first I only supported WAV files. Then I wanted to add an MP3 parser to the code. There would definitely be common parts to the WAV and MP3 parsing code, and instead of copying and pasting any of it, I would immediately make a superclass or utility library that did only what I needed for those two implementations.
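To make the audio example concrete, here is a minimal sketch in Python. All of the names (AudioDecoder, WavDecoder, read_bytes, and so on) are invented for illustration; the point is only that the superclass contains exactly what WAV and MP3 decoding both need, and nothing speculative for future formats.

```python
class AudioDecoder:
    """Base class holding only what WAV and MP3 decoding both need."""

    def __init__(self, data: bytes):
        self.data = data

    def read_bytes(self, offset: int, length: int) -> bytes:
        # Bounds-checked read shared by both decoders, instead of
        # copy-pasting the same check into each one.
        if offset + length > len(self.data):
            raise ValueError("truncated file")
        return self.data[offset:offset + length]


class WavDecoder(AudioDecoder):
    def header_id(self) -> bytes:
        return self.read_bytes(0, 4)  # b"RIFF" for WAV files


class Mp3Decoder(AudioDecoder):
    def header_id(self) -> bytes:
        return self.read_bytes(0, 3)  # b"ID3" when an ID3 tag is present
```

Note that the base class deliberately knows nothing about a hypothetical third format; it would grow only when a real third format arrives.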
The key aspect of this is that I did it right away—I didn’t allow there to be two competing implementations; I immediately made one generic solution. The next important aspect of this is that I didn’t make it too generic—the solution only supports WAV and MP3 and doesn’t expect other formats in any way.
Another part of this rule is that a developer should ideally never have to modify one part of the code in a similar or identical way to how they just modified a different part of it. They should not have to “remember” to update Class A when they update Class B. They should not have to know that if Constant X changes, you have to update File Y. In other words, it’s not just two implementations that are bad, but also two locations. It isn’t always possible to implement systems this way, but it’s something to strive for.
If you find yourself in a situation where you have to have two locations for something, make sure that the system fails loudly and visibly when they are not “in sync.” Compilation should fail, a test that always gets run should fail, etc. It should be impossible to let them get out of sync.
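One lightweight way to make two locations fail loudly is a check that compares them and raises the moment they drift. This is a hypothetical sketch (the format lists and the function name are invented); in a real project the assertion would live in a test that runs on every build, or in a compile-time check where the language allows it.

```python
SUPPORTED_FORMATS = ["wav", "mp3"]          # location A: the decoder registry
FORMAT_DISPLAY_NAMES = {                    # location B: UI labels
    "wav": "Waveform Audio",
    "mp3": "MPEG-1 Layer III",
}

def check_formats_in_sync() -> None:
    # Loud, visible failure: raises instead of silently diverging.
    if set(SUPPORTED_FORMATS) != set(FORMAT_DISPLAY_NAMES):
        raise AssertionError(
            "SUPPORTED_FORMATS and FORMAT_DISPLAY_NAMES are out of sync"
        )

check_formats_in_sync()  # passes while the two locations agree
```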
And of course, the simplest part of this rule is the classic “Don’t Repeat Yourself” principle—don’t have two constants that represent the same exact thing, don’t have two functions that do the same exact thing, etc.
There are likely other ways that this rule applies. The general idea is that when you want to have two implementations of a single concept, you should somehow make that into a single implementation instead.
When refactoring, this rule helps find things that could be improved and gives some guidance on how to go about it. When you see duplicate logic in the system, you should attempt to combine those two locations into one. Then if there is another location, combine that one into the new generic system, and proceed in that manner. That is, if there are many different implementations that need to be combined into one, you can do incremental refactoring by combining two implementations at a time, as long as combining them does actually make the system simpler (easier to understand and maintain). Sometimes you have to figure out the best order in which to combine them to make this most efficient, but if you can’t figure that out, don’t worry about it—just combine two at a time and usually you’ll wind up with a single good solution to all the problems.
It’s also important not to combine things when they shouldn’t be combined. There are times when combining two implementations into one would cause more complexity for the system as a whole or violate the Single Responsibility Principle. For example, if your system’s representation of a Car and a Person have some slightly similar code, don’t solve this “problem” by combining them into a single CarPerson class. That’s not likely to decrease complexity, because a CarPerson is actually two different things and should be represented by two separate classes.
This isn’t a hard and fast law of the universe—it’s more of a strong guideline that I use for making judgments about design as I develop incrementally. However, it’s quite useful in refactoring a legacy system, developing a new system, and just generally improving code simplicity.
-Max
Personally, I do the combining when there are three or more distinct implementations. One is ideal, of course. Two leaves a bad taste in my mouth, but sometimes there’s good reason for it. Three is just unacceptable, a sign of sloppy code.
I’m with Alex. In my early days, I would have over-engineered something even for a single implementation. Now, I typically wait for code to need to be in three separate places before I centralize it. Many times, the burden of maintaining a central library overshadows the warm fuzzy from avoiding synchronization issues.
Well I’m with Max: Two really is too many.
While in theory, Alex’s approach of waiting for the third implementation feels okay, my experience shows that over the years, other developers will often blindly follow the lead established by the two distinct implementations. By the time somebody finally thinks of combining them, there might not be three but, say, seven competing implementations. In my opinion, it’s almost always better to do it right from day one. Having only one unified implementation will also increase legibility and help new developers understand the system.
Yeah, in my experience the thing about the “two is too many” approach is that it reliably works. There could be lots of other good theoretical reasons to use another approach, but I simply haven’t seen them actually work to keep a codebase maintainable.
-Max
One problem with a “Three is too many” approach is that by the time you come to write your third implementation, your existing implementations may have diverged, and they would need work before they could be combined into a generic implementation.
There are lots of problems with it, yeah. Another one is that it’s much easier to notice when there are two implementations than to notice when there are three. That is, there might be three implementations, but the developer only finds one other implementation, and so thinks there are only two and waits for there to be three. And yeah, as you point out, combining three implementations can be a lot more work than combining two and then later enhancing the combined solution when you need it to do something slightly different. On very large codebases this becomes highly apparent, given the difficulty of merging even two existing implementations in some cases.
For what it’s worth, I wasn’t saying it’s a hard & fast rule either. A large part, I’ve found, has to do with how much capacity and influence you have to make such changes in the first place… but that’s more about project / time management than a rank and file senior engineer wanting to do the Right Thing.
I said two implementations leaves a bad taste in my mouth. When practical, I’d try to combine them and not wait for the third.
Thanks for the reminder Max. Good stuff.
I agree that, in Max’s example, a CarPerson class would be a bad (actually, IMO, horrible) idea; however, sometimes it’s useful to extract code or logic that is exactly, or almost exactly, the same into a separate class or module and, via whatever appropriate mechanism is available in the language you’re using (e.g., some type of inheritance, or, in Ruby, the “include” mechanism), use this shared code/logic in any class (Car and Person in Max’s example) where it is appropriate. Such a design might be appropriate in the Car vs. Person example (although, without specifics, I can’t say for sure). I believe that, in Meyer’s taxonomy of inheritance (http://se.ethz.ch/~meyer/publications/computer/taxonomy.pdf), such cases would often be categorized as “implementation inheritance”.
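The commenter’s suggestion can be sketched in Python with a mixin, which plays a role analogous to Ruby’s “include”: Car and Person remain two separate classes, and only the genuinely shared implementation detail lives in one place. All names here are illustrative assumptions, not from any real codebase.

```python
class SerialNumbered:
    """Mixin holding a shared implementation detail, not a shared concept."""

    def __init__(self, serial: str):
        self.serial = serial

    def serial_suffix(self) -> str:
        # The last four characters, e.g. for display in a UI.
        return self.serial[-4:]


class Car(SerialNumbered):
    """A car is still its own class; it merely reuses the mixin."""


class Person(SerialNumbered):
    """A person is still its own class; no CarPerson hybrid needed."""
```

The duplication is gone, yet each class keeps a single responsibility.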
Couldn’t agree more. We need to take care of this especially while maintaining old code written by someone else. There I repeatedly come across not just two, but at least five or six places where you need to make the same changes. I have been developing software for the last 10 years (starting with C, then moving to C++, and currently Java), and I am shifting my methodology from OOP to functional, with considerable improvement in productivity.
The author has a lot of valid points, and definitely every developer should strive towards the principles mentioned. But back in the real world, where budget and time may be issues, how do you justify changing a part of a system that is already functional? Will your boss be happy that you now have to go back and test something that wasn’t part of the project scope?
Another aspect of this is that taking time to make a generic implementation may be great – but what if you never re-use that code? Thus your solution has become over-engineered which means longer development cycles (and more effort in fixing bugs).
As a generic principle this approach makes sense, but like anything, it needs to be tempered with moderation and common sense.
That’s a very valid concern. But I think this is where unit tests play a very important role. If you have good unit tests as a safety net, then that should help you find out whether you have broken anything. And you should add more if needed. Having said that, there is no alternative to end-to-end tests. A good test automation framework should take care of that. And if time permits you can do manual testing as well.
I go over this in some detail in The Philosophy of Testing.
Abdullah, overall I would recommend that you read the book, which answers all of these questions and more.
-Max
I have been programming for many years and have been playing the piano and studying music since I was in high school (I am 47). I see many similarities between programming and writing serious music. There is a saying in music, something like: “You can ignore any rule in music if it makes your music sound better.” Rules or models in programming or music (and in many other areas) are just guides; once you master them, you’ll see their pluses and minuses, and then you can choose not to use them whenever you want. Even if somebody else says, “Hey, this thing you are doing is not right!”, if you know what you are doing, let them say anything they want.
Copy and paste is like pouring sawdust into a noisy transmission–it may look like you’ve made progress but you’re going to pay for it later.
[…] line of code is hard to understand, and this looks like code duplication. Can you refactor this so that it’s […]
[…] we can remove the manual step (b) above and keep token types in one place only. This is the “two is too many” rule in action – going forward, the only change you need to make to add a new keyword […]