Practicing Decomposition: A Basic Example
Introduction
I concluded in an earlier post -- Make It Simple -- that decomposition and modular design will lead to simpler code. By simpler code, I mean code that is easier to reason about, easier to extend, easier to re-use, and easier to maintain -- all throughout the lifetime of the software. Managing complexity is the highest technical imperative of software engineering -- it will enable you to deliver better features, faster, with more happiness.
It would seem nearly any software engineer who cares about quality will agree that creating modular code is the right thing to do. But like most "right things to do" (like the so many "dos and don'ts" espoused by parents, governments, and religions) once you are out in the real world, a world of deadlines and tech debt and sloppy coworkers and demanding managers and mercurial customers and fierce competition -- you often find yourself led away by temptation, straying from the straight and narrow. Veering off the high road. Cutting corners. Getting sloppy. Hacking.
If we trim down the number of "right things to do" that we -- and our teams -- constantly track in our heads as we work, if we choose a small set of principles with high rates of return, then we will make ourselves more effective. These are principles we never abandon, no matter how complex the problem, no matter how critical the deadline. Golden Rules.
To that end, I suggest adopting at least the following two:
- Decompose the Problem: Take time to think deeply about the problem; iteratively break it down into a hierarchy of discrete sub-problems; repeat until good enough (not perfect!).
- Encapsulate into Modules: Build simple abstractions around individual sub-problems or complicated solutions.
For this post, we'll use a trivial example: Project Euler, Problem 2, "Even Fibonacci numbers" (link).
By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.
Any novice engineer should be able to figure this out without breaking a sweat. However, there's enough going on that by working through a few decompositions we can demonstrate the benefits of these two principles.
I'm currently building up my skill in Go (which is why I've been working on Project Euler problems), so that is the language my solutions are written in. Fear not: they are simple enough that anyone should be able to follow along without trouble.
Decomposition One: The Interwoven Solution
One might read the problem statement, think up the solution in their head instantly, and write this single function with little trouble. There's nothing intrinsically wrong with this code. It's straightforward. It gets the job done.
func EvenFibonacciSum() int {
	max := 4000000
	var sum int
	a := 0
	b := 1
	for b < max {
		if b%2 == 0 {
			sum = sum + b
		}
		temp := b
		b = b + a
		a = temp
	}
	return sum
}
But is this the best decomposition we can come up with? Does this break down the problem into discrete bits we can think about, develop, and test, one bit at a time? Does this provide us with the most modular, re-usable code?
One reason this code seems acceptable is that the problem itself is basic, and only requires a basic solution. This code is easy -- it is short, with some variables, loops, and if statements -- but it is not simple (for more on simple vs. easy, see Rich Hickey's tech talk). If you look closer, separate concerns are mixed together into a single interwoven solution. While this style doesn't hurt this particular code, anything even slightly more complex will accumulate significantly more tech debt and maintainability issues, especially as new features and bug fixes get slapped on over time. I have often seen this play out in real-world projects.
Decomposition Two: The Half Measure
Okay, let's refactor a bit. Let's try cleaning up this code with some private functions (those are always good, right?).
Our EvenFibonacciSum function is looking better. It's shorter, easier to see what's going on, and more readable. And we have taken steps toward splitting the sub-problems out into independent components.
func EvenFibonacciSum() int {
	fibs := fibonacciSequence(4000000)
	return sumIfEven(fibs)
}

func fibonacciSequence(max int) []int {
	fibs := []int{0, 1}
	for nextFibonacci(fibs) < max {
		fibs = append(fibs, nextFibonacci(fibs))
	}
	return fibs
}

func nextFibonacci(fibs []int) int {
	return fibs[len(fibs)-1] + fibs[len(fibs)-2]
}

func sumIfEven(nums []int) int {
	var sum int
	for _, num := range nums {
		if num%2 == 0 {
			sum += num
		}
	}
	return sum
}
The title of this section is "Half Measure" for a reason. In my opinion, the overuse of private functions to manage complexity is a flaccid attempt to sweep logic and complexity under the rug. Don't get me wrong: private functions are a powerful tool for writing clean code. Unfortunately, I have too often seen them used to hack out a half-hearted attempt at clean code, rather than having the discipline to decompose and develop a fully modular solution.
But even if we moved these functions into their own independent modules, I would argue they are not optimal.
- fibonacciSequence assumes we'll always generate Fibonacci sequences up to a maximum value. What if we want the first 100 numbers? Or the numbers within some range? fibonacciSequence as a name is too general, too vague.
- sumIfEven still contains two concerns: summing numbers and including only even numbers. Yes, it's reusable, but only in very specific situations.
Decomposition Three: The Modular Solution
When coming up with excellent decompositions, I highly recommend slowing down and spending some time thinking intently about the problem. Even if solutions pop into your head and attempt to steal your attention, jot those ideas down, and re-focus on the problem at hand.
If I had an hour to solve a problem I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.
-- attributed to Albert Einstein
How many times have you (or someone you have been interviewing) screwed up an interview question because you rushed into implementation, and didn't spend enough time thinking about the interviewer's question to fully understand the problem, and catch all the tricky little details and edge cases? How many times have you delayed delivery on sprint tasks and projects because you ran into a significant problem halfway through, and realized you could have avoided the mess with a bit more design?
Next time, remember Einstein's little quote above, and slow down. Think about the problem.
Let us begin by focusing on the Project Euler problem a little more deeply.
By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.
We can peel off pieces of this problem until we have a set of separate concerns, some independent sub-problems:
- Creating a sequence of Fibonacci numbers up to some maximum
- Filtering a list of numbers to only include numbers that meet some criteria
- Checking if a number is even
- Adding all the numbers in a list to get a sum
My next step would be to open a text editor, and write out the top-level solution as a single function. I would get the function's interface right, and then write the implementation code with methods and objects even if they do not already exist. For non-existing constructs, I'll shape their interfaces into forms that are modular and re-usable, based on the needs of this problem, but also keeping in mind how these abstractions could possibly be used generally. I'll write and re-write for a few minutes, shaping the methods and objects until everything is readable.
func EvenFibonacciSum() int {
	fibs := fibonacci.UpTo(4000000)
	evenFibs := numbers.Filter(fibs, numbers.IsEven)
	return numbers.Sum(evenFibs)
}
I'll look to see if some of these constructs already exist in the standard library, or in a lightweight, well-supported internal or third-party library with an open license (no reason to re-invent the wheel). Stable, standard libraries I may call directly from this function. For more complex libraries or services, I'm a fan of encapsulating them behind the abstracted interface we sketched in the step above.
Let's finish up the implementation of our solution, starting with our fibonacci package:
package fibonacci

// UpTo returns the Fibonacci sequence of values strictly less than the given max.
func UpTo(max int) []int {
	if max < 1 {
		return []int{}
	}
	if max == 1 {
		return []int{0}
	}
	fibs := []int{0, 1}
	for next(fibs) < max {
		fibs = append(fibs, next(fibs))
	}
	return fibs
}

// next computes the next Fibonacci value from the last two in the slice.
// (In Go, functions with lowercase names are unexported, i.e. package-private.)
func next(fibs []int) int {
	return fibs[len(fibs)-1] + fibs[len(fibs)-2]
}
And our numbers package:
package numbers

// Condition is a function that evaluates a number and returns true or
// false based on its condition expression.
type Condition = func(num int) bool

// Filter narrows a given list of numbers down to only those numbers
// for which the given Condition returns true.
func Filter(nums []int, filter Condition) []int {
	s := make([]int, 0)
	for _, num := range nums {
		if filter(num) {
			s = append(s, num)
		}
	}
	return s
}

// IsEven returns true if the number is even. Its signature matches Condition.
func IsEven(num int) bool {
	return num%2 == 0
}

// Sum adds up a slice of numbers.
func Sum(nums []int) int {
	sum := 0
	for _, n := range nums {
		sum += n
	}
	return sum
}
I'm not claiming this is the best possible solution to the problem, but it has a lot of qualities I like:
- This code is readable: each part self-describes how we solve the overall problem (this alone makes it worth it, IMO)
- This code lowers cognitive load: this process allows us to divide and conquer, and work on one sub-problem at a time. For instance, when working on generating a list of Fibonacci numbers, you only need to focus on that requirement; you can forget the others; and once done, you don't need to hold the whole Fibonacci implementation in your head.
- This code is open for extension: if requirements for the overall problem change, we can easily add more sub-components or modify how the existing components are used. The only thing that changes is the top-level EvenFibonacciSum (or we create a new function reflecting the new requirements). As time goes on, as we implement new solutions, if we decide our abstractions for the sub-components are not quite correct, we can extend our modular packages (e.g. numbers) with better interfaces, switch existing solutions over to the new ones, and then deprecate the old ones.
- The sub-components are orthogonal: one part does not depend on another unrelated part.
- The sub-components are re-usable: by making them public functions in their own general-purpose packages, we have created solutions that can be easily and clearly re-used by other problems that need them.
- There is a promise of less code over time: as we encounter situations where we can reuse these modular components, we will avoid writing more code. We have less to maintain.
We reap all these benefits by taking the time to think hard about the problem, then iterating through multiple decompositions -- breakdowns of the problem into a cascading, acyclic hierarchy of sub-problems -- and once we find a decomposition that seems good, encapsulating each sub-problem into a module using the various tools provided by our languages, libraries, patterns, frameworks, and coding paradigms.
This all might seem like a lot of work. It might seem like it will slow you down. However in my experience, it has always empowered me to write code with a higher degree of maintainability, and write it faster.
By thinking hard about the problem at hand, I more often catch hidden requirements, nuances, conflicts, and gotchas (instead of discovering them halfway through my implementation, forcing me to throw away some or all of my work). By working with independent sub-modules, I can focus on implementing one at a time, without having to worry about the others. By starting from the top down, and writing only sketches of the interfaces needed before diving into implementations, I discover problems and blind alleys faster. At work, once I develop the stubbed structure of the solution, I can send out a "request for comments" code review and get feedback about the general approach before I waste time implementing it (admittedly, I don't do this often enough). Depending on the amount of work, I can then send small, bite-sized code reviews for each sub-component and its associated tests.
However, do not iterate to perfection -- instead, aim for "good enough." Write the most maintainable code given the current context: the level of complexity, expected amount of change over time in the code, expected lifespan of the code, importance of the feature, and time available before a result is required.
Conclusion
My humble intent is for software engineers to think harder about problems, from multiple angles, and to iteratively break down the problem, multiple times, until arriving at a sketch of a solution that can then be filled out with code. Favor decompositions that result in modular pieces assembled into a final solution in one place. Iterate fast. Avoid analysis paralysis. Aim for "good enough."
This should not just be a "good thing to do," something discarded at the first sign of trouble. When things get tough, when complexity mounts, when deadlines loom, these principles are arguably the most important processes to follow, because these are precisely what will make the biggest gains in managing complexity, getting stuff done faster, and increasing code maintainability over time.