We’re going to do a little experiment. Take 20 buttons and a ball of string, and scatter the buttons over the floor. Then take two buttons at random, and connect them using a thread. Do this for a while, and keep a count of the number of threads. The whole heap should start looking like this (Figure 1).
Figure 1. Twenty buttons connected by an increasing number
of threads. Picture redrawn from At Home in the Universe, 1995, Stuart Kauffman.
Now, we do another experiment — pick up a random button, and count the number of buttons it is attached to. In the figure above, we see that in the top-left situation four clusters are formed, three with two buttons each and one with three. In the bottom-right situation, which has a much larger number of threads compared to the number of buttons, all buttons but one are connected to form a single, large cluster. Apparently, there is some kind of transition going on here.
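The experiment is easy to reproduce in code. Below is a minimal sketch in Python (my own re-implementation, not anything from the book; the function name `cluster_sizes` and the union-find bookkeeping are choices of mine): every thread merges the clusters of the two buttons it connects.

```python
import random
from collections import Counter

def cluster_sizes(n_buttons, n_threads, seed=None):
    """Scatter n_buttons, tie n_threads random pairs together with threads,
    and return the resulting cluster sizes, largest first."""
    rng = random.Random(seed)
    parent = list(range(n_buttons))  # each button starts as its own cluster

    def find(i):
        # follow parent links to the cluster's representative,
        # halving paths along the way to keep later lookups fast
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for _ in range(n_threads):
        a, b = rng.sample(range(n_buttons), 2)  # two distinct random buttons
        parent[find(a)] = find(b)               # the thread merges their clusters

    return sorted(Counter(find(i) for i in range(n_buttons)).values(),
                  reverse=True)
```

With a handful of threads on 20 buttons you typically get several small clusters; with 30 or more threads, nearly all buttons end up in a single large cluster, just as in the figure.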
Does this remind you of anything? It reminds me of software complexity. In software, there seems to be a strong tendency for systems to evolve towards the ‘ball-of-mud architecture’. In a paper published in 1997, Brian Foote and Joseph Yoder do a great job of explaining it:
“A Big Ball of Mud is a haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire, spaghetti-code jungle. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated.” (sourced from this Wikipedia page)
The complexity of a piece of software largely determines our ability to reason about it, and to reason about things, we need to form a mental model in our heads. Making a mental model entails knowing the parts, their connections, and their behaviours, so we can predict what will happen in a given circumstance.
If the number of components (the buttons in our model) that we need to reason about at any given time is small, we can succeed at this task. However, if too many components are connected to each other, then whichever way we try to think about a problem, we find that when we pick up one component, we drag up a whole bunch of others. At that point our reasoning capabilities come to a screeching halt. Let’s call that the ball-of-mud transition.
Wikipedia also knows why this happens: “such systems are common in practice due to business pressures and developer turnover”. Under pressure, code is often changed in ways that increase complexity. Clojure creator Rich Hickey uses the verb ‘to complect’ to describe this effect; it stems from the Latin roots com, meaning ‘with’, and plectere, meaning ‘to weave, entwine’. Two things that were once separate become entwined. He has much more to say about this subject, so please go check out his talks, especially Simple Made Easy and The Value of Values.
Quantifying the ball-of-mud transition
Let’s go back to the button model, which was first published in the book “At Home in the Universe” by Stuart Kauffman, in 1995. Kauffman used it to investigate the behaviour of biological systems such as networks of interacting genes.
The button model shows that we go from a simple, sparsely connected heap of buttons to a situation where almost all the buttons are connected in a big cluster. Can we say anything more quantitative about this? To investigate, I created a small Clojure program that simulates the experiment for arbitrary numbers of buttons. The results are shown in the graph below.
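For readers who want to try this at home without the original Clojure program, the same simulation can be sketched in Python (my own re-implementation; the names `largest_cluster_fraction` and `transition_curve` are hypothetical, and the union-find used to track clusters is my choice): for each thread-to-button ratio, add that many random threads and record the fraction of buttons in the largest cluster, averaged over many runs.

```python
import random
from collections import Counter

def largest_cluster_fraction(n_buttons, n_threads, seed=None):
    """Fraction of buttons in the largest cluster after adding random threads."""
    rng = random.Random(seed)
    parent = list(range(n_buttons))  # each button starts as its own cluster

    def find(i):
        # walk to the cluster's representative, halving paths as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for _ in range(n_threads):
        a, b = rng.sample(range(n_buttons), 2)  # two distinct random buttons
        parent[find(a)] = find(b)               # the thread merges their clusters

    return max(Counter(find(i) for i in range(n_buttons)).values()) / n_buttons

def transition_curve(n_buttons, ratios, runs=50):
    """Average largest-cluster fraction for each thread-to-button ratio."""
    return [sum(largest_cluster_fraction(n_buttons, int(r * n_buttons), seed=run)
                for run in range(runs)) / runs
            for r in ratios]

# Well below a ratio of 0.5 the largest cluster stays small; well above it,
# almost every button hangs off the one you pick up.
print(transition_curve(200, [0.2, 0.5, 1.5]))
```

Sweeping the ratio from 0 to 2 or so reproduces the S-shaped curve in the graph below, with the steep rise starting near a ratio of one half.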
Figure 2. Results of the button model for a single run with 20 buttons (red)
and an average of 100 runs with 500 buttons (blue).
Horizontally, we have the ratio of threads to buttons. We start out on the left with no threads, and then move up the curve as we add more threads and more buttons become connected. Vertically, we plot the size of the largest cluster we pick up, as a percentage of the total number of buttons. Moving from left to right, we start with small cluster sizes, and move up until the cluster size equals the total number of buttons: when we pick up a single button, all the others are attached to it.
The red line represents a single experiment with 20 buttons, while the blue line is the average of 100 runs with 500 buttons. For the first steps of the red line, we show the size of the largest cluster of buttons. You might think the red line was hand-picked to fit the blue one, but not so – it is a random run.
I don’t know what your best guess would have been, but to me, the graph has a striking shape — we start out on a flat slope where we pick up small clusters, and then, at some point, the clusters dramatically increase in size, to include almost all of the buttons. This sudden transition to complexity starts when the number of threads equals half the number of buttons. When the number of buttons is increased, the curve rises ever more steeply at that point, making the transition even more pronounced. In physics, you’d call it a phase transition.
The lesson for software: the complexity of a software system does not increase linearly with the number of connected components – there is a clear transition from a manageable, slightly connected system to a ball-of-mud system that is heavily connected and hard to reason about. In nature, complexity is where the good things, such as life, happen. But for programmers, this is not the case. Quite the opposite. In order to keep our heads sane, we need to build systems composed of small, independent units, and keep the interconnections to a minimum. So let’s get out our scissors, and simplify!
Mark Tiele Westra is product manager for Akvo FLOW