Tech debt - broken windows and normalization of deviance
I believe that our language has a strong influence on how we see the world. When we lack the words to describe something accurately, we fall back on coarser ones. For example, if we don’t know the word crimson we might call the color red or dark red. Until we learn new words for different shades of red we won’t be able to clearly communicate a color we saw.
In technology and engineering I’ve seen this happen with the umbrella term “Technical debt” (or tech debt) - We tend to call a wide variety of issues “tech debt” just like a person would use “red” to describe crimson, if they did not know the word.
“Oh, that’s technical debt”
(Everyone, circa 2020)
Overusing the term - Normalizing deviance and broken windows
Before we go further, I would like to introduce two concepts:
The first one is Normalization of Deviance. Coined by the sociologist Diane Vaughan, it describes the situation where an incorrect or immoral action is so common that it becomes acceptable within an organization or group of people. The most recognizable manifestation of normalization of deviance is people saying “we’ve always done it this way” about things that should be done differently.
The second concept is the broken windows theory, which also touches on the topic of social conformity. It proposes that small issues (broken windows) cause larger problems by signaling to others the social norms of that environment. So if a neighborhood has a lot of broken windows and graffiti on the walls, it signals impunity to all the residents, leading to higher rates of more serious crimes.
What do these concepts have to do with technical debt? Consider this:
- If we call all code issues “tech debt”, the people around us will start feeling that tech debt is normal and they will add more of it (Normalization of deviance).
- By allowing small issues and calling them tech debt (broken windows) we are signaling “impunity”, and we open the door for larger problems (more serious crimes).
Tech debt - A clearer definition
The term “tech debt” is fantastic because it factors in the economics of engineering. Software development is always constrained by resources, and building a perfect solution from the start is not often (never?) possible. That is why we “borrow” resources, usually time, in order to accomplish the task at hand. Borrowing time is often done via sub-reasonable decisions that compromise some other factors for speed of delivery.
But a lot of the things we usually associate with the concept of debt have vanished from tech debt. For example, when someone contracts debt there’s usually the intention of paying it back. There is often a payment plan, a pre-evaluation of cost and interest, as well as an estimate of the full repayment date.
Those things generally don’t happen with what we nowadays call tech debt. The term became a euphemism of sorts - a term that excuses us for doing bad engineering:
- Piece of code that someone put together because they didn’t know any better? Tech debt.
- Hacking something together just to get a task done? Tech debt.
- Building something without a plan, ending up with a big ball of mud? Tech debt.
- Bug-ridden code? Tech debt.
This is one of the key points I’m trying to communicate - tech debt has grown too much apart from its origins. We contract debt without laying down a payment plan, we generally have no idea when the debt will be fully repaid, and we pay little attention to the costs and interest of that debt.
We established that there’s value in the concept of tech debt, but its definition right now is too broad. Let’s try to redefine what tech debt is in light of the economic concept it tries to mimic.
1 - Tech debt is not about taking sub-optimal decisions. It’s about taking “sub-reasonable” decisions.
Most of the time it’s impossible to reach an optimal solution. Something can always be better, and something can always be optimized no matter how much we work towards a goal. That leaves us with realistic sub-optimal decisions that are good enough - let’s call such a decision a “reasonable solution”.
Let’s call the space between a solution that is barely acceptable and a reasonable solution the sub-reasonable solution space. This is where technical debt will live because it’s good enough to be accepted, but not good enough to be reasonable. We can then define technical debt as the “distance” between our current solution and the reasonable solution in the current context.
             Current solution
                     +
                     |  Tech debt (Effort)
                     +<----------->
+xxxxxxxxxxxx+-------+------------+----- ... -------> ∞
0            +                    +                   +
     Barely acceptable       Reasonable            Optimal
         solution             solution            solution
<-----------> <------------------> <---- ... -------> ∞
Unacceptable     Sub-reasonable     Over engineered
solution space   solution space     solution space
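The definition above can be sketched in a few lines of Python. This is only an illustration: measuring solutions on a single numeric effort scale, the thresholds, and the names `classify_solution` and `tech_debt` are all assumptions made for this example.

```python
def classify_solution(effort: float, barely_acceptable: float, reasonable: float) -> str:
    """Name the solution space that a given amount of invested effort falls into."""
    if effort < barely_acceptable:
        return "unacceptable"
    if effort < reasonable:
        return "sub-reasonable"
    if effort == reasonable:
        return "reasonable"
    return "over engineered"


def tech_debt(effort: float, reasonable: float) -> float:
    """Tech debt is the remaining distance (effort) to the reasonable solution."""
    return max(0.0, reasonable - effort)


# Hypothetical numbers: effort measured in person-days.
BARELY_ACCEPTABLE = 5.0
REASONABLE = 12.0
current = 8.0

print(classify_solution(current, BARELY_ACCEPTABLE, REASONABLE))  # sub-reasonable
print(tech_debt(current, REASONABLE))  # 4.0
```

Note that a solution at or beyond the reasonable point carries no debt in this sketch, which matches the idea that debt only lives in the sub-reasonable space.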
2 - Tech debt is taken deliberately
For something to be considered debt we should be aware that we’re implementing a sub-reasonable solution. You don’t accidentally walk into a bank and open a credit line without your knowledge and consent, so that shouldn’t happen either when writing software.
This rules out the cases when someone does bad engineering not as a way to buy time or resources, but because they didn’t know any better.
Since now we are sure that we’re aware of contracting debt, here are the factors that should be taken into account when deciding to go forward:
2.1 - Opportunity cost - An estimation of the debt interest is taken into account
You wouldn’t walk into a bank and accept a credit line without knowing the interest rate first. The same should apply to tech debt.
Before we commit to contracting a debt we should make a rough estimate of how much it will cost us:
- Cost of repayment - how much it will cost to fix the issue. Will we have to rebuild the whole feature? Will refactoring be enough?
- Debt interest - how much it will cost to maintain (or expand) a poorly designed feature. This should include:
  - How much extra time and money it will take.
  - Lower team morale from maintaining poorly written and poorly documented systems.
  - Possible system malfunctions from poor design, incorrect maintenance, or incorrect extensions / reuse.
This is what is called the opportunity cost of our decision.
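A rough estimate like this can be as simple as a few numbers written down. The sketch below is one hypothetical way to do it; the `DebtEstimate` class, its fields, and the person-day unit are assumptions for illustration, and softer costs such as morale or malfunction risk are deliberately left out because they are hard to put a number on.

```python
from dataclasses import dataclass


@dataclass
class DebtEstimate:
    repayment_cost_days: float      # effort needed to fix the issue later
    interest_days_per_month: float  # extra maintenance effort while the debt exists
    months_until_repaid: float      # how long we expect to carry the debt

    def total_cost_days(self) -> float:
        """Repayment cost plus the interest paid while carrying the debt."""
        interest = self.interest_days_per_month * self.months_until_repaid
        return self.repayment_cost_days + interest


# Hypothetical numbers for a single piece of debt.
estimate = DebtEstimate(
    repayment_cost_days=10,
    interest_days_per_month=1.5,
    months_until_repaid=6,
)
print(estimate.total_cost_days())  # 19.0
```

Even with made-up numbers, writing the estimate down makes it obvious that the longer the debt is carried, the more the interest dominates the original repayment cost.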
2.2 - There is a clear benefit to taking debt, like allowing us to invest the current time and resources
We now know that the debt has a cost. But no one would take on a debt if there wasn’t an upside to it. So now we need to consider the opportunity cost of the alternative - properly engineering the system - and weigh the costs and benefits.
Here’s an example (and no, you don’t need to make a table every time):
| | Taking the debt | Proper solution |
|---|---|---|
| Benefits | Able to deliver the feature on time; able to deliver some other feature that is critical to customers | Easy extension of the feature; easy maintainability of the feature; quality and stability |
| Opportunity cost | Maintaining a sub-reasonable system; possible malfunctions; poor extendability; lower team morale | Delay in project delivery; decreased trust from customers; increased upfront cost of feature |
The decision then becomes:
- Is the opportunity cost of taking the debt acceptable, given the benefits?
- Is the opportunity cost of properly engineering the solution acceptable, given the benefits?
In software engineering the benefits of taking the debt are generally related to time. That is, our time is more valuable right now than in the future. This should be quite familiar to any programmer who has experienced deadlines - we have a finite amount of time to ship a certain feature, which increases the value of our time right now.
In startups this is even more relevant, since we don’t know if there’s going to be a next year for the company. This means all the long-term benefits pale in comparison with the opportunity cost of taking the proper approach - the risk of the startup dying.
NOTE: There is a caveat to this kind of thinking. If we’re always on tight deadlines, our time is always more valuable now than in the future. This creates a bit of a paradox in economic terms - if our time this month is worth 2x next month’s time, and next month’s time is worth 2x the time of the month after (and so on), it means either that our time now is infinitely valuable, or that our time in the future is worth virtually zero.
My point here is that we should rethink and take a critical look at our time value estimations. How much more valuable is our time today, compared to our time tomorrow? We all need to draw a line somewhere - when do we allow ourselves to become craftsmen/craftswomen and do proper engineering, instead of hacking things forever as we go along?
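The compounding in the note above is easy to check with a few lines of arithmetic. The function name `relative_time_value` and the fixed 2x monthly ratio are assumptions taken from the example in the note, not a real valuation model.

```python
def relative_time_value(months_from_now: int, monthly_ratio: float = 2.0) -> float:
    """Value of one hour `months_from_now` months ahead, relative to one hour today."""
    return 1.0 / (monthly_ratio ** months_from_now)


# If every month is worth half of the previous one, the future shrinks fast.
for months in (0, 1, 6, 12):
    print(months, relative_time_value(months))
```

With a 2x ratio, an hour one year from now is worth 1/4096 of an hour today. If we really believed that, no future-facing work would ever be worth doing, which is a hint that the ratio we implicitly use under deadline pressure is too high.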
3 - Tech debt is taken with the expectation that it will be repaid
If there is no expectation of repayment, then by definition we shouldn’t be calling an issue tech debt. It might just be bad code that will have to be maintained until the feature is discontinued. If that’s the case we basically pay interest on that debt during the entire lifetime of the feature, so we might as well call it a “Tech lease”.
Another case of not planning to repay the debt is throwaway code for temporary features - since the lifetime of the feature is so short, it might not make sense to properly engineer something (however, note that it’s quite common for temporary features to become permanent). The difference between throwaway code and tech debt that we have to maintain forever is communication: the team knows that throwaway code should not be reused or extended because it will be removed.
3.1 There’s a strategy for repayment (and not in a “Someone else’s problem” way)
The debt has been taken deliberately, after a careful evaluation, and we have the intention of repaying it. However, having the intention of paying it back is generally not enough, since without appropriate visibility and prioritization things end up shelved or forgotten.
Similarly to a monetary debt, we should have a payment plan for tech debt, and an estimated payment date. The plan should consider the interest on the debt and the cost of repayment, and it can always be renegotiated. But not having a plan seems much worse.
Another important factor is visibility. The repayment plan should be visible to the rest of the team and not only in our heads. If the plan is visible people will have more realistic expectations and will feel generally less frustrated.
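One way to keep a plan out of our heads is to write it down as a small, shareable record. The sketch below is purely illustrative: the `RepaymentPlan` class, its fields, and the dates are hypothetical, and a ticket in an issue tracker would serve the same purpose.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RepaymentPlan:
    """A hypothetical, shareable record of one piece of tech debt."""
    summary: str
    owner: str
    due: date      # estimated repayment date
    interest: str  # rough label such as "LOW", "MEDIUM" or "HIGH"

    def is_overdue(self, today: date) -> bool:
        """True once the estimated repayment date has passed."""
        return today > self.due

    def renegotiate(self, new_due: date) -> None:
        """Plans can change; what matters is that the change is explicit."""
        self.due = new_due


plan = RepaymentPlan(
    summary="Isolate methods a, b and c into a mix-in",
    owner="@AUTHOR",
    due=date(2020, 6, 1),
    interest="LOW",
)
print(plan.is_overdue(today=date(2020, 7, 1)))  # True
plan.renegotiate(date(2020, 9, 1))
print(plan.is_overdue(today=date(2020, 7, 1)))  # False
```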
Getting new vocabulary for non tech debt things
We now have a quite strict definition of tech debt, and as you might have noticed, a lot of things we might have called tech debt do not fit that definition. In my personal experience I’ve heard people apply the term tech debt to things that, in my head, are very different from it (like calling crimson red). Here I propose a slightly more granular vocabulary:
Bugs - code that does not work
If some code does not work as designed, throws unhandled errors in certain branches of execution, or outright crashes every now and then: that shouldn’t be called debt.
Calling bugs tech debt feels like a euphemism of sorts, a way to deny that the code is not working, or maybe to avoid a term that management or the client know the meaning of. We might say that we needed to prioritize other bugs or features, and that’s why it’s called “debt”, but I honestly think that this also falls into the field of self-deception.
Tech rot - this used to be a good idea
Rot is a part of the software that was appropriately constructed, but either wasn’t appropriately maintained or became outdated when the business environment changed. We all know some part of a system that is due for refactoring after years of small changes, or some decisions that sounded good a while ago, but not so much anymore.
Another factor is our own experience and available tools. Perhaps we found new use cases for the system, discovered novel architectures, or a new framework came out that removed some of our initial constraints.
In this case it starts to make sense to talk about debt, as the code works (a big difference compared to bugs), but it’s just hard to maintain or change due to its age. However, because it wasn’t taken on deliberately, I don’t like to call it debt.
Dead tech - where is this used?
This is a tricky one. At a glance, dead tech doesn’t need to be changed or maintained, and it’s not in use. It’s just there, in the background of a normal work day, without bothering anyone. Because dead tech is not in use, it’s not causing malfunctions and generally doesn’t need to be changed.
We might argue that it only marginally impacts daily work, like an almost zero-interest tech debt. But dead tech does have a significant long-term cost. Tests might start failing when updating dependencies, it’s confusing to new people, and it creates fear of change (if I change this, will it break something?).
Dead tech fits the bill of the strict definition of tech debt if we deliberately postponed the cleanup of an old feature. However, most of the time, what happens is that changes leave behind unused code, unused features and unused systems. Again, this does not fit the “deliberate” part of contracting tech debt.
Litter / waste - Leaving bad code behind
If we have a working, maintainable piece of code, and someone plows through it leaving behind some rushed decisions just to get something done, that feels more like littering than consciously contracting tech debt.
Litter is just trash someone leaves behind in a previously clean place, and if we accept the broken windows theory to be true, that litter will signal to other people that it’s ok to make sub-par decisions too.
Litter does not fit our tech debt definition because it does not have a significant benefit. What is the real advantage of not spending an extra 10 or 30 minutes to make something well? These cases generally stem from laziness or carelessness, not time constraints.
Plain old bad code and throwaway code
Bad code is only “bad” depending on the expectations.
For example, if we accept, communicate and agree that the bad code will not be used anywhere else (under any circumstances), and that the code either needs to be thrown away or rewritten, then we can get away with whatever trickery and chaos we like in there.
The problem comes when we write bad code that needs to be reused, extended and maintained. If we have some functions in a library that no one understands, but that are somehow used across the whole code base - that’s plain old bad code.
Bad code can be tech debt if taken on deliberately, and if there’s a plan for repaying it or throwing it away. But what I’ve seen more often is that these pieces generally remain untouched without any concrete future plans. “If it works let it be”.
Making things visible - Code comments
This can feel a bit overwhelming right now, so how do we go from here to something practical that can impact our engineering quality? My proposal is that we take these new definitions and we start communicating “debt” in a clear and structured way.
This does not solve the problem of tech debt, but it will certainly help with the identification and communication of code issues, as well as prompting people to think about the reasons for contracting debt in a systematic manner.
Based on the definitions above, I would suggest the following:
Actual Tech debt
# TECH-DEBT
# CONTRACTING REASON
# We needed to ship this feature on 2020/02/02, otherwise
# client ACME inc would cancel the contract
#
# REPAYMENT PLAN
# Please isolate methods a, b, and c into a mix-in because
# they are also used in class X. Documentation is missing. This should be fixed
# for the second iteration of the feature, or if we decide to extend or reuse any
# of the mentioned methods
#
# DEBT INTEREST: LOW
class HastilyWrittenCode(object):
    def __init__(self):
        ...
Dead tech
# KILLME
# WHEN CAN IT BE REMOVED: Feb 2020
#
# INFO: This feature has been declared as defunct due to low usage.
#       Deprecation communications have been sent and the feature can be
#       decommissioned after Feb 2020
class ReallyOldFeature(object):
    def __init__(self):
        ...
Throw away code
# THROW-AWAY-CODE (DO NOT REUSE)
#
# INFO: We needed to make a quick prototype. We will probably re-write if
#       the feature is accepted by the customer. Please contact @AUTHOR for
#       inquiry on the status
class OneTimeThing(object):
    def __init__(self):
        ...
Tech rot
# TECH-ROT
# BEST BEFORE DATE: 2025
# STRATEGY: END OF LIFE (< KEEP / REBUILD / END OF LIFE >)
#
# INFO: This feature has degraded over time. Not enough users for re-write,
#       so we'll keep supporting it until 2023 and if nothing changes
#       we will remove it
class OldClass(object):
    def __init__(self):
        ...
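Once comments follow a fixed set of tags, they can also be collected automatically, which helps with the visibility discussed earlier. The script below is a minimal sketch, assuming the four tag names used in the comment templates above and Python source files; the function names `find_tags` and `scan_tree` are made up for this example.

```python
from __future__ import annotations

import re
from pathlib import Path

# Tag names taken from the comment templates; adjust to your own conventions.
TAGS = ("TECH-DEBT", "KILLME", "THROW-AWAY-CODE", "TECH-ROT")
TAG_PATTERN = re.compile(r"^\s*#\s*(" + "|".join(re.escape(t) for t in TAGS) + r")\b")


def find_tags(source: str) -> list[tuple[int, str]]:
    """Return (line_number, tag) for every tagged comment line in `source`."""
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = TAG_PATTERN.match(line)
        if match:
            found.append((number, match.group(1)))
    return found


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under `root` and collect the tagged comments per file."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        tags = find_tags(path.read_text(encoding="utf-8", errors="replace"))
        if tags:
            report[str(path)] = tags
    return report


sample = """\
# TECH-DEBT
# CONTRACTING REASON
class HastilyWrittenCode(object):
    pass

# KILLME
class ReallyOldFeature(object):
    pass
"""
print(find_tags(sample))  # [(1, 'TECH-DEBT'), (6, 'KILLME')]
```

Running something like `scan_tree(".")` in a repository gives a per-file list that can be pasted into a planning document or tracked over time.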