Monte Carlo Analysis of the Zero Defect Mentality of TDD

Challenges that led me here

  • Heated arguments at work regarding how much TDD is enough and how little is too little. How do we find common ground?
  • An acknowledgement of technical debt and confusion about how to leverage it. How much debt is too much?
  • Being labeled as pedantic and a zealot. Is a Zero-Defect Mindset ever worthwhile? When?
  • A learning exercise in how we can gain concrete insights from our intuition in a methodical fashion. How can I communicate abstract ideas, without concrete evidence, in a rigorous manner?

This article represents my lessons learned from this exploration.

Making the Abstract Concrete

It was a normal day at work: another co-worker and I were strongly and passionately arguing for the benefits of strict, pure, clean TDD against a couple of other equally passionate co-workers who were sold on the idea of everything in moderation. Having just completed a four-month, full-time Agile immersion with an amazing albeit very idealistic consultant, I found his ideas about a zero-defect mindset, and the claim that it was practically achievable, seductive. I had entertained my own idealistic fantasies for a while without ever really thinking they could or should be taken so seriously.

It was liberating. 

Also, it was isolating. Having these thoughts and that excitement placed me at one extreme of a continuum, with many of my teammates on the other side or somewhere in the middle, nearer the side of limiting TDD in the name of practicality. Conversation after conversation, debate after debate, we ended in the same place, perhaps even galvanized a bit by the disagreement and a bit further from finding common ground.

I finally came to understand that regardless of what I knew to be right, everyone on my team had their own perception and their own knowledge of what was right as well. That's not sarcasm. In social interactions there are multiple realities and all of them need to be appreciated and considered valid enough to be worth understanding. 

How could I model my perception of reality in some sort of concrete way that would enable me to make rigorous (albeit somewhat subjective) predictions? How could I ensure my mental model was at least self-consistent and workable? Like any self-respecting geek, I decided the best way to model uncertainty was to run thousands of simulations and projections of reality to see what lessons could be gleaned.

Finding Common Ground in a Common Purpose

The first decision I had to make was figuring out the underlying metric I would use to compare the two development methodologies. Having just recently been introduced to systems thinking and the Theory of Constraints, I thought a great start would be to use the value throughput of each simulated company.

But what is value? When we speak of delivering value to our business customers, what is it we are actually delivering? In discussions with my team, we decided that business value is best seen as the present-day value of your company were it to be valued by an external party. For the purposes of the simulation, I assume the value delivered by a completed story is a randomly assigned number drawn from a value distribution, assigned without regard for feature size. That's right: a feature that takes next to nothing to develop may create an enormous amount of value for the company.

For further assumptions and specifics of my model, read on.

My Model for Thinking About This (AKA My Domain Model)

Concepts and their role in the simulation:
  • User Story- In this simulation, a User Story is the smallest unit of work that the Product Development Team can take on that provides even the slightest bit of business value. Each story also has an associated size.
  • Business Customers- Generate a random set of stories each iteration; each story's size and value are randomly assigned at creation from discrete distributions.
  • Product Backlog- Repository for all stories. New stories are added as top priority in the order delivered. Bugs are randomly dispersed into the Product Backlog when they are received.
  • Product Development Team- Anyone and everyone responsible for getting the release out the door: programmers, testers, technical writers, etc. The Product Development Team iterates over the Product Backlog and works to complete stories. They are also the ones deciding the cost of the various stories. Over time the speed of their work can go up if a velocity range (a maximum velocity greater than the minimum velocity) is specified on construction. The function which controls the team's performance improvement is an "Experience Curve" as documented here: http://en.wikipedia.org/wiki/Experience_curve_effects. Without getting too into it, this experience curve essentially models the decreasing cost of development over time.
  • End Users- Who the Product Development Team releases to. Because the Product Development Team includes *everyone* needed to release the software, the End Users may receive the software immediately afterwards. End Users discover bugs in the software, currently at a constant rate per story per iteration. So if the defect rate is 1%, then a team with a hundred completed stories can expect, give or take, one story per iteration to reenter the Product Backlog as a new Bug Story. The size of the Bug Story is randomly determined from a discrete bug size distribution.
  • Bug Story- A story focused on fixing a defect in the software. Unlike normal stories, Bug Stories have no real value for the team and thus don't improve throughput. A Bug Story actually represents an opportunity cost, since valuable work could have been done in its place if the Bug Story hadn't needed to be written.
  • Support Team- Who the bugs are reported to. Currently only really used to track the total bug count. Could be used in the future to eliminate bugs due to "user error".
Each of these concepts maps directly to an object in the JavaScript simulation; a small sketch of a couple of them follows.
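To make that mapping a bit more concrete, here is a small sketch of two of those objects. The identifiers, the distribution object with its sample() method, and the numbers are illustrative stand-ins rather than the exact source; the point is the shape of the model: a story's value is drawn independently of its size, and end users turn completed stories into Bug Stories at a constant rate.

// Illustrative sketch only -- names and numbers are stand-ins for the real ones.
function UserStory(size, value) {
  this.size = size;    // cost to complete
  this.value = value;  // business value, assigned independently of size
}

function BugStory(size) {
  this.size = size;
  this.value = 0;      // a bug delivers no new value, only an opportunity cost
}

function EndUsers(defectRate, bugSizeDistribution) {
  this._defect_rate = defectRate;                    // e.g. 0.01 per story per iteration
  this._bug_size_distribution = bugSizeDistribution; // assumed to expose sample()
  this._released_stories = [];
}

EndUsers.prototype.receiveRelease = function (stories) {
  this._released_stories = this._released_stories.concat(stories);
};

// Each iteration, every released story has a defect_rate chance of spawning a
// Bug Story that re-enters the Product Backlog.
EndUsers.prototype.discoverBugs = function () {
  var bugs = [];
  for (var i = 0; i < this._released_stories.length; i++) {
    if (Math.random() < this._defect_rate) {
      bugs.push(new BugStory(this._bug_size_distribution.sample()));
    }
  }
  return bugs;
};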

How The Simulation Works

First, this is how the topmost level of the simulation runs:
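Everything is driven by a block of JSON simulation settings. The values below are illustrative placeholders that roughly match the runs charted later (50 teams per methodology over two years), not my real numbers:

// Illustrative settings -- placeholder numbers, not the real data.
var simulationSettings = {
  number_of_teams: 50,                        // per methodology
  iterations: 104,                            // two years, assuming week-long iterations
  story_size_distribution: [1, 2, 3, 5, 8],   // sampled discretely when stories are created
  bug_size_distribution: [1, 1, 2, 3],        // bugs tend to be smaller than features
  standard_team: { defect_rate: 0.05, minimum_velocity: 10, maximum_velocity: 10 },
  tdd_team:      { defect_rate: 0.01, minimum_velocity: 1,  maximum_velocity: 8 }
};

Note the TDD team's velocity range: it starts at 1 and is capped at 80% of the standard team's maximum, and the experience curve described above ramps it toward that cap.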

The distributions you see are assumed to be valid randomly chosen samples of real data. There's one set of data for story sizes and another set of data for bug sizes. In the interest of full disclosure, this is NOT real data. That's for my co-workers only. :) 

The overall process is shown in the following code:
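What follows is a condensed sketch rather than a verbatim listing; the identifiers are stand-ins, but the sequence is faithful: build the distributions, let the factory wire up the teams, then march every team through its iterations and chart the results.

// Condensed sketch of the top-level run -- identifiers are illustrative stand-ins.
function runSimulation(settings) {
  var storySizes = new Distribution(settings.story_size_distribution);
  var bugSizes   = new Distribution(settings.bug_size_distribution);

  var tddTeams      = DevelopmentTeamFactory.createTddTeams(settings, storySizes, bugSizes);
  var standardTeams = DevelopmentTeamFactory.createStandardTeams(settings, storySizes, bugSizes);

  tddTeams.concat(standardTeams).forEach(function (team) {
    for (var iteration = 1; iteration <= settings.iterations; iteration++) {
      team.completeIteration(); // pull stories from the backlog, release, absorb new bugs
    }
  });

  chartResults(tddTeams, standardTeams); // Total Value to Date vs. Time Passed
}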

Incidentally, that code also shows why I have a disdain for the fixation many developers have on instantiating objects within other objects to "hide worthless noise". It reads pretty damn well.

Constructing the simulated development teams is handled by the DevelopmentTeamFactory, which builds both the teams using the TDD methodology and the standard development teams:
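Again, a sketch with stand-in names rather than the literal source:

// Illustrative sketch -- each team gets its own backlog, customers, and end users.
var DevelopmentTeamFactory = {
  createTddTeams: function (settings, storySizes, bugSizes) {
    return this._createTeams(settings.number_of_teams, settings.tdd_team, storySizes, bugSizes);
  },

  createStandardTeams: function (settings, storySizes, bugSizes) {
    return this._createTeams(settings.number_of_teams, settings.standard_team, storySizes, bugSizes);
  },

  _createTeams: function (count, teamSettings, storySizes, bugSizes) {
    var teams = [];
    for (var i = 0; i < count; i++) {
      var backlog   = new ProductBacklog();
      var customers = new BusinessCustomers(storySizes, backlog);
      var endUsers  = new EndUsers(teamSettings.defect_rate, bugSizes);
      teams.push(new DevelopmentTeam(backlog, customers, endUsers,
                                     teamSettings.minimum_velocity,
                                     teamSettings.maximum_velocity));
    }
    return teams;
  }
};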

That code does instantiate within a method call. In this case I rationalize it as being part of a factory, and that being the factory's concern. I'm not certain that makes for the cleanest code, though. I'll still be iterating on this after I publish this article. :)

Everything should work in the latest versions of Firefox and Chrome. It requires HTML5 in order to use the canvas for charting.

Here's a sample chart with 50 TDD teams (green tick marks) and 50 standard teams (red tick marks) each running over two years:

Feel free to open the code, modify the JSON simulation settings yourself, and run your own simulations. Note that as soon as you press the Run button your browser will seize up for a bit. I recommend Chrome for its amazing speed. Don't kill the process! I promise it will finish eventually. ;)

[Insert lesson about how important User Experience design is here]

How It Was Built

Baby steps. I started with one team that could be modeled, showing only final-iteration stats on the web page. Then I moved on to comparing that team side by side with a team using a standard development model, in a table of data. Next up, I found a pretty good HTML5 charting API and got the key data visualized (Total Value to Date vs. Time Passed).

Lessons Reinforced
  • Lowering the defect rate, even at the cost of reduced performance, results in higher value throughput in the long term. In the short term, however, lowering the defect rate is hardly ever optimal.
  • A higher defect rate results in a much higher spread of possible value throughput... in other words there's a higher variance in what you can expect in terms of value output from a product development team.
  • Every development team has a point where the highest possible testing and quality rigor begins to outperform the less rigorous teams. The trick is identifying where this begins to happen for your particular company or project.
All of these proclamations that you should always TDD, always bake quality in, always this, always that: for some company, the opposing statements are just as accurate. Where is the crossover point for your company? You'll need some answer to that before you can make any sort of real argument, at least one based on the ideal of striving for zero defects.

My Hacker Mentality

Given that it all comes down to the point where testing becomes the highest-value decision, I realized why I never test personal projects. I set the expected life of my projects at practically zero, so I end up unable to reasonably justify any testing; by my estimates, it won't produce any real value. The real problem with my hacker mentality is that I tend to underestimate how long I'll be working on something. Take this simulation as an example: I began it with no tests because I figured I'd slam something together in a night and be done with it. Instead, I enjoyed it much more than I expected and ended up wanting to explore the nooks and crannies of my model.

Maybe some day I'll learn?

Conclusion

Assuming a business lifetime of T years, where T is far enough out in the future, the more testing the better and the more defects you can prevent the better. This holds regardless of the costs we encounter, because in the long run, if we don't test, our primary concern becomes merely preventing errors from occurring in pre-existing features, which bars work on any new features that add value.

However, if you can assure yourself of a limited time period T, rest assured it may actually be in your best interest not to have a zero-defect mindset. Just don't think you can switch to a 20-year lifespan at the end of year five and see an instant turnaround in value throughput.

If you've read this far, you deserve a reward. Here are my conclusions on the questions I asked at the beginning:
  • How do we find common ground? Share our assumptions and make them explicit. Codify them so that they can't be conveniently shifted when the arguments get uncomfortable.
  • How much debt is too much? I didn't model technical debt in terms of needed refactoring, just in terms of defect likelihood. Too much debt is when you spend more time paying maintenance costs than delivering value.
  • Is a Zero-Defect Mindset ever worthwhile? When? Yes it is, when you have set a goal of a sufficiently large life time for your product.
  • How can I communicate abstract ideas without concrete evidence in a rigorous manner? Hopefully I just did.
6 responses
Nice. You might be interested in an alternative take on whether technical debt is really a call option...

http://www.m3p.co.uk/blog/2010/07/23/bad-code-isnt-technical-debt-its-an-unhe...

Hey. This is really interesting - many thanks for taking the time to post it.

There's a couple of things I don't yet understand about the model. Maybe just because I'm hazily reading this over breakfast...

a) You model that defects will essentially cause wasted re-work, which sounds like a good model. But do you attempt to model whether TDD makes implementing individual stories slower or faster in the first place? People who are anti-TDD (which is an extreme end of the spectrum, I realise that) tend to initially assume that TDD is twice as slow to write each story (since you are writing code + tests, which is twice as much work.) I don't believe this is true, but wondered if you think there are any real effects here that you did attempt to model. After all, if the decision to go for zero-defects obviously had only one result (that defect rate dropped) without affecting rate of doing work, then it would be a no-brainer to always do it.

For the record, I'm a massive fan of TDD.

Steve, thanks! I've been thinking about that for a while as well with some friends. Technical debt, like all debt, is something to be leveraged, in addition to bugs. In this example, since I haven't yet modeled the effects of refactoring/technical debt, I focused on defects and was curious whether I could push the system to a near-total collapse where the defect rate was so high that valuable work couldn't be released. Short answer: oh yeah, it's totally possible, which just validates experiences I've had at past employers.

Jonathan- Thanks for taking the time to read this! Like most of what I write I didn't really expect anyone to care or think my model was the least bit valid. Through this however I've found that just forming such a rigorous model, if nothing else, can spur extremely valuable focused conversations with explicit assumptions.

RE: Modeling the lower velocity and learning curve of the TDD team... I did do this actually. Every iteration the development team uses this private "ramp up" method:
this._iteration_budget = this._max_velocity - (this._max_velocity - this._minimum_velocity) * Math.pow(this._current_iteration_id, - (1/3));

This models what is known as an experience curve. I mentioned in the article you can get a more detailed analysis here: http://en.wikipedia.org/wiki/Experience_curve_effects

I had started using what is known in statistical regression as a logistic curve (http://en.wikipedia.org/wiki/Logistic_function), but a friend (Robert Ream) pointed me to this much simpler curve, which models decreasing costs over time more closely.

Suffice it to say that there is an initial high cost to learning something new (in terms of the effect on producing value) and that the cost nears some lower bound as a limit over time.

In my model, I limit the TDD team to going, at most, 80% as fast as the standard team. At the beginning of the curve, however, they have a velocity of 1, which slowly grows over many weeks.
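For a rough feel of the shape, here's a quick sanity check of that formula with illustrative numbers (a minimum of 1 and a maximum of 8, i.e. 80% of a standard team's 10):

// Quick sanity check of the ramp-up formula with illustrative numbers.
function iterationBudget(maxVelocity, minVelocity, iteration) {
  return maxVelocity - (maxVelocity - minVelocity) * Math.pow(iteration, -(1 / 3));
}

[1, 8, 27, 104].forEach(function (n) {
  console.log("iteration " + n + ": " + iterationBudget(8, 1, n).toFixed(2));
});
// iteration 1: 1.00,  iteration 8: 4.50,  iteration 27: 5.67,  iteration 104: 6.51
// The budget starts at the minimum and climbs toward, but never reaches, the cap.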

RE: "Slower" TDD team but more value per iteration
P.S. Jonathan... While the TDD team's expected amount of work to get done is smaller, they produce more value on the work they do in the long run. Also, on teams that wouldn't count defects towards velocity, the TDD team actually ends up with a higher velocity in the end. Always. It's just a question of whether your company cares about planning that far into the future.
So what about teams that half-ass it, and try to do TDD but end up with a bunch of broken or old tests (or just low code coverage)? What does their value curve look like? It seems like you'd want to either go all out or just cut your losses and not test. It would be awesome to see another dimension to this graph showing varying levels of testing "commitment". Thanks for the incredibly thought-provoking article! I have these same discussions with my team!