Creating an Image Proxy Server in Node.js

This will be a short post. I'm writing it to document how I created a Node.js server that acts as an image proxy. I needed this to get around a limitation in HTML5's canvas implementation that prevents reading a loaded image's binary data if that image comes from a different domain. That capability is very handy if you're building an image editor, so I had to find a workaround.

My solution is to create an image proxy on the web server in question. I pass the URL of the image I want to a specific route on my server, which downloads the image data and returns it to my Javascript, thus hiding the image's true origin.
Here is my complete code. I'll explain what all of the parts do afterwards:

Because Node is event oriented, when you download the image you actually create a request and add listeners for certain events. To start the download you need to call the request's "end" function. That signals to Node that you are done setting up the request and it can be sent. The two events you need to listen for are "data" and "end". The "data" event fires each time Node downloads a chunk of data from the URL you requested (yes, it fires multiple times for a single request). As far as I know, Node won't aggregate the response for you, which is why you see me appending each chunk of data to the buffer.

One big gotcha threw me off for a while. In order to create a buffer of the correct size (it needs to be allocated up front) you have to find out how large the image you're downloading is. Just grab the Content-Length property from the HTTP response header... BUT! The content length comes back as a string, so you have to convert it to an integer before using it to size your buffer. If you don't, the buffer will be too small, the number of bytes you receive will exceed the number of bytes in your buffer, and things will ASPLODE.
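To make that concrete, here is a minimal sketch of the flow described above. It uses Node's built-in http module; the route handling, variable names, and port are illustrative assumptions, not the original code.

    var http = require('http');
    var url = require('url');

    http.createServer(function (clientReq, clientRes) {
      // e.g. /proxy?url=http://example.com/image.png -- parsing kept minimal here
      var imageUrl = url.parse(clientReq.url, true).query.url;

      var proxyReq = http.request(imageUrl, function (imageRes) {
        // Content-Length arrives as a string; parse it before allocating the
        // buffer, otherwise the buffer ends up too small and the copies blow up.
        var size = parseInt(imageRes.headers['content-length'], 10);
        var buffer = Buffer.alloc(size);   // older Node used `new Buffer(size)`
        var offset = 0;

        // 'data' fires once per chunk, so append each chunk at the current offset...
        imageRes.on('data', function (chunk) {
          chunk.copy(buffer, offset);
          offset += chunk.length;
        });

        // ...and only hand the aggregated bytes back once 'end' fires.
        imageRes.on('end', function () {
          clientRes.writeHead(200, { 'Content-Type': imageRes.headers['content-type'] });
          clientRes.end(buffer);
        });
      });

      // Nothing is sent until end() is called on the outgoing request.
      proxyReq.end();
    }).listen(8080);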

Hopefully that helps someone. Enjoy!

Why I'm (Finally) Switching to CoffeeScript

CoffeeScript Misconceptions

You may have already heard about CoffeeScript and some of the hype surrounding it, but you've still found several reasons not to make the switch. This blog post is for you. Here are some of the reasons I held out for so long:
  • I wanted to understand Javascript and just didn't see how using a "simpler version" (my own thoughts) would make my life easier in the long run.
  • If I DID use an intermediate language, I wanted to be able to dump it at any time and not feel like I was forced to continue using it.
  • Putting one more thing with bugs in it between myself and my code seemed foolhardy.
So here are the reasons I finally switched:
  • It's less verbose Javascript, not a different or simplified language.
  • It has a couple of shortcuts that let you use list comprehensions rather than error-prone for statements.
  • CoffeeScript compiles to pretty awesome Javascript. Because of this I wouldn't have any concern about dumping CoffeeScript at any time. It would also leave some great conventions in my Javascript that I could keep following.
  • Eli Thompson is always right. (You should read his blog. He's smart: http://eli.eliandlyndi.com/)
For-Loop Boilerplate Banished

Here's an example of several Javascript for-loops embedded in a switch statement:

GNARLY! Now obviously we could clean this code up a bit... but seriously... 

Well let's just see that same code in CoffeeScript and how much better it can be:
That's the power of expressions.

Less Complex OO

Now how about classes? These are the bane of Javascript programmers everywhere. There are few right ways to do them and a billion wrong ways. CoffeeScript classes are the biggest simplification CoffeeScript makes to Javascript, and they were a big reason I held out for so long. Really, though, they're just shorthand that removes a bunch of boilerplate so I have less opportunity to introduce bugs. Here's a simple example:

Simple Javascript class:

Same class in CoffeeScript:
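And an equivalent class in CoffeeScript (again an illustrative stand-in, not the original snippet):

    # Constructor arguments become properties automatically and the prototype
    # wiring disappears.
    class Person
      constructor: (@firstName, @lastName) ->

      fullName: ->
        "#{@firstName} #{@lastName}"

      greet: ->
        "Hello, #{@fullName()}!"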

Get Off My Lawn

In the past, people tried compiling to Javascript simply because they didn't get it. This is different. It's been over a decade since Javascript began to see wide use, and enough of us get it now that we're starting to see tools that don't try to cover it up as though it were a broken language. Instead, we're seeing improvements made to a programming language we love and think can be even better. The only reasons I see remaining to stick with Plain Old Javascript are nostalgia and fear of change.

I'll leave you with one last example. It's a full set of CoffeeScript that does drag/drop and zoom/pan in canvas. You can decide for yourself which version of the code you'd rather work on.

Before CoffeeScript:

After CoffeeScript:

Introducing GeekRations

What's GeekRations?

Tonight I launched my latest project, GeekRations (check it out at http://www.geekrations.com). It's a gift-of-the-month club for geeks that pulls weird and off-the-wall gifts from the hidden nooks and crannies of the internet and delivers them to you monthly. I originally envisioned it for people like myself who love receiving packages in the mail just for the surprise of what's inside. It also makes an awesome gift for that geek in your life you don't know how to buy for.

Where We're At Right Now

Currently, GeekRations is taking emails from interested prospective customers. As soon as we're ready to start shipping gifts you'll be notified where you can sign up for the service. Visit http://www.geekrations.com and sign up to be notified once we're taking orders! 

Geeky Details

GeekRations is a lean startup in the purest sense of the term. The purpose of the landing page was to see if anyone even cared about this business idea. Apparently people do, so the idea will be moving forward. Furthermore, GeekRations has an A/B test running on the splash page wording: one variant is straight-faced and very plain in describing our service, while the other tries to be a little looser and sillier. I will reveal which one wins once I feel I've aggregated enough data to tell there's a clear winner.

Unit Testing the DOM

How I Unit Test in jQuery

I created a function that adds arbitrary HTML to the DOM and removes it immediately after my test has run. This is what it looks like in use:
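A hypothetical usage sketch, assuming QUnit and jQuery (the helper name and markup are illustrative, not the originals):

    // Hypothetical QUnit test using the helper described above.
    test("marks the first row as selected", function () {
      withTestHtml('<ul id="rows"><li>one</li><li>two</li></ul>', function () {
        $('#rows li').first().addClass('selected');
        ok($('#rows li.selected').length === 1, "exactly one row is selected");
      });
    });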

And this is what comprises the function:
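Something along these lines (a sketch of the described behavior, not the original source):

    // Append the test markup, run the test body, then remove the markup again
    // so one test's DOM can't leak into the next.
    function withTestHtml(html, testBody) {
      var $fixture = $(html).appendTo('body');
      testBody();
      // Note: if testBody throws, this line never runs -- hence the caveat below.
      $fixture.remove();
    }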

It is important to note that errors leave garbage divs behind. This is definitely a work in progress. :)

If you're looking for the short and quick version, that was it. If you're wondering why I'm doing this, read on. (You can also view the live-typed version here: http://ietherpad.com/ep/pad/view/Ek6pNOcyjv/latest)
How Brittle Is Your jQuery?

When you need to test your HTML DOM manipulations which of the following best describes your approach:
  • Just don't test it. You use Javascript templating and keep the interactions simple enough that it's low risk and has never proven to be a huge issue.
  • Write a Javascript unit test, write your jQuery code, then verify your jQuery interactions by using jQuery to inspect the DOM
  • Write some jQuery, load the web page, and manually test it each time you make a change
  • Write your web page and test it using Selenium after the fact
  • Just don't test it. It would be valuable for you but you just don't have the time.
These are the most common strategies in my experience. The top two are strategies I have been known to employ. On one project I have been very successful leveraging a very event-oriented, MVC-like templating technique, and not testing that code hasn't bitten me yet. At Cheezburger, however, I have been going with the technique of writing more QUnit tests.

I have found that not testing, or testing only after the fact with Selenium, is the wrong ideal.

Why Test Javascript At All?

Realistically, if we're professionals we always test our code. The controversy in testing is usually about whether or not we automate it. Why automate anything? Everything is so easy once you understand how to do it. If you just take the time to understand the code (by grabbing your nearest warm body) you won't need tests, because it is just that easy.

Because I understand that I am fallible, that I miscommunicate even when I say things exactly as they are and as I mean them (ain't human perception a bitch?), and because I don't want my team to have to take the time to ask for my opinion. Being needed that way feels great on the ego... and that's a huge behavior smell right there.

Why Avoid Selenium?

It's an issue of short, tight feedback cycles. To be perfectly fair, I'm sure there are ways of using Selenium that give you very tight feedback loops. That's not how I use it, though, and I've not yet seen anyone else use the tool that way either... just sayin'. When I am delving into unknown code it is much too expensive to wait a couple of seconds for the server to start up and then several more seconds for the other tests to run on every change I make.

Having said that, I love Selenium for code that is too difficult to get under unit tests. It's great to have a way to give me a pretty high degree of confidence that I haven't broken anything.

Find/Replace on a JSON Object Graph

Today I had cause to implement a method for finding and replacing a value that appears at the end of a certain JSON path in an object graph. I couldn't find a preexisting tool to do the dirty work, so I wrote it myself and then wrote this article. :)

Here's a concrete example. Imagine you have the following JSON:
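The original document isn't reproduced here, but picture something with this shape (an illustrative stand-in with made-up names): employees, each with a status and a shirt color.

    {
      "employees": [
        { "name": "Gordon", "status": "awesome", "shirt": { "color": "blue" } },
        { "name": "Lars",   "status": "meh",     "shirt": { "color": "red" } }
      ]
    }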

Now imagine that you want to make the shirt color of every employee with a status of "awesome" orange. Why? No clue. Work with me here. How would you accomplish that?

After looking for someone else who had already done this, I set out to do it myself and was surprised at how simple it was in Ruby. The technique I thought of was to search through every node in the object graph and call a special replace function on each one. If a given node matched the criteria, then it or its children would be updated accordingly.

The following code amounts to a depth-first search of the object graph:
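Here's a minimal sketch of the idea (not the original source; the method and key names are assumptions matching the example above):

    # Walk every node, give the replace block a chance to act on it, then
    # recurse into any Hash or Array children.
    def search(node, &replace)
      replace.call(node)

      children = case node
                 when Hash  then node.values
                 when Array then node
                 else []
                 end

      children.each { |child| search(child, &replace) }
    end

    # Hypothetical usage against the example above: turn every awesome
    # employee's shirt orange.
    require 'json'

    data = JSON.parse(File.read('employees.json'))

    search(data) do |node|
      if node.is_a?(Hash) && node['status'] == 'awesome' && node['shirt']
        node['shirt']['color'] = 'orange'
      end
    end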

Really, all of the "magic" is in the search method, which just knows how to enumerate either a Hash or an Array, call the replace method, and then recursively search the node's children. If a given node fails some aspect of the replace criteria, nothing happens; we just move on, searching through all of its Hash or Array children and so on until no other options exist.

What's most surprising to me is the simplicity and elegance of the solution. I probably spent more time looking for an alternative than it took me to write that code.

Now before you think that this will only work in the simplest of cases, here is the actual replacement code I needed for my real world scenario:

Selling Software or Wanking Code

First off, I love beautiful code and have been known to fixate on it, so this article is a formalization of what I think to myself every time I start to get religious about code quality.

Code quality is an oft-talked-about yet poorly defined topic amongst programmers. Ask 100 different developers what "quality" means to them and you'll receive 100 different answers. Responses will range from "Quality code is code that can be easily changed and understood" to "Quality code is hard to define, but following the SOLID principles is a good start" to "it's more of an art that's hard to define." Ok, but maybe you're thinking these are too abstract and that quality should be about reducing costs and reducing bugs. Sure. Maybe. Ultimately, however, all of these definitions of "quality" skirt the elephant in the room:

On commercial projects, high quality code will help enable my company to maximize its profits.

When I get in a heavy debate over whether or not someone is really "unit" testing or just "integration" testing, nowadays I ask myself (and then my sparring partner) "Is this why we can't deliver software?" Put another way, "Is this the most critical obstacle in the path of my company making more money over the short and long term?" The answer is usually no.

When it is no, I have to suck up my ego and walk away from the discussion since I've admitted there's limited value to be had. Note, there's some value, especially if my purpose is getting on the same page as my team.

When the answer is yes, as it oh so very rarely is, then I can make a bold statement, provided I can concretely share WHY this issue is hurting the company's performance more than every other concern. Here's an example:

Imaginary dev: "I don't have time for automated testing so get off my back about it."

Me: "Automated testing is the single most critical thing we can be doing to drive our company's profit because every bug we miss is a bug our customers have to catch and their time is so limited that they cant possibly catch them all. That means the bugs will make it to our customers who will slowly lose faith in our product with every issue they find. We can't afford to manually test so we absolutely have to run automated tests."

While perhaps not bulletproof, that's a strong argument. What would an even stronger counter-argument look like?

Imaginary dev: "If I can do what's worked for me in the past and just get this feature done by this hard deadline our customer will pay us a $3 million bonus. Our business customers have already decided and agree that even if we do nothing but review the code I have written for two weeks after the deadline, it will still have been extremely profitable for us to take this measured risk."

Do you believe in "quality" so much that you would ask your team to stop one of the devs you've seen consistently deliver results from bringing in a $3 million payday? If you do, what's your number? If you don't have one, then you're not in this business for the business.

Hi, my name is Justin. I'm a recovering code wanker.

Introducing STFU and Code

My recent foray into the Ruby world with Sinatra and Heroku has taught me a lot about what we could be doing better in .NET. STFU and Code is my first response.

I formed this project with Tim Erickson, a great friend, to reduce the friction of getting down to brass tacks and working on a small .NET project. We're using AvalonEdit (http://wiki.sharpdevelop.net/AvalonEdit.ashx) to provide syntax highlighting and to suss out whether or not the code parses.

It's really more of a code thought, but it's usable and your comments are welcome!

Learning Data Visualization From A Data Scientist

How I Came Across A Real Live Data Scientist!

I was fortunate enough to be able to attend this year's Strangeloop conference (http://thestrangeloop.com/). Hilary Mason, data scientist extraordinaire, gave the opening keynote, entitled "Machine Learning: A Love Story". As soon as she said we'd need a little bit of math to get through the presentation, I knew it was gonna be good. After a healthy background on failed attempts at machine learning across the twentieth century, she got into Bayesian statistics and then related it all back to her work at bit.ly.

That's when I decided it was my weekend's goal to get her to hack on something, anything, related to data mining with me. Check her out on Twitter @hmason or at her website: http://www.hilarymason.com/

Graciously, she agreed and we set up the time and place. We ended up with around ten people in total hacking for about an hour in a small cafe here in St. Louis. I published the final product here: http://github.com/jcbozonier/Strangeloop-Data-Visualization

and Hilary is hosting the visualization here:

That's the background and this is what came of it for me.

Answers Are Easy, Asking The Right Questions Is Hard

I've been self-studying data analysis for a few months in my spare time and it can be so confusing knowing what I'm doing right or wrong. It's not like programming where I can tell if I have a right answer... it's more or less just me thinking the answer feels right. That's really hard for me.

By grouping up with Hilary I was hoping to get some insight into her professional workflow, what tools she uses, and also I wanted to get a feel for her general approach and mindset for answering a given question with her data-fu.

The question we ultimately decided to work on was "What does the Strangeloop social network look like on Twitter?" In other words, who's talking to whom, and how much? Our shared mental model for the problem was essentially a graph of nodes connected by undirected edges, where an edge indicated that two people had communicated via Twitter. Hilary had already grabbed Protovis, along with a sample of using it to create a force-directed layout, so it was a perfect fit for answering that question.

Three Steps

Today I learned to think about data analysis as three main steps or phases (since the steps can get a little large). 

1. Get Data- Get the data. In whatever form is easiest, just gather all of the data you'll need and get it on disk. Don't worry about how nice and neat it is.

2. Prune it- Now you can take that mass of data and start to think about what portions of it you can use. The pruning phase is your chance to trim your data down and focus it a bit. This is where you eliminate all aspects of the data except for the ones you'll want to visualize.

3. Glam it up- Here's where you figure out what you'll need to do to get your data into a visualizable form. 

1. Getting Data From Twitter

To get our data, I wrote a script that used Twitter's search API to download all tweets containing the hashtag #strangeloop. Since the data is paged, my code had to loop through about 15 pages until it had exhausted Twitter's records.

This is the code. It's pretty simple but effective.
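A rough sketch of what such a script can look like (the endpoint, parameters, and file names reflect the old search API and are illustrative, not the original script):

    # Page through Twitter's (old) search API until it stops returning results,
    # saving each page of raw JSON to disk.
    require 'net/http'
    require 'json'

    page = 1
    loop do
      url = URI("http://search.twitter.com/search.json?q=%23strangeloop&rpp=100&page=#{page}")
      body = Net::HTTP.get(url)
      results = JSON.parse(body)['results']
      break if results.nil? || results.empty?

      File.open("strangeloop_page_#{page}.json", 'w') { |f| f.write(body) }
      page += 1
    end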

There may be errors or corner cases, and that's fine. None of this is code I would unit test until it became apparent that I should. The main task at hand is to get the data, and in this case at least that's a binary result: it's easy to know if some part of the code went wrong. Also, I need to be able to work quickly enough that I can stay in the flow of the problem at hand. I'm really just hacking at Twitter, trying to get the data I want into a file on disk. If I have to do some of it by hand, that's fine.

2. Pruning The Data To Fit My Mental Model

I chose to download the data as JSON because I assumed that would be a pretty simple format to integrate with. Now that Ruby 1.9 comes with a JSON module out of the box, it totally was! Well... pretty much.

Once I had downloaded all of the data, I manually massaged each of the 15 JSON result objects to leave behind only the tweets and none of the metadata surrounding the search. Once that was done I had a file containing 1400-1500 JSON tweet objects in a single JSON array.

Now, during our group session I didn't actually write this portion of the solution. David Joyner did (follow him on Twitter as @djoyner), and he delivered the end result to Hilary in CSV format via Python. I've recoded it here because there was a bug in the code we wrote that day to create the data we visualized, and I needed a way to regenerate the data once the bug was fixed. Since I didn't have his Python script, I just opted to rewrite what he had done.

From here I just tried to get the data loaded up into Ruby via the JSON module. I load the saved JSON from disk with the following code:
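Something along these lines (the file name is illustrative, not the original):

    require 'json'

    # Read the combined tweet array back off disk.
    tweets = JSON.parse(File.read('strangeloop_tweets.json'))
    puts "Loaded #{tweets.length} tweets"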

My approach once again was very hack-oriented: write a little bit of Ruby script in such a way that I can verify it worked via the command line, then iterate by adding another step or two and repeating. It's like TDD but with much less thought, just hacking and feeling my way around the problem space.

3. Glamming It Up For Protovis

To recap: so far you've got me getting the data downloaded into a parseable form, David loading that data from disk, and then David again doing the original work of turning the data into a set of undirected edges between people talking to one another. I also rewrote that part, since I had neither his code nor Hilary's code for converting his output into something Protovis could use. To make the graph more interesting, we also decided to count up the number of times a given edge was used, which you'll see being computed in this:
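An illustrative sketch of that edge-counting step (the field names assume the old search API's from_user and text fields; this is not the code written that day):

    # Tally an undirected edge for every @mention in every tweet.
    edge_counts = Hash.new(0)

    tweets.each do |tweet|
      sender   = tweet['from_user'].downcase
      mentions = tweet['text'].scan(/@(\w+)/).flatten.map(&:downcase)

      mentions.each do |mention|
        next if mention == sender
        edge = [sender, mention].sort   # undirected: (a, b) is the same edge as (b, a)
        edge_counts[edge] += 1
      end
    end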

David Joyner was also kind enough to send me his original Python code that essentially does the same thing:

The thought was that the more active a person was on Twitter, the more they influenced the network. This could cause someone who was really chatty to get over-emphasized in the visualization but in our case it worked out well.

So, ok, we had all of this data, but it wasn't in the form Protovis needed to show our awesome visualization. Hilary figured this out by downloading a sample project from the Protovis website. The data needed to be put in this form:

If you scroll through that a ways you'll eventually see some data that looks like this:
{source:72, target:27, value:1},

Nice, eh? Those numbers are basically saying: draw a line from the node at index 72 of our list of nodes to the node at index 27. That complicated things a bit, but Hilary got through it with some code I imagine wasn't too dramatically different from this:

I just basically create a hash where I store the index number for each Twitter user's name and then look it up when I'm generating that portion of the file.
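Here's a sketch of that indexing-and-output step (the output format mirrors the Protovis force-directed layout example; the variable names and file path are assumptions, not the original code):

    # Assign each Twitter name an index, then emit Protovis-style nodes/links.
    names = edge_counts.keys.flatten.uniq
    index_of = {}
    names.each_with_index { |name, i| index_of[name] = i }

    File.open('strangeloop_graph.js', 'w') do |file|
      file.puts 'var strangeloop = {'
      file.puts '  nodes: ['
      names.each { |name| file.puts "    {nodeName: \"#{name}\", group: 1}," }
      file.puts '  ],'
      file.puts '  links: ['
      edge_counts.each do |(a, b), count|
        file.puts "    {source: #{index_of[a]}, target: #{index_of[b]}, value: #{count}},"
      end
      file.puts '  ]'
      file.puts '};'
    end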

Biggest Take Away: Baby Steps

There was definitely a fair amount of work here, and without all of the teamwork we wouldn't have been able to get it done in the 45 minutes it took us. Part of the teamwork was just figuring out what components of work we had in front of us. The three steps I laid out in this article are how I saw us tackling the problem, and there were many other, much more iterative steps I left out.

When I do more data analysis in the future I plan to just work it through piece by piece and not get overwhelmed by all of the different components that will need to come together in the end. 

The Other Biggest Take Away: Get Data At Any Cost Necessary

It's easy as a programmer for me to get bogged down in thoughts of "quality". Even Hilary was apologizing for the extremely hacked-together code she had written. Ultimately, though, it really doesn't matter here. The code will not be run continuously, and hell, it may never even be run again! If the code falls apart and blows up, I can quickly rewrite it. I'm my own customer in this sense. I can tolerate errors and I can fix them on the fly. When I'm exploring a problem space, the most important thing for me is to reduce the friction of my thought process. If I think best hacking together code, then awesome. Once I can get my data, I'm done. I don't care about robustness... I just need it to work right now.

I'm harping on this point because it's such a dramatic shift from the way I see production code for my day job. Code I write for work needs to be understood by a whole team, solid against unconsidered use cases, reliable, etc. Code I write to get me data really quick, I just need the data.

While Hilary is a Pythonista, at one point I remember her commenting on programming language choice and saying something to the effect of "It doesn't matter, they all work well." She was so calm about it... it was almost zen-like. After having so many passionate talks about programming languages with other programmers, it was very refreshing to interact with someone who had a definite preference but was able to keep her eye on the prize: the data, and more importantly the answers the data held.

Next Steps

I'd like to work on a way to tell which of the people I follow on Twitter are valuable and which I should stop following. Essentially a classifier I guess. On top of that I'd like to write another one to recommend people I should follow based on their similarity to other people I do follow (and who are valuable)... We'll see. I've got another project that desperately needs my time right now. If you happen to write this though or know of anyone who has, let me know!

Monte Carlo Analysis of the Zero Defect Mentality of TDD

Challenges that led me here

  • Heated arguments at work regarding how much TDD is enough and how little is too little. How do we find common ground?
  • An acknowledgement of technical debt and a confusion about how to leverage it. How much debt is too much?
  • Being labeled as pedantic and a zealot. Is a Zero-Defect Mindset ever worthwhile? When?
  • Learning exercise in how we can gain concrete insights using our intuition in a methodical fashion. How can I communicate abstract ideas without concrete evidence in a rigorous manner?

This article represents my lessons learned from this exploration.

Making the Abstract Concrete

It was a normal day at work: another co-worker and I were strongly and passionately arguing for the benefits of strict, pure, clean TDD against a couple of other equally passionate co-workers who were sold on the idea of everything in moderation. Having just completed a four-month, full-time Agile immersion with an amazing albeit very idealistic consultant, I found his ideas about a zero-defect mindset, and the notion that it was practically achievable, seductive. I had entertained my own idealistic fantasies for a while without ever really thinking they could or should be taken so seriously.

It was liberating. 

Also, it was isolating. Having these thoughts and that excitement placed me on one extreme end of a continuum, with many of my teammates on the other end or somewhere in the middle, nearer to the side of limiting TDD in the name of practicality. Conversation after conversation, debate after debate, we ended in the same place, perhaps even galvanized a bit by the disagreement and a bit further from finding common ground.

I finally came to understand that regardless of what I knew to be right, everyone on my team had their own perception and their own knowledge of what was right as well. That's not sarcasm. In social interactions there are multiple realities and all of them need to be appreciated and considered valid enough to be worth understanding. 

How could I model my perception of reality in some sort of concrete way that would let me make rigorous (albeit somewhat subjective) predictions? How could I ensure my mental model was at least self-consistent and workable? Like any self-respecting geek, I decided the best way to model uncertainty was to run thousands of simulations and projections of reality and see what lessons could be gleaned.

Finding Common Ground in a Common Purpose

The first decision I had to make was choosing the underlying metric I would use to compare the two development methodologies. Having just recently been introduced to systems thinking and the Theory of Constraints, I thought a great start would be to use the value throughput of the simulated companies.

But what is value? When we speak of delivering value to our business customers, what is it we are actually delivering? In discussions with my team, we decided that business value is best seen as the present-day value of your company were it to be valued by an external party. For the purposes of the simulation, I assume the value delivered by completed stories to be random numbers drawn from a value distribution and assigned without regard for feature size. That's right: it means a feature that costs next to nothing to develop may create an enormous amount of value for the company.

For further assumptions and specifics of my model, read on.

My Model for Thinking About This (AKA My Domain Model)

Concepts and their role in the simulation:
  • User Story- In this simulation, a User Story is the smallest unit of work that the Product Development Team can work on that provides the slightest bit of business value. They also have an associated size.
  • Business Customers- Generates a random set of stories each iteration; each story's size and value are randomly assigned upon creation, with sizes drawn from a discrete distribution.
  • Product Backlog- Repository for all stories. New stories are all added as a top priority in the order delivered. Bugs are randomly dispersed into the Product Backlog when they are received. 
  • Product Development Team- Anyone and everyone responsible for getting the release out the door: programmers, testers, technical writers, etc. The Product Development Team iterates over the Product Backlog and works to complete stories. They are also the ones deciding the cost of the various stories. Over time the speed of their work can increase if a range (minimum velocity and maximum velocity) > 1 is specified on construction. The function which controls the team's performance improvement is an "Experience Curve" as documented here: http://en.wikipedia.org/wiki/Experience_curve_effects. Without getting too into it, the experience curve essentially models the decreasing cost of development over time (there's a small sketch of it just after this list).
  • End Users- Who the Product Development Team releases to. Because the Product Development Team includes *everyone* needed to release the software, the End User may receive the software immediately afterwards. End Users discover bugs in the software. This is currently set to a constant rate per story per iteration. So if the defect rate is 1%, then a team with a hundred stories complete can expect to have, give or take, one story per iteration reenter the Product Backlog as a new Bug Story. The size of the Bug Story is randomly determined based upon a discrete bug size distribution. 
  • Bug Story- A Bug Story is a story that is focused on fixing a defect in the software. These stories are unlike normal stories in that they have no real value for the team and thus don't improve throughput. A Bug Story actually represents more of an opportunity cost as valuable work could be done in its place if the Bug Story hadn't needed to be written.
  • Support Team- Who the bugs are reported to. Currently only really used to track the total bug count. Could be used in the future to eliminate bugs due to "user error".
Each of these concepts maps directly to an object in the Javascript simulation.
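For the curious, here's one common way to model that experience-curve effect (an illustrative sketch; the function name and constants are assumptions, not the simulation's actual code):

    // Wright's-law style experience curve: every doubling of output multiplies
    // the unit cost by a fixed learning rate (e.g. 0.9 for a "90% curve").
    function unitCost(firstUnitCost, unitsProducedSoFar, learningRate) {
      var exponent = Math.log(learningRate) / Math.log(2);
      return firstUnitCost * Math.pow(unitsProducedSoFar, exponent);
    }

    // On a 90% curve, the 8th story costs roughly 73% of the first:
    // unitCost(10, 8, 0.9) is about 7.29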

How The Simulation Works

First, this is how the topmost level of the simulation runs:

The distributions you see are assumed to be valid randomly chosen samples of real data. There's one set of data for story sizes and another set of data for bug sizes. In the interest of full disclosure, this is NOT real data. That's for my co-workers only. :) 

The overall process is shown in the following code:
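Roughly, that process has this shape (a sketch built from the domain objects described above; the constructor arguments and method names are my assumptions, not the actual source):

    // One simulated company, wired up explicitly at the top level.
    var businessCustomers = new BusinessCustomers(storySizeDistribution, storyValueDistribution);
    var productBacklog    = new ProductBacklog();
    var endUsers          = new EndUsers(defectRatePerStoryPerIteration, bugSizeDistribution);
    var team              = new ProductDevelopmentTeam(minimumVelocity, maximumVelocity);

    var totalValueDelivered = 0;

    for (var iteration = 0; iteration < totalIterations; iteration++) {
      // Business Customers dream up new stories; they land on the backlog.
      productBacklog.add(businessCustomers.generateStories());

      // The team burns through as much of the backlog as its velocity allows.
      var completedStories = team.workOn(productBacklog);
      totalValueDelivered += completedStories.totalValue();

      // End Users find defects in what has shipped; bugs re-enter the backlog
      // as zero-value Bug Stories.
      productBacklog.add(endUsers.discoverBugsIn(completedStories));
    }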

Simultaneously, that code also shows why I have a disdain for the fixation many developers have on instantiating objects within other objects to "hide worthless noise". With everything wired up in one place, it reads pretty damn well.

Constructing the simulated development teams is handled by the DevelopmentTeamFactory, which builds both the teams using the TDD methodology and the standard development teams:
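A hypothetical shape for such a factory (illustrative only; the settings names are assumptions, not the actual source):

    // The factory is the one place allowed to new-up teams directly.
    function DevelopmentTeamFactory(settings) {
      this.settings = settings;
    }

    DevelopmentTeamFactory.prototype.createTddTeam = function () {
      return new ProductDevelopmentTeam(
        this.settings.tdd.minimumVelocity,
        this.settings.tdd.maximumVelocity);
    };

    DevelopmentTeamFactory.prototype.createStandardTeam = function () {
      return new ProductDevelopmentTeam(
        this.settings.standard.minimumVelocity,
        this.settings.standard.maximumVelocity);
    };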

That code does instantiate objects within a method call. In this case I rationalize it as being part of a factory, and that being the factory's concern. I'm not certain that makes for the cleanest code, though. I'll keep iterating on it after I publish this article. :)

Everything should work in the latest versions of Firefox and Chrome. It requires HTML5 in order to use the canvas for charting.

Here's a sample chart with 50 TDD teams (green tick marks) and 50 standard teams (red tick marks) each running over two years:

Feel free to open the code, modify the JSON simulation settings yourself, and run your own simulations. Note that as soon as you press the Run button your browser will seize up for a bit. I recommend Chrome for its amazing speed. Don't kill the process! I promise it will finish eventually. ;)

[Insert lesson about how important User Experience design is here]

How It Was Built

Baby steps. I started with one team that could be modeled and just showed final-iteration stats on the web page. Then I moved on to comparing that team side by side with a team using a standard development model, in a table of data. Next I found a pretty good HTML5 charting API and got the key data visualized (Total Value to Date vs. Time Passed).

Lessons Reinforced
  • Lowering the defect rate, even at the cost of reduced performance, results in higher value throughput in the long term. In the short term, however, lowering the defect rate is hardly ever optimal.
  • A higher defect rate results in a much higher spread of possible value throughput... in other words there's a higher variance in what you can expect in terms of value output from a product development team.
  • Every development team has a point where the highest possible testing and quality rigor begins to outperform the less rigorous teams. The trick is identifying where this begins to happen for your particular company or project.
All of these discussions insist you should always TDD, always bake quality in, always this, always that. For some company out there, the opposing statements are just as accurate. Where is that crossover point for your company? You'll need some answer to this before you can make any sort of real argument, at least one based on the ideal of striving for zero defects.

My Hacker Mentality

Given that it all comes down to where testing becomes the highest-value decision, I realized why I never test personal projects. I set the expected life of my projects at practically zero, so I end up unable to reasonably justify any testing; by my estimates, testing won't produce any real value. The real problem with my hacker mentality is that I tend to underestimate how long I'll be working on something. Take this simulation as an example: I began it with no tests because I figured I'd slam something together in a night and be done with it. Instead, I enjoyed it much more than I expected and ended up wanting to explore the nooks and crannies of my model.

Maybe some day I'll learn?

Conclusion

Assuming a business lifetime of T years, where T is far enough out in the future, the more testing the better and the more defects you can prevent the better. This holds regardless of the costs we incur, because in the long run, if we don't test, our primary concern becomes preventing errors in pre-existing features, which crowds out work on any new features that add value.

However, if you can assure yourself of a limited time period T, it may actually be in your best interest not to adopt a zero-defect mindset. Just don't think you can switch to a 20-year lifespan at the end of the 5th year and see an instant turnaround in value throughput.

If you've read through this far, you deserve a reward. Here's my conclusions on the questions I asked in the beginning:
  • How do we find common ground? Share our assumptions and make them explicit. Codify them so that they can't be conveniently shifted when the arguments get uncomfortable.
  • How much debt is too much? I didn't model technical debt in terms of needed refactoring, just in terms of defect likelihood. Too much debt is when you spend more time paying maintenance costs than delivering value.
  • Is a Zero-Defect Mindset ever worthwhile? When? Yes it is, when you have set a goal of a sufficiently large life time for your product.
  • How can I communicate abstract ideas without concrete evidence in a rigorous manner? Hopefully I just did.