TDD-ing Concurrent Code

A Method for Modelling Concurrency
 

I'm prepping code for Code Camp Boise and Seattle, and I thought I'd share some of the simple stuff as I write it, as an introduction of sorts to the concepts.

 
I hear a lot of people say things like "Well, we made this process concurrent, so now we can't test it." That has always felt wrong to me, and I've kept it in mind over the past year or two while reading about threading. Like any concern, concurrency is hard to test around if it isn't abstracted away from the code under test.
 

Testing concurrent software can be extremely difficult. While debugging, breakpoints can be tripped seemingly at random by threads you don't care about, and your data can change right under your nose whether or not you're paused.
 
Another issue I hear about is that synchronizing across threads is a pain. What happens if, after you verify that the object you want to use is in the appropriate state, some other thread changes it, and using it then throws an exception? Race conditions like this can be extremely difficult to manage.
 
One way to handle this is to use a message-based, data-flow-oriented model. Why? First and foremost, it lets you model your data dependencies explicitly and leave the details to the abstraction. You declare a network of processes (which are essentially objects) as a directed graph, and that declaration states exactly how the processes will interact with one another. Since the processes exchange only immutable messages, you don't have to worry about one process interfering with another's data.
 
Yet another great thing about developing a data flow network is that you can test each process in isolation without it even having any knowledge of threading. That's what I will be talking about for the remainder of this post.
 
Some Context
 
A friend of mine needed a program that would go through a text file of 10,000 lines or more and find all of the valid email addresses. I started with a quick overview of the process I had in mind:
 
FileReadingAgent
Reads the file line by line and passes each line along to the ObviousEmailExtractionAgent, skipping blank lines.

ObviousEmailExtractionAgent
Extracts obviously good email addresses from each line and passes them on to the GoodEmailCollectionAgent.
Lines with no obviously good email address are passed on to the NonObviousEmailExtractionAgent for further processing.

NonObviousEmailExtractionAgent
Uses more intelligent extraction rules to find less obvious email addresses.
Passes any email addresses it finds on to the GoodEmailCollectionAgent.

GoodEmailCollectionAgent
Aggregates known good email addresses.
 
All of these agents should be assumed to be running in their own threads, and they communicate with one another only via immutable messages.
 
The FileReadingAgent would have a connection to the ObviousEmailExtractionAgent. The ObviousEmailExtractionAgent would have *two* connections: one to the GoodEmailCollectionAgent and another to the NonObviousEmailExtractionAgent.
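
The two outgoing connections can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: the wiring method names ShouldSendGoodEmailsTo and ShouldSendUnresolvedLinesTo, the Collector class, and the placeholder rule for what counts as "obviously good" are all hypothetical, and everything runs synchronously on the calling thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sink standing in for GoodEmailCollectionAgent or
// NonObviousEmailExtractionAgent: it only records what it is sent.
public sealed class Collector : IObserver<string>
{
    public List<string> Received { get; } = new List<string>();
    public void OnNext(string value) => Received.Add(value);
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

// Sketch of the agent with two outgoing connections.
public sealed class ObviousEmailExtractionAgent : IObserver<string>
{
    private IObserver<string> goodEmails;
    private IObserver<string> unresolvedLines;

    // Hypothetical wiring methods (names are assumptions).
    public void ShouldSendGoodEmailsTo(IObserver<string> target) => goodEmails = target;
    public void ShouldSendUnresolvedLinesTo(IObserver<string> target) => unresolvedLines = target;

    public void OnNext(string line)
    {
        // Placeholder rule (an assumption): a line that is a single token
        // containing '@' counts as an obviously good address.
        string trimmed = line.Trim();
        bool obvious = trimmed.Contains("@") && !trimmed.Contains(" ");

        if (obvious) goodEmails?.OnNext(trimmed);
        else unresolvedLines?.OnNext(line);
    }

    public void OnError(Exception error) { }
    public void OnCompleted() { }
}
```

Because each connection is just an IObserver of string, a test can plug a Collector into either side and check where a given line ends up.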
 
In this post I'd like to share the tests that went into creating the FileReadingAgent. 
 
TDD-ing a Process
 
The first context I worked on assumed there was only one line of text in a file. It's a simple context that just helps ensure the basic plumbing for my agent is all hooked up. This is how I tested this context:
 
 
The next context I worked on contained blank text lines. I wanted to ensure that those lines of text didn't get passed on to my email finding agents.
 
 
The final context has lines containing only whitespace characters and one line of text that is an email address. I wanted to ensure that only lines with some kind of text moved on to the agents that would actually try to parse out email addresses. In hindsight, this probably should have gone in the ObviousEmailExtractionAgent. It seems like the FileReadingAgent shouldn't really be concerned with this. I could probably just change the name of my FileReadingAgent to NonBlankLineReadingAgent and get by that way. ;)
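
The three contexts above could be exercised along these lines. This is a sketch, not the original tests: the constructor parameter that supplies the lines for a path, the RecordingObserver class, and the Contexts helper are all assumptions, made so the agent can be checked without touching the disk or any threads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Test double: remembers every line pushed to it.
public sealed class RecordingObserver : IObserver<string>
{
    public List<string> Lines { get; } = new List<string>();
    public void OnNext(string value) => Lines.Add(value);
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

// Minimal stand-in for the agent under test. The line source is injected
// (an assumption) so a test can supply in-memory "file" contents.
public sealed class FileReadingAgent : IObserver<string>
{
    private readonly Func<string, IEnumerable<string>> readLines;
    private IObserver<string> target;

    public FileReadingAgent(Func<string, IEnumerable<string>> readLines)
    {
        this.readLines = readLines;
    }

    public void ShouldSendLinesOfTextTo(IObserver<string> observer) => target = observer;

    // Each message is a file path; blank and whitespace-only lines are dropped.
    public void OnNext(string filePath)
    {
        foreach (string line in readLines(filePath).Where(l => !string.IsNullOrWhiteSpace(l)))
            target?.OnNext(line);
    }

    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

public static class Contexts
{
    // Runs the agent over the given "file" contents and returns what it forwarded.
    public static List<string> Forwarded(params string[] fileLines)
    {
        var recorder = new RecordingObserver();
        var agent = new FileReadingAgent(path => fileLines);
        agent.ShouldSendLinesOfTextTo(recorder);
        agent.OnNext("any-path.txt");
        return recorder.Lines;
    }
}
```

With that in place, each context reduces to a one-line check. Contexts.Forwarded("a line") should yield that single line, Contexts.Forwarded("first", "", "second") should skip the blank line, and Contexts.Forwarded("   ", "\t", "bob@example.com") should forward only the address. No thread is involved anywhere in the test.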
 
 
My "final" code doesn't handle disposal or anything, and it definitely should! That's an oversight on my part. Aside from that, this code should be pretty much complete:
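
A rough sketch of what such an agent might look like when it reads a real file; it is not the original listing. It assumes File.ReadLines for lazy, line-by-line reading, borrows the class name from the lineByLineFileReader variable used further down, and the choice to report I/O failures through OnError is my assumption as well.

```csharp
using System;
using System.IO;

public sealed class LineByLineFileReader : IObserver<string>
{
    private IObserver<string> lineObserver;

    public void ShouldSendLinesOfTextTo(IObserver<string> observer) => lineObserver = observer;

    // Each message is a file path; every non-blank line is pushed downstream.
    public void OnNext(string filePath)
    {
        try
        {
            foreach (string line in File.ReadLines(filePath))
            {
                if (!string.IsNullOrWhiteSpace(line))
                    lineObserver?.OnNext(line);
            }
        }
        catch (IOException ex)
        {
            // Assumption: I/O failures are reported downstream rather than rethrown.
            lineObserver?.OnError(ex);
        }
    }

    // Errors and completion from upstream are simply passed along.
    public void OnError(Exception error) => lineObserver?.OnError(error);
    public void OnCompleted() => lineObserver?.OnCompleted();
}
```

Nothing in the class mentions threads, which is the point: whatever schedules the agents can decide where OnNext runs.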
 
 
Notice the use of the IObserver interface? I'm stealing a bit from the new .NET Reactive framework (an idea that I got from Robert Ream). By using the OnNext method I can make my network of agents push-oriented rather than pull-oriented: each agent hands its output to the next one instead of waiting to be asked for it. The benefits of this can be enumerated in another blog post. :)
 
How could I connect these to run synchronously? Super easy. This is how I could link the LineByLineFileReader to the ObviousGoodEmailExtractionAgent:
 
lineByLineFileReader.ShouldSendLinesOfTextTo(obviousGoodEmailExtractionAgent);
 
Then, to start, I'd send the path of the file I wanted processed to the lineByLineFileReader like so:
 
lineByLineFileReader.OnNext("c:/myfile.txt");
 
Next time I'll show an overview of the whole application and how it works concurrently with a WPF UI.