Code Like Bozo

James Shore posted an architectural challenge on his blog this week and personally threw his gauntlet in my face to answer the challenge using this message oriented design stuff I've been ranting and raving about. Of course, when I say "threw his gauntlet in my face" I really mean he said it might be interesting to see... BUT STILL! A man can't back down from that! ;)

You can read about his challenge in detail here: http://jamesshore.com/Blog/Architectural-Design-Challenge.html

To sum it up, the idea is to build a ROT13 file encoder, TDD'd and beautiful. The challenge had two parts. The first was just to get everything working by reading the whole file into memory, and then to refine our design around that approach. Once that was done, we could move on to part two, which required us to process the file incrementally: reading it off of disk and saving it back to disk as we go.
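ROT13 rotates each letter 13 places through the alphabet, so encoding twice gives back the original text. Here's a minimal Java sketch of both parts (the `Rot13` class and its method names are hypothetical, not taken from the challenge or from the solution linked below): `encode` works on text already read into memory, as in part one, while `encodeStream` copies from a `Reader` to a `Writer` one fixed-size buffer at a time, as in part two.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical ROT13 helper illustrating the two parts of the challenge.
final class Rot13 {
    // Rotates ASCII letters 13 places; every other character passes through unchanged.
    static char rotate(char c) {
        if (c >= 'a' && c <= 'z') return (char) ('a' + (c - 'a' + 13) % 26);
        if (c >= 'A' && c <= 'Z') return (char) ('A' + (c - 'A' + 13) % 26);
        return c;
    }

    // Part one: the whole text is already in memory.
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) out.append(rotate(text.charAt(i)));
        return out.toString();
    }

    // Part two: encode incrementally, holding at most one buffer of text in memory.
    static void encodeStream(Reader in, Writer out, int bufferSize) {
        char[] buffer = new char[bufferSize];
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                for (int i = 0; i < read; i++) buffer[i] = rotate(buffer[i]);
                out.write(buffer, 0, read);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `Rot13.encode("Hello, World!")` returns `"Uryyb, Jbeyq!"`. Pointing `encodeStream` at file-backed readers and writers would be one way to get the incremental behavior part two asks for.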

What did I learn from this experience? I'm extremely happy with the flexibility and robustness of the designs I get when I approach things from a message oriented point of view. There are times when it's too much (there are no absolute laws, right?), but for any system I work on that has real complexity, it is a great guiding hand for me.

This is my final solution: http://github.com/jcbozonier/The-Jim-Shore-Architecture-Challenge

To give a general idea of what I did, I've provided the way I connected my objects together below:
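To give a feel for this style of wiring, here is a hypothetical Java sketch; it is not the code from the repository linked above, and the class and method names are invented for illustration. Each object is handed the collaborator it should publish to, all of the connections are made in one place, and a single `read()` call sets the network in motion.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical message-oriented ROT13 pipeline. Each object knows only the
// Consumer it publishes to; no object asks another for its state.
final class TextSource {
    private final List<String> chunks;
    private Consumer<String> output = chunk -> { };

    TextSource(List<String> chunks) { this.chunks = List.copyOf(chunks); }

    void sendChunksTo(Consumer<String> next) { this.output = next; }

    // Calling read() is what brings the whole network to life.
    void read() {
        for (String chunk : chunks) output.accept(chunk);
    }
}

final class Rot13Encoder implements Consumer<String> {
    private Consumer<String> output = chunk -> { };

    void sendEncodedChunksTo(Consumer<String> next) { this.output = next; }

    @Override
    public void accept(String chunk) {
        StringBuilder encoded = new StringBuilder(chunk.length());
        for (char c : chunk.toCharArray()) {
            if (c >= 'a' && c <= 'z') encoded.append((char) ('a' + (c - 'a' + 13) % 26));
            else if (c >= 'A' && c <= 'Z') encoded.append((char) ('A' + (c - 'A' + 13) % 26));
            else encoded.append(c);
        }
        output.accept(encoded.toString());
    }
}

final class Rot13App {
    // The single place where the object network is configured.
    static String run(List<String> chunks) {
        StringBuilder written = new StringBuilder();
        TextSource source = new TextSource(chunks);
        Rot13Encoder encoder = new Rot13Encoder();

        source.sendChunksTo(encoder);
        encoder.sendEncodedChunksTo(written::append);

        source.read();
        return written.toString();
    }
}
```

Reading `run` top to bottom is roughly the flow chart of the system: source to encoder to sink, then go.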

First my mistakes:

1) This part is confusing. I'm basically just creating an object that forwards every message it receives to both of the other objects, but it isn't executed well.

2) Instead of having a separate configuration command, the things I wanted configured should have just been configured on the fly. Marking them to be configured and then calling for the configuration to happen feels way too meh.

3) The line where I call fileReader.Read(); is where the whole system comes to life, but I fear that's not obvious.
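Mistake 1 above describes an object that forwards every message it receives to two others. One way to make that intent obvious is a tiny, generic fan-out object whose only job is forwarding, so the configuration code can say "send this to both of these" in a single line. A hypothetical Java sketch (the `Broadcast` name is invented):

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical fan-out: forwards every message it receives to each target, in order.
final class Broadcast<T> implements Consumer<T> {
    private final List<Consumer<? super T>> targets;

    @SafeVarargs
    Broadcast(Consumer<? super T>... targets) {
        this.targets = List.of(targets);  // fixed at construction; no later reconfiguration
    }

    @Override
    public void accept(T message) {
        for (Consumer<? super T> target : targets) target.accept(message);
    }
}
```

Because `Broadcast` is itself a `Consumer`, it can be plugged in anywhere a single collaborator is expected, which keeps the "one message, many listeners" decision in the wiring rather than inside a domain object.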

Now what I like:

1) Whenever I create a message oriented design, I can discuss the whole system by pointing to the place where I configure my dependencies. The overall system flow may not be perfectly digestible, but anyone who tried to create a flow chart from this configuration would find it very easy (I have, and it lends itself well to presentations ;) ). Also, in my experience, instead of needing a call graph that shows how objects talk to one another, the same ideas fall out of a view of how the objects depend on one another.

2) Whenever I run into too much pain with this approach, it's a smell that I did something wrong. Case in point: while working on part I of the challenge, I had begun to write and test a class that was essentially going to orchestrate all of the other classes, on top of the class that configures which objects talk to one another. Essentially, I was building a router. The pain for me was that I was creating WAY too many fakes and needing to care WAY too much about what they were doing. So I took a step back, drew my objects on a piece of paper, and reconnected them according to the new design I drew. Hardly any code changes were necessary, and it was pretty short work.

3) I tend to write tiny objects. Some people hate having too many objects, or objects that don't do much, so your tastes may vary. However, I've found that smaller, more focused classes help me. When a class encompasses literally only a single responsibility, I find it easier to replace or modify when it no longer fits my needs, and I only have to mock when I absolutely need to.

4) I like how little code there is in my console application.

If you haven't been talking with me or reading what I've been writing about Message Oriented Object Design, here's a quick summary:

Message Oriented Object Design (MOOD) is an object oriented design philosophy wherein we view objects as sending immutable messages/publishing events on channels. MOOD systems also rely on configuring networks of objects to enable collaboration between them. A core tenet is the lack of inter-object getters (whether methods or property calls).
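To make the "no inter-object getters" tenet from the MOOD summary above concrete, here's a hypothetical Java sketch (`WordCounted`, `Channel`, and `WordCounter` are invented for illustration). Instead of exposing a `getCount()` for other objects to poll, the counter publishes an immutable `WordCounted` message on a channel every time its state changes, and whoever cares subscribes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Immutable message: final fields, no setters.
final class WordCounted {
    final String word;
    final int count;

    WordCounted(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

// A minimal channel: publishers push messages, every subscriber is notified.
final class Channel<T> {
    private final List<Consumer<? super T>> subscribers = new ArrayList<>();

    void subscribe(Consumer<? super T> subscriber) { subscribers.add(subscriber); }

    void publish(T message) {
        for (Consumer<? super T> subscriber : subscribers) subscriber.accept(message);
    }
}

// Keeps its state private and announces changes instead of offering a getter.
final class WordCounter implements Consumer<String> {
    private final Map<String, Integer> counts = new HashMap<>();
    private final Channel<WordCounted> counted;

    WordCounter(Channel<WordCounted> counted) { this.counted = counted; }

    @Override
    public void accept(String word) {
        int count = counts.merge(word, 1, Integer::sum);
        counted.publish(new WordCounted(word, count));
    }
}
```

Nothing outside `WordCounter` can reach into `counts`; collaborators learn about it only through the messages it chooses to send.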

That's it! Leave me a comment if you want to lend your own critique of what I've done. I also encourage you to head over to James' site, throw your own hat in the ring, and critique other people's designs (be harder on the other designs, though, of course!).

Till next time.

A Method for Modelling Concurrency

I'm prepping code for Code Camp Boise and Seattle, and I thought I'd share some of the simple stuff as I write it, to act as an introduction of sorts to the concepts.

I hear a lot of people say things like "Well, we made this process concurrent, so now we can't test it." That always felt wrong to me, and I've kept it in mind over the past year or two as I've been reading about threading. As with any concern, if threading isn't abstracted away from the code under test, you can't test that code without taking threading into account.

Testing concurrent software can be extremely difficult. While debugging, breakpoints can be tripped seemingly at random by other threads you don't care about, your data can change right under your nose whether or not you're paused, etc.

Another issue I hear is that synchronizing across threads is a pain. What happens if, after you verify that the object you want to use is in the appropriate state, some other thread changes it, and then it throws an exception when you use it? Race conditions like this can be extremely difficult to manage.

One way to handle this is to use a message-based, data flow oriented model. Why? First and foremost, because it lets you model your data dependencies and let the abstraction suss out the details. Just declaring a network of processes (which are essentially objects) as a directed graph gives you that ability. You've now explicitly declared how these processes will interact with one another, and since data flow programming passes immutable messages between them, you won't have to worry about processes interfering with each other's data.
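Here's a minimal, hypothetical Java sketch of one way to give each process its own thread (`ThreadedAgent` is an invented name, and the queue-plus-sentinel shutdown is an assumption, not something from this post). The process itself is a plain `Consumer` with no knowledge of threads; the wrapper owns a `LinkedBlockingQueue` inbox and a worker thread that feeds messages to the process one at a time, so senders never block and the process never sees two messages at once.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical wrapper that runs a thread-agnostic process on its own thread.
final class ThreadedAgent<T> implements Consumer<T>, AutoCloseable {
    private static final Object STOP = new Object();  // sentinel that ends the worker loop
    private final BlockingQueue<Object> inbox = new LinkedBlockingQueue<>();
    private final Thread worker;

    ThreadedAgent(String name, Consumer<? super T> process) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    Object message = inbox.take();  // waits until a message arrives
                    if (message == STOP) return;
                    @SuppressWarnings("unchecked") T typed = (T) message;
                    process.accept(typed);  // the process runs on this worker thread only
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        worker.start();
    }

    // Enqueues the message and returns immediately; the sender never blocks.
    @Override
    public void accept(T message) { inbox.add(message); }

    // Lets already-queued messages drain, then waits for the worker to finish.
    @Override
    public void close() {
        inbox.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Connecting two of these, so that one agent's process calls the other agent's `accept`, is enough to form a small directed graph. Closing the upstream agent first lets every queued message drain downstream before the next agent is shut down.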

Yet another great thing about developing a data flow network is that you can test each process in isolation without it even having any knowledge of threading. That's what I will be talking about for the remainder of this post.

Some Context

A friend of mine needed a program that would go through a text file of over 10,000 lines of text (sometimes more) and find all of the valid email addresses. First, I came up with a quick description and overview of the process as I saw it:

FileReadingAgent

Reads lines from the file one at a time and passes them along to the ObviousEmailExtractionAgent, skipping the blank lines.

ObviousEmailExtractionAgent

Extracts obviously good email addresses from each line and passes them on to the GoodEmailCollectionAgent.

Lines without obviously good email addresses (or with none at all) are passed to the NonObviousEmailExtractionAgent for further processing.

NonObviousEmailExtractionAgent

Uses more intelligent email extraction rules to find less obvious email addresses.

Passes any found email addresses on to the GoodEmailCollectionAgent.

GoodEmailCollectionAgent

Aggregates known good email addresses.

So, to reiterate: all of these agents should be assumed to be running on their own threads, and they communicate with one another only via immutable messages.

The FileReadingAgent would have a connection to the ObviousEmailExtractionAgent. The ObviousEmailExtractionAgent would have *two* connections. One to the GoodEmailCollectionAgent and another to the NonObviousEmailExtractionAgent.
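The ObviousEmailExtractionAgent is the one agent with two outgoing connections. Here is a hypothetical Java sketch of that routing. `Observer<T>` is a minimal stand-in for .NET's `IObserver<T>` (only `OnNext` is kept), and the regular expression is an assumed, deliberately simple definition of "obvious"; the actual extraction rules aren't spelled out here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal stand-in for .NET's IObserver<T>, keeping only OnNext.
interface Observer<T> {
    void onNext(T value);
}

// Hypothetical sketch: finds "obvious" addresses in a line and routes the results.
final class ObviousEmailExtractionAgent implements Observer<String> {
    // Assumed rule for an obvious address: local@domain.tld with a 2+ letter TLD.
    private static final Pattern OBVIOUS =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    private final Observer<String> goodEmails;       // connection 1: GoodEmailCollectionAgent
    private final Observer<String> needsCloserLook;  // connection 2: NonObviousEmailExtractionAgent

    ObviousEmailExtractionAgent(Observer<String> goodEmails, Observer<String> needsCloserLook) {
        this.goodEmails = goodEmails;
        this.needsCloserLook = needsCloserLook;
    }

    @Override
    public void onNext(String line) {
        Matcher matcher = OBVIOUS.matcher(line);
        boolean foundAny = false;
        while (matcher.find()) {
            foundAny = true;
            goodEmails.onNext(matcher.group());  // each address is its own message
        }
        if (!foundAny) needsCloserLook.onNext(line);  // hand the whole line on for smarter rules
    }
}
```

The agent never knows whether its two observers are real agents, threaded wrappers, or test fakes, which is what lets it be tested without any threading.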

In this post I'd like to share the tests that went into creating the FileReadingAgent.

TDD-ing a Process

The first context I worked on assumed there was only one line of text in the file. This simple context just helps ensure that the basic plumbing for my agent is all hooked up. This is how I tested this context:

The next context I worked on contained blank lines of text. I wanted to ensure that those lines didn't get passed on to my email-finding agents.

The final context has lines containing only whitespace characters and one line of text that is an email address. I wanted to ensure that only lines with actual text moved on to the agents that would try to parse out email addresses. In hindsight, this check probably should have gone in the ObviousEmailExtractionAgent; it seems like the FileReadingAgent shouldn't really be concerned with it. I could probably just rename my FileReadingAgent to NonBlankLineReadingAgent and get by that way. ;)

My "final" code doesn't handle disposal or anything, and it definitely should! That's an oversight on my part. Aside from that, this code should be pretty much complete:
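Below is a hypothetical Java sketch, not the original code, of an agent that would satisfy the three contexts above, plus a small harness for exercising them. `Observer<T>` is a minimal stand-in for .NET's `IObserver<T>`, `shouldSendLinesOfTextTo` mirrors the method name used later in this post, and `RecordingObserver` and `FileReadingAgentContexts` are invented for illustration. Because the agent only ever talks to an `Observer`, the tests need a recording fake rather than any threading, and try-with-resources takes care of disposing of the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for .NET's IObserver<T>, keeping only OnNext.
interface Observer<T> {
    void onNext(T value);
}

// Hypothetical sketch: receives a file path, pushes each non-blank line onward.
final class FileReadingAgent implements Observer<String> {
    private Observer<String> linesOfText = line -> { };

    void shouldSendLinesOfTextTo(Observer<String> next) { this.linesOfText = next; }

    @Override
    public void onNext(String filePath) {
        // try-with-resources closes the file even if a downstream agent throws.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(filePath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) linesOfText.onNext(line);  // skip blank/whitespace-only lines
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Test double that simply records what it is sent; no threads involved.
final class RecordingObserver implements Observer<String> {
    final List<String> received = new ArrayList<>();

    @Override
    public void onNext(String value) { received.add(value); }
}

final class FileReadingAgentContexts {
    // Writes the given lines to a temporary file and returns its path.
    static String fileWith(String... lines) {
        try {
            Path file = Files.createTempFile("agent-test", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, List.of(lines), StandardCharsets.UTF_8);
            return file.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the agent against a file and returns what it pushed downstream.
    static List<String> linesSentFor(String... fileLines) {
        FileReadingAgent agent = new FileReadingAgent();
        RecordingObserver downstream = new RecordingObserver();
        agent.shouldSendLinesOfTextTo(downstream);
        agent.onNext(fileWith(fileLines));
        return downstream.received;
    }
}
```

For instance, `FileReadingAgentContexts.linesSentFor("", "first", "", "second")` returns `[first, second]`, which is the blank-lines context expressed as a single assertion.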

Notice the use of the IObserver interface? I'm stealing a bit from the new .NET Reactive framework (an idea that I got from Robert Ream). By using the OnNext method I can make my network of agents push oriented rather than pull oriented. The benefits of this can be enumerated in another blog post. :)
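To illustrate the push/pull distinction, here's a small, hypothetical Java sketch (`Observer<T>` again stands in for `IObserver<T>`, and `PushVsPull` is an invented name). The pull version must be handed an iterator and drives the loop itself, while the push version is just another observer that the producer calls, so stages can be chained without any of them owning a loop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Minimal stand-in for .NET's IObserver<T>, keeping only OnNext.
interface Observer<T> {
    void onNext(T value);
}

final class PushVsPull {
    // Pull: the consumer drives, asking the source for each value in turn.
    static List<String> pullNonBlank(Iterator<String> source) {
        List<String> out = new ArrayList<>();
        while (source.hasNext()) {
            String line = source.next();
            if (!line.trim().isEmpty()) out.add(line);
        }
        return out;
    }

    // Push: the producer drives, and this stage forwards non-blank values downstream.
    static Observer<String> pushNonBlank(Observer<String> downstream) {
        return line -> {
            if (!line.trim().isEmpty()) downstream.onNext(line);
        };
    }
}
```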

How could I connect these to run synchronously? Super easy. This is how I could link the LineByLineFileReader to the ObviousGoodEmailExtractionAgent:

lineByLineFileReader.ShouldSendLinesOfTextTo(obviousGoodEmailExtractionAgent);

Then, to start, I'd send the file path I wanted processed to the lineByLineFileReader like so:

lineByLineFileReader.OnNext("c:/myfile.txt");

Next time I'll show an overview of the whole application and how it works concurrently with a WPF UI.
