# Thinking Functionally: Application Architecture
Now that you know how to define functions, how should you organise them? Classically in C# we use methods, and methods are attached to objects with state. That's a pattern of behaviour you'll want to start distancing yourself from as you begin to think functionally.
A module is just a set of functions grouped together, typically because they work on the same data type or types. 'Module' is a term taken from F# (and OCaml, among others), where modules are the organisational units for groups of functions. In C# you can achieve the same thing with static classes containing static methods and no static state.
Here's a module that contains two functions:
```csharp
public static class MathStuff
{
    public static int Add(int x, int y) =>
        x + y;

    public static int Subtract(int x, int y) =>
        x - y;
}
```
If you're a seasoned OO-head then you might be screaming "That's the anaemic domain model!" All I can say to that is: forget everything the Gang of Four told you. They are wrong. Martin Fowler isn't some kind of god. Well, he might be the god of OO land, but he's responsible for many, many lost hours.
The 'anaemic domain model' is a good thing, if:
- You work mostly with immutable types
- Your functions are mostly pure / referentially transparent (more on that later)
One technique I use (and I'm not saying this is the one true way) is to create a 'schema' project: a set of record types (i.e. immutable classes with all data public, and no processing functionality other than methods that work entirely with the data of the record type). For example, say you had a record type called `Person`, with `FirstName` and `LastName` fields; then I think it's OK to have a `GetFullName()` method attached to it, but not an `AddPersonToDatabase()` method, or anything similar that has dependencies outside of the type itself. You can use the `Record<A>` feature of language-ext to help you build immutable record types that have structural equality by default:
```csharp
public class Person : Record<Person>
{
    public readonly int Id;
    public readonly string Name;
    public readonly string Surname;

    public Person(int id, string name, string surname)
    {
        Id = id;
        Name = name;
        Surname = surname;
    }
}

var x = new Person(1, "Joe", "Bloggs");
var y = new Person(1, "Joe", "Bloggs");
var z = new Person(2, "Jane", "Doe");

bool a = x == y; // true
bool b = x == z; // false
```
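Incidentally, if you're on C# 9 or later, the built-in `record` feature gives you structural equality without the library. Here's a minimal sketch of the same `Person` as a built-in record, paired with the kind of 'anaemic' module of pure functions this section argues for (the `PersonModule` and `FullName` names are my own, for illustration):

```csharp
using System;

// Built-in C# 9 'record': immutable, public data, structural equality -
// an alternative to language-ext's Record<Person> shown above.
public record Person(int Id, string Name, string Surname);

// An 'anaemic' module: a static class of pure functions, no static state.
public static class PersonModule
{
    // OK: works entirely with the data of the record type.
    public static string FullName(Person p) => $"{p.Name} {p.Surname}";
}

public static class Program
{
    public static void Main()
    {
        var x = new Person(1, "Joe", "Bloggs");
        var y = new Person(1, "Joe", "Bloggs");

        Console.WriteLine(x == y);                   // True: structural equality
        Console.WriteLine(PersonModule.FullName(x)); // Joe Bloggs
    }
}
```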
Defining this schema is a bit like defining a schema in a relational database, and that feels natural, because relational databases are basically functional data structures and functional programming in action. I find it really useful to clarify my data structures upfront like this; it makes me think about how the various parts of the system are going to fit together.
Once you have that, you want to start defining behaviours. You can go one of two ways: bottom-up or top-down. For example, let's say you were writing a web app. Your top-down approach might be to think of the function `MapRequestToResponse`: a `static` function that takes a `Request` record and maps it to a `Response` record. You can then break that function down into its composable parts. The bottom-up approach would be to work out the small composable units and then start building bigger ones out of them until you have your system.
They both have pros and cons, and in practice I find different sub-systems benefit more from one approach than the other. For example, in echo-process I have a configuration-file parser which uses the language-ext `Parser` monad. It works bottom-up, building all the smaller components, then combining those into larger ones, and what emerges is a really complex system that can not only parse config files, but is also a mini statically-typed language. If I'd gone top-down then the end result would probably have failed.
The top-down approach tends to work better when you have a clear end result, so the `MapRequestToResponse` example is a good one. You can imagine writing:
```csharp
public static Response MapRequestToResponse(
    Request req,
    Func<Headers, FormItems, Session, Response> userRequestHandler)
{
    var headers   = ExtractHeaders(req);
    var formItems = ExtractFormItems(req);
    var session   = InitialiseSession(req);
    return userRequestHandler(headers, formItems, session);
}
```
And then breaking down what `ExtractHeaders` does, what `ExtractFormItems` does, and so on. This feels quite natural, and the point of it all is that each function just maps one immutable record to another (from your 'schema' project).
What I tend to find is that this makes it much easier to create 'modules': static classes full of pure functions that map records. Because they're all islands of functionality in their own right (and very easy to test), you tend to group similar functions together, or functions that naturally belong near each other (say an `AccountingModule`, or a `DataAccessModule`, etc.). What's good about the module approach is that if you need to move a function it doesn't matter, because there isn't the huge dependency graph you get in OO: the functions depend only on their arguments and other pure functions, so where things sit in the code base doesn't create the same complexity problem.
The difficult part is IO: talking to the mutable 'world'. This is truly the most awkward part of functional programming in C# (and, to be fair, in F# too). You want to be working with pure functions, so you don't want IO and global state getting in the way.
An idealised (but not realistic) solution is:

```csharp
(World, R) DoOperation(World world, A value);
```

Here an immutable world is passed into your pure function; if the function needs to read something from the `World` it gets it from its parameters rather than by calling `File.ReadAllText(path)` or whatever, and the result is the new `World` plus a value. If you haven't seen it, this talk by Rich Hickey, all about time, explains why that's a valuable approach.
A less insane version of that is to pass in just the bits of the `World` that matter. So if `DoOperation` reads data from a file, pass in the data instead and move the IO outside of the function. The further you can push your IO to the edges of your application, the more pure code you will have, which means more code that you can reliably verify.
This is where Haskell does a really good job: its IO monad doesn't do IO at the point of the instruction, it builds a set of continuations. The abstract form of the IO monad is called a Free monad. I'm using one now on a project where I want to put the IO in a box but can't split it out from the pure functionality. It allows me to provide an interpreter for the computations, which lets me inject mocked IO rather than the real IO without compromising the quality of the core logic. Unfortunately I haven't yet found a way to generalise the Free monad in C#, so if you want to use one you have to build it per use-case.
Take a look at this gist for an example of how to build your own IO monad and how to build interpreters that run it. It's very powerful, and the most successful way I know of splitting out the 'difficult bits' and making all of my code truly composable. It requires a reasonable amount of upfront typing, but once your IO is defined you're good. It is obviously much less efficient than just writing `statement; statement; statement;`, because each 'statement' generates a new object.
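The gist has the full treatment; below is a much smaller sketch of the core idea (the `IO<A>` type, its instruction names, and both interpreters are my own invention, not the gist's code). The program is just a data structure of instructions and continuations; nothing happens until an interpreter walks it, so a test can run the very same program against a mocked interpreter:

```csharp
using System;
using System.Collections.Generic;

// An IO 'program' as data: each instruction carries its continuation.
public abstract record IO<A>;
public sealed record Pure<A>(A Value) : IO<A>;
public sealed record ReadLine<A>(Func<string, IO<A>> Next) : IO<A>;
public sealed record WriteLine<A>(string Line, IO<A> Next) : IO<A>;

public static class Interpreters
{
    // Real interpreter: actually performs the console IO.
    public static A RunConsole<A>(IO<A> program)
    {
        while (true)
        {
            switch (program)
            {
                case Pure<A> p:      return p.Value;
                case ReadLine<A> r:  program = r.Next(Console.ReadLine() ?? ""); break;
                case WriteLine<A> w: Console.WriteLine(w.Line); program = w.Next; break;
            }
        }
    }

    // Mock interpreter: canned input, captured output - no real IO at all.
    public static (A Result, List<string> Output) RunMock<A>(
        IO<A> program, Queue<string> input)
    {
        var output = new List<string>();
        while (true)
        {
            switch (program)
            {
                case Pure<A> p:      return (p.Value, output);
                case ReadLine<A> r:  program = r.Next(input.Dequeue()); break;
                case WriteLine<A> w: output.Add(w.Line); program = w.Next; break;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Ask for a name, greet, return the name - described, not executed.
        var program = new ReadLine<string>(name =>
            new WriteLine<string>($"Hello, {name}", new Pure<string>(name)));

        var (result, output) = Interpreters.RunMock(
            program, new Queue<string>(new[] { "Joe" }));

        Console.WriteLine(result);    // Joe
        Console.WriteLine(output[0]); // Hello, Joe
    }
}
```

The real gist's IO monad is richer than this (proper `Bind`/`Select` for LINQ syntax, more instructions), but the shape is the same: program-as-data, plus one interpreter per environment.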
I would say that it's probably not worth the hassle, unless you have an obsession about the purity thing. In other systems I instead rely on the actor model, which is why I built the echo-process library.
The idea is that each actor is just a function, one with the signature `State -> Message -> State`: it takes a state and a message and returns a new state. Each actor is single-threaded, so it can only process one message at a time. That gives it the exact same signature as `Fold` in functional programming, so I treat these mini-processes as pure bubbles of functional code.
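A sketch of that shape (the `CounterActor` and its message type are invented for illustration): the actor body is a pure `(State, Message) -> State` function, and replaying a mailbox over it is literally a fold (`Aggregate` in LINQ):

```csharp
using System;
using System.Linq;

// Hypothetical message type for a counter actor.
public record Increment(int Amount);

public static class CounterActor
{
    // The actor body: State -> Message -> State. Pure, single-threaded,
    // so the only state it touches is the state it's handed.
    public static int Inbox(int state, Increment msg) =>
        state + msg.Amount;
}

public static class Program
{
    public static void Main()
    {
        var mailbox = new[] { new Increment(1), new Increment(2), new Increment(3) };

        // Same signature as Fold: replaying the mailbox is an Aggregate.
        var finalState = mailbox.Aggregate(0, CounterActor.Inbox);
        Console.WriteLine(finalState); // 6
    }
}
```

This is also why actors are so easy to test: you don't need the actor system at all, just a start state and a list of messages.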
Each actor basically has its own 'global state', but it is its own piece of state, only ever updated by itself. So the edge is the boundary of the actor, rather than lots of global static state across the application that other code could mess with. When I need something to interact with the outside world (a database, a file, etc.) I create an actor that does the dirty work.
The good thing about the echo-process actor model is that it has a supervision hierarchy. So I create an 'Application' actor, it creates child actors, they create child actors etc. And that's where the application structure comes from.
I think giving advice about how to structure your code over and above this level is dangerous in such a small space. I could write a book on it to cover all the possible ways you could architect large systems.
Mostly you shouldn't get too caught up on how different functional architecture is from, say, OO. You will still tend to have layers in your app, or services that do stuff. It's how those layers are implemented that matters:
- A schema project describing the data structures
- Mostly static classes with pure functions
- Mostly immutable data structures that don't have externally dependent functionality attached to them
- IO under control
Finally, you don't have to throw OO completely under the bus. When it comes to extensible architecture it can sometimes be better. Inheritance isn't always bad; it's just nearly always bad. If I was implementing a file-system and wanted a way of representing files of different types, each with its own loading code or editing code, then a `File` base class with two `virtual` methods called `Load` and `Edit` is a pretty good way of doing it.
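A sketch of that design (the file-type behaviours are invented for illustration, and I've used `abstract` methods here; `virtual` methods with default bodies work the same way):

```csharp
using System;

// One virtual/abstract method per extensible behaviour. Adding a new
// file type later means adding one subclass - no existing code changes.
public abstract class File
{
    public abstract string Load(string path);
    public abstract string Edit(string content);
}

public sealed class TextFile : File
{
    public override string Load(string path)    => $"loaded text from {path}";
    public override string Edit(string content) => content + " (edited)";
}

public sealed class ImageFile : File
{
    public override string Load(string path)    => $"loaded image from {path}";
    public override string Edit(string content) => content; // no text editor for images
}

public static class Program
{
    public static void Main()
    {
        File file = new TextFile();
        Console.WriteLine(file.Load("readme.txt")); // loaded text from readme.txt
    }
}
```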
It's good because at a later date I can easily add another file type. That's the key (and it's described by the Expression Problem). In my experience most programs are not like this and don't need extensible inheritance hierarchies; the abstraction obsession just creates unnecessary complexity.