-
-
Notifications
You must be signed in to change notification settings - Fork 422
Thinking Functionally: What is LINQ really?
LINQ (the syntax, not the fluent library) is an enormously powerful feature of C#. It allows for very succinct processing of collections and data-sources from many sources. But it has a much more fundamental property: it is C#'s way of doing monads.
'Monad' as a term is almost as famous for its mystique as it is in functional programming circles as a super-powered compositional tool.
Because a monad is a monoid in the category of endofunctors, so what's the problem? Only kidding! This is a bit of a joke on the way that category theorists tend to describe monads. And it's true their origins are from category theory, but then, so is polymorphism, and type theory, and actually all of programming. So let's not get too caught up on that.
What are monads for? Succinctly they're design patterns that allow you to stop writing the same error prone boilerplate over and over again.
IEnumerable<A>
is a monad it removes the need to write:
foreach(var a in listA)
{
foreach(var b in listB)
{
yield return Process(a, b);
}
}
Instead you can write:
var result = from a in listA
from b in listB
select Process(a, b);
Now that might seem like a small win. But for us functional programmers it's quite large, because the second example is an expression, whereas the first one isn't.
OK, let's quickly go through some other monads to see how it removes boilerplate:
using static LanguageExt.Prelude;
Option<string> optionA = Some("Hello, ");
Option<string> optionB = Some("World");
Option<string> optionNone = None;
var result1 = from x in optionA
from y in optionB
select x + y;
var result2 = from x in optionA
from y in optionB
from z in optionNone
select x + y + z;
The Option
monad represents optional values, they can be in one of two states: Some(value)
or None
. In result1
the output is Some("Hello, World")
, in result2
the output is None
. Whenever a None
appears in the LINQ expression the final result will always be None
.
This is the shortcut for if
. And it's an expression, which is good. The imperative equivalent would be:
string valueA = "Hello, ";
string valueB = "World";
string valueNone = null;
string result1 = null;
if(valueA != null)
{
if(valueB != null)
{
result1 = valueA + valueB;
}
}
string result2 = null;
if(valueA != null)
{
if(valueB != null)
{
if(valueC != null)
{
result2 = valueA + valueB + valueC;
}
}
}
You may think "So what?" I could just change it to: if(valueA != null && valueB != null && valueC != null)
to make it more concise. And yes, in this trivial example you could. But just imagine a real world example where valueA
, valueB
, valueC
are calls to functions that depend on the previous values and you'll see that there is a complexity to this. In fact it has a name: Cyclomatic Complexity; and this is what we're reducing with the Option
monad.
Writing with expressions also removes silly mistakes. In fact whilst I was writing this I had written result1 = valueA + valueB + valueC;
for the second example. That mistake couldn't happen with the first one.
And what about else
? Should we have done something there? There is no compilation error, but a programmer looking at your code might not immediately know that there's a bug because you left off the else
part of the statement. This is a major source of bugs in C#. So if you want to know why monads are useful, this is one reason right here.
Like all good design-patterns, monads capture common behaviours so you make fewer mistakes. The problem most people have is that the rules that make monads what they are - are so abstract that it's hard to get a handle on them. LINQ is a syntax for monads and it works like so:
from a in ma
This is saying "Get the value a
out of the monad ma
". For IEnumerable
that means get the values from the stream, for Option
it means get the value if it's not in a None
state.
from a in ma
from b in mb
As before this is saying "Get the value a
out of monad ma
, and then if we have a value get the value b
out of monad mb
". So for IEnumerable
if ma
is an empty collection then the second from
won't run. For Option
if ma
is a None
then the second from
won't run.
from a in ma
from b in mb
select a + b;
select
in monad land means put this value back into a new monad (in Haskell it's called return
, more on that later). So for IEnumerable
it means create a new stream of values with the result, and for Option
it means create a new option with the result in.
Let's look at how this is implemented:
public static class EnumerableExtensions
{
public static IEnumerable<B> Select<A, B>(this IEnumerable<A> self, Func<A, B> map)
{
foreach(var item in self)
{
yield return map(item);
}
}
public static IEnumerable<C> SelectMany<A, B, C>(
this IEnumerable<A> self,
Func<A, IEnumerable<B>> bind,
Func<A, B, C> project)
{
foreach(var a in self)
{
foreach(var b in bind(a))
{
yield return project(a, b);
}
}
}
}
So that's the definition for IEnumerable
. The methods Select
and SelectMany
are kind of hacks in C#. They're special case methods that make a type monadic. The reason for this is because C# doesn't support 'higher kinded types'. That's not important for this discussion, but we'll come across that again later.
Select
allows this to work:
from a in ma
select a;
SelectMany
allows this to work:
from a in ma
from b in mb
select a + b;
You can see from the implementations how they capture the foreach
behaviour I mentioned before. This is good. If I were to expand out the above expressions:
from a in ma
select a;
// Is the same as
ma.Select(a => a);
And:
from a in ma
from b in mb
select a + b;
// Is the same as
ma.SelectMany(a => mb.Select(b => a + b));
Let's take a look at the implementation for Option
:
public static class OptionExtensions
{
public static Option<B> Select<A, B>(this Option<A> self, Func<A, B> map) =>
self.Match(
Some: a => Some(map(a)),
None: () => None
);
public static Option<C> SelectMany<A, B, C>(
this Option<A> self,
Func<A, Option<B>> bind,
Func<A, B, C> project) =>
self.Match(
Some: a => bind(a).Match(
Some: b => Some(project(a, b)),
None: () => None),
None: () => None
);
}
What's happening here is that when the Option
is in None
state nothing is happening. If you look at SelectMany
then if self
is None
then None
is returned; bind
isn't invoked, and neither is project
. But if self
is Some(a)
then bind
is invoked.
If the return of bind(a)
is None
then project
isn't run; but if it is Some(b)
then Some(project(a, b))
is run.
This is capturing the if
behaviour from before. The process is known as binding.
Ok, so that's two monads defined. We'll move on to some more later But first it's worth specifying the rules of monads. Here's the definition in Haskell (a language famous for its monad-first style of pure-functional programming):
pure: a -> ma
bind: ma -> (a -> mb) -> mb
pure
is the pure constructor of the monad: it converts a pure-value: A
into an embellished-value: Monad<A>
. So for Option<A>
it's converting an A
into an Option<A>
. As we've seen, Some(a)
does that.
bind
is the function that does the chaining of one monad to another. Now in theory that's what SelectMany
is doing. Unfortunately when LINQ was devised the C# team decided to make it a bit more complicated in an attempt to make it more efficient.
This is what Bind
should look like for IEnumerable
:
public static IEnumerable<B> Bind<A, B>(
this IEnumerable<A> self,
Func<A, IEnumerable<B>> bind)
{
foreach(var a in self)
{
foreach(var b in bind(a))
{
yield return b;
}
}
}
What you may notice is that it's exactly the same as SelectMany
apart from the project
function.
Now Option<A>
public static Option<B> Bind<A, B>(
this Option<A> self,
Func<A, Option<B>> bind) =>
self.Match(
Some: a => bind(a),
None: () => None);
Again we don't have the Match
on the result of bind
, we simply return the result of it.
In fact what you may realise is that SelectMany
is a combination of Bind
and Select
:
public static IEnumerable<C> SelectMany<A, B, C>(
this IEnumerable<A> self,
Func<A, IEnumerable<B>> bind,
Func<A, B, C> project) =>
self.Bind(a =>
bind(a).Select(b =>
project(a,b)));
And the same for Option
:
public static Option<C> SelectMany<A, B, C>(
this Option<A> self,
Func<A, Option<B>> bind,
Func<A, B, C> project) =>
self.Bind(a =>
bind(a).Select(b =>
project(a,b)));
So, if you define Bind
then you can just copy and paste the SelectMany
implementation from above and replace the type-names to get a perfect implementation of SelectMany
without any thought.
So if SelectMany
is a bastardised version of Bind
, which is one of the rules of monads. And we already have the pure
function covered with Some(a)
for Option
and the various collection constructors for IEnumerable
, what is Select
?
Select
in the functional world is known as Map
. Any type that implements Map
is a 'functor'.
Oh god, here we go again with the abstract terminology. Just remember you had to learn what 'polymorphism' was at some point. Functors are mappable.
So Select
/Map
makes a type into a functor. Why is that good?
Functors allow you to use regular functions with types that are containers for other types. So if I have an IEnumerable<A>
then any function that works by accepting a single argument of A
can be mapped.
Take a look at this:
var value = 'a';
char result = Char.ToUpper(value); // 'A'
The Char.ToUpper
function takes a char
and returns a char
. It maps a lower-case char
to an upper-case char
.
Now if we want to do that with an IEnumerable
in the classic imperative way:
static IEnumerable<char> ToUpper(this IEnumerable<char> self)
{
foreach(var ch in self)
{
yield return Char.ToUpper(ch);
}
}
There we go again, typing the same old boilerplate. Well Select
/Map
can rescue us:
var list = List('a', 'b', 'c');
var uppers = list.Map(Char.ToUpper);
So Map
and Select
allow us to work with what's known as the bound value within a functor, without having to provide any special case code for dealing with the outer type (IEnumerable
, Option
, etc.)
In fact this works for Option
too:
var option = Some('a');
option = option.Map(Char.ToUpper);
And all types that are functors in language-ext (which is most of them).
Let's start with the differences: functors don't have a pure
constructor function. Well, they can have one, but its not part of the rules of being a functor.
But the similarities are interesting. Take a look at the signatures for Map
and Bind
:
Option<B> Map <A, B>(Option<A> ma, Func<A, B> map);
Option<B> Bind<A, B>(Option<A> ma, Func<A, Option<B>> bind);
Apart from the return type for the Func
in each, they're identical. In fact Map
and Select
can be implemented in terms of Bind
:
static Option<B> Map<A, B>(this Option<A> ma, Func<A, B> map) =>
ma.Bind(a => Some(map(a));
So we've seen how Map
, Select
, and SelectMany
can all be implemented in terms of Bind
. It's often less efficient to do that, but it may start to give you a bit of insight into how this stuff all fits together.
Below are some monadic types in this library:
Type | Description |
---|---|
Arr |
Immutable array |
Either |
Either one value (Left) or another value (Right) |
HashSet |
Immutable hash-set |
IO |
Make impure IO side-effects pure |
Lst |
Immutable list |
Option |
Optional result (which can't be null ) |
Parser |
Parser combinators - Super, super powerful system for building composable parsers |
Que |
Immutable queue |
Reader |
Provides an 'environment' that can be accessed via Prelude.ask - this is how dependency injection is done in the functional world |
Seq |
Immutable set |
State |
Provides a state value that can be read using Prelude.get , and written with Prelude.put . This allows for context objects to be passed through an expression without having to explicitly add them as extra arguments to your functions. This is how contextual state is managed in the functional world when you don't have mutable arguments. |
Try |
Optional result (which can't be null ) and that catches exceptions and propagates them in the expression result. |
Writer |
This is for writing a log of values. So it's a bit like the State monad without an input value. |
Validation |
Works like Either when used as a monad (i.e. in LINQ and with Map and Bind , but is also an applicative, which allows for collection of multiple failure values rather than just the one with Either ). |
You don't need to use any of these things to write functional code. And I would advise that with the more advanced monads like Reader
, State
, Writer
, to use them sparingly early on in your learning. They're mostly to get over limitations in pure languages like Haskell. However, for the right application they are very powerful.
As with all new concepts, monads take a bit of time to get your head around. Just remember they're basically encapsulated functionality with a few rules attached.
This introduction to LINQ and the idea behind monads doesn't tell the whole story. The problem with using extension methods (
Select
andSelectMany
) is that we can't generalise over all monads. This limitation is because C# doesn't have Higher-Kinded traits. If you're feeling adventurous I have a blog-series of how to do get higher-kinds, in a type-safe way, in C#. It also goes much deeper on the theory behind functors, monads, and other traits.