Skip to content

Thinking Functionally: What is LINQ really?

Paul Louth edited this page Oct 25, 2024 · 29 revisions

LINQ (the syntax, not the fluent library) is an enormously powerful feature of C#. It allows for very succinct processing of collections and data-sources from many sources. But it has a much more fundamental property: it is C#'s way of doing monads.

'Monad' as a term is almost as famous for its mystique as it is in functional programming circles as a super-powered compositional tool.

Why should I care about monads?

Because a monad is a monoid in the category of endofunctors, so what's the problem? Only kidding! This is a bit of a joke on the way that category theorists tend to describe monads. And it's true their origins are from category theory, but then, so is polymorphism, and type theory, and actually all of programming. So let's not get too caught up on that.

What are monads for? Succinctly they're design patterns that allow you to stop writing the same error prone boilerplate over and over again.

IEnumerable<A> is a monad it removes the need to write:

    foreach(var a in listA)
    {
        foreach(var b in listB)
        {
             yield return Process(a, b);
        }
    }

Instead you can write:

    var result = from a in listA
                 from b in listB
                 select Process(a, b);

Now that might seem like a small win. But for us functional programmers it's quite large, because the second example is an expression, whereas the first one isn't.

OK, let's quickly go through some other monads to see how it removes boilerplate:

   using static LanguageExt.Prelude;

   Option<string> optionA = Some("Hello, ");
   Option<string> optionB = Some("World");
   Option<string> optionNone = None;

   var result1 = from x in optionA
                 from y in optionB
                 select x + y;

   var result2 = from x in optionA
                 from y in optionB
                 from z in optionNone
                 select x + y + z;

The Option monad represents optional values, they can be in one of two states: Some(value) or None. In result1 the output is Some("Hello, World"), in result2 the output is None. Whenever a None appears in the LINQ expression the final result will always be None.

This is the shortcut for if. And it's an expression, which is good. The imperative equivalent would be:

    string valueA = "Hello, ";
    string valueB = "World";
    string valueNone = null;

    string result1 = null;
    if(valueA != null)
    {
        if(valueB != null)
        {
           result1 = valueA + valueB; 
        }
    }

    string result2 = null;
    if(valueA != null)
    {
        if(valueB != null)
        {
            if(valueC != null)
            {
               result2 = valueA + valueB + valueC; 
            }
        }
    }

You may think "So what?" I could just change it to: if(valueA != null && valueB != null && valueC != null) to make it more concise. And yes, in this trivial example you could. But just imagine a real world example where valueA, valueB, valueC are calls to functions that depend on the previous values and you'll see that there is a complexity to this. In fact it has a name: Cyclomatic Complexity; and this is what we're reducing with the Option monad.

Writing with expressions also removes silly mistakes. In fact whilst I was writing this I had written result1 = valueA + valueB + valueC; for the second example. That mistake couldn't happen with the first one.

And what about else? Should we have done something there? There is no compilation error, but a programmer looking at your code might not immediately know that there's a bug because you left off the else part of the statement. This is a major source of bugs in C#. So if you want to know why monads are useful, this is one reason right here.

So how does it work?

Like all good design-patterns, monads capture common behaviours so you make fewer mistakes. The problem most people have is that the rules that make monads what they are - are so abstract that it's hard to get a handle on them. LINQ is a syntax for monads and it works like so:

    from a in ma

This is saying "Get the value a out of the monad ma". For IEnumerable that means get the values from the stream, for Option it means get the value if it's not in a None state.

    from a in ma
    from b in mb

As before this is saying "Get the value a out of monad ma, and then if we have a value get the value b out of monad mb". So for IEnumerable if ma is an empty collection then the second from won't run. For Option if ma is a None then the second from won't run.

    from a in ma
    from b in mb
    select a + b;

select in monad land means put this value back into a new monad (in Haskell it's called return, more on that later). So for IEnumerable it means create a new stream of values with the result, and for Option it means create a new option with the result in.

Let's look at how this is implemented:

    public static class EnumerableExtensions
    {
        public static IEnumerable<B> Select<A, B>(this IEnumerable<A> self, Func<A, B> map)
        {
            foreach(var item in self)
            {
                yield return map(item);
            }
        }

        public static IEnumerable<C> SelectMany<A, B, C>(
            this IEnumerable<A> self, 
            Func<A, IEnumerable<B>> bind, 
            Func<A, B, C> project)
        {
            foreach(var a in self)
            {
                foreach(var b in bind(a))
                {
                    yield return project(a, b);
                }
            }
        }
    }

So that's the definition for IEnumerable. The methods Select and SelectMany are kind of hacks in C#. They're special case methods that make a type monadic. The reason for this is because C# doesn't support 'higher kinded types'. That's not important for this discussion, but we'll come across that again later.

Select allows this to work:

    from a in ma
    select a;

SelectMany allows this to work:

    from a in ma
    from b in mb
    select a + b;

You can see from the implementations how they capture the foreach behaviour I mentioned before. This is good. If I were to expand out the above expressions:

    from a in ma
    select a;

    // Is the same as

    ma.Select(a => a);

And:

    from a in ma
    from b in mb
    select a + b;

    // Is the same as

    ma.SelectMany(a => mb.Select(b => a + b));

Let's take a look at the implementation for Option:

    public static class OptionExtensions
    {
        public static Option<B> Select<A, B>(this Option<A> self, Func<A, B> map) =>
            self.Match(
                Some: a  => Some(map(a)),
                None: () => None
            );

        public static Option<C> SelectMany<A, B, C>(
            this Option<A> self, 
            Func<A, Option<B>> bind, 
            Func<A, B, C> project) =>
            self.Match(
                Some: a  => bind(a).Match(
                                Some: b  => Some(project(a, b)),
                                None: () => None),
                None: () => None 
            );
    }

What's happening here is that when the Option is in None state nothing is happening. If you look at SelectMany then if self is None then None is returned; bind isn't invoked, and neither is project. But if self is Some(a) then bind is invoked.

If the return of bind(a) is None then project isn't run; but if it is Some(b) then Some(project(a, b)) is run.

This is capturing the if behaviour from before. The process is known as binding.

Monad rules

Ok, so that's two monads defined. We'll move on to some more later But first it's worth specifying the rules of monads. Here's the definition in Haskell (a language famous for its monad-first style of pure-functional programming):

    pure: a -> ma
    bind: ma -> (a -> mb) -> mb

pure is the pure constructor of the monad: it converts a pure-value: A into an embellished-value: Monad<A>. So for Option<A> it's converting an A into an Option<A>. As we've seen, Some(a) does that.

bind is the function that does the chaining of one monad to another. Now in theory that's what SelectMany is doing. Unfortunately when LINQ was devised the C# team decided to make it a bit more complicated in an attempt to make it more efficient.

This is what Bind should look like for IEnumerable:

    public static IEnumerable<B> Bind<A, B>(
        this IEnumerable<A> self, 
        Func<A, IEnumerable<B>> bind)
    {
        foreach(var a in self)
        {
            foreach(var b in bind(a))
            {
                yield return b;
            }
        }
    }

What you may notice is that it's exactly the same as SelectMany apart from the project function.

Now Option<A>

    public static Option<B> Bind<A, B>(
        this Option<A> self, 
        Func<A, Option<B>> bind) =>
        self.Match(
            Some: a  => bind(a),
            None: () => None);

Again we don't have the Match on the result of bind, we simply return the result of it.

In fact what you may realise is that SelectMany is a combination of Bind and Select:

    public static IEnumerable<C> SelectMany<A, B, C>(
        this IEnumerable<A> self, 
        Func<A, IEnumerable<B>> bind, 
        Func<A, B, C> project) =>
        self.Bind(a => 
            bind(a).Select(b => 
                project(a,b)));

And the same for Option:

    public static Option<C> SelectMany<A, B, C>(
        this Option<A> self, 
        Func<A, Option<B>> bind, 
        Func<A, B, C> project) =>
        self.Bind(a => 
            bind(a).Select(b => 
                project(a,b)));

So, if you define Bind then you can just copy and paste the SelectMany implementation from above and replace the type-names to get a perfect implementation of SelectMany without any thought.

So what is Select?

So if SelectMany is a bastardised version of Bind, which is one of the rules of monads. And we already have the pure function covered with Some(a) for Option and the various collection constructors for IEnumerable, what is Select?

Select in the functional world is known as Map. Any type that implements Map is a 'functor'.

Oh god, here we go again with the abstract terminology. Just remember you had to learn what 'polymorphism' was at some point. Functors are mappable.

So Select/Map makes a type into a functor. Why is that good?

Functors allow you to use regular functions with types that are containers for other types. So if I have an IEnumerable<A> then any function that works by accepting a single argument of A can be mapped.

Take a look at this:

   var value = 'a';

   char result = Char.ToUpper(value);  // 'A'

The Char.ToUpper function takes a char and returns a char. It maps a lower-case char to an upper-case char.

Now if we want to do that with an IEnumerable in the classic imperative way:

    static IEnumerable<char> ToUpper(this IEnumerable<char> self)
    {
        foreach(var ch in self)
        {
            yield return Char.ToUpper(ch);
        }
    }

There we go again, typing the same old boilerplate. Well Select/Map can rescue us:

    var list = List('a', 'b', 'c');

    var uppers = list.Map(Char.ToUpper);

So Map and Select allow us to work with what's known as the bound value within a functor, without having to provide any special case code for dealing with the outer type (IEnumerable, Option, etc.)

In fact this works for Option too:

    var option = Some('a');

    option = option.Map(Char.ToUpper);

And all types that are functors in language-ext (which is most of them).

Similarities between functors and monads

Let's start with the differences: functors don't have a pure constructor function. Well, they can have one, but its not part of the rules of being a functor.

But the similarities are interesting. Take a look at the signatures for Map and Bind:

    Option<B> Map <A, B>(Option<A> ma, Func<A, B> map);
    Option<B> Bind<A, B>(Option<A> ma, Func<A, Option<B>> bind);

Apart from the return type for the Func in each, they're identical. In fact Map and Select can be implemented in terms of Bind:

    static Option<B> Map<A, B>(this Option<A> ma,  Func<A, B> map) =>
       ma.Bind(a => Some(map(a));

So we've seen how Map, Select, and SelectMany can all be implemented in terms of Bind. It's often less efficient to do that, but it may start to give you a bit of insight into how this stuff all fits together.

Other monads

Below are some monadic types in this library:

Type Description
Arr Immutable array
Either Either one value (Left) or another value (Right)
HashSet Immutable hash-set
IO Make impure IO side-effects pure
Lst Immutable list
Option Optional result (which can't be null)
Parser Parser combinators - Super, super powerful system for building composable parsers
Que Immutable queue
Reader Provides an 'environment' that can be accessed via Prelude.ask - this is how dependency injection is done in the functional world
Seq Immutable set
State Provides a state value that can be read using Prelude.get, and written with Prelude.put. This allows for context objects to be passed through an expression without having to explicitly add them as extra arguments to your functions. This is how contextual state is managed in the functional world when you don't have mutable arguments.
Try Optional result (which can't be null) and that catches exceptions and propagates them in the expression result.
Writer This is for writing a log of values. So it's a bit like the State monad without an input value.
Validation Works like Either when used as a monad (i.e. in LINQ and with Map and Bind, but is also an applicative, which allows for collection of multiple failure values rather than just the one with Either).

You don't need to use any of these things to write functional code. And I would advise that with the more advanced monads like Reader, State, Writer, to use them sparingly early on in your learning. They're mostly to get over limitations in pure languages like Haskell. However, for the right application they are very powerful.

As with all new concepts, monads take a bit of time to get your head around. Just remember they're basically encapsulated functionality with a few rules attached.

This introduction to LINQ and the idea behind monads doesn't tell the whole story. The problem with using extension methods (Select and SelectMany) is that we can't generalise over all monads. This limitation is because C# doesn't have Higher-Kinded traits. If you're feeling adventurous I have a blog-series of how to do get higher-kinds, in a type-safe way, in C#. It also goes much deeper on the theory behind functors, monads, and other traits.

NEXT: Application architecture