(转)Functional C# - Fun With Folds
原文: http://weblogs.asp.net/podwysocki/archive/2009/02/26/functional-c-fun-with-folds.aspx
Recently on Twitter, I’ve stated that many times when explaining these functional programming terms that we don’t get the real reason why, and I hope with some of these posts in the future to change that perception. Instead of just showing you some examples, I’ll try to step from imperative to functional.
In my previous adventure “Fun With Folds”, I explained a bit about folds as well as a challenge to rewrite some standard Haskell prelude functions as folds in both Haskell and F#. After this, I thought to myself that I’d give myself another challenge, this time to implement several functions as folds in C# using the Aggregate operator.
Aggregate / Fold in LINQ
If you recall from the previous post, I explained the basic notion of folds. A fold is a higher-order function that knows how to reduce a given a sequence of elements into a single return value. We could describe it as doing something to each value in the list, updating an accumulator as we go, and returning the value of the accumulator when we’re finished. There are two sorts of folds, a left fold and a right fold. The difference is the way the data is "folded".
The LINQ Aggregate function is an implementation of a left fold. Let’s look at the signatures of the Aggregate overloads to get an idea of how they work:
// Apply an accumulator function over the sequence
public static TSource Aggregate<TSource>(
this IEnumerable<TSource> source,
Func<TSource, TSource, TSource> func);
// Apply accumulator with initial seed
public static TAccumulate Aggregate<TSource, TAccumulate>(
this IEnumerable<TSource> source,
TAccumulate seed,
Func<TAccumulate, TSource, TAccumulate> func);
// Apply accumulator with initial seed and projection
public static TResult Aggregate<TSource, TAccumulate, TResult>(
this IEnumerable<TSource> source,
TAccumulate seed,
Func<TAccumulate, TSource, TAccumulate> func,
Func<TAccumulate, TResult> resultSelector);
So, what can we do with this knowledge? Some of the easy LINQ operators that you can implement with a left fold are Sum and Count. Let’s implement each with a set of tests to verify behavior.
Count
The first item is to calculate the number of all items in the list. Traditionally, in a more imperative coding style, you might calculate it in the following style:
public static int CountItems<T>(
this IEnumerable<T> items)
{
var count = 0;
foreach (var _ in items)
count++;
return count;
}
Instead, we could implement as a left fold in order to reuse existing higher-order functions and write in a bit more functional style. Let’s write a test to verify the behavior of our CountItems and the Count LINQ operator. Since these can be described as purely functional, we could write them as property-based checks ala QuickCheck. Alas, QuickCheck does not exist for C#, so instead, we could do one better and use Pex to generate our test data and use xUnit.net to do our assertions. Our code might look like the following:
[PexClass(typeof(Extensions))]
public partial class ExtensionsSpecs
{
[PexMethod]
public void CountItems_should_equal_Count(
[PexAssumeNotNull]int[] items)
{
Assert.Equal(items.Count(), items.CountItems());
}
}
public static class Extensions
{
public static int CountItems<T>(this IEnumerable<T> source)
{
return source.Aggregate(0, (acc, x) => 1 + acc);
}
}
What this allows us to do is generate a non-null array of integers to test whether the property of my CountItems should always equal the Count function. We can then run Pex to verify this behavior. But, getting back to the CountItems function, we simply seed the length with a 0 and then with the accumulator and our current index, we add 1 to the accumulator which is then returned.
Sum
The second easy LINQ operator you can implement as a fold would be the Sum function. This function simply adds all of the numbers from the sequence together from left to right. Typically in an imperative coding style, it might look like the following:
public static int SumNums(this IEnumerable<int> items)
{
var acc = 0;
foreach(var item in items)
acc += item;
return acc;
}
Once again, we can eliminate the foreach loops with a fold once again in a manner very similar to above. While using Pex, we can write our property based tests and our implementation as follows:
[PexClass(typeof(Extensions))]
public partial class ExtensionsSpecs
{
[PexMethod]
public void SumInt_should_equal_Sum(
[PexAssumeNotNull]int[] items)
{
Assert.Equal(items.Sum(), items.SumInt());
}
}
public static class Extensions
{
public static int SumInt(this IEnumerable<int> source)
{
return source.Aggregate(0, (acc, x) => x + acc);
}
}
This time, instead of adding one to the accumulator, we are adding the current item. We are using a seed of 0 because if the list is empty, there will be no proper return value. These are interesting, yet simple examples of how you can use higher-order functions to express simple operations and lets you lose that for/each loop.
But, let’s move onto the real challenge.
The Challenge
The challenge this time is to implement other standard LINQ operators that are a little bit more challenging to express as folds. These functions are:
- Map (Select LINQ Operator)
- Filter (Where LINQ Operator)
- Append (Concat LINQ Operator)
- Flatten (C# Implementation of the F# Seq.concat library function)
Map
The map higher-order function applies a given function to a sequence of elements and returns a sequence of the results. In LINQ, this has been implemented as the Select function. Typically, using sequences, a map might be implemented as follows:
public static IEnumerable<R> Map<T, R>(
this IEnumerable<T> items,
Func<T, R> func)
{
foreach (var item in items)
yield return func(item);
}
But, just as above with the other LINQ functions, we can express this one as a fold, moreover, a right fold. In the previous post, I showed how to express a factorial as a right fold using the LINQ Aggregate function, so we’ll apply some of that same logic in this instance.
[PexClass(typeof(Extensions))]
public partial class ExtensionsSpecs
{
[PexMethod]
public void Map_should_equal_Select(
[PexAssumeNotNull]int[] items)
{
Func<int, int> multiplyByTwo = x => x * 2;
Assert.Equal(
items.Select(multiplyByTwo).ToArray(),
items.Map(multiplyByTwo).ToArray());
}
}
public static class Extensions
{
public static IEnumerable<R> Map<T, R>(
this IEnumerable<T> items, Func<T, R> proj)
{
return items.Aggregate<T, Func<IEnumerable<R>, IEnumerable<R>>>(
x => x, (f, c) => x => f((new[] { proj(c) }).Concat(x)))
(Enumerable.Empty<R>());
}
}
Don’t be alarmed at what you see here. What you need to pay attention to is inside the f function which applies the projection to c and then concatenates it to the accumulator. You also will notice we seeded the function with an empty result list. We can run Pex and it will verify the results that indeed they produce identical results when we apply the multiplyByTwo function to all items in the sequence Why a right fold? We want to work our way from the right on the empty sequence and pre-pend our projected result onto it, this producing our sequence in proper order.
Filter
The next higher-order function we can represent is the filter function. This function processes a given list to produce a new list which contains only those element satisfied by the predicate function. In LINQ, this function is expressed as the Where function. Typically, we could express this using a foreach loop much as we have with the other solutions such as this:
public static IEnumerable<T> Filter<T>(
this IEnumerable<T> items,
Func<T, bool> pred)
{
foreach(var item in items)
if(pred(item)) yield return item;
}
Instead, we could apply the lessons we learned from above and implement this as a right fold as well. Let’s review what that might look like with associated test to verify our behavior.
[PexClass(typeof(Extensions))]
public partial class ExtensionsSpecs
{
[PexMethod]
public void Filter_should_equal_Where(
[PexAssumeNotNull]int[] items)
{
Func<int, bool> isEven = x => x % 2 == 0;
Assert.Equal(
items.Where(isEven).ToArray(),
items.Filter(isEven).ToArray());
}
}
public static class Extensions
{
public static IEnumerable<T> Filter<T>(
this IEnumerable<T> items,
Func<T, bool> pred)
{
return items.Aggregate<T, Func<IEnumerable<T>, IEnumerable<T>>>(
x => x, (f, c) => x => f(pred(c) ? (new[] { c }).Concat(x) : x))
(Enumerable.Empty<T>());
}
}
We can test our behavior by applying a isEven function to the list to return only even numbers. Once implemented, we can turn the lamp green by implementing the filter function much as we did above. The difference is we apply the predicate function to c, and if true, we concatenate to the accumulator, else we return the accumulator. As you will also note, we seed the accumulator with an empty sequence. Once we understand the pattern here, this gets easier. The reasoning for using a right fold is exactly the same as above.
Append
The append function is a function that takes two lists and appends the second to the first. In LINQ, this is implemented as the Concat function. I don’t really like the name that they used as to me, Concat means flatten, which we’ll get to next. Anyhow, since this is a more difficult function to express as I have above for the map and filter, let’s skip ahead to our implementation.
[PexClass(typeof(Extensions))]
public partial class ExtensionsSpecs
{
[PexMethod]
public void Append_should_equal_Concat(
[PexAssumeNotNull]int[] left,
[PexAssumeNotNull]int[] right)
{
var expected = left.Concat(right);
var result = left.Append(right);
Assert.Equal(
expected.ToArray(),
result.ToArray());
}
}
public static class Extensions
{
public static IEnumerable<T> Append<T>(
this IEnumerable<T> source,
IEnumerable<T> other)
{
return source.Aggregate<T, Func<IEnumerable<T>, IEnumerable<T>>>(
x => x, (f, c) => x => f((new[] { c }).Append(x)))(other);
}
}
You’ll start to notice a pattern here. Why am I using a right fold? In order to append properly, we start with the other sequence and work our way from the right, pre-pending the rightmost item from the source sequence. We can use Pex to verify this behavior as well. I hope you’re starting to see a pattern in these examples.
Flatten
The final example is the flatten function. This function takes a sequence of sequences and flattens it to a single sequence. In F#, we use the Seq.concat function to express this. In an imperative style, we might be able to write Concat as the following:
public static IEnumerable<T> Flatten<T>(
this IEnumerable<IEnumerable<T>> items)
{
foreach (var x in items)
foreach (var y in x)
yield return y;
}
But, what we did is basically reimplement a naive implementation of the SelectMany without the projection. Instead, we could also rewrite this using a standard LINQ SelectMany operator using a identity projection.
public static IEnumerable<T> Flatten<T>(
this IEnumerable<IEnumerable<T>> items)
{
return items.SelectMany(x => x);
}
But getting back to this challenge, how might we implement this as a right fold instead? Again, since we’re trying to concat to the beginning of the list from the right, this is the best way to do this. Below is the code plus the test to assert the behavior.
[PexClass(typeof(Extensions))]
public partial class ExtensionsSpecs
{
[PexMethod]
public void Flatten_should_concat_inners(int[][] items)
{
PexAssume.IsNotNull(items);
PexAssume.TrueForAll(items, x => x != null);
// Calculate length of array with fold
var length =
items.Aggregate(0, (acc, x) => acc + x.GetLength(0));
Assert.Equal(length, items.Flatten().Count());
}
}
public static class Extensions
{
public static IEnumerable<T> Flatten<T>(
this IEnumerable<IEnumerable<T>> items)
{
return items.Aggregate<IEnumerable<T>,
Func<IEnumerable<T>, IEnumerable<T>>>(
x => x, (f, c) => x => f(c.Concat(x)))
(Enumerable.Empty<T>());
}
}
My test was to ensure that the number of items in the resulting sequence is equal to the sum the length of each item in the array. Our implementation is much like the Append example above, the difference being that the c parameter is now a collection itself, so we can call the Concat method to append the x accumulator.