Dataflow
Requires the System.Threading.Tasks.Dataflow package.
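TPL defines three kinds of dataflow blocks: source blocks, target blocks, and propagator blocks. A source block acts as a source of data and can be read from. A target block acts as a receiver of data and can be written to. A propagator block acts as both a source block and a target block, so it can be both read from and written to.
When you link a source to a target, you can supply a delegate that decides, based on the value of each message, whether the target block accepts or rejects it. This filtering mechanism is a useful way to guarantee that a dataflow block receives only certain values.
For most predefined dataflow block types, if a source block is connected to multiple target blocks and one target rejects a message, the source offers that message to the next target. The order in which a source offers messages to targets is defined by the source and can vary by source type.
Most source block types stop offering a message once one target accepts it. The exception to this rule is the BroadcastBlock<T> class, which offers each message to all targets, even if some targets reject it.
Because every predefined source block type guarantees that messages are propagated in the order in which they were received, each message must be read from the source block before the source block can process the next message. Therefore, when you use filtering to connect multiple targets to one source, make sure that at least one target block can receive every message. Otherwise, your application might deadlock.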
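The filtering rule above can be sketched as follows. This is a minimal illustration, not code from the original notes: the class name FilteredLinkDemo and the even/odd predicate are assumptions chosen for the example. It links a filtered target first and then a catch-all DataflowBlock.NullTarget<T>() link, so a rejected value never stays stuck at the head of the source's queue.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class; the name and the even/odd filter are illustrative.
static class FilteredLinkDemo
{
    // Posts 0..5 to a BufferBlock<int> and returns the values that
    // reached the even-only target.
    public static List<int> Run()
    {
        var received = new List<int>();
        var source = new BufferBlock<int>();

        // ActionBlock processes one message at a time by default,
        // so the list is never modified concurrently.
        var evens = new ActionBlock<int>(n => received.Add(n));

        // The predicate makes this link accept only even values.
        source.LinkTo(
            evens,
            new DataflowLinkOptions { PropagateCompletion = true },
            n => n % 2 == 0);

        // Catch-all link: accepts and discards whatever the filtered link
        // rejected, so every message has at least one willing target.
        source.LinkTo(DataflowBlock.NullTarget<int>());

        for (int i = 0; i < 6; i++)
        {
            source.Post(i);
        }
        source.Complete();
        evens.Completion.Wait();
        return received;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
    }
}
/* Output:
   0, 2, 4
 */
```

Without the second link, the value 1 would remain at the head of the buffer and the later values would never be offered, which is the stall the paragraph above warns about.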
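Predefined dataflow block types
The TPL Dataflow Library provides several predefined dataflow block types. These types fall into three categories: buffering blocks, execution blocks, and grouping blocks.
Three buffering block types: System.Threading.Tasks.Dataflow.BufferBlock<T>, System.Threading.Tasks.Dataflow.BroadcastBlock<T>, and System.Threading.Tasks.Dataflow.WriteOnceBlock<T>.
Three execution block types: System.Threading.Tasks.Dataflow.ActionBlock<TInput>, System.Threading.Tasks.Dataflow.TransformBlock<TInput,TOutput>, and System.Threading.Tasks.Dataflow.TransformManyBlock<TInput,TOutput>.
The BufferBlock<T> class represents a general-purpose asynchronous messaging structure. It stores a first-in, first-out (FIFO) queue of messages that can be written to by multiple sources and read from by multiple targets. When a target receives a message from a BufferBlock<T> object, that message is removed from the queue. Therefore, although a BufferBlock<T> object can have multiple targets, only one target receives each message. The BufferBlock<T> class is useful when you want to pass multiple messages to another component and that component must receive every message.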
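As a small sketch of the FIFO behavior described above (the class name BufferBlockDemo is an assumption made for this example), the following posts three values to a BufferBlock<int> and reads them back in the same order:

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class illustrating FIFO delivery from a BufferBlock<T>.
static class BufferBlockDemo
{
    public static int[] Run()
    {
        var bufferBlock = new BufferBlock<int>();

        // Post several messages to the block.
        for (int i = 0; i < 3; i++)
        {
            bufferBlock.Post(i);
        }

        // Receive the messages back; each Receive removes one message
        // from the head of the queue.
        var values = new int[3];
        for (int i = 0; i < 3; i++)
        {
            values[i] = bufferBlock.Receive();
        }
        return values;
    }

    static void Main()
    {
        foreach (int value in Run())
        {
            Console.WriteLine(value);
        }
    }
}
/* Output:
   0
   1
   2
 */
```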
The BroadcastBlock<T> class is useful when you need to broadcast a message to multiple components.
// Create a BroadcastBlock<double> object.
var broadcastBlock = new BroadcastBlock<double>(null);
// Post a message to the block.
broadcastBlock.Post(Math.PI);
// Receive the messages back from the block several times.
for (int i = 0; i < 3; i++)
{
    Console.WriteLine(broadcastBlock.Receive());
}
/* Output:
3.14159265358979
3.14159265358979
3.14159265358979
*/
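Because values are not removed from a BroadcastBlock<T> object after they are read, the same value is available every time.
A WriteOnceBlock<T> object becomes immutable after it receives a value (rather than at construction). Like the BroadcastBlock<T> class, when a target receives a message from a WriteOnceBlock<T> object, the message is not removed from that object. Therefore, multiple targets receive a copy of the message. The WriteOnceBlock<T> class is useful when you want to propagate only the first of multiple messages.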
// Create a WriteOnceBlock<string> object.
var writeOnceBlock = new WriteOnceBlock<string>(null);

// Post several messages to the block in parallel. The first
// message to be received is written to the block.
// Subsequent messages are discarded.
Parallel.Invoke(
    () => writeOnceBlock.Post("Message 1"),
    () => writeOnceBlock.Post("Message 2"),
    () => writeOnceBlock.Post("Message 3"));

// Receive the message from the block.
Console.WriteLine(writeOnceBlock.Receive());

/* Sample output:
   Message 2
 */
The TransformBlock<TInput,TOutput> class resembles the ActionBlock<TInput> class, except that it acts as both a source and a target.
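To make the "both source and target" point concrete, here is a minimal sketch (the class name TransformBlockDemo and the input values are assumptions for illustration). Values are posted to the block as a target, and the transformed results are read back from it as a source:

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class; input values are chosen as perfect squares
// so that the results are exact.
static class TransformBlockDemo
{
    public static double[] Run()
    {
        // Create a TransformBlock<int, double> that computes a square root.
        var transformBlock = new TransformBlock<int, double>(n => Math.Sqrt(n));

        // Write to the block (it acts as a target).
        transformBlock.Post(4);
        transformBlock.Post(9);
        transformBlock.Post(16);

        // Read from the block (it acts as a source). Output order
        // matches input order.
        var results = new double[3];
        for (int i = 0; i < 3; i++)
        {
            results[i] = transformBlock.Receive();
        }
        return results;
    }

    static void Main()
    {
        foreach (double value in Run())
        {
            Console.WriteLine(value);
        }
    }
}
/* Output:
   2
   3
   4
 */
```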
TransformManyBlock<TInput,TOutput> produces zero or more output values for each input value.
// Create a TransformManyBlock<string, char> object that splits
// a string into its individual characters.
var transformManyBlock = new TransformManyBlock<string, char>(
    s => s.ToCharArray());

// Post two messages to the first block.
transformManyBlock.Post("Hello");
transformManyBlock.Post("World");

// Receive all output values from the block.
for (int i = 0; i < ("Hello" + "World").Length; i++)
{
    Console.WriteLine(transformManyBlock.Receive());
}

/* Output:
   H
   e
   l
   l
   o
   W
   o
   r
   l
   d
 */
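Every ActionBlock<TInput>, TransformBlock<TInput,TOutput>, and TransformManyBlock<TInput,TOutput> object buffers input messages until the block is ready to process them. By default, these classes process messages in the order in which they are received, one message at a time. You can also specify a degree of parallelism so that ActionBlock<TInput>, TransformBlock<TInput,TOutput>, and TransformManyBlock<TInput,TOutput> objects process multiple messages at the same time.
Set the ExecutionDataflowBlockOptions.MaxDegreeOfParallelism property to make an execution dataflow block process more than one message at a time. This is useful when the block performs long-running computations and can benefit from processing messages in parallel.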
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks.Dataflow;

// Demonstrates how to specify the maximum degree of parallelism
// when using dataflow.
class Program
{
    // Performs several computations by using dataflow and returns the elapsed
    // time required to perform the computations.
    static TimeSpan TimeDataflowComputations(int maxDegreeOfParallelism,
        int messageCount)
    {
        // Create an ActionBlock<int> that performs some work.
        var workerBlock = new ActionBlock<int>(
            // Simulate work by suspending the current thread.
            millisecondsTimeout => Thread.Sleep(millisecondsTimeout),
            // Specify a maximum degree of parallelism.
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = maxDegreeOfParallelism
            });

        // Compute the time that it takes for several messages to
        // flow through the dataflow block.
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        for (int i = 0; i < messageCount; i++)
        {
            workerBlock.Post(1000);
        }
        workerBlock.Complete();

        // Wait for all messages to propagate through the network.
        workerBlock.Completion.Wait();

        // Stop the timer and return the elapsed time.
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    static void Main(string[] args)
    {
        int processorCount = Environment.ProcessorCount;
        int messageCount = processorCount;

        // Print the number of processors on this computer.
        Console.WriteLine("Processor count = {0}.", processorCount);

        TimeSpan elapsed;

        // Perform two dataflow computations and print the elapsed
        // time required for each.

        // This call specifies a maximum degree of parallelism of 1.
        // This causes the dataflow block to process messages serially.
        elapsed = TimeDataflowComputations(1, messageCount);
        Console.WriteLine("Degree of parallelism = {0}; message count = {1}; " +
            "elapsed time = {2}ms.", 1, messageCount, (int)elapsed.TotalMilliseconds);

        // Perform the computations again. This time, specify the number of
        // processors as the maximum degree of parallelism. This causes
        // multiple messages to be processed in parallel.
        elapsed = TimeDataflowComputations(processorCount, messageCount);
        Console.WriteLine("Degree of parallelism = {0}; message count = {1}; " +
            "elapsed time = {2}ms.", processorCount, messageCount, (int)elapsed.TotalMilliseconds);
    }
}

/* Sample output:
   Processor count = 4.
   Degree of parallelism = 1; message count = 4; elapsed time = 4032ms.
   Degree of parallelism = 4; message count = 4; elapsed time = 1001ms.
 */
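Grouping blocks combine data from one or more sources under various constraints. The TPL Dataflow Library provides three grouping block types: BatchBlock<T>, JoinBlock<T1,T2>, and BatchedJoinBlock<T1,T2>.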
// Note: Sum() below requires "using System.Linq;".

// Create a BatchBlock<int> object that holds ten
// elements per batch.
var batchBlock = new BatchBlock<int>(10);

// Post several values to the block.
for (int i = 0; i < 13; i++)
{
    batchBlock.Post(i);
}

// Set the block to the completed state. This causes
// the block to propagate out any remaining
// values as a final batch.
batchBlock.Complete();

// Print the sum of both batches.
Console.WriteLine("The sum of the elements in batch 1 is {0}.",
    batchBlock.Receive().Sum());
Console.WriteLine("The sum of the elements in batch 2 is {0}.",
    batchBlock.Receive().Sum());

/* Output:
   The sum of the elements in batch 1 is 45.
   The sum of the elements in batch 2 is 33.
 */
// Create a JoinBlock<int, int, char> object that requires
// two numbers and an operator.
var joinBlock = new JoinBlock<int, int, char>();

// Post two values to each target of the join.
joinBlock.Target1.Post(3);
joinBlock.Target1.Post(6);

joinBlock.Target2.Post(5);
joinBlock.Target2.Post(4);

joinBlock.Target3.Post('+');
joinBlock.Target3.Post('-');

// Receive each group of values and apply the operator part
// to the number parts.
for (int i = 0; i < 2; i++)
{
    var data = joinBlock.Receive();
    switch (data.Item3)
    {
        case '+':
            Console.WriteLine("{0} + {1} = {2}",
                data.Item1, data.Item2, data.Item1 + data.Item2);
            break;
        case '-':
            Console.WriteLine("{0} - {1} = {2}",
                data.Item1, data.Item2, data.Item1 - data.Item2);
            break;
        default:
            Console.WriteLine("Unknown operator '{0}'.", data.Item3);
            break;
    }
}

/* Output:
   3 + 5 = 8
   6 - 4 = 2
 */
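BatchedJoinBlock<T1,T2> can be thought of as a combination of BatchBlock<T> and JoinBlock<T1,T2>. You specify the size of each batch when you create a BatchedJoinBlock<T1,T2> object. BatchedJoinBlock<T1,T2> also provides the properties Target1 and Target2, which implement ITargetBlock<TInput>.
This batching mechanism is useful when you collect data from one or more sources and then process multiple data elements as a batch.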
// For demonstration, create a Func<int, int> that
// returns its argument, or throws ArgumentOutOfRangeException
// if the argument is less than zero.
Func<int, int> DoWork = n =>
{
    if (n < 0)
        throw new ArgumentOutOfRangeException();
    return n;
};

// Create a BatchedJoinBlock<int, Exception> object that holds
// seven elements per batch.
var batchedJoinBlock = new BatchedJoinBlock<int, Exception>(7);

// Post several items to the block.
foreach (int i in new int[] { 5, 6, -7, -22, 13, 55, 0 })
{
    try
    {
        // Post the result of the worker to the
        // first target of the block.
        batchedJoinBlock.Target1.Post(DoWork(i));
    }
    catch (ArgumentOutOfRangeException e)
    {
        // If an error occurred, post the Exception to the
        // second target of the block.
        batchedJoinBlock.Target2.Post(e);
    }
}

// Read the results from the block.
var results = batchedJoinBlock.Receive();

// Print the results to the console.
foreach (int n in results.Item1)
{
    Console.WriteLine(n);
}

// Print failures.
foreach (Exception e in results.Item2)
{
    Console.WriteLine(e.Message);
}

/* Output:
   5
   6
   13
   55
   0
   Specified argument was out of the range of valid values.
   Specified argument was out of the range of valid values.
 */
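Specifying greedy versus non-greedy behavior
Several grouping dataflow block types can operate in either greedy or non-greedy mode. By default, the predefined dataflow block types operate in greedy mode.
For join block types such as JoinBlock<T1,T2>, greedy mode means that the block accepts data immediately, even if the corresponding data to join with is not yet available. Non-greedy mode means that the block postpones all incoming messages until one is available on each of its targets to complete the join. If any postponed message is no longer available, the join block releases all postponed messages and restarts the process. For the BatchBlock<T> class, greedy and non-greedy behavior is similar, except that in non-greedy mode a BatchBlock<T> object postpones all incoming messages until enough are available from distinct sources to complete a batch.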
// NetworkResource and MemoryResource are application-defined types
// that are not shown here.
new JoinBlock<NetworkResource, MemoryResource>(
    new GroupingDataflowBlockOptions
    {
        Greedy = false
    });
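Non-greedy mode lets join blocks share one or more source blocks, so that progress can still be made while other blocks wait for data. Using non-greedy joins can also help prevent deadlock in your application. In a software application, deadlock occurs when two or more processes each hold a resource and each waits for another process to release some other resource. Consider an application that defines two JoinBlock<T1,T2> objects. Both objects read data from two shared source blocks. In greedy mode, if one join block reads from the first source and the second join block reads from the second source, the application might deadlock because each join block waits for the other to release its resource. In non-greedy mode, each join block reads from its sources only when all data is available, so the risk of deadlock is eliminated.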
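Writing to and reading from dataflow blocks
One way to propagate messages among application components is to call the Post and DataflowBlock.SendAsync methods to send messages to target dataflow blocks (Post acts synchronously; SendAsync acts asynchronously), and to call the Receive, ReceiveAsync, and TryReceive methods to receive messages from source dataflow blocks.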
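The asynchronous counterparts mentioned above can be sketched like this. It is a minimal illustration under stated assumptions: the class name AsyncReadWriteDemo is invented for the example, and the BufferBlock<int> is unbounded, so each SendAsync call is accepted right away.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class showing SendAsync and ReceiveAsync.
static class AsyncReadWriteDemo
{
    public static async Task<int[]> RunAsync()
    {
        var bufferBlock = new BufferBlock<int>();

        // SendAsync returns a Task<bool> that reports whether the
        // target accepted the message.
        for (int i = 0; i < 3; i++)
        {
            bool accepted = await bufferBlock.SendAsync(i);
            if (!accepted)
            {
                throw new InvalidOperationException("Message was declined.");
            }
        }

        // ReceiveAsync completes when a message becomes available.
        var values = new int[3];
        for (int i = 0; i < 3; i++)
        {
            values[i] = await bufferBlock.ReceiveAsync();
        }
        return values;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", RunAsync().Result));
    }
}
/* Output:
   0, 1, 2
 */
```

SendAsync matters most when the target has a bounded capacity: Post would return false for a full block, whereas SendAsync waits until the target can accept or decline the message.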
// The shared message block (this declaration is required for the
// snippet to compile).
var bufferBlock = new BufferBlock<int>();

// Write to and read from the message block concurrently.
var post01 = Task.Run(() =>
{
    bufferBlock.Post(0);
    bufferBlock.Post(1);
});
var receive = Task.Run(() =>
{
    for (int i = 0; i < 3; i++)
    {
        Console.WriteLine(bufferBlock.Receive());
    }
});
var post2 = Task.Run(() =>
{
    bufferBlock.Post(2);
});
Task.WaitAll(post01, receive, post2);

/* Sample output:
   2
   0
   1
 */
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Demonstrates a basic producer and consumer pattern that uses dataflow.
class DataflowProducerConsumer
{
    // Demonstrates the production end of the producer and consumer pattern.
    static void Produce(ITargetBlock<byte[]> target)
    {
        // Create a Random object to generate random data.
        Random rand = new Random();

        // In a loop, fill a buffer with random data and
        // post the buffer to the target block.
        for (int i = 0; i < 100; i++)
        {
            // Create an array to hold random byte data.
            byte[] buffer = new byte[1024];

            // Fill the buffer with random bytes.
            rand.NextBytes(buffer);

            // Post the result to the message block.
            target.Post(buffer);
        }

        // Set the target to the completed state to signal to the consumer
        // that no more data will be available.
        target.Complete();
    }

    // Demonstrates the consumption end of the producer and consumer pattern.
    static async Task<int> ConsumeAsync(ISourceBlock<byte[]> source)
    {
        // Initialize a counter to track the number of bytes that are processed.
        int bytesProcessed = 0;

        // Read from the source buffer until the source buffer has no
        // available output data.
        while (await source.OutputAvailableAsync())
        {
            byte[] data = source.Receive();

            // Increment the count of bytes received.
            bytesProcessed += data.Length;
        }

        return bytesProcessed;
    }

    static void Main(string[] args)
    {
        // Create a BufferBlock<byte[]> object. This object serves as the
        // target block for the producer and the source block for the consumer.
        var buffer = new BufferBlock<byte[]>();

        // Start the consumer. The Consume method runs asynchronously.
        var consumer = ConsumeAsync(buffer);

        // Post source data to the dataflow block.
        Produce(buffer);

        // Wait for the consumer to process all data.
        consumer.Wait();

        // Print the count of bytes processed to the console.
        Console.WriteLine("Processed {0} bytes.", consumer.Result);
    }
}

/* Output:
   Processed 102400 bytes.
 */
If your application has multiple consumers, use the TryReceive method to read data from the source block.
// Demonstrates the consumption end of the producer and consumer pattern.
static async Task<int> ConsumeAsync(IReceivableSourceBlock<byte[]> source)
{
    // Initialize a counter to track the number of bytes that are processed.
    int bytesProcessed = 0;

    // Read from the source buffer until the source buffer has no
    // available output data.
    while (await source.OutputAvailableAsync())
    {
        byte[] data;
        while (source.TryReceive(out data))
        {
            // Increment the count of bytes received.
            bytesProcessed += data.Length;
        }
    }

    return bytesProcessed;
}
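Dataflow pipelines
In addition to reading from and writing to message blocks directly, you can connect dataflow blocks to form pipelines, which are linear sequences of dataflow blocks, or networks, which are graphs of dataflow blocks. In a pipeline or network, sources asynchronously propagate data to targets as that data becomes available.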
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks.Dataflow;

// Demonstrates how to create a basic dataflow pipeline.
// This program downloads the book "The Iliad of Homer" by Homer from the Web
// and finds all reversed words that appear in that book.
static class DataflowReversedWords
{
    static void Main()
    {
        //
        // Create the members of the pipeline.
        //

        // Downloads the requested resource as a string.
        var downloadString = new TransformBlock<string, string>(async uri =>
        {
            Console.WriteLine("Downloading '{0}'...", uri);

            return await new HttpClient(new HttpClientHandler
            {
                AutomaticDecompression = System.Net.DecompressionMethods.GZip
            }).GetStringAsync(uri);
        });

        // Separates the specified text into an array of words.
        var createWordList = new TransformBlock<string, string[]>(text =>
        {
            Console.WriteLine("Creating word list...");

            // Remove common punctuation by replacing all non-letter characters
            // with a space character.
            char[] tokens = text.Select(c => char.IsLetter(c) ? c : ' ').ToArray();
            text = new string(tokens);

            // Separate the text into an array of words.
            return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        });

        // Removes short words and duplicates.
        var filterWordList = new TransformBlock<string[], string[]>(words =>
        {
            Console.WriteLine("Filtering word list...");

            return words
                .Where(word => word.Length > 3)
                .Distinct()
                .ToArray();
        });

        // Finds all words in the specified collection whose reverse also
        // exists in the collection.
        var findReversedWords = new TransformManyBlock<string[], string>(words =>
        {
            Console.WriteLine("Finding reversed words...");

            var wordsSet = new HashSet<string>(words);

            return from word in words.AsParallel()
                   let reverse = new string(word.Reverse().ToArray())
                   where word != reverse && wordsSet.Contains(reverse)
                   select word;
        });

        // Prints the provided reversed words to the console.
        var printReversedWords = new ActionBlock<string>(reversedWord =>
        {
            Console.WriteLine("Found reversed words {0}/{1}",
                reversedWord, new string(reversedWord.Reverse().ToArray()));
        });

        //
        // Connect the dataflow blocks to form a pipeline.
        //

        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

        downloadString.LinkTo(createWordList, linkOptions);
        createWordList.LinkTo(filterWordList, linkOptions);
        filterWordList.LinkTo(findReversedWords, linkOptions);
        findReversedWords.LinkTo(printReversedWords, linkOptions);

        // Process "The Iliad of Homer" by Homer.
        downloadString.Post("http://www.gutenberg.org/cache/epub/16452/pg16452.txt");

        // Mark the head of the pipeline as complete.
        downloadString.Complete();

        // Wait for the last block in the pipeline to process all messages.
        printReversedWords.Completion.Wait();
    }
}

/* Sample output:
   Downloading 'http://www.gutenberg.org/cache/epub/16452/pg16452.txt'...
   Creating word list...
   Filtering word list...
   Finding reversed words...
   Found reversed words doom/mood
   Found reversed words draw/ward
   Found reversed words aera/area
   Found reversed words seat/taes
   Found reversed words live/evil
   Found reversed words port/trop
   Found reversed words sleek/keels
   Found reversed words area/aera
   Found reversed words tops/spot
   Found reversed words evil/live
   Found reversed words mood/doom
   Found reversed words speed/deeps
   Found reversed words moor/room
   Found reversed words trop/port
   Found reversed words spot/tops
   Found reversed words spots/stops
   Found reversed words stops/spots
   Found reversed words reed/deer
   Found reversed words keels/sleek
   Found reversed words deeps/speed
   Found reversed words deer/reed
   Found reversed words taes/seat
   Found reversed words room/moor
   Found reversed words ward/draw
 */
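Using TransformBlock<TInput,TOutput> lets each member of the pipeline perform an operation on its input data and send the result to the next step in the pipeline. The findReversedWords member of the pipeline is a TransformManyBlock<TInput,TOutput> object because it produces multiple independent outputs for each input. The tail of the pipeline, printReversedWords, is an ActionBlock<TInput> object because it performs an action on its input and does not produce a result.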
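When you call the LinkTo method to connect a source dataflow block to a target dataflow block, the source propagates data to the target as data becomes available. If you also provide DataflowLinkOptions with PropagateCompletion set to true, the completion of one block in the pipeline, whether successful or not, causes the next block in the pipeline to complete.
If you send more than one input through the pipeline, call the IDataflowBlock.Complete method after you have submitted all the input. You can omit this step if your application has no well-defined point at which data is no longer available, or if the application does not have to wait for the pipeline to finish.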
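The sequence described above (submit all input, call Complete on the head, wait on the tail) can be sketched with a two-stage pipeline. The class name PipelineCompletionDemo and the squaring step are assumptions made only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class: several inputs flow through a small pipeline,
// and completion propagates from the head block to the tail block.
static class PipelineCompletionDemo
{
    public static List<int> Run()
    {
        var results = new List<int>();

        var square = new TransformBlock<int, int>(n => n * n);

        // Processes one message at a time by default, so the list
        // is not modified concurrently.
        var collect = new ActionBlock<int>(n => results.Add(n));

        // With PropagateCompletion set, completing 'square' also completes
        // 'collect' once all buffered messages have been delivered.
        square.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

        // Submit all of the input first...
        for (int i = 1; i <= 4; i++)
        {
            square.Post(i);
        }

        // ...then mark the head of the pipeline as complete.
        square.Complete();

        // Wait on the tail, not the head, to know that every
        // message has been processed.
        collect.Completion.Wait();
        return results;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
    }
}
/* Output:
   1, 4, 9, 16
 */
```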
Unlinking dataflow blocks
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

// Demonstrates how to unlink dataflow blocks.
class DataflowReceiveAny
{
    // Receives the value from the first provided source that has
    // a message.
    public static T ReceiveFromAny<T>(params ISourceBlock<T>[] sources)
    {
        // Create a WriteOnceBlock<T> object and link it to each source block.
        var writeOnceBlock = new WriteOnceBlock<T>(e => e);
        foreach (var source in sources)
        {
            // Setting MaxMessages to one instructs
            // the source block to unlink from the WriteOnceBlock<T> object
            // after offering the WriteOnceBlock<T> object one message.
            source.LinkTo(writeOnceBlock, new DataflowLinkOptions { MaxMessages = 1 });
        }
        // Return the first value that is offered to the WriteOnceBlock object.
        return writeOnceBlock.Receive();
    }

    // Demonstrates a function that takes several seconds to produce a result.
    static int TrySolution(int n, CancellationToken ct)
    {
        // Simulate a lengthy operation that completes within three seconds
        // or when the provided CancellationToken object is cancelled.
        SpinWait.SpinUntil(() => ct.IsCancellationRequested,
            new Random().Next(3000));

        // Return a value.
        return n + 42;
    }

    static void Main(string[] args)
    {
        // Create a shared CancellationTokenSource object to enable the
        // TrySolution method to be cancelled.
        var cts = new CancellationTokenSource();

        // Create three TransformBlock<int, int> objects.
        // Each TransformBlock<int, int> object calls the TrySolution method.
        Func<int, int> action = n => TrySolution(n, cts.Token);
        var trySolution1 = new TransformBlock<int, int>(action);
        var trySolution2 = new TransformBlock<int, int>(action);
        var trySolution3 = new TransformBlock<int, int>(action);

        // Post data to each TransformBlock<int, int> object.
        trySolution1.Post(11);
        trySolution2.Post(21);
        trySolution3.Post(31);

        // Call the ReceiveFromAny<T> method to receive the result from the
        // first TransformBlock<int, int> object to finish.
        int result = ReceiveFromAny(trySolution1, trySolution2, trySolution3);

        // Cancel all calls to TrySolution that are still active.
        cts.Cancel();

        // Print the result to the console.
        Console.WriteLine("The solution is {0}.", result);

        cts.Dispose();
    }
}

/* Sample output:
   The solution is 53.
 */
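The LinkTo method has an overload that takes a DataflowLinkOptions object with a MaxMessages property. When this property is set to 1, it instructs the source block to unlink from the target after the target receives one message from the source.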
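Dataflow block completion
There are two ways to determine whether a dataflow block completed without error, encountered one or more errors, or was canceled. The first is to call the Task.Wait method on the completion task inside a try-catch block.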
// Create an ActionBlock<int> object that prints its input
// and throws ArgumentOutOfRangeException if the input
// is less than zero.
var throwIfNegative = new ActionBlock<int>(n =>
{
    Console.WriteLine("n = {0}", n);
    if (n < 0)
    {
        throw new ArgumentOutOfRangeException();
    }
});

// Post values to the block.
throwIfNegative.Post(0);
throwIfNegative.Post(-1);
throwIfNegative.Post(1);
throwIfNegative.Post(-2);
throwIfNegative.Complete();

// Wait for completion in a try/catch block.
try
{
    throwIfNegative.Completion.Wait();
}
catch (AggregateException ae)
{
    // If an unhandled exception occurs during dataflow processing, all
    // exceptions are propagated through an AggregateException object.
    ae.Handle(e =>
    {
        Console.WriteLine("Encountered {0}: {1}",
            e.GetType().Name, e.Message);
        return true;
    });
}

/* Output:
   n = 0
   n = -1
   Encountered ArgumentOutOfRangeException: Specified argument was out of the range
    of valid values.
 */
The second way to determine the completion status of a dataflow block is to use a continuation of the completion task.
// Create a continuation task that prints the overall
// task status to the console when the block finishes.
throwIfNegative.Completion.ContinueWith(task =>
{
    Console.WriteLine("The status of the completion task is '{0}'.",
        task.Status);
});
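.NET Framework 4 has two built-in schedulers: the default ThreadPoolTaskScheduler, and SynchronizationContextTaskScheduler, which is used to switch to the UI thread. If your blocks involve the UI, you can use the latter, which makes marshaling work onto the UI thread more convenient.
.NET Framework 4.5 added another type to the System.Threading.Tasks namespace: ConcurrentExclusiveSchedulerPair. This class combines two TaskScheduler objects, exposed as ConcurrentScheduler and ExclusiveScheduler, which you can pass to the blocks you construct. Together they guarantee that while no exclusive task (a task that uses ExclusiveScheduler) is running, other tasks (those that use ConcurrentScheduler) can run concurrently, and that while an exclusive task is running, no other task can run. In effect, the pair behaves like a reader-writer lock, which makes it a convenient solution when multiple blocks operate on a shared resource.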
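As a rough sketch of the shared-resource case, the example below runs two ActionBlock<int> objects on the ExclusiveScheduler of one ConcurrentExclusiveSchedulerPair. The class name SchedulerPairDemo, the shared counter, and the message counts are assumptions made for illustration. Because exclusive tasks run one at a time, the unsynchronized update of the shared total is not interleaved even though each block is allowed a degree of parallelism of 4.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class: two blocks update one shared variable, and the
// exclusive scheduler serializes their work.
static class SchedulerPairDemo
{
    public static int Run()
    {
        var pair = new ConcurrentExclusiveSchedulerPair();

        // Shared state touched by both blocks without any explicit lock.
        int total = 0;

        var options = new ExecutionDataflowBlockOptions
        {
            // Every task spawned by the blocks runs on the exclusive
            // scheduler, so only one delegate executes at any moment.
            TaskScheduler = pair.ExclusiveScheduler,
            MaxDegreeOfParallelism = 4
        };

        var adderA = new ActionBlock<int>(n => { total += n; }, options);
        var adderB = new ActionBlock<int>(n => { total += n; }, options);

        // Post 1..100 to each block.
        for (int i = 1; i <= 100; i++)
        {
            adderA.Post(i);
            adderB.Post(i);
        }
        adderA.Complete();
        adderB.Complete();
        Task.WaitAll(adderA.Completion, adderB.Completion);

        return total;
    }

    static void Main()
    {
        Console.WriteLine("Total = {0}", Run());
    }
}
/* Output:
   Total = 10100
 */
```

Blocks that only read the shared resource could be given pair.ConcurrentScheduler instead, which lets them run in parallel with each other but never alongside an exclusive task.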