Parallel Development in C#: Thread/ThreadPool, Task/TaskFactory, Parallel
Posted on 2012-12-23 14:12 by Roger Luo
References: Professional.C#.4.0.and.NET.4.pdf and Pro .NET 4 Parallel Programming in C#.pdf
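Parallel programming in C# offers asynchronous delegates as well as asynchronous threads. The former was covered in the earlier post 《C#异步调用详细》 (C# Asynchronous Calls in Detail), so how does it differ from using Thread?
People tend to mix these up, and I was getting confused myself. Asynchronous (parallel) programming in C# comes in several kinds: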
1. Asynchronous Delegates
Asynchronous calling is used when you have work items that should be handled in the background and you care when they finish.
2. BackgroundWorker
Use BackgroundWorker if you have a single task that runs in the background and needs to interact with the UI, and if you don't care when it finishes. Marshalling data and method calls to the UI thread is handled automatically through its event-based model.
Avoid BackgroundWorker if (1) your assembly does not already reference the System.Windows.Forms assembly, (2) you need the thread to be a foreground thread, or (3) you need to manipulate the thread priority.
3. ThreadPool
Use a ThreadPool thread when efficiency is desired. The ThreadPool helps avoid the overhead associated with creating, starting, and stopping threads.
Avoid using the ThreadPool if (1) the task runs for the lifetime of your application, (2) you need the thread to be a foreground thread, (3) you need to manipulate the thread priority, or (4) you need the thread to have a fixed identity (aborting, suspending, discovering).
4. Thread class
Use the Thread class for long-running tasks and when you require features offered by a formal threading model, e.g., choosing between foreground and background threads, tweaking the thread priority, fine-grained control over thread execution, etc.
5. Task Parallel Library
Task/TaskFactory, Parallel.For, Parallel.ForEach, Parallel.Invoke
6. Parallel LINQ
TBD
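Now to the main topic. Let's start with the Thread class in the System.Threading namespace.
Thread class
Creating a thread and specifying its delegate:
A Thread can be constructed with one of two kinds of delegate:
1. ThreadStart: void Method()
2. ParameterizedThreadStart: void Method(Object obj)
Compared with an asynchronous delegate, the input is quite limited: only one parameter is supported, and it is an object, so there is no compile-time type checking.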
Control:
Start() or Start(obj)
Start makes the newly created thread begin executing asynchronously with respect to the caller, similar to BeginInvoke on an asynchronous delegate.
The code below does the same job with an asynchronous delegate and with a Thread.
Asynchronous delegate version:
public void AsyncOperation(int data)
{
    int i = 0;
    while (i++ < data)
    {
        Thread.Sleep(1000);
        Console.WriteLine(string.Format("Running for {0} seconds, in thread id: {1}.", i, Thread.CurrentThread.ManagedThreadId));
    }
}

public delegate void AsyncOperationDelegate(int data);

public void RunBeginInvoke()
{
    AsyncOperationDelegate d1 = new AsyncOperationDelegate(AsyncOperation);
    // Runs AsyncOperation on a thread-pool thread; the caller continues immediately.
    d1.BeginInvoke(3, null, null);
    int i = 0;
    while (i++ < 3)
    {
        Thread.Sleep(1000);
        Console.WriteLine(string.Format("[BeginInvoke]Running for {0} seconds, in thread id: {1}.", i, Thread.CurrentThread.ManagedThreadId));
    }
}
Thread version:
private void AsyncOperation(object obj)
{
    int data = (int)obj; // the argument arrives as object and must be cast back
    int i = 0;
    while (i++ < data)
    {
        Thread.Sleep(1000);
        Console.WriteLine(string.Format("Running for {0} seconds, in thread id: {1}.", i, Thread.CurrentThread.ManagedThreadId));
    }
}

public void RunThread()
{
    Thread t1 = new Thread(new ParameterizedThreadStart(AsyncOperation));
    t1.Start((object)3);
    int i = 0;
    while (i++ < 3)
    {
        Thread.Sleep(1000);
        Console.WriteLine(string.Format("[Thread]Running for {0} seconds, in thread id: {1}.", i, Thread.CurrentThread.ManagedThreadId));
    }
}
Compared with the asynchronous delegate, this is clumsier and uglier to use.
The book suggests using an instance method of a class as the thread delegate. Because an instance method can access the fields of its class, this makes it possible to pass complex or multiple parameters and to read back modified values. For example:
class ThreadData
{
    public int data = 0;

    public ThreadData() { }

    public void RunThread()
    {
        int i = 0;
        while (i++ < 1000000)
            data++;
    }
}

class ThreadTest
{
    public void RunThreadWithDataInClass()
    {
        ThreadData d1 = new ThreadData();
        Thread t1 = new Thread(d1.RunThread);
        t1.Start();
        int i = 0;
        while (i++ < 1000000)
            d1.data++; // unsynchronized: races with RunThread on the other thread
        Thread.Sleep(2000); // wait for the new thread to finish
        Console.WriteLine(string.Format("The data in ThreadData: {0}.", d1.data));
    }
}
This does save the parameter plumbing, but it introduces a data race: both threads increment data without synchronization, so the printed value can be lower than 2000000. The same approach also works with an asynchronous delegate (AD: Asynchronous Delegate).
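One way to remove the race described above can be sketched as follows. This is a minimal sketch, not the book's method: it assumes Interlocked.Increment (covered later in this post) for the shared counter and Thread.Join instead of a fixed sleep. The names SharedCounter and RaceFreeSketch are made up for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical holder for the shared value (plays the role of ThreadData).
class SharedCounter
{
    public int Data;
}

static class RaceFreeSketch
{
    // Two threads each add perThread increments to the same counter.
    public static int Run(int perThread)
    {
        SharedCounter c = new SharedCounter();
        Thread t = new Thread(() =>
        {
            for (int i = 0; i < perThread; i++)
                Interlocked.Increment(ref c.Data); // atomic read-modify-write
        });
        t.Start();
        for (int i = 0; i < perThread; i++)
            Interlocked.Increment(ref c.Data);
        t.Join(); // wait for the worker instead of guessing with Sleep
        return c.Data;
    }

    static void Main()
    {
        Console.WriteLine("The data in SharedCounter: {0}.", Run(1000000));
    }
}
```

With atomic increments and Join, the total is always exactly twice perThread, whereas the unsynchronized version can lose updates.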
The difference between background and foreground threads:
The process of the application keeps running as long as at least one foreground thread is running. If more
than one foreground thread is running and the Main() method ends, the process of the application remains
active until all foreground threads finish their work.
A thread you create with the Thread class, by default, is a foreground thread. Thread pool threads are
always background threads.
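According to the passage above, an asynchronous delegate (AD) runs on a thread-pool thread. If the caller that hosts the asynchronous delegate finishes and the process has no other foreground thread, the delegate started by BeginInvoke is forcibly terminated. A thread created with the Thread class and not switched to a background thread keeps running its ThreadStart or ParameterizedThreadStart delegate even after the caller has ended. So here is a case where Thread is required: when you need a foreground thread.
IsBackground
Priority
Both properties exist only on Thread; asynchronous delegates have no equivalent.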
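As an illustration of these two properties, here is a minimal sketch that configures a thread before starting it. The class name, method name, and thread name are hypothetical and chosen only for this example.

```csharp
using System;
using System.Threading;

static class ForegroundBackgroundSketch
{
    // Creates (but does not start) a thread configured as a low-priority background thread.
    public static Thread CreateBackgroundWorker(ThreadStart work)
    {
        Thread t = new Thread(work);
        t.IsBackground = true;                   // will not keep the process alive
        t.Priority = ThreadPriority.BelowNormal; // a scheduling hint only
        t.Name = "sketch-worker";                // hypothetical name, handy when debugging
        return t;
    }

    static void Main()
    {
        Thread t = CreateBackgroundWorker(() => Thread.Sleep(200));
        Console.WriteLine("IsBackground: {0}, Priority: {1}", t.IsBackground, t.Priority);
        t.Start();
        t.Join(); // without Join the process could exit before the background thread finishes
        Console.WriteLine("Worker finished.");
    }
}
```

A thread created with new Thread(...) starts out as a foreground thread, so the IsBackground assignment is what changes its behavior at process exit.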
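Getting a return value
Judging from its constructors, how could a Thread return a value? With the signatures void Method() and void Method(Object obj), it cannot. What you can do is run an instance method asynchronously: bundle the required parameters and the asynchronous method into one class, as ThreadData does. The field data stores the result produced by the thread, and the method RunThread is the code the thread runs. If the delegate (RunThread, in the case of ThreadData) takes a long time, the main thread should call Thread.Join to wait for the thread to finish.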
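The pattern just described might look like the sketch below: the input and the result live in one object, and the caller reads the result only after Join. SumWorker and ThreadResultSketch are hypothetical names used for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical helper class: bundles the input, the work, and the result.
class SumWorker
{
    private readonly int _upTo;

    public long Result { get; private set; }

    public SumWorker(int upTo)
    {
        _upTo = upTo;
    }

    // Matches the ThreadStart signature: no parameters, no return value.
    public void Run()
    {
        long sum = 0;
        for (int i = 1; i <= _upTo; i++)
            sum += i;
        Result = sum; // read by the caller only after Join
    }
}

static class ThreadResultSketch
{
    public static long SumOnThread(int upTo)
    {
        SumWorker worker = new SumWorker(upTo);
        Thread t = new Thread(worker.Run);
        t.Start();
        t.Join(); // block until the worker thread has finished
        return worker.Result;
    }

    static void Main()
    {
        Console.WriteLine(SumOnThread(100)); // prints 5050
    }
}
```

Because the result is read only after Join returns, no extra locking is needed for this single hand-off.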
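That wraps up the Thread class for now. A wave of newer APIs has come along since; Thread is not dead on the beach yet, but its best days are nearly over.
Ta-da: ThreadPool takes the stage
ThreadPool is simply a set of threads the system has already prepared, waiting for your delegates, so you do not have to manage the details of each Thread.
Let's see how simple it is: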
public void ThreadProc(Object state)
{
    Thread.Sleep(1000);
    Console.WriteLine(string.Format("Running in ThreadProc, id: {0}.", Thread.CurrentThread.ManagedThreadId));
}

public void RunSimpleThreadPool()
{
    ThreadPool.QueueUserWorkItem(ThreadProc);
    Console.WriteLine(string.Format("Running in main thread, id: {0}", Thread.CurrentThread.ManagedThreadId));
    Thread.Sleep(2000);
}
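The 2-second sleep in RunSimpleThreadPool keeps the main thread alive; otherwise, when the main thread ends, the background thread (all ThreadPool threads are background threads) would be killed with it.
Judging from the code, it really is simple. However, the delegate signature is even more restricted: it must match WaitCallback, that is, void Method(Object state).
That is hardly better than Thread. As for the lack of type checking mentioned earlier, by using object as the parameter the framework designers presumably wanted a universal base-class reference. Object is indeed the root of every class, so inside ThreadProc you can call GetType to discover the actual type.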
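Instead of sleeping for a guessed interval, the work item can signal the caller when it is done. The sketch below passes a typed state object through the object parameter and waits on a ManualResetEvent; WorkItemState and ThreadPoolSignalSketch are hypothetical names, and this is one possible approach rather than the book's.

```csharp
using System;
using System.Threading;

// Hypothetical state object handed to the work item.
class WorkItemState
{
    public string Name;
    public int ThreadId;
    public readonly ManualResetEvent Done = new ManualResetEvent(false);
}

static class ThreadPoolSignalSketch
{
    // Matches WaitCallback: void Method(Object state).
    static void ThreadProc(object state)
    {
        WorkItemState s = state as WorkItemState; // recover the real type from object
        if (s == null)
            return;
        s.ThreadId = Thread.CurrentThread.ManagedThreadId;
        Console.WriteLine("Work item {0} on pool thread: {1}", s.Name, Thread.CurrentThread.IsThreadPoolThread);
        s.Done.Set(); // tell the caller the work is finished
    }

    public static WorkItemState RunAndWait(string name)
    {
        WorkItemState s = new WorkItemState { Name = name };
        ThreadPool.QueueUserWorkItem(ThreadProc, s);
        s.Done.WaitOne(); // replaces the fixed Thread.Sleep(2000)
        return s;
    }

    static void Main()
    {
        RunAndWait("first");
    }
}
```

Waiting on the event returns as soon as the work item finishes, and it still works if the item takes longer than any sleep you might have picked.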
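ThreadPool is easy to use, but it comes with quite a few restrictions, as listed above:
1. All of its threads are background threads, so when the foreground threads end, they are torn down too. You are also given no way to change this; a pool thread cannot become a foreground thread.
2. You cannot set the Priority of a pool thread, and you cannot change its Name either.
3. For COM objects, all pooled threads are multithreaded apartment (MTA) threads. Many COM objects require a single-threaded apartment (STA) thread. This will be covered in a later post, especially in the context of WinForms development.
4. A delegate placed in the queue cannot be cancelled.
5. Finally, a recommendation: delegates run on the ThreadPool should mostly be short tasks. For long-lived work, or work that needs operations such as interrupt or suspend, use Thread.
The book stops here on ThreadPool, but for a technology enthusiast that is not enough, so let's see what other surprises ThreadPool has in store.
First, the definition of RegisterWaitForSingleObject: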
public static RegisteredWaitHandle RegisterWaitForSingleObject(
    WaitHandle waitObject,
    WaitOrTimerCallback callBack,
    Object state,
    int millisecondsTimeOutInterval,
    bool executeOnlyOnce
)
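The first parameter is the handle to wait on. The second is the callback, of type WaitOrTimerCallback, whose signature is void Method(Object state, bool timedOut). The third is the state object passed to the callback, the fourth is the timeout interval in milliseconds, and the last, as its name suggests, determines whether the callback runs only once. Because of the timeout interval, this method can be used for periodic polling. It can also keep running indefinitely, but as mentioned earlier, for long-lived threads (foreground or background) Thread is the more reliable choice.
Here is an example that uses this method to run a callback at intervals: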
class ThreadPoolData
{
    public string Name = "Default";
    public RegisteredWaitHandle Handle = null;

    public ThreadPoolData() { }
}

class ThreadPoolTest
{
    public void RunRegisterHandle()
    {
        AutoResetEvent ev = new AutoResetEvent(false);
        ThreadPoolData d1 = new ThreadPoolData() { Name = "First" };
        d1.Handle = ThreadPool.RegisterWaitForSingleObject(ev, ThreadProc, d1, 1000, false);
        Thread.Sleep(3100);
        Console.WriteLine("Main thread signals.");
        ev.Set();
        Console.WriteLine("Main thread finish.");
        Thread.Sleep(1000); // wait for ThreadProc to complete
        Console.ReadLine();
    }

    public void ThreadProc(Object state, bool timeout)
    {
        ThreadPoolData d1 = (ThreadPoolData)state;
        string cause = "TIMED OUT";
        if (!timeout)
        {
            cause = "SIGNALED";
            if (d1.Handle != null)
                d1.Handle.Unregister(null);
        }
        Console.WriteLine("[0]WaitProc( {0} ) executes on thread {1}; cause = {2}.",
            d1.Name,
            Thread.CurrentThread.GetHashCode().ToString(),
            cause
        );
    }
}
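The example starts by defining a class: the required parameters are packed into an object that is then passed to the asynchronous method, which works around the single-parameter limit. One of its fields is a RegisteredWaitHandle, which holds the return value of RegisterWaitForSingleObject; it lets the callback unregister the wait and so remove it from the thread pool's queue. (This resembles passing the delegate itself as the state argument of BeginInvoke so that the callback can fetch the result, or, even more closely, the handle-based way of getting a return value from an asynchronous delegate described in the earlier post: embed a handle in the state object and call Set in the asynchronous method so that the main thread stops waiting.)
Next, the start-up method calls RegisterWaitForSingleObject with a timeout interval of 1000 (the fourth argument), which means the callback runs once every second. Note that if the callback takes longer than one second, the ThreadPool picks another thread to run the next invocation, so several instances of the same callback may be executing at the same time. The last argument, false, means the callback runs repeatedly until Unregister is called on the returned RegisteredWaitHandle.
The output is: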
[0]WaitProc( First ) executes on thread 4; cause = TIMED OUT.
[0]WaitProc( First ) executes on thread 4; cause = TIMED OUT.
[0]WaitProc( First ) executes on thread 4; cause = TIMED OUT.
Main thread signals.
[0]WaitProc( First ) executes on thread 4; cause = SIGNALED.
Main thread finish.
If the call to Unregister is commented out, the output becomes:
[0]WaitProc( First ) executes on thread 4; cause = TIMED OUT.
[0]WaitProc( First ) executes on thread 4; cause = TIMED OUT.
[0]WaitProc( First ) executes on thread 4; cause = TIMED OUT.
Main thread signals.
[0]WaitProc( First ) executes on thread 4; cause = SIGNALED.
Main thread finish.
[0]WaitProc( First ) executes on thread 4; cause = TIMED OUT.
[0]WaitProc( First ) executes on thread 4; cause = TIMED OUT.
[0]WaitProc( First ) executes on thread 4; cause = TIMED OUT.
...repeat n times...
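This happens because:
1. The main thread blocks in Console.ReadLine() waiting for input, so the main (foreground) thread never exits;
2. Unregister is never called, so the ThreadPool keeps monitoring the registered wait and runs the callback on every timeout.
If the last argument (executeOnlyOnce) is changed to true, the callback runs only once. If no signal arrives within the one-second interval, the output is: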
[0]WaitProc( First ) executes on thread 4; cause = TIMED OUT.
Main thread signals.
Main thread finish.
If Thread.Sleep(1100) is added at the start of ThreadProc, the output is:
[0]WaitProc( First ) executes on thread 4; cause = TIMED OUT.
Main thread signals.
[0]WaitProc( First ) executes on thread 5; cause = TIMED OUT.
[0]WaitProc( First ) executes on thread 4; cause = TIMED OUT.
Main thread finish.
[0]WaitProc( First ) executes on thread 6; cause = SIGNALED.
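These results show that the ThreadPool really does run the callback every "second", whether or not an earlier invocation is still running.
So far ThreadPool has given us two methods: QueueUserWorkItem and RegisterWaitForSingleObject. The former quickly runs short, task-like jobs and could equally be done with BeginInvoke on an asynchronous delegate. The latter runs a callback at intervals, which nothing covered so far provides; a typical use case is business logic that needs to poll on a timer.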
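To complement the timed-out runs shown earlier, here is a minimal sketch of the one-shot case (executeOnlyOnce set to true) in which the handle is signaled before the timeout, so the callback receives timedOut = false. The class and method names are hypothetical, and the 5000 ms timeout is an arbitrary value chosen to be much longer than the time needed to signal.

```csharp
using System;
using System.Threading;

static class RegisterWaitOnceSketch
{
    // Returns the timedOut flag that the callback received.
    public static bool SignalAndReport()
    {
        using (AutoResetEvent ev = new AutoResetEvent(false))
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            bool timedOut = true;
            RegisteredWaitHandle handle = ThreadPool.RegisterWaitForSingleObject(
                ev,
                (state, wasTimeout) =>
                {
                    timedOut = wasTimeout; // false when the handle was signaled
                    done.Set();
                },
                null,   // no state object needed here
                5000,   // timeout in milliseconds
                true);  // executeOnlyOnce
            ev.Set();           // signal well before the timeout elapses
            done.WaitOne();     // wait for the callback to run
            handle.Unregister(null);
            return timedOut;
        }
    }

    static void Main()
    {
        Console.WriteLine("Callback timed out: {0}", SignalAndReport());
    }
}
```

The same callback shape serves both cases; only the bool argument tells you whether the wait ended by signal or by timeout.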
Next, the Task class, which is built on top of the ThreadPool.
First, a summary of what Task offers:
You can define continuation work: what should be done after a task is complete. This can be differentiated whether the task was successful or not. Also, you can organize tasks in a hierarchy. For example, a parent task can create new children tasks. This can create a dependency, so that canceling a parent task also cancels its child tasks.
The delegate types that Task accepts are Action and Action<Object>, that is, void Method() or void Method(Object obj), the same signatures as the delegates accepted by Thread.
A simple example first:
public void ThreadProc(string state)
{
    Console.WriteLine("Data : {0}.", state);
}

public void RunTaskWithParameter()
{
    string str = "luodingjia";
    Task t = new Task((obj) => { ThreadProc((string)obj); }, (Object)str);
    t.Start();
    Console.ReadLine();
}
This shows how convenient lambdas are.
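TaskFactory is to Task much as ThreadPool is to Thread.
For example, with TaskFactory you do not call anything like Start yourself; you call Task.Factory.StartNew(Action) directly. This is very similar to ThreadPool.QueueUserWorkItem: the work is scheduled right away.
With Task, after creating the object you still have to call Start to launch the asynchronous delegate, just as a Thread requires a call to Start.
Both Task and TaskFactory accept the TaskCreationOptions enumeration; the commonly used values are PreferFairness, LongRunning (which internally runs the task on a dedicated Thread), and AttachedToParent.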
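A minimal sketch of Task.Factory.StartNew with a creation option follows. It goes slightly beyond the Action-based signatures discussed above by using Task<int>, the generic variant that carries a return value; the class and method names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

static class StartNewSketch
{
    public static int SumOnTask(int upTo)
    {
        // StartNew creates and schedules the task in one call; no Start() needed.
        Task<int> t = Task.Factory.StartNew(() =>
        {
            int sum = 0;
            for (int i = 1; i <= upTo; i++)
                sum += i;
            return sum;
        }, TaskCreationOptions.LongRunning); // hint: this work may run for a long time
        return t.Result; // blocks until the task has finished
    }

    static void Main()
    {
        Console.WriteLine(SumOnTask(10)); // prints 55
    }
}
```

Reading Result waits for completion, so no separate Wait call is required in this small case.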
As mentioned earlier, a powerful feature of Task and TaskFactory is continuation (ContinueWith), as the following example shows:
public void TaskProc()
{
    Console.WriteLine("Runing TaskProc in ID: {0}.", Task.CurrentId);
    Thread.Sleep(1000);
}

public void TaskProcContinued1(Task t)
{
    Console.WriteLine("Runing TaskProcContinued1 in ID: {0} after TaskProc {1}.", Task.CurrentId, t.Status.ToString());
    Thread.Sleep(1000);
}

public void TaskProcContinued2(Task t)
{
    Console.WriteLine("Runing TaskProcContinued2 in ID: {0} after TaskProc {1}.", Task.CurrentId, t.Status.ToString());
    Thread.Sleep(1000);
}

public void TaskProcContinued3(Task t)
{
    Console.WriteLine("Runing TaskProcContinued3 in ID: {0} after TaskProcContinued1 {1}.", Task.CurrentId, t.Status.ToString());
    Thread.Sleep(1000);
}

public void RunContinueTask()
{
    Task t = new Task(TaskProc, TaskCreationOptions.PreferFairness);
    Task t1 = t.ContinueWith(TaskProcContinued1);
    Task t2 = t.ContinueWith(TaskProcContinued2);
    Task t3 = t1.ContinueWith(TaskProcContinued3);
    t.Start();
    Console.ReadLine();
}
Output:
Runing TaskProc in ID: 1.
Runing TaskProcContinued1 in ID: 3 after TaskProc RanToCompletion.
Runing TaskProcContinued2 in ID: 2 after TaskProc RanToCompletion.
Runing TaskProcContinued3 in ID: 4 after TaskProcContinued1 RanToCompletion.
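The book's remark that continuations "can be differentiated whether the task was successful or not" can be sketched with TaskContinuationOptions. This is an illustrative sketch only; ContinuationSketch and its Run method are hypothetical names, and the failure is simulated.

```csharp
using System;
using System.Threading.Tasks;

static class ContinuationSketch
{
    // Runs a task that optionally fails and reports which continuation ran.
    public static string Run(bool fail)
    {
        string outcome = null;
        Task t = new Task(() =>
        {
            if (fail)
                throw new InvalidOperationException("simulated failure");
        });

        Task onSuccess = t.ContinueWith(
            prev => { outcome = "success"; },
            TaskContinuationOptions.OnlyOnRanToCompletion);

        Task onFailure = t.ContinueWith(
            prev => { outcome = "failed: " + prev.Exception.InnerException.Message; },
            TaskContinuationOptions.OnlyOnFaulted);

        t.Start();
        try
        {
            // Exactly one continuation runs; the other one ends up canceled.
            Task.WaitAll(onSuccess, onFailure);
        }
        catch (AggregateException)
        {
            // Expected: waiting on the canceled continuation reports a cancellation.
        }
        return outcome;
    }

    static void Main()
    {
        Console.WriteLine(Run(false));
        Console.WriteLine(Run(true));
    }
}
```

Reading prev.Exception inside the failure continuation also marks the antecedent's exception as observed.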
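Next, the Parallel class
This class mainly provides two families of methods, For and ForEach. As the names suggest, it is meant for cases where the same task needs to be started many times concurrently.
First, a simple Parallel example: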
class ParallelData
{
    public int Data = 0;
    public string Name = "Default";
}

class ParallelTest
{
    public ParallelTest() { }

    public void RunSimpleParallelTest()
    {
        ParallelLoopResult result = Parallel.For(0, 10, (int val) =>
        {
            Thread.Sleep(3000);
            Console.WriteLine(string.Format("Index: {0}, TaskId: {1}, ThreadId: {2}.", val, Task.CurrentId, Thread.CurrentThread.ManagedThreadId));
        });
        Console.WriteLine(result.IsCompleted);
    }
}
The output is:
TaskId: , ThreadId: 1.
Index: 0, TaskId: 1, ThreadId: 1.
Index: 2, TaskId: 2, ThreadId: 3.
Index: 4, TaskId: 3, ThreadId: 4.
Index: 6, TaskId: 4, ThreadId: 5.
Index: 8, TaskId: 5, ThreadId: 6.
Index: 1, TaskId: 6, ThreadId: 7.
Index: 3, TaskId: 7, ThreadId: 8.
Index: 5, TaskId: 1, ThreadId: 1.
Index: 7, TaskId: 8, ThreadId: 3.
Index: 9, TaskId: 9, ThreadId: 4.
True
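A few points can be seen from these results:
1. The indices are executed out of order, but every index in [fromInclusive, toExclusive) is executed exactly once.
2. The main thread blocks in the loop until it completes, just like an ordinary for loop.
3. The input parameter of the third argument to Parallel.For (Action<Int32>) is the current index, so unlike an ordinary for loop you do not declare an iteration variable yourself.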
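The first two points can be checked with a small sketch that counts how often each index is visited. The names are hypothetical; Interlocked.Increment is used so that the counting itself is thread-safe.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ParallelForCoverageSketch
{
    // Returns, for each index in [0, n), how many times the loop body ran for it.
    public static int[] CountVisits(int n)
    {
        int[] visits = new int[n];
        Parallel.For(0, n, i =>
        {
            Interlocked.Increment(ref visits[i]);
        });
        // Parallel.For has returned, so every iteration has already finished.
        return visits;
    }

    static void Main()
    {
        int[] visits = CountVisits(10);
        Console.WriteLine(string.Join(",", visits));
    }
}
```

Whatever order the iterations run in, each slot ends up at exactly 1 by the time the call returns.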
ParallelLoopState can be used to monitor and control the whole loop. Each loop provides one automatically, so you do not create an instance yourself. As the example below shows, simply add a ParallelLoopState pls parameter to the delegate and use it in the body:
public void RunSimpleParallelTest()
{
    DateTime tBgn = DateTime.Now;
    ParallelData d1 = new ParallelData() { Data = 5 };
    ParallelLoopResult result = Parallel.For(0, 20, (int val, ParallelLoopState pls) =>
    {
        Thread.Sleep(2000);
        d1.Data++; // shared across iterations without synchronization
        if (val > 6)
        {
            pls.Break();
            Console.WriteLine(string.Format("Break occurred at {0}, {1}", val, pls.LowestBreakIteration));
        }
        Console.WriteLine(string.Format("Index: {0}, Data: {1}, TaskId: {2}, ThreadId: {3}.", val, d1.Data, Task.CurrentId, Thread.CurrentThread.ManagedThreadId));
    });
    Console.WriteLine(result.IsCompleted);
    Console.WriteLine(string.Format("LowBreakIteration: {0}.", result.LowestBreakIteration));
    DateTime tEnd = DateTime.Now;
    TimeSpan ts = tEnd - tBgn;
    // Note: the value printed is TotalMilliseconds, although the label says "seconds".
    Console.WriteLine("Using {0} seconds.", ts.TotalMilliseconds);
    Console.ReadLine();
}
The intent is to call Break once the index of the current iteration is greater than 6, which interrupts the loop. The output is:
Index: 0, Data: 6, TaskId: 1, ThreadId: 9.
Index: 5, Data: 7, TaskId: 2, ThreadId: 10.
Break occurred at 10, 10
Break occurred at 15, 15
Index: 10, Data: 9, TaskId: 3, ThreadId: 11.
Index: 15, Data: 9, TaskId: 4, ThreadId: 12.
Index: 1, Data: 10, TaskId: 5, ThreadId: 13.
Index: 6, Data: 11, TaskId: 6, ThreadId: 14.
Index: 2, Data: 12, TaskId: 1, ThreadId: 9.
Break occurred at 7, 7
Index: 7, Data: 13, TaskId: 7, ThreadId: 10.
Index: 4, Data: 14, TaskId: 8, ThreadId: 11.
Break occurred at 9, 7
Index: 9, Data: 15, TaskId: 9, ThreadId: 14.
Index: 3, Data: 16, TaskId: 1, ThreadId: 9.
False
LowBreakIteration: 7.
Using 6015.625 seconds.
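From these results, the following conclusions can be drawn:
Break may be used to communicate to the loop that no iterations after the current one need to be run. For example, if Break is called from the 100th iteration of a for loop iterating in parallel from 0 to 1000, all iterations less than 100 should still be run, but the iterations from 101 through 1000 are not necessary.
For long-running iterations that may already be executing, Break sets LowestBreakIteration to the current iteration's index if the current index is less than the current value of LowestBreakIteration.
If Break is replaced with Stop, the output is: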
Break occurred at 10,
Index: 0, Data: 8, TaskId: 1, ThreadId: 1.
Index: 5, Data: 8, TaskId: 2, ThreadId: 3.
Index: 10, Data: 8, TaskId: 3, ThreadId: 4.
False
LowBreakIteration: .
Using 1015.625 seconds.
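Stop may be used to communicate to the loop that no other iterations need to be run. For long-running iterations that may already be executing, Stop causes IsStopped to return true for all other iterations of the loop, so that another iteration can check IsStopped and exit early if it observes true. Stop is typically used in search-based algorithms, where once a result is found no other iterations need to be executed.
These conclusions are copied from MSDN and are not easy to follow, so I will skip over them for now.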
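A small sketch may make the MSDN wording more concrete. It assumes the same condition as the example above (Break or Stop once the index is greater than 6) and records which iterations ran; the class and method names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

static class BreakAndStopSketch
{
    // Break: every iteration below the lowest breaking index is still guaranteed to run.
    public static ParallelLoopResult RunWithBreak(bool[] executed)
    {
        return Parallel.For(0, executed.Length, (i, state) =>
        {
            executed[i] = true; // each iteration writes only its own slot
            if (i >= 7)
                state.Break();
        });
    }

    // Stop: no guarantee about lower indices; the loop just winds down as soon as it can.
    public static ParallelLoopResult RunWithStop(int count)
    {
        return Parallel.For(0, count, (i, state) =>
        {
            if (i >= 7)
                state.Stop();
        });
    }

    static void Main()
    {
        bool[] executed = new bool[20];
        ParallelLoopResult b = RunWithBreak(executed);
        Console.WriteLine("Break: IsCompleted={0}, LowestBreakIteration={1}", b.IsCompleted, b.LowestBreakIteration);

        ParallelLoopResult s = RunWithStop(20);
        Console.WriteLine("Stop: IsCompleted={0}, HasLowestBreakIteration={1}", s.IsCompleted, s.LowestBreakIteration.HasValue);
    }
}
```

With Break, iterations 0 through 6 always run and LowestBreakIteration settles at 7, which matches the earlier output; iterations above 7 may or may not have run. With Stop, LowestBreakIteration stays empty, as the blank value in the Stop output shows.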
Parallel.For has a sibling, Parallel.ForEach, which is very similar. An example first:
public void RunParallelForEach()
{
    List<int> intArr = new List<int>();
    Random rdn = new Random(DateTime.Now.TimeOfDay.Milliseconds);
    for (int i = 0; i < 20; i++)
    {
        intArr.Add(rdn.Next(100));
    }
    Parallel.ForEach<int>(intArr, (val, pls, index) =>
    {
        Console.WriteLine(string.Format("intArr[{0}]: {1}.", index, val));
    });
    Console.WriteLine("Finish");
    Console.ReadLine();
}
The output is:
intArr[0]: 56.
intArr[5]: 87.
intArr[15]: 9.
intArr[16]: 48.
intArr[17]: 27.
intArr[18]: 34.
intArr[19]: 87.
intArr[4]: 9.
intArr[3]: 65.
intArr[11]: 6.
intArr[6]: 44.
intArr[7]: 10.
intArr[13]: 45.
intArr[14]: 77.
intArr[12]: 48.
intArr[10]: 56.
intArr[8]: 72.
intArr[9]: 56.
intArr[1]: 42.
intArr[2]: 16.
Finish
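As with For, the order is arbitrary. In the Action, the first parameter is the element of the collection, the second is the ParallelLoopState used to control the loop, and the third is the iteration number, that is, the index into the collection.
Synchronization in parallel programming
1. Use the lock keyword (note that it only works on reference types such as class instances; an int will not do, since it is a value type). The code is as follows: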
class SynchronousData
{
    public int Data = 0;
    public string Description = "Default";
}

class SynchronousTest
{
    public SynchronousTest()
    {
    }

    private void TaskProc(Object state)
    {
        SynchronousData d = state as SynchronousData;
        if (d != null)
        {
            int i = 0;
            while (i++ < 1000000)
            {
                lock(d)
                {
                    d.Data++;
                }
            }
            Console.WriteLine(string.Format("Finish in task id {0}, thread id {1}.", Task.CurrentId, Thread.CurrentThread.ManagedThreadId));
        }
    }

    public void RunSimpleSynchrounousTestInTask()
    {
        SynchronousData d1 = new SynchronousData() { Data = 5 };
        Task t1 = new Task(TaskProc, d1);
        Task t2 = new Task(TaskProc, d1);
        t1.Start();
        t2.Start();
        t1.Wait();
        t2.Wait();
        Console.WriteLine(d1.Data);
        Console.ReadLine();
    }
}
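Since lock needs a reference type, a common way to protect a value-type field such as an int is to lock on a separate private object. The sketch below is one possible arrangement, not the only one; Counter and LockObjectSketch are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

class Counter
{
    private readonly object _sync = new object(); // dedicated lock object (reference type)
    private int _value;                           // value type: cannot be locked directly

    public void Increment()
    {
        lock (_sync)
        {
            _value++;
        }
    }

    public int Value
    {
        get { lock (_sync) { return _value; } }
    }
}

static class LockObjectSketch
{
    public static int Run(int tasks, int perTask)
    {
        Counter c = new Counter();
        Task[] running = new Task[tasks];
        for (int t = 0; t < tasks; t++)
        {
            running[t] = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < perTask; i++)
                    c.Increment();
            });
        }
        Task.WaitAll(running);
        return c.Value;
    }

    static void Main()
    {
        Console.WriteLine(Run(4, 100000));
    }
}
```

Keeping the lock object private means no outside code can lock on it, which reduces the risk of accidental contention or deadlock.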
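2. Interlocked
Interlocked is faster than the other synchronization methods, but what it can do is limited to its member methods. In the example above, Interlocked.Increment can replace the lock; change
lock(d) { d.Data++; }
to:
Interlocked.Increment(ref d.Data);
3. Monitor
Monitor uses the Enter and Exit methods (much like a Mutex) to acquire a limited resource. It is similar to lock, but its advantage is that TryEnter lets you specify a wait time; if the wait times out, you can go and do something else. The code below demonstrates this: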
bool lockTaken = false;
Monitor.TryEnter(obj, 500, ref lockTaken);
if (lockTaken)
{
    try
    {
        // acquired the lock
        // synchronized region for obj
    }
    finally
    {
        Monitor.Exit(obj);
    }
}
else
{
    // didn't get the lock, do something else
}
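To see both branches of the snippet above in action, the sketch below has one thread hold the lock while another tries to enter with a short timeout. The names TryEnterSketch, Gate, TryWork, and ContendedAttempt are hypothetical, and the 100 ms timeout is an arbitrary choice.

```csharp
using System;
using System.Threading;

static class TryEnterSketch
{
    static readonly object Gate = new object();

    // Returns true if the lock was acquired within timeoutMs, false otherwise.
    public static bool TryWork(int timeoutMs)
    {
        bool lockTaken = false;
        try
        {
            Monitor.TryEnter(Gate, timeoutMs, ref lockTaken);
            if (!lockTaken)
                return false; // didn't get the lock, the caller can do something else
            // synchronized region for Gate would go here
            return true;
        }
        finally
        {
            if (lockTaken)
                Monitor.Exit(Gate);
        }
    }

    // Holds the lock on another thread, then tries to enter from the calling thread.
    public static bool ContendedAttempt()
    {
        using (ManualResetEvent held = new ManualResetEvent(false))
        using (ManualResetEvent release = new ManualResetEvent(false))
        {
            Thread holder = new Thread(() =>
            {
                lock (Gate)
                {
                    held.Set();        // the lock is now taken
                    release.WaitOne(); // keep holding it until told to let go
                }
            });
            holder.Start();
            held.WaitOne();
            bool got = TryWork(100); // times out because the holder owns the lock
            release.Set();
            holder.Join();
            return got;
        }
    }

    static void Main()
    {
        Console.WriteLine("Uncontended: {0}", TryWork(100));
        Console.WriteLine("Contended: {0}", ContendedAttempt());
    }
}
```

Checking lockTaken in the finally block ensures Monitor.Exit is called only when the lock was actually acquired.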