TWO-MINUTE PRIMER ON CONCURRENT PROGRAMMING

Concurrent programming is all about independent computations that the machine can execute in any order. Loop iterations and function calls that can execute autonomously are two instances of computations that can be independent. Whatever concurrent work you can pull out of the serial code can be assigned to threads (or cooperating processes) and run on any of the available cores (or run on a single processor by swapping the computations in and out to give the illusion of parallel execution). Not everything within an application will be independent, so you will still need to deal with serial execution amongst the concurrency.
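
As a concrete illustration of pulling concurrent work out of a loop, here is a minimal Pthreads sketch (the chunking scheme and names such as sum_chunk and NUM_THREADS are my own, not from the original text): each thread independently sums one contiguous chunk of an array, and only the final accumulation of the partial sums remains serial.

    #include <pthread.h>
    #include <stdio.h>

    #define N           1000000
    #define NUM_THREADS 4

    static double a[N];                   /* shared input array */
    static double partial[NUM_THREADS];   /* one result slot per thread */

    static void *sum_chunk(void *arg)
    {
        int id = *(int *)arg;
        int lo = id * (N / NUM_THREADS);
        int hi = (id == NUM_THREADS - 1) ? N : lo + N / NUM_THREADS;
        double s = 0.0;
        for (int i = lo; i < hi; i++)     /* these iterations are independent */
            s += a[i];
        partial[id] = s;                  /* each thread writes only its own slot */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NUM_THREADS];
        int id[NUM_THREADS];

        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        for (int i = 0; i < NUM_THREADS; i++) {
            id[i] = i;
            pthread_create(&tid[i], NULL, sum_chunk, &id[i]);
        }

        double total = 0.0;
        for (int i = 0; i < NUM_THREADS; i++) {   /* the final reduction is serial */
            pthread_join(tid[i], NULL);
            total += partial[i];
        }
        printf("total = %f\n", total);
        return 0;
    }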

To create the situation where concurrent work can be assigned to threads, you will need to add calls to library routines that implement threading. These additional function calls add to the overhead of the concurrent execution, since they were not in the original serial code. Any additional code that is needed to control and coordinate threads, especially calls to threading library functions, is overhead. Code that you add for threads to determine whether the computation should continue, to get more work, or to signal other threads when desired conditions have been met is all considered overhead, too. Some of that code may be devoted to ensuring that equal amounts of work are assigned to each thread. Balancing the workload between threads keeps them from sitting idle; idle threads waste system resources, which is considered another form of overhead. Concurrent code must keep overhead to a minimum. To attain the maximum performance gains and keep your concurrent code as scalable as possible, the amount of work assigned to a thread must be large enough to minimize or mask the detrimental effects of overhead.
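
To get a feel for how expensive those extra library calls are, here is a small timing sketch (my own, assuming a POSIX system and linking with -lpthread): the threads do no useful work at all, so everything measured here is pure overhead.

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    static void *noop(void *arg) { return NULL; }  /* a thread with no useful work */

    int main(void)
    {
        enum { TRIPS = 10000 };
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < TRIPS; i++) {
            pthread_t t;
            pthread_create(&t, NULL, noop, NULL);  /* overhead: not in the serial code */
            pthread_join(t, NULL);                 /* overhead: not in the serial code */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
        printf("average create+join cost: %.0f ns\n", ns / TRIPS);
        return 0;
    }

If the work handed to each thread is not substantially larger than this per-thread cost, threading will slow the program down rather than speed it up.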

Since threads will be working together in shared memory, there may be times when two or more threads need to access the same memory location. If one or more of these threads is looking to update that memory location, you will have a storage conflict or data race. The operating system schedules threads for execution, and because the scheduling algorithm relies on many factors about the current status of the system, that scheduling appears to be asynchronous. Data races may or may not show up, depending on the order of thread execution. If the correct execution of your concurrent code depends on a particular order of memory updates (so that other threads will be sure to get the proper saved value), it is the responsibility of the programmer to ensure this order is guaranteed. There must be some means of controlling the updates of the shared resources.
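
The textbook demonstration of a data race is two threads doing an unsynchronized read-modify-write on a shared counter. In this sketch (my own example), counter++ is really a load, an increment, and a store; when the two threads interleave those steps, one update overwrites the other, so the program usually prints a different wrong total on each run:

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;

    static void *bump(void *arg)
    {
        for (int i = 0; i < 1000000; i++)
            counter++;            /* unsynchronized update: data race */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump, NULL);
        pthread_create(&t2, NULL, bump, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Expected 2000000, but often less -- and a different value
         * each run, because the race may or may not bite. */
        printf("counter = %ld\n", counter);
        return 0;
    }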

There are several different methods of synchronizing threads to ensure mutually exclusive access to shared memory. While synchronization is a necessary evil, the use of synchronization objects is considered overhead (just like thread creation and other coordination functions), and their use should be reserved for situations that cannot be resolved in any other way.
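
A mutex is one common means of ensuring mutually exclusive access. A minimal sketch that fixes the racy counter above (again my own Pthreads code):

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *bump(void *arg)
    {
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);     /* enter the critical section */
            counter++;                     /* only one thread updates at a time */
            pthread_mutex_unlock(&lock);   /* leave the critical section */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump, NULL);
        pthread_create(&t2, NULL, bump, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);   /* now reliably 2000000 */
        return 0;
    }

Note that every lock/unlock pair is itself overhead, which is exactly why the passage says synchronization should be reserved for situations that cannot be resolved any other way.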

The goal of all of this, of course, is to improve the performance of your application by reducing the amount of time it takes to execute, or to be able to process more data within a fixed amount of time. You will need an awareness of the perils and pitfalls of concurrent programming and how to avoid or correct them in order to create a correctly executing application with satisfactory performance.

To sum up, the main point of these paragraphs is that multithreading has a cost: whether threading a serial program actually improves performance depends on whether the various kinds of overhead are small enough. Multithreading should be a coarse-grained solution; that is, the work given to each thread must be "large enough".

The overhead that multithreading introduces (relative to the serial implementation) typically includes:

  • Calls to threading library functions (for thread creation, interaction, destruction, and so on)
  • Code that controls and coordinates the execution of the threads
  • Notification code (telling a thread to continue execution, handing a thread more work, or signaling a thread that a condition it has been waiting on is now satisfied; see the condition-variable sketch after this list)
  • Code that balances the workload across the threads
  • Synchronization overhead
  • ...
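
As a sketch of the notification overhead in the third bullet (a hypothetical Pthreads example of mine), a condition variable lets one thread sleep until another announces that an awaited condition has been satisfied:

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
    static int work_ready = 0;

    static void *consumer(void *arg)
    {
        pthread_mutex_lock(&lock);
        while (!work_ready)                  /* re-check: wakeups can be spurious */
            pthread_cond_wait(&ready, &lock);
        pthread_mutex_unlock(&lock);
        printf("consumer: condition met, continuing\n");
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, consumer, NULL);

        pthread_mutex_lock(&lock);
        work_ready = 1;                      /* the awaited condition */
        pthread_cond_signal(&ready);         /* wake the waiting thread */
        pthread_mutex_unlock(&lock);

        pthread_join(t, NULL);
        return 0;
    }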

[...Some might even spawn accepted new concurrent programming languages. ...However, in the grand scheme of things, threads are here now and will be around for the foreseeable future.]

posted on 2010-09-05 15:24  胡是
