Tasks in OpenMP
I. The task concept
Tasks are composed of:
– code to execute
– data environment
– internal control variables (ICV)
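A parallel program uses one thread to generate tasks in program order. Unless further constraints are imposed, these tasks are placed in a task pool and picked up by idle threads for execution, as shown in the figure above. In other words, the default execution order of tasks is unspecified and must not be assumed to match the order of creation.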
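To make the unspecified ordering concrete, here is a minimal sketch that records the order in which tasks actually start. The function name record_execution_order, the constant N_TASKS, and the shared counter next_slot are assumptions made for illustration. Each task captures its own loop index by value (its data environment) and claims the next free slot atomically.

```c
#define N_TASKS 6

/* Runs N_TASKS tasks; order[k] receives the index of the k-th task to claim a slot.
   Hypothetical helper, not part of any library. */
void record_execution_order(int order[N_TASKS])
{
    int next_slot = 0;

    #pragma omp parallel
    {
        #pragma omp single
        {
            for (int i = 0; i < N_TASKS; i++) {
                /* i is copied into the task when the task is created. */
                #pragma omp task firstprivate(i) shared(next_slot, order)
                {
                    int slot;
                    /* Atomically read the counter and increment it. */
                    #pragma omp atomic capture
                    slot = next_slot++;
                    order[slot] = i;
                }
            }
        }
    }
}
```

With several threads, the contents of order may differ from run to run, but every index from 0 to N_TASKS - 1 appears exactly once. Compiled without OpenMP support, the pragmas are ignored and the tasks simply run in creation order.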
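The task directive is mainly suited to irregular loop iterations (such as do-while loops) and recursive function calls, neither of which can be handled with the for directive.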
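As an illustration of the recursive case, the following sketch computes Fibonacci numbers with one task per recursive call. The function names fib and fib_parallel are hypothetical, and the code is meant to show the pattern rather than to be fast: a real program would stop creating tasks below some cutoff size.

```c
/* Recursive Fibonacci: each call spawns two child tasks. */
static long fib(int n)
{
    if (n < 2)
        return n;

    long a, b;

    /* a and b are shared so the child tasks write into this call's variables. */
    #pragma omp task shared(a) firstprivate(n)
    a = fib(n - 1);

    #pragma omp task shared(b) firstprivate(n)
    b = fib(n - 2);

    /* Wait for both child tasks before reading a and b. */
    #pragma omp taskwait
    return a + b;
}

/* Hypothetical wrapper: opens the parallel region and lets one thread start the recursion. */
long fib_parallel(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib(n);
    }
    return result;
}
```

Calling fib_parallel(10) returns 55. The taskwait is what makes the recursion correct: without it, the function could return before its child tasks have written a and b.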
II. Creating tasks
1. The parallel directive and the single construct
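To keep every thread in the team from creating the same tasks over again, the single construct is needed, as the example below shows.
In general, the single construct is used so that exactly one thread creates the tasks. Once created, the tasks are placed in the task pool, where idle threads in the team pick them up and execute them.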
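The effect of single can be checked by counting how many tasks run with and without it. This is a small sketch under assumed names (N_ITEMS, count_tasks_with_single, count_tasks_without_single); each task just increments a shared counter atomically.

```c
#define N_ITEMS 8

/* Only the thread chosen by single runs the loop, so N_ITEMS tasks are created. */
int count_tasks_with_single(void)
{
    int executed = 0;

    #pragma omp parallel
    {
        #pragma omp single
        {
            for (int i = 0; i < N_ITEMS; i++) {
                #pragma omp task shared(executed)
                {
                    #pragma omp atomic
                    executed++;
                }
            }
        }
    }
    return executed; /* N_ITEMS */
}

/* Without single, every thread in the team runs the loop and creates its own tasks. */
int count_tasks_without_single(void)
{
    int executed = 0;

    #pragma omp parallel
    {
        for (int i = 0; i < N_ITEMS; i++) {
            #pragma omp task shared(executed)
            {
                #pragma omp atomic
                executed++;
            }
        }
    }
    return executed; /* N_ITEMS times the number of threads in the team */
}
```

With a team of four threads, the second function would run 32 tasks instead of 8, which is exactly the duplication that single prevents.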
1.1 How to understand using single inside a parallel region and then creating tasks with task inside single
See the answer at https://stackoverflow.com/questions/68502197/how-do-omp-single-and-omp-task-provide-parallelism
 1  #pragma omp parallel
 2  {
 3      #pragma omp single
 4      {
 5          for (node* p = head; p; p = p->next)
 6          {
 7              #pragma omp task
 8              process(p);
 9          }
10      } // barrier of single construct
11  }
In the code, I have marked a barrier that is introduced at the end of the single construct.
What happens is this:
First, when encountering the parallel construct, the main thread spawns the parallel region and creates a bunch of worker threads. Then you have n threads running and executing the parallel region.
Second, the single construct picks any one of the n threads and executes the code inside the curly braces of the single construct. All other n-1 threads will proceed to the barrier in line 10. There, they will wait for the last thread to catch up and complete the barrier synchronization. While these threads are waiting there, they are not merely idle: they are also waiting for work to arrive.
Third, the thread that was picked by the single construct (the "producer") executes the for loop and for each iteration it creates a new task. This task is then put into a task pool so that another thread (one of the ones in the barrier) can pick it up and execute it. Once the producer is done creating tasks, it will join the barrier and if there are still tasks in the task pool waiting for execution, it will help the other threads execute tasks.
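The thread chosen by single keeps creating tasks in the for loop, one task per iteration. Each task is placed in the task pool (note: the task is not run on the spot by the thread that single selected; it is first put into the task pool). The other n-1 threads then take tasks from the pool and execute them.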
Fourth, once all tasks have been generated and executed that way, all threads are done and the barrier synchronization is complete.
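The four steps above can be tried out with a complete version of the linked-list fragment. This is only a sketch: the node layout, the squaring done in process, and the helpers make_list and sum_and_free are assumptions added for illustration, since the original answer does not define them.

```c
#include <stdlib.h>

typedef struct node {
    int value;
    int result;
    struct node *next;
} node;

/* Hypothetical per-node work: square the value. */
static void process(node *p)
{
    p->result = p->value * p->value;
}

/* Hypothetical helper: build a list holding 1..n. */
node *make_list(int n)
{
    node *head = NULL;
    for (int i = n; i >= 1; i--) {
        node *p = malloc(sizeof *p);
        if (p == NULL)
            abort();
        p->value = i;
        p->result = 0;
        p->next = head;
        head = p;
    }
    return head;
}

void process_list(node *head)
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (node *p = head; p; p = p->next) {
                /* Each task captures the pointer value it was created with. */
                #pragma omp task firstprivate(p)
                process(p);
            }
        } /* implicit barrier of single: all tasks have finished after it */
    }
}

/* Hypothetical helper: add up the results and release the list. */
long sum_and_free(node *head)
{
    long sum = 0;
    while (head) {
        node *next = head->next;
        sum += head->result;
        free(head);
        head = next;
    }
    return sum;
}
```

For a list built with make_list(10), calling process_list and then sum_and_free yields 385, the sum of the squares of 1 through 10. Reading the results right after the parallel region is safe because the barrier guarantees that every task has completed.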
1.2 Example