Parallel programming Open MP-Bell
● CHAPTER 1
● Basic directives
● include a white space between the directive sentinel !$OMP and the following OpenMP directive.
● conditional compilation !$
● parallel region constructor
● !$OMP PARALLEL
● !$OMP END PARALLEL
● Before and after the parallel region, the code is executed by only one thread (serial regions). Jumping into or out of a parallel region with a GOTO statement is not allowed.
● master thread: when a thread executing a serial region encounters a parallel region, it creates a team of threads and becomes the master thread of that team.
● thread number: ranges from zero, for the master thread, up to N_p-1.
● At the beginning of the parallel region it is possible to impose clauses which fix certain aspects of the way in which the parallel region is going to work: for example the scope of variables, the number of threads, special treatments of some variables, etc.
● !$OMP PARALLEL clause 1, clause 2...
● !$OMP END PARALLEL
●
● Only the following clauses are allowed in the !$OMP PARALLEL directive:
● PRIVATE(list)
● SHARED(list)
● DEFAULT(PRIVATE|SHARED|NONE)
● FIRSTPRIVATE(list)
● COPYIN(list)
● REDUCTION(operator:list)
● IF(scalar_logical_expression)
● NUM_THREADS(scalar_integer_expression)
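● Nested parallel regions: if nested parallelism is enabled and every team has N_p threads, the following code prints N_p^2+N_p messages in total (N_p times "HELLO" and N_p^2 times "HI").
● !$OMP PARALLEL
● WRITE(*,*) "HELLO"
● !$OMP PARALLEL
● WRITE(*,*) "HI"
● !$OMP END PARALLEL
● !$OMP END PARALLEL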
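● A minimal sketch of a single parallel region, assuming a compiler with OpenMP support (for example gfortran with the -fopenmp flag). The program name hello_omp and the variables my_id and n_threads are illustrative and do not come from the notes.

```fortran
program hello_omp
  use omp_lib                          ! run-time library interfaces (Chapter 4)
  implicit none
  integer :: my_id, n_threads

  ! Every thread in the team gets its own copy of my_id and n_threads.
  !$OMP PARALLEL PRIVATE(my_id, n_threads)
  my_id = omp_get_thread_num()         ! 0 for the master thread
  n_threads = omp_get_num_threads()    ! N_p, the size of the team
  write(*,*) "HELLO from thread", my_id, "of", n_threads
  !$OMP END PARALLEL
end program hello_omp
```

● The order of the printed lines is not deterministic, since the threads run concurrently.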
●
● CHAPTER 2 OpenMP constructs
● 2.1 Work-sharing constructs
● restrictions
● Work-sharing constructs must be encountered by all threads in a team or by none at all.
● Work-sharing constructs must be encountered in the same order by all threads in a team.
● 2.1.1 !$OMP DO / !$OMP END DO (must be placed inside a parallel region)
● !$OMP DO
● do i =1,1000
● ...
● end do
● !$OMP END DO
●
● How the work is distributed, and more generally how the work-sharing construct behaves, can be controlled with clauses.
● !$OMP DO clause 1, clause 2, ...
● !$OMP END DO end_clause
● only the following clauses are allowed in the !$OMP DO directive
● PRIVATE(list)
● FIRSTPRIVATE(list)
● LASTPRIVATE(list)
● REDUCTION(operator:list)
● SCHEDULE(type, chunk)
● ORDERED
● Add the NOWAIT clause to the closing directive to avoid the implied synchronization.
● With NOWAIT, if the modified variables have to be used after the do-loop, the shared variables must be updated first, either through an implied synchronization or explicitly with the !$OMP FLUSH directive.
● Using !$OMP ORDERED / !$OMP END ORDERED:
● !$OMP DO ORDERED
● do i=1,1000
● !$OMP ORDERED
● A(i)=A(i-1)
● !$OMP END ORDERED
● end do
● !$OMP END DO
● When several nested do-loops are present, it is usually best to parallelize the outermost one, since the amount of work distributed over the different threads is then maximal.
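● As a sketch of the !$OMP DO construct described above, the following program distributes independent iterations over the team and then checks the result serially. The names do_example, a and n are hypothetical; gfortran with -fopenmp is assumed.

```fortran
program do_example
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real :: a(n)

  !$OMP PARALLEL SHARED(a) PRIVATE(i)
  !$OMP DO
  do i = 1, n
     a(i) = 2.0 * real(i)      ! iterations are independent of each other
  end do
  !$OMP END DO                 ! implied synchronization: all threads wait here
  !$OMP END PARALLEL

  ! Serial check: every element must hold the expected value.
  if (any(a /= [(2.0 * real(i), i = 1, n)])) error stop "unexpected result"
  write(*,*) "a(1) =", a(1), " a(n) =", a(n)
end program do_example
```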
● 2.1.2 !$OMP SECTIONS: assigns a completely different task to each thread, leading to a multiple-program, multiple-data (MPMD) execution model. Each section of code is executed once and only once by a thread in the team.
● Syntax: each block of code to be executed by one of the threads starts with an !$OMP SECTION directive and extends until the next !$OMP SECTION directive or until the closing directive !$OMP END SECTIONS.
● !$OMP SECTIONS clause 1, clause 2
● !$OMP SECTION
● ...
● !$OMP SECTION
● ...
● !$OMP END SECTIONS end_clause
● !$OMP SECTIONS accepts the following clauses
● PRIVATE(list)
● FIRSTPRIVATE(list)
● LASTPRIVATE(list)
● REDUCTION(operator:list)
● !$OMP END SECTIONS only accepts the NOWAIT clause.
● Example
● !$OMP SECTIONS
● !$OMP SECTION
● write(*,*) "hello"
● !$OMP SECTION
● write(*,*) "bye"
● !$OMP END SECTIONS
●
● 2.1.3 !$OMP SINGLE / !$OMP END SINGLE: the code enclosed in this directive-pair is executed by only one thread in the team, namely the one that first arrives at the opening directive !$OMP SINGLE.
● All the remaining threads wait at the implied synchronization in the closing directive !$OMP END SINGLE.
● !$OMP SINGLE clause 1, clause 2, ...
● ...
● !$OMP END SINGLE end_clause
● end_clause can be either NOWAIT or COPYPRIVATE, but not both at the same time.
● Only the following two clauses can be used in the opening directive:
● PRIVATE(list)
● FIRSTPRIVATE(list)
●
● 2.1.4 !$OMP WORKSHARE / !$OMP END WORKSHARE: allows the parallelization of parallelizable Fortran 95 statements.
● Parallelizable Fortran 95 statements, such as forall and where statements, contain no explicit do-loops and therefore cannot be handled by the work-sharing directives presented so far.
● The following Fortran 95 transformational array intrinsic functions can also be parallelized with the !$OMP WORKSHARE/!$OMP END WORKSHARE directive-pair: matmul, dot_product, sum, product, maxval, minval, count, any, all, spread, pack, unpack, reshape, transpose, eoshift, cshift, minloc and maxloc.
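● A small sketch of !$OMP WORKSHARE applied to an array assignment, a where statement and the sum intrinsic. The array names a and b, the size n and the variable total are assumptions made for illustration; how much speed-up is obtained depends on the compiler.

```fortran
program workshare_example
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), total

  b = 1.0
  !$OMP PARALLEL SHARED(a, b, total)
  !$OMP WORKSHARE
  a = 2.0 * b                    ! array assignment, split into units of work
  where (a > 1.0) a = a + 1.0    ! where statement
  total = sum(a)                 ! transformational array intrinsic
  !$OMP END WORKSHARE
  !$OMP END PARALLEL

  ! Every element is 3.0, so the sum of n elements is 3.0 * n.
  if (total /= 3.0 * real(n)) error stop "unexpected sum"
  write(*,*) "total =", total
end program workshare_example
```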
● 2.2 Combined parallel work-sharing constructs: shortcuts for specifying a parallel region that contains only one work-sharing construct.
● 2.2.1 !$OMP PARALLEL DO / !$OMP END PARALLEL DO
● !$OMP PARALLEL DO clause 1, clause 2, ...
● ...
● !$OMP END PARALLEL DO
● clause 1, clause 2 can be any clause accepted by !$OMP PARALLEL or by !$OMP DO.
● 2.2.2 !$OMP PARALLEL SECTIONS / !$OMP END PARALLEL SECTIONS: specifies a parallel region that contains only a single !$OMP SECTIONS/!$OMP END SECTIONS directive-pair.
● !$OMP PARALLEL SECTIONS clause 1, clause 2, ...
● !$OMP END PARALLEL SECTIONS
● clause 1, clause 2 can be any clause accepted by !$OMP PARALLEL or by !$OMP SECTIONS.
● 2.3 Synchronization constructs
● 2.3.1 !$OMP MASTER / !$OMP END MASTER: the code enclosed inside this directive-pair is executed only by the master thread of the team. Meanwhile, all the other threads continue with their work: no implied synchronization exists!
● !$OMP MASTER
● ...
● !$OMP END MASTER
● In essence, this directive-pair is similar to the !$OMP SINGLE/!$OMP END SINGLE directive-pair used together with the NOWAIT clause, except that the code is always executed by the master thread rather than by the first thread to arrive.
● 2.3.2 !$OMP CRITICAL / !$OMP END CRITICAL: this directive-pair restricts access to the enclosed code to only one thread at a time.
● !$OMP CRITICAL (name)
● ...
● !$OMP END CRITICAL (name)
● The optional name argument, written in parentheses, identifies the critical section. It is strongly recommended to give a name to each critical section.
● When a thread reaches the beginning of a critical section, it waits there until no other thread is executing the code in the critical section. Different critical sections using the same name are treated as one common critical section, which means that only one thread at a time is inside them.
● All unnamed critical sections are considered as one common critical section.
● !$OMP CRITICAL (write_file)
● ...
● !$OMP END CRITICAL (write_file)
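● A minimal sketch of a named critical section protecting a shared counter. The section name update_counter and the variable counter are hypothetical.

```fortran
program critical_example
  implicit none
  integer :: counter

  counter = 0
  !$OMP PARALLEL SHARED(counter)
  !$OMP CRITICAL (update_counter)
  counter = counter + 1          ! only one thread at a time executes this line
  !$OMP END CRITICAL (update_counter)
  !$OMP END PARALLEL

  ! Each thread incremented the counter exactly once.
  write(*,*) "number of threads that passed:", counter
end program critical_example
```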
● 2.3.3 !$OMP BARRIER: this directive represents an explicit synchronization between the different threads in the team. When encountered, each thread waits until all the other threads have reached this point.
● The !$OMP BARRIER directive must be encountered by all threads in a team or by none at all.
● To avoid deadlock, a barrier must not be placed inside constructs that only some of the threads execute or that admit one thread at a time, as in the following invalid examples:
● !$OMP CRITICAL
● !$OMP BARRIER
● !$OMP END CRITICAL
● !$OMP SINGLE
● !$OMP BARRIER
● !$OMP END SINGLE
● !$OMP MASTER
● !$OMP BARRIER
● !$OMP END MASTER
● !$OMP SECTIONS
● !$OMP SECTION
● !$OMP BARRIER
● !$OMP SECTION
● !$OMP END SECTIONS
● 2.3.4 !$OMP ATOMIC: when a variable can be modified by all threads in a team, it is necessary to ensure that only one thread at a time writes/updates its memory location. This directive ensures that a specific memory location is updated atomically, rather than exposing it to multiple, simultaneously writing threads.
● Only statements of the following forms can be used together with the !$OMP ATOMIC directive:
● x = x operator expr, or x = intrinsic_procedure(x, expr_list), where operator is one of +, *, -, /, .AND., .OR., .EQV., .NEQV. and intrinsic_procedure is one of MAX, MIN, IAND, IOR, IEOR.
● The variable x, affected by the !$OMP ATOMIC directive, must be a scalar of intrinsic type.
●
● The !$OMP ATOMIC directive only affects the immediately following statement.
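● A sketch of an atomic update of a shared counter, using the form x = x operator expr. The names atomic_example, hits and n are illustrative assumptions.

```fortran
program atomic_example
  implicit none
  integer, parameter :: n = 10000
  integer :: i, hits

  hits = 0
  !$OMP PARALLEL DO SHARED(hits) PRIVATE(i)
  do i = 1, n
     !$OMP ATOMIC
     hits = hits + 1             ! only this statement is protected
  end do
  !$OMP END PARALLEL DO

  ! Without ATOMIC, concurrent updates could be lost.
  if (hits /= n) error stop "lost update"
  write(*,*) "hits =", hits
end program atomic_example
```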
● 2.3.5 !$OMP FLUSH: this directive must appear at the precise point in the code at which the data synchronization is required. It ensures that all variables are updated.
● The !$OMP FLUSH directive can optionally include a list with the names of the variables to be flushed:
● !$OMP FLUSH (variable 1, variable 2,...)
● Directives with (explicit or implied) data synchronization:
●
● Directives without explicit or implied data synchronization. Implied data synchronization is switched off by the NOWAIT clause.
●
● 2.3.6 !$OMP ORDERED / !$OMP END ORDERED
● No thread can enter the ORDERED section until it is guaranteed that all previous iterations have been completed.
●
● The order of entrance is specified by the sequence condition of the loop iterations.
● No implied synchronization exists at the closing directive !$OMP END ORDERED.
● Each iteration of a parallelized do-loop may execute at most one ORDERED section.
● 2.4 Data environment constructs
● There are two kinds of data environment constructs:
● those which are independent of other OpenMP constructs;
● those which are associated with an OpenMP construct and which affect only that construct and its lexical extent (data scope attribute clauses).
● 2.4.1 !$OMP THREADPRIVATE(list): makes a variable private to each thread but accessible from everywhere inside that thread; its value does not change from one parallel region to the next.
● e.g. my_id
● The !$OMP THREADPRIVATE directive needs to be placed just after the declarations of the variables and before the main part of the program unit.
●
● THREADPRIVATE variables can only appear in the COPYIN and COPYPRIVATE clauses.
● application
● CHAPTER 3 PRIVATE SHARED & Co
● 3.1 Data scope attribute clauses
● 3.1.1 PRIVATE(list): very resource-intensive, since every thread gets its own copy of each listed variable
● !$OMP PARALLEL PRIVATE(a,b)
● Variables used as counters for do-loops, forall commands or implicit do-loops, or specified as THREADPRIVATE, automatically become private to each thread, even though they are not explicitly included in a PRIVATE clause at the beginning of the scope of the directive-pair.
● Variables declared as private have an undefined value at the beginning of the scope of the directive-pair, since they have just been created. When the scope of the directive-pair ends, the original variable also has an undefined value (which of the available per-thread values should it take?).
● 3.1.2 SHARED(list)
● !$OMP PARALLEL SHARED(c,d)
● c and d are seen by all the threads inside the scope of the directive-pair.
● Sharing does not consume any additional resources.
● It does not guarantee that the threads are immediately aware of changes made to the variable by another thread;
● force the update of the shared variables by using the directive !$OMP FLUSH.
● Race conditions are not prevented automatically: the programmer must avoid them, for example with !$OMP ATOMIC.
● 3.1.3 DEFAULT(PRIVATE | SHARED | NONE)
● When most of the variables used inside the scope of a directive-pair are going to be private/shared, it is possible to specify a default setting.
● If no DEFAULT clause is specified, the default behavior is the same as if DEFAULT(SHARED) were specified
● !$OMP PARALLEL DEFAULT(PRIVATE) SHARED(a)
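● NONE: with DEFAULT(NONE), every variable used inside the scope of the parallel directive must have its attribute declared explicitly at the opening directive. Exceptions: counters of do-loops, forall commands and implicit do-loops, and variables with the THREADPRIVATE attribute.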
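● 3.1.4 FIRSTPRIVATE(list): for private variables that need an initial value
● Variables with the PRIVATE attribute have an undefined value at the beginning of the scope of the directive-pair.
● !$OMP PARALLEL PRIVATE(a) FIRSTPRIVATE(b)
● a is private, so its initial value is undefined on entering the parallel region; b, in contrast, starts with the value it had in the serial region before the parallel region.
● Very resource-intensive: the value has to be transferred from the serial region to every thread, so N times the data are moved, where N is the number of threads.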
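● The difference between PRIVATE and FIRSTPRIVATE can be sketched as follows. The program name and the initial values 1 and 2 are arbitrary choices for illustration.

```fortran
program firstprivate_example
  use omp_lib
  implicit none
  integer :: a, b

  a = 1
  b = 2
  !$OMP PARALLEL PRIVATE(a) FIRSTPRIVATE(b)
  a = omp_get_thread_num()     ! a is undefined on entry, so assign before use
  b = b + a                    ! b starts at 2 in every thread
  write(*,*) "thread", a, "has b =", b
  !$OMP END PARALLEL
end program firstprivate_example
```

● Each thread prints b equal to 2 plus its own thread number; reading a before assigning it inside the region would give an undefined value.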
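● 3.1.5 LASTPRIVATE(list)
● With the LASTPRIVATE attribute, the variable holds after the construct the value it had when the parallelized statement finished, i.e. the value from the sequentially last iteration of a do-loop or from the last section.
● When the construct finishes, the value has to be synchronized between the threads, which requires an explicit or implied synchronization.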
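● A sketch of LASTPRIVATE on a do-loop. The variable last_sq and the loop bound 100 are hypothetical.

```fortran
program lastprivate_example
  implicit none
  integer :: i, last_sq

  !$OMP PARALLEL
  !$OMP DO LASTPRIVATE(last_sq)
  do i = 1, 100
     last_sq = i * i           ! each thread writes to its private copy
  end do
  !$OMP END DO
  !$OMP END PARALLEL

  ! last_sq now holds the value from the sequentially last iteration (i = 100).
  if (last_sq /= 10000) error stop "unexpected lastprivate value"
  write(*,*) "last_sq =", last_sq
end program lastprivate_example
```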
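● 3.1.6 COPYIN(list)
● Variables with the THREADPRIVATE attribute can be set to the value held by the master thread by means of the COPYIN clause.
● Resource-intensive: the value in the master thread has to be transferred to every thread.
● Example: a initially holds the id of each thread; after a parallel region is opened with COPYIN(a), a equals 0 in every thread.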
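● The COPYIN example described above can be sketched like this. The module name tp_data is an assumption; the call to omp_set_dynamic(.false.) is added so that the team size stays the same between the two parallel regions.

```fortran
module tp_data
  implicit none
  integer, save :: a
  !$OMP THREADPRIVATE(a)       ! placed right after the declaration of a
end module tp_data

program copyin_example
  use omp_lib
  use tp_data
  implicit none

  call omp_set_dynamic(.false.)

  !$OMP PARALLEL
  a = omp_get_thread_num()     ! each thread stores its own id in its copy of a
  !$OMP END PARALLEL

  a = 0                        ! serial region: this is the master thread's copy
  !$OMP PARALLEL COPYIN(a)
  ! COPYIN has overwritten every thread's copy with the master's value.
  write(*,*) "thread", omp_get_thread_num(), "sees a =", a
  !$OMP END PARALLEL
end program copyin_example
```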
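● 3.1.7 COPYPRIVATE(list)
● Can only be used on the closing directive !$OMP END SINGLE.
● After the !$OMP SINGLE/!$OMP END SINGLE block has executed, it broadcasts the listed private variables from the executing thread to every other thread in the team.
● On !$OMP END SINGLE, the NOWAIT and COPYPRIVATE(list) clauses cannot be used together.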
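● A sketch of broadcasting a private variable with COPYPRIVATE. The variable x and the value 42 are arbitrary illustrative choices.

```fortran
program copyprivate_example
  use omp_lib
  implicit none
  integer :: x

  !$OMP PARALLEL PRIVATE(x)
  !$OMP SINGLE
  x = 42                       ! set by whichever thread arrives first
  !$OMP END SINGLE COPYPRIVATE(x)
  ! After the closing directive every thread's private x holds the same value.
  write(*,*) "thread", omp_get_thread_num(), "x =", x
  !$OMP END PARALLEL
end program copyprivate_example
```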
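● 3.1.8 REDUCTION(operator:list)
● Ensures that updates of a SHARED variable by several threads do not interfere with each other; typically each thread accumulates into a private copy and the copies are combined at the end.
● Only applicable to statements of the following forms:
● x = x operator expr
● x = intrinsic_procedure (x, expr_list), where the variable x must be a scalar of intrinsic type
● Allowed operator and intrinsic_procedure values:
● operator: +, *, -, .AND., .OR., .EQV., .NEQV.; intrinsic_procedure: MAX, MIN, IAND, IOR, IEOR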
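● A sketch of a sum reduction over a do-loop, checked against the closed-form result. The names reduction_example, total and n are hypothetical.

```fortran
program reduction_example
  implicit none
  integer, parameter :: n = 1000
  integer :: i, total

  total = 0
  !$OMP PARALLEL DO REDUCTION(+:total)
  do i = 1, n
     total = total + i         ! form: x = x operator expr
  end do
  !$OMP END PARALLEL DO

  ! The partial sums of all threads are combined into total.
  if (total /= n * (n + 1) / 2) error stop "unexpected sum"
  write(*,*) "total =", total
end program reduction_example
```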
● 3.2 Other clauses
● 3.2.1 IF(scalar_logical_expression)
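● Opens the parallel region only under a given condition (in some cases, opening a parallel region costs more time than running the code serially).
● 3.2.2 NUM_THREADS(scalar_integer_expression)
● Uses the specified number of threads.
● !$OMP PARALLEL NUM_THREADS(4) (this parallel region uses 4 threads)
● 3.2.3 NOWAIT
● Avoids the implied synchronization at the closing directive.
● Using NOWAIT also switches off the implied data synchronization.
● 3.2.4 SCHEDULE(type, chunk): chunk is optional
● Specifies how the iterations of a do-loop are distributed over the threads (not necessarily evenly).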
● !$OMP DO SCHEDULE(type,chunk)
● four different options for scheduling :
● STATIC
● !$OMP DO SCHEDULE(STATIC,chunk)
●
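● For the example in the figure, assuming three threads in total: without chunk, the default is chunk=200, i.e. the iteration space is split into equal pieces, one per thread; for different values of chunk the iterations are distributed as follows: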
●
● DYNAMIC
● !$OMP DO SCHEDULE(DYNAMIC,chunk)
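● The iteration space is divided into pieces of size chunk; when a thread finishes one piece, it automatically takes the next one. If chunk is not given, it defaults to 1.
● Compared with STATIC, it balances the load better when iterations take unequal time, but handing out the pieces adds overhead, and the smaller the pieces, the larger this cost.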
● GUIDED
● !$OMP DO SCHEDULE(GUIDED,chunk)
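● Similar to DYNAMIC: a thread still takes the next piece when it finishes one, but the piece size decreases (exponentially), so the threads work on smaller and smaller pieces.
● chunk specifies the minimum number of iterations in a piece. Because of the exponential partitioning the pieces are not equal in size; once the size has shrunk to chunk, the remaining pieces are equal.
● Example:
● RUNTIME
● !$OMP DO SCHEDULE(RUNTIME)
● The first three options fix the scheduling in the source code at compile time; RUNTIME allows it to be changed when the program runs, through the OMP_SCHEDULE environment variable.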
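● To see how the scheduling options distribute iterations, the following sketch records which thread executed each iteration under STATIC and DYNAMIC scheduling with chunk=2. The array names owner_static and owner_dynamic and the loop length 12 are assumptions for illustration.

```fortran
program schedule_example
  use omp_lib
  implicit none
  integer, parameter :: n = 12
  integer :: i
  integer :: owner_static(n), owner_dynamic(n)

  !$OMP PARALLEL PRIVATE(i) SHARED(owner_static, owner_dynamic)
  !$OMP DO SCHEDULE(STATIC, 2)
  do i = 1, n
     owner_static(i) = omp_get_thread_num()    ! pieces of 2, dealt round-robin
  end do
  !$OMP END DO

  !$OMP DO SCHEDULE(DYNAMIC, 2)
  do i = 1, n
     owner_dynamic(i) = omp_get_thread_num()   ! pieces of 2, first come first served
  end do
  !$OMP END DO
  !$OMP END PARALLEL

  write(*,'(a,12i3)') "STATIC,2  owner per iteration:", owner_static
  write(*,'(a,12i3)') "DYNAMIC,2 owner per iteration:", owner_dynamic
end program schedule_example
```

● The STATIC line is reproducible for a fixed number of threads, whereas the DYNAMIC line may change from run to run.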
● 3.2.5 ORDERED
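● Used when part of a do-loop has to be executed in sequential order.
● The ORDERED clause must be added to the opening directive of the do-loop.
●
● CHAPTER 4 The OpenMP run-time library: a set of external procedures, packaged in omp_lib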
● 4.1 Execution environment routines
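● 4.1.1 OMP_set_num_threads: sets the number of threads to use in parallel regions
● call OMP_set_num_threads(number_of_threads)
● Can only be called from outside a parallel region.
● Takes precedence over the OMP_NUM_THREADS environment variable.
● 4.1.2 OMP_get_num_threads: returns the number of threads currently in use
● integer::a
● a= OMP_get_num_threads()
● Returns the size of the team executing the enclosing parallel region, including serialized and nested parallel regions; when called from a serial region it returns 1.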
● 4.1.3 OMP_get_max_threads
● integer::a
● a=OMP_get_max_threads()
● Can be called from parallel or serial regions.
● Returns the maximum number of threads that the program can use.
● 4.1.4 OMP_get_thread_num
● integer::a
● a=OMP_get_thread_num()
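● Returns the identification number of the current thread.
● 4.1.5 OMP_get_num_procs
● integer::a
● a=OMP_get_num_procs()
● Returns the number of processors available to the program.
● 4.1.6 OMP_in_parallel: reports whether the calling code is running in parallel; returns .TRUE. if at least one enclosing parallel region is being executed in parallel, and .FALSE. otherwise.
● logical::a
● a=OMP_in_parallel()
● 4.1.7 OMP_set_dynamic: if set to .TRUE., the number of threads in parallel regions can be adjusted automatically by the run-time environment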
● call OMP_set_dynamic(.TRUE.)
● call OMP_set_dynamic(.FALSE.)
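● 4.1.8 OMP_get_dynamic: checks whether dynamic thread adjustment is enabled; returns .TRUE. if it is, and .FALSE. otherwise.
● logical::a
● a=OMP_get_dynamic()
● 4.1.9 OMP_set_nested: sets whether nested parallelism is allowed. The default is .FALSE., which means that nested parallel regions are executed serially by default. Takes precedence over the OMP_NESTED environment variable.
● call OMP_set_nested(.TRUE.)
● call OMP_set_nested(.FALSE.)
● 4.1.10 OMP_get_nested: returns the logical value indicating whether nested parallelism is allowed.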
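● A sketch that exercises several of the execution environment routines listed above. The requested team size of 2 is an arbitrary choice, and the run-time environment may provide fewer threads.

```fortran
program runtime_example
  use omp_lib
  implicit none

  call omp_set_num_threads(2)          ! called from the serial region
  write(*,*) "max threads        :", omp_get_max_threads()
  write(*,*) "processors         :", omp_get_num_procs()
  write(*,*) "dynamic adjustment :", omp_get_dynamic()
  write(*,*) "in parallel (serial region)  :", omp_in_parallel()

  !$OMP PARALLEL
  !$OMP MASTER
  write(*,*) "team size                    :", omp_get_num_threads()
  write(*,*) "in parallel (parallel region):", omp_in_parallel()
  !$OMP END MASTER
  !$OMP END PARALLEL
end program runtime_example
```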
● 4.2 Lock routines
● 4.2.1 OMP_init_lock and OMP_init_nest_lock
● 4.2.2 OMP_set_lock and OMP_set_nest_lock
● 4.2.3 OMP_unset_lock and OMP_unset_nest_lock
● 4.2.4 OMP_test_lock and OMP_test_nest_lock
● 4.2.5 OMP_destroy_lock and OMP_destroy_nest_lock
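● The simple lock routines listed above are typically used in the order init, set, unset, destroy. The following is a minimal sketch under that assumption; the variable names lck and counter are hypothetical.

```fortran
program lock_example
  use omp_lib
  implicit none
  integer(kind=omp_lock_kind) :: lck   ! lock variable of the kind defined in omp_lib
  integer :: counter

  counter = 0
  call omp_init_lock(lck)              ! a lock must be initialized before use
  !$OMP PARALLEL SHARED(lck, counter)
  call omp_set_lock(lck)               ! blocks until the lock is available
  counter = counter + 1                ! protected update
  call omp_unset_lock(lck)             ! release so that another thread can enter
  !$OMP END PARALLEL
  call omp_destroy_lock(lck)

  write(*,*) "counter =", counter
end program lock_example
```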
● 4.3 Timing routines
● 4.3.1 OMP_get_wtime
● 4.3.2 OMP_get_wtick
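● A sketch of timing a parallel loop with OMP_get_wtime and reporting the timer resolution with OMP_get_wtick. The loop body, a partial harmonic sum, is only a placeholder workload chosen for illustration.

```fortran
program timing_example
  use omp_lib
  implicit none
  integer, parameter :: n = 1000000
  integer :: i
  double precision :: t_start, t_end, s

  s = 0.0d0
  t_start = omp_get_wtime()            ! wall-clock time in seconds
  !$OMP PARALLEL DO REDUCTION(+:s)
  do i = 1, n
     s = s + 1.0d0 / dble(i)
  end do
  !$OMP END PARALLEL DO
  t_end = omp_get_wtime()

  write(*,*) "elapsed seconds :", t_end - t_start
  write(*,*) "timer resolution:", omp_get_wtick()
  write(*,*) "sum =", s
end program timing_example
```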
● 4.4 The Fortran 90 module omp_lib