Compute Shader

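Dispatch: sets how many thread groups are launched, which bounds SV_GroupID (uint3).

numthreads: sets how many threads each thread group contains, which bounds SV_GroupThreadID (uint3).

SV_GroupIndex: the index of a thread within its thread group, i.e. SV_GroupThreadID flattened to one dimension: SV_GroupThreadID.z * numthreads.x * numthreads.y + SV_GroupThreadID.y * numthreads.x + SV_GroupThreadID.x

SV_DispatchThreadID: SV_GroupID * numthreads + SV_GroupThreadID (per component); typically used to index global data.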
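The two formulas above can be checked on the CPU. The following is a minimal C++ sketch, not GPU code: `UInt3`, `group_index` and `dispatch_thread_id` are hypothetical helper names made up for this illustration, and the sample values (group (1,2,0), local thread (3,2,0), numthreads(32,32,1)) are arbitrary.

```cpp
#include <cstdint>

// Plain stand-in for HLSL's uint3 (hypothetical helper type for this sketch).
struct UInt3 {
    std::uint32_t x, y, z;
};

// Mirrors SV_GroupIndex: flattens the local thread ID within one group.
constexpr std::uint32_t group_index(UInt3 gtid, UInt3 numthreads) {
    return gtid.z * numthreads.x * numthreads.y
         + gtid.y * numthreads.x
         + gtid.x;
}

// Mirrors SV_DispatchThreadID: the thread ID across the whole dispatch.
constexpr UInt3 dispatch_thread_id(UInt3 gid, UInt3 gtid, UInt3 numthreads) {
    return UInt3{gid.x * numthreads.x + gtid.x,
                 gid.y * numthreads.y + gtid.y,
                 gid.z * numthreads.z + gtid.z};
}

// Local thread (3,2,0) with numthreads(32,32,1): 0*32*32 + 2*32 + 3 = 67.
static_assert(group_index(UInt3{3, 2, 0}, UInt3{32, 32, 1}) == 67,
              "flattened local index");

// The same thread in group (1,2,0): x = 1*32 + 3 = 35, y = 2*32 + 2 = 66.
static_assert(dispatch_thread_id(UInt3{1, 2, 0}, UInt3{3, 2, 0}, UInt3{32, 32, 1}).x == 35,
              "global x");
static_assert(dispatch_thread_id(UInt3{1, 2, 0}, UInt3{3, 2, 0}, UInt3{32, 32, 1}).y == 66,
              "global y");
```

Because the helpers are `constexpr`, the checks run at compile time; in a real shader the GPU supplies these values through the system-value semantics.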

A compute shader provides high-speed general purpose computing and takes advantage of the large numbers of parallel processors on the graphics processing unit (GPU). The compute shader provides memory sharing and thread synchronization features to allow more effective parallel programming methods. You call the ID3D11DeviceContext::Dispatch or ID3D11DeviceContext::DispatchIndirect method to execute commands in a compute shader. A compute shader can run on many threads in parallel.

void Dispatch( [in] UINT ThreadGroupCountX, [in] UINT ThreadGroupCountY, [in] UINT ThreadGroupCountZ ); // x, y, z: the number of thread groups in each direction; the maximum per direction is 65535

Dispatch(5, 5, 2); // launches 5 * 5 * 2 = 50 thread groups

In the following illustration, assume a thread group with 50 threads where the size of the group is given by (5,5,2). A single thread is identified from a thread group with 50 threads in it, using the vector (4,1,1).

Illustration of a single thread within a thread group of 50 threads

 

numthreads(X, Y, Z)

The X, Y and Z values indicate the size of the thread group in a particular direction, and the product X*Y*Z gives the number of threads in the group. The ability to specify the size of the thread group across three dimensions allows individual threads to be accessed in a manner that maps logically onto 2D and 3D data structures.

Compute Shader   Maximum Z   Maximum Threads (X*Y*Z)
cs_4_x           1           768
cs_5_0           64          1024

 

// numthreads defines the number of threads to be executed in a single thread group when a compute shader is dispatched.
// [numthreads(x, y, z)] gives x*y*z threads per group, here 32*32*1 = 1024.
[numthreads(32, 32, 1)]
void MainComputeShader(uint3 Gid : SV_GroupID, // ID of the current group, set by Dispatch on the C++ side; e.g. 0..7 in X and Y for Dispatch(8, 8, 1)
uint3 DTid : SV_DispatchThreadID, // "global" thread ID = Gid * numthreads + GTid; e.g. 0..255 in X and Y for Dispatch(8, 8, 1)
uint3 GTid : SV_GroupThreadID, // "local" thread ID within the group: 0..31 in X and Y, 0 in Z
uint GI : SV_GroupIndex) // "flattened" index of the thread within its group: 0..1023
{
}

 

The following illustration shows the relationship between the parameters passed to ID3D11DeviceContext::Dispatch, Dispatch(5,3,2), the values specified in the numthreads attribute, numthreads(10,8,3), and the values that will be passed to the compute shader for the thread-related system values (SV_GroupIndex, SV_DispatchThreadID, SV_GroupThreadID, SV_GroupID).

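SV_GroupID: uint3 --> Dispatch(x, y, z), range: ([0, x-1], [0, y-1], [0, z-1]). The ID of the group, running from 0 to the maximum in each direction.

SV_GroupThreadID: uint3 --> numthreads(x, y, z), range: ([0, x-1], [0, y-1], [0, z-1]). The ID of a thread within its group.

SV_GroupIndex (uint) = SV_GroupThreadID.z*dimx*dimy + SV_GroupThreadID.y*dimx + SV_GroupThreadID.x, where dimx and dimy are the X and Y values of numthreads.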

SV_DispatchThreadID is the sum of SV_GroupID * numthreads and SV_GroupThreadID. It varies across the range specified in Dispatch and numthreads. For example, if Dispatch(2,2,2) is called on a compute shader with numthreads(3,3,3), SV_DispatchThreadID will have a range of 0..5 for each dimension.
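The 0..5 range in that example can be reproduced with a little arithmetic. This is a minimal C++ sketch for one dimension only; the function names are hypothetical, and it assumes at least one group and at least one thread per group.

```cpp
#include <cstdint>

// One component of SV_DispatchThreadID for a given group and local thread.
constexpr std::uint32_t dispatch_thread_id_1d(std::uint32_t group_id,
                                              std::uint32_t group_thread_id,
                                              std::uint32_t numthreads) {
    return group_id * numthreads + group_thread_id;
}

// Largest value along one dimension: the last thread of the last group.
// Assumes group_count >= 1 and numthreads >= 1.
constexpr std::uint32_t max_dispatch_thread_id_1d(std::uint32_t group_count,
                                                  std::uint32_t numthreads) {
    return dispatch_thread_id_1d(group_count - 1, numthreads - 1, numthreads);
}

// Dispatch(2,2,2) with numthreads(3,3,3): each dimension runs from 0 to 5.
static_assert(dispatch_thread_id_1d(0, 0, 3) == 0, "first thread of the first group");
static_assert(max_dispatch_thread_id_1d(2, 3) == 5, "last thread of the last group");
```

The same reasoning gives 0..49 in X for Dispatch(5,3,2) with numthreads(10,8,3), since 4 * 10 + 9 = 49.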

illustration of the relationship between dispatch, thread groups, and threads

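(256, 256, 1) = (8, 8, 1) * (32, 32, 1): in each dimension, the total number of threads equals the number of groups multiplied by the number of threads per group.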
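Going the other way, the group count for Dispatch can be derived from the amount of work and the numthreads value. The sketch below uses rounding-up integer division, a common idiom rather than anything this post prescribes; `group_count` is a hypothetical helper name. When the work size is not a multiple of the group size, the last group contains extra threads, so the shader would need its own bounds check.

```cpp
#include <cstdint>

// Number of thread groups needed along one dimension so that
// group_count * threads_per_group >= total_threads (rounds up).
// Assumes threads_per_group >= 1.
constexpr std::uint32_t group_count(std::uint32_t total_threads,
                                    std::uint32_t threads_per_group) {
    return (total_threads + threads_per_group - 1) / threads_per_group;
}

// 256 threads with 32 per group: exactly 8 groups, matching (8, 8, 1) above.
static_assert(group_count(256, 32) == 8, "exact multiple");

// 257 threads need a ninth, mostly idle group.
static_assert(group_count(257, 32) == 9, "rounds up");
```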

 

  • The maximum number of threads is limited to D3D11_CS_THREAD_GROUP_MAX_THREADS_PER_GROUP (1024) per group.
  • The X and Y dimension of numthreads is limited to D3D11_CS_THREAD_GROUP_MAX_X (1024) and D3D11_CS_THREAD_GROUP_MAX_Y (1024).
  • The Z dimension of numthreads is limited to D3D11_CS_THREAD_GROUP_MAX_Z (64).
  • The maximum dimension of dispatch is limited to D3D11_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION (65535).
  • The maximum number of unordered-access views that can be bound to a shader is D3D11_PS_CS_UAV_REGISTER_COUNT (8).
  • Supports RWStructuredBuffers, RWByteAddressBuffers, and typed unordered-access views (RWTexture1D, RWTexture2D, RWTexture3D, and so on).
  • Atomic instructions are available.
  • Double-precision support might be available. For information about how to determine whether double-precision is available, see D3D11_FEATURE_DOUBLES.
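The numeric limits in this list can be turned into a quick sanity check before choosing numthreads and Dispatch arguments. This is a minimal C++ sketch: the `k...` constants are local copies of the values quoted above (the real macros are defined in d3d11.h), and both function names are hypothetical.

```cpp
#include <cstdint>

// Local copies of the limits listed above (assumed to match d3d11.h).
constexpr std::uint32_t kMaxThreadsPerGroup = 1024;   // D3D11_CS_THREAD_GROUP_MAX_THREADS_PER_GROUP
constexpr std::uint32_t kMaxGroupX = 1024;            // D3D11_CS_THREAD_GROUP_MAX_X
constexpr std::uint32_t kMaxGroupY = 1024;            // D3D11_CS_THREAD_GROUP_MAX_Y
constexpr std::uint32_t kMaxGroupZ = 64;              // D3D11_CS_THREAD_GROUP_MAX_Z
constexpr std::uint32_t kMaxGroupsPerDimension = 65535; // D3D11_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION

// True if numthreads(x, y, z) respects the per-dimension and total limits.
constexpr bool numthreads_is_valid(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return x >= 1 && y >= 1 && z >= 1
        && x <= kMaxGroupX && y <= kMaxGroupY && z <= kMaxGroupZ
        && static_cast<std::uint64_t>(x) * y * z <= kMaxThreadsPerGroup;
}

// True if Dispatch(x, y, z) stays within the per-dimension group limit.
constexpr bool dispatch_is_valid(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return x <= kMaxGroupsPerDimension
        && y <= kMaxGroupsPerDimension
        && z <= kMaxGroupsPerDimension;
}

static_assert(numthreads_is_valid(32, 32, 1), "32*32*1 = 1024 is allowed");
static_assert(!numthreads_is_valid(32, 32, 2), "32*32*2 = 2048 exceeds 1024");
static_assert(dispatch_is_valid(5, 3, 2), "well below 65535 per dimension");
```

Note that each dimension can be within its own limit while the product still exceeds 1024, which is why the total is checked separately.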
posted @ 2022-02-15 17:57  引擎之旅