Unity GPGPU Tutorial Translation (Part 1)

Original: https://scrawkblog.com/category/directcompute
Today I will go over the core concepts for writing compute shaders in Unity.
At the heart of a compute shader is the kernel. This is the entry point into the shader and acts like the Main function in other programming languages. I will also cover the tiling of threads by the GPU. These tiles are also known as blocks or thread groups; DirectCompute officially calls them thread groups.

To create a compute shader in Unity, go to the Project panel, click Create->Compute Shader, then double-click the shader to open it in MonoDevelop for editing. Paste the following code into the newly created compute shader.

#pragma kernel CSMain1
[numthreads(4,1,1)]
void CSMain1(){

}

This is the bare minimum for a compute shader. It does nothing, of course, but it serves as a good starting point. A compute shader has to be run from a script in Unity, so we need one of those as well. Go to the Project panel and click Create->C# Script. Name it KernelExample and paste in the following code.

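    using UnityEngine;

    // The class name must match the script file name (KernelExample.cs).
    public class KernelExample : MonoBehaviour
    {
        // Assign the compute shader asset in the Inspector.
        public ComputeShader computeShader;

        void Start()
        {
            // Run kernel 0 with 1 x 1 x 1 thread groups.
            computeShader.Dispatch(0, 1, 1, 1);
        }
    }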

Now drag the script onto any game object and assign the compute shader to the script's shader field. The shader will run in the Start function when the scene is played. Before you run the scene, however, you need to enable DX11 in Unity. Go to Edit->Project Settings->Player and tick the "Use Direct3D 11" box. You can now run the scene. The shader will do nothing, but there should be no errors either.

In the script you will see a call to the "Dispatch" function. This is responsible for running the shader. Notice that the first argument is 0. This is the id of the kernel you want to run. In the shader you will see "#pragma kernel CSMain1". This declares which function in the shader is the kernel, since one shader may contain many functions (and even many kernels). There must be a function with the name CSMain1 in the shader or the shader will not compile.

Now notice the "[numthreads(4,1,1)]" line. This tells the GPU how many threads of the kernel to run per group. The 3 numbers relate to each dimension. A thread group can have up to 3 dimensions, and in this example we are running a 1-dimensional group with a width of 4 threads. That means we are running a total of 4 threads, and each thread runs a copy of the kernel. This is why GPUs are so fast: they can run thousands of threads at a time.
Now let's get the kernel to actually do something. Change the shader to this:

#pragma kernel CSMain1
RWStructuredBuffer<int>buffer1;

[numthreads(4,1,1)]
void CSMain1(int3 threadID:SV_GroupThreadID){
    buffer1[threadID.x]=threadID.x;
}

Change the C# script as follows:

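    using UnityEngine;

    // The class name must match the script file name (KernelExample.cs).
    public class KernelExample : MonoBehaviour
    {
        public ComputeShader computeShader;

        void Start()
        {
            // One int per thread: 4 elements of sizeof(int) bytes each.
            ComputeBuffer buffer = new ComputeBuffer(4, sizeof(int));
            computeShader.SetBuffer(0, "buffer1", buffer);

            // Run kernel 0 with a single thread group.
            computeShader.Dispatch(0, 1, 1, 1);

            // Copy the results back from the GPU.
            int[] data = new int[4];
            buffer.GetData(data);

            for (int i = 0; i < 4; i++)
            {
                Debug.Log(data[i]);
            }

            // Buffers must be released explicitly when no longer needed.
            buffer.Release();
        }
    }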

Now run the scene and you should see the numbers 0, 1, 2 and 3 printed to the console.

Don't worry too much about the buffer for now. I will cover buffers in detail in the future. For now, just know that a buffer is a place to store data and that you must call its Release function when you are finished with it.

Notice the argument added to the CSMain1 function: "int3 threadID : SV_GroupThreadID". This is a request to the GPU to pass the thread id into the kernel when it runs. We then write the thread id into the buffer. Since we have told the GPU that the group has 4 threads (through [numthreads(4,1,1)]), the id ranges from 0 to 3, as the printout shows.
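To make the thread id idea concrete, here is a minimal CPU-side sketch in plain C# of what the single group of 4 threads does. It is only an illustration: the class name GroupThreadIdSketch and the sequential loop are assumptions made for this example, whereas on the GPU the four kernel copies run in parallel.

```csharp
using System;

// Hypothetical CPU stand-in for [numthreads(4,1,1)] with Dispatch(0, 1, 1, 1).
public static class GroupThreadIdSketch
{
    public static int[] Run(int threadsPerGroup)
    {
        int[] buffer1 = new int[threadsPerGroup];

        // Each loop iteration plays the role of one GPU thread.
        // threadId corresponds to SV_GroupThreadID.x in the kernel.
        for (int threadId = 0; threadId < threadsPerGroup; threadId++)
        {
            buffer1[threadId] = threadId; // same body as CSMain1
        }
        return buffer1;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run(4))); // prints 0, 1, 2, 3
    }
}
```

Changing the argument of Run mirrors changing the first number in numthreads, as long as the buffer is sized to match.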

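HLSL definition of semantics:
https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-semantics
A semantic is a string attached to a shader input or output that conveys information about the intended use of a parameter. Semantics are required on all variables passed between shader stages. The syntax for adding a semantic to a shader variable is shown in Variable Syntax (DirectX HLSL).

In general, data passed between pipeline stages is completely generic and is not uniquely interpreted by the system; arbitrary semantics are allowed and have no special meaning. Parameters that carry a semantic with a predefined meaning, such as SV_GroupThreadID, are called system-value semantics (Direct3D 10 and later).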

Those 4 threads make up what is called a thread group. In this case we are running 1 group of 4 threads, but you can run multiple groups. Let's run 2 groups instead of 1. Change the shader's kernel to this:

#pragma kernel CSMain1
RWStructuredBuffer<int>buffer1;

[numthreads(4,1,1)]
void CSMain1(int3 threadID:SV_GroupThreadID,int3 groupID:SV_GroupID){
    buffer1[threadID.x+groupID.x*4]=threadID.x;
}

    using UnityEngine;

    // The class name must match the script file name (KernelExample.cs).
    public class KernelExample : MonoBehaviour
    {
        public ComputeShader computeShader;

        void Start()
        {
            // 2 groups of 4 threads, one int per thread.
            ComputeBuffer buffer = new ComputeBuffer(4 * 2, sizeof(int));
            computeShader.SetBuffer(0, "buffer1", buffer);

            // Run kernel 0 with 2 thread groups along x.
            computeShader.Dispatch(0, 2, 1, 1);

            int[] data = new int[4 * 2];
            buffer.GetData(data);

            for (int i = 0; i < 4 * 2; i++)
            {
                Debug.Log(data[i]);
            }

            buffer.Release();
        }
    }

Now run the scene and you should see 0-3 printed twice.

Now notice the change to the Dispatch call. The last three arguments (the 2,1,1) are the number of groups we want to run. Just like the number of threads, groups can go up to 3 dimensions, and in this case we are running 1 dimension of 2 groups. We also had to change the kernel by adding the argument "int3 groupID : SV_GroupID". This is a request to the GPU to pass in the group id when the kernel runs. We need it because we are now writing out 8 values: 2 groups of 4 threads. Each thread needs its own position in the buffer, and the formula for this is the thread id plus the group id times the number of threads per group (threadID.x + groupID.x * 4).
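The index formula can be checked on the CPU with a small sketch in plain C#. This is an illustration under assumptions rather than anything Unity runs: the class name GroupIdSketch is made up here, and the nested loops stand in for the 2 groups of 4 threads that the GPU schedules in parallel.

```csharp
using System;

// Hypothetical CPU stand-in for [numthreads(4,1,1)] with Dispatch(0, 2, 1, 1).
public static class GroupIdSketch
{
    public static int[] Run(int groups, int threadsPerGroup)
    {
        int[] buffer1 = new int[groups * threadsPerGroup];

        for (int groupId = 0; groupId < groups; groupId++)                // SV_GroupID.x
        {
            for (int threadId = 0; threadId < threadsPerGroup; threadId++) // SV_GroupThreadID.x
            {
                // Same formula as the kernel: threadID.x + groupID.x * 4.
                buffer1[threadId + groupId * threadsPerGroup] = threadId;
            }
        }
        return buffer1;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run(2, 4))); // prints 0, 1, 2, 3, 0, 1, 2, 3
    }
}
```

Without the groupId term, both groups would write to slots 0-3 and the second half of the buffer would never be written by the kernel.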

This is a bit awkward to write. Surely the GPU knows the thread's position? Yes, it does. Change the shader's kernel to this and rerun the scene.

#pragma kernel CSMain1
RWStructuredBuffer<int>buffer1;

[numthreads(4,1,1)]
void CSMain1(int3 threadID:SV_GroupThreadID,int3 dispatchID:SV_DispatchThreadID){
    buffer1[dispatchID.x]=threadID.x;
}

The results should be the same: two sets of 0-3 printed. Notice that the group id argument has been replaced with "int3 dispatchID : SV_DispatchThreadID". This is the same number our formula gave us, except now the GPU computes it for us. It is the thread's position across all of the dispatched thread groups, not just within its own group.
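The relationship between the three ids can be written out as a short plain C# sketch. In HLSL, SV_DispatchThreadID equals SV_GroupID times the numthreads size plus SV_GroupThreadID in each dimension; the class and method names below (DispatchThreadIdSketch, DispatchThreadId) are hypothetical and only model the x dimension.

```csharp
using System;

// Hypothetical CPU model of how the dispatch thread id is derived along x.
public static class DispatchThreadIdSketch
{
    // dispatchID.x = groupID.x * threadsPerGroup + threadID.x
    public static int DispatchThreadId(int groupId, int groupThreadId, int threadsPerGroup)
    {
        return groupId * threadsPerGroup + groupThreadId;
    }

    public static int[] Run(int groups, int threadsPerGroup)
    {
        int[] buffer1 = new int[groups * threadsPerGroup];
        for (int groupId = 0; groupId < groups; groupId++)
        {
            for (int threadId = 0; threadId < threadsPerGroup; threadId++)
            {
                int dispatchId = DispatchThreadId(groupId, threadId, threadsPerGroup);
                buffer1[dispatchId] = threadId; // same body as the revised CSMain1
            }
        }
        return buffer1;
    }

    public static void Main()
    {
        Console.WriteLine(DispatchThreadId(1, 2, 4));          // prints 6
        Console.WriteLine(string.Join(", ", Run(2, 4)));       // prints 0, 1, 2, 3, 0, 1, 2, 3
    }
}
```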

So far everything has been in 1 dimension. Let's step things up a bit and move to 2 dimensions. Instead of rewriting the kernel, let's add another one to the shader. It's not uncommon for a shader to have one kernel per dimension, each performing the same algorithm. First add this code to the shader below the previous code, so that there are two kernels in the shader.

#pragma kernel CSMain1
#pragma kernel CSMain2

RWStructuredBuffer<int>buffer1;

[numthreads(4,1,1)]
void CSMain1(int3 threadID:SV_GroupThreadID,int3 dispatchID:SV_DispatchThreadID){
    buffer1[dispatchID.x]=threadID.x;
}

RWStructuredBuffer<int>buffer2;

[numthreads(4,4,1)]
void CSMain2(int3 dispatchID:SV_DispatchThreadID){
    int id=dispatchID.x+dispatchID.y*8;
    buffer2[id]=id;
}

Modify the C# code as follows:

    using UnityEngine;

    // The class name must match the script file name (KernelExample.cs).
    public class KernelExample : MonoBehaviour
    {
        public ComputeShader computeShader;

        void Start()
        {
            // 2 x 2 groups of 4 x 4 threads, one int per thread.
            ComputeBuffer buffer = new ComputeBuffer(4 * 4 * 2 * 2, sizeof(int));

            // Look the kernel up by name instead of hard-coding its index.
            int kernel = computeShader.FindKernel("CSMain2");
            computeShader.SetBuffer(kernel, "buffer2", buffer);

            computeShader.Dispatch(kernel, 2, 2, 1);

            int[] data = new int[4 * 4 * 2 * 2];
            buffer.GetData(data);

            // Print the buffer as 8 rows of 8 values.
            for (int i = 0; i < 8; i++)
            {
                string line = "";
                for (int j = 0; j < 8; j++)
                    line += " " + data[j + i * 8];
                Debug.Log(line);
            }

            buffer.Release();
        }
    }

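Run the scene and you will see 0-7 printed on the first row, 8-15 on the second, and so on up to 63.
Why 0 to 63?
Because we now have 4 two-dimensional thread groups (Dispatch(kernel,2,2,1) gives 2 * 2 = 4),
and each thread group is 4 * 4, so 16 threads (numthreads(4,4,1) in the kernel).
That gives us 64 threads in total.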
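The thread count arithmetic can be captured in a small helper. This is a sketch in plain C#; the name ThreadCountSketch.TotalThreads is hypothetical, and it simply multiplies the Dispatch group counts by the numthreads sizes.

```csharp
using System;

// Hypothetical helper: total threads launched by one Dispatch call.
public static class ThreadCountSketch
{
    public static int TotalThreads(int groupsX, int groupsY, int groupsZ,
                                   int threadsX, int threadsY, int threadsZ)
    {
        int groups = groupsX * groupsY * groupsZ;             // arguments of Dispatch
        int threadsPerGroup = threadsX * threadsY * threadsZ; // arguments of numthreads
        return groups * threadsPerGroup;
    }

    public static void Main()
    {
        Console.WriteLine(TotalThreads(1, 1, 1, 4, 1, 1)); // prints 4
        Console.WriteLine(TotalThreads(2, 1, 1, 4, 1, 1)); // prints 8
        Console.WriteLine(TotalThreads(2, 2, 1, 4, 4, 1)); // prints 64
    }
}
```

The three calls correspond to the three dispatches used so far, and each result matches the size of the buffer allocated in the matching script.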

Notice the value we are outputting in the line "int id = dispatchID.x + dispatchID.y * 8". The dispatch id is the thread's position across the groups of threads in each dimension. We now have 2 dimensions, so we need the thread's global position in the buffer. This is the dispatch x id plus the dispatch y id times the total number of threads in the first dimension (4 * 2 = 8). This is a concept you will have to be familiar with when working with compute shaders. The reason is that buffers are always 1-dimensional, so when working in higher dimensions you need to calculate the index at which each result should be written into the buffer.
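As a rough check of the 2D indexing, the dispatch can be emulated on the CPU in plain C#. This is only a sketch: the class name Dispatch2DSketch is an assumption, and the nested loops stand in for the 2 x 2 groups of 4 x 4 threads that run in parallel on the GPU.

```csharp
using System;
using System.Linq;

// Hypothetical CPU stand-in for [numthreads(4,4,1)] with Dispatch(kernel, 2, 2, 1).
public static class Dispatch2DSketch
{
    public static int[] Run(int groupsX, int groupsY, int threadsX, int threadsY)
    {
        int width = groupsX * threadsX;   // total threads along x (2 * 4 = 8 here)
        int height = groupsY * threadsY;  // total threads along y
        int[] buffer2 = new int[width * height];

        for (int gy = 0; gy < groupsY; gy++)
        for (int gx = 0; gx < groupsX; gx++)
        for (int ty = 0; ty < threadsY; ty++)
        for (int tx = 0; tx < threadsX; tx++)
        {
            int x = gx * threadsX + tx;   // dispatchID.x
            int y = gy * threadsY + ty;   // dispatchID.y
            int id = x + y * width;       // same formula as CSMain2
            buffer2[id] = id;
        }
        return buffer2;
    }

    public static void Main()
    {
        int[] data = Run(2, 2, 4, 4);
        for (int row = 0; row < 8; row++)
        {
            // First row prints 0 1 2 3 4 5 6 7, last row prints 56 57 58 59 60 61 62 63.
            Console.WriteLine(string.Join(" ", data.Skip(row * 8).Take(8)));
        }
    }
}
```

Every slot ends up holding its own index, which shows that the formula maps each (x, y) pair to a unique buffer position.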

The same theory applies in 3 dimensions, but as it gets fiddly I will only demonstrate up to 2. You just need to know that in 3 dimensions the buffer position is calculated as "int id = dispatchID.x + dispatchID.y * groupSizeX + dispatchID.z * groupSizeX * groupSizeY", where each group size is the number of groups times the number of threads for that dimension.
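The 3D formula can be sanity-checked with a short plain C# sketch. The names FlatIndex3DSketch and FlatIndex are hypothetical, and the sizes used in Main (8 x 8 x 2) are arbitrary example values, not taken from the tutorial.

```csharp
using System;

// Hypothetical helper for the 3D buffer position formula.
public static class FlatIndex3DSketch
{
    // sizeX and sizeY are "groups times threads" along x and y.
    public static int FlatIndex(int x, int y, int z, int sizeX, int sizeY)
    {
        return x + y * sizeX + z * sizeX * sizeY;
    }

    // Returns true if every (x, y, z) maps to a distinct index in [0, sizeX*sizeY*sizeZ).
    public static bool CoversBuffer(int sizeX, int sizeY, int sizeZ)
    {
        bool[] seen = new bool[sizeX * sizeY * sizeZ];
        for (int z = 0; z < sizeZ; z++)
        for (int y = 0; y < sizeY; y++)
        for (int x = 0; x < sizeX; x++)
        {
            int id = FlatIndex(x, y, z, sizeX, sizeY);
            if (id < 0 || id >= seen.Length || seen[id]) return false;
            seen[id] = true;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(FlatIndex(3, 2, 1, 8, 8)); // prints 83 (3 + 2*8 + 1*64)
        Console.WriteLine(CoversBuffer(8, 8, 2));    // prints True
    }
}
```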

Posted 2021-10-10 21:10 by 凶恶的真实
