DirectX10 Sample Translation

CubeMapGS Sample

The CubeMapGS sample demonstrates rendering a cubic texture render target with a single DrawIndexed() call using two new features in Direct3D 10: render target array and geometry shader. A render target array allows multiple render target and depth stencil textures to be active simultaneously. By using an array of six render targets, one for each face of the cube texture, all six faces of the cube can be rendered together. When the geometry shader emits a triangle, it can control which render target in the array the triangle gets rasterized on. For every triangle that gets passed to the geometry shader, six triangles are generated in the shader and output to the pixel shader, one triangle for each render target.


Example image:

Sample Overview

Environment mapping is a popular and well-supported technique in 3D graphics. Traditionally, dynamic cubic environment maps are created by obtaining a surface for each face of a cube texture and setting that surface as the render target to render the scene once for each face of the cube. The cube texture can then be used to render environment-mapped objects. This method, while it works, increases the number of rendering passes from one to seven, greatly reducing the application frame rate. In Direct3D 10, applications can use geometry shaders and render target arrays to alleviate this problem.


How the Sample Works

A geometry shader in Direct3D 10 is a piece of code that runs for each primitive to be rendered. In each invocation, the geometry shader can output zero or more primitives to be rasterized and processed in the pixel shader. For each output primitive, the geometry shader can also control to which element slice of the render target array the primitive gets emitted.

(Translator's note: the geometry shader is HLSL code that runs on the GPU.)

The render target array in Direct3D 10 is a feature that enables an application to render onto multiple render targets simultaneously at the primitive level. The application uses a geometry shader to output a primitive and select which render target in the array should receive the primitive. This sample uses a render target array of 6 elements for the 6 faces of the cube texture. The following code fragment creates the render target view used for rendering to the cube texture.

(Translator's notes: In Direct3D 9, multiple render targets could only be written from the pixel shader, with at most 4 targets, i.e. at the pixel level. A view is a new Direct3D 10 concept that describes how a resource (a buffer or texture) is read; the same texture can be treated as a 1D or a 2D texture depending entirely on the view that reads it.)

// Create the 6-face render target view
// Render target view description
D3D10_RENDER_TARGET_VIEW_DESC DescRT;
DescRT.Format = dstex.Format;                      // render target format = texture format
DescRT.ResourceType = D3D10_RESOURCE_TEXTURECUBE;  // resource type is a cube texture
DescRT.TextureCube.FirstArraySlice = 0;            // offset = 0 (translator's understanding?)
DescRT.TextureCube.ArraySize = 6;                  // array size = 6
DescRT.TextureCube.MipSlice = 0;                   // mip map level = 0 (translator's understanding?)
// Create the view
// m_pEnvMap is the cube texture that was created earlier
// m_pEnvMapRTV receives the created view
m_pD3D10Device->CreateRenderTargetView( m_pEnvMap, &DescRT, &m_pEnvMapRTV );

By setting DescRT.TextureCube.FirstArraySlice to 0 and DescRT.TextureCube.ArraySize to 6, this render target view represents an array of 6 render targets, one for each face of the cube texture. When the sample renders onto the cube map, it sets this render target view as the active view by calling ID3D10Device::OMSetRenderTargets() so that all 6 faces of the texture can be rendered at the same time.


// Set the env map render target and depth stencil buffer
ID3D10RenderTargetView* aRTViews[ 1 ] = { m_pEnvMapRTV };
// The three parameters are:
// the number of render target views passed in,
// the array of render target view pointers,
// and the depth stencil view pointer.
// RT views are passed as a pointer array while the DS view is a single pointer, because
// within one frame different views do not normally use different depth stencil formats.
m_pD3D10Device->OMSetRenderTargets( sizeof(aRTViews) / sizeof(aRTViews[0]), aRTViews, m_pEnvMapDSV );

Rendering the CubeMap

At the top level, rendering begins in Render(), which calls RenderSceneIntoCubeMap() and RenderScene(), in that order. RenderScene() takes a view matrix and a projection matrix, then renders the scene onto the current render target. RenderSceneIntoCubeMap() handles rendering of the scene onto a cube texture. This texture is then used in RenderScene() to render the environment-mapped object.


In RenderSceneIntoCubeMap(), the first necessary task is to compute the 6 view matrices for rendering to the 6 faces of the cube texture. The matrices have the eye point at the camera position and the viewing directions along the -X, +X, -Y, +Y, -Z, and +Z directions. A boolean flag, m_bUseRenderTargetArray, indicates the technique to use for rendering the cube map. If false, a for loop iterates through the faces of the cube map and renders the scene by calling RenderScene() once for each cube map face. No geometry shader is used for rendering. This technique is essentially the legacy method used in Direct3D 9 and prior. If m_bUseRenderTargetArray is true, the cube map is rendered with the RenderCubeMap effect technique. This technique uses a geometry shader to output each primitive to all 6 render targets. Therefore, only one call to RenderScene() is required to draw all 6 faces of the cube map.

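For reference, the six view matrices can be built with D3DXMatrixLookAtLH, one per axis direction. The sketch below is illustrative only; the variable names (vEyePt, amViewCM, and the direction tables) are assumptions, not the sample's actual code.

// Illustrative sketch: build the six cube-face view matrices.
// vEyePt is the position of the environment-mapped object.
D3DXVECTOR3 vEyePt( 0.0f, 0.0f, 0.0f );
D3DXMATRIX amViewCM[6];

// Look directions and up vectors for the +X, -X, +Y, -Y, +Z, -Z faces,
// following the standard cube map face orientations.
D3DXVECTOR3 avLookDir[6] =
{
    D3DXVECTOR3(  1.0f,  0.0f,  0.0f ), D3DXVECTOR3( -1.0f,  0.0f,  0.0f ),
    D3DXVECTOR3(  0.0f,  1.0f,  0.0f ), D3DXVECTOR3(  0.0f, -1.0f,  0.0f ),
    D3DXVECTOR3(  0.0f,  0.0f,  1.0f ), D3DXVECTOR3(  0.0f,  0.0f, -1.0f )
};
D3DXVECTOR3 avUpDir[6] =
{
    D3DXVECTOR3( 0.0f, 1.0f,  0.0f ), D3DXVECTOR3( 0.0f, 1.0f,  0.0f ),
    D3DXVECTOR3( 0.0f, 0.0f, -1.0f ), D3DXVECTOR3( 0.0f, 0.0f,  1.0f ),
    D3DXVECTOR3( 0.0f, 1.0f,  0.0f ), D3DXVECTOR3( 0.0f, 1.0f,  0.0f )
};

for( int f = 0; f < 6; ++f )
{
    D3DXVECTOR3 vLookAt = vEyePt + avLookDir[f];
    D3DXMatrixLookAtLH( &amViewCM[f], &vEyePt, &vLookAt, &avUpDir[f] );
}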

The vertex shader that is used for rendering onto the cube texture is VS_CubeMap, as shown below. This shader does minimal work of transforming vertex positions from object space to world space. The world space position will be needed in the geometry shader.


struct VS_OUTPUT_CUBEMAP
{
    float4 Pos : SV_POSITION;    // World position
    float2 Tex : TEXCOORD0;      // Texture coord
};
VS_OUTPUT_CUBEMAP VS_CubeMap( float4 Pos : POSITION, float3 Normal : NORMAL, float2 Tex : TEXCOORD )
{
    VS_OUTPUT_CUBEMAP o = (VS_OUTPUT_CUBEMAP)0.0f;
 
    // Compute world position
    o.Pos = mul( Pos, mWorld );
 
    // Propagate tex coord
    o.Tex = Tex;
 
    return o;
}

One of the two geometry shaders in this sample, GS_CubeMap, is shown below. This shader is run once per primitive that VS_CubeMap has processed. The vertex format that the geometry shader outputs is GS_OUTPUT_CUBEMAP. The RTIndex field of this struct has a special semantic: SV_RenderTargetArrayIndex. This semantic enables the field RTIndex to control the render target to which the primitive is emitted. Note that only the leading vertex of a primitive can specify a render target array index. For all other vertices of the same primitive, RTIndex is ignored and the value from the leading vertex is used. As an example, if the geometry shader constructs and emits 3 vertices with RTIndex equal to 2, then this primitive goes to element 2 in the render target array.


At the top level, the shader consists of a for loop that loops 6 times, once for each cube face. Inside the loop, another loop runs 3 times per cube face to construct and emit three vertices for the triangle primitive. The RTIndex field is set to f, the outer loop control variable. This ensures that in each iteration of the outer loop, the primitive is emitted to a distinct render target in the array. Another task that must be done before emitting a vertex is to compute the Pos field of the output vertex struct. The semantic of Pos is SV_POSITION, which represents the projected coordinates of the vertex that the rasterizer needs to properly rasterize the triangle. Because the vertex shader outputs position in world space, the geometry shader needs to transform that by the view and projection matrices. In the loop, the view matrix used to transform the vertices is g_mViewCM[f]. This array of matrices is filled by the sample and contains the view matrices for rendering the 6 cube map faces from the environment-mapped object's perspective. Thus, each iteration uses a different view transformation matrix and emits vertices to a different render target. This renders one triangle onto 6 render target textures in a single pass, without calling DrawIndexed() multiple times.


struct GS_OUTPUT_CUBEMAP
{
    float4 Pos : SV_POSITION;     // Projection coord
    float2 Tex : TEXCOORD0;       // Texture coord
    uint RTIndex : SV_RenderTargetArrayIndex;
};
 
// At most 18 output vertices: 6 triangles, one per cube map face
[maxvertexcount(18)]
void GS_CubeMap( triangle VS_OUTPUT_CUBEMAP In[3],                        // input is one triangle
                 inout TriangleStream<GS_OUTPUT_CUBEMAP> CubeMapStream )  // output is a triangle stream
{
    for( int f = 0; f < 6; ++f )
    {
        // Compute screen coordinates
        GS_OUTPUT_CUBEMAP Out;
        Out.RTIndex = f;
        for( int v = 0; v < 3; v++ )
        {
            Out.Pos = mul( In[v].Pos, g_mViewCM[f] );
            Out.Pos = mul( Out.Pos, mProj );
            Out.Tex = In[v].Tex;
            // Append the vertex to the output stream
            CubeMapStream.Append( Out );
        }
        // End the current primitive (here, a triangle)
        CubeMapStream.RestartStrip();
    }
}

The pixel shader, PS_CubeMap, is rather straightforward. It fetches the diffuse texture and applies it to the mesh. Because the lighting is baked into this texture, no lighting is performed in the pixel shader.


Rendering the Reflective Object

Three techniques are used to render the reflective object in the center of the scene. All three fetch texels from the cubemap just as they would from a cubemap that wasn't rendered in a single pass. In short, this technique is orthogonal to the way in which the resulting cubemap is used. The three techniques differ mainly in how they use the cubemap texels. RenderEnvMappedScene uses an approximated fresnel reflection function to blend the colors of the car paint with reflection from the cubemap. RenderEnvMappedScene_NoTexture does the same, but without the paint material. RenderEnvMappedGlass adds transparency to RenderEnvMappedScene_NoTexture.

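An "approximated fresnel reflection function" is commonly implemented as a Schlick-style approximation. The sketch below shows the general shape of such a blend; the formula and names are an assumption about the technique, not the sample's actual shader code.

#include <cmath>

// Schlick-style Fresnel approximation (assumed form; the sample's exact
// coefficients may differ). cosTheta = dot(N, V), r0 = reflectance at
// normal incidence.
float FresnelSchlick( float cosTheta, float r0 )
{
    return r0 + ( 1.0f - r0 ) * powf( 1.0f - cosTheta, 5.0f );
}

// The car paint is then blended with the cube map reflection:
//   finalColor = lerp( paintColor, reflectionColor, FresnelSchlick( cosTheta, r0 ) );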

Higher Order Normal Interpolation

Traditional normal interpolation has been linear. This means that the normals calculated in the vertex shader are linearly interpolated across the face of the triangle. This causes the reflections in the car to appear to slide when the direction of the normal changes rapidly across the face of a polygon. To mitigate this, this sample uses a quadratic normal interpolation. In this case, 3 extra normals are calculated in the geometry shader. These normals are placed in the center of each triangle edge and are the average of the two normals at the vertices that make up the edge. In addition, the geometry shader calculates a barycentric coordinate for each vertex that is passed down to the pixel shader and used to interpolate between the six normals.


In the pixel shader, these 6 normals are weighted by six basis functions and added together to create the normal for that particular pixel. The basis functions are as follows, where x and y are the barycentric coordinates:


2x^2 + 2y^2 + 4xy - 3x - 3y + 1 = (2x + 2y - 1)(x + y - 1)
-4x^2 - 4xy + 4x = -4x(x + y - 1)
2x^2 - x = x(2x - 1)
-4y^2 - 4xy + 4y = -4y(x + y - 1)
2y^2 - y = y(2y - 1)
4xy = 4xy
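For reference, the routine below is a direct transcription of the six formulas above; the pairing of each weight with a particular vertex or edge-midpoint normal follows whatever ordering the sample's shader uses.

// Evaluate the six quadratic basis weights at barycentric coordinates (x, y).
// The weights sum to 1, so the blended normal only needs renormalizing.
void QuadraticBasis( float x, float y, float w[6] )
{
    w[0] = ( 2.0f*x + 2.0f*y - 1.0f ) * ( x + y - 1.0f ); // 2x^2 + 2y^2 + 4xy - 3x - 3y + 1
    w[1] = -4.0f * x * ( x + y - 1.0f );                  // -4x^2 - 4xy + 4x
    w[2] = x * ( 2.0f*x - 1.0f );                         // 2x^2 - x
    w[3] = -4.0f * y * ( x + y - 1.0f );                  // -4y^2 - 4xy + 4y
    w[4] = y * ( 2.0f*y - 1.0f );                         // 2y^2 - y
    w[5] = 4.0f * x * y;                                  // 4xy
}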
 

Draw Predicated Sample

This sample uses the occlusion predicate query to avoid drawing unnecessary geometry.


How the Sample Works

The scene contains four meshes. The first two meshes are high resolution meshes that comprise the focus of the scene. Because of the large number of vertices, these meshes are costly to draw. The third mesh is a much lower resolution mesh that is used as an approximation of the first two meshes. For games, this could be a mesh created specifically for occlusion purposes or a collision mesh. The last mesh is a low resolution mesh made of multiple extruded boxes. When the camera traverses the scene, these boxes occlude the first two meshes. Traditionally, the vertex processing cost of the first two meshes would still be present even when the meshes are completely occluded by the fourth mesh.


The DrawPredicated sample removes the vertex processing cost by using the D3DQUERYTYPE_OCCLUSIONPREDICATE query type in Direct3D 10. This query allows a drawing operation to succeed or fail based upon whether pixels from a previous draw operation were visible. Because the predicate query is created with D3D10_QUERY_MISCFLAG_PREDICATEHINT, the result of the query can be used to draw or not draw geometry without waiting for the result to be transferred from the GPU to the CPU.


Creating the Query

//
// Create an occlusion predicate query
//
ID3D10Predicate           * g_pPredicate;
...
D3D10_QUERY_DESC qdesc;
qdesc.MiscFlags = D3D10_QUERY_MISCFLAG_PREDICATEHINT;
qdesc.Query = D3D10_QUERY_OCCLUSIONPREDICATE;
V_RETURN(pd3dDevice->CreatePredicate( &qdesc, &g_pPredicate ));

Issuing the Query

// Render the occluder mesh
g_pPredicate->Begin();
g_OccluderMesh.Render( pd3dDevice, g_pEffect10, g_pRenderOccluder );
g_pPredicate->End(NULL);

Drawing based upon the results

//
// Render the vertex heavy meshes
//
 
//enable predication
pd3dDevice->SetPredication( g_pPredicate, FALSE );
 
//predicate the following drawing commands if the result of the g_pPredicate query is false
g_ColumnMesh.Render( pd3dDevice, g_pEffect10, g_pRenderTextured );
for(int i=0; i<6; i++)
{
         D3DXMATRIX mMatRot;
         D3DXMATRIX mWVP;
         D3DXMatrixRotationY( &mMatRot, i*(D3DX_PI/3.0f) );
         mWVP = mMatRot*mWorldViewProj;
         g_pmWorldViewProj->AsMatrix()->SetMatrix( &mWVP );
         g_HeavyMesh.Render( pd3dDevice, g_pEffect10, g_pRenderTextured );
}
 
//disable predication
pd3dDevice->SetPredication( NULL, FALSE );

Limitations

Predicate hints are non-stalling and are therefore not guaranteed. Because the hardware may not have finished drawing the occluding mesh before the result of the predicate is needed, the vertex-laden meshes may draw even though the occluder mesh is completely hidden. In anticipation of this, the application should be careful that the depth and stencil buffers contain the correct information regardless of whether the rendering occurs. In the case of occlusion culling, this integrity is maintained. To make the results of the predicate hints useful, the application should ensure that enough drawing takes place between the issue of the query and the use of the query to allow synchronization.

(Translator's note: this presumably means that before the complex models are drawn, the depth buffer should already contain enough information for them to be culled.)

Another limitation is that only drawing and execution APIs honor predication; state-setting APIs do not. Commands that honor predication are as follows:


  • Draw
  • DrawIndexed
  • DrawInstanced
  • DrawIndexedInstanced
  • DrawOpaque
  • ClearRenderTarget
  • ClearDepthStencil
  • CopyRegions
  • CopyResource
  • UpdateSubresourceUP

Commands that do not honor predication include:

  • IASetTopology
  • IASetInputLayout
  • IASetVertexBuffers
  • Present
  • Flush
  • Commit/Lock
  • Decommit/Unlock
  • CreateResource
  • RetrieveSubresourceUP
  • RetrieveBufferFilledSizeUP

FixedFuncEMU Sample

This sample attempts to emulate certain aspects of the Direct3D 9 fixed function pipeline in a Direct3D 10 environment.


Example image:

 

How the Sample Works

This sample attempts to emulate the following aspects of the Direct3D 9 fixed-function pipeline:

  • Fixed-function Transformation Pipeline
  • Fixed-function Lighting Pipeline
  • AlphaTest
  • User Clip Planes
  • Pixel Fog
  • Gouraud and Flat shade modes
  • Projected texture lookups (texldp)
  • Multi-Texturing
  • D3DFILL_POINT fillmode
  • Screen space UI rendering

Fixed-function Transformation Pipeline

The Direct3D 9 fixed-function transformation pipeline required 3 matrices to transform the raw points in 3d space into their 2d screen representations. These were the World, View, and Projection matrices. Instead of using SetTransform( D3DTS_WORLD, someMatrix ), we pass the matrices in as effect variables. The shader multiplies the input vertex positions by each of the World, View, and Projection matrices to get the same transformation that would have been produced by the fixed-function pipeline.


//output our final position in clipspace
float4 worldPos = mul( float4( input.pos, 1 ), g_mWorld );
float4 cameraPos = mul( worldPos, g_mView ); //Save cameraPos for fog calculations
output.pos = mul( cameraPos, g_mProj );

Fixed-function Lighting Pipeline

ColorsOutput CalcLighting( float3 worldNormal, float3 worldPos, float3 cameraPos )
{
    ColorsOutput output = (ColorsOutput)0.0;

    for(int i=0; i<8; i++)
    {
        // vector toward the light
        float3 toLight = g_lights[i].Position.xyz - worldPos;
        // distance to the light
        float lightDist = length( toLight );
        // attenuation = 1 / (a0 + a1*d + a2*d*d)
        float fAtten = 1.0/dot( g_lights[i].Atten, float4(1,lightDist,lightDist*lightDist,0) );
        float3 lightDir = normalize( toLight );
        // half vector H
        float3 halfAngle = normalize( normalize(-cameraPos) + lightDir );

        // Blinn-Phong equation, evaluated per vertex
        output.Diffuse += max(0,dot( lightDir, worldNormal ) * g_lights[i].Diffuse * fAtten) + g_lights[i].Ambient;
        output.Specular += max(0,pow( dot( halfAngle, worldNormal ), 64 ) * g_lights[i].Specular * fAtten );
    }

    return output;
}

AlphaTest

Alpha test is perhaps the simplest Direct3D 9 functionality to emulate. It does not require the user to set an alpha blend state. Instead the user can simply choose to discard a pixel based upon its alpha value. The following pixel shader does not draw the pixel if the alpha value is less than 0.5.


//
// PS for rendering with alpha test
//
float4 PSAlphaTestmain(PSSceneIn input) : COLOR0
{        
         float4 color =  tex2D( g_samLinear, g_txDiffuse, input.tex ) * input.colorD;
         if( color.a < 0.5 )
                 discard;
         return color;
}

 

User Clip Planes

User Clip Planes are emulated by specifying a clip distance output from the Vertex Shader with the SV_ClipDistance[n] flag, where n is either 0 or 1. Each component can hold up to 4 clip distances in x, y, z, and w giving a total of 8 clip distances.


In this scenario, each clip plane is defined by a plane equation of the form:


Ax + By + Cz + D = 0

Where <A,B,C> is the normal of the plane, and D is the distance of the plane from the origin. Plugging in any point <x,y,z> into this equation gives its distance from the plane. Therefore, all points <x,y,z> that satisfy the equation Ax + By + Cz + D = 0 are on the plane. All points that satisfy Ax + By + Cz + D < 0 are below the plane. All points that satisfy Ax + By + Cz + D > 0 are above the plane.


In the Vertex Shader, each vertex is tested against each plane equation to produce a distance to the clip plane. The three clip distances are stored in the first three components of the output component with the semantic SV_ClipDistance0. These clip distances get interpolated over the triangle during rasterization and clipped if the value ever goes below 0.

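The per-vertex work amounts to one dot product per plane. The sketch below restates it in plain C++ terms; the types and names are hypothetical, standing in for the sample's HLSL.

struct Plane  { float a, b, c, d; }; // plane equation Ax + By + Cz + D = 0
struct Float3 { float x, y, z; };

// Signed distance of a world-space point to a plane: positive above the
// plane, zero on it, negative below (the rasterizer clips negative values).
float ClipDistance( const Plane& p, const Float3& v )
{
    return p.a * v.x + p.b * v.y + p.c * v.z + p.d;
}

// The vertex shader writes one such distance per plane into the x, y, and z
// components of its SV_ClipDistance0 output.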

Pixel Fog

Pixel fog uses a fog factor to determine how much a pixel is obscured by fog. In order to accurately calculate the fog factor, we must have the distance from the eye to the pixel being rendered. In Direct3D 9, this was approximated by using the Z-coordinate of a point that has been transformed by both the World and View matrices. In the vertex shader, this distance is stored in the fogDist member of the PSSceneIn struct for all 3 vertices of a triangle. It is then interpolated across the triangle during rasterization and passed to the pixel shader.


The pixel shader takes this fogDist value and passes it to the CalcFogFactor function which calculates the fog factor based upon the current value of g_fogMode.


//
// Calculates fog factor based upon distance
//
// E is defined as the base of the natural logarithm (2.71828)
float CalcFogFactor( float d )
{
         float fogCoeff = 1.0;
         
         if( FOGMODE_LINEAR == g_fogMode )
         {
                 fogCoeff = (g_fogEnd - d)/(g_fogEnd - g_fogStart);
         }
         else if( FOGMODE_EXP == g_fogMode )
         {
                 fogCoeff = 1.0 / pow( E, d*g_fogDensity );
         }
         else if( FOGMODE_EXP2 == g_fogMode )
         {
                 fogCoeff = 1.0 / pow( E, d*d*g_fogDensity*g_fogDensity );
         }
         
         return clamp( fogCoeff, 0, 1 );
}

Finally, the pixel shader uses the fog factor to determine how much of the original color and how much of the fog color to output to the pixel.


return fog * normalColor + (1.0 - fog)*g_fogColor;

Gouraud and Flat shade modes

Gouraud shading involves calculating the color at the vertex of each triangle and interpolating it over the face of the triangle. By default, Direct3D 10 uses the D3D10_INTERPOLATION_MODE value D3D10_INTERPOLATION_LINEAR to interpolate values over the face of a triangle during rasterization. Because of this, we can emulate Gouraud shading by calculating Lambertian lighting ( dot( Normal, LightDir ) ) at each vertex and letting Direct3D 10 interpolate these values for us. By the time the color gets to the pixel shader, we simply use it as our lighting value and pass it through. No further work is needed.


Direct3D 10 also provides another way to interpolate data across a triangle, D3D10_INTERPOLATION_CONST. A naive approach would be to use this to calculate the color at the first vertex and allow that color to be spread across the entire face during rasterization. However, there is a problem with using this approach. Consider the case where the same sphere mesh needs to be rendered in both Gouraud and Flat shaded modes. To give the illusion of a faceted mesh being smooth, the normals at the vertices are averages of the normals of the adjacent faces. In short, on a sphere, no vertex will have a normal that is exactly perpendicular to any face it is a part of. For Gouraud shading, this is intentional. This is what allows the sphere to look smooth even though it is comprised of a finite number of polygons. However, for flat shading, this will give ill results as shown by the diagram below.


D3D10_INTERPOLATION_CONST takes the value of the color calculated at the first vertex and spreads it across the entire triangle giving us shading that looks as if the normal was bent compared to the orientation of the triangle. A better method using the geometry shader to calculate the normal is shown below.


    The second method gives more accurate results. The following code snippet illustrates how the geometry shader constructs a normal from the input world positions (wPos) of the triangle vertices. The lighting value is then calculated from this normal and spread to all vertices of the triangle.


//
// Calculate the face normal
//
float3 faceEdgeA = input[1].wPos - input[0].wPos;
float3 faceEdgeB = input[2].wPos - input[0].wPos;
 
//
// Cross product
//
float3 faceNormal = cross(faceEdgeA, faceEdgeB);

 

Projected Texture Lookups

Projected texturing simply divides the <x,y,z> coordinates of a 4d texture coordinate by the w coordinate before using it in a texture lookup operation. However, without the discussion of projected texturing, it becomes unclear why this functionality is useful.


Projecting a texture onto a surface can be easily illustrated by imagining how a projector works. The projector projects an image onto anything that happens to be in front of the projector. The same effect could be taken care of in the fixed-function pipeline by carefully setting up texture coordinate generation and texture stage states. In Direct3D 10, it is handled in shaders.


To tackle projected texturing, we must think about the problem in reverse. Instead of projecting a texture onto geometry, we simply render the geometry, and for each point, determine where that point would be hit by the projector texture. To do this, we only need to know the position of the point in world space as well as the view and projection matrices of the projector.


By multiplying the world space coordinate by the view and projection matrices of the light, we now have a point that is in the space of the projection of the light. Unfortunately, because we are converting this to texture coordinates, we would really like this point in some sort of [0..1] range. This is where the w coordinate comes into play. After the projection into the projector space, the w coordinate can be thought of as how much this vertex was scaled from a [-1..1] range to get to its current position. To get back to this [-1..1] range, we simply divide by the w coordinate. However, the projected texture is in the [0..1] range, so we must bias the result by halving it and adding 0.5.


//calculate the projected texture coordinate for the current world position
float4 cookieCoord = mul( float4(input.wPos,1), g_mLightViewProj );
 
//since we don't have texldp, we must perform the w divide ourselves before the texture lookup
cookieCoord.xy = 0.5 * cookieCoord.xy / cookieCoord.w + float2( 0.5, 0.5 );

Multi-Texturing

The texture stages from the fixed-function pipeline are officially gone. In their place is the ability to load textures arbitrarily in the pixel shader and to combine them in any way that the math operations of the language allow. The FixedFuncEMU sample emulates the D3DTOP_ADD texture blending operation. The first color is loaded from the diffuse texture at the texture coordinates defined in the mesh and multiplied by the input color.


float4 normalColor = tex2D( g_samLinear, g_txDiffuse, input.tex ) * input.colorD + input.colorS;

The second color is loaded from the projected texture using the projected coordinates described above.


cookieColor = tex2D( g_samLinear, g_txProjected, cookieCoord.xy );

D3DTOP_ADD is simply emulated by adding the projected cookie texture to the normal texture color.


normalColor += cookieColor;

For D3DTOP_MODULATE, the shader would simply multiply the colors together instead of adding them. The effects file can also be extended to handle a traditional lightmapping pipeline by passing a second set of texture coordinates, stored in the mesh, all the way down to the pixel shader. The shader would then look up into a lightmap texture using the second set of texture coordinates instead of the projected coordinates.


Screen space UI rendering

Rendering objects in screen space in Direct3D 10 requires that the user scale and bias the input screen coordinates against the viewport size. For example, a screen space coordinate in the range of <[0..639],[0..479]> needs to be transformed into the range of <[-1..1],[-1..1]> so that it can be transformed back by the viewport transform.


The vertex shader code below performs this transformation. Note that the w coordinate of the position is explicitly set to 1. This ensures that the coordinates will remain the same when the w-divide occurs. Additionally, the Z position is passed into the shader in clip-space, meaning that it is in the [0..1] range, where 0 represents the near plane, and 1 represents the far plane.


//output our final position
output.pos.x = (input.pos.x / (g_viewportWidth/2.0)) -1;
output.pos.y = -(input.pos.y / (g_viewportHeight/2.0)) +1;
output.pos.z = input.pos.z;
output.pos.w = 1;

D3DFILL_POINT fillmode

The emulation of point fillmode requires the use of the geometry shader to turn one triangle of 3 vertices into 6 triangles of 12 vertices. For each vertex of the input triangle, the geometry shader emits 4 vertices that comprise a two-triangle strip at that position. The positions of these vertices are displaced such that the screen-space size in pixels is equal to the g_pointSize variable.


This sample does not show how to emulate point sprite functionality, which is closely related to point rendering. For point sprite rendering please see the ParticlesGS sample.


GPUSpectrogram Sample

This sample demonstrates how to perform data processing on the GPU without creating a window or swapchain. In this case, it constructs a spectrogram from wav data using the GPU.


Example image:

General-Purpose Processing on the GPU

GPUSpectrogram demonstrates how to use the power of the GPU for non-graphics related tasks. In this case, the GPU creates a spectrogram from data in a wave file. The application does not create a window or swap chain and makes use of Render-to-Texture type operations for computation.

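Creating such a device only requires D3D10CreateDevice, with no DXGI swap chain involved. A minimal sketch, assuming a hardware adapter:

// Minimal sketch: create a device with no window and no swap chain.
// All subsequent rendering targets offscreen textures.
ID3D10Device* pd3dDevice = NULL;
HRESULT hr = D3D10CreateDevice( NULL,                        // default adapter
                                D3D10_DRIVER_TYPE_HARDWARE,
                                NULL,                        // no software module
                                0,                           // no creation flags
                                D3D10_SDK_VERSION,
                                &pd3dDevice );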

    A spectrogram represents sound data as a graph. The vertical axis represents the frequency components of the sound, while the horizontal axis represents time. The colors represent the intensity and phase shift of that particular frequency.


In a wave file, the data is stored as a one dimensional array, where each value represents air pressure (or amplitude) and each index represents a distinct time step.


In order to convert from the amplitude versus time domain to the frequency versus time domain, the sample employs the use of the Discrete Fourier Transforms (DFTs). Specifically, the sample uses a subset of Discrete Fourier Transforms known as Fast Fourier Transforms (FFTs). They are considered fast because their O(N*log(N)) complexity is much lower than the O(N*N) complexity of general DFTs.


FFTs only convert amplitude versus time data into frequency data. In order to get frequency data over time, we must split the incoming audio data into windows. Each window consists of a subsequent set of N audio samples, where N is the window size. For example, if the window size is 16 samples, then the first window consists of samples 0 to 15. The second consists of samples 16-31, and so on.


Because each window represents the amplitudes encountered during a specific time interval, the FFT of any particular window will represent the frequencies encountered during that specific time interval. Putting these FFTs next to each other on a graph gives a plot of frequency components versus time.


How the Sample Works

Loading Audio Data

The CAudioData class handles loading the sound data from the wave file. Internally it utilizes the CWaveFile sample class to load the actual data from the wave file. The CAudioData class splits the wave data into channels and stores the data as floating point numbers regardless of the internal format of the wave file.


Getting Audio Data onto the GPU

The audio data loaded from the CAudioData class is placed into a texture. Because the FFT works on complex numbers, the texture contains two color channels, red and green. Red represents the real components corresponding to the amplitudes in the wave file. Green represents the imaginary components and is set to 0 when the texture is populated with wave data. For GPUSpectrogram, two such textures are used. One is used as the render target while the other is bound as a texture input. When the operation is complete, the two are switched.


The texture is exactly N elements wide, where N is the size of the window. Each row of the texture contains exactly N audio samples. For a wave file that contains M audio samples, the texture will be N by ceil(M/N) texels in size. This allows the GPU to perform an FFT on each row of the texture in parallel.

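A sketch of how one of the two complex-number textures might be created; the red channel holds the real part, the green channel the imaginary part, and the dimensions follow the N by ceil(M/N) layout described above. The variable names are illustrative.

// Illustrative: one of the two ping-pong textures. It must be bindable both
// as a render target and as a shader input. uiWindowSize is N and
// uiNumSamples is M in the description above.
D3D10_TEXTURE2D_DESC desc;
ZeroMemory( &desc, sizeof(desc) );
desc.Width            = uiWindowSize;                                        // N
desc.Height           = ( uiNumSamples + uiWindowSize - 1 ) / uiWindowSize;  // ceil(M/N)
desc.MipLevels        = 1;
desc.ArraySize        = 1;
desc.Format           = DXGI_FORMAT_R32G32_FLOAT;  // red = real, green = imaginary
desc.SampleDesc.Count = 1;
desc.Usage            = D3D10_USAGE_DEFAULT;
desc.BindFlags        = D3D10_BIND_RENDER_TARGET | D3D10_BIND_SHADER_RESOURCE;

ID3D10Texture2D* pTex = NULL;
pd3dDevice->CreateTexture2D( &desc, NULL, &pTex );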

GPU FFT Algorithm

The bulk of the GPUSpectrogram calculations are performed using Render-to-Texture type operations. In these operations, the viewport is set to the exact size of the texture. The texture with the most recent data is bound as the input texture, while the other texture is bound as the render target. A quad whose texture coordinates correspond exactly with the input texture is drawn to exactly cover the render target. When the operation completes, the render target texture contains the most recent data.


GPUSpectrogram leverages the Danielson-Lanczos FFT algorithm. This document will not go into the details of the algorithm, only the GPU implementation of it. The first part of the algorithm sorts the audio data according to the reverse bit order of the column index. This is handled in the PSReverse pixel shader. This shader uses the subroutine ReverseBits to find the location of the data that will fill the current position after the sort.

(Translator's note: since a rasterized pixel cannot change its own position, each pixel shader invocation instead looks up the sample that belongs at its location after the sort.)

uint ReverseBits( uint x )
{
         //uses the SWAR algorithm for bit reversal of 16bits
         x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1));
         x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));
         x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4));     //8bits
 
         //uncomment for 9 bit reversal
         //x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8));   //16bits
         //push us back down into 9 bits
         //return (x >> (16-9)) & 0x000001ff;
 
         return x;
}
 
//
// PSReverse
//
float4 PSReverse( PSQuadIn input ) : COLOR
{        
         uint iCurrentIndex = input.tex.x;
         uint iRevIndex = ReverseBits( iCurrentIndex );
         
         float2 fTex = float2( (float)iRevIndex, (float)input.tex.y );
         fTex /= g_TextureSize;
         return tex2D( g_samPointClamp, g_txSource, fTex ).xyyy;
}

After the reverse sort, the actual FFT takes place. The outer two loops of the Danielson-Lanczos algorithm are performed on the CPU and consist of setting various shader constants. The inner loop is performed on the GPU. The final result is that the most recent render target contains the results of the FFT on the data.


//outer two loops
         UINT iterations = 0;
         float wtemp,wr,wpr,wpi,wi,theta;
         UINT n = g_uiTexX;
         UINT mmax = 1;
         while( n > mmax )
         {
                 UINT istep = mmax << 1;
                 theta = 6.28318530717959f / ((float)mmax*2.0f);
                 wtemp = sin( 0.5f*theta );
                 wpr = -2.0f*wtemp*wtemp;
                 wpi = sin( theta );
                 wr = 1.0f;
                 wi = 0.0f;
 
                 for( UINT m=0; m < mmax; m++ )
                 {
                          //Inner loop is handled on the GPU
                          {
                                   g_pWR->AsScalar()->SetFloat(wr);
                                   g_pWI->AsScalar()->SetFloat(wi);
                                   g_pMMAX->AsScalar()->SetInt(mmax);
                                   g_pM->AsScalar()->SetInt(m);
                                   g_pISTEP->AsScalar()->SetInt(istep);
 
                                   if( 0 == iterations%2 )
                                   {
                                            g_ptxSource->AsShaderResource()->SetResource( g_pDestTexRV );
                                            RenderToTexture( pd3dDevice, g_pSourceRTV, false, g_pFFTInner );
                                   }
                                   else
                                   {
                                            g_ptxSource->AsShaderResource()->SetResource( g_pSourceTexRV );
                                            RenderToTexture( pd3dDevice, g_pDestRTV, false, g_pFFTInner );
                                   }
                                   pd3dDevice->OMSetRenderTargets( 1, apOldRTVs, pOldDS );
 
                                   iterations++;
                          }
 
                          wtemp = wr;
                          wr = wtemp*wpr-wi*wpi+wr;
                          wi = wi*wpr+wtemp*wpi+wi;
                 }
                 mmax = istep;
         }

Complexity Considerations

A look at the implementation of the FFT shows that the inner loop gets executed N times, where N is the size of the window. Because the pixel shader is run for every pixel in the row, and a row is N samples wide, the time complexity could be said to be O(N*N). This is no better than the general DFT. However, taking a closer look at the PSFFTInner pixel shader reveals that most of the work is only done when special cases are met. Each execution of the shader performs a texture load, a modulus operation, and two if statements. However, if either of the if statements is true, which happens 2*log(N) times out of N, many more operations are performed. Because of this, the implementation is actually an O( M*N*N + 2*L*N*log(N) ) operation, where M < L. This is still not O( N*log(N) ), but it does perform fewer operations than the O(N*N) DFT.


Instancing Sample

This sample demonstrates the use of Instancing and Texture Arrays to reduce the number of draw calls required to render a complex scene. In addition, it uses AlphaToCoverage to avoid sorting semi-transparent primitives.


How the Sample Works

Reducing the number of draw calls made in any given frame is one way to improve graphics performance for a 3D application. The need for multiple draw calls in a scene arises from the different states required by different parts of the scene. These states often include matrices and material properties. One way to combat these issues is to use Instancing and Texture Arrays. In this sample, instancing enables the application to draw the same object multiple times in multiple places without the need for the CPU to update the world matrix for each object. Texture arrays allow multiple textures to be loaded into same resource and to be indexed by an extra texture coordinate, thus eliminating the need to change texture resources when a new object is drawn.


The Instancing sample draws several trees, each with many leaves, and several blades of grass using 3 draw calls. To achieve this, the sample uses one tree mesh, one leaf mesh, and one blade mesh instanced many times throughout the scene and drawn with DrawIndexedInstanced. To achieve variation in the leaf and grass appearance, texture arrays are used to hold different textures for both the leaf and grass instances. AlphaToCoverage allows the sample to further unburden the CPU and draw the leaves and blades of grass in no particular order. The rest of the environment is drawn in 6 draw calls.


Instancing the Tree

In order to replicate a tree the sample needs two pieces of information. The first is the mesh information. In this case, the mesh is loaded from tree_super.x. The second piece of information is a buffer containing a list of matrices that describe the locations of all tree instances. The sample uses IASetVertexBuffers to bind the mesh information to vertex stream 0 and the matrices to stream 1. To get this information into the shader correctly, the following InputLayout is used:


const D3D10_INPUT_ELEMENT_DESC instlayout[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D10_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 12, D3D10_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXTURE0", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 24, D3D10_INPUT_PER_VERTEX_DATA, 0 },
    { "mTransform", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 0, D3D10_INPUT_PER_INSTANCE_DATA, 1 },
    { "mTransform", 1, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 16, D3D10_INPUT_PER_INSTANCE_DATA, 1 },
    { "mTransform", 2, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 32, D3D10_INPUT_PER_INSTANCE_DATA, 1 },
    { "mTransform", 3, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 48, D3D10_INPUT_PER_INSTANCE_DATA, 1 },
};
 

The vertex shader will be called (number of vertices in mesh)*(number of instance matrices) times. Because the matrix is a shader input, the shader can position the vertex at the correct location according to which instance it happens to be processing.

 

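A sketch of binding the two streams and issuing the instanced draw; the buffer and count variables here are illustrative, not the sample's actual names.

// Stream 0: tree mesh vertices. Stream 1: one world matrix per tree instance.
ID3D10Buffer* apBuffers[2] = { g_pTreeMeshVB, g_pInstanceMatrixVB };
UINT auiStrides[2] = { sizeof(TREE_VERTEX), sizeof(D3DXMATRIX) };
UINT auiOffsets[2] = { 0, 0 };
pd3dDevice->IASetVertexBuffers( 0, 2, apBuffers, auiStrides, auiOffsets );
pd3dDevice->IASetIndexBuffer( g_pTreeMeshIB, DXGI_FORMAT_R32_UINT, 0 );

// One call draws every tree; the vertex shader runs
// (indices per tree) * (number of tree instances) times.
pd3dDevice->DrawIndexedInstanced( uiIndicesPerTree, uiNumTreeInstances, 0, 0, 0 );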

 

Instancing the Leaves

Because one leaf is instanced over an entire tree and one tree is instanced several times throughout the sample, the leaves must be handled differently than the tree and grass meshes. The matrices for the trees are loaded into a constant buffer. The InputLayout is set up to make sure the shader sees the leaf mesh data m_iNumTreeInstances times before stepping to the next leaf matrix. The last element, fOcc, is a baked occlusion term used to shade the leaves.


 
const D3D10_INPUT_ELEMENT_DESC leaflayout[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D10_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXTURE0", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 12, D3D10_INPUT_PER_VERTEX_DATA, 0 },
    { "mTransform", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 0, D3D10_INPUT_PER_INSTANCE_DATA, m_iNumTreeInstances },
    { "mTransform", 1, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 16, D3D10_INPUT_PER_INSTANCE_DATA, m_iNumTreeInstances },
    { "mTransform", 2, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 32, D3D10_INPUT_PER_INSTANCE_DATA, m_iNumTreeInstances },
    { "mTransform", 3, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 48, D3D10_INPUT_PER_INSTANCE_DATA, m_iNumTreeInstances },
    { "fOcc", 0, DXGI_FORMAT_R32_FLOAT, 1, 64, D3D10_INPUT_PER_INSTANCE_DATA, m_iNumTreeInstances },
};

The Input Assembler automatically generates an InstanceID which can be passed into the shader. The following snippet of shader code demonstrates how the leaves are positioned.


int iTree = input.InstanceId % g_iNumTrees;
float4 vInstancePos = mul( float4(input.pos, 1), input.mTransform );
float4 InstancePosition = mul( vInstancePos, g_mTreeMatrices[iTree] );

If there were 3 trees in the scene, the leaves would be drawn in the following order: Tree1, leaf1; Tree2, leaf1; Tree3, leaf1; Tree1, leaf2; Tree2, leaf2; etc...


Instancing the Grass

Grass rendering is handled differently than the Tree and Leaves. Instead of using the input assembler to instance the grass using a separate stream of matrices, the grass is dynamically generated in the geometry shader. The top of the island mesh is passed to the vertex shader, which passes this information directly to the GSGrassmain geometry shader. Depending on the grass density specified, GSGrassmain calculates pseudo-random positions on the current triangle that correspond to grass positions. These positions are then passed to a helper function that creates a blade of grass at the point. A 1D texture of random floating point values is used to provide the pseudo-random numbers. It is indexed by the vertex ids of the input mesh. This ensures that the random distribution doesn't change from frame to frame.

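A sketch of how the 1D texture of random values might be created; the element count and names are illustrative.

// Illustrative: fill a 1D texture with random floats for the grass GS.
const UINT uiRandomCount = 1024;
float afRandom[uiRandomCount];
for( UINT i = 0; i < uiRandomCount; ++i )
    afRandom[i] = (float)rand() / (float)RAND_MAX;

D3D10_TEXTURE1D_DESC desc;
ZeroMemory( &desc, sizeof(desc) );
desc.Width     = uiRandomCount;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format    = DXGI_FORMAT_R32_FLOAT;
desc.Usage     = D3D10_USAGE_IMMUTABLE;
desc.BindFlags = D3D10_BIND_SHADER_RESOURCE;

D3D10_SUBRESOURCE_DATA initData;
ZeroMemory( &initData, sizeof(initData) );
initData.pSysMem = afRandom;

ID3D10Texture1D* pRandomTex = NULL;
pd3dDevice->CreateTexture1D( &desc, &initData, &pRandomTex );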

Alternatively, the grass can be rendered in the same way as the tree by placing the vertices of a quad into the first vertex stream and the matrices for each blade of grass in the second vertex stream. The fOcc element of the second stream can be used to place precalculated shadows on the blades of grass (just as it is used to precalculate shadows on the leaves). However, the storage space for a stream of several hundred thousand matrices is a concern even on modern graphics hardware. The grass generation method of using the geometry shader, while lacking built-in shadows, uses far less storage.


Changing Leaves with Texture Arrays

Texture arrays are just what the name implies. They are arrays of textures, each with full mip-chains. For a texture2D array, the array is indexed by the z coordinate. Because the InstanceID is passed into the shader, the sample uses InstanceID%numArrayIndices to determine which texture in the array to use for rendering that specific leaf or blade of grass.

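A sketch of how a texture2D array and a view over all of its slices might be created. The sizes and names are illustrative, and a real leaf texture would carry a full mip chain rather than the single level used here.

// Illustrative: a texture2D array with uiArraySize leaf textures.
D3D10_TEXTURE2D_DESC desc;
ZeroMemory( &desc, sizeof(desc) );
desc.Width            = 256;
desc.Height           = 256;
desc.MipLevels        = 1;            // a real array would carry a full mip chain
desc.ArraySize        = uiArraySize;  // one slice per leaf texture
desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.Usage            = D3D10_USAGE_DEFAULT;
desc.BindFlags        = D3D10_BIND_SHADER_RESOURCE;

ID3D10Texture2D* pTexArray = NULL;
pd3dDevice->CreateTexture2D( &desc, NULL, &pTexArray );

// A view covering every slice; the shader then indexes the array with the
// z texture coordinate.
D3D10_SHADER_RESOURCE_VIEW_DESC srvDesc;
ZeroMemory( &srvDesc, sizeof(srvDesc) );
srvDesc.Format                         = desc.Format;
srvDesc.ViewDimension                  = D3D10_SRV_DIMENSION_TEXTURE2DARRAY;
srvDesc.Texture2DArray.MostDetailedMip = 0;
srvDesc.Texture2DArray.MipLevels       = 1;
srvDesc.Texture2DArray.FirstArraySlice = 0;
srvDesc.Texture2DArray.ArraySize       = uiArraySize;

ID3D10ShaderResourceView* pSRV = NULL;
pd3dDevice->CreateShaderResourceView( pTexArray, &srvDesc, &pSRV );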

Drawing Transparent Objects with Alpha To Coverage

The number of transparent leaves and blades of grass in the sample makes sorting these objects on the CPU expensive. Alpha to coverage helps solve this problem by allowing the Instancing sample to produce convincing results without the need to sort leaves and grass back to front. Alpha to coverage must be used with multisample anti-aliasing (MSAA). MSAA is a method to get edge anti-aliasing by evaluating triangle coverage at a higher frequency on a higher resolution z-buffer. With alpha to coverage, the MSAA mechanism can be tricked into creating pseudo order-independent transparency. Alpha to coverage generates an MSAA coverage mask for a pixel based upon the pixel shader output alpha. That result gets ANDed with the coverage mask for the triangle. This process is similar to screen-door transparency, but at the MSAA level.

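Enabling alpha to coverage is a single flag in the blend state description. A minimal sketch, assuming an MSAA render target is already bound; the variable names are illustrative.

// Illustrative: blend state with alpha to coverage enabled.
D3D10_BLEND_DESC blendDesc;
ZeroMemory( &blendDesc, sizeof(blendDesc) );
blendDesc.AlphaToCoverageEnable = TRUE;
blendDesc.SrcBlend       = D3D10_BLEND_ONE;
blendDesc.DestBlend      = D3D10_BLEND_ZERO;
blendDesc.BlendOp        = D3D10_BLEND_OP_ADD;
blendDesc.SrcBlendAlpha  = D3D10_BLEND_ONE;
blendDesc.DestBlendAlpha = D3D10_BLEND_ZERO;
blendDesc.BlendOpAlpha   = D3D10_BLEND_OP_ADD;
for( UINT i = 0; i < 8; ++i )
{
    blendDesc.BlendEnable[i] = FALSE;
    blendDesc.RenderTargetWriteMask[i] = D3D10_COLOR_WRITE_ENABLE_ALL;
}

ID3D10BlendState* pA2CBlendState = NULL;
pd3dDevice->CreateBlendState( &blendDesc, &pA2CBlendState );

// Bind it before drawing the leaves and grass.
FLOAT afBlendFactor[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
pd3dDevice->OMSetBlendState( pA2CBlendState, afBlendFactor, 0xffffffff );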

Alpha to coverage is not designed for true order-independent transparency like window glass, but it works great for cases where alpha is being used to represent coverage, like in a mipmapped leaf texture.


ParticlesGS Sample

This sample implements a complete particle system on the GPU using the Direct3D 10 Geometry Shader, Stream Output, and DrawAuto.


 

Example image:

How the Sample Works

Particle system computation has traditionally been performed on the CPU with the GPU rendering the particles as point sprites for visualization. With Geometry Shaders, the ability to output arbitrary amounts of data to a stream allows the GPU to create new geometry and to store the result of computations on existing geometry.


This sample uses the stream out capabilities of the geometry shader to store the results of particles calculations into a buffer. Additionally, the Geometry Shader controls particle birth and death by either outputting new geometry to the stream or by avoiding writing existing geometry to the stream. A Geometry Shader that streams output to a buffer must be constructed differently from a normal Geometry Shader.


When used inside an FX file

//--------------------------------------------------------------------------------------
// Construct StreamOut Geometry Shader
//--------------------------------------------------------------------------------------
geometryshader gsStreamOut = ConstructGSWithSO(compile gs_4_0 GSAdvanceParticlesMain(),
                                                "POSITION.xyz;
                                                NORMAL.xyz;
                                                TIMER.x;
                                                TYPE.x" );

When used without FX

//--------------------------------------------------------------------------------------
// Construct StreamOut Geometry Shader
//--------------------------------------------------------------------------------------
D3D10_STREAM_OUTPUT_DECLARATION_ENTRY pDecl[] =
{
         // semantic name, semantic index, start component, component count, output slot
         { "POSITION", 0, 0, 3, 0 }, // output first 3 components of "POSITION"
         { "NORMAL", 0, 0, 3, 0 },   // output the first 3 components of "NORMAL"
         { "TIMER", 0, 0, 1, 0 },    // output the first component of "TIMER"
         { "TYPE", 0, 0, 1, 0 },     // output the first component of "TYPE"
};
 
CreateGeometryShaderWithStreamOut( pShaderData, pDecl, 4, sizeof(PARTICLE_VERTEX), &pGS );

Particle Types

This particle system is composed of 5 different particle types with varying properties. Each particle type has its own velocity and behavior and may or may not emit other particles.


 

Launcher Particles

Launcher particles do not move and do not die. They simply count down until they can emit a Shell particle. Once they have emitted a Shell particle, they reset their timer.


Shell Particles

Shell particles are single particles that are given random velocities and launched into the air by Launcher particles. They are meant to represent fireworks shells before they explode. When a Shell particle reaches the end of its lifespan it does not re-emit itself into the system. Instead, it emits several Ember1 and Ember2 type particles.


Ember1 Particles

Ember1 particles are emitted from Shell particles when they explode. Ember1 particles have short lifespans and fade out as their timer counts down. When their timer reaches zero, these particles do not re-emit themselves into the system and effectively "die."


Ember2 Particles

Ember2 particles are also emitted from Shell particles when they explode. Unlike Ember1 particles, when Ember2 particles reach the end of their lifespans they emit Ember3 particles. These are the source of the secondary explosions in the system.

    Ember2 particles are also emitted when Shell particles explode. Unlike Ember1 particles, when Ember2 particles reach the end of their lifespan they emit Ember3 particles; they are the source of the secondary explosions in the system.

Ember3 Particles

Ember3 particles are similar to Ember1 particles except that they are of a different color and have a shorter lifespan.

    Ember3 particles are similar to Ember1 particles, except that they have a different color and a shorter lifespan.

Handling Particles

    Particles are handled entirely on the GPU by the Geometry Shader. Instead of going to the rasterizer, the vertex data passed into the geometry shader is output to another vertex buffer. After an initial seeding of the vertex buffer with LAUNCHER type particles, the system can sustain itself on the GPU with only per-frame timing information coming from the CPU.

    The particle system is handled entirely by the GS on the GPU. Instead of going to the rasterizer, the vertex data output by the GS is written to another vertex buffer. After the vertex buffer is seeded with Launcher particles, the system sustains itself on the GPU with only per-frame timing information coming from the CPU.

The sample uses 3 buffers consisting of vertex data to facilitate a fireworks particle system. The first stream contains the initial particles needed to "seed" the system. It is used only once during the first frame that the particle system is active. The second and third buffers are ping-pong buffers and trade off being streamed to and rendered from every other frame.

    The sample uses 3 vertex buffers for the fireworks system. The first holds the initial seed particles and is used only once, on the first frame the system is active. The second and third are ping-pong buffers that trade roles every other frame (one receives the GS stream output while the other feeds the VS as input).

The particle system works in the following manner:

1.     The seed buffer is filled with an initial launcher particle

The seed buffer is filled with the initial Launcher particle.

2.     The first time through the GS, the GS sees that the LAUNCHER is at 0 and emits a SHELL at the launcher position. NOTE: Because the particle system is rebuilt every pass through the GS, any particles that are necessary for the next frame need to be emitted, not just new particles.

The first time through the GS, the GS sees that the Launcher timer is at 0 and emits a Shell at the launcher position. Note: because the particle system is rebuilt on every pass through the GS, every particle needed for the next frame must be emitted to the stream, not just the newly created ones.

3.     The second time through the GS, the LAUNCHER and SHELL timers are decremented and the SHELL is moved to a new position.

The second time through the GS, the Launcher and Shell timers are decremented and the Shell is moved to a new position.

4.     The SHELL timer is decremented to 0, which means that this is the last frame for this SHELL.

The Shell timer reaches 0, meaning this is the Shell's last frame.

5.     Because its timer is at 0, the SHELL is not emitted again. In its place, 4 EMBER particles are emitted.

Because its timer is at 0, the Shell is not emitted again; in its place, 4 Ember particles are emitted.

6.     The LAUNCHER is at zero, and therefore must emit another SHELL particle. The EMBER particles are moved to a new position and have their timers decremented as well. The LAUNCHER timer is reset.

The Launcher timer is at zero again, so it emits another Shell particle. The Ember particles are moved to new positions and their timers are decremented as well, and the Launcher timer is reset. (A sketch of this emit-or-drop logic follows below.)
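
The walkthrough above maps naturally onto a stream-out Geometry Shader. The sketch below is a minimal illustration of the emit-or-drop pattern, not the sample's actual GSAdvanceParticlesMain; the struct layout matches the stream-out declaration shown earlier, but the type constants, the per-frame constant g_fElapsedTime, and all launch velocities and lifetimes are assumptions.

#define PT_LAUNCHER 0
#define PT_SHELL    1
#define PT_EMBER    2

struct Particle
{
    float3 pos   : POSITION;   // streamed out as POSITION.xyz
    float3 vel   : NORMAL;     // velocity, streamed out as NORMAL.xyz
    float  timer : TIMER;      // time until the next event
    float  type  : TYPE;       // PT_LAUNCHER, PT_SHELL, PT_EMBER
};

cbuffer cbPerFrame { float g_fElapsedTime; };  // assumed per-frame constant

[maxvertexcount(8)]
void GSAdvanceSketch( point Particle In[1], inout PointStream<Particle> SOStream )
{
    Particle p = In[0];
    p.timer -= g_fElapsedTime;

    if( p.type == PT_LAUNCHER )
    {
        if( p.timer <= 0 )
        {
            Particle shell = p;                  // launch a new shell
            shell.type  = PT_SHELL;
            shell.vel   = float3( 0, 30, 0 );    // assumed launch velocity
            shell.timer = 2.0f;                  // assumed shell lifespan
            SOStream.Append( shell );
            p.timer = 1.0f;                      // reset the launcher countdown
        }
        SOStream.Append( p );                    // launchers never die
    }
    else if( p.type == PT_SHELL )
    {
        if( p.timer > 0 )
        {
            p.pos += p.vel * g_fElapsedTime;     // integrate motion
            p.vel += float3( 0, -9.8f, 0 ) * g_fElapsedTime;
            SOStream.Append( p );
        }
        else
        {
            for( int i = 0; i < 4; i++ )         // explode into embers
            {
                Particle ember = p;
                ember.type  = PT_EMBER;
                ember.vel   = float3( i - 1.5f, 1, 0 ) * 5.0f;  // assumed scatter
                ember.timer = 0.5f;
                SOStream.Append( ember );
            }
        }
    }
    else if( p.timer > 0 )                       // embers: move on, or die
    {
        p.pos += p.vel * g_fElapsedTime;
        SOStream.Append( p );
    }
    // any particle that is not appended is dropped from the stream: it "dies"
}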

Knowing How Many Particles Were Output

Geometry Shaders can emit a variable amount of data each frame. Because of this, the sample has no way of knowing how many particles are in the buffer at any given time. Using standard Draw calls, the sample would have to guess at the number of particles to tell the GPU to draw. Fortunately, DrawAuto is designed to handle this situation. DrawAuto allows the dynamic amount of data written to the stream-out buffer to be used as the input amount of data for the draw call. Because this happens on the GPU, the CPU can advance and draw the particle system with no knowledge of how many particles actually comprise the system.

    Because the GS can emit a variable amount of data each frame, the sample cannot know how many particles exist at any given moment. With a standard Draw call it would have to guess how many particles to ask the GPU to draw. Fortunately DrawAuto handles this: the amount of data dynamically written to the stream-out buffer becomes the input amount for the draw call. Since this happens on the GPU, the CPU can advance and draw the particle system without knowing how many particles it contains.
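
A minimal C++ sketch of the per-frame buffer rotation follows. The buffer pointers, first-frame flag, and the elided technique application are assumptions; only the ID3D10Device methods are the real Direct3D 10 API.

#include <d3d10.h>
#include <utility>   // std::swap

void AdvanceAndRenderParticles( ID3D10Device* pd3dDevice,
                                ID3D10Buffer*& pDrawFrom,  // read side of the ping-pong pair
                                ID3D10Buffer*& pStreamTo,  // write side of the ping-pong pair
                                bool bFirstFrame, UINT stride )
{
    UINT offset = 0;

    // Pass 1: advance particles. Read from one buffer, stream out into the other.
    pd3dDevice->IASetVertexBuffers( 0, 1, &pDrawFrom, &stride, &offset );
    pd3dDevice->SOSetTargets( 1, &pStreamTo, &offset );
    // ... apply the gsStreamOut technique pass here ...
    if( bFirstFrame )
        pd3dDevice->Draw( 1, 0 );   // the seed buffer holds one launcher particle
    else
        pd3dDevice->DrawAuto();     // draw however many particles were streamed out last frame

    // Unbind the stream-out target so the buffer can be used as an input.
    ID3D10Buffer* pNullBuffer = NULL;
    pd3dDevice->SOSetTargets( 1, &pNullBuffer, &offset );

    // Pass 2: render the freshly written buffer, again without knowing its size.
    pd3dDevice->IASetVertexBuffers( 0, 1, &pStreamTo, &stride, &offset );
    // ... apply the rendering technique pass here ...
    pd3dDevice->DrawAuto();

    // Ping-pong the buffers for the next frame.
    std::swap( pDrawFrom, pStreamTo );
}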

Rendering Particles

After the particles are advanced by the gsStreamOut Geometry Shader, the buffer that just received the output is used in a second pass for rendering the particles as point sprites. The VSSceneMain Vertex Shader takes care of assigning size and color to the particles based upon their type and age. GSSceneMain constructs point sprites from the points by emitting a two-triangle strip for every point that is passed in. In this pass, the Geometry Shader output is passed to the rasterizer and does not stream out to any buffer.

    After the particles are advanced by the gsStreamOut GS, the buffer that just received the output is used in a second pass to render the particles as point sprites. The VS VSSceneMain assigns size and color based on each particle's type and age, and the GS GSSceneMain builds a point sprite from each input point by emitting a two-triangle strip. In this pass the GS output goes to the rasterizer instead of streaming out to a buffer.
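
A minimal sketch of that expansion follows. The input/output structures, camera axes, and constant names are assumptions; it shows the standard pattern of emitting a 4-vertex (two-triangle) strip per input point.

struct VSParticleOut { float3 pos : POSITION; float size : SIZE; float4 color : COLOR0; };
struct PSSceneIn     { float4 pos : SV_Position; float2 tex : TEXCOORD0; float4 color : COLOR0; };

cbuffer cbPerFrame
{
    float4x4 g_mWorldViewProj;   // assumed camera matrix
    float3   g_vRight, g_vUp;    // assumed camera-aligned axes for billboarding
};

[maxvertexcount(4)]
void GSPointSpriteSketch( point VSParticleOut In[1],
                          inout TriangleStream<PSSceneIn> SpriteStream )
{
    const float2 corners[4] = { float2(-1,-1), float2(-1,1), float2(1,-1), float2(1,1) };

    for( int i = 0; i < 4; i++ )
    {
        PSSceneIn v;
        float3 posW = In[0].pos
                    + ( corners[i].x * g_vRight + corners[i].y * g_vUp ) * In[0].size;
        v.pos   = mul( float4( posW, 1 ), g_mWorldViewProj );
        v.tex   = corners[i] * 0.5f + 0.5f;   // map the corner to [0,1] texcoords
        v.color = In[0].color;
        SpriteStream.Append( v );
    }
    SpriteStream.RestartStrip();              // one quad per input point
}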

PipesGS Sample

This sample implements two vine generation algorithms on the GPU using the StreamOut capabilities of the Geometry Shader and DrawAuto.

    This sample implements two vine generation algorithms on the GPU using the GS, stream out, and DrawAuto.

Example image:

How the Sample Works

The vines for VinesGS are generated using two algorithms. The first algorithm starts the vines at the face centers of random mesh faces and grows the vines away from the normal of the face. The second starts the vines at the face centers of random mesh faces, but then grows the vines along the tangent of the face in the direction of one of the nearest face centers. The details of the algorithms can be found by looking through the geometry shader code for both types of vine generation.

    VinesGS generates the vines with two algorithms. The first starts a vine at the center of a randomly chosen mesh face and grows it away from the face normal; the second also starts at a random face center but grows the vine along the face tangent, toward one of the nearest face centers. The details of both algorithms can be found in the geometry shader code.

The focus of the sample is geometry amplification and deamplification using the Geometry Shader. As with ParticlesGS, the geometry is handled in two passes. The first pass grows the pipes, while the second renders them.

    The focus of the sample is geometry amplification and de-amplification in the GS. As in ParticlesGS, the geometry is handled in two passes: the first grows the pipes, the second renders them.

Growing Pipes

For more details on geometry amplification in the Geometry Shader, see ParticlesGS.

    For more details on geometry amplification in the GS, see the ParticlesGS sample above.

For this sample, the pipe segments are tracked by a line list. Each point in the line list represents one pipe segment. The end of the pipe segment is the only area where a new pipe segment can grow from. This is labeled as a Grow point. All other points are either labeled as Start points (the start of a new vine) or Static points. Start points differentiate where one pipe stops and another pipe begins. Static points can only age and die when their timer is up. When the Geometry Shader encounters a Grow point, it converts it to a Static point. It then outputs another Grow point a short distance away from the current point in a direction defined by the generation algorithm. For Static points, the shader keeps track of the time it's been alive as well as the total time that the vine it belongs to has been alive. When a Static point's timer runs out, it is not output to the stream-out buffer. This effectively kills the point.

    In this sample the pipe segments are tracked as a line list; each point in the list represents one pipe segment. The end of a pipe is the only place a new segment can grow from, and such points are labeled Grow points. All other points are either Start points (the start of a new vine, marking where one pipe ends and another begins) or Static points, which can only age and die when their timer runs out. When the GS encounters a Grow point it converts it to a Static point, then outputs a new Grow point a short distance away, in the direction chosen by the generation algorithm (along the normal or the tangent). For Static points the GS tracks both the point's age and the age of the vine it belongs to; when a Static point's timer expires it is simply not written to the stream-out buffer, which effectively kills it. (See the sketch below.)
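
A minimal sketch of that point state machine follows. The type constants, constant names, and the fixed growth direction are assumptions; the real direction choice lives in the sample's two generation GS variants.

#define PT_GROW   0
#define PT_STATIC 1
#define PT_START  2

struct PipePoint
{
    float3 pos    : POSITION;
    float  timer  : TIMER;
    float  type   : TYPE;
    float  leaves : LEAVES;
};

cbuffer cbPerFrame { float g_fElapsedTime; float g_fSegmentLength; };  // assumed constants

[maxvertexcount(2)]
void GSGrowPipesSketch( point PipePoint In[1], inout PointStream<PipePoint> SOStream )
{
    PipePoint p = In[0];
    p.timer -= g_fElapsedTime;

    if( p.type == PT_GROW )
    {
        p.type = PT_STATIC;                  // the old tip stops growing
        SOStream.Append( p );

        PipePoint tip = p;                   // a new tip appears a short step away
        tip.type = PT_GROW;
        float3 dir = float3( 0, 1, 0 );      // stand-in for the normal/tangent-based choice
        tip.pos += dir * g_fSegmentLength;
        SOStream.Append( tip );
    }
    else if( p.timer > 0 )
    {
        SOStream.Append( p );                // Start/Static points simply age
    }
    // points whose timer has expired are not appended, and so die
}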

 

Growing Leaves

PipesGS also grows leaves along the length of the pipe. Leaves are handled with a separate variable in the point structure called Leaves. For each point that is added to the line list, a random number in the range of [0..1] is selected inside the shader. If the number is below the leaf generation rate as defined by the application, the Leaves member of that point structure is set to a random number. Otherwise, Leaves is 0 for this point. During the rendering pass, the Geometry Shader renders a leaf with a random texture at that point if Leaves is non-zero.

    PipesGS also grows leaves on the vines. Leaves are a separate field, Leaves, in the point structure. For each point added to the list the shader picks a random number in [0..1]; if it is below the application-defined leaf generation rate, the point's Leaves field is set to a random value, otherwise it is 0. During the rendering pass, the GS draws a leaf with a random texture wherever Leaves is non-zero.

Drawing the Pipes and Leaves

(The SDK text here repeats the particle-type paragraph from ParticlesGS, apparently an editing error. Translator's note: this section should describe how a cylinder is generated around each line segment of the pipe.)

Additionally, points at each end of the cylinder are scaled according to the timing information of the closest point. Segments go through the following life cycle: young segments grow until they reach their maximum radius; they continue to age until they reach a time when they start to wither; during withering, the radius is scaled down to zero; once the radius is scaled to zero, the segment may die.

 

       The points at each end of a cylinder are scaled according to the timing information of the nearest point. A segment's life cycle runs as follows: a young segment grows until it reaches its maximum radius, keeps aging, and eventually starts to wither, during which its radius is scaled down to zero. Once the radius reaches zero, the segment can die.

For each pair of points, the shader checks to see whether the first point's Leaves field is non-zero. If it is, a leaf is drawn at this point. The leaf is made from a two-triangle strip that represents a quad oriented with the pipe at that point. The Z texture coordinate for the leaf quad is selected from the range of [1..5] based upon the value of Leaves. In the pixel shader, any leaves will fetch a texel from the indices [1..5] of the texture array, which have been preloaded as leaf textures. All pipe sections will use index 0 of the texture array, which is a bark texture.

    For each pair of points, the GS checks whether the first point's Leaves field is non-zero. If so, a leaf is drawn there: a quad built from a two-triangle strip, oriented with the pipe at that point. The quad's Z texture coordinate is chosen from the range [1..5] based on the Leaves value, so in the PS each leaf fetches from indices [1..5] of the texture array, which are preloaded leaf textures. All pipe sections use index 0 of the array, a bark texture. (A small lookup sketch follows.)
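
A small sketch of that lookup in shipped D3D10 HLSL syntax, assuming a Texture2DArray g_txLeafBark whose slice 0 is bark and slices 1..5 are leaves, and an input whose Z component was chosen by the Geometry Shader:

Texture2DArray g_txLeafBark;   // slice 0 = bark, slices 1..5 = leaf textures (assumed name)
SamplerState   g_samLinear;

float4 PSPipeSketch( float3 tex : TEXCOORD0 ) : SV_Target
{
    // tex.z was set by the GS: 0 for pipe geometry, 1..5 for a leaf quad
    return g_txLeafBark.Sample( g_samLinear, tex );
}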

Knowing How Many Pipe Segments are in the Buffer

Geometry Shaders can emit a variable amount of data each frame. Because of this, the sample has no way of knowing how many pipe segments are in the buffer at any given time. Using standard Draw calls, the sample would have to guess at the number of pipe segments to tell the GPU to draw. Fortunately, DrawAuto is designed to handle this situation. DrawAuto allows the dynamic amount of data written to the stream-out buffer to be used as the input amount of data for the draw call. Because this happens on the GPU, the CPU can advance and draw the pipes with no knowledge of how much geometry is actually in any of the buffers.

    In short, DrawAuto draws the geometry without the application ever knowing how many primitives the buffers contain.

 

ShadowVolume Sample

How the Sample Works

A shadow volume of an object is the region in the scene that is covered by the shadow of the object caused by a particular light source. When rendering the scene, all geometry that lies within the shadow volume should not be lit by the particular light source. A closed shadow volume consists of three parts: a front cap, a back cap, and the side. The front and back caps can be created from the shadow-casting geometry: the front cap from light-facing faces and the back cap from faces facing away from light. In addition, the back cap is formed by translating the front facing faces a large distance away from the light, to make the shadow volume long enough to cover enough geometry in the scene. The side is usually created by first determining the silhouette edges of the shadow-casting geometry then generating faces that represent the silhouette edges extruded for a large distance away from the light direction. Figure 1 shows different parts of a shadow volume.

    A shadow volume is the region of the scene covered by the shadow an object casts from a particular light source. When rendering, geometry inside the shadow volume should not be lit by that light. A closed shadow volume has three parts: a front cap, a back cap, and the sides. The front and back caps are built from the shadow-casting geometry itself: the front cap from the faces that face the light, the back cap from the faces that face away. The back cap is also translated a large distance away from the light, so the volume is long enough to cover the geometry in the scene. The sides are built by first finding the silhouette edges of the shadow caster and then extruding them a large distance away along the light direction. Figure 1 shows the parts of a shadow volume.

Figure 1: Creation of a shadow volume. The front cap (blue) and back cap (red) are created from the occluder's geometry. The back cap is translated to prolong the shadow volume, and the side faces (purple) are generated to enclose the shadow volume.

    Figure 1: Creating a shadow volume. The front cap (blue) and back cap (red) are built from the occluder's geometry; the back cap is translated to prolong the shadow volume, and the side faces (purple) are generated to enclose it.

This sample demonstrates a specific implementation of shadow volumes. Many traditional shadow volume approaches determine the silhouette and generate the shadow volume geometry on the CPU. This sample determines the silhouette in the geometry shader and uses the fact that the geometry shader can send a variable amount of data to the rasterizer to create new shadow volume geometry on the fly. The underlying idea is that triangles facing the light can be used as-is for the front cap of the shadow volume, and the back cap is generated by translating each vertex of those front-facing triangles a large distance along the light direction. However, a problem occurs at silhouette edges where one triangle faces the light and its neighbor faces away from the light. In this situation, the geometry shader extrudes two new triangles to create a quad to match up between the front cap of the shadow volume and the back cap.

    This sample demonstrates a particular implementation of shadow volumes. Many traditional approaches determine the silhouette and build the shadow volume geometry on the CPU; this sample determines the silhouette in the GS and exploits the GS's ability to emit a variable amount of geometry to build the shadow volume on the fly. Triangles that face the light are used as-is for the front cap, and the back cap is obtained by translating each vertex of those triangles a large distance along the light direction. A problem remains at silhouette edges, where one triangle faces the light and its neighbor faces away; there the GS extrudes two new triangles, forming a quad that joins the front cap to the back cap.

In order for the geometry shader to find silhouette edges it must know which faces are adjacent to each other. Fortunately, the geometry shader has support for a new type of input primitive, triangleadj. Triangleadj assumes that every other vertex is an adjacent vertex. The index buffer of the geometry must be modified to reflect the adjacency information of the mesh. The CDXUTMesh10::ConvertToAdjacencyIndices handles this by creating an index buffer in which every other value is the index of the adjacent vertex of the triangle that shares an edge with the current triangle. The index buffer will double in size due to the extra information being stored. The figure below demonstrates the ordering of the adjacency information.

 

    For the GS to find silhouette edges it must know which faces are adjacent to each other. Fortunately, the GS supports a new input primitive type, triangleadj, in which every other vertex in the index stream is an adjacency vertex. The mesh's index buffer must be modified to carry this adjacency information; CDXUTMesh10::ConvertToAdjacencyIndices does this by building an index buffer in which every other value is the index of the vertex of the neighboring triangle that shares an edge with the current triangle. The index buffer doubles in size because of the extra information. The figure below shows the ordering of the adjacency information, and a silhouette-detection sketch follows it.
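
A minimal sketch of silhouette detection over triangleadj input: vertices 0, 2, 4 form the current triangle and 1, 3, 5 are the adjacent vertices across its three edges. Constant names are assumptions, winding order is glossed over, and the front/back cap emission is omitted for brevity.

struct GSShadowIn { float3 pos : POSITION; };
struct PSShadowIn { float4 pos : SV_Position; };

cbuffer cbPerObject
{
    float4x4 g_mViewProj;    // assumed names
    float3   g_vLightPos;    // light position in the same space as pos
};

bool FacesLight( float3 a, float3 b, float3 c )
{
    float3 faceNormal = cross( b - a, c - a );
    return dot( faceNormal, g_vLightPos - a ) > 0;   // does the face point at the light?
}

float4 Extrude( float3 pos )
{
    // push the vertex to infinity along the light direction (w = 0)
    return mul( float4( pos - g_vLightPos, 0 ), g_mViewProj );
}

[maxvertexcount(12)]
void GSSilhouetteSketch( triangleadj GSShadowIn In[6],
                         inout TriangleStream<PSShadowIn> ShadowStream )
{
    // In[0], In[2], In[4] form the triangle; In[1], In[3], In[5] are the
    // adjacent vertices across edges (0,2), (2,4), and (4,0).
    if( !FacesLight( In[0].pos, In[2].pos, In[4].pos ) )
        return;                                  // only light-facing triangles spawn sides

    for( int e = 0; e < 6; e += 2 )
    {
        int next = ( e + 2 ) % 6;
        // The neighboring triangle across this edge is (e, e+1, next).
        if( !FacesLight( In[e].pos, In[e + 1].pos, In[next].pos ) )
        {
            // Silhouette edge: emit a quad joining the front cap to the back cap.
            PSShadowIn v;
            v.pos = mul( float4( In[e].pos, 1 ), g_mViewProj );    ShadowStream.Append( v );
            v.pos = Extrude( In[e].pos );                          ShadowStream.Append( v );
            v.pos = mul( float4( In[next].pos, 1 ), g_mViewProj ); ShadowStream.Append( v );
            v.pos = Extrude( In[next].pos );                       ShadowStream.Append( v );
            ShadowStream.RestartStrip();
        }
    }
    // (emitting the front and back cap triangles is omitted in this sketch)
}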

Rendering Shadows

At the top level, the rendering steps look like the following:

  • If ambient lighting is enabled, render the entire scene with ambient only.
  • For each light in the scene:
      • Disable depth-buffer and frame-buffer writing.
      • Prepare the stencil buffer render states for rendering the shadow volume.
      • Render the shadow volume mesh with a vertex extruding shader. This sets up the stencil buffer according to whether or not the pixels are in the shadow volume.
      • Prepare the stencil buffer render states for lighting.
      • Prepare the additive blending mode.
      • Render the scene for lighting with only the light being processed.

    At the top level the rendering steps are: if ambient lighting is enabled, render the entire scene with ambient only. Then, for each light in the scene: disable depth-buffer and frame-buffer writes; set up the stencil render states for shadow volume rendering; render the shadow volume mesh with a vertex extruding shader, which sets the stencil buffer according to whether each pixel is inside the shadow volume; set up the stencil render states for lighting; enable additive blending; and render the scene lit by only the light being processed.

The lights in the scene must be processed separately because different light positions require different shadow volumes, and thus different stencil bits get updated. Here is how the code processes each light in the scene in detail. First, it renders the shadow volume mesh without writing to the depth buffer and frame buffer. These buffers need to be disabled because the purpose of rendering the shadow volume is merely setting the stencil bits for pixels covered by the shadow, and the shadow volume mesh itself should not be visible in the scene. The shadow mesh is rendered using the depth-fail stencil shadow technique and a vertex extruding shader. In the shader, the vertex's normal is examined. If the normal points toward the light, the vertex is left where it is. However, if the normal points away from the light, the vertex is extruded to infinity. This is done by making the vertex's world coordinates the same as the light-to-vertex vector with a W value of 0. The effect of this operation is that all faces facing away from the light get projected to infinity along the light direction. Since faces are connected by quads, when one face gets projected and its neighbor does not, the quad between them is no longer degenerate. It is stretched to become the side of the shadow volume. Figure 6 shows this.

    Each light in the scene must be processed separately, because different light positions produce different shadow volumes and therefore update different stencil bits. In detail: first the shadow volume mesh is rendered with depth-buffer and frame-buffer writes disabled, since this pass only needs to set stencil bits for pixels covered by the shadow, and the shadow mesh itself must not be visible. The mesh is rendered with the depth-fail (zfail) stencil technique and a vertex extruding shader. The shader examines each vertex normal: if it points toward the light the vertex stays where it is; if it points away, the vertex is extruded to infinity by setting its world coordinates to the light-to-vertex vector with a W of 0. The effect is that all faces pointing away from the light are projected to infinity along the light direction. Since faces are connected by quads, when one face is projected and its neighbor is not, the quad between them is no longer degenerate: it stretches out to become a side of the shadow volume, as Figure 6 shows. (A sketch of the extrusion shader follows.)
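
A minimal sketch of that per-vertex test, with assumed constant names and assuming g_mWorld has no non-uniform scale:

cbuffer cbPerObject
{
    float4x4 g_mWorld;        // assumed constant names
    float4x4 g_mViewProj;
    float3   g_vLightPosW;    // light position in world space
};

float4 VSExtrudeSketch( float3 pos : POSITION, float3 normal : NORMAL ) : SV_Position
{
    float3 posW     = mul( float4( pos, 1 ), g_mWorld ).xyz;
    float3 normalW  = mul( normal, (float3x3)g_mWorld );
    float3 lightDir = posW - g_vLightPosW;            // light-to-vertex vector

    float4 extruded;
    if( dot( normalW, lightDir ) < 0 )
        extruded = float4( posW, 1 );                 // faces the light: leave it in place
    else
        extruded = float4( lightDir, 0 );             // faces away: project to infinity (w = 0)

    return mul( extruded, g_mViewProj );
}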

When rendering the shadow mesh with the depth-fail technique, the code first renders all back-facing triangles of the shadow mesh. If a pixel's depth value fails the depth comparison (usually this means the pixel's depth is greater than the value in the depth buffer), the stencil value is incremented for that pixel. Next, the code renders all front-facing triangles, and if a pixel's depth fails the depth comparison, the stencil value for the pixel is decremented. When the entire shadow volume mesh has been rendered in this fashion, the pixels in the scene that are covered by the shadow volume have a non-zero stencil value while all other pixels have a zero stencil. Lighting for the light being processed can then be done by rendering the entire scene and writing out pixels only if their stencil values are zero.

    With the depth-fail technique, the code first renders the back-facing triangles of the shadow mesh; wherever a pixel fails the depth test (its depth is greater than the value in the depth buffer), the stencil value is incremented. It then renders the front-facing triangles, decrementing the stencil wherever the depth test fails. After the whole shadow volume has been rendered this way, the pixels covered by the shadow volume have a non-zero stencil value and all other pixels have zero; lighting for this light is then applied by rendering the scene and writing only pixels whose stencil is zero. (A state sketch follows.)
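
A C++ sketch of a matching depth-stencil state follows. It uses Direct3D 10's two-sided stencil so both updates happen in a single pass over the shadow mesh with culling disabled; the sample as described instead uses two passes with opposite cull modes, which would set only one face's StencilDepthFailOp per state. Variable names are illustrative.

#include <d3d10.h>

HRESULT CreateDepthFailStencilState( ID3D10Device* pd3dDevice,
                                     ID3D10DepthStencilState** ppState )
{
    D3D10_DEPTH_STENCIL_DESC desc = {};
    desc.DepthEnable      = TRUE;
    desc.DepthWriteMask   = D3D10_DEPTH_WRITE_MASK_ZERO;  // depth writes stay disabled
    desc.DepthFunc        = D3D10_COMPARISON_LESS;
    desc.StencilEnable    = TRUE;
    desc.StencilReadMask  = 0xFF;
    desc.StencilWriteMask = 0xFF;

    // Back faces: increment stencil when the depth test fails.
    desc.BackFace.StencilFailOp      = D3D10_STENCIL_OP_KEEP;
    desc.BackFace.StencilDepthFailOp = D3D10_STENCIL_OP_INCR;  // wrapping increment
    desc.BackFace.StencilPassOp      = D3D10_STENCIL_OP_KEEP;
    desc.BackFace.StencilFunc        = D3D10_COMPARISON_ALWAYS;

    // Front faces: decrement stencil when the depth test fails.
    desc.FrontFace.StencilFailOp      = D3D10_STENCIL_OP_KEEP;
    desc.FrontFace.StencilDepthFailOp = D3D10_STENCIL_OP_DECR;  // wrapping decrement
    desc.FrontFace.StencilPassOp      = D3D10_STENCIL_OP_KEEP;
    desc.FrontFace.StencilFunc        = D3D10_COMPARISON_ALWAYS;

    return pd3dDevice->CreateDepthStencilState( &desc, ppState );
}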

Figure 5 illustrates the depth-fail technique. The orange block represents the shadow receiver geometry. Regions A, B, C, D and E are five areas in the frame buffer where the shadow volume is rendered. The numbers indicate the stencil value changes as the front and back faces of the shadow volume are rendered. In region A and E, both the front and back faces of the shadow volume fail the depth test, and therefore both cause the stencil value to change. For region A, the orange shadow receiver is causing the depth test to fail, and for region E, the cube's geometry is failing the test. The net result is that stencil values in this region stay at 0 after all faces are rendered. In region B and D, the front faces pass the depth test while the back faces fail, so the stencil values are not changed with the front faces, and the net stencil changes are 1. In region C, both the front and back faces pass the depth test, and so neither causes the stencil values to change, and the stencil values stay at 0 in this region. When the shadow volume is completely rendered, only the stencil values in regions B and D are non-zero, which correctly indicates that regions B and D are the only shadowed areas for this particular light.

    Figure 5 illustrates the depth-fail technique. The orange block is the shadow receiver; A, B, C, D, and E are five frame-buffer regions covered by the shadow volume, and the numbers show how the stencil values change as its front and back faces are rendered. In regions A and E both the front and back faces fail the depth test (in A the orange receiver occludes them, in E the cube's geometry does), so both change the stencil and the net result stays 0. In regions B and D the front faces pass the depth test while the back faces fail, so only the back faces change the stencil and the net change is 1. In region C both front and back faces pass, neither changes the stencil, and it stays 0. When the shadow volume is fully rendered, only regions B and D have non-zero stencil values, correctly marking them as the only shadowed areas for this light.

 

Performance Considerations

The current sample performs normal rendering and shadow rendering using the same mesh. Because the mesh class tracks only one index buffer at a time, adjacency information is sent to the shader even when it is not needed for shadow calculations. The shader must do extra work to remove this adjacency information in the geometry shader. To improve performance the application could keep two index buffers for each mesh. One would be the standard index buffer and would be used for non-shadow rendering. The second would contain adjacency information and only be used when extruding shadow volumes.

       The sample performs normal rendering and shadow rendering with the same mesh. Because the mesh class tracks only one index buffer at a time, adjacency information is sent to the shaders even when it is not needed, and the GS must do extra work to discard it. To improve performance, the application could keep two index buffers per mesh: a standard one for non-shadow rendering, and a second with adjacency information used only when extruding shadow volumes.

Finally, there is another area that could call for some performance optimization. As shown earlier, the rendering algorithm with shadow volumes requires that the scene be rendered in multiple passes (one plus the number of lights in the scene, to be precise). Every time the scene is rendered, the same vertices get sent to the device and processed by the vertex shaders. This can be avoided if the application employs deferred lighting with multiple rendertargets. With this technique, the application renders the scene once and outputs a color map, a normal map, and a position map. Then, in subsequent passes, it can retrieve the values in these maps in the pixel shader and apply lighting based on the color, normal and position data it reads. The benefit of doing this is tremendous. Each vertex in the scene only has to be processed once (during the first pass), and each pixel is processed exactly once in subsequent passes, thus ensuring that no overdraw happens in these passes.

    Finally, there is one more area open to optimization. As shown earlier, shadow volume rendering requires multiple passes over the scene (one plus the number of lights, to be precise), and each pass sends the same vertices through the vertex shader again. This can be avoided with deferred lighting and multiple render targets: the application renders the scene once, outputting a color map, a normal map, and a position map, and in subsequent passes the pixel shader reads those maps and applies lighting from the color, normal, and position data. The benefit is substantial: each vertex is processed only once (during the first pass), and each pixel exactly once in each subsequent pass, so those passes have no overdraw. (A G-buffer sketch follows.)
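
A minimal sketch of the G-buffer laydown pass such a deferred approach would use; the structure and semantic layout are assumptions:

struct PSGBufferOut
{
    float4 color  : SV_Target0;   // albedo
    float4 normal : SV_Target1;   // world-space normal, packed into [0,1]
    float4 posW   : SV_Target2;   // world-space position
};

PSGBufferOut PSFillGBufferSketch( float4 pos    : SV_Position,
                                  float3 nrmW   : NORMAL,
                                  float3 posW   : TEXCOORD0,
                                  float4 albedo : COLOR0 )
{
    PSGBufferOut o;
    o.color  = albedo;
    o.normal = float4( normalize( nrmW ) * 0.5f + 0.5f, 0 );  // pack [-1,1] into [0,1]
    o.posW   = float4( posW, 1 );
    return o;
}

Subsequent per-light passes would sample these three textures and run only the lighting math for each covered pixel.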

"Shadow Volume Artifacts

It is worth noting that a shadow volume is not a flawless shadow technique. Aside from the high fill-rate requirement and silhouette determination, the image rendered by the technique can sometimes contain artifacts near the silhouette edges, as shown in figure 9. The prime source of this artifact lies in the fact that when a geometry is rendered to cast shadow onto itself, its faces are usually entirely in shadow or entirely lit, depending on whether the face's normal points toward the light. Lighting computation, however, uses vertex normals instead of face normals. Therefore, for a face that is near-parallel to the light direction, it will either be all lit or all shadowed, when in truth, only part of it might really be in shadow. This is an inherent flaw of stencil shadow volume technique, and should be a consideration when implementing shadow support. The artifact can be reduced by increasing mesh details, at the cost of higher rendering time for the mesh. The closer to the face normal that the vertex normals get, the less apparent the artifact will be. If the application cannot reduce the artifact down to an acceptable level, it should also consider using other types of shadow technique, such as shadow mapping or pre-computed radiance transfer.

    Shadow volumes are not a flawless technique. Besides the high fill-rate cost and the silhouette determination, the rendered image can show artifacts near silhouette edges, as in figure 9. The main cause is that when geometry shadows itself, each face is either entirely in shadow or entirely lit, depending on whether its face normal points toward the light, while the lighting computation uses vertex normals rather than face normals. A face nearly parallel to the light direction is therefore rendered all lit or all shadowed, when in truth only part of it may be in shadow. This is an inherent flaw of the stencil shadow volume technique and should be considered when implementing shadows. The artifact can be reduced by increasing mesh detail, at the cost of longer rendering time: the closer the vertex normals are to the face normals, the less visible the artifact. If it cannot be reduced to an acceptable level, other techniques such as shadow mapping or precomputed radiance transfer should be considered.

SparseMorphTargets Sample

This sample implements mesh animation using morph targets.

    This sample implements mesh animation using morph targets.

Morph Targets

The SparseMorphTargets sample demonstrates facial animation by combining multiple morph targets on top of a base mesh to create different facial expressions.

    The sample demonstrates facial animation by combining multiple morph targets on top of a base mesh to create different facial expressions.

Morph targets are different poses of the same mesh. Each of the minor variations below is a deformation of the base mesh located in the upper left. By combining different poses together at different intensities, the sample can effectively create many different facial poses.

    Morph targets are different poses of the same mesh. Each of the small variations below is a deformation of the base mesh shown in the upper left; by combining the poses at different intensities, the sample can create many different facial expressions.

How the Sample Works

Preprocessing

In a preprocessing step, the vertex data for the base pose is stored into three 2D textures. These textures store position, normal, and tangent information respectively.

    In a preprocessing step, the vertex data of the base pose is stored into three 2D textures, holding position, normal, and tangent information respectively.

This base texture is stored into the .mt file along with index buffer data and vertex buffer data that contains only texture coordinates. Then, for each additional morph target, the mesh is converted to three 2D textures. The base mesh textures are subtracted from the three morph target textures. The smallest subrect that can fully contain all of the differences between the two textures is then stored into the .mt file. By only storing the smallest texture subrect that can hold all of the differences between the two meshes, the sample cuts down on the storage requirements for the morph targets.

    These base textures are stored in the .mt file along with the index buffer and a vertex buffer containing only texture coordinates. Then each additional morph target is also converted to three 2D textures, and the base mesh textures are subtracted from them. The smallest subrect that fully contains all the differences between the two is stored in the .mt file; storing only this smallest difference subrect cuts down the storage the morph targets require.

Applying Morph Targets at Runtime

Morph targets are handled in the MorphTarget.cpp and MorphTarget.h files. These contain classes that help with the application of the morph targets. To apply the different morph targets to the mesh, the sample sets up a texture2Darray render target with three array indices. The first holds position data, while the second and third hold normal and tangent information respectively.

    Morph targets are handled in MorphTarget.cpp and MorphTarget.h, which contain helper classes for applying them. To apply the different morph targets to the mesh, the sample sets up a texture2Darray render target with three array slices: the first holds position data, while the second and third hold normal and tangent data respectively.

The first part of the process involves filling the render targets with the position, normal, and tangent information from the base pose. Then for any morph targets that need to be added, the following occurs:

  • The alpha blending mode is set to additive blending
  • The viewport is set to the exact size of the pos, norm, and tangent render targets
  • A quad covering only the pixels in the render target that will change for the given morph target is drawn
  • This quad is drawn with the morph target position, normal, and tangent texture bound as a textured2darray
  • A special Geometry Shader replicates the quad three times and sends one to each render target
  • The Pixel Shader sets the output alpha to the blend amount. The blend amount is the amount of influence this morph target should have in the final image.

    First the render targets are filled with the position, normal, and tangent data of the base pose. Then, for each morph target to be added:

  • Additive alpha blending is enabled.
  • The viewport is set to the exact size of the position, normal, and tangent render targets.
  • A quad covering only the pixels this morph target changes is drawn, with the morph target's position, normal, and tangent textures bound as a texture2darray.
  • A special Geometry Shader replicates the quad three times, sending one copy to each render target.
  • The Pixel Shader writes the blend amount into the output alpha; the blend amount is how much influence this morph target has on the final image.

Effectively, the above uses the alpha blending hardware of the GPU to add up a series of sparse morph targets to a base mesh. The position, normal, and tangent render targets now contain the vertex data of the final deformed mesh. These are bound as inputs to the technique that renders the final morphed mesh onscreen.

    In effect, the GPU's alpha blending hardware adds a series of sparse morph targets onto the base mesh. The position, normal, and tangent render targets then contain the vertex data of the final deformed mesh, and they are bound as inputs to the technique that renders the morphed mesh onscreen. (A replication sketch follows.)
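
A minimal sketch of the replication step, using the SV_RenderTargetArrayIndex semantic that routes each emitted triangle to one slice of the render target array; structure names are assumptions:

struct GSQuadIn  { float4 pos : SV_Position; float2 tex : TEXCOORD0; };
struct GSQuadOut
{
    float4 pos : SV_Position;
    float3 tex : TEXCOORD0;
    uint   rt  : SV_RenderTargetArrayIndex;   // selects the position/normal/tangent target
};

[maxvertexcount(9)]
void GSReplicateToTargetsSketch( triangle GSQuadIn In[3],
                                 inout TriangleStream<GSQuadOut> Stream )
{
    for( uint slice = 0; slice < 3; slice++ )   // 0 = position, 1 = normal, 2 = tangent
    {
        for( uint v = 0; v < 3; v++ )
        {
            GSQuadOut o;
            o.pos = In[v].pos;
            o.tex = float3( In[v].tex, slice ); // sample the matching morph texture slice
            o.rt  = slice;                      // rasterize onto that render target slice
            Stream.Append( o );
        }
        Stream.RestartStrip();
    }
}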

Rendering

The input vertex stream contains a vertex reference that tells the VSRefScenemain shader which texel in the position, normal, and tangent maps contains our vertex data. In the shader, the uiVertexRef is converted to a 2D texture coordinate. The position, normal, and tangent data is then loaded from the textures and transformed just as if it had been passed in via the vertex stream.

    The input vertex stream contains a vertex reference, uiVertexRef, telling the VS VSRefScenemain which texel of the position, normal, and tangent maps holds this vertex's data. In the shader, uiVertexRef is converted to a 2D texture coordinate, and the position, normal, and tangent are loaded from the textures and transformed as usual.

         //find out which texel holds our data
         uint iYCoord = input.uiVertexRef / g_DataTexSize;
         //workaround for modulus
         uint iXCoord = input.uiVertexRef - (input.uiVertexRef/g_DataTexSize)*g_DataTexSize;
         float4 dataTexcoord = float4( iXCoord, iYCoord, 0, 0 );
         dataTexcoord += float4(0.5,0.5,0,0);
         dataTexcoord.x /= (float)g_DataTexSize;
         dataTexcoord.y /= (float)g_DataTexSize;
         
         ...
 
         //find our position, normal, and tangent
         float3 pos = tex2Darraylod( g_samPointClamp, g_txVertData, dataTexcoord ).xyz;
         dataTexcoord.z = 1.0f;
         float3 norm = tex2Darraylod( g_samPointClamp, g_txVertData, dataTexcoord ).xyz;
         dataTexcoord.z = 2.0f;
         float3 tangent = tex2Darraylod( g_samPointClamp, g_txVertData, dataTexcoord ).xyz;
         
         //output our final positions in clipspace
         output.pos = mul( float4( pos, 1 ), g_mWorldViewProj );

Adding a Simple Wrinkle Model

To add a bit of realism the sample uses a simple wrinkle model to modulate the influence of the normal map in the final rendered image. When rendering the final morphed image, the Geometry Shader calculates the difference between the triangle areas of the base mesh and the triangle areas of the current morphed mesh. These differences are used to determine whether the triangle grew or shrank during the deformation. If it shrank, the influence of the normal map increases. If it grew, the influence of the normal map decreases.

    To add realism, the sample uses a simple wrinkle model that modulates the influence of the normal map in the final image. When rendering the final morphed mesh, the GS compares each triangle's area in the base mesh with its area in the morphed mesh to decide whether the triangle shrank or grew during deformation: if it shrank, the normal map's influence is increased; if it grew, the influence is decreased. (A sketch follows.)
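
A minimal sketch of that area comparison; the helper, the base weight, and the clamping choice are assumptions:

float TriangleArea( float3 a, float3 b, float3 c )
{
    return 0.5f * length( cross( b - a, c - a ) );   // half the cross-product magnitude
}

// Returns a normal-map weight: above baseWeight when the triangle shrank, below when it grew.
float WrinkleWeightSketch( float3 basePos[3], float3 morphPos[3], float baseWeight )
{
    float baseArea  = TriangleArea( basePos[0],  basePos[1],  basePos[2] );
    float morphArea = TriangleArea( morphPos[0], morphPos[1], morphPos[2] );
    float ratio     = baseArea / max( morphArea, 1e-6f );   // shrink => ratio > 1
    return saturate( baseWeight * ratio );
}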

Adding Illumination Using LDPRT

Local Deformable Precomputed Radiance Transfer is used to light the head. The PRT simulation is done on the base mesh and the 4th order LDPRT coefficients are saved to a .ldprt file. The sample loads this file into memory and merges it with the vertex buffer at load time. The bulk of the LDPRT lighting calculations are handled by GetLDPRTColor. The current implementation only uses 4th order coefficients. However, uncommenting the last half of GetLDPRTColor, re-running the PRT simulation to generate 6th order coefficients, and changing dwOrder to 6 in UpdateLightingEnvironment may give a better lighting approximation for less convex meshes.

        Local Deformable PRT is used to light the head. The PRT simulation is run on the base mesh, and the 4th-order LDPRT coefficients are saved to a .ldprt file, which the sample loads and merges into the vertex buffer at load time. Most of the LDPRT lighting is computed in GetLDPRTColor. The current implementation uses only 4th-order coefficients, but switching to 6th-order coefficients can give a better approximation for less convex meshes. For more on LDPRT, see the DX9 LocalDeformablePRT sample.

For a more detailed explanation of Local Deformable Precomputed Radiance Transfer, refer to the Direct3D 9 LocalDeformablePRT sample or the PRTDemo sample.

posted on 2006-01-18 09:28 by 王大牛的幸福生活