SSAO stands for Screen Space Ambient Occlusion, and it partially makes real one of the deepest dreams of computer graphics programmers, mine included (see here or here): ambient occlusion in realtime. The term was first used by Crytek (a German game company) when they introduced it in a small paragraph of a paper called "Finding Next Gen - CryEngine 2" (just google for that sentence).
Since then many cg programming enthusiasts have tried to decipher how the technique works, and each one got different results of varying quality and performance. I did my own investigation and arrived at a method that, while not optimal, still gives cool results and is quite usable. Of course I went through many revisions of the algorithm, but well, this is the technique I used for Kindernoiser.
Screen Space Ambient Occlusion applied to a complex shape
The trick:
Ambient occlusion, like other direct lighting techniques (and indirect ones too, of course), is based on non-local computations. This means it's not enough to know the surface properties of the point to be shaded; one needs some description of the surrounding geometry as well. Since this information is not accessible on modern rasterization hardware (that's why we will never see good realtime shadows in OpenGL or DirectX), the Crytek team (as many other people somewhat before them) came up with the idea of using the zbuffer to partially recover such information. The zbuffer can be seen as a small repository of geometry information: from each pixel on the buffer one can recover the 3d position of the geometry projected on that pixel (well, of the surface closest to the camera).
Thus the idea is to use that information in a two (or more) pass algorithm. First render the scene normally, or almost, and then in a second full screen quad pass compute the ambient occlusion at each pixel and use it to modify the already computed lighting. For that, for each pixel for which we compute the AO we construct a few 3d points around it and see if these points are occluded from the camera's point of view. This is not ambient occlusion as in the usual definition, but it indeed gives some measure of concavity for the shaded point, which can be interpreted as an (ambient) occlusion factor.
To simplify computations on the second pass, the first pass outputs a linear eye space z distance (instead of the 1/z used in regular zbuffers). This is done per vertex since z, being linear, can be safely interpolated over the surface of the polygons. By using multiple render targets one can output this buffer at the same time as the regular color buffer.
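For illustration, such a first pass could look something like this (a minimal sketch, not the article's actual code; the varying name linz and the render target layout are my own assumptions):
// first pass, vertex shader: compute linear eye space z per vertex
varying float linz;
void main(void)
{
    vec4 ep = gl_ModelViewMatrix * gl_Vertex;
    linz = -ep.z;                           // positive distance along the view axis
    gl_Position = gl_ProjectionMatrix * ep;
    gl_FrontColor = gl_Color;
}
// first pass, fragment shader: write to two render targets at once
varying float linz;
void main(void)
{
    gl_FragData[0] = gl_Color;              // regular color buffer
    gl_FragData[1] = vec4(linz);            // linear eye space z distance
}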
The second pass draws a screen space polygon covering the complete viewport and performs the ambient occlusion computation. For that it first recovers the eye space position of each pixel by unprojection: it reads the z value from the previously prepared texture, and given the eye space view vector (computed by interpolation from the vertex shader) it computes the eye space position. Say gl_TexCoord[0] contains the eye view vector (not necessarily normalized), tex0 the linear zbuffer, and gl_Color the 2d pixel coordinates (from 0 to 1 for the complete viewport), then:
float ez = texture2D( tex0, gl_Color.xy ).x; // eye z distance
vec3 ep = ez*gl_TexCoord[0].xyz/gl_TexCoord[0].z; // eye point
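For reference, the vertex shader of the full screen quad might set these values up along these lines (a sketch of my own, not shown in the original; it assumes the quad is given directly in clip space coordinates, and reuses the 0.75 horizontal projection factor that appears in the fragment code below):
// full screen quad, vertex shader (hypothetical)
void main(void)
{
    gl_Position = gl_Vertex;                                          // quad already in clip space
    gl_FrontColor = vec4( gl_Vertex.xy*0.5 + vec2(0.5), 0.0, 1.0 );   // pixel coords in [0,1]
    gl_TexCoord[0] = vec4( gl_Vertex.x/0.75, gl_Vertex.y, 1.0, 1.0 ); // eye space view vector
}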
Next we have to generate N 3d points. I believe Crytek uses 8 points for low end machines (pixel shaders 2.0) and 16 for more powerful machines. It's a trade-off between speed and quality, so definitely a parameter to play with. I generated the points around the current shading point in a sphere (inside the sphere, not just on the surface) from a small random lookup table (passed as constants), with a constant radius (scene dependent, and feature dependent - you can make the AO more local or global by adjusting this parameter).
for( int i=0; i<32; i++ )
{
vec3 se = ep + rad*fk3f[i].xyz;
Next we project these points back into clip space with the usual perspective division and look up the scene's eye z distance at that pixel in the zbuffer, as in shadow mapping:
vec2 ss = (se.xy/se.z)*vec2(.75,1.0);
vec2 sn = ss*.5 + vec2(.5);
vec4 sz = texture2D(tex0,sn);
or alternatively
vec3 ss = se.xyz*vec3(.75,1.0,1.0);
vec4 sz = texture2DProj( tex0, ss*.5+ss.z*vec3(.5) );
Now comes the trickiest part of the algorithm. Unlike in shadow mapping, where a simple comparison yields a binary value, in this case we have to be more careful because we have to account for occlusion, and occlusion factors are distance dependent while shadows are not. For example, a surface element that is far from the point under consideration will occlude that point less than if it were closer, with a quadratic attenuation (have a look here). So this means the function should be a bit like a step() curve, so that for negative values it does not occlude, but it should also slowly attenuate back to zero for large positive values. The attenuation factor, again, depends on the scale of the scene and aesthetic factors. I call this function the "blocking or occlusion function". The idea is then to accumulate the occlusion factor, like:
float zd = 50.0*max( se.z-sz.x, 0.0 );
bl += 1.0/(1.0+zd*zd); // occlusion = 1/( 1 + 2500*max(dist,0)^2 )
and to finish we just have to average to get the total estimated occlusion.
}
gl_FragColor = vec4(bl/32.0);
The second trick:
Doing it as just described creates some ugly banding artifacts derived from the low sampling rate of the occlusion (32 in the example above, 8 or 16 in Crytek's implementation). So the next step is to apply some dithering to the sampling pattern. Crytek suggests using a per pixel random plane to reflect the sampling points around the shading point, which works very well in practice and is very fast. For that we have to prepare a small random normal map, accessible through tex1 in the following modified code:
vec3 se = ep + rad*reflect(fk3f[i].xyz,pl.xyz);
so the complete shader looks like:
uniform vec4 fk3f[32];  // random sampling points inside the unit sphere
uniform vec4 fres;      // scale for the random normal map lookup
uniform float rad;      // sampling sphere radius
uniform sampler2D tex0; // linear eye space z distance
uniform sampler2D tex1; // small random normal map
void main(void)
{
    // unproject: recover the eye space position of this pixel
    vec4 zbu = texture2D( tex0, gl_Color.xy );
    vec3 ep = zbu.x*gl_TexCoord[0].xyz/gl_TexCoord[0].z;
    // fetch a random plane (normal) for dithering the sampling pattern
    vec4 pl = texture2D( tex1, gl_Color.xy*fres.xy );
    pl = pl*2.0 - vec4(1.0);
    float bl = 0.0;
    for( int i=0; i<32; i++ )
    {
        // sampling point, reflected on the random plane
        vec3 se = ep + rad*reflect(fk3f[i].xyz,pl.xyz);
        // project to screen space and read the scene's z there
        vec2 ss = (se.xy/se.z)*vec2(.75,1.0);
        vec2 sn = ss*.5 + vec2(.5);
        vec4 sz = texture2D(tex0,sn);
        // accumulate occlusion with quadratic distance attenuation
        float zd = 50.0*max( se.z-sz.x, 0.0 );
        bl += 1.0/(1.0+zd*zd);
    }
    gl_FragColor = vec4(bl/32.0);
}
The big secret trick is to next apply a blur to the ambient occlusion, which we stored in a texture (the occlusion map). It's easy to avoid blurring across object edges by checking the difference in z between the blur sampling points, and the eye space normal too (which we can output along with the linear eye space z distance in the very first pass).
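Such an edge-aware blur could be sketched like this (my own illustration, not the article's or Crytek's code; it assumes the occlusion map in tex0, a buffer with the eye space normal in xyz and the linear z in w in tex1, the inverse resolution in ires, and arbitrarily picked thresholds):
uniform sampler2D tex0; // occlusion map
uniform sampler2D tex1; // eye space normal (xyz) + linear eye z (w)
uniform vec2 ires;      // 1.0/resolution
void main(void)
{
    vec4 cg = texture2D( tex1, gl_Color.xy ); // normal+z at the center pixel
    float bl = 0.0;
    float tw = 0.0;
    for( int j=-2; j<=2; j++ )
    for( int i=-2; i<=2; i++ )
    {
        vec2 of = gl_Color.xy + vec2(float(i),float(j))*ires;
        vec4 sg = texture2D( tex1, of );
        // average only across pixels of the same surface:
        // similar depth and similar normal
        float w = step( abs(sg.w-cg.w), 0.1 )*step( 0.8, dot(sg.xyz,cg.xyz) );
        bl += w*texture2D( tex0, of ).x;
        tw += w;
    }
    gl_FragColor = vec4( bl/tw );
}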
Optimizations:
The shader above does not execute on pixel shaders 2.0 hardware because of the amount of instructions, even with just 8 sampling points, while Crytek's does. So, the thing is to simplify the inner loop code. The first thing one can do is to remove the perspective projection applied to the sampling points. This has a nice side effect: the sampling sphere is of constant size in screen space regardless of the distance to the camera, which allows for ambient occlusion both in close and distant objects at the same time. That's what the Crytek guys do, I believe. One could play with the blocking factor to remove a few instructions too.
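For illustration, the inner loop without the per-sample projection might reduce to something like this (a sketch under my own assumptions; sradius, a radius expressed directly in texture space, and zradius, its eye space counterpart, are hypothetical parameters, and ez is the center pixel's eye z distance from before):
for( int i=0; i<8; i++ )
{
    vec3 of = reflect( fk3f[i].xyz, pl.xyz );
    // offset directly in screen space: no perspective division per sample,
    // so the sampling pattern has a constant size on screen
    vec4 sz = texture2D( tex0, gl_Color.xy + sradius*of.xy );
    float zd = 50.0*max( ez + zradius*of.z - sz.x, 0.0 );
    bl += 1.0/(1.0+zd*zd);
}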
Results:
I added a few reference images below of this realtime Screen Space Ambient Occlusion implementation (the small ones are clickable).