GPU Downsampling for Point-Based Rendering
Abstract: Will EWA splatting for point-based rendering look better than my real-time GPU multipass supersampling method? Of course not!
After careful benchmarking and weighing of the trade-offs, I have decided to abandon the EWA filter and instead filter point-based renderings with GPU supersampling, the same way NVIDIA Gelato does.
The approach is exactly Gelato's: as far as the hardware allows, rasterize the whole scene into a big enough RT (render target), then filter that RT with the pixel filters used by offline renderers (Catmull-Rom, Gaussian, sinc, and so on) as a separable convolution, split into an X pass and a Y pass over the framebuffer. The resulting image quality is far better than EWA filtering and comes close to an offline render, which is good enough for previewing relighting results, with almost no performance cost, and it avoids the clunky MSAA the hardware offers.
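As a rough sketch of what "as far as the hardware allows" means in practice, the supersampled RT size can be checked against the GL limits before allocation; FinalW, FinalH and SSRate are placeholder names of mine, not from the original code:

// Hedged sketch: make sure the supersampled RT fits the hardware limits.
GLint MaxTexSize = 0, MaxViewport[2] = { 0, 0 };
glGetIntegerv(GL_MAX_TEXTURE_SIZE, &MaxTexSize);
glGetIntegerv(GL_MAX_VIEWPORT_DIMS, MaxViewport);
int SSW = FinalW * SSRate;   // supersampled width
int SSH = FinalH * SSRate;   // supersampled height
if (SSW > MaxTexSize || SSH > MaxTexSize ||
    SSW > MaxViewport[0] || SSH > MaxViewport[1])
{
    printf("supersampling rate %d does not fit on this hardware\n", SSRate);
    // drop to a lower supersampling rate here
}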
The procedure is simple. Render the image into an RT (Render Target, hereafter RT) sized by the user-specified supersampling rate. Then sample the chosen filter into a lookup table, computing the real filter radius in supersamples. Set the source RT's texture wrap mode to GL_CLAMP_TO_EDGE; do not use GL_CLAMP or GL_REPEAT. Create an FBO and two temporary RTs: the first temporary RT stores the result filtered along X, the second stores the fully filtered result, which can be written out to an image in whatever format the user wants. Below are some loose code fragments and pictures, for reference only. The filter code comes from the RenderMan Interface Specification.
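A minimal sketch of that setup, assuming the X pass reduces only the X axis (so the first temporary RT is final-width by supersampled-height); DownFBO, TempRT, FinalW, FinalH and the GL_RGBA16F_ARB format are my own placeholders, only the GL_CLAMP_TO_EDGE settings are prescribed above:

// Hedged sketch: one FBO plus two temporary RTs for the X pass and the Y pass.
GLuint DownFBO, TempRT[2];
glGenFramebuffersEXT(1, &DownFBO);
glGenTextures(2, TempRT);
int TempW[2] = { FinalW, FinalW };            // width is reduced by the X pass
int TempH[2] = { FinalH * SSRate, FinalH };   // height is reduced by the Y pass
for (int t = 0; t < 2; t++)
{
    glBindTexture(GL_TEXTURE_RECTANGLE_ARB, TempRT[t]);
    glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGBA16F_ARB,
                 TempW[t], TempH[t], 0, GL_RGBA, GL_FLOAT, 0);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
}
// Per pass: glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, DownFBO) and attach TempRT[pass]
// via glFramebufferTexture2DEXT(..., GL_COLOR_ATTACHMENT0_EXT,
//                               GL_TEXTURE_RECTANGLE_ARB, TempRT[pass], 0).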
// Build a 1D lookup table of filter weights covering 2*RealRadius supersamples.
std::vector<float> Weights;
int RealRadius = FilterRadius*SSRate;   // filter radius measured in supersamples
float WeightSum = 0.0f;
for( int i=-RealRadius; i<RealRadius; i++ )
{
    // Evaluate the RenderMan filter at the center of each supersample.
    float W = RiCatmullRomFilter( (i+0.5)/(float)RealRadius, 0, FilterRadius, FilterRadius );
    WeightSum += W;
    Weights.push_back( W );
}
// Normalize so the weights sum to one.
float* WeightPtr = new float[ Weights.size() ];
for( size_t i=0; i<Weights.size(); i++ )
{
    WeightPtr[i] = Weights[i] / WeightSum;
    printf("%f\n",WeightPtr[i]);
}
glGenTextures(1,&FilterTex); // filter-weight lookup texture: Weights.size() x 1, FP32 alpha
glBindTexture(GL_TEXTURE_RECTANGLE_ARB,FilterTex);
glTexImage2D(GL_TEXTURE_RECTANGLE_ARB,0,GL_ALPHA32F_ARB,Weights.size(),1,0,GL_ALPHA,GL_FLOAT,WeightPtr);
glTexParameteri(GL_TEXTURE_RECTANGLE_ARB,GL_TEXTURE_MIN_FILTER,GL_NEAREST);
glTexParameteri(GL_TEXTURE_RECTANGLE_ARB,GL_TEXTURE_MAG_FILTER,GL_NEAREST);
glTexParameteri(GL_TEXTURE_RECTANGLE_ARB,GL_TEXTURE_WRAP_S,GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_RECTANGLE_ARB,GL_TEXTURE_WRAP_T,GL_CLAMP_TO_EDGE);
delete [] WeightPtr;
Weights.clear();
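RiCatmullRomFilter above is one of the standard pixel filters listed in the RenderMan Interface Specification; it is reproduced here from memory for reference (RtFloat comes from ri.h), so treat it as a sketch rather than the authoritative spec text. The Gaussian used in the comparison pictures further below is the spec's RiGaussianFilter.

// The standard RenderMan pixel filters, roughly as in the RI Spec appendix.
// (x, y) is the offset from the filter center; xwidth/ywidth are the filter widths.
RtFloat RiCatmullRomFilter(RtFloat x, RtFloat y, RtFloat xwidth, RtFloat ywidth)
{
    RtFloat r2 = x*x + y*y;
    RtFloat r  = sqrt(r2);
    return (r >= 2.0) ? 0.0 :
           (r <  1.0) ? (3.0*r*r2 - 5.0*r2 + 2.0)
                      : (-r*r2 + 5.0*r2 - 8.0*r + 4.0);
}

RtFloat RiGaussianFilter(RtFloat x, RtFloat y, RtFloat xwidth, RtFloat ywidth)
{
    x *= 2.0 / xwidth;
    y *= 2.0 / ywidth;
    return exp(-2.0 * (x*x + y*y));
}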
//Filter along the X direction; the Y variant is shown in comments.
//TEX0 binds the source (supersampled) sampler
//TEX1 binds the filter-weight lookup table
//WeightNum is the number of weights (2*RealRadius), used as the loop bound
//SSRate is used to compute the correct pixel offsets into the source sampler
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect TEX0;
uniform sampler2DRect TEX1;
uniform int WeightNum;
uniform int SSRate;
void main()
{
    vec4 WPOS = gl_FragCoord;
    // Destination pixel center expressed in source texel coordinates.
    vec2 Center = vec2( WPOS.x*float(SSRate), WPOS.y );//Y pass: vec2( WPOS.x, WPOS.y*float(SSRate) );
    vec4 Color = vec4(0.0);
    for( int i=0; i<WeightNum; i++ )
    {
        float Weight = texture2DRect( TEX1, vec2(float(i)+0.5,0.5) ).a;
        // Tap i lies (i - WeightNum/2 + 0.5) source texels from the pixel center.
        Color += Weight*texture2DRect( TEX0, Center + vec2(float(i)-float(WeightNum)*0.5+0.5,0.0) );//Y pass: put the offset on .y
    }
    gl_FragColor = Color;
}
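Driving a pass is then just a full-screen draw at the destination resolution with the right bindings. Here is a hedged sketch of the X pass, assuming identity modelview and projection matrices; FilterProgX, SuperSampledRT, DownFBO and TempRT are placeholder names of mine, not the original code:

// Hedged sketch of the X pass; the Y pass is the same with TempRT[0] as the source,
// TempRT[1] as the attachment, a FinalW x FinalH viewport and the Y variant of the shader.
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, DownFBO);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                          GL_TEXTURE_RECTANGLE_ARB, TempRT[0], 0);
glViewport(0, 0, FinalW, FinalH*SSRate);   // X is reduced, Y is still supersampled

glUseProgram(FilterProgX);                 // the fragment shader above
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_RECTANGLE_ARB, SuperSampledRT);   // TEX0: source image
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_RECTANGLE_ARB, FilterTex);        // TEX1: weight lookup table
glUniform1i(glGetUniformLocation(FilterProgX, "TEX0"), 0);
glUniform1i(glGetUniformLocation(FilterProgX, "TEX1"), 1);
glUniform1i(glGetUniformLocation(FilterProgX, "WeightNum"), 2*RealRadius);
glUniform1i(glGetUniformLocation(FilterProgX, "SSRate"), SSRate);

// Full-screen quad so every destination fragment runs the filter once.
glBegin(GL_QUADS);
glVertex2f(-1.0f,-1.0f); glVertex2f( 1.0f,-1.0f);
glVertex2f( 1.0f, 1.0f); glVertex2f(-1.0f, 1.0f);
glEnd();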
A local area of the rendered scene.
Filter radius 2x2, supersampling rate 4x4; the left image uses the Gaussian filter, the right one the Catmull-Rom filter.
Some of you may ask, "Have you never heard of rendering tile by tile?" Yes, I tried it, but tiling breaks the whole GPU point-based rendering scheme: with a perspective projection matrix modified per tile, the splats come out wrong, so the perspective-correct computation of point sizes no longer works. On top of that, splitting the frame into tiles wastes a fair amount of fill rate, so rendering one huge RT directly ends up faster.