Advanced OpenGL: Batch Rendering

What Is Batch Rendering?

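Every game engine generates its scene data on the CPU and then transfers that data to the GPU before anything can be drawn on screen. When rendering many different objects, it is best to organize their data into groups so that you minimize the number of calls between the CPU and the GPU. You also want to minimize the number of state changes in the OpenGL state machine, because too many state changes will drag your program's performance down badly. Each group that holds render data in this way is called a batch.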
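To make the cost of state changes concrete, here is a minimal CPU-only sketch. It does not call OpenGL at all; `DrawRequest`, `countTextureSwitches`, and `groupByTexture` are hypothetical names introduced for illustration. It only counts how often the bound texture would have to change for a given submission order, which is the kind of state change batching tries to avoid.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw request: only the texture it needs matters for this sketch.
struct DrawRequest {
    unsigned textureId;
};

// Counts how many times the bound texture would have to change if the
// requests were submitted in the given order.
std::size_t countTextureSwitches( const std::vector<DrawRequest>& requests ) {
    std::size_t switches = 0;
    bool hasBound = false;
    unsigned bound = 0;
    for( const DrawRequest& r : requests ) {
        if( !hasBound || r.textureId != bound ) {
            ++switches;
            bound = r.textureId;
            hasBound = true;
        }
    }
    return switches;
}

// Returns a copy of the requests grouped by texture, which is roughly the
// effect that collecting them into per-texture batches has.
std::vector<DrawRequest> groupByTexture( std::vector<DrawRequest> requests ) {
    std::stable_sort( requests.begin(), requests.end(),
        []( const DrawRequest& a, const DrawRequest& b ) {
            return a.textureId < b.textureId;
        } );
    return requests;
}
```

With six requests that alternate between two textures, submitting them as they arrive needs six texture binds, while the grouped order needs only two.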

 

How To Create A Batch?

In OpenGL, a batch is built on a Vertex Buffer Object (VBO). Details and best practices for creating a VBO are described at https://www.opengl.org/wiki/Vertex_Specification_Best_Practices. Sample code:

class Batch{
public:
private:
unsigned	_uMaxNumVertices;
unsigned	_uNumUsedVertices;
unsigned	_vao; //only used in OpenGL v3.x +
unsigned	_vbo;
BatchConfig _config;
GuiVertex _lastVertex;
//^^^^------ variables above ------|------ functions below ------vvvv
public:
Batch(unsigned uMaxNumVertices );
~Batch();
bool isBatchConfig( const BatchConfig& config ) const;
bool isEmpty() const;
bool isEnoughRoom( unsigned uNumVertices ) const;
Batch* getFullest( Batch* pBatch );
int getPriority() const;
void add( const std::vector<GuiVertex>& vVertices, const BatchConfig& config );
void add( const std::vector<GuiVertex>& vVertices );
void render();
protected:
private:
Batch( const Batch& c ); //not implemented
Batch& operator=( const Batch& c ); //not implemented
void cleanUp();
};//Batch

 

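Note that in the code above, a Batch keeps track of how many vertices it can store (_uMaxNumVertices) and how many it is actually using (_uNumUsedVertices). When a Batch is created, a VBO is created along with it to store the vertices on the GPU side. Each Batch holds only one kind of vertex set at a time, namely vertices that share the settings defined in a BatchConfig.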

A BatchConfig is defined as follows:

struct BatchConfig {
unsigned uRenderType;
int iPriority;
unsigned uTextureId;
glm::mat4 transformMatrix; //initialized as identity matrix
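BatchConfig( unsigned uRenderTypeIn, int iPriorityIn, unsigned uTextureIdIn ) :
uRenderType( uRenderTypeIn ),
iPriority( iPriorityIn ),
uTextureId( uTextureIdIn ),
transformMatrix( 1.0f ) //explicit identity; recent GLM versions do not initialize matrices by default
{}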
bool operator==( const BatchConfig& other) const {
if( uRenderType	!= other.uRenderType ||
iPriority	!= other.iPriority ||
uTextureId	!= other.uTextureId ||
transformMatrix != other.transformMatrix )
{
return false;
}
return true;
}
bool operator!=( const BatchConfig& other) const {
return !( *this == other );
}
};//BatchConfig

  

A BatchConfig defines how a set of vertices is interpreted (uRenderType): as a set of GL_LINES, as GL_TRIANGLES, or as a GL_TRIANGLE_STRIP.

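The iPriority variable determines the order in which Batches are rendered. A Batch with a higher priority has its vertices drawn on top of those from Batches with a lower priority.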

If the vertices in a Batch specify texture coordinates, we need to know which texture is bound (uTextureId).

Finally, if the vertices in a Batch need to be transformed before rendering, their transformMatrix must be included as well.
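Since two sets of vertices can only go into the same Batch when every one of these settings matches, the comparison can be sketched as below. This is an illustration under stated assumptions: `SimpleBatchConfig`, `kIdentity`, and `canShareBatch` are hypothetical names, and a plain `std::array<float, 16>` stands in for `glm::mat4` so that the sketch compiles without GLM.

```cpp
#include <array>

// Stand-in for BatchConfig; transform replaces glm::mat4 (assumption).
struct SimpleBatchConfig {
    unsigned renderType;
    int priority;
    unsigned textureId;
    std::array<float, 16> transform;

    bool operator==( const SimpleBatchConfig& other ) const {
        return renderType == other.renderType &&
               priority   == other.priority   &&
               textureId  == other.textureId  &&
               transform  == other.transform;
    }
};

// 4x4 identity matrix, stored as 16 consecutive floats.
const std::array<float, 16> kIdentity = { {
    1.0f, 0.0f, 0.0f, 0.0f,
    0.0f, 1.0f, 0.0f, 0.0f,
    0.0f, 0.0f, 1.0f, 0.0f,
    0.0f, 0.0f, 0.0f, 1.0f } };

// Two draw requests may share one batch only if all settings are equal.
bool canShareBatch( const SimpleBatchConfig& a, const SimpleBatchConfig& b ) {
    return a == b;
}
```

Changing only the texture, or only one element of the transform, is enough to force a separate batch. This is why giving every sprite its own transform matrix works against batching.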

The vertex format used in this article is as follows:

struct GuiVertex {
glm::vec2 position;
glm::vec4 color;
glm::vec2 texture;
GuiVertex( glm::vec2 positionIn, glm::vec4 colorIn, glm::vec2 textureIn = glm::vec2() ) :
position( positionIn ),
color( colorIn ),
texture( textureIn )
{}
};//GuiVertex

  

The GuiVertex above defines a 2D position in screen space, together with a color and texture coordinates.
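The layout of this vertex determines both the attribute offsets passed to OpenGL and how many vertices fit into a buffer of a given size. The sketch below works this out for a stand-in struct. `PlainGuiVertex` and `verticesThatFit` are hypothetical names, plain float arrays replace the glm types (an assumption), and the numbers assume a 4-byte float.

```cpp
#include <cstddef>

// Stand-in for GuiVertex: 2 floats position, 4 floats color, 2 floats texture.
struct PlainGuiVertex {
    float position[2];
    float color[4];
    float texture[2];
};

// The interleaved layout only works as described if there is no padding.
static_assert( sizeof( PlainGuiVertex ) == 8 * sizeof( float ),
               "PlainGuiVertex is expected to be tightly packed" );

// Stride and byte offsets, as they would be passed to the attribute setup.
constexpr std::size_t kStride         = sizeof( PlainGuiVertex );
constexpr std::size_t kPositionOffset = offsetof( PlainGuiVertex, position );
constexpr std::size_t kColorOffset    = offsetof( PlainGuiVertex, color );
constexpr std::size_t kTextureOffset  = offsetof( PlainGuiVertex, texture );

// Number of whole vertices that fit into a buffer of the given size in bytes.
constexpr std::size_t verticesThatFit( std::size_t bufferBytes ) {
    return bufferBytes / sizeof( PlainGuiVertex );
}
```

With 32 bytes per vertex, a 1 MB buffer holds 32768 vertices and a 4 MB buffer holds 131072, which gives a feel for what the 1-4 MB guideline in the code comments means in vertex counts.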

 

Next, we implement the member functions of the Batch class:

Batch::Batch( unsigned uMaxNumVertices ) :
_uMaxNumVertices( uMaxNumVertices ),
_uNumUsedVertices( 0 ),
_vao( 0 ),
_vbo( 0 ),
_config( GL_TRIANGLE_STRIP, 0, 0 ),
_lastVertex( glm::vec2(), glm::vec4() )
{
//optimal size for a batch is between 1-4MB in size. Number of elements that can be stored in a
//batch is determined by calculating #bytes used by each vertex
if( uMaxNumVertices < 1000 ) {
std::ostringstream strStream;
strStream << __FUNCTION__ << " uMaxNumVertices{" << uMaxNumVertices << "} is too small. Choose a number >= 1000 ";
throw ExceptionHandler( strStream );
}
//clear error codes
glGetError();
if( Settings::getOpenglVersion().x >= 3 ) {
glGenVertexArrays( 1, &_vao );
glBindVertexArray( _vao );
}
//create batch buffer
glGenBuffers( 1, &_vbo );
glBindBuffer( GL_ARRAY_BUFFER, _vbo );
glBufferData( GL_ARRAY_BUFFER, uMaxNumVertices * sizeof( GuiVertex ), nullptr, GL_STREAM_DRAW );
if( Settings::getOpenglVersion().x >= 3 ) {
unsigned uOffset = 0;
ShaderManager::enableAttribute( A_POSITION, sizeof( GuiVertex ), uOffset );
uOffset += sizeof( glm::vec2 );
ShaderManager::enableAttribute( A_COLOR, sizeof( GuiVertex ), uOffset );
uOffset += sizeof( glm::vec4 );
ShaderManager::enableAttribute( A_TEXTURE_COORD0, sizeof( GuiVertex ), uOffset );
glBindVertexArray( 0 );
ShaderManager::disableAttribute( A_POSITION );
ShaderManager::disableAttribute( A_COLOR );
ShaderManager::disableAttribute( A_TEXTURE_COORD0 );
}
glBindBuffer( GL_ARRAY_BUFFER, 0 );
if( GL_NO_ERROR != glGetError() ) {
cleanUp();
throw ExceptionHandler( __FUNCTION__ + std::string( " failed to create batch" ) );
}
}//Batch
//------------------------------------------------------------------------
Batch::~Batch() {
cleanUp();
}//~Batch

//------------------------------------------------------------------------
void Batch::cleanUp() {
if( _vbo != 0 ) {
glBindBuffer( GL_ARRAY_BUFFER, 0 );
glDeleteBuffers( 1, &_vbo );
_vbo = 0;
}
if( _vao != 0 ) {
glBindVertexArray( 0 );
glDeleteVertexArrays( 1, &_vao );
_vao = 0;
}
}//cleanUp

//------------------------------------------------------------------------
bool Batch::isBatchConfig( const BatchConfig& config ) const {
return ( config == _config );
}//isBatchConfig

//------------------------------------------------------------------------
bool Batch::isEmpty() const {
return ( 0 == _uNumUsedVertices );
}//isEmpty


//------------------------------------------------------------------------
//returns true if the number of vertices passed in can be stored in this batch
//without reaching the limit of how many vertices can fit in the batch
bool Batch::isEnoughRoom( unsigned uNumVertices ) const {
//2 extra vertices are needed for degenerate triangles between each strip
unsigned uNumExtraVertices = ( GL_TRIANGLE_STRIP == _config.uRenderType && _uNumUsedVertices > 0 ? 2 : 0 );
return ( _uNumUsedVertices + uNumExtraVertices + uNumVertices <= _uMaxNumVertices );
}//isEnoughRoom


//------------------------------------------------------------------------
//returns the batch that contains the most number of stored vertices between
//this batch and the one passed in
Batch* Batch::getFullest( Batch* pBatch ) {
return ( _uNumUsedVertices > pBatch->_uNumUsedVertices ? this : pBatch );
}//getFullest


//------------------------------------------------------------------------
int Batch::getPriority() const {
return _config.iPriority;
}//getPriority
//------------------------------------------------------------------------
//adds vertices to batch and also sets the batch config options
void Batch::add( const std::vector<GuiVertex>& vVertices, const BatchConfig& config ) {
_config = config;
add( vVertices );
}//add


//------------------------------------------------------------------------
void Batch::add( const std::vector<GuiVertex>& vVertices ) {
//2 extra vertices are needed for degenerate triangles between each strip
unsigned uNumExtraVertices = ( GL_TRIANGLE_STRIP == _config.uRenderType && _uNumUsedVertices > 0 ? 2 : 0 );
if( uNumExtraVertices + vVertices.size() > _uMaxNumVertices - _uNumUsedVertices ) {
std::ostringstream strStream;
strStream << __FUNCTION__ << " not enough room for {" << vVertices.size() << "} vertices in this batch. Maximum number of vertices allowed in a batch is {" << _uMaxNumVertices << "} and {" << _uNumUsedVertices << "} are already used";
if( uNumExtraVertices > 0 ) 
{
strStream << " plus you need room for {" << uNumExtraVertices << "} extra vertices too";
}
throw ExceptionHandler( strStream );
}
if( vVertices.size() > _uMaxNumVertices ) {
std::ostringstream strStream;
strStream << __FUNCTION__ << " can not add {" << vVertices.size() << "} vertices to batch. Maximum number of vertices allowed in a batch is {" << _uMaxNumVertices << "}";
throw ExceptionHandler( strStream );
}
if( vVertices.empty() ) {
std::ostringstream strStream;
strStream << __FUNCTION__ << " can not add {" << vVertices.size() << "} vertices to batch.";
throw ExceptionHandler( strStream );
}
//add vertices to buffer
if( Settings::getOpenglVersion().x >= 3 ) {
glBindVertexArray( _vao );
}
glBindBuffer( GL_ARRAY_BUFFER, _vbo );
if( uNumExtraVertices > 0 ) {
//need to add 2 vertex copies to create degenerate triangles between this strip
//and the last strip that was stored in the batch
glBufferSubData( GL_ARRAY_BUFFER, _uNumUsedVertices * sizeof( GuiVertex ), sizeof( GuiVertex ), &_lastVertex );
glBufferSubData( GL_ARRAY_BUFFER, ( _uNumUsedVertices + 1 ) * sizeof( GuiVertex ), sizeof( GuiVertex ), &vVertices[0] );
}
// Use glMapBuffer instead, if moving large chunks of data > 1MB
glBufferSubData( GL_ARRAY_BUFFER, ( _uNumUsedVertices + uNumExtraVertices ) * sizeof( GuiVertex ), vVertices.size() * sizeof( GuiVertex ), &vVertices[0] );
if( Settings::getOpenglVersion().x >= 3 ) {
glBindVertexArray( 0 );
}
glBindBuffer( GL_ARRAY_BUFFER, 0 );
_uNumUsedVertices += vVertices.size() + uNumExtraVertices;
_lastVertex = vVertices[vVertices.size() - 1];
}//add


//------------------------------------------------------------------------
void Batch::render() {
if( _uNumUsedVertices == 0 ) {
//nothing in this buffer to render
return;
}
bool usingTexture = INVALID_UNSIGNED != _config.uTextureId;
ShaderManager::setUniform( U_USING_TEXTURE, usingTexture );
if( usingTexture ) {
ShaderManager::setTexture( 0, U_TEXTURE0_SAMPLER_2D, _config.uTextureId );
}
ShaderManager::setUniform( U_TRANSFORM_MATRIX, _config.transformMatrix );
//draw contents of buffer
if( Settings::getOpenglVersion().x >= 3 ) {
glBindVertexArray( _vao );
glDrawArrays( _config.uRenderType, 0, _uNumUsedVertices );
glBindVertexArray( 0 );
} else { //OpenGL v2.x
glBindBuffer( GL_ARRAY_BUFFER, _vbo );
unsigned uOffset = 0;
ShaderManager::enableAttribute( A_POSITION, sizeof( GuiVertex ), uOffset );
uOffset += sizeof( glm::vec2 );
ShaderManager::enableAttribute( A_COLOR, sizeof( GuiVertex ), uOffset );
uOffset += sizeof( glm::vec4 );
ShaderManager::enableAttribute( A_TEXTURE_COORD0, sizeof( GuiVertex ), uOffset );
glDrawArrays( _config.uRenderType, 0, _uNumUsedVertices );
ShaderManager::disableAttribute( A_POSITION );
ShaderManager::disableAttribute( A_COLOR );
ShaderManager::disableAttribute( A_TEXTURE_COORD0 );
glBindBuffer( GL_ARRAY_BUFFER, 0 );
}
//reset buffer
_uNumUsedVertices = 0;
_config.iPriority = 0;
}//render
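The degenerate-triangle stitching that Batch::add performs with glBufferSubData can be sketched on the CPU as follows. This is a simplified illustration, not the article's implementation: `appendStrip` is a hypothetical helper that works on a `std::vector` instead of a VBO, and it is a template so that any vertex type can be used.

```cpp
#include <vector>

// Appends one triangle strip to a buffer that may already hold earlier strips.
// If the buffer is not empty, two extra vertices are inserted first: a copy of
// the last stored vertex and a copy of the first new vertex. The triangles
// formed across this seam have zero area, so nothing visible connects the
// two strips even though they are drawn as one GL_TRIANGLE_STRIP.
template <typename Vertex>
void appendStrip( std::vector<Vertex>& buffer, const std::vector<Vertex>& strip ) {
    if( strip.empty() ) {
        return;
    }
    if( !buffer.empty() ) {
        const Vertex last = buffer.back(); // copy before the vector may reallocate
        buffer.push_back( last );
        buffer.push_back( strip.front() );
    }
    buffer.insert( buffer.end(), strip.begin(), strip.end() );
}
```

Appending the strip {4, 5, 6, 7} to a buffer that holds {0, 1, 2, 3} yields {0, 1, 2, 3, 3, 4, 4, 5, 6, 7}. One caveat: if the buffer holds an odd number of vertices before the seam, the winding of the following strip is flipped, which matters when back-face culling is enabled.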

  

  

How To Use The Batch Class?

To make the Batch class easier to use, we need a manager class called BatchManager, defined as follows:

 

class BatchManager{
public:
private:
std::vector<std::shared_ptr<Batch>> _vBatches;
unsigned _uNumBatches;
unsigned _maxNumVerticesPerBatch;
//^^^^------ variables above ------|------ functions below ------vvvv
public:
BatchManager( unsigned uNumBatches, unsigned numVerticesPerBatch );
~BatchManager();
void render( const std::vector<GuiVertex>& vVertices, const BatchConfig& config );
void emptyAll();
protected:
private:
BatchManager( const BatchManager& c ); //not implemented
BatchManager& operator=( const BatchManager& c ); //not implemented
void emptyBatch( bool emptyAll, Batch* pBatchToEmpty );
};//BatchManager

  

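The BatchManager class manages a pool of Batches (_vBatches). When BatchManager::render is called, it finds the Batch that the incoming vertices should go into (based on their BatchConfig). The implementation is as follows: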

BatchManager::BatchManager( unsigned uNumBatches, unsigned numVerticesPerBatch ) :
_uNumBatches( uNumBatches ),
_maxNumVerticesPerBatch( numVerticesPerBatch )
{
//test input parameters
if( uNumBatches < 10 ) {
std::ostringstream strStream;
strStream << __FUNCTION__ << " uNumBatches{" << uNumBatches << "} is too small. Choose a number >= 10 ";
throw ExceptionHandler( strStream );
}
//a good size for each batch is between 1-4MB in size. Number of elements that can be stored in a
//batch is determined by calculating #bytes used by each vertex
if( numVerticesPerBatch < 1000 ) {
std::ostringstream strStream;
strStream << __FUNCTION__ << " numVerticesPerBatch{" << numVerticesPerBatch << "} is too small. Choose a number >= 1000 ";
throw ExceptionHandler( strStream );
}
//create desired number of batches
_vBatches.reserve( uNumBatches );
for( unsigned u = 0; u < uNumBatches; ++u ) {
_vBatches.push_back( std::shared_ptr<Batch>( new Batch( numVerticesPerBatch ) ) );
}
}//BatchManager
//------------------------------------------------------------------------
BatchManager::~BatchManager() {
_vBatches.clear();
}//~BatchManager
//------------------------------------------------------------------------
void BatchManager::render( const std::vector<GuiVertex>& vVertices, const BatchConfig& config ) {
Batch* pEmptyBatch = nullptr;
Batch* pFullestBatch = _vBatches[0].get();
//determine which batch to put these vertices into
for( unsigned u = 0; u < _uNumBatches; ++u ) {
Batch* pBatch = _vBatches[u].get();
if( pBatch->isBatchConfig( config ) ) {
if( !pBatch->isEnoughRoom( vVertices.size() ) ) {
//first need to empty this batch before adding anything to it
emptyBatch( false, pBatch );
}
pBatch->add( vVertices );
return;
}
//store pointer to first empty batch
if( nullptr == pEmptyBatch && pBatch->isEmpty() ) {
pEmptyBatch = pBatch;
}
//store pointer to fullest batch
pFullestBatch = pBatch->getFullest( pFullestBatch );
}
//if we get here then we didn't find an appropriate batch to put the vertices into
//if we have an empty batch, put vertices there
if( nullptr != pEmptyBatch ) {
pEmptyBatch->add( vVertices, config );
return;
}
//no empty batches were found therefore we must empty one first and then we can use it
emptyBatch( false, pFullestBatch );
pFullestBatch->add( vVertices, config );
}//render
//------------------------------------------------------------------------
//empty all batches by rendering their contents now
void BatchManager::emptyAll() {
emptyBatch( true, _vBatches[0].get() );
}//emptyAll
//------------------------------------------------------------------------
struct CompareBatch { //std::binary_function base dropped: it is not needed here and was removed in C++17
bool operator()( const Batch* pBatchA, const Batch* pBatchB ) const {
return ( pBatchA->getPriority() > pBatchB->getPriority() );
}//operator()
};//CompareBatch
//------------------------------------------------------------------------
//empties the batches according to priority. If emptyAll is false then
//only empty the batches that are lower priority than the one specified
//AND also empty the one that is passed in
void BatchManager::emptyBatch( bool emptyAll, Batch* pBatchToEmpty ) {
//sort batches by priority
std::priority_queue<Batch*, std::vector<Batch*>, CompareBatch> queue;
for( unsigned u = 0; u < _uNumBatches; ++u ) {
//add all non-empty batches to queue which will be sorted by order
//from lowest to highest priority
if( !_vBatches[u]->isEmpty() ) {
if( emptyAll ) {
queue.push( _vBatches[u].get() );
} else if( _vBatches[u]->getPriority() < pBatchToEmpty->getPriority() ) {
//only add batches that are lower in priority
queue.push( _vBatches[u].get() );
}
}
}
//render all desired batches
while( !queue.empty() ) {
Batch* pBatch = queue.top();
pBatch->render();
queue.pop();
}
if( !emptyAll ) {
//when not emptying all the batches, we still want to empty
//the batch that is passed in, in addition to all batches
//that have lower priority than it
pBatchToEmpty->render();
}
}//emptyBatch
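The ordering trick used in emptyBatch can be seen in isolation in the sketch below. `std::priority_queue` normally returns the largest element first, so a comparator that uses `>` turns it around and hands back the lowest priority first. `PendingBatch`, `LowestPriorityFirst`, and `flushOrder` are hypothetical names used only for this illustration.

```cpp
#include <queue>
#include <vector>

// Minimal stand-in for a non-empty batch waiting to be rendered.
struct PendingBatch {
    int priority;
    int id;
};

// Inverted comparison: the queue's top() becomes the lowest priority.
struct LowestPriorityFirst {
    bool operator()( const PendingBatch& a, const PendingBatch& b ) const {
        return a.priority > b.priority;
    }
};

// Returns the ids in the order the batches would be rendered: lowest priority
// first, so that higher priorities are drawn later and end up on top.
std::vector<int> flushOrder( const std::vector<PendingBatch>& batches ) {
    std::priority_queue<PendingBatch, std::vector<PendingBatch>, LowestPriorityFirst> queue;
    for( const PendingBatch& b : batches ) {
        queue.push( b );
    }
    std::vector<int> order;
    while( !queue.empty() ) {
        order.push_back( queue.top().id );
        queue.pop();
    }
    return order;
}
```

Batches with equal priority come out in an unspecified relative order, because `std::priority_queue` is not stable. That is fine here only if batches of the same priority are not expected to overlap in a particular order.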

  

 

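Keep in mind:

The sample code in this article batches arrays of 2D vertices for rendering, mainly to demonstrate how the batching concept can be used to organize render data. The iPriority field (which lives in BatchConfig, not in GuiVertex) plays the role that depth plays in 3D drawing: it decides the rendering order. To use this sample code with 3D vertices, you have to modify the data structures yourself, for example by replacing iPriority with the distance from the vertices to the camera. The supported primitive types can be extended as well.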
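One possible reading of that suggestion, sketched under stated assumptions: each 3D object gets a representative point, and objects are ordered by their distance to the camera so that the farthest is drawn first. `Vec3`, `Drawable3D`, `squaredDistance`, and `sortBackToFront` are hypothetical names, and a real 3D renderer would usually rely on the depth buffer for opaque geometry and use this ordering mainly for transparent objects.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Hypothetical 3D object with one representative point, e.g. its center.
struct Drawable3D {
    int id;
    Vec3 center;
};

// Squared distance is enough for ordering and avoids a square root.
float squaredDistance( const Vec3& a, const Vec3& b ) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Sorts farthest-first, so nearer objects are drawn later and appear on top,
// which mirrors what a higher iPriority does in the 2D code.
void sortBackToFront( std::vector<Drawable3D>& items, const Vec3& camera ) {
    std::sort( items.begin(), items.end(),
        [&camera]( const Drawable3D& a, const Drawable3D& b ) {
            return squaredDistance( a.center, camera ) >
                   squaredDistance( b.center, camera );
        } );
}
```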

link:

https://www.gamedev.net/articles/programming/graphics/opengl-batch-rendering-r3900/

posted @ 2018-04-20 14:12  DeepDream