Advanced OpenGL: Batch Rendering
What Is Batch Rendering?
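Every game engine generates its render data on the CPU and then transfers it to the GPU so that a frame can be drawn on screen. When rendering many different objects, it is best to organize that data into groups so that the number of calls between the CPU and the GPU is minimized. You also want to minimize the number of state changes, because too many of them will badly hurt performance. A group that holds such render data is called a batch.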
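To make the cost of state changes concrete, here is a minimal CPU-only sketch that counts how many texture binds a sequence of draw requests would cause, before and after grouping them. The names `DrawRequest`, `countTextureBinds` and `groupByTexture` are hypothetical and not part of the article's code; real engines also have to respect draw order, which the article handles through a priority value.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw request: only the texture id matters for this illustration.
struct DrawRequest {
    unsigned textureId;
};

// Counts how many times the bound texture would change if the requests were
// submitted in the given order (the very first bind counts as one change).
std::size_t countTextureBinds( const std::vector<DrawRequest>& requests ) {
    std::size_t binds = 0;
    bool hasBound = false;
    unsigned bound = 0;
    for( const DrawRequest& r : requests ) {
        if( !hasBound || r.textureId != bound ) {
            bound = r.textureId;
            hasBound = true;
            ++binds;
        }
    }
    return binds;
}

// Groups requests that share a texture so that each texture is bound once.
// stable_sort keeps the relative order of requests within one texture.
std::vector<DrawRequest> groupByTexture( std::vector<DrawRequest> requests ) {
    std::stable_sort( requests.begin(), requests.end(),
                      []( const DrawRequest& a, const DrawRequest& b ) {
                          return a.textureId < b.textureId;
                      } );
    return requests;
}
```

Six requests that alternate between two textures cause six binds when submitted as they arrive, but only two once they are grouped.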
How To Create A Batch?
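In OpenGL, creating a batch comes down to creating a Vertex Buffer Object (VBO). Details and best practices for creating a VBO are described at https://www.opengl.org/wiki/Vertex_Specification_Best_Practices. The Batch class is declared as follows: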
class Batch {
private:
    unsigned    _uMaxNumVertices;
    unsigned    _uNumUsedVertices;
    unsigned    _vao; //only used in OpenGL v3.x +
    unsigned    _vbo;
    BatchConfig _config;
    GuiVertex   _lastVertex;

    //^^^^------ variables above ------|------ functions below ------vvvv

public:
    Batch( unsigned uMaxNumVertices );
    ~Batch();

    bool   isBatchConfig( const BatchConfig& config ) const;
    bool   isEmpty() const;
    bool   isEnoughRoom( unsigned uNumVertices ) const;
    Batch* getFullest( Batch* pBatch );
    int    getPriority() const;

    void add( const std::vector<GuiVertex>& vVertices, const BatchConfig& config );
    void add( const std::vector<GuiVertex>& vVertices );
    void render();

private:
    Batch( const Batch& c ); //not implemented
    Batch& operator=( const Batch& c ); //not implemented

    void cleanUp();
};//Batch
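Note that in the code above a Batch keeps track of how many vertices it can store (_uMaxNumVertices) and how many it currently holds (_uNumUsedVertices). When a Batch is created, a VBO is created with it to store the vertices on the GPU. Each Batch stores only vertices that share a single configuration, which is defined by BatchConfig.
BatchConfig is defined as follows: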
struct BatchConfig {
    unsigned  uRenderType;
    int       iPriority;
    unsigned  uTextureId;
    glm::mat4 transformMatrix;

    BatchConfig( unsigned uRenderTypeIn, int iPriorityIn, unsigned uTextureIdIn ) :
        uRenderType( uRenderTypeIn ),
        iPriority( iPriorityIn ),
        uTextureId( uTextureIdIn ),
        //set explicitly to the identity matrix: recent GLM versions leave a
        //default-constructed mat4 uninitialized unless GLM_FORCE_CTOR_INIT is defined
        transformMatrix( 1.0f ) {}

    bool operator==( const BatchConfig& other ) const {
        return uRenderType     == other.uRenderType
            && iPriority       == other.iPriority
            && uTextureId      == other.uTextureId
            && transformMatrix == other.transformMatrix;
    }

    bool operator!=( const BatchConfig& other ) const {
        return !( *this == other );
    }
};//BatchConfig
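A BatchConfig defines how a set of vertices is interpreted (uRenderType): drawn as GL_LINES, as GL_TRIANGLES, or as a GL_TRIANGLE_STRIP.
The variable iPriority determines the order in which batches are rendered. A batch with a higher priority is drawn on top of batches with a lower priority, which means it is drawn after them.
If the vertices in a batch specify texture coordinates, we need to know which texture is bound (uTextureId).
Finally, if the vertices in a batch need a transformation before they are rendered, the transformMatrix to apply is stored here as well.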
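The effect of the priority value can be sketched without any OpenGL code. The helper below, whose name `drawOrder` is an assumption made for this illustration, returns a set of priorities in the order in which the corresponding batches would be drawn: lowest first, so that higher-priority content lands on top.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Returns the given priorities in draw order: lowest priority first,
// highest priority last (and therefore on top of everything else).
std::vector<int> drawOrder( const std::vector<int>& priorities ) {
    // std::greater turns std::priority_queue into a min-heap,
    // so top() always yields the lowest remaining priority.
    std::priority_queue<int, std::vector<int>, std::greater<int>> queue;
    for( int p : priorities ) {
        queue.push( p );
    }

    std::vector<int> order;
    order.reserve( priorities.size() );
    while( !queue.empty() ) {
        order.push_back( queue.top() );
        queue.pop();
    }
    return order;
}
```

The same idea, a priority queue with a "greater than" comparison, is what the BatchManager later in the article uses when it flushes its batches.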
The vertex format used in this article is:
struct GuiVertex {
    glm::vec2 position;
    glm::vec4 color;
    glm::vec2 texture;

    GuiVertex( glm::vec2 positionIn, glm::vec4 colorIn, glm::vec2 textureIn = glm::vec2() ) :
        position( positionIn ),
        color( colorIn ),
        texture( textureIn ) {}
};//GuiVertex
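The GuiVertex above defines a 2D position in screen space together with a color and a texture coordinate.
Next, we implement the member functions of the Batch class: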
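The vertex layout determines both the attribute offsets and how many vertices fit into a buffer of a given size. The sketch below mirrors the member order of GuiVertex using plain float structs so that it compiles without GLM. `Vec2`, `Vec4`, `PlainVertex`, the `k...` constants and `verticesThatFit` are stand-ins introduced only for this example.

```cpp
#include <cstddef>

// Stand-ins for glm::vec2 and glm::vec4 so the sketch compiles without GLM.
struct Vec2 { float x, y; };
struct Vec4 { float r, g, b, a; };

// Same member order as GuiVertex: position, color, texture coordinate.
struct PlainVertex {
    Vec2 position;
    Vec4 color;
    Vec2 texture;
};

// Stride and byte offsets of the three attributes inside one vertex.
constexpr std::size_t kStride         = sizeof( PlainVertex );
constexpr std::size_t kPositionOffset = offsetof( PlainVertex, position );
constexpr std::size_t kColorOffset    = offsetof( PlainVertex, color );
constexpr std::size_t kTextureOffset  = offsetof( PlainVertex, texture );

// Number of whole vertices that fit into a buffer of budgetBytes bytes.
constexpr std::size_t verticesThatFit( std::size_t budgetBytes ) {
    return budgetBytes / sizeof( PlainVertex );
}
```

With 4-byte floats each vertex takes 32 bytes, with the attributes at offsets 0, 8 and 24, so a 1 MiB buffer holds 32768 vertices. This is one way to turn a byte budget for a batch into a vertex count.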
Batch::Batch( unsigned uMaxNumVertices ) :
    _uMaxNumVertices( uMaxNumVertices ),
    _uNumUsedVertices( 0 ),
    _vao( 0 ),
    _vbo( 0 ),
    _config( GL_TRIANGLE_STRIP, 0, 0 ),
    _lastVertex( glm::vec2(), glm::vec4() ) {

    //optimal size for a batch is between 1-4MB in size. Number of elements that can be stored in a
    //batch is determined by calculating #bytes used by each vertex
    if( uMaxNumVertices < 1000 ) {
        std::ostringstream strStream;
        strStream << __FUNCTION__ << " uMaxNumVertices{" << uMaxNumVertices
                  << "} is too small. Choose a number >= 1000 ";
        throw ExceptionHandler( strStream );
    }

    //clear error codes
    glGetError();

    if( Settings::getOpenglVersion().x >= 3 ) {
        glGenVertexArrays( 1, &_vao );
        glBindVertexArray( _vao );
    }

    //create batch buffer
    glGenBuffers( 1, &_vbo );
    glBindBuffer( GL_ARRAY_BUFFER, _vbo );
    glBufferData( GL_ARRAY_BUFFER, uMaxNumVertices * sizeof( GuiVertex ), nullptr, GL_STREAM_DRAW );

    if( Settings::getOpenglVersion().x >= 3 ) {
        unsigned uOffset = 0;
        ShaderManager::enableAttribute( A_POSITION, sizeof( GuiVertex ), uOffset );
        uOffset += sizeof( glm::vec2 );
        ShaderManager::enableAttribute( A_COLOR, sizeof( GuiVertex ), uOffset );
        uOffset += sizeof( glm::vec4 );
        ShaderManager::enableAttribute( A_TEXTURE_COORD0, sizeof( GuiVertex ), uOffset );

        glBindVertexArray( 0 );

        ShaderManager::disableAttribute( A_POSITION );
        ShaderManager::disableAttribute( A_COLOR );
        ShaderManager::disableAttribute( A_TEXTURE_COORD0 );
    }

    glBindBuffer( GL_ARRAY_BUFFER, 0 );

    if( GL_NO_ERROR != glGetError() ) {
        cleanUp();
        throw ExceptionHandler( __FUNCTION__ + std::string( " failed to create batch" ) );
    }
}//Batch

//------------------------------------------------------------------------
Batch::~Batch() {
    cleanUp();
}//~Batch

//------------------------------------------------------------------------
void Batch::cleanUp() {
    if( _vbo != 0 ) {
        glBindBuffer( GL_ARRAY_BUFFER, 0 );
        glDeleteBuffers( 1, &_vbo );
        _vbo = 0;
    }
    if( _vao != 0 ) {
        glBindVertexArray( 0 );
        glDeleteVertexArrays( 1, &_vao );
        _vao = 0;
    }
}//cleanUp

//------------------------------------------------------------------------
bool Batch::isBatchConfig( const BatchConfig& config ) const {
    return ( config == _config );
}//isBatchConfig

//------------------------------------------------------------------------
bool Batch::isEmpty() const {
    return ( 0 == _uNumUsedVertices );
}//isEmpty

//------------------------------------------------------------------------
//returns true if the number of vertices passed in can be stored in this batch
//without reaching the limit of how many vertices can fit in the batch
bool Batch::isEnoughRoom( unsigned uNumVertices ) const {
    //2 extra vertices are needed for degenerate triangles between each strip
    unsigned uNumExtraVertices = ( GL_TRIANGLE_STRIP == _config.uRenderType && _uNumUsedVertices > 0 ? 2 : 0 );

    return ( _uNumUsedVertices + uNumExtraVertices + uNumVertices <= _uMaxNumVertices );
}//isEnoughRoom

//------------------------------------------------------------------------
//returns the batch that contains the most number of stored vertices between
//this batch and the one passed in
Batch* Batch::getFullest( Batch* pBatch ) {
    return ( _uNumUsedVertices > pBatch->_uNumUsedVertices ? this : pBatch );
}//getFullest

//------------------------------------------------------------------------
int Batch::getPriority() const {
    return _config.iPriority;
}//getPriority

//------------------------------------------------------------------------
//adds vertices to batch and also sets the batch config options
void Batch::add( const std::vector<GuiVertex>& vVertices, const BatchConfig& config ) {
    _config = config;
    add( vVertices );
}//add

//------------------------------------------------------------------------
void Batch::add( const std::vector<GuiVertex>& vVertices ) {
    //the checks run from most to least specific so that each message can actually be reached
    if( vVertices.empty() ) {
        std::ostringstream strStream;
        strStream << __FUNCTION__ << " can not add {" << vVertices.size() << "} vertices to batch.";
        throw ExceptionHandler( strStream );
    }

    if( vVertices.size() > _uMaxNumVertices ) {
        std::ostringstream strStream;
        strStream << __FUNCTION__ << " can not add {" << vVertices.size()
                  << "} vertices to batch. Maximum number of vertices allowed in a batch is {"
                  << _uMaxNumVertices << "}";
        throw ExceptionHandler( strStream );
    }

    //2 extra vertices are needed for degenerate triangles between each strip
    unsigned uNumExtraVertices = ( GL_TRIANGLE_STRIP == _config.uRenderType && _uNumUsedVertices > 0 ? 2 : 0 );

    if( uNumExtraVertices + vVertices.size() > _uMaxNumVertices - _uNumUsedVertices ) {
        std::ostringstream strStream;
        strStream << __FUNCTION__ << " not enough room for {" << vVertices.size()
                  << "} vertices in this batch. Maximum number of vertices allowed in a batch is {"
                  << _uMaxNumVertices << "} and {" << _uNumUsedVertices << "} are already used";
        if( uNumExtraVertices > 0 ) {
            strStream << " plus you need room for {" << uNumExtraVertices << "} extra vertices too";
        }
        throw ExceptionHandler( strStream );
    }

    //add vertices to buffer
    if( Settings::getOpenglVersion().x >= 3 ) {
        glBindVertexArray( _vao );
    }
    glBindBuffer( GL_ARRAY_BUFFER, _vbo );

    if( uNumExtraVertices > 0 ) {
        //need to add 2 vertex copies to create degenerate triangles between this strip
        //and the last strip that was stored in the batch
        glBufferSubData( GL_ARRAY_BUFFER, _uNumUsedVertices * sizeof( GuiVertex ),
                         sizeof( GuiVertex ), &_lastVertex );
        glBufferSubData( GL_ARRAY_BUFFER, ( _uNumUsedVertices + 1 ) * sizeof( GuiVertex ),
                         sizeof( GuiVertex ), &vVertices[0] );
    }

    // Use glMapBuffer instead, if moving large chunks of data > 1MB
    glBufferSubData( GL_ARRAY_BUFFER, ( _uNumUsedVertices + uNumExtraVertices ) * sizeof( GuiVertex ),
                     vVertices.size() * sizeof( GuiVertex ), &vVertices[0] );

    if( Settings::getOpenglVersion().x >= 3 ) {
        glBindVertexArray( 0 );
    }
    glBindBuffer( GL_ARRAY_BUFFER, 0 );

    _uNumUsedVertices += static_cast<unsigned>( vVertices.size() ) + uNumExtraVertices;
    _lastVertex = vVertices.back();
}//add

//------------------------------------------------------------------------
void Batch::render() {
    if( _uNumUsedVertices == 0 ) {
        //nothing in this buffer to render
        return;
    }

    bool usingTexture = INVALID_UNSIGNED != _config.uTextureId;
    ShaderManager::setUniform( U_USING_TEXTURE, usingTexture );
    if( usingTexture ) {
        ShaderManager::setTexture( 0, U_TEXTURE0_SAMPLER_2D, _config.uTextureId );
    }

    ShaderManager::setUniform( U_TRANSFORM_MATRIX, _config.transformMatrix );

    //draw contents of buffer
    if( Settings::getOpenglVersion().x >= 3 ) {
        glBindVertexArray( _vao );
        glDrawArrays( _config.uRenderType, 0, _uNumUsedVertices );
        glBindVertexArray( 0 );
    } else { //OpenGL v2.x
        glBindBuffer( GL_ARRAY_BUFFER, _vbo );

        unsigned uOffset = 0;
        ShaderManager::enableAttribute( A_POSITION, sizeof( GuiVertex ), uOffset );
        uOffset += sizeof( glm::vec2 );
        ShaderManager::enableAttribute( A_COLOR, sizeof( GuiVertex ), uOffset );
        uOffset += sizeof( glm::vec4 );
        ShaderManager::enableAttribute( A_TEXTURE_COORD0, sizeof( GuiVertex ), uOffset );

        glDrawArrays( _config.uRenderType, 0, _uNumUsedVertices );

        ShaderManager::disableAttribute( A_POSITION );
        ShaderManager::disableAttribute( A_COLOR );
        ShaderManager::disableAttribute( A_TEXTURE_COORD0 );

        glBindBuffer( GL_ARRAY_BUFFER, 0 );
    }

    //reset buffer
    _uNumUsedVertices = 0;
    _config.iPriority = 0;
}//render
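The two extra vertices that Batch::add writes between triangle strips are easier to follow on the CPU side. The sketch below joins strips in an ordinary std::vector the same way: it repeats the last vertex of the previous strip and the first vertex of the next one, which produces zero-area (degenerate) triangles that the GPU does not rasterize. The function name `appendStrip` is an assumption made for this illustration.

```cpp
#include <vector>

// Appends the strip `next` to `combined`. If `combined` already holds a strip,
// two bridging vertices are inserted first so that the triangles connecting
// the two strips are degenerate (zero area).
template <typename Vertex>
void appendStrip( std::vector<Vertex>& combined, const std::vector<Vertex>& next ) {
    if( next.empty() ) {
        return;
    }
    if( !combined.empty() ) {
        const Vertex last = combined.back(); // copy before the vector grows
        combined.push_back( last );          // repeat last vertex of the previous strip
        combined.push_back( next.front() );  // repeat first vertex of the next strip
    }
    combined.insert( combined.end(), next.begin(), next.end() );
}
```

Joining the strips {0, 1, 2, 3} and {4, 5, 6, 7} yields {0, 1, 2, 3, 3, 4, 4, 5, 6, 7}. Note that the winding of the second strip is preserved only when the data before it has an even number of vertices, which matters if back-face culling is enabled.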
How To Use The Batch Class?
To make the Batch class more convenient to use, we need a manager class, BatchManager, defined as follows:
class BatchManager {
private:
    std::vector<std::shared_ptr<Batch>> _vBatches;

    unsigned _uNumBatches;
    unsigned _maxNumVerticesPerBatch;

    //^^^^------ variables above ------|------ functions below ------vvvv

public:
    BatchManager( unsigned uNumBatches, unsigned numVerticesPerBatch );
    ~BatchManager();

    void render( const std::vector<GuiVertex>& vVertices, const BatchConfig& config );
    void emptyAll();

private:
    BatchManager( const BatchManager& c ); //not implemented
    BatchManager& operator=( const BatchManager& c ); //not implemented

    void emptyBatch( bool emptyAll, Batch* pBatchToEmpty );
};//BatchManager
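The BatchManager class manages a pool of batches (_vBatches). When BatchManager::render is called, it finds the Batch that the incoming vertices should go into, based on their BatchConfig. The implementation is as follows: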
BatchManager::BatchManager( unsigned uNumBatches, unsigned numVerticesPerBatch ) :
    _uNumBatches( uNumBatches ),
    _maxNumVerticesPerBatch( numVerticesPerBatch ) {

    //test input parameters
    if( uNumBatches < 10 ) {
        std::ostringstream strStream;
        strStream << __FUNCTION__ << " uNumBatches{" << uNumBatches
                  << "} is too small. Choose a number >= 10 ";
        throw ExceptionHandler( strStream );
    }

    //a good size for each batch is between 1-4MB in size. Number of elements that can be stored in a
    //batch is determined by calculating #bytes used by each vertex
    if( numVerticesPerBatch < 1000 ) {
        std::ostringstream strStream;
        strStream << __FUNCTION__ << " numVerticesPerBatch{" << numVerticesPerBatch
                  << "} is too small. Choose a number >= 1000 ";
        throw ExceptionHandler( strStream );
    }

    //create desired number of batches
    _vBatches.reserve( uNumBatches );
    for( unsigned u = 0; u < uNumBatches; ++u ) {
        _vBatches.push_back( std::shared_ptr<Batch>( new Batch( numVerticesPerBatch ) ) );
    }
}//BatchManager

//------------------------------------------------------------------------
BatchManager::~BatchManager() {
    _vBatches.clear();
}//~BatchManager

//------------------------------------------------------------------------
void BatchManager::render( const std::vector<GuiVertex>& vVertices, const BatchConfig& config ) {
    Batch* pEmptyBatch   = nullptr;
    Batch* pFullestBatch = _vBatches[0].get();

    //determine which batch to put these vertices into
    for( unsigned u = 0; u < _uNumBatches; ++u ) {
        Batch* pBatch = _vBatches[u].get();

        if( pBatch->isBatchConfig( config ) ) {
            if( !pBatch->isEnoughRoom( static_cast<unsigned>( vVertices.size() ) ) ) {
                //first need to empty this batch before adding anything to it
                emptyBatch( false, pBatch );
            }
            //pass the config again: Batch::render() resets the stored priority,
            //so a batch that was just emptied would otherwise lose it
            pBatch->add( vVertices, config );
            return;
        }

        //store pointer to first empty batch
        if( nullptr == pEmptyBatch && pBatch->isEmpty() ) {
            pEmptyBatch = pBatch;
        }

        //store pointer to fullest batch
        pFullestBatch = pBatch->getFullest( pFullestBatch );
    }

    //if we get here then we didn't find an appropriate batch to put the vertices into
    //if we have an empty batch, put vertices there
    if( nullptr != pEmptyBatch ) {
        pEmptyBatch->add( vVertices, config );
        return;
    }

    //no empty batches were found therefore we must empty one first and then we can use it
    emptyBatch( false, pFullestBatch );
    pFullestBatch->add( vVertices, config );
}//render

//------------------------------------------------------------------------
//empty all batches by rendering their contents now
void BatchManager::emptyAll() {
    emptyBatch( true, _vBatches[0].get() );
}//emptyAll

//------------------------------------------------------------------------
//orders batches so that the one with the lowest priority is on top of the queue
struct CompareBatch {
    bool operator()( const Batch* pBatchA, const Batch* pBatchB ) const {
        return ( pBatchA->getPriority() > pBatchB->getPriority() );
    }//operator()
};//CompareBatch

//------------------------------------------------------------------------
//empties the batches according to priority. If emptyAll is false then
//only empty the batches that are lower priority than the one specified
//AND also empty the one that is passed in
void BatchManager::emptyBatch( bool emptyAll, Batch* pBatchToEmpty ) {
    //sort batches by priority
    std::priority_queue<Batch*, std::vector<Batch*>, CompareBatch> queue;

    for( unsigned u = 0; u < _uNumBatches; ++u ) {
        //add all non-empty batches to queue which will be sorted by order
        //from lowest to highest priority
        if( !_vBatches[u]->isEmpty() ) {
            if( emptyAll ) {
                queue.push( _vBatches[u].get() );
            } else if( _vBatches[u]->getPriority() < pBatchToEmpty->getPriority() ) {
                //only add batches that are lower in priority
                queue.push( _vBatches[u].get() );
            }
        }
    }

    //render all desired batches
    while( !queue.empty() ) {
        Batch* pBatch = queue.top();
        pBatch->render();
        queue.pop();
    }

    if( !emptyAll ) {
        //when not emptying all the batches, we still want to empty
        //the batch that is passed in, in addition to all batches
        //that have lower priority than it
        pBatchToEmpty->render();
    }
}//emptyBatch
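The search order inside BatchManager::render can be isolated from the OpenGL calls. The following sketch reproduces only the decision: use a batch with a matching config, otherwise the first empty batch, otherwise flush the fullest batch and reuse it. `Slot`, `Choice`, `Decision` and `chooseSlot` are hypothetical names, and an integer `configId` stands in for a full BatchConfig comparison.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, GPU-free summary of one batch in the pool.
struct Slot {
    int      configId; // stands in for BatchConfig equality
    unsigned used;     // number of vertices currently stored
};

enum class Choice { Matching, Empty, FlushFullest };

struct Decision {
    Choice      choice;
    std::size_t index;
};

// Mirrors the search order of BatchManager::render:
// 1) a batch whose config matches, 2) the first empty batch,
// 3) otherwise the fullest batch, which has to be flushed before reuse.
// Precondition: slots is not empty.
Decision chooseSlot( const std::vector<Slot>& slots, int configId ) {
    bool        haveEmpty    = false;
    std::size_t emptyIndex   = 0;
    std::size_t fullestIndex = 0;

    for( std::size_t i = 0; i < slots.size(); ++i ) {
        if( slots[i].configId == configId ) {
            return { Choice::Matching, i };
        }
        if( !haveEmpty && slots[i].used == 0 ) {
            haveEmpty  = true;
            emptyIndex = i;
        }
        if( slots[i].used > slots[fullestIndex].used ) {
            fullestIndex = i;
        }
    }

    if( haveEmpty ) {
        return { Choice::Empty, emptyIndex };
    }
    return { Choice::FlushFullest, fullestIndex };
}
```

Flushing the fullest batch when the pool is exhausted is a reasonable heuristic because it sends the largest amount of data with a single draw call. The sketch leaves out the capacity check that the real render function performs on a matching batch.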
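Note:
The sample code in this article batches arrays of 2D vertices, mainly to demonstrate how batches can be used to organize render data. The iPriority field in BatchConfig plays the role that depth plays in 3D rendering: it determines the draw order. To use this code with 3D vertices you need to adapt the data structures yourself, for example by replacing iPriority with the distance from the vertices to the camera. The set of primitive types can be extended in the same way.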
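As one possible reading of that suggestion, the sketch below orders objects by their distance to the camera, farthest first, so that nearer objects are drawn later and end up on top, just as a higher iPriority does in the 2D case. `Vec3`, `Drawable`, `squaredDistance` and `sortBackToFront` are hypothetical names; the article itself does not include a 3D implementation.

```cpp
#include <algorithm>
#include <vector>

// Minimal stand-in for a 3D vector type.
struct Vec3 { float x, y, z; };

// Hypothetical drawable: an id plus the point used for distance sorting.
struct Drawable {
    int  id;
    Vec3 center;
};

// Squared distance is enough for ordering and avoids a square root.
float squaredDistance( const Vec3& a, const Vec3& b ) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Sorts farthest first, so that nearer objects are drawn later.
void sortBackToFront( std::vector<Drawable>& items, const Vec3& camera ) {
    std::sort( items.begin(), items.end(),
               [&camera]( const Drawable& a, const Drawable& b ) {
                   return squaredDistance( a.center, camera )
                        > squaredDistance( b.center, camera );
               } );
}
```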
link:
https://www.gamedev.net/articles/programming/graphics/opengl-batch-rendering-r3900/