OpenGL Pixel Buffer Object (PBO)
Related Topics: Vertex Buffer Object (VBO), Frame Buffer Object (FBO)
Download: pboUnpack.zip, pboPack.zip
- Overview
- Creating PBO
- Mapping PBO
- Example: Streaming Texture Uploads with PBO
- Example: Asynchronous Readback with PBO
Overview
OpenGL ARB_pixel_buffer_object extension is very close to ARB_vertex_buffer_object. It simply expands ARB_vertex_buffer_object extension in order to store not only vertex data but also pixel data into the buffer objects. This buffer object storing pixel data is called Pixel Buffer Object (PBO). ARB_pixel_buffer_object extension borrows all VBO framework and APIs, plus, adds 2 additional "target" tokens. These tokens assist the PBO memory manger (OpenGL driver) to determine the best location of the buffer object; system memory, AGP (shared memory) or video memory. Also, the target tokens clearly specify the bound PBO will be used in one of 2 different operations; GL_PIXEL_PACK_BUFFER_ARB to transfer pixel data to a PBO, or GL_PIXEL_UNPACK_BUFFER_ARB to transfer pixel data from PBO.
For example, glReadPixels() and glGetTexImage() are "pack" pixel operations, and glDrawPixels(), glTexImage2D() and glTexSubImage2D() are "unpack" operations. When a PBO is bound with GL_PIXEL_PACK_BUFFER_ARB token, glReadPixels() reads pixel data from a OpenGL framebuffer and write (pack) the data into the PBO. When a PBO is bound with GL_PIXEL_UNPACK_BUFFER_ARB token, glDrawPixels() reads (unpack) pixel data from the PBO and copy them to OpenGL framebuffer.
The main advantage of PBO are fast pixel data transfer to and from a graphics card through DMA (Direct Memory Access) without involing CPU cycles. And, the other advantage of PBO is asynchronous DMA transfer. Let's compare a conventional texture transfer method with using a Pixel Buffer Object. The left side of the following diagram is a conventional way to load texture data from an image source (image file or video stream). The source is first loaded into system memory, and then, copied from system memory to an OpenGL texture object with glTexImage2D(). These 2 transfer processes (load and copy) are all performed by CPU.
On the contrary in the right side diagram, the image source can be directly loaded into a PBO, which is controlled by OpenGL. CPU still involves to load the source to the PBO, but, not for transferring the pixel data from a PBO to a texture object. Instead, GPU (OpenGL driver) manages copying data from a PBO to a texture object. This means OpenGL performs a DMA transfer operation without wasting CPU cycles. Further, OpenGL can schedule an asynchronous DMA transfer for later execution. Therefore, glTexImage2D() returns immediately, and CPU can perform something else without waiting the pixel transfer is done.
There are 2 major PBO approaches to improve the performance of the pixel data transfer: streaming texture update and asynchronous read-back from the framebuffer.
Creating PBO
As mentioned earlier, Pixel Buffer Object borrows all APIs from Vertex Buffer Object. The only difference is there are 2 additional tokens for PBOs: GL_PIXEL_PACK_BUFFER_ARB and GL_PIXEL_UNPACK_BUFFER_ARB. GL_PIXEL_PACK_BUFFER_ARB is for transferring pixel data from OpenGL to your application, and GL_PIXEL_UNPACK_BUFFER_ARB means transferring pixel data from an application to OpenGL. OpenGL refers to these tokens to determine the best memory space of a PBO, for example, a video memory for uploading (unpack) textures, or system memory for reading (pack) framebuffer. However, these target tokens are solely hint. OpenGL driver decides the appropriate location for you.
Creating a PBO requires 3 steps;
- Generate a new buffer object with glGenBuffersARB().
- Bind the buffer object with glBindBufferARB().
- Copy pixel data to the buffer object with glBufferDataARB().
If you specify a NULL pointer to the source array in glBufferDataARB(), then PBO allocates only a memory space with the given data size. The last parameter of glBufferDataARB() is another performance hint for PBO to provide how the buffer object will be used. GL_STREAM_DRAW_ARB is for streaming texture upload and GL_STREAM_READ_ARB is for asynchronous framebuffer read-back.
Please check VBO for more details.
Mapping PBO
PBO provides a memory mapping mechanism to map the OpenGL controlled buffer object to the client's address space. So, the client can modify a portion of the buffer object or the entire buffer by using glMapBufferARB() and glUnmapBufferARB().
glMapBufferARB() returns the pointer to the buffer object if success. Otherwise it returns NULL. The target parameter is GL_PIXEL_PACK_BUFFER_ARB or GL_PIXEL_UNPACK_BUFFER_ARB. The second parameter, access specifies what to do with the mapped buffer; read data from the PBO (GL_READ_ONLY_ARB), write data to the PBO (GL_WRITE_ONLY_ARB), or both (GL_READ_WRITE_ARB).
Note that if GPU is still working with the buffer object, glMapBufferARB() will not return until GPU finishes its job with the corresponding buffer object. To avoid this stall(wait), call glBufferDataARB() with NULL pointer right before glMapBufferARB(). Then, OpenGL will discard the old buffer, and allocate new memory space for the buffer object.
The buffer object must be unmapped with glUnmapBufferARB() after use of the PBO. glUnmapBufferARB() returns GL_TRUE if success. Otherwise, it returns GL_FALSE.
Example: Streaming Texture Uploads
Download the source and binary: pboUnpack.zip.
This demo application uploads (unpack) streaming textures to an OpenGL texture object using PBO. You can switch to the different transfer modes (single PBO, double PBOs and without PBO) by pressing the space key, and compare the performance differences.
The texture sources are written directly on the mapped pixel buffer every frame in the PBO modes. Then, these data are transferred from the PBO to a texture object using glTexSubImage2D(). By using PBO, OpenGL can perform asynchronous DMA transfer between a PBO and a texture object. It significantly increases the texture upload performance. If asynchronous DMA transfer is supported, glTexSubImage2D() should return immediately, and CPU can process other jobs without waiting the actual texture copy.
To maximize the streaming transfer performance, you may use multiple pixel buffer objects. The diagram shows that 2 PBOs are used simultaneously; glTexSubImage2D() copies the pixel data from a PBO while the texture source is being written to the other PBO.
For nth frame, PBO 1 is used for glTexSubImage2D() and PBO 2 is used to get new texture source. For n+1th frame, 2 pixel buffers are switching the roles and continue to update the texture. Because of asynchronous DMA transfer, the update and copy processes can be performed simultaneously. CPU updates the texture source to a PBO while GPU copies texture from the other PBO.
// "index" is used to copy pixels from a PBO to a texture object // "nextIndex" is used to update pixels in the other PBO index = (index + 1) % 2; if(pboMode == 1) // with 1 PBO nextIndex = index; else if(pboMode == 2) // with 2 PBOs nextIndex = (index + 1) % 2; // bind the texture and PBO glBindTexture(GL_TEXTURE_2D, textureId); glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pboIds[index]); // copy pixels from PBO to texture object // Use offset instead of ponter. glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, WIDTH, HEIGHT, GL_BGRA, GL_UNSIGNED_BYTE, 0); // bind PBO to update texture source glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pboIds[nextIndex]); // Note that glMapBufferARB() causes sync issue. // If GPU is working with this buffer, glMapBufferARB() will wait(stall) // until GPU to finish its job. To avoid waiting (idle), you can call // first glBufferDataARB() with NULL pointer before glMapBufferARB(). // If you do that, the previous data in PBO will be discarded and // glMapBufferARB() returns a new allocated pointer immediately // even if GPU is still working with the previous data. glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, DATA_SIZE, 0, GL_STREAM_DRAW_ARB); // map the buffer object into client's memory GLubyte* ptr = (GLubyte*)glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB); if(ptr) { // update data directly on the mapped buffer updatePixels(ptr, DATA_SIZE); glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB); // release the mapped buffer } // it is good idea to release PBOs with ID 0 after use. // Once bound with 0, all pixel operations are back to normal ways. glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
Example: Asynchronous Read-back
Download the source and binary: pboPack.zip.
This demo application reads (pack) the pixel data from the framebuffer (left-side) to a PBO, then, draws it back to the right side of the window after modifying the brightness of the image. You can toggle PBO on/off by pressing the space key, and measure the performance of glReadPixels().
Conventional glReadPixels() blocks the pipeline and waits until all pixel data are transferred. Then, it returns control to the application. On the contrary, glReadPixels() with PBO can schedule asynchronous DMA transfer and returns immediately without stall. Therefore, the application (CPU) can execute other process right away, while transferring data with DMA by OpenGL (GPU).
This demo uses 2 pixel buffers. At frame n, the application reads the pixel data from OpenGL framebuffer to PBO 1 using glReadPixels(), and processes the pixel data in PBO 2. These read and process can be performed simultaneously, because glReadPixels() to PBO 1 returns immediately and CPU starts to process data in PBO 2 without delay. And, we alternate between PBO 1 and PBO 2 on every frame.
// "index" is used to read pixels from framebuffer to a PBO // "nextIndex" is used to update pixels in the other PBO index = (index + 1) % 2; nextIndex = (index + 1) % 2; // set the target framebuffer to read glReadBuffer(GL_FRONT); // read pixels from framebuffer to PBO // glReadPixels() should return immediately. glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[index]); glReadPixels(0, 0, WIDTH, HEIGHT, GL_BGRA, GL_UNSIGNED_BYTE, 0); // map the PBO to process its data by CPU glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[nextIndex]); GLubyte* ptr = (GLubyte*)glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB); if(ptr) { processPixels(ptr, ...); glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB); } // back to conventional pixel operation glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
PBO,即Pixel Buffer Object也是用于GPU的扩展(ARB_vertex_buffer_object)。这里的缓存当然就是GPU的缓存。PBO与VBO扩展类似,只不过它存储的是像素数据而不是顶点数据。PBO借用了VBO框架和所有API函数形式,并加了上两个"target"标志。这两个标识是:
- GL_PIXEL_PACK_BUFFER_ARB 将像素数据传给PBO
- GL_PIXEL_UNPACK_BUFFER_ARB 从PBO得到像素数据
这里的“pack”还是“unpack”,可分别理解为“传给”和“得到”。它们也都可以统一理解为“拷贝”,也就是像素数据的“传递”。
图1 opengl PBO
图2 不使用PBO的纹理载入
图3 使用PBO的纹理载入
生成PBO
映射PBO
____________________________________________________
Demo
例子程序pboUnpack.zip使用了不同方式来比较将纹理传给OpenGL的模式:
- 使用一个PBO;
- 使用两个PBO;
- 不使用PBO;
通过按空格键,可以不同模式间切换。
在PBO模式下,每帧的纹理源(像素)都是在映射PBO状态下直接写进去的。再通过调用glTexSubImage2D将PBO中的像素传递给纹理对象。通过使用纹理对象可以在PBO和纹理对象间进行异步DMA传递。它能大大提高像素传递的性能。
由于glTexSubImage2D立即返回,因此CPU能够直接进行其它工作,无需等待实际的像素传递。
图4 两个PBO更新纹理
为了将像素传递的性能最大化,可以使用多个PBO对象。图4中表明同时使用了两个PBO。在glTexSubImage2D将像素数据从PBO拷贝出来的同时,另一份像素数据写进了另一个PBO。
在第n帧时,PBO1用于glTexSubImage2D,而PBO2用于生成一个新的纹理对象了。再到n+1帧时,两个PBO则互换了角色。由于异步DMA传递,像素数据的更新和拷贝过程可同时进行,即CPU将纹理源更新到PBO,同时GPU将从另一PBO中拷贝出纹理。
例子程序pboPack.zip从窗口的左边读出(pack)像素数据到PBO,在更改它的亮度后,把它在窗口的右边绘制出来。通过按空格键,可以看glReadPixels的性能。
传统使用glReadPixels将阻塞渲染管道(流水线),直到所有的像素数据传递完成,才会将控制权交还给应用程序。相反,使用PBO的glReadPixels可以调度异步DMA传递,能够立即返回而不用等待。因此,CPU可以在OpenGL(GPU)传递像素数据的时候进行其它处理。
图5 用两个PBO异步glReadPixels
例子程序也使用了两个PBO,在第n帧时,应用帧缓存读出像素数据到PBO1中,同时在PBO中对像素数据进行处理。读与写的过程可同时进行,是因为,在调用glReadPixels时立即返回了,而CPU立即处理PBO2而不会有延迟。在下一帧时,PBO1和PBO2的角色互换。
OpenGL中渲染到纹理(Render to Texture)不同实现的比较
[align=center][b][size=6]渲染到纹理(Render to Texture) [/size][/b][/align][size=3][/size]
[size=3]
渲染到纹理实现方式:
1. 渲染到帧缓存,然后用glReadPixels将需要的部分读入到客户内存,然后使用glTexImage()函数创建纹理
[/size]
[size=3] 缺点:比较慢
2.渲染到帧缓存,然后使用glCopyTexImage() 直接从帧缓存创建纹理
3.渲染到帧缓存,然后使用glCopyTexSubImage() 从帧缓存中读出需要的部分更新纹理的一部分
4.使用 pbuffer,直接渲染到纹理[/size]
[size=3]
需要的扩展:
WGL_ARB_extensions_string
WGL_ARB_render_texture
WGL_ARB_pbuffer
WGL_ARB_pixel_format
缺点:
只可以在windows系统上使用。
每一个pbuffer在不同的OpenGL上下文中工作,管理麻烦
pbuffer之间切换开销很大
用pbuffer的DC 和RC作为渲染设备和渲染上下文执行渲染
5. 使用 framebuffer object (FBO)扩展: GL_EXT_framebuffer_object [/size]