CUDA SDK VolumeRender Analysis (3)

This post looks at how the CUDA SDK volumeRender sample combines CUDA with OpenGL.

 

The main points to keep in mind when using OpenGL from CUDA are:

  1. Interoperability with OpenGL requires that the CUDA device be specified by cudaGLSetGLDevice() before any other runtime calls.
  2. Register a resource with CUDA before mapping it. A resource only needs to be registered once.
  3. After registration, map the resource with cudaGraphicsMapResources() before CUDA accesses it, and unmap it with cudaGraphicsUnmapResources() once CUDA is done.
  4. A mapped resource can be read from or written to by kernels using the device memory address returned by cudaGraphicsResourceGetMappedPointer()
    for buffers and cudaGraphicsSubResourceGetMappedArray() for CUDA arrays.
  5. Do NOT access a resource through OpenGL or Direct3D while it is mapped to CUDA; doing so produces undefined results.

 

Overall pseudocode

    set_OpenGL_device();
    register_resources();

    while( is_running )
    {
        map_resource();
        resource_pointer *pointer = get_mapped_pointer();
        process_using_cuda( pointer );
        unmap_resource();
        do_normal_rendering();
    }
    unregister_resources();
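
The ordering rules above can be mirrored in a small host-only model. The sketch below is not CUDA code and calls no CUDA or OpenGL function; the class InteropResourceModel, its methods, and runOneFrame() are hypothetical names introduced only to show which transitions are legal (register once, map before CUDA access, unmap before OpenGL access). The model is deliberately strict and may reject sequences the real runtime would tolerate.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical host-only model of the interop life cycle.
// It calls no CUDA or OpenGL API; it only tracks the legal state transitions.
class InteropResourceModel {
public:
    enum class State { Unregistered, Registered, Mapped };

    State state() const { return state_; }

    // Models cudaGraphicsGLRegisterBuffer(): done once per resource.
    void registerOnce() {
        require(state_ == State::Unregistered, "resource is already registered");
        state_ = State::Registered;
    }

    // Models cudaGraphicsMapResources(): CUDA may access the resource afterwards.
    void map() {
        require(state_ == State::Registered, "map requires a registered, unmapped resource");
        state_ = State::Mapped;
    }

    // Models cudaGraphicsUnmapResources(): hands the resource back to OpenGL.
    void unmap() {
        require(state_ == State::Mapped, "unmap requires a mapped resource");
        state_ = State::Registered;
    }

    // Models cudaGraphicsUnregisterResource(); this model insists on unmapping first.
    void unregister() {
        require(state_ == State::Registered, "unregister requires an unmapped resource");
        state_ = State::Unregistered;
    }

    bool cudaMayAccess() const { return state_ == State::Mapped; }
    // OpenGL must not touch the resource while it is mapped to CUDA.
    bool openGLMayAccess() const { return state_ != State::Mapped; }

private:
    static void require(bool ok, const char* message) {
        if (!ok) throw std::logic_error(message);
    }
    State state_ = State::Unregistered;
};

// One iteration of the loop in the pseudocode, expressed against the model.
inline void runOneFrame(InteropResourceModel& resource) {
    resource.map();
    assert(resource.cudaMayAccess());    // process_using_cuda() would run here
    resource.unmap();
    assert(resource.openGLMayAccess());  // do_normal_rendering() would run here
}
```

Calling map() twice in a row, or unregister() while the resource is still mapped, throws std::logic_error in this model, which is the kind of mistake rules 3 and 5 warn about.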

Choosing the device

    // Sets device as the current device for the calling host thread.
    extern __host__ cudaError_t CUDARTAPI cudaGLSetGLDevice(int device);

In this sample the call is wrapped in chooseCudaDevice(), which automatically picks the best-performing device.
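
Picking the best-performing device usually comes down to scoring every device and keeping the maximum. The sketch below assumes a score of multiprocessor count times clock rate. The DeviceInfo struct, its fields, and pickFastestDevice() are hypothetical stand-ins rather than the sample's actual helper, and the real heuristic may weigh other properties.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one device; in a real program these numbers
// would come from a device query such as cudaGetDeviceProperties().
struct DeviceInfo {
    int multiProcessorCount;
    int clockRateKHz;
};

// Returns the index of the device with the highest assumed score
// (multiprocessors * clock rate), or -1 if the list is empty.
// Ties keep the first device found.
inline int pickFastestDevice(const std::vector<DeviceInfo>& devices) {
    int best = -1;
    long long bestScore = -1;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        assert(devices[i].multiProcessorCount >= 0 && devices[i].clockRateKHz >= 0);
        const long long score =
            static_cast<long long>(devices[i].multiProcessorCount) * devices[i].clockRateKHz;
        if (score > bestScore) {
            bestScore = score;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

The returned index is what would then be handed to cudaGLSetGLDevice().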

 

Creating and registering the resource

The resource used here is a Pixel Buffer Object (PBO), a buffer object that stores pixel data. initPixelBuffer() creates the PBO and registers it with CUDA.

    // OpenGL pixel buffer object
    GLuint pbo = 0;

    // create pixel buffer object for display:
    // width*height RGBA pixels, one byte per channel, no initial data
    glGenBuffersARB(1, &pbo);
    glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbo);
    glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, width*height*sizeof(GLubyte)*4, 0, GL_STREAM_DRAW_ARB);
    glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);

    // register this buffer object with CUDA;
    // WriteDiscard: CUDA only writes to it and the previous contents are not kept
    cutilSafeCall(cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, pbo, cudaGraphicsMapFlagsWriteDiscard));
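
The size passed to glBufferDataARB() is one byte per channel times four RGBA channels per pixel. A small helper makes the arithmetic explicit; pboSizeBytes() is a hypothetical name, and computing in std::size_t is my own choice to keep the multiplication from overflowing int at large resolutions.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a width x height image with 4 channels (RGBA),
// one unsigned byte per channel, i.e. width*height*sizeof(GLubyte)*4.
inline std::size_t pboSizeBytes(int width, int height) {
    assert(width >= 0 && height >= 0);
    const std::size_t channels = 4;
    const std::size_t bytesPerChannel = sizeof(unsigned char);  // GLubyte is an 8-bit unsigned type
    return static_cast<std::size_t>(width) * static_cast<std::size_t>(height) *
           channels * bytesPerChannel;
}
```

For a 512 x 512 window this gives 1,048,576 bytes (1 MiB), the same amount that any later clear of the buffer has to cover.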

 

Using the resource

    // CUDA Graphics Resource (to transfer PBO)
    struct cudaGraphicsResource *cuda_pbo_resource;

    // render image using CUDA
    void render()
    {
      // Copy inverse view matrix to const device memory
      copyInvViewMatrix(invViewMatrix, sizeof(float4)*3);

      // Map graphics resources for access by CUDA
      uint *d_output;
      cutilSafeCall(cudaGraphicsMapResources(1, &cuda_pbo_resource, 0));

      // Get CUDA device pointer to the mapped PBO
      size_t num_bytes;
      cutilSafeCall(cudaGraphicsResourceGetMappedPointer((void **)&d_output, &num_bytes,
                       cuda_pbo_resource));

      // clear image (4 bytes per RGBA pixel)
      cutilSafeCall(cudaMemset(d_output, 0, width*height*4));

      // call CUDA kernel, writing results to PBO
      render_kernel(gridSize, blockSize, d_output, width, height, density, brightness, transferOffset, transferScale);

      // Unmap so that OpenGL can use the PBO again
      cutilSafeCall(cudaGraphicsUnmapResources(1, &cuda_pbo_resource, 0));
    }

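cutilSafeCall() comes from the CUDA utility library (cutil) and logs any error returned by the wrapped call. The d_render() function behind render_kernel() writes the computed colors into the PBO; d_output is the pointer obtained from the mapping, through which CUDA reads and writes the PBO's memory.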
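
Because d_output is a uint pointer, each element holds one whole RGBA pixel. The host-side sketch below shows one way to pack four floating-point channels into such a 32-bit value so that it reads back correctly as GL_RGBA / GL_UNSIGNED_BYTE. The function names are hypothetical, the little-endian byte order is an assumption about the target platform, and the kernel itself is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Clamp a channel to [0, 1] and scale it to 0..255 (truncating).
inline std::uint32_t toByte(float channel) {
    const float clamped = std::min(1.0f, std::max(0.0f, channel));
    return static_cast<std::uint32_t>(clamped * 255.0f);
}

// Pack RGBA so that red lands in the lowest byte. On a little-endian machine
// the bytes in memory are then R, G, B, A, matching GL_RGBA / GL_UNSIGNED_BYTE.
inline std::uint32_t packRGBA(float r, float g, float b, float a) {
    return (toByte(a) << 24) | (toByte(b) << 16) | (toByte(g) << 8) | toByte(r);
}

// Row-major index of pixel (x, y) in a width-wide image,
// i.e. the offset into a buffer such as d_output.
inline std::uint32_t pixelIndex(std::uint32_t x, std::uint32_t y, std::uint32_t width) {
    assert(x < width);
    return y * width + x;
}
```

With this layout, opaque red packs to 0xFF0000FF, and the pixel at (3, 2) in a 512-wide image sits at index 1027.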

 

Display

    // display results using OpenGL
    void display()
    {
        // use OpenGL to build view matrix
        BuildViewMatrix();

        // fill the PBO with pixels using CUDA
        render();

        // display results
        glClear(GL_COLOR_BUFFER_BIT);

        // draw image from PBO
        glPixelStorei(GL_UNPACK_ALIGNMENT, 1);

        // copy from pbo to texture
        // (with a PBO bound, the last argument is an offset into the PBO, not a pointer)
        glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbo);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);
        glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);

        // draw textured quad
        glEnable(GL_TEXTURE_2D);
        glBegin(GL_QUADS);
        glTexCoord2f(0, 0); glVertex2f(0, 0);
        glTexCoord2f(1, 0); glVertex2f(1, 0);
        glTexCoord2f(1, 1); glVertex2f(1, 1);
        glTexCoord2f(0, 1); glVertex2f(0, 1);
        glEnd();

        glDisable(GL_TEXTURE_2D);
        glBindTexture(GL_TEXTURE_2D, 0);

        glutSwapBuffers();
        glutReportErrors();

        cutilCheckError(cutStopTimer(timer));

        computeFPS();
    }

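Here render() is the function shown above that writes into the PBO, and display() is the callback registered with glutDisplayFunc(), so it covers the whole rendering process. To keep the listing short, some statistics and non-essential handling have been left out.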

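As we can see, the entire image is produced by CUDA through volume rendering; at the end OpenGL merely pastes the result onto the viewport as a picture. Two small details are worth noting. First, glPixelStorei() changes the alignment unit of the pixel data; see the glPixelStorei() documentation for details. Second, the copy from the PBO to the texture is explained very clearly in Song Ho's OpenGL tutorial, so I will not repeat it here.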
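
GL_UNPACK_ALIGNMENT tells OpenGL the byte boundary on which each row of source pixels starts. The sketch below computes the row stride OpenGL would assume for a tightly packed image, assuming GL_UNPACK_ROW_LENGTH is left at 0; unpackRowStride() is a hypothetical helper, not an OpenGL call.

```cpp
#include <cassert>
#include <cstddef>

// Bytes from the start of one row to the start of the next when each row is
// padded up to a multiple of `alignment` (valid values: 1, 2, 4, 8).
inline std::size_t unpackRowStride(std::size_t width, std::size_t bytesPerPixel,
                                   std::size_t alignment) {
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);
    const std::size_t rowBytes = width * bytesPerPixel;
    return (rowBytes + alignment - 1) / alignment * alignment;
}
```

For a 3-pixel-wide RGB image the stride is 9 bytes with an alignment of 1 but 12 bytes with the default alignment of 4, which is where mismatches usually show up. With 4 bytes per RGBA pixel, as in this sample, every row is already a multiple of 4, so both settings give the same stride.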

 

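After these installments, I hope you have gained some new understanding of volume rendering and CUDA. Discussion is always welcome. Next time I would like to look at some of the technical details of this sample.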

 

References:

CUDA C Programming Guide

posted @ 2011-07-18 20:58  nickolas