
Memory Tips

Work in Progress. More diagrams and rewriting planned.

Note that much of the content of this tips page applies to options in standalone mental ray®. Each OEM integration may or may not implement them as we suggest, because each has its own way of using memory and handling scene complexity. We hope to provide a foundation for understanding the issues. Much of this material has been put together from a variety of sources, including the on-line manual and material developed for training classes.

Concepts for Memory Optimization

  • Render Process Memory
    • One thread per processing unit
    • Shared memory
  • Parallel rendering
    • Multiple hosts
    • Multiple threads available per host
  • mental ray cache manager
    • Scene Cache/Texture Cache
    • Dynamic job execution

The overriding factor to consider in mental ray memory usage is that it is designed from the ground up as a shared database so that it can leverage multiple processors working simultaneously on a render. Because of this, data may flow from one process to another, and from one render host to another, in a non-sequential execution process.

Some computer-system terminology, with an eye toward understanding mental ray's use of resources:

  • A process executes a program such as mental ray within a given address space, with an associated set of system resources.
  • A thread is a single flow of control within a process; a process may run one or more threads.
  • A processor, or CPU (central processing unit), typically executes one sequence of instructions at a time. In running mental ray, a single thread may execute on each available CPU.
  • A dual-core processor design has 2 core processing units on one integrated circuit (IC) chip, each able to run a mental ray thread. Most people identify this single IC chip as the CPU, though this naming can be confusing, since each core acts as a processor in its own right. In fact, quad-core processors are now available, which pack 4 core processing units into a single CPU chip.
  • The physical memory is the physical amount of memory, or RAM, in your system.
  • A virtual memory setting may extend the size of memory seen by the system by using disk. A process uses as much virtual memory as the OS allows.
  • The size of the address space may impose limits on the available memory. Even if memory were infinite, 32-bit OSes can only address a limited amount of it, usually restricting a given process to between 2 and 4 GB. For example, Windows typically limits a process to 2 GB, and overhead further reduces the amount available to mental ray to about 1.3 GB.
  • A heap is an area of dynamically allocated memory as managed by the program in a process. It is shared between threads and from now on we'll refer to it as shared memory.
  • A stack is a working area of memory specific to each thread.
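The heap/stack distinction above can be made concrete with a small sketch. This is purely illustrative Python, not mental ray code: two threads share heap-allocated data (like the scene cache), while each keeps its own local working state (like its stack).

```python
import threading

# Heap-like shared data: visible to every thread in the process.
shared_cache = {"triangles": 0}
lock = threading.Lock()

def render_job(thread_id, count):
    # Stack-like data: local variables belong to this thread alone.
    local_total = 0
    for _ in range(count):
        local_total += 1
    with lock:  # serialize writes to the shared heap data
        shared_cache["triangles"] += local_total

# Two threads: think of two processors, or one dual-core processor.
threads = [threading.Thread(target=render_job, args=(i, 1000)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_cache["triangles"])  # 2000
```

Each thread accumulates into its own local variable and only briefly locks to publish the result into the shared data, which is the general pattern mental ray threads follow against the scene cache.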

 

Let us further define some mental ray terminology.

  • The mental ray cache manager manages the loading and unloading of data.
  • The scene cache contains almost all the data shared by the render jobs.
  • The texture cache contains texture data for each given render host, only if textures are tiled or >4MB.
  • A job creates data, and may also require input data. This data resides in the scene cache. A job typically performs a general render sub-task, such as tessellating a region of geometry.
    • A task is a job that creates a section of image called a tile. The task size is the length of the side of a tile in pixel units.
  • The dynamic execution of a job means that a job is executed based on need of the data it creates. In other words, data is produced on demand. For example, a job that creates a tile may find out that it requires the data from a job that tessellates a piece of geometry.
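The dynamic-execution idea can be sketched as follows. The job names and the `JobGraph` class are our own illustration, not mental ray's internal API: a job's data is computed only when some other job first demands it, and is cached thereafter.

```python
class JobGraph:
    """Demand-driven job execution: a job runs only when its data is needed."""
    def __init__(self):
        self.jobs = {}     # name -> function that produces the data
        self.cache = {}    # name -> data already produced (the "scene cache")

    def define(self, name, fn):
        self.jobs[name] = fn

    def demand(self, name):
        # Return cached data, or execute the job (which may demand other jobs).
        if name not in self.cache:
            self.cache[name] = self.jobs[name](self)
        return self.cache[name]

g = JobGraph()
# Tessellation job: creates triangles for one piece of source geometry.
g.define("tess:sphere", lambda g: ["tri1", "tri2", "tri3"])
# Tile job: rendering this tile turns out to need the sphere's tessellation.
g.define("tile:0,0", lambda g: len(g.demand("tess:sphere")))

print(g.demand("tile:0,0"))  # 3 -- the tessellation job ran on demand
```

Note that nothing ran until the tile was demanded; the tessellation job executed as a side effect of the tile job needing its data, which is the non-sequential flow described above.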

 

Render Phases

Examining a render report can help you understand what happens in the various phases of the render. First the scene database is constructed. Then, mental ray performs various pre-computing phases such as photon tracing and final gather point creation.

Once these are ready, the eye ray samples are calculated for each tile and filtered into pixels. Confusingly, we often call this the 'rendering' phase, but more accurately, it is the tile rendering phase. See Samples Tips for more details.

Memory Layout

There are many references on the web showing how memory is typically laid out in a computer. We will keep it simple and focus on the memory shared between threads in a given process, called the heap. Note that the memory address space also contains the working area each thread uses while executing instructions, called the stack.

The scene database is constructed in the shared memory, or heap, and we call that the scene cache. During scene input and preprocessing, we store the scene data, including all the lights, cameras, objects, instances, and options one sees in the scene description. Either this is read in from the scene file with standalone mental ray, or it is constructed directly through the mental ray API from a 3D application.

The scene data includes the basic object descriptions as defined in the scene file or the API. This is how the objects are represented before they may undergo tessellation to approximate either a surface, a subdivision surface, or a displacement. We call this pre-tessellated data the source geometry, and it is part of the scene DAG data we read in or construct from the application.

After that, tessellation occurs, as well as BSP tree construction.

As shaders require image-based textures, they are also loaded into the scene cache.

Most of this data is permanently stored in shared memory, unless one specifies ways to make this data flushable. A flushable piece of data must be reconstructable, in case it is needed again.

What is using this data? The threads that are executing instructions. At tile rendering time, these threads are shooting rays and running material shaders, among other things. To remind ourselves, we'll include the thread area back in our next diagram for reference, noting that they also create thread-specific data. Below we show two threads, indicating either two processors, or a single dual-core processor.

It's getting crowded in there. Of course, the available memory may be much larger than your scene data, but if the scene data is larger than the available memory, we need to make that data flushable.

By using placeholders, we can construct source geometry on demand, and thus make it flushable.

By using fine approximation, the tessellation of source geometry can also be reconstructed and made flushable.

The BSP tree itself can be made flushable if one uses the Large BSP tree.

Finally, if textures are tiled, they automatically become part of the texture cache which can flush and restore just the needed portion of a texture for a given set of samples.
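The effect of tiled textures can be sketched like this. This is an illustration of the caching idea only, not the actual .map file format: only the tiles actually touched by samples are loaded, and each tile can be flushed and reloaded independently of the rest of the image.

```python
class TiledTexture:
    """Load texture tiles on demand instead of the whole image (illustrative)."""
    def __init__(self, width, height, tile_size):
        self.tile_size = tile_size
        self.tiles = {}          # (tx, ty) -> tile data currently in memory
        self.loads = 0           # count of tile loads from "disk"

    def _load_tile(self, tx, ty):
        self.loads += 1
        return f"pixels[{tx},{ty}]"   # stand-in for reading one tile from disk

    def lookup(self, x, y):
        key = (x // self.tile_size, y // self.tile_size)
        if key not in self.tiles:     # tile not resident: load it now
            self.tiles[key] = self._load_tile(*key)
        return self.tiles[key]

    def flush(self):
        self.tiles.clear()            # the cache manager reclaims this memory

tex = TiledTexture(4096, 4096, 64)
tex.lookup(10, 10)
tex.lookup(20, 20)        # same 64x64 tile as above: no new load
tex.lookup(1000, 1000)    # a different tile: second load
print(tex.loads)  # 2
```

Samples that cluster in one region of the texture touch only a few tiles, which is why tiled textures can render large images in a small memory footprint.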

Now all the memory used by the green sections is flushable. When does the cache manager know to flush something? This is where the memory limit comes into play.

When a render job in a thread needs to create more scene data, i.e., allocate a chunk of shared memory, it checks to see if adding the chunk will exceed this limit. If so, the mental ray cache manager will look to flush data which is classified as flushable.
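A minimal sketch of that allocation check, in the spirit of a least-recently-used cache, follows. This is our own illustration; mental ray's real cache manager is more sophisticated. Before adding a chunk that would exceed the limit, it flushes the least recently used flushable chunks until the new chunk fits.

```python
from collections import OrderedDict

class CacheManager:
    """Flush least-recently-used flushable chunks to stay under a memory limit."""
    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self.chunks = OrderedDict()   # name -> (size, flushable), in LRU order

    def allocate(self, name, size, flushable=True):
        # Flush LRU flushable chunks until the new chunk fits under the limit.
        while self.used + size > self.limit:
            victim = next((k for k, (_, f) in self.chunks.items() if f), None)
            if victim is None:
                raise MemoryError("nothing left to flush")
            vsize, _ = self.chunks.pop(victim)   # flushed data must be
            self.used -= vsize                   # reconstructable on demand
        self.chunks[name] = (size, flushable)
        self.used += size

    def touch(self, name):
        self.chunks.move_to_end(name)  # mark as recently used

mgr = CacheManager(limit=100)
mgr.allocate("tess:A", 40)
mgr.allocate("tess:B", 30)
mgr.touch("tess:A")            # A is now the most recently used chunk
mgr.allocate("tess:C", 50)     # over the limit: flushes B (LRU), keeps A
print(sorted(mgr.chunks))  # ['tess:A', 'tess:C']
```

The key point mirrors the text: the limit is checked at allocation time, and only data marked flushable is eligible to be kicked out.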

Memory Limit

So now think of the memory limit as a warning track for any new chunk of scene data that has to be put in shared memory. Here we simplify our diagram a bit, showing our flushable data at the top of our heap.

If a new chunk of scene data is small enough to fit under the limit, the cache manager puts it in the heap.

But when a new chunk does not fit under the limit (blue chunk in diagram below), ...

the cache manager must flush some chunks of flushable scene data.

Then, it puts the new chunk in the heap. It is possible that the new chunk is not itself flushable, in which case it would further shrink the flushable area.

In our diagram, we assume we have made this also flushable.

As you may now see, bigger chunks may require more work by the cache manager.

More flushable chunks will need to be cleared.

And now the bigger chunk can be put in place.

And we hope that it is also flushable!

Usage Tips

Flushing memory - Without turning on flushable, reusable memory options in a scene, memory will continue to fill up to the limit with geometry and textures. The basic ways to turn on flushable resources include:

  • Placeholders - Reuse memory for source geometry
  • Approximation - Reuse memory with fine approximation
  • Texture cache - Reuse memory with tiled texture maps
  • BSP acceleration - Reuse memory with large BSP
Use these in combination with setting the appropriate memory limit, so that the mental ray cache manager knows when to kick out the least recently used data from the scene cache.

 

Be careful with glossy effects. Lots of glossy rays could require loading a lot of the scene into memory.

posted on 2008-02-08 09:59 by Len3d