Understanding 3D rendering step by step with 3DMark 11

Written by Damien Triolet

Published on November 28, 2011

URL: http://www.behardware.com/art/lire/845/


Page 1

Introduction


The representation of real-time 3D in modern games has become so complex that the old adage that a picture is worth a thousand words is, generally speaking, a hard one to follow here. While it’s relatively easy to illustrate most graphics effects with particular examples, it’s much harder to represent them as stages of a full rendering. Nevertheless, this is what’s required if we want to understand how images in recent games are constructed.

Although we will be going into the stats and other technical detail in this report, we have also come across an ideal example that allows us to illustrate 3D rendering in practice and somewhat demystify the process.

3DMark 11
Since the release of 3DMark 11 about a year ago, we have been getting to grips with its inner workings so as to see whether it does indeed represent the sort of implementation of DirectX 11 that would help us judge the capabilities of current GPUs in games to come. This process has taken us some time, given the thousands of rendering commands to be observed and the various bugs and other limitations of the analytical tools on offer from AMD and NVIDIA; these complications meant we had to put the report on hold on several occasions.

While these observations have enabled us to formulate a critique of how 3DMark 11 puts DirectX 11 innovations into practice – something we’ll be coming back to in a forthcoming report – they also represent an opportunity for us to shed some light, using some clear visuals, on the different stages required in the construction of the type of real-time 3D rendering used in recent video games, namely deferred rendering. Deferred rendering consists of preparing all the ingredients needed for the construction of an image in advance, storing them in intermediate memory buffers and only combining them to compute the lighting once the whole scene has been reviewed, so as to avoid processing hidden pixels.

When doing their work properly, developers make the effort to optimise the smallest details of the 3D rendering they have chosen, which, at the level at which we are able to observe it, blurs the edges between the different stages that make up a rendering, or even removes any separation between these stages altogether. The situation is slightly different for Futuremark, the developer behind 3DMark 11, as their goal is to compare the performance of different graphics cards with modern rendering techniques in as objective a way as possible, not to implement every last optimisation. This is what has allowed us to take some ‘snapshots’ of the image construction process.


We have added some stats to our snapshots to enable us to give you an idea of the complexity of modern rendering. We will also give you an explanation of some of the techniques used. With a view to allowing as many readers as possible to understand how 3D works, we have put the most detailed explanations in insets and included a summary of the different stages on the last page of the report.
Those for whom the words "normal map" or "R11G11B10_FLOAT" mean nothing will therefore be able to visualise simply and rapidly how a 3D image is constructed.


Page 2
Deferred rendering, our observations

 

Deferred rendering
Before getting into more detail we want to describe the type of rendering observed. 3DMark 11, like more and more games with advanced graphics, uses deferred rendering, with Battlefield 3 probably representing the most advanced implementation. Standard or forward rendering consists of computing lighting triangle by triangle as objects are processed. Given that some triangles, or pieces of them, end up being masked by others, forward rendering implies the calculation of many pixels that don’t actually show up in the image. This can result in a very significant waste of processing resources.

Deferred rendering provides a solution to this problem by calculating only the basic components of the lighting (including textures) when it initially takes stock of all the objects in a scene. This data is then stored in temporary memory buffers known as Render Targets (RT) (together they make up the g-buffer) and used later for the final calculation of lighting. This process can be seen as a kind of post-processing filter that is only implemented on the pixels displayed on screen. This saves processing power and makes it easier to manage complex lighting from numerous light sources.
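To make this two-phase structure concrete, here is a minimal sketch of the idea in Python with numpy; the buffer layout and the single directional light are illustrative stand-ins, not Futuremark's actual data:

```python
# Minimal sketch of deferred rendering: fill a g-buffer once, then light
# only the surviving (visible) pixels. Simplified stand-in structures.
import numpy as np

W, H = 16, 9                        # tiny "screen" for illustration

# g-buffer: one entry per pixel, closest surface wins (depth test)
depth  = np.full((H, W), np.inf)    # Z-buffer
albedo = np.zeros((H, W, 3))        # diffuse colour
normal = np.zeros((H, W, 3))        # surface normal

def write_fragment(x, y, z, col, n):
    """Geometry pass: keep only the fragment closest to the camera."""
    if z < depth[y, x]:
        depth[y, x], albedo[y, x], normal[y, x] = z, col, n

# ... every object in the scene would be rasterised into the g-buffer here ...
write_fragment(3, 4, 2.0, (1.0, 0.2, 0.2), (0.0, 0.0, 1.0))

# Lighting pass: runs once per *visible* pixel, never on hidden ones
light_dir = np.array([0.0, 0.0, 1.0])
visible = depth < np.inf
n_dot_l = np.clip((normal * light_dir).sum(axis=-1), 0.0, 1.0)
image = albedo * n_dot_l[..., None] * visible[..., None]
print(image[4, 3])                  # the one lit pixel
```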


However it can cause memory consumption to increase, and the bandwidth required for the storage of all the intermediate data can block up the GPU during the early rendering stages. Disadvantages also include some challenges in managing multi-sample antialiasing and transparent surfaces. Futuremark have put in place a solution for multi-sample antialiasing but have opted to keep things simple by ignoring transparent surfaces, which means you won’t see any windscreen on the 4x4 that appears in some scenes.

Our observations
To explain how 3D rendering works, we went for scene 3 in 3DMark 11, in Extreme mode, namely at 1920x1080 with 4x antialiasing. This scene has the advantage of showing daylight.

We have segmented the rendering into stages that more or less correspond to the passes that structure 3D rendering. While modern GPUs can do an enormous number of things in a single pass (before writing a result to memory), it is simpler, more efficient and sometimes compulsory to go for several rendering passes. This is, moreover, a fundamental part of deferred rendering and post-processing effects.

We have extracted visuals to represent each stage as clearly as possible. Given that certain Render Targets are in HDR, a format that can’t be directly displayed, we have had to modify them slightly to make them more representative.

For those who really want to get into the detail, we have added technical explanations and a certain amount of information linked to each pass along with stats obtained in GPU Perf Studio:

Rendering time: the time (in ms) taken by the Radeon HD 6970 GPU to process the whole pass, with a small overhead linked to the measuring tools (+ % of total time for the rendering of the image).

Vertices before tessellation: number of vertices fed into the GPU, excluding those generated through tessellation.

Vertices after tessellation: number of vertices coming out of the tessellator, including those generated by tessellation.

Primitives: number of primitives (triangles, lines or points) fed into the setup engine.

Primitives ejected from the rendering: number of primitives ejected from the rendering by the setup engine, either because they aren’t facing the camera and are therefore invisible or because they’re out of the field of view.

Pixels: number of pixels generated by the rasterizer (2.1 million pixels for a 1920x1080 area).

Elements exported by the pixel shaders: number of elements written to memory by the pixel shaders, of which there can be several per pixel generated by the rasterizer, e.g. in the construction of the g-buffer.

Texels: number of texels (texturing components) read by texturing units; the more complex the filtering, the more there are.

Instructions executed: number of instructions executed by a Radeon HD 6970 for all shader processing.

Quantity of data read: total quantity of data read, from textures and, in the case of blending, from RTs (with the exception of geometric and depth data).

Quantity of data written: total quantity of data written to the RTs (with the exception of depth data).

Note that these quantities of data are not the same as those that transit to video memory as GPUs implement numerous optimisations to compress them.


Page 3
Stage 1: clearing memory buffers

 

Stage 1: clearing memory buffers
The first stage in any 3D rendering is the least interesting and consists of resetting the memory buffer zones, known as Render Targets (RTs), to which the GPU writes data. Without this, the data defining the previous image would interfere with the new image to be computed.

In certain types of rendering, RTs can be shared between several successive images, to accumulate information for example, in which case they aren’t reset. 3DMark 11 doesn’t however share any data between successive images, which is a requirement for maximum efficiency in a multi-GPU setup.


Resetting all these buffers basically means stripping all the values they contain back to zero, which corresponds to a black image. Recent GPUs carry out this process very rapidly, though the time taken depends on the size of the memory buffers.

When the rendering is initialised, 3DMark 11 resets 7 RTs very rapidly: 0.1ms or 0.1% of the rendering time. Later five very large RTs dedicated to shadows will also have to be reset, taking the total time taken up with this thankless task to 1.4ms, or 1.1% of the overall rendering time.
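Some back-of-the-envelope arithmetic gives an idea of how much memory these clears touch, assuming 32 bits per pixel for the seven screen-sized RTs (the formats described later) and ignoring the 4x MSAA factor, which multiplies the sample count accordingly:

```python
# Approximate sizes of the buffers being reset (assumptions: 32 bits per
# pixel, MSAA samples not counted; actual layouts may differ).
screen_rts = 7 * 1920 * 1080 * 4        # seven RTs cleared at image init
shadow_rts = 5 * 4096 * 4096 * 4        # five D32 shadow maps cleared later
print(f"{screen_rts / 2**20:.1f} MiB")  # ~55.4 MiB
print(f"{shadow_rts / 2**20:.1f} MiB")  # ~320.0 MiB
```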


Page 4
Stage 2: filling the g-buffer

 

Stage 2: filling the g-buffer
After preparing the RTs, the engine starts a first geometric pass: filling the g-buffer. At this relatively heavy stage all the objects that make up the scene are taken into account and processed to fill the g-buffer. This includes tessellation and the application of the different textures.


Objects can be presented to the GPU in different formats.

3DMark 11 uses instancing as often as possible, a mode that allows a series of identical objects (e.g. all the leaves, all the heads that decorate the columns and so on) to be sent with a single rendering command (draw call). Limiting the number of draw calls reduces CPU consumption. There are 91 in all in this main rendering pass, 42 of which use tessellation. Here are some examples:


Rendering commands: [ 1 ][ 6 ][ 24 ][ 35 ][ 86 ]


The g-buffer consists of 4 RTs at 1920x1080 with multi-sample type antialiasing (MSAA) 4x. Note that if you look carefully you can see a small rendering bug:


[ Z-buffer ]
[ Normals ]
[ Diffuse colours ]
[ Specular colours ]

The Depth Buffer, or Z-buffer, is in D32 (32-bit) format. It contains depth information for each element with respect to the camera: the darker the object, the closer it is.

The normals (vectors perpendicular to the surface at each point) are in R10G10B10A2_UNORM (32 bits, 10-bit integer for each component) format. They allow the addition of detail to objects via a highly developed bump mapping technique.

The diffuse components of pixel colours are in the R8G8B8A8_UNORM (standard 32 bits, 8-bit integer for each component) format; they represent uniform lighting which takes into account the angle at which the light hits an object but ignores the direction of the reflected light.

The specular components of pixel colours are in the R8G8B8A8_UNORM (standard 32 bits, 8-bit integer per component) format and here they take account of the direction of the reflected light, which means glossy objects can be designed with a slight light reflection on the edge.
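Summing up the four RTs just described gives a feel for why the g-buffer weighs on memory bandwidth. A quick calculation (a sketch that counts raw sample storage only and ignores the GPU-side compression mentioned elsewhere):

```python
# The g-buffer layout described above, written out as data: four 32-bit
# RTs at 1920x1080 with 4 MSAA samples per pixel.
GBUFFER = {
    "depth":    "D32",
    "normals":  "R10G10B10A2_UNORM",
    "diffuse":  "R8G8B8A8_UNORM",
    "specular": "R8G8B8A8_UNORM",
}
BYTES_PER_SAMPLE = 4                    # every format above is 32-bit
W, H, MSAA = 1920, 1080, 4
total = len(GBUFFER) * BYTES_PER_SAMPLE * W * H * MSAA
print(f"{total / 2**20:.0f} MiB")       # ~127 MiB of raw sample data
```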

The last of the rendering commands is for the sky, which is represented by a hemisphere that encloses the scene. Given that the sky is not lit like the other parts of the scene but is itself a luminous surface, it is rendered directly rather than deferred, and this starts the construction of the final image:


A few stats:

Rendering time: 18.2 ms (14.5%)
Vertices before tessellation: 0.91 million
Vertices after tessellation: 1.95 million
Primitives: 1.90 million
Primitives ejected from the rendering: 1.02 million
Pixels: 8.96 million
Elements exported by the pixel shaders: 30.00 million
Texels: 861.31 million
Instructions executed: 609.53 million
Quantity of data read: 130.2 MB
Quantity of data written: 158.9 MB


Page 5
Stage 3: ambient occlusion

 

Stage 3: ambient occlusion
The lighting in 3DMark 11 tries to get as close as possible to the principle of global illumination (radiosity, ray tracing and so on), which is very heavy on resources but takes refractions, reflections and therefore indirect illumination (i.e. the light reflected by any object in the scene) into account. To get close to this type of rendering, Futuremark uses various simulated effects:
- A directional light coming from the ground and numerous fill lights that simulate the sunlight transmitted indirectly from the ground and surrounding objects. We’ll cover this further when we come to lighting passes.

- An ambient occlusion texture that simulates soft shadows generated by the deficit of indirect light, which can’t be represented by the first effect (not precise enough). Here’s what it looks like:


Ambient occlusion, written to an RT in R8_UNORM (8-bit integer) format, is calculated from the Depth Buffer and normals in such a way as to take account of all the geometric details, even those simulated through bump mapping, as is the case in the HDAO from AMD that is used in several games. With the Extreme preset, 5x6 samples are selected with a random parameter and used to determine ambient occlusion. You can find more detail on this subject in our report on ambient occlusion.
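As a rough illustration of the principle only (AMD's actual HDAO kernel is more sophisticated), here is a screen-space occlusion sketch in Python that compares each pixel's depth against randomly offset neighbours:

```python
# Simplified flavour of screen-space ambient occlusion: neighbours that
# are closer to the camera than the pixel partially occlude it.
import numpy as np

rng = np.random.default_rng(0)

def ambient_occlusion(depth, n_samples=30, radius=4, bias=0.02):
    ao = np.zeros_like(depth)
    offsets = rng.integers(-radius, radius + 1, size=(n_samples, 2))
    for dy, dx in offsets:
        neighbour = np.roll(depth, (dy, dx), axis=(0, 1))
        ao += (neighbour < depth - bias)       # neighbour occludes pixel
    return 1.0 - ao / n_samples                # 1 = fully open, 0 = occluded

depth = rng.random((1080 // 8, 1920 // 8))     # stand-in depth buffer
print(ambient_occlusion(depth).mean())
```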
A few stats:

Rendering time: 2.3 ms (1.8%)
Vertices before tessellation: 6
Vertices after tessellation: -
Primitives: 2
Primitives ejected from the rendering: 0
Pixels: 2.59 million
Elements exported by pixel shaders: 2.59 million
Texels: 78.80 million
Instructions executed: 626.23 million
Quantity of data read: 73.3 MB
Quantity of data written: 3.0 MB


Page 6
Stage 4: antialiasing

 

Stage 4: antialiasing
As deferred rendering isn’t directly compatible with standard MSAA antialiasing, notably because the lighting isn’t calculated during geometry processing, Futuremark had to set up an alternative technique. It consists of creating a map of edges, which is then used to filter them during the calculation of lighting, as MSAA would have done:


Up to this point, all the RTs were rendered with MSAA 4x antialiasing, as Futuremark opts not to use post-processing antialiasing such as FXAA and MLAA, provided by NVIDIA and AMD for video game developers.

MSAA isn’t however natively compatible with deferred rendering, which is designed to calculate lighting only once per pixel and therefore ignores the samples that make it up. One rather rough and ready approach would be to switch, at this point, to something similar to supersampling, which is facilitated by DirectX 10.1 and 11. That would however mean calculating lighting at 3840x2160, which would waste a lot of resources and work against the very definition of deferred rendering.

Futuremark went for something else: a hybrid between MSAA and post-processing. Like post-processing, it consists of using an algorithm capable of detecting the edges that need to be smoothed using the g-buffer data. Although not perfect (that would be too resource heavy), this algorithm does a good job of detecting those edges that are external to objects (there’s no need to filter internal edges).

This RT, in R8_UNORM (8-bit integer) format, which contains the detected edges, will be used during all the lighting passes to come to mark out the complex pixels that require particular attention. Dynamic branching in the pixel shaders enables the value of the mix of the four samples to be calculated for these pixels, as would have been the case with a standard use of MSAA.
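A plausible sketch of such an edge-detection pass in Python: flag pixels where depth or normals change abruptly. The thresholds are arbitrary placeholders; Futuremark's exact criterion isn't documented here:

```python
import numpy as np

def detect_edges(depth, normals, z_thresh=0.05, n_thresh=0.9):
    """Return an R8-style mask: 1 where a pixel needs per-sample lighting."""
    # depth discontinuities against the pixel above and to the left
    dz  = np.abs(np.diff(depth, axis=0, prepend=depth[:1, :]))
    dz += np.abs(np.diff(depth, axis=1, prepend=depth[:, :1]))
    # normal discontinuities: low dot product with the neighbour = crease
    n_dot = (normals * np.roll(normals, 1, axis=0)).sum(axis=-1)
    return ((dz > z_thresh) | (n_dot < n_thresh)).astype(np.uint8)

rng = np.random.default_rng(1)
edges = detect_edges(rng.random((270, 480)), rng.random((270, 480, 3)))
print(edges.mean())   # fraction of pixels flagged as 'complex'
```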

At the same time, the RT in which the image is constructed (and which up to this point contains only the sky), as well as the Depth Buffer, both initially in MSAA 4x format, can be filtered down here, as the additional information they contain will be of no further use. The RTs which contain the diffuse and specular components of pixel colours must however be kept in MSAA 4x format, as the additional samples they contain will be required in the calculation of complex pixels.

A few stats:

Rendering time: 1.4 ms (1.1%)
Vertices before tessellation: 3
Vertices after tessellation: -
Primitives: 2
Primitives ejected from the rendering: 0
Pixels: 2.07 million
Elements exported by pixel shaders: 6.22 million
Texels: 39.43 million
Instructions executed: 185.44 million
Quantity of data read: 182.3 MB
Quantity of data written: 9.9 MB


Page 7
Stage 5: shadows

 

Stage 5: shadows
3DMark 11 can generate shadows linked to directional lights (the sun or the moon) and spot lights (not present in test 3). In both cases shadow mapping is used. This technique consists of projecting all the objects in the scene from the point of view of the light source and only retaining a Z-buffer, which is then called a shadow map. In contrast to what its name might lead you to think, a shadow texture is not applied to the image.

A shadow map shows, for each of its points, the distance from the light source beyond which objects are in shadow. A pixel’s position is then simply cross-checked with the information in the shadow maps to ascertain whether it’s lit or in shadow.
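In code, that cross-check amounts to a projection and one comparison. A minimal sketch, in which the matrix, the depth convention and the bias value are illustrative placeholders:

```python
# The shadow-map test in miniature: transform a world position into the
# light's clip space, then compare its depth with the stored value.
import numpy as np

def in_shadow(world_pos, light_view_proj, shadow_map, bias=1e-3):
    p = light_view_proj @ np.append(world_pos, 1.0)   # project into light space
    p = p[:3] / p[3]                                  # perspective divide
    u = int((p[0] * 0.5 + 0.5) * (shadow_map.shape[1] - 1))
    v = int((p[1] * 0.5 + 0.5) * (shadow_map.shape[0] - 1))
    stored = shadow_map[v, u]        # closest occluder seen from the light
    return p[2] - bias > stored      # further than the occluder => shadowed

shadow_map = np.ones((4096, 4096))                    # no occluders stored
print(in_shadow(np.zeros(3), np.eye(4), shadow_map))  # False: point is lit
```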

For directional light sources, 3DMark 11 uses a variant: cascaded shadow maps (CSM). Given the immense area lit by the sun, it’s difficult, even at very high resolution (4096x4096), to get enough precision for shadows, which tend to pixelate. CSMs provide a solution to this by working with several levels of shadow maps which focus on a progressively smaller area of the view frustum, so as to conserve optimal quality.

In Extreme mode 3DMark 11 creates five shadow maps of 4096x4096, generated from 339 rendering commands, 142 of which use tessellation. This represents one of the largest loads in the scene. The darker an object is, the closer it is to the light source:


The scene from the sun: [ CSM 1 ][ CSM 2 ][ CSM 3 ][ CSM 4 ][ CSM 5 ]


Although it’s possible to calculate all the shadow maps first and the lighting afterwards, Futuremark has decided to interleave them, which probably makes light processing a little less efficient but avoids putting excessive demands on memory space. At any given moment, then, there is never more than a single shadow map in video memory, which is partly why 3DMark 11 can still run pretty well on graphics cards equipped with just 768 MB, or even 512 MB.

As with the creation of the g-buffer, we’re talking about geometric passes here, given that the whole scene must be taken into account, or at least a subset of it for the lower-level CSMs. Tessellation is also used, as the shadows must correspond to the objects that cast them, and this can represent an enormous processing load. In contrast to the g-buffer pass, however, no colour data is calculated, only depth. Since Doom 3 and the introduction of the GeForce FX, GPUs have been able to greatly increase their throughput in this simplified rendering mode.

Note this exception: objects such as vegetation, built from fake geometry via alpha testing, are not processed in this fast mode, as pixels must still be generated so that the alpha test can determine which parts appear in the scene.

A few stats:

Rendering time: 22.6 ms (17.9%)
Vertices before tessellation: 3.35 million
Vertices after tessellation: 8.91 million
Primitives: 8.50 million
Primitives ejected from the rendering: 5.17 million
Pixels: 83.67 million
Elements exported by the pixel shaders: 24.03 million
Texels: 416.66 million
Instructions executed: 725.13 million
Quantity of data read: 50.5 MB
Quantity of data written: 0.0 MB (the depth data isn’t taken into account)


Page 8
Stage 6: primary lights

 

Stage 6: primary lights
After preparing the data required for the creation of shadows, 3DMark 11 moves on to the rendering of the primary light sources, which take the shadows into account. These sources of light may be directional (sun, moon…) or spot type. There are no spot sources in the scene observed here, but there is light from the sun. Five cascaded shadow maps are required for the shadows generated by the sun across the scene. Calculation of these shadow maps is interleaved with the rendering of the lighting in the area of the field of view they cover, so that they don’t monopolise too much video memory.

This means that 3DMark 11 requires five passes to compute the directional lighting to simulate light from the sun (LD2a/b/c/d/e). An additional pass is used to help simulate the global illumination and more particularly the light from the sun reflected by the ground, as this then itself becomes a low intensity source of directional light (LD1). Thus the light accumulates little by little in the image under preparation:


[ Sky ] + [ LD1 ] + [ LD2a ] + [ LD2b ] + [ LD2c ] + [ LD2d ] + [ LD2e ]

This image under preparation, in R11G11B10_FLOAT (fast HDR 32-bit) format, represents surface lighting, the model for which is a combination of diffuse Oren-Nayar reflectance and Cook-Torrance specular reflectance as well as Rayleigh-Mie type atmospheric attenuation. In addition to the shadow maps, it takes into account the ambient occlusion calculated previously.
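The article names the models but not their parameters, so purely as an illustration, here is a standard formulation of the Oren-Nayar diffuse term (the qualitative approximation); the roughness value is an arbitrary example:

```python
import numpy as np

def oren_nayar(n, l, v, albedo, sigma=0.3):
    """Diffuse Oren-Nayar term (qualitative approximation).
    n: surface normal, l: direction to light, v: direction to viewer."""
    n, l, v = (x / np.linalg.norm(x) for x in (n, l, v))
    cos_i, cos_r = float(n @ l), float(n @ v)
    theta_i = np.arccos(np.clip(cos_i, -1, 1))
    theta_r = np.arccos(np.clip(cos_r, -1, 1))
    # azimuthal difference between light and view, projected on the surface
    lp, vp = l - n * cos_i, v - n * cos_r
    cos_phi = 0.0
    if np.linalg.norm(lp) > 1e-6 and np.linalg.norm(vp) > 1e-6:
        cos_phi = float((lp / np.linalg.norm(lp)) @ (vp / np.linalg.norm(vp)))
    s2 = sigma * sigma                      # sigma = surface roughness
    A = 1.0 - 0.5 * s2 / (s2 + 0.33)
    B = 0.45 * s2 / (s2 + 0.09)
    alpha, beta = max(theta_i, theta_r), min(theta_i, theta_r)
    return albedo / np.pi * max(cos_i, 0.0) * (
        A + B * max(cos_phi, 0.0) * np.sin(alpha) * np.tan(beta))

print(oren_nayar(np.array([0.0, 0, 1]), np.array([0.0, 1, 1]),
                 np.array([1.0, 0, 1]), albedo=0.8))
```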

In parallel to the surface lighting, volumetric lighting is also calculated. See the page on this for further details. Its performance cost is however included in the figures given here as it’s processed in the same pixel shader as surface lighting.

A few stats:

Rendering time: 24.7 ms (19.6%)
Vertices before tessellation: 18
Vertices after tessellation: -
Primitives: 6
Primitives ejected from the rendering: 0
Pixels: 8.13 million
Elements exported by pixel shaders: 14.18 million
Texels: 390.91 million
Instructions executed: 2567.59 million
Quantity of data read: 1979.2 MB
Quantity of data written: 54.6 MB


Page 9
Stage 7: secondary lights

 

Stage 7: secondary lights
To simulate global illumination, 3DMark 11 also calls on numerous secondary point lights, each of which represents a point that emits light in all directions. In the 3DMark 11 implementation, these are fill lights which ‘fill’ the light space and are thus part of the simulation effects taken into account for global illumination. More specifically, each of these light sources slightly illuminates the area it covers (a cube):



There are no fewer than 84 of these point lights in our test scene:


[ Directional lights ] + [ Fill lights ]

The point lights don’t generate any shadows, as ambient occlusion simulates their shadowing at a lower processing cost. 3DMark 11 processes them in two passes to take a special case into account: when their volume of influence intersects the camera’s near plane.

Volumetric lighting can be computed for fill lights as well, but this is not the case in our test scene.

Given the number of point lights, this part of the process represents a significant component of the rendering time.
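The accumulation pattern itself is simple: each light adds its attenuated contribution to the HDR image inside its volume of influence. A 2D screen-space sketch (the real volumes are cubes in 3D, and the falloff used here is an arbitrary choice):

```python
# Additive accumulation of many point lights, the general pattern behind
# fill lights: each light only touches pixels inside its radius, and
# contributions are simply summed into the HDR image.
import numpy as np

def accumulate_point_lights(positions_px, image, lights, radius=40.0):
    """lights: list of (centre_xy, colour); positions_px: (H, W, 2) coords."""
    for centre, colour in lights:
        d = np.linalg.norm(positions_px - centre, axis=-1)
        falloff = np.clip(1.0 - d / radius, 0.0, None) ** 2  # quadratic falloff
        image += falloff[..., None] * colour                 # additive blend
    return image

H, W = 90, 160
ys, xs = np.mgrid[0:H, 0:W]
pos = np.stack([xs, ys], axis=-1).astype(float)
img = np.zeros((H, W, 3))
lights = [((40.0, 30.0), np.array([0.2, 0.15, 0.1]))] * 84   # 84 fill lights
print(accumulate_point_lights(pos, img, lights).max())
```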

A few stats:

Rendering time: 33.7 ms (26.8%)
Vertices before tessellation: 688
Vertices after tessellation: -
Primitives: 1008
Primitives ejected from the rendering: 853
Pixels: 45.87 million
Elements exported by the pixel shaders: 45.87 million
Texels: 369.86 million
Instructions executed: 9073.06 million
Quantity of data read: 1494.2 MB
Quantity of data written: 177.6 MB


Page 10
Stage 8: volumetric lighting

 

Stage 8: volumetric lighting
3DMark 11 uses volumetric lighting to simulate the rays of sun that shine through the atmosphere, or through water in underwater scenes. This approximation uses a ray-based technique and is generated progressively over the course of the previous lighting passes which, to recap, represent the ground (LD1) and the sun (LD2a/b/c/d/e):


[ LD1 ] + [ LD2a ] + [ LD2b ] + [ LD2c ] + [ LD2d ] + [ LD2e ]


The last lighting pass simply integrates this volumetric component into the final image, still under construction:


[ Without volumetric lighting ]  [ With volumetric lighting ]

Volumetric lighting is obtained by approximating, for each pixel, the light dispersed by the atmosphere (or water) between the surface being observed and the camera. One ray is sent per pixel and per light source, with sampling carried out at several depth levels.
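A minimal ray-marching sketch of that idea: step along the ray between camera and surface, testing light visibility at each sample and accumulating in-scattered light. The density and step count are illustrative, not Futuremark's values:

```python
# Single scattering along a view ray, approximated with a fixed number
# of depth samples between the camera and the visible surface.
import numpy as np

def volumetric_light(surface_depth, light_visibility, n_steps=16,
                     density=0.02, light_intensity=1.0):
    """light_visibility(t): 1 if the point at depth t sees the light
    (in the real pass this would be a shadow-map test)."""
    dt = surface_depth / n_steps
    radiance, transmittance = 0.0, 1.0
    for i in range(n_steps):
        t = (i + 0.5) * dt                       # sample at mid-step
        transmittance *= np.exp(-density * dt)   # light absorbed so far
        radiance += (transmittance * density * dt *
                     light_intensity * light_visibility(t))
    return radiance

print(volumetric_light(50.0, lambda t: 1.0))     # fully lit ray
```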

Note that while the optical density is fixed for the atmosphere, for water it’s precomputed for each image (as well as the resulting accumulated transmittance) in an array of 2D textures. This stage takes place right at the beginning of the rendering, but isn’t required in the scene we’re looking at.

A few stats:

Rendering time: 0.7 ms (0.6%)
Vertices before tessellation: 3
Vertices after tessellation: -
Primitives: 2
Primitives ejected from the rendering: 0
Pixels: 2.07 million
Elements exported by the pixel shaders: 2.07 million
Texels: 33.18 million
Instructions executed: 232.24 million
Quantity of data read: 15.9 MB
Quantity of data written: 7.9 MB


Page 11
Stage 9: depth of field effect

 

Stage 9: depth of field effect
For the Depth of Field (DoF) effect, 3DMark uses a more complex technique than a simple post-processing filter. It’s similar to the "Sprite-based Bokeh Depth of Field" used in Crysis 2. Basically, this technique consists of stretching every pixel that isn’t in the sharp area of the image, using the geometry shaders introduced in DirectX 10, to a size corresponding to the blurriness of the pixel. Here’s what it looks like on a section of the image (click on the links to get the full image):


[ Without DoF ]  [ With DoF ]

This type of depth of field effect uses the geometry shaders to generate a sprite (2 triangles that face the camera) for each pixel that must be blurred. The size of this sprite depends on the circle of confusion, which is computed beforehand in a 16-bit floating point buffer, and a hexagonal bokeh is used to simulate a diaphragm with six blades.
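The article doesn't give Futuremark's exact expression, so as an illustration here is the standard thin-lens circle of confusion that such a pre-pass typically computes (distances in metres; the focal length and aperture are arbitrary example values):

```python
# Thin-lens circle of confusion: the blur-circle diameter that drives
# the size of each generated sprite.
def circle_of_confusion(depth, focus_dist, focal_len=0.05, aperture=0.025):
    """Diameter of the blur circle for a point at `depth`."""
    return abs(aperture * focal_len * (depth - focus_dist)
               / (depth * (focus_dist - focal_len)))

print(circle_of_confusion(2.0, focus_dist=5.0))   # in front of focus: blurred
print(circle_of_confusion(5.0, focus_dist=5.0))   # in the focal plane: 0.0
```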

This operation is carried out in a 64-bit HDR format, R16G16B16A16_FLOAT, at full resolution as well as at resolutions divided by 2 and by 4. Each pixel to be processed is sent to one of these resolutions depending on the size of its circle of confusion, and they are combined afterwards to finalise the depth of field effect, which can then be added to the final image.


The darker a pixel, the smaller its circle of confusion. White pixels are those whose circle of confusion exceeds the threshold beyond which they are no longer sharp.


More than 2 million small triangles are generated (shown here in fuchsia).

A few stats:

Rendering time: 9.7 ms (7.7%)
Vertices before tessellation: 1.10 million
Vertices after tessellation: -
Primitives: 2.20 million
Primitives ejected from the rendering: 0
Pixels: 22.41 million
Elements exported by the pixel shaders: 22.70 million
Texels: 93.12 million
Instructions executed: 217.96 million
Quantity of data read: 87.1 MB
Quantity of data written: 49.8 MB


Page 12
Stage 10: post-processing

 

Stage 10: post-processing
The last heavy rendering stage in 3DMark is post-processing, which includes various filters and optical effects: bloom, halos (lens flares) and reflections formed in the lenses, grain, tone mapping and resizing. The optical effects are calculated by compute shaders and represent the biggest post-processing load. Tone mapping converts the HDR image for display, while resizing simulates a large anamorphic format:


[ Before post-processing ]  [ After post-processing ]

Post-processing is segmented into three stages: bloom + lens flares, internal lens reflections, and tone mapping + grain. The last stage is the simplest: a relatively simple pixel shader combines the two effects.

The other two stages, which require a 128-bit HDR format (R32G32B32A32_FLOAT), are more complex and call on a fast Fourier transform (FFT) four times, executed via a succession of compute shaders. First of all, the image to be processed is reduced to the power-of-two resolution directly above a quarter of the original resolution (1920 -> 480 -> 512). Next it’s transformed into the frequency domain, from which the bloom and lens flares on the one hand, and the reflections on the other, take form by means of dedicated filters. In the first case, the filter must be computed in advance, which accounts for one of the four uses of the fast Fourier transform.


[ Filter ] + [ Image in frequency domain ] -> [ Filter applied ]
-> [ Reconstruction – inverse FFT ] = [ Bloom + lens flares ]
[ Lens reflections ]
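The frequency-domain trick shown above can be sketched in a few lines with numpy: transform the image and a filter (a Gaussian here, as a stand-in kernel), multiply, and transform back. This is the generic FFT-convolution pattern, not Futuremark's actual filters:

```python
import numpy as np

def fft_bloom(image, sigma=8.0):
    """Convolve `image` with a blur kernel via the frequency domain."""
    H, W = image.shape
    # kernel centred on (0, 0) with wrap-around, as circular convolution needs
    ys = np.minimum(np.arange(H), H - np.arange(H))[:, None]
    xs = np.minimum(np.arange(W), W - np.arange(W))[None, :]
    kernel = np.exp(-(xs**2 + ys**2) / (2 * sigma**2))
    kernel /= kernel.sum()
    # forward FFTs of image and filter, pointwise product, inverse FFT
    glow = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)).real
    return image + glow

hdr = np.zeros((512, 512)); hdr[256, 256] = 100.0   # one very bright pixel
print(fft_bloom(hdr).max())                          # the glow spreads outward
```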

A few stats:

Rendering time: 10.7 ms (8.5%), of which 10.3 ms via compute shader (8.2%)
Vertices before tessellation: 22
Vertices after tessellation: -
Primitives: 24
Primitives ejected from the rendering: 0
Pixels: 3.44 million
Elements exported by the pixel shaders: 3.44 million
Texels: 104.99 million of which 72.48 million via compute shader
Instructions executed: 165.20 million of which 126.48 million via compute shader
Quantity of data read: 819.1 MB of which 590.0 MB via compute shader
Quantity of data written: 615.1 MB of which 448.9 MB via compute shader


Page 13
Stage 11: interface

 

Stage 11: interface
The final stage is also the simplest: drawing the interface on top of the image that has just been calculated. For this, each of the elements that make it up is integrated in the form of a texture drawn on a quad (a rectangle formed by two triangles):


A few stats:

Rendering time: 0.4 ms (0.3%)
Vertices before tessellation: 96
Vertices after tessellation: -
Primitives: 46
Primitives ejected from the rendering: 0
Pixels: 82972
Elements exported by the pixel shaders: 76096
Texels: 86112
Instructions executed: 609.53 million
Quantity of data read: 0.6 MB
Quantity of data written: 0.3 MB


Page 14
The final image

 

The final image

Preparation: [ Objects ] -> [ G-buffer ] + [ Shadows ]
Lighting: [ Sky ] + [ Primary ] + [ Secondary ] + [ Volumetric ]
Post-processing + interface: [ Final image ]

To create an image such as this one, 3DMark 11 does not hold back in the deployment of resources and here it has processed 564 draw calls, 12 million triangles, 150 million pixels, 85 lights and 14 billion instructions!

This is enough to bring any current DirectX 11 GPU to its knees, what with tessellation, geometry shaders, compute shaders, high quality shadows, depth of field effects and complex camera lens effects, not to forget extremely resource-heavy lighting.

This sort of complexity will inevitably turn up in video games, no doubt in more efficient forms. Crysis 2 and Battlefield 3 already use similar graphics engines, with a few compromises in terms of geometric load and with lighting algorithms calibrated to run on current hardware.

We hope that this report will have given you a slightly clearer idea of how a modern graphics engine works. To finish up then, here are the final stats representing the load to be processed by the GPU:
Rendering time: 125.9 ms (= 8 fps)
Vertices before tessellation: 5.36 million
Vertices after tessellation: 11.97 million
Primitives: 12.61 million
Primitives ejected from the rendering: 6.19 million
Pixels: 179.29 million
Elements exported by the pixel shaders: 151.18 million
Texels: 2.39 billion
Instructions executed: 14.40 billion
Quantity of data read: 4.73 GB
Quantity of data written: 1.08 GB


Copyright © 1997-2014 BeHardware. All rights reserved.
