My GitHub

Hi, I'm Daniel Cornel and together with my colleagues, I've been working on a method for watertight incremental heightfield tessellation.

Preface: Heightfield tessellation is the process of creating a triangular surface that approximates a given heightfield. This is a well-researched topic, especially in its application to terrain rendering, but the efficient tessellation of dynamic heightfields like the output of a flood simulation remains very challenging. Since the simulation data are generated on the fly, tessellation also has to happen on the fly, so we cannot rely on precomputed acceleration data structures as used in terrain rendering systems like ROAM, RASTeR, or geometry clipmaps.

The goal of tessellation is that all created triangles have roughly the same size in screen space, like in this example.


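On the left, we show the triangles of the terrain heightfield in green and those of the water heightfield in blue, where a change in color indicates a change in triangle size.

As you can see in this zoom animation, the triangle size is adjusted continuously to reach the desired screen-space size.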

This task is typically performed with the help of the hardware tessellation unit in modern GPUs, which is very efficient at subdividing triangles.


However, when I show you two consecutive frames of this zoom animation, you will notice that almost none of the triangles change their level of detail from one frame to the next.

Only a handful of triangles have to be changed; the rest could be reused from the last frame.

But this also includes reverting a subdivision when triangles are too small already.


And this is something that hardware tessellation cannot do.

It can only subdivide triangles, but not merge them together again, which is why you have to create the entire triangulation from scratch every frame.

Instead, we want to cache the entire triangulation and only incrementally update it where needed, which is significantly faster than hardware tessellation.


So how do we do this?

We start with a heightfield defined on a rectangular area and create a very coarse and regular initial triangulation that covers the entire area.

This triangulation is made up of four different types of right triangles illustrated here by different colors.


We now define subdivision rules to split these triangles into smaller versions of themselves.

A popular splitting scheme is the longest edge bisection,

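As a rough illustration of longest edge bisection on a right triangle, the following sketch splits a triangle at the midpoint of its hypotenuse. The function name and the vertex order (right-angle vertex first, hypotenuse between the second and third vertex) are assumptions made for this example only.

```python
def bisect_longest_edge(tri):
    """Split a right triangle at the midpoint of its hypotenuse.

    tri is (a, b, c) with 2D points in counter-clockwise order. The vertex a
    is assumed to be the right-angle vertex, so b-c is the longest edge.
    """
    a, b, c = tri
    m = ((b[0] + c[0]) / 2.0, (b[1] + c[1]) / 2.0)  # midpoint of the hypotenuse
    # Each child is again a right triangle with its right angle at m, and
    # both children keep the counter-clockwise order of the parent.
    return (m, a, b), (m, c, a)
```

Because each child is a smaller right triangle in the same vertex convention, the same rule can be applied to the children again.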

Abstract: In this paper, we propose a method for the interactive visualization of medium-scale dynamic heightfields without visual artifacts. Our data fall into a category too large to be rendered directly at full resolution, but small enough to fit into GPU memory without pre-filtering and data streaming. We present the real-world use case of unfiltered flood simulation data of such medium scale that need to be visualized in real time for scientific purposes. Our solution uses compute shaders to maintain a guaranteed watertight triangulation in GPU memory that approximates the interpolated heightfields with view-dependent, continuous levels of detail. In each frame, the triangulation is updated incrementally by iteratively refining the cached result of the previous frame to minimize the computational effort. In particular, we minimize the number of heightfield sampling operations to make adaptive and higher-order interpolations viable options. We impose no restriction on the number of subdivisions and the achievable level of detail to allow for extreme zoom ranges required in geospatial visualization. Our method provides a stable runtime performance and can be executed with a limited time budget. We present a comparison of our method to three state-of-the-art methods, in which our method is competitive with previous non-watertight methods in terms of runtime, while outperforming them in terms of accuracy.

1 INTRODUCTION

Heightfield rendering is essential in many applications using geospatial visualization, particularly in 3D geoinformation systems for visualizing digital elevation models and environmental simulation data. Heightfields are discrete scalar fields, usually defined on regular or unstructured grids, and have no visual representation of their own. For visualization, a continuous surface must be reconstructed from the values by interpolation. The reconstructed surface then has to be approximated by triangulated geometry that GPUs can handle, which is called tessellation.

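[As an aside, tessellation is usually rendered in Chinese as 曲面细分 (surface subdivision). Tessellation is a technology that AMD (ATI) developed over many hardware generations; after years of development it was finally adopted as a key feature of DX11, which is why it has always been a marketing focus. Unlike ray tracing, the core of current rasterization-based rendering is drawing large numbers of triangles that make up 3D models. Tessellation uses GPU hardware acceleration to split the triangles of an existing 3D model into smaller, finer ones, greatly increasing the triangle count so that the surfaces and edges of rendered objects become smoother and more detailed.]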

Tessellation of heightfields is challenging, as application requirements often collide with hardware limitations. As current GPUs can only render a few million triangles in real time, it is not feasible to uniformly triangulate each cell of a heightfield with several hundred million cells or more. To overcome this problem, tessellation algorithms with view-dependent level of detail (LoD) have been proposed that decouple the complexity of the heightfield from its triangulation. However, each of these algorithms comes with its own set of limitations and trade-offs between the achievable LoD and interactivity. A common problem here is the inability to produce a watertight triangulation. Instead, many algorithms produce cracks caused by T-junctions between adjacent triangles, leading to visual artifacts. Other limitations include lack of support for dynamic updates of the heightfield data, no efficient support for higher-order interpolation, a limited number of subdivisions per triangle, and performance that is often below real time. In the context of decision support for flood management as illustrated in Fig. 1, which provides a real-world use case for our proposed method, these limitations are too severe.

For this use case, we require a tessellation algorithm that provides watertight triangulations of multiple large heightfields defined on regular or adaptive grids with higher-order interpolation, enabling a wide zoom range with continuous LoD at steady real-time performance.

In this paper, we propose a novel GPU-based heightfield tessellation with adaptive LoD that fulfills all of these requirements. We tackle the most challenging aspect of guaranteed watertightness during parallel processing by maintaining triangle adjacency information to operate on the direct neighborhood of each triangle during subdivision and merging. To avoid corrupting changes of shared information of neighboring triangles during parallel processing, we employ task scheduling by graph coloring. Our solution also caches a computed triangulation as starting point for the next computation. As a result, only incremental changes are required instead of a complete recomputation, which greatly reduces the computational effort and thus significantly improves the runtime. This particularly benefits applications in geoinformation systems, where complex interpolation methods mean that sampling vertex positions accounts for a large part of the tessellation effort.


Our solution fills a gap in heightfield visualization, which is the artifact-free visualization of heightfields that fit into GPU memory completely, but are too large to be rendered without view-dependent LoD, with at least 60 frames per second. We focus on the concrete use case of visualizing a country-sized static terrain heightfield defined on an adaptive grid overlaid by a dynamic simulation heightfield in a focus area with all data provided as unfiltered scalar fields. Yet, our solution is largely independent of the underlying heightfield data, their sampling strategy, and the employed LoD metric such that it can be integrated into many existing systems with minimal effort. In summary, we contribute a novel approach for the adaptive tessellation of large heightfields that


  • is guaranteed to be watertight
  • minimizes computational effort by incrementally updating previous solutions
  • provides stable and controllable real-time performance for interactive tasks
  • allows for an unlimited number of subdivisions and levels of detail
  • supports multiple and nested heightfields.

An implementation of our solution is provided in the supplemental material of this paper.


2 RELATED WORK

Heightfield rendering in real time is a challenging task due to the combination of the required surface reconstruction from complex heightfield data with the efficient generation of a view-dependent visual representation. The most straightforward combination is achieved through ray casting [1], [2], [3], where the reconstructed surface is directly rendered through the intersection of a view ray with the polynomial surface of the heightfield's interpolant. The downside of ray casting is that the interpolant has to be evaluated for each pixel of the screen, which means that rendering performance depends on both the complexity of the used surface reconstruction and the screen size. In practice, this limits surface reconstruction to simple interpolants such as bilinear interpolation, for which a ray-surface intersection can be calculated efficiently, and excludes more accurate methods such as kriging [4], local refinable splines [5], and adaptive bicubic interpolation [6]. In previous work, combining higher-order surface reconstruction with real-time ray casting has not been successful [7].

This is why most approaches rely on the generation of a view-dependent triangulation as an approximation of the reconstructed surface. One option is to move the triangulation with the camera and resample the vertices on the content they currently cover, as demonstrated with the projected grid [8], [9], the persistent grid [10], [11], and the projected mesh [12]. These approaches provide implicit levels of detail with a consistent performance, but may introduce undersampling and hide important features. Another option is to maintain a triangulation fixed in space, but change its LoD locally. There have been countless approaches on how to generate this triangulation, of which the most noteworthy ones have been surveyed by Pajarola and Gobbetti [13]. The geometry clipmap [14], [15] is one of the most popular and robust triangulation approaches for real-time applications. However, it is limited by only providing a few discrete regions of decreasing resolution instead of continuous levels of detail, and only considers the 2D distance of triangles to the camera instead of the actual heightfield to determine the LoD.

Several approaches have been proposed to facilitate the evolving parallel processing capabilities of modern GPUs. Early hybrid approaches maintain a hierarchical representation of the adaptive subdivisions on the CPU, which is then sent to the GPU for meshing [16], which requires expensive CPU-GPU communication. For GPUs that cannot generate new geometry on the fly, patch-based mesh refinement [17], [18] can be used, which replaces a patch of a mesh with a precomputed refinement pattern. With support for the dynamic generation of geometry, tessellation can become much more flexible, as proposed for geometry shaders [19], [20], tessellation shaders facilitating the hardware tessellation unit [21], [22], [23], compute shaders [24], [25], and mesh shaders [26], [27]. Most approaches based on tessellation shaders are limited in the achievable LoD by a maximum edge tessellation factor of 64 for triangles. To achieve unlimited tessellation, several approaches [28], [29] subdivide the initial triangulation prior to hardware tessellation on the CPU. Lee et al. [30] avoid this CPU overhead by applying hardware tessellation recursively. However, while the hardware tessellation unit in GPUs is designed specifically for adaptively generating vast amounts of geometry, e.g. for subdivision surfaces, it cannot be used to revert the process to simplify finely triangulated geometry. This means that after camera changes, the entire triangulation has to be generated from scratch.


The approach by Khoury et al. [24] addresses this problem by caching triangles of a previous computation that are then subdivided and merged as needed. Extending this approach, Kerbl et al. [25] propose a different encoding for the cached triangles to accelerate the calculation of vertex coordinates, making this possibly the most efficient and flexible heightfield rendering approach to date. However, neither of the two approaches produces a watertight triangulation, and both of them are limited in the number of subdivisions by the data type used for triangle encoding. If the maximum number of subdivisions is exceeded at runtime, the initial triangulation has to be refined, forcing a recalculation of the entire view-dependent triangulation.


Another extension to the approach of Khoury et al. using concurrent binary trees for triangle encoding [31] eliminates T-junctions, but requires a predefined maximum number of subdivisions, while the required pre-allocated memory grows exponentially with this number.


3 OVERVIEW OF THE ALGORITHM

Our algorithm works on a set of triangles with adjacency and hierarchy information stored for each triangle. This set is maintained in GPU memory and is modified with compute shaders to achieve a sufficient triangle density in screen space at minimum cost. As illustrated in Fig. 2, it consists of three main steps, which are the determination of the triangle LoD states, the tessellation loop, and finally a rendering pass, with additional steps in between to optimize the pipeline. After a trivial, coarse, view-independent initial triangulation into right triangles, the rest of the pipeline is executed conditionally if an update of the result from the previous frame is required. Whenever the heightfield changes due to progression of a simulation or by reading another time step from the file system, all vertices of existing triangles are resampled on the heightfield. Subsequently, or whenever the camera perspective or user-defined target LoD changes, the LoD state of all triangles is updated to subdivide, merge, or keep unchanged. If all triangles are kept, they are rendered immediately.

If changes are required through subdivision or merging, we enter the tessellation loop, which iteratively refines the triangles as needed. This is the core of the algorithm, which requires special data structures and careful synchronization for concurrent processing, as explained in detail in Section 8. Subdividing a triangle or merging triangles back together changes the adjacency information of neighboring triangles along the edges. To avoid overly complicated low-level synchronization between triangles, we use task scheduling based on graph coloring [32], for which we interpret the mesh as a graph where triangles are the nodes. The graph colors group the triangles into four separate groups that can be processed concurrently. Based on the LoD state, we perform the subdivision separately for the triangles of the four different graph color groups. During subdivision, the shared hypotenuse of two triangles is split, and a new vertex is created and sampled on the heightfield.
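The scheduling idea can be sketched on the CPU as follows. This is only an illustration with hypothetical function names: it groups triangle indices by graph color and checks the basic condition that no two edge-adjacent triangles share a color. The stronger property stated in the text, that triangles of one color do not share the same adjacent triangles either, is not checked here.

```python
from collections import defaultdict

def group_by_color(colors):
    """Map each graph color to the list of triangle indices that carry it."""
    groups = defaultdict(list)
    for tri_index, color in enumerate(colors):
        groups[color].append(tri_index)
    return dict(groups)

def check_coloring(colors, adjacency):
    """Return True if no two edge-adjacent triangles share a color.

    adjacency[i] holds the three neighbor indices of triangle i, with -1
    marking a missing neighbor at the heightfield boundary.
    """
    for i, neighbors in enumerate(adjacency):
        for n in neighbors:
            if n != -1 and colors[n] == colors[i]:
                return False
    return True
```

Each group returned by `group_by_color` would then be handled in its own pass, one color after another.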

Four new triangles are created, added as child triangles to their parents, and adjacency information of neighboring triangles is updated. For each of the new triangles, the LoD state is calculated and assigned. Then, the triangles are separated again by graph color for merging. During merging, four neighboring child triangles and their common vertex are removed and the adjacency of neighboring triangles is reset to reference their two parent triangles. The sequence of subdivision and merging is repeated until a termination criterion is reached: Either fewer than a minimum number of triangles have been subdivided or merged, or a predefined processing time budget has been used up.

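The two termination criteria can be sketched as a simple host-side loop. The callables `subdivide_pass` and `merge_pass` are hypothetical stand-ins for the per-color compute passes and are assumed to return the number of triangles they changed; the real implementation runs on the GPU.

```python
import time

def tessellation_loop(subdivide_pass, merge_pass, min_changes, budget_seconds):
    """Repeat subdivision and merging until a termination criterion is met.

    Returns the number of iterations that were executed.
    """
    start = time.perf_counter()
    iterations = 0
    while True:
        changed = subdivide_pass() + merge_pass()
        iterations += 1
        if changed < min_changes:
            break  # fewer than the minimum number of triangles changed
        if time.perf_counter() - start >= budget_seconds:
            break  # the processing time budget has been used up
    return iterations
```

Since the triangulation is cached, stopping early on the time budget leaves a valid intermediate result that a later update can refine further.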

The final step in the pipeline is efficient triangle rendering from an index buffer without any heightfield sampling, since all vertices have already been sampled on the heightfield during subdivision. In subsequent frames, if the LoD does not need to be updated, the tessellation loop is skipped entirely and the cached triangles are rendered.


4 DATA STRUCTURE

Our algorithm operates on a set of triangles in a memory pool with a fixed size of 8 · 2^20 elements that we refer to as the triangle buffer. The data we store for each triangle are listed as a struct in Fig. 3. In the GPU implementation, we use separate arrays for the individual struct members for better cache coherence. However, for the sake of a simple explanation of our algorithm, we treat the triangle buffer as an array of structs in the remainder of the paper.

Each triangle contains semantic flags in a bitfield concatenated with a graph color. The first semantic flag is a deleted flag, which indicates that the element of the triangle buffer is empty or can be overwritten. All elements in the triangle buffer are initialized as deleted triangles. The LoD state of the triangle expresses the operation necessary to approach the target LoD locally—subdivide, merge, or keep—and is encoded as a 2-bit value. Finally, each triangle is assigned a conceptual graph color in the form of a 3-bit value ∈ [0, 7].

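A minimal sketch of how the deleted flag, the 2-bit LoD state, and the 3-bit graph color could share one byte is given below. The bit positions and the numeric codes for the LoD states are assumptions for illustration; the text only fixes the bit widths.

```python
KEEP, SUBDIVIDE, MERGE = 0, 1, 2  # assumed numeric codes for the 2-bit LoD state

def pack_flags(deleted, lod_state, color):
    """Pack the deleted flag (1 bit), LoD state (2 bits), and color (3 bits)."""
    assert lod_state in (KEEP, SUBDIVIDE, MERGE) and 0 <= color <= 7
    return (int(deleted) << 5) | (lod_state << 3) | color

def unpack_flags(byte):
    """Inverse of pack_flags: return (deleted, lod_state, color)."""
    return bool((byte >> 5) & 1), (byte >> 3) & 0b11, byte & 0b111
```

For example, `pack_flags(True, MERGE, 5)` yields 53, and `unpack_flags(53)` recovers the three fields.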

We include a heightfield index so that each triangle can be matched to a heightfield for sampling and rendering if multiple heightfields are being processed. This way, triangulations of all heightfields can simply be stored in a shared triangle buffer instead of separate ones. This value can be omitted in applications with a single heightfield.


struct Triangle
  byte deleted | LoD state | graph color
  byte heightfield index
  int parent triangle index
  int first child triangle index
  int3 adjacent triangle indices
  int3 vertex indices

When merging triangles as explained in Subsection 8.3, a set of four triangles is replaced by their two corresponding parent triangles. For this case, we store the parent triangle index for each triangle. Triangles of the initial tessellation are root triangles that cannot be merged any further, which is indicated with a parent triangle index of −1.


Likewise, we store the first child triangle index for each parent triangle. During subdivision, we take care that the child indices of the two new triangles are always consecutive, so it suffices to store the first one. A child index of −1 indicates that a triangle has no children and is therefore a leaf triangle in the final tessellation.


As both subdivision and merging interact with neighboring triangles, we require neighborhood information of the triangles, which we store as a triple of adjacent triangle indices. The order of indices is chosen such that the order of adjacent triangles is counter-clockwise in 2D and the adjacent triangle along the hypotenuse is the second index, i.e., at component index 1. This reduces the later check of whether two adjacent triangles share a common hypotenuse to a comparison of the adjacent triangle indices at this component index. A missing adjacent triangle at an edge is indicated by the triangle index −1, which is the case along the heightfield boundaries.

Finally, 3D vertex positions are stored together with a heightfield index in a separate vertex buffer, which each triangle refers to with a triple of vertex indices. Analogous to adjacent triangles, vertex indices are ordered such that the triangle is stored counter-clockwise in 2D. Start and end vertices of the triangle's hypotenuse correspond to component indices 1 and 2, i.e., the second edge of the triangle. Using such a predefined order of indices simplifies the triangle splitting at runtime.

For memory efficiency, we keep track of the free space of the triangle and vertex buffers, which includes the unused portion at the end as well as deleted elements. The unused area in the buffers is marked implicitly by the highest index of all used triangles or vertices, respectively. The deleted elements are tracked in additional free index buffers to which free triangle and vertex indices are pushed during merging with the help of a counter. If a free index in one of the buffers is required for new elements during subdivision, a lookup is first performed in the free index buffer and the counter is decreased. Only if the free index buffer is empty is a new index acquired from the unused space at the end of the triangle or vertex buffer. In total, our data structures occupy 376 MB of GPU memory.
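The free-space bookkeeping described above can be sketched as a small single-threaded class. The class and member names are hypothetical, and on the GPU the counter would have to be updated atomically, which this sketch does not model.

```python
class IndexPool:
    """Hand out buffer indices, reusing deleted ones before growing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.free = [0] * capacity   # free index buffer (preallocated)
        self.free_count = 0          # counter of valid entries in self.free
        self.next_unused = 0         # start of the unused area at the end

    def acquire(self):
        if self.free_count > 0:      # look in the free index buffer first
            self.free_count -= 1
            return self.free[self.free_count]
        if self.next_unused >= self.capacity:
            raise MemoryError("buffer exhausted")
        index = self.next_unused     # otherwise take from the unused area
        self.next_unused += 1
        return index

    def release(self, index):
        self.free[self.free_count] = index  # push a deleted element's index
        self.free_count += 1
```

One such pool would be kept for the triangle buffer and another for the vertex buffer.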

5 INITIAL TRIANGULATION

 Tessellation commonly works by iteratively refining a coarse initial triangulation until the level of detail is sufficient. In this section, we explain how to create an initial triangulation of a regular tessellation grid that covers the rectangular area of a heightfield. We start with a heightfield given as scalar field defined on a regular or adaptive 2D grid with a known minimum cell size. We create a regular tessellation grid with coarse cells, where the cell size is a power-of-two multiple k of this minimum cell size. This integer factor is determined through an iterative process such that the number of cells of the tessellation grid just barely exceeds 2000. This is an empirical value that does not significantly influence the algorithm. However, defining a roughly equal number of cells across different heightfields has the effect that the initial tessellation is decoupled from the complexity of the heightfield and guarantees similar runtime performance across different use cases.

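One possible reading of the iterative choice of k is sketched below: k is doubled as long as the next coarser grid would still have more than the target number of cells. The function name, the rounding up of partial cells, and the exact stopping rule are assumptions; the text only states that the resulting cell count should just barely exceed 2000.

```python
import math

def tessellation_grid(width_cells, height_cells, target=2000):
    """Return (k, w, h) for the coarse tessellation grid.

    width_cells and height_cells give the heightfield extent in units of
    its minimum cell size. k is a power of two; w and h are the grid
    dimensions in coarse cells.
    """
    def dims(k):
        return math.ceil(width_cells / k), math.ceil(height_cells / k)

    k = 1
    while True:
        w, h = dims(2 * k)
        if w * h <= target:
            break        # doubling again would drop to the target or below
        k *= 2
    w, h = dims(k)
    return k, w, h
```

For a heightfield of 1024 × 1024 minimum-size cells, this gives k = 16 and a 64 × 64 grid with 4096 cells. A heightfield with fewer cells than the target simply keeps k = 1.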

We now create a triangulation for the tessellation grid by constructing two triangles per grid cell, as shown in Fig. 4. To fit the original heightfield extents, the triangulation grid starts at the bottom left corner of the heightfield, and the vertices of the rightmost column and upmost row are shifted towards the heightfield boundaries. The resulting triangulation consists of 2n triangles and (w + 1)(h + 1) vertices, where n = wh is the number of cells and w, h are the dimensions of the tessellation grid. The square cell (x, y) in the tessellation grid can be split into two right triangles along one of the two diagonals. We use both options, starting at the bottom left cell with a split along the diagonal from top left to bottom right, and then alternate depending on whether the parity of x and y differs. As a result, our initial triangulation consists of four different types of triangles indicated with different colors in Fig. 4. These four types trivially correspond to four colors of a valid graph coloring of the triangle mesh. As we create an entry for each triangle in the triangle buffer, we assign a number ∈ [0, 3] to the triangle type that we store as graph color. The purpose of these graph colors is task scheduling for parallel processing, which is explained in Section 8. It is already obvious from Fig. 4 that no triangles of the same color share an edge or the same adjacent triangles, so they can be updated concurrently without need for synchronization.

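The alternating split and the resulting mesh size can be sketched as follows. The numbering of the four triangle types as colors 0 to 3 is an assumption made for this example; the text only states that the four types map to four graph colors.

```python
def cell_layout(x, y):
    """Return the split diagonal of cell (x, y) and the colors of its triangles."""
    flipped = (x % 2) != (y % 2)       # parity of x and y differs
    diagonal = "bottom-left to top-right" if flipped else "top-left to bottom-right"
    first_color = 2 if flipped else 0  # assumed numbering of the four types
    return diagonal, (first_color, first_color + 1)

def mesh_size(w, h):
    """Triangle and vertex counts of the initial triangulation."""
    return 2 * w * h, (w + 1) * (h + 1)
```

Cells that share an edge always differ in the parity test, so their triangles fall into different color pairs, and the two triangles within a cell get different colors as well.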

We store the two triangles of each cell (x, y) in the triangle buffer at indices t0 = 2(yw+x) and t1 = t0+1. The vertex indices and adjacent triangle indices of each triangle can be obtained with simple index arithmetic. We assume that all 2D vertex positions of the initial triangulation are inserted row-wise starting from bottom left to top right. Then, the vertex indices of the two triangles are
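The equation that followed in the source is not available here, so the sketch below does not reproduce the per-triangle vertex order. It only shows the triangle indices given in the text and the four corner vertex indices that follow from row-wise insertion with w + 1 vertices per row. The function name and the order of the returned corners are assumptions.

```python
def cell_indices(x, y, w):
    """Triangle and corner vertex indices of cell (x, y) in a grid of width w."""
    t0 = 2 * (y * w + x)             # first triangle of the cell
    t1 = t0 + 1                      # second triangle of the cell
    row = w + 1                      # vertices per row
    bottom_left = y * row + x
    bottom_right = bottom_left + 1
    top_left = bottom_left + row
    top_right = top_left + 1
    return (t0, t1), (bottom_left, bottom_right, top_left, top_right)
```

Each of the two triangles then picks three of these four corners, depending on the split diagonal of the cell.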

 

 

Reference 1: http://drivenbynostalgia.com/files/tessellation/paper.pdf

Reference 2: http://drivenbynostalgia.com/

Reference 3: https://www.ieeevis.org/year/2022/info/papers-sessions

posted on 2023-02-26 18:55  XiaoNiuFeiTian