(SIG Asia 2019) miHoYo's Deep-Learning-Based Garment Animation Workflow
Reposting of this article is prohibited.
Bilibili: Heskey0
Learning an Intrinsic Garment Space for Interactive Authoring of Garment Animation(2019 SIG Asia)
Established workflows are either time- and labor-consuming (i.e., manual editing on dense frames with controllers), or lack keyframe-level control (i.e., physically-based simulation).
Instead, we present a deep-learning-based approach for semi-automatic authoring of garment animation, wherein the user provides the desired garment shape in a selection of keyframes, while our system infers a latent representation of its motion-independent intrinsic parameters (e.g., gravity, cloth materials, etc.). (In plain terms: the artist edits the garment shape of one frame, the system infers the physical and rendering parameters (the intrinsic parameters), and the edit is then propagated to the other frames.)
Technically, we learn an intrinsic garment space with a motion-driven autoencoder network, where the encoder maps the garment shapes to the intrinsic space under the condition of body motions, while the decoder acts as a differentiable simulator to generate the garment shapes.
Chapter 1. Introduction
【Traditional CG pipeline】:
- A common workflow in the modern CG industry for garment animation composition/editing is the keyframe approach. For each keyframe, the artist adjusts the garment shape, commonly with skinning techniques such as Linear Blend Skinning (LBS) [Kavan and Žára 2005] and Dual Quaternion Skinning (DQS) [Kavan et al. 2007]. The garment shapes in the keyframes are then propagated to the other frames via interpolation. (The artist edits the garment shape of selected frames via skinning, and the edits are spread to the rest of the sequence by interpolation.)
- However, as the garment geometry is closely correlated with body motion, material properties, and the environment, the garment shape space is exceedingly nonlinear and complex. Achieving physically plausible garment shapes that stay consistent across the motion requires very dense sample points for interpolation within such a space. Consequently, the keyframes must be densely distributed over the sequence (often as high as 20% of the frames), and the process remains extremely labor-intensive. (Because the cloth mesh depends on body motion and material, the space is complex; to keep the cloth plausible, the artist has to author a very dense set of keyframe garment shapes for interpolation, which is exhausting.)
【Our approach】:
- We propose a motion-invariant autoencoder neural network for this task.
- Given a keyframe, the encoder learns to map its garment shape descriptor into a latent space under the condition of the corresponding body motion. The latent vector can be interpreted as a latent representation of the intrinsic parameters. (All garments generated with the same intrinsic parameters should be mapped to the same location in the latent space by factoring out the body motion.)
- The decoder learns to reconstruct the garment geometry from a latent vector, also under the condition of a particular motion. (It acts as a differentiable simulator for automatic animation generation.)
- Motion information is incorporated into the autoencoder via a motion descriptor learned by a motion encoder, following the idea of the Phase-Functioned Neural Network [Holden et al. 2017].
- The motion descriptor is a set of coefficients that linearly blend the multiple sub-networks within an autoencoder layer; the network weights are thus updated dynamically according to the motion (see the sketch after this list).
- The encoder, the decoder, and the motion encoder are jointly trained.
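To make the blending concrete, below is a minimal sketch of such a motion-blended layer. PyTorch and the layer sizes are my own choices; the paper does not prescribe a framework. Each layer stores several weight banks, and the motion descriptor supplies per-sample blend coefficients.

```python
import torch
import torch.nn as nn

class MotionBlendedLinear(nn.Module):
    """Linear layer whose weights are a linear blend of several sub-network
    weight banks; the blend coefficients come from the motion descriptor
    (PFNN-style conditioning)."""
    def __init__(self, in_dim, out_dim, num_banks):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_banks, out_dim, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(num_banks, out_dim))

    def forward(self, x, blend):
        # x:     (B, in_dim)     input features (e.g., shape descriptor)
        # blend: (B, num_banks)  coefficients produced by the motion encoder
        W = torch.einsum('bk,koi->boi', blend, self.weight)  # (B, out_dim, in_dim)
        b = torch.einsum('bk,ko->bo', blend, self.bias)      # (B, out_dim)
        return torch.einsum('boi,bi->bo', W, x) + b

# usage: in the real network the blend coefficients would be M_E(M)
layer = MotionBlendedLinear(in_dim=128, out_dim=128, num_banks=4)
x = torch.randn(8, 128)                            # batch of shape descriptors
blend = torch.softmax(torch.randn(8, 4), dim=-1)   # stand-in for the motion descriptor
y = layer(x, blend)
```

Stacking a few such layers for both the encoder \(F_E\) and the decoder \(F_D\), with the blend coefficients coming from the motion encoder, yields the motion-conditioned autoencoder described above.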
【Contributions】:
- a novel semi-automatic pipeline for authoring garment animation.
- learning a motion-factorized latent space that encodes intrinsic information of the garment shape.
- learning a differentiable garment simulator to automatically reconstruct garment shapes from an intrinsic garment representation and target body motion.
Chapter 2. Related Work
【Garment Simulation】: Data-driven methods
- Much recent work learns from offline simulations to achieve real-time performance.
- Other works focus on the transfer of simulated garments to different body shapes and poses.
【Garment Capture】:
- As an alternative to simulation, garment capture methods aim to faithfully reconstruct the garment animation from captured data
【Motion Control via Neural Networks】:
- Inspired by the PFNN (Phase-Functioned Neural Network), we aim to control the garment shapes with the body motion. Instead of using a phase function, we utilize a motion encoder to learn a motion descriptor from the body movement; this descriptor serves as the coefficients that linearly blend the sub-networks in each layer to update the network weights.
Chapter 3. Approach
3.1 Overview
Same as the overview given in the Introduction (Chapter 1).
3.2 Data Representation
【Motion】:
- We describe the body motion as the pose aggregation of the current frame and past \(W\) frames.
- The pose is represented by the 3D positions of body joints, so we have a \(K\times3\) pose matrix \(P\) for a skeleton with \(K\) joints.
- The motion signature, defined as the stacked pose matrices of the current and past \(W\) frames, \(M \in \mathbb{R}^{(W+1)\times K\times 3}\), describes the status of a specific moment.
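For illustration only (variable names, window size, and joint count are my own), the motion signature is simply the stack of the current and past \(W\) pose matrices:

```python
import numpy as np

W, K = 4, 24                       # assumed window size and joint count
poses = np.random.rand(100, K, 3)  # per-frame K x 3 joint positions of a clip

def motion_signature(poses, t, W):
    """Pose matrices of frame t and the past W frames, shape (W+1, K, 3)."""
    return poses[t - W : t + 1]

M = motion_signature(poses, t=50, W=W)
assert M.shape == (W + 1, K, 3)
```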
【Garment】:
- We assume the garment dressed on the character is deformed from a template mesh \((V_{tmp}, F_{tmp})\).
- \(V_{tmp}\) is a \(N × 3\) matrix that stores the 3D position of the vertices.
- \(F_{tmp}\) stores the faces of the triangular mesh.
- At a frame with motion status \(M\), the garment shape is represented by \(V_M\).
【Intrinsic parameters】:
- Explicit intrinsic parameters \(θ\) include simulator parameters, environment parameters, and garment material parameters.
【Dataset】:
- We organize the dataset as a collection of \(\{V, M, θ \}\) including garment shape \(V\) , the motion signature \(M\), and the simulation parameters \(θ\).
- The intrinsic vector \(z\) can be interpreted as a latent representation of \(θ\) learned by our network.
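One way to picture a dataset entry (the field names here are mine, not the paper's):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GarmentSample:
    V: np.ndarray      # (N, 3) garment vertex positions for this frame
    M: np.ndarray      # (W+1, K, 3) motion signature of this frame
    theta: np.ndarray  # explicit intrinsic (simulation) parameters used to generate V
```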
3.3 Shape Feature Descriptor
We use MLPs for the encoder \(N_E(\cdot)\) and the decoder \(N_D(\cdot)\). We train this network with a combination of two loss terms:
- an \(L_2\) loss between the 3D vertex positions of the input shape and the shape after encoding and decoding
- an \(L_2\) loss between the mesh Laplacians [Taubin 1995] of the vertices of those two shapes, to preserve surface details
Thus, the combined loss function is defined as:
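(The formula itself is not transcribed in these notes; a plausible reconstruction from the two terms above, where the relative weight \(\lambda\) is an assumption and \(\mathrm{Lap}(\cdot)\) denotes the mesh Laplacian coordinates of the vertices:)

\[
\mathcal{L}_{shape} = \left\| V - N_D(N_E(V)) \right\|_2^2 + \lambda \left\| \mathrm{Lap}(V) - \mathrm{Lap}\big(N_D(N_E(V))\big) \right\|_2^2
\]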
The shape descriptor is \(S = N_E(V) \in \mathbb{R}^{N_S}\).
We train our shape descriptor using only the garment shapes from our dataset.
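A minimal sketch of this loss in code, assuming a uniform (umbrella) Laplacian as one common instantiation of [Taubin 1995] and an assumed weight \(\lambda\):

```python
import torch

def uniform_laplacian(V, neighbors):
    """Uniform mesh Laplacian: each vertex minus the mean of its one-ring.
    V:         (N, 3) vertex positions
    neighbors: list of N index tensors (one-ring of each vertex, from F_tmp)"""
    delta = torch.empty_like(V)
    for i, nbr in enumerate(neighbors):
        delta[i] = V[i] - V[nbr].mean(dim=0)
    return delta

def shape_descriptor_loss(V, V_rec, neighbors, lam=1.0):
    """L2 on vertex positions plus L2 on mesh Laplacians (lam is an assumed weight)."""
    l_pos = ((V - V_rec) ** 2).sum(dim=-1).mean()
    l_lap = ((uniform_laplacian(V, neighbors) -
              uniform_laplacian(V_rec, neighbors)) ** 2).sum(dim=-1).mean()
    return l_pos + lam * l_lap
```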
3.4 Motion Invariant Encoding
The motion signature \(M\) is provided as a condition for the mapping functions.
- The encoder \(F_E(S \mid M) = z\) maps the input shape descriptor into the latent space, and the decoder \(F_D(z \mid M) = S\) reconstructs the shape descriptor from \(z\).
The training loss is defined as:
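(Reconstructed from the term descriptions below, for a batch of \(n\) samples sharing the same \(θ\); the weight \(\lambda\) is an assumption:)

\[
\mathcal{L} = \mathrm{Var}\big(\{z_i\}_{i=1}^{n}\big) + \lambda \sum_{i=1}^{n} \left\| S_i - S_i^\star \right\|_2^2
\]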
where:
- \(S_i=N_E(V_i)\) is the input shape descriptor,
- \(S_i^\star=F_D(z_i|M_i)\) is the recovered shape descriptor
- \(z_i=F_E(S_i|M_i)\) is the latent vector
- \(\mathrm{Var}\) is the variance of \(z_i\) within the batch
Meaning of the loss terms:
- The first term aims to minimize the variance in the latent space within the same batch, as the inputs \(\{V_i , M_i \}\) generated with the same \(θ\) are supposed to be mapped to the same location in the latent space.
- The second term acts as a regularizer that penalizes the difference between the input shape descriptor and the recovered one, ensuring the latent space does not degenerate to zero or an arbitrary constant.
The motion descriptor is \(M_D \in \mathbb{R}^{N_M}\), learned by the motion encoder from the motion signature.
The motion-invariant autoencoder \(F_{E/D}(\cdot)\) and the motion encoder \(M_E(\cdot)\) are jointly trained.
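A minimal sketch of one joint training step under these definitions (PyTorch; the module and argument names are my own, the loss weighting is an assumption, and the real networks use the motion-blended layers sketched earlier):

```python
import torch

def motion_invariant_loss(S, S_rec, z, lam=1.0):
    """S, S_rec: (B, N_S) input / recovered shape descriptors for a batch generated
    with the same intrinsic parameters theta; z: (B, N_Z) latent codes.
    First term: variance of z within the batch; second term: descriptor reconstruction."""
    var_term = z.var(dim=0).sum()
    rec_term = ((S - S_rec) ** 2).sum(dim=-1).mean()
    return var_term + lam * rec_term

def train_step(F_E, F_D, M_E, opt, V_batch, M_batch, shape_enc):
    # shape_enc is the pretrained N_E from Sec. 3.3 (assumed frozen here)
    S = shape_enc(V_batch)   # shape descriptors
    M_D = M_E(M_batch)       # motion descriptors (blend coefficients)
    z = F_E(S, M_D)          # encode under the motion condition
    S_rec = F_D(z, M_D)      # decode under the same condition
    loss = motion_invariant_loss(S, S_rec, z)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```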
3.5 Refinement
It is not guaranteed that the predicted garment shape is always collision-free. We apply an efficient refinement step that drags the garment outside the body while preserving its local shape features.
Specifically, given a body shape \(B\) and inferred garment mesh \(V\), we detect all the garment vertices inside \(B\) as \(\tilde V\). For each vertex \(\tilde v_i\in\tilde V\), we find its closest point over the body surface with position \(v_i^B\) and normal \(n_i^B\). Then we deform the garment mesh to update the garment vertices \(V^\star\) by minimizing the following energy:
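(The energy is not transcribed here; a reconstruction from the two terms described below, with \(\lambda\) an assumed weight and \(\mathrm{Lap}(\cdot)\) the mesh Laplacian:)

\[
E(V^\star) = \left\| \mathrm{Lap}(V^\star) - \mathrm{Lap}(V) \right\|_2^2 + \lambda \sum_{\tilde v_i \in \tilde V} \left\| \tilde v_i^\star - \left( v_i^B + \epsilon\, n_i^B \right) \right\|_2^2
\]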
Meaning of the terms:
- The first term penalizes the Laplacian difference between the deformed mesh and the inferred mesh.
- The second term forces the garment vertices inside the body to move outwards, with \(ϵ\) being a small value ensuring the garment vertices lie sufficiently outside the body.
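A simplified sketch of this collision handling: it projects offending vertices directly rather than solving the full Laplacian-preserving minimization, and it approximates closest surface points by the closest body vertices found with a KD-tree (both are my own simplifications).

```python
import numpy as np
from scipy.spatial import cKDTree

def refine_garment(V, body_V, body_N, eps=2e-3):
    """Push garment vertices detected inside the body back outside its surface.
    V:      (N, 3)  inferred garment vertex positions
    body_V: (Nb, 3) body vertex positions;  body_N: (Nb, 3) body vertex normals
    A full implementation would additionally minimize the Laplacian difference
    to preserve the local shape, as in the energy above."""
    tree = cKDTree(body_V)
    _, idx = tree.query(V)                              # closest body vertex per garment vertex
    vB, nB = body_V[idx], body_N[idx]
    inside = np.einsum('ij,ij->i', V - vB, nB) < 0.0    # behind the body surface normal
    V_star = V.copy()
    V_star[inside] = vB[inside] + eps * nB[inside]      # move just outside the body
    return V_star
```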