LiMaosen-2022-SkeletonPartedGraphScattering-ECCV

Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction #paper

1. paper-info

1.1 Metadata

Author:: [[Maosen Li]], [[Siheng Chen]], [[Zijing Zhang]], [[Lingxi Xie]], [[Qi Tian]], [[Ya Zhang]]
Affiliation:: Nanjing University
Keywords:: #GCN , #DCT
Journal:: #ECCV
Date:: [[2022-07-31]]
Status:: #Done
Link:: http://arxiv.org/abs/2208.00368
Read on:: 2023.01.04

1.2 Abstract

Graph convolutional network based methods that model the body-joints' relations, have recently shown great promise in 3D skeleton-based human motion prediction. However, these methods have two critical issues: first, deep graph convolutions filter features within only limited graph spectrums, losing sufficient information in the full band; second, using a single graph to model the whole body underestimates the diverse patterns on various body-parts. To address the first issue, we propose adaptive graph scattering, which leverages multiple trainable band-pass graph filters to decompose pose features into richer graph spectrum bands. To address the second issue, body-parts are modeled separately to learn diverse dynamics, which enables finer feature extraction along the spatial dimensions. Integrating the above two designs, we propose a novel skeleton-parted graph scattering network (SPGSN). The cores of the model are cascaded multi-part graph scattering blocks (MPGSBs), building adaptive graph scattering on diverse body-parts, as well as fusing the decomposed features based on the inferred spectrum importance and body-part interactions. Extensive experiments have shown that SPGSN outperforms state-of-the-art methods by remarkable margins of 13.8%, 9.3% and 2.7% in terms of 3D mean per joint position error (MPJPE) on Human3.6M, CMU Mocap and 3DPW datasets, respectively.

Keywords: GCN, HMP, DCT

2. Introduction

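Field:: 3D skeleton-based human motion prediction
Previous methods::
1. state models that capture shallow dynamics
2. deep learning:
  1. RNN-based
  2. GCN-based
The authors mainly target GCN-based methods.
- Problems::
  1. Given a graph structure, standard graph convolution filters features only within a limited part of the graph spectrum and cannot preserve the richer bands. (This feels similar to mode collapse in generative models: the model concentrates on the dominant features and ignores the marginal ones.)
  2. Existing methods use a single graph to model the whole body, so they cannot learn the diverse patterns of different body parts. (The idea is to split the body and learn the parts separately, so that more patterns are captured.)
- Solutions::
  1. An adaptive graph scattering technique:: a tree-structured bank of band-pass graph filters that decomposes features into multiple graph spectrum bands. It has three ingredients (1. band-pass filters initialized from mathematical priors; 2. adaptive filter parameters; 3. a feature filtering layer).
  2. Split the human body into parts.
- Main model structure, see Fig. 1
  1. skeleton-parted graph scattering network (SPGSN):: splits the body into parts and filters features by spectrum band.
  2. multi-part graph scattering block (MPGSB):: performs the band filtering. It has two components,
    1. single-part adaptive graph scattering:: extracts the features of each individual body part.
    2. bipartite cross-part fusion:: fuses the features across different body parts.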

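Fig. 1. Structure of SPGSN. DCT first converts the temporal sequence into frequency-domain information; MPGSBs then filter features for the different body parts, with bipartite cross-part fusion combining the part features; finally IDCT converts the frequency information back into a temporal sequence.

Source: original paper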

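3. Skeleton-Parted Graph Scattering Network

Notation::
\(X^{(t)}\in \mathbb{R}^{M\times 3}\):: the pose at time t (M joints, 3D coordinates).
\(\mathbb{X} = [X^{(1)},...,X^{(T)}]\):: the motion sequence
\(\mathbb{X}^-\):: the historical (observed) sequence
\(\mathbb{X}^+\):: the future sequence to predict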

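3.1 Model structure

The model is shown in Fig. 1.
The input is the historical sequence \(\mathbb{X}^-\in\mathbb{R}^{T\times M \times 3}\). First, \(\mathbb{X}^-\) is reshaped into \(\mathcal{X}^-\in\mathbb{R}^{T\times 3M}\), so that every joint coordinate at each timestamp is treated as an independent unit in the spatial domain. A DCT along the time axis then gives \(X^{-}= DCT(\mathcal{X}^-)\in\mathbb{R}^{M' \times C}\), where \(M' = 3M\) and \(C\) is the number of DCT coefficients. The result passes through several multi-part graph scattering blocks (MPGSBs) wrapped in a residual-style connection, which lets SPGSN capture feature displacements for stable prediction. Finally, an IDCT converts the output back into a temporal sequence.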
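The data flow described above can be sketched in NumPy. This is a minimal sketch under assumptions, not the authors' code: the sizes `T`, `M`, `C` are made up, `dct_basis` is a hypothetical helper, and `mpgsb_stack` is a placeholder that returns a zero residual, so only the reshape, DCT, residual, and IDCT steps are exercised.

```python
import numpy as np

def dct_basis(T, C):
    """First C rows of the orthonormal DCT-II matrix for length-T signals."""
    k = np.arange(C)[:, None]          # frequency index
    t = np.arange(T)[None, :]          # time index
    basis = np.sqrt(2.0 / T) * np.cos(np.pi * (t + 0.5) * k / T)
    basis[0] = np.sqrt(1.0 / T)        # the DC row uses a different scale
    return basis                       # shape (C, T)

def mpgsb_stack(X):
    """Placeholder for the cascaded MPGSBs (assumption: zero residual)."""
    return np.zeros_like(X)

T, M, C = 10, 22, 10                   # illustrative sizes only
rng = np.random.default_rng(0)
history = rng.normal(size=(T, M, 3))   # X^-: T frames, M joints, xyz

flat = history.reshape(T, 3 * M)       # (T, 3M): each coordinate is its own node
D = dct_basis(T, C)                    # (C, T)
X = (D @ flat).T                       # (M', C) with M' = 3M: DCT along time
X_out = X + mpgsb_stack(X)             # residual connection around the blocks
recon = (D.T @ X_out.T).reshape(T, M, 3)   # IDCT back to a pose sequence
```

With `C = T` and a zero residual the round trip is lossless; choosing `C < T` would keep only the lowest temporal frequencies.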

4. multi-part graph scattering blocks (MPGSB)

Each MPGSB consists of two parts::

single-part adaptive graph scattering:: feature extraction; learns rich-band information for each body part.
bipartite cross-part fusion:: feature fusion

4.1 single-part adaptive graph scattering

This component consists of two parts::

adaptive graph scattering decomposition
adaptive graph spectrum aggregation

adaptive graph scattering decomposition
A tree-structured network with \(L\) layers, in which the number of nodes grows exponentially with depth. Taking one layer as an example::
The input is the DCT-formed pose feature \(X\in\mathbb{R}^{M'\times C}\) and the corresponding adjacency matrix \(A\in\mathbb{R}^{M'\times M'}\). They pass through a bank of band-pass graph filters \(\{ h_{(k)}(\tilde{A} )| k=0,1,...,K \}\), where \(\tilde{A} = 1/2(I+A/\left \| A \right \|_F^2 )\) is the normalized adjacency matrix. The filter bank yields the features::

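\[H_{(k)}=\sigma (h_{(k)}(\tilde{A})XW_{(k)}) \]

where \(W_{(k)}\) are learnable weights and \(\sigma(\cdot)\) is the activation function, a nonlinearity that disperses the graph-frequency representation.

So that the filters act on different graph spectrum bands, they are initialized from mathematical priors and then fine-tuned through learnable coefficients, computed as follows::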

\[\begin{array}{ll} h_{(k)}(\widetilde{\mathbf{A}})=\alpha_{(0,0)} \widetilde{\mathbf{A}}, & k=0 \\ h_{(k)}(\widetilde{\mathbf{A}})=\alpha_{(1,0)} \mathbf{I}+\alpha_{(1,1)} \widetilde{\mathbf{A}}, & k=1 \\ h_{(k)}(\widetilde{\mathbf{A}})=\sum_{j=1}^{k} \alpha_{(k, j)} \widetilde{\mathbf{A}}^{2^{j-1}}, & k=2, \ldots, K, \end{array} \]

where \(\alpha_{(k,j)}\) are learnable coefficients. At initialization, \(\alpha_{(0,0)}=1\) for \(k=0\); for \(k>0\), \(\alpha_{(k,k-1)}=1\) and \(\alpha_{(k,k)}=-1\); all other \(\alpha_{(k,j)}=0\).
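The initialization can be made concrete with a small NumPy sketch. It assumes a toy 4-node chain graph and uses the hypothetical helpers `normalize_adjacency`, `init_alpha`, and `filter_bank`; in the real model the coefficients would be trainable, whereas here they stay at their initial values.

```python
import numpy as np

def normalize_adjacency(A):
    """A_tilde = 1/2 * (I + A / ||A||_F^2), as written in the notes."""
    return 0.5 * (np.eye(A.shape[0]) + A / np.linalg.norm(A, "fro") ** 2)

def init_alpha(K):
    """Initial coefficients alpha_(k, j), one array per filter h_(k)."""
    alpha = [np.array([1.0]), np.array([1.0, -1.0])]   # k = 0 and k = 1
    for k in range(2, K + 1):
        a = np.zeros(k)                  # terms j = 1..k stored at 0..k-1
        a[k - 2], a[k - 1] = 1.0, -1.0   # alpha_(k,k-1) = 1, alpha_(k,k) = -1
        alpha.append(a)
    return alpha

def filter_bank(A_tilde, alpha):
    """Build h_(0..K)(A_tilde) from the coefficients."""
    I = np.eye(A_tilde.shape[0])
    filters = [alpha[0][0] * A_tilde, alpha[1][0] * I + alpha[1][1] * A_tilde]
    for a in alpha[2:]:
        h_k = sum(a[j - 1] * np.linalg.matrix_power(A_tilde, 2 ** (j - 1))
                  for j in range(1, len(a) + 1))
        filters.append(h_k)
    return filters

# Toy 4-node chain graph (illustrative only).
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
A_tilde = normalize_adjacency(A)
h = filter_bank(A_tilde, init_alpha(K=3))
```

At these initial values the filters are \(h_{(1)}=I-\tilde{A}\), \(h_{(2)}=\tilde{A}-\tilde{A}^2\), \(h_{(3)}=\tilde{A}^2-\tilde{A}^4\), so the band-pass filters telescope to \(I-\tilde{A}^{4}\) and together tile the spectrum that the low-pass \(h_{(0)}=\tilde{A}\) smooths over.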

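Repeating this process, every layer expands each branch into \(K+1\) branches, so after \(L\) layers there are \((K+1)^L\) channels, each corresponding to a different spectrum band.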
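The exponential growth of the tree can be checked with a short sketch. Everything here is illustrative: the filters are random stand-ins for \(h_{(k)}(\tilde{A})\), the weights are random, and the absolute value is an assumed choice of nonlinearity that the notes do not specify.

```python
import numpy as np

def scatter_layer(features, filters, weights):
    """Apply every filter to every incoming branch: sigma(h_k X W_k)."""
    out = []
    for X in features:
        for h_k, W_k in zip(filters, weights):
            out.append(np.abs(h_k @ X @ W_k))   # |.| as sigma (assumption)
    return out

M_prime, C, K, L = 6, 4, 2, 2                   # illustrative sizes
rng = np.random.default_rng(0)
filters = [rng.normal(size=(M_prime, M_prime)) for _ in range(K + 1)]
X0 = rng.normal(size=(M_prime, C))

branches = [X0]
for _ in range(L):
    weights = [rng.normal(size=(C, C)) for _ in range(K + 1)]  # W_(k) per layer
    branches = scatter_layer(branches, filters, weights)

num_channels = len(branches)                    # (K + 1) ** L leaves
```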
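adaptive graph spectrum aggregation
The information from the different bands is selected and fused, i.e. feature selection, by weighting each band's features with a learnable score, computed as follows::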

\[\mathbf{H}=\sum_{k=0}^{(K+1)^{L}} \omega_{k} \mathbf{H}_{(k)} \in \mathbb{R}^{M^{\prime} \times C^{\prime}} \]

\(\omega_k\) is the spectrum importance score, computed as follows::

\[\omega_{k}=\frac{\exp \left(f_{2}\left(\tanh \left(f_{1}\left(\left[\mathbf{H}_{\mathrm{sp}}, \mathbf{H}_{(k)}\right]\right)\right)\right)\right)}{\sum_{j=0}^{(K+1)^{L}} \exp \left(f_{2}\left(\tanh \left(f_{1}\left(\left[\mathbf{H}_{\mathrm{sp}}, \mathbf{H}_{(j)}\right]\right)\right)\right)\right)} \]

where \(f_1(\cdot)\) and \(f_2(\cdot)\) are fully connected layers, \([\cdot,\cdot]\) denotes concatenation along the feature dimension, and \(H_{sp}\in\mathbb{R}^{M'\times C'}\) is computed as::

\[\mathbf{H}_{\mathrm{sp}}=\operatorname{ReLU}\left(\frac{1}{(K+1)^{L}} \sum_{k=0}^{(K+1)^{L}} \mathbf{H}_{(k)} \mathbf{W}_{\mathrm{sp}}\right) \]
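The three aggregation formulas can be traced in NumPy as follows. This is a sketch under assumptions: `f1` and `f2` are modelled as bias-free linear layers with random weights `W1` and `W2`, all sizes are made up, and the score is taken per node (shape \(M'\times 1\)); a single scalar per band would additionally pool over the nodes.

```python
import numpy as np

rng = np.random.default_rng(0)
M_prime, C_prime, n_bands, hidden = 6, 4, 9, 8          # illustrative sizes
H_bands = rng.normal(size=(n_bands, M_prime, C_prime))  # stacked H_(k)

# Hypothetical parameters: W_sp, and f1 / f2 as bias-free linear layers.
W_sp = rng.normal(size=(C_prime, C_prime))
W1 = rng.normal(size=(2 * C_prime, hidden))
W2 = rng.normal(size=(hidden, 1))

# H_sp: ReLU of the band-averaged, projected features.
H_sp = np.maximum((H_bands @ W_sp).mean(axis=0), 0.0)   # (M', C')

# Score each band from [H_sp, H_(k)], concatenated along the feature axis.
H_sp_rep = np.broadcast_to(H_sp, H_bands.shape)
concat = np.concatenate([H_sp_rep, H_bands], axis=-1)   # (n_bands, M', 2C')
scores = np.tanh(concat @ W1) @ W2                      # (n_bands, M', 1)

# Softmax over the band axis, then the weighted sum of the bands.
scores = scores - scores.max(axis=0, keepdims=True)     # numerical stability
omega = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
H = (omega * H_bands).sum(axis=0)                       # (M', C')
```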

Example
\(L=2, \ K=2\), giving \((K+1)^L = 9\) channels

Fig. 2. Example of the single-part adaptive graph scattering structure, K=2, L=2

Source: original paper

4.2 Bipartite Cross-Part Fusion

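Features from the different body parts are selected and fused. In this paper the authors split the body into an upper and a lower part.
The inputs are the upper-body features \(H_{\uparrow}\in\mathbb{R}^{M_\uparrow \times C'}\) and the lower-body features \(H_{\downarrow }\in\mathbb{R}^{M_\downarrow \times C'}\). First, the upper-to-lower affinity matrix is computed::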

\[\mathbf{A}_{\uparrow 2 \downarrow}=\operatorname{softmax}\left(f_{\uparrow}\left(\mathbf{H}_{\uparrow}\right) f_{\downarrow}\left(\mathbf{H}_{\downarrow}\right)^{\top}\right) \in[0,1]^{M_{\uparrow} \times M_{\downarrow}}, \]

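\(f_{\uparrow}(\cdot)\) and \(f_{\downarrow}(\cdot)\) are two embedding networks.
With the affinity matrix in hand, the part features are updated; for the lower body the update is::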

\[H'_{\downarrow} = H_{\downarrow} + \mathbf{A}_{\uparrow 2 \downarrow}^TH_{\uparrow} \]

Finally, the whole-body features are fused with the cross-part fused features, giving \(H'\in\mathbb{R}^{M'\times C'}\) through the following formula

\[H'=MLP(H+(H'_{\uparrow}\oplus H'_{\downarrow})) \]

\(\oplus: \mathbb{R}^{M_{\uparrow} \times C^{\prime}} \times \mathbb{R}^{M_{\downarrow} \times C^{\prime}} \rightarrow \mathbb{R}^{M^{\prime} \times C^{\prime}}\) places the joints from the different body parts back at their positions in the original body so that they align with it.
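The fusion step can be sketched in NumPy. Several details are assumptions rather than facts from the paper: the index split `upper_idx` / `lower_idx` is invented, \(f_{\uparrow}\) and \(f_{\downarrow}\) are single linear maps, the softmax is taken row-wise, the upper-body update is assumed to mirror the lower-body one, and the MLP is reduced to one linear layer with ReLU.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
C_prime, d = 4, 8
upper_idx = np.array([0, 1, 2, 6, 7])       # hypothetical upper-body nodes
lower_idx = np.array([3, 4, 5])             # hypothetical lower-body nodes
M_prime = len(upper_idx) + len(lower_idx)

H = rng.normal(size=(M_prime, C_prime))             # whole-body features
H_up = rng.normal(size=(len(upper_idx), C_prime))   # upper-body features
H_low = rng.normal(size=(len(lower_idx), C_prime))  # lower-body features

# f_up and f_low as single linear maps (assumption).
W_up = rng.normal(size=(C_prime, d))
W_low = rng.normal(size=(C_prime, d))
A_up2low = softmax((H_up @ W_up) @ (H_low @ W_low).T, axis=-1)  # (M_up, M_low)

H_low_new = H_low + A_up2low.T @ H_up   # lower-body update from the notes
H_up_new = H_up + A_up2low @ H_low      # assumed symmetric upper-body update

# The "oplus" operator: scatter part features back to whole-body positions.
merged = np.empty_like(H)
merged[upper_idx] = H_up_new
merged[lower_idx] = H_low_new

W_mlp = rng.normal(size=(C_prime, C_prime))     # one-layer stand-in for the MLP
H_out = np.maximum((H + merged) @ W_mlp, 0.0)   # (M', C')
```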

4.3 Loss function

Reconstruction loss::

\[\mathcal{L}=\frac{1}{N} \sum_{n=1}^{N}\left\|\mathbb{X}_{n}^{+}-\widehat{\mathbb{X}}_{n}^{+}\right\|^{2} \]
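As a small worked instance of this formula, the sketch below computes the loss for a batch stored as an array of shape `(N, T, M, 3)`. The function name `prediction_loss` and the array layout are assumptions made for illustration.

```python
import numpy as np

def prediction_loss(target, pred):
    """Mean over N samples of the squared L2 distance between sequences."""
    diff = (target - pred).reshape(target.shape[0], -1)  # flatten each sample
    return np.mean(np.sum(diff ** 2, axis=1))

target = np.zeros((2, 3, 4, 3))   # N=2 samples, T=3 frames, M=4 joints, xyz
pred = np.ones((2, 3, 4, 3))      # every coordinate is off by exactly 1
loss = prediction_loss(target, pred)   # 3 * 4 * 3 = 36 per sample
```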

5. Experiments

dataset
- Human3.6M
- CMU Mocap
- 3D Poses in the Wild (3DPW)

6. Code

Fig. 3. Logic of the SPGSN model

DCT implementation

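import numpy as np

def get_dct_matrix(N):
    """Return the N x N orthonormal DCT-II matrix and its inverse (the IDCT matrix)."""
    dct_m = np.eye(N)
    for k in np.arange(N):          # k: frequency index (row)
        for i in np.arange(N):      # i: time index (column)
            w = np.sqrt(2 / N)      # scale for k > 0
            if k == 0:
                w = np.sqrt(1 / N)  # scale for the DC row
            dct_m[k, i] = w * np.cos(np.pi * (i + 1 / 2) * k / N)
    idct_m = np.linalg.inv(dct_m)   # inverse transform
    return dct_m, idct_m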
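A quick sanity check, offered as a sketch: the matrix built by the function above is the orthonormal DCT-II matrix, so its inverse equals its transpose. The block restates the construction in vectorized form so that it runs on its own; `get_dct_matrix_vec` is a name introduced here, not part of the original code.

```python
import numpy as np

def get_dct_matrix_vec(N):
    """Vectorized orthonormal DCT-II matrix and its inverse."""
    k = np.arange(N)[:, None]      # frequency index (rows)
    i = np.arange(N)[None, :]      # time index (columns)
    dct_m = np.sqrt(2.0 / N) * np.cos(np.pi * (i + 0.5) * k / N)
    dct_m[0] = np.sqrt(1.0 / N)    # DC row scale
    return dct_m, dct_m.T          # orthonormal, so the inverse is the transpose

dct_m, idct_m = get_dct_matrix_vec(8)
signal = np.linspace(0.0, 1.0, 8) ** 2
coeffs = dct_m @ signal            # time -> frequency
restored = idct_m @ coeffs         # frequency -> time
```

Using the transpose avoids the matrix inversion and its rounding error, at the cost of relying on the orthonormal scaling.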

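7. Summary

In short, the paper addresses the tendency of the model to concentrate on low-frequency features while neglecting high-frequency ones, and responds by decomposing the features into separate spectrum bands.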

posted @ 2023-01-06 12:38 GuiXu40