PnP and Perspective Projection and Pose Computation

Reviewing the PnP problem from a computer graphics rendering point of view

Let's start from a StackExchange question. Below is an excerpt from my own answer.

Intrinsic Matrix vs. Projection Matrix

What is the difference between the intrinsic matrix (K) and the perspective projection
matrix (called the P matrix below)?

The K matrix transforms 3D points in camera space to 2D pixels in image space.
Only the x and y values matter in this procedure.
The P matrix transforms 3D points to clip space, which becomes NDC space after the perspective divide.

Take a look at the two matrices:

\[K = \begin{bmatrix}f_x& 0& c_x\\ 0& f_y& c_y\\ 0 & 0 & 1\\\end{bmatrix}\]

\[P = \begin{bmatrix} \frac{1}{t*a}& 0& 0& 0\\ 0& \frac{1}{t}& 0& 0\\ 0 & 0 & A& B\\ 0 & 0 & -1& 0\\ \end{bmatrix}\]

\[t=\tan\left(\frac{fovy}{2}\right) \]

\[a=\frac{width}{height} \]

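Here A and B encode the depth mapping set by the near and far clipping planes; they do not affect x and y. Adding the perspective divide, the two matrices give the following results: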

Intrinsic case: $$x_{2d} = \frac{x_0}{z_0 * \frac{1}{f_x}} + c_x$$

Perspective case: $$x_{2d} = \frac{x_0}{-z_0(t*a)}$$

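The two results have the same form, with 1/f_x playing the role of t*a (up to the scaling between pixels and NDC units), but they differ in two ways.

In image space the origin is at the top-left corner, so c_x and c_y are added as the offset from the image center to the top-left corner.
In the GL convention the Z axis points out of the screen and the camera looks down -Z, which is why P(3,2) = -1 (zero-based row 3, column 2) and the divisor is -z_0.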
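To make the comparison concrete, here is a minimal Python sketch that projects one point both ways. It assumes square pixels, a principal point at the image center, and a CV camera frame obtained from the GL frame by flipping y and z. The function names and the 640x480 image size are illustrative only and are not part of the original answer.

```python
import math

def project_intrinsic(point_cv, fx, fy, cx, cy):
    """Pinhole projection with K; the camera looks down +Z (CV convention)."""
    x, y, z = point_cv
    return fx * x / z + cx, fy * y / z + cy

def project_ndc(point_gl, fovy, aspect):
    """x and y rows of P followed by the perspective divide (w = -z)."""
    x, y, z = point_gl
    t = math.tan(fovy / 2.0)
    w = -z
    return x / (t * aspect) / w, y / t / w

# Assumed viewport and field of view (illustrative values).
width, height = 640, 480
fovy = math.radians(60.0)
aspect = width / height
t = math.tan(fovy / 2.0)

# Intrinsics that describe the same field of view in pixel units.
fx = (width / 2.0) / (t * aspect)
fy = (height / 2.0) / t
cx, cy = width / 2.0, height / 2.0

p_gl = (0.3, -0.2, -2.0)               # in front of a GL camera (z < 0)
p_cv = (p_gl[0], -p_gl[1], -p_gl[2])   # assumed GL -> CV flip of y and z

u, v = project_intrinsic(p_cv, fx, fy, cx, cy)
x_ndc, y_ndc = project_ndc(p_gl, fovy, aspect)

# Viewport transform: NDC [-1, 1] to pixels with the origin at the top-left.
u_from_ndc = (x_ndc + 1.0) * width / 2.0
v_from_ndc = (1.0 - y_ndc) * height / 2.0
```

Under these assumptions `(u, v)` and `(u_from_ndc, v_from_ndc)` coincide, which suggests that K acts like the x and y part of P combined with the viewport transform.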

Extending this question: how does the projection process show up in the PnP problem?

Dig into solvePnP

Let's look at how the PnP problem is formulated. The formula below comes from the OpenCV documentation.

\[\begin{align*} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} &= \bf{A} \hspace{0.1em} \Pi \hspace{0.2em} ^{c}\bf{T}_w \begin{bmatrix} X_{w} \\ Y_{w} \\ Z_{w} \\ 1 \end{bmatrix} \\ \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} &= \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_{w} \\ Y_{w} \\ Z_{w} \\ 1 \end{bmatrix} \end{align*} \]

Next, let's look at the principle behind the DLT approach to solvePnP, following this article.

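Treat the 3x4 matrix as 12 independent unknowns, ignoring its meaning as [R|t],
then simplify the formula above into equations in the unknown 3x4 matrix, whose entries are written a1 to a12 below.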

\[\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_1 & a_2 & a_3 & a_4 \\ a_5 & a_6 & a_7 & a_8 \\ a_9 & a_{10} & a_{11} & a_{12} \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}\]

\[\left\{ \begin{array}{l} \lambda u = x f_x a_1 + x c_x a_9 + y f_x a_2 + y c_x a_{10} + z f_x a_3 + z c_x a_{11} + f_x a_4 + c_x a_{12} \\ \lambda v = x f_y a_5 + x c_y a_9 + y f_y a_6 + y c_y a_{10} + z f_y a_7 + z c_y a_{11} + f_y a_8 + c_y a_{12} \\ \lambda = x a_9 + y a_{10} + z a_{11} + a_{12} \end{array} \right.\]

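λ is the z value obtained after multiplying by the projection matrix, so dividing by λ gives u and v in 2D image space.
With data from 6 points, which give 12 rows (two equations per point once λ is eliminated), the system can be solved directly. The SVD-based least-squares solution is not detailed here.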
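The elimination of λ and the SVD step can be sketched in Python with NumPy. This is a minimal, unrefined DLT written for illustration and is not the OpenCV implementation; the function name `dlt_pose`, the synthetic points, and the intrinsics are assumptions made for the example. Each correspondence contributes two rows, obtained by substituting the third equation into the first two after removing K from the pixel coordinates.

```python
import numpy as np

def dlt_pose(points_3d, pixels, K):
    """Estimate R, t from >= 6 non-coplanar correspondences with a basic DLT."""
    points_3d = np.asarray(points_3d, dtype=float)
    pixels = np.asarray(pixels, dtype=float)

    # Remove K: x_n = (u - c_x) / f_x, y_n = (v - c_y) / f_y.
    ones = np.ones((len(pixels), 1))
    normalized = (np.linalg.inv(K) @ np.hstack([pixels, ones]).T).T

    rows = []
    for (x, y, z), (xn, yn, _) in zip(points_3d, normalized):
        X = np.array([x, y, z, 1.0])
        zeros = np.zeros(4)
        # (a1..a4).X - x_n * (a9..a12).X = 0, and the same for y_n with a5..a8.
        rows.append(np.concatenate([X, zeros, -xn * X]))
        rows.append(np.concatenate([zeros, X, -yn * X]))
    M = np.array(rows)  # shape (2n, 12)

    # Least-squares null vector: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(M)
    A = vt[-1].reshape(3, 4)

    # The solution is defined up to sign and scale; pick the sign that gives lambda > 0.
    if A[2] @ np.append(points_3d[0], 1.0) < 0:
        A = -A

    # Project the left 3x3 block onto the nearest rotation and recover the scale.
    U, S, Vt = np.linalg.svd(A[:, :3])
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    t = A[:, 3] / S.mean()
    return R, t

# Synthetic check with assumed intrinsics and pose (CV convention, z > 0).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.3
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([0.1, -0.2, 5.0])
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.2, 0.1], [-0.5, 0.8, 0.3],
                   [0.3, -0.7, 0.9], [-0.9, -0.4, -0.6], [0.6, 0.9, -0.8],
                   [0.2, 0.5, 1.2], [-0.7, 0.1, 0.7]])
cam = points @ R_true.T + t_true
proj = cam @ K.T
pixels = proj[:, :2] / proj[:, 2:3]

R_est, t_est = dlt_pose(points, pixels, K)
```

With noise-free input the estimate should match the true pose closely. With real, noisy data a DLT result is normally used only as an initial guess for an iterative refinement.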

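Note the difference between the coordinate system commonly used in CV and the one commonly used in GL: their Z axes point in opposite directions. The article above mentions that the solve constrains z > 0.
So to use the PnP method here, first mirror the 3D points along the z axis, then build the K matrix: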

\[K=\begin{bmatrix} \frac{1}{t*a} & &0 \\ & \frac{1}{t} &0 \\ & & 1 \end{bmatrix} \]

Also, the 2D points are then points in NDC space rather than image space. The R and t computed this way align roughly.
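The effect of the z mirror can be checked numerically. The sketch below, in Python with NumPy, compares the GL perspective projection of a camera-space point with the pinhole projection of its z-mirrored copy through the NDC-style K matrix. It assumes an identity pose and illustrative values for fovy and the aspect ratio; the helper names are hypothetical.

```python
import math
import numpy as np

fovy = math.radians(45.0)   # assumed vertical field of view
aspect = 16.0 / 9.0         # assumed width / height
t = math.tan(fovy / 2.0)

# K expressed in NDC units, as in the matrix above.
K_ndc = np.array([[1.0 / (t * aspect), 0.0, 0.0],
                  [0.0, 1.0 / t, 0.0],
                  [0.0, 0.0, 1.0]])

def gl_project_xy(p):
    """x and y rows of P followed by the perspective divide (w = -z)."""
    x, y, z = p
    return np.array([x / (t * aspect), y / t]) / (-z)

def cv_project(p, K):
    """Pinhole projection: multiply by K, then divide by the third component."""
    uvw = K @ p
    return uvw[:2] / uvw[2]

mirror_z = np.diag([1.0, 1.0, -1.0])

p_gl = np.array([0.4, -0.3, -3.0])   # in front of a GL camera (z < 0)
ndc_gl = gl_project_xy(p_gl)
ndc_cv = cv_project(mirror_z @ p_gl, K_ndc)   # mirrored point has z > 0
```

Both projections give the same NDC coordinates, so 2D points taken from the GL render can be fed to a CV-style solver together with the mirrored 3D points. Because the mirror is a reflection, the pose returned by the solver refers to the mirrored points.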

How to compute object pose with Perspective Projection

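Consider a perspective projection as commonly used in graphics, with the projection matrix P defined in the first section, followed by the perspective divide.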

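To simplify the problem, assume the orientation of the 3D object already matches that of the 2D projection, i.e. the rotation part R
is already solved. Then only the translation remains to be computed. Because this is a perspective projection, translation along the Z axis
directly affects the size of the final image, which the PnP discussion above did not cover.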

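If the rotation is not aligned, how should it be computed? The rotation matrix can be obtained with an SVD decomposition; this is skipped for now.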

Returning to the projection, the equations are as follows, where $\Delta$ is the unknown translation.

\[\begin{align*} \begin{bmatrix} x' \\ y' \\ z' \\ w' \end{bmatrix} &= \bf{P} \left( \begin{bmatrix} X_{w} \\ Y_{w} \\ Z_{w} \\ 1 \end{bmatrix} + \begin{bmatrix} \Delta_x \\ \Delta_y \\ \Delta_z \\ 0 \end{bmatrix} \right) \\ \begin{bmatrix} X_{2d} \\ Y_{2d} \end{bmatrix} &= \begin{bmatrix} \frac{x'}{w'} \\ \frac{y'}{w'} \end{bmatrix} \end{align*}\]

Substituting the concrete entries of P gives:

\[ \begin{align*} X_{2d} &= -\frac{X_w + \Delta_x}{t*a*(Z_w + \Delta_z)} \\ Y_{2d} &= -\frac{Y_w + \Delta_y}{t*(Z_w + \Delta_z)} \end{align*} \]

Define the optimization objective, summing the squared x and y residuals over all points:

\[\Delta = \arg\min\limits_{\Delta} \frac{1}{2} \sum_{\text{points}} \left[ \left(X_{2d}+\frac{X_w + \Delta_x}{t*a*(Z_w + \Delta_z)}\right)^2 + \left(Y_{2d} + \frac{Y_w + \Delta_y}{t*(Z_w + \Delta_z)}\right)^2 \right] \]

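Solving this iteratively with the Gauss-Newton method, and rendering with the same perspective projection, gives a good overall alignment between 3D and 2D.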
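A Gauss-Newton iteration for this objective might look like the following Python sketch with NumPy. It assumes the rotation is already aligned, uses one x residual and one y residual per point, and starts from a rough guess that keeps every point in front of the GL camera (Z_w + Δ_z < 0). The function names, the model points, and the initial guess are illustrative assumptions and not the author's implementation.

```python
import math
import numpy as np

def project(points, delta, t, a):
    """X_2d and Y_2d from the substituted projection equations."""
    p = points + delta
    x2d = -p[:, 0] / (t * a * p[:, 2])
    y2d = -p[:, 1] / (t * p[:, 2])
    return np.stack([x2d, y2d], axis=1)

def gauss_newton_translation(points, observed, t, a, delta0, iterations=50):
    """Estimate the translation delta by Gauss-Newton on the reprojection residuals."""
    delta = np.array(delta0, dtype=float)
    for _ in range(iterations):
        p = points + delta
        X, Y, Z = p[:, 0], p[:, 1], p[:, 2]

        # Residuals r = observed - predicted, interleaved as x0, y0, x1, y1, ...
        r = (observed - project(points, delta, t, a)).reshape(-1)

        # Jacobian of r with respect to (delta_x, delta_y, delta_z).
        J = np.zeros((2 * len(points), 3))
        J[0::2, 0] = 1.0 / (t * a * Z)
        J[0::2, 2] = -X / (t * a * Z ** 2)
        J[1::2, 1] = 1.0 / (t * Z)
        J[1::2, 2] = -Y / (t * Z ** 2)

        # Gauss-Newton step: least-squares solution of J * step = -r.
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        delta += step
        if np.linalg.norm(step) < 1e-12:
            break
    return delta

# Synthetic check with assumed camera parameters and model points.
fovy = math.radians(60.0)
t = math.tan(fovy / 2.0)
a = 4.0 / 3.0
model = np.array([[-0.5, -0.5, 0.2], [0.5, -0.4, -0.3], [0.4, 0.6, 0.1],
                  [-0.6, 0.5, -0.2], [0.0, 0.1, 0.5]])
delta_true = np.array([0.2, -0.1, -5.0])
observed = project(model, delta_true, t, a)

delta_est = gauss_newton_translation(model, observed, t, a, delta0=[0.0, 0.0, -3.0])
```

In this noise-free setup the estimate converges to the true translation within a few iterations. With real 2D detections a damped variant such as Levenberg-Marquardt may be more robust when the initial depth guess is poor.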

posted @ 2023-08-14 16:17 皮斯卡略夫
