JDE: Towards Real-Time Multi-Object Tracking (reading notes in English)
SDE methods bring critical challenges in building a real-time MOT system
Background
Faster RCNN = Fast RCNN + RPN
Separate Detection and Embedding
Detector -> Cropped Image -> ReID model -> reid feature
Two-stage
RPN -> Detection -(sharing feature map)-> reid embedding
Joint Detection and Embedding
Algorithm
Design/Training
Detection
- Anchor
- modified from the original RPN/Faster RCNN
- adapted for MOT task
- all anchors are set to an aspect ratio of 1 : 3.
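As a small illustration of the 1 : 3 anchor shape, the sketch below builds (width, height) pairs with a fixed width:height ratio, presumably chosen because pedestrian boxes are taller than they are wide. The function name `make_anchors` and the example widths are hypothetical; the note does not list the actual anchor scales.

```python
def make_anchors(widths, aspect_ratio=(1, 3)):
    """Return (width, height) anchor shapes with a fixed width:height ratio."""
    w_ratio, h_ratio = aspect_ratio
    return [(w, w * h_ratio / w_ratio) for w in widths]


# Hypothetical widths in pixels, chosen only for illustration.
anchors = make_anchors([16, 32, 64])
for w, h in anchors:
    print(f"anchor {w:g} x {h:g}")
```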
ReID
Contrastive Learning
The margin term is neglected for convenience.
The loss looks at a mini-batch and mines all the negative samples \(f^{-}_{i}\) and the hardest positive sample \(f^{+}\) in that mini-batch.
\(f^{T}\) is the selected anchor in the batch.
The paper also considers a smooth upper bound of the triplet loss.
This is the cross-entropy loss \(\mathcal{L} = -\sum_{c=1}^{\text{Cls.}}\mathbb{I}(y_{i} = c)\log p(f(x) = c)\) with \(p = \text{Softmax}(g^{+},\{g^{-}\})\)
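To make the relation between these losses concrete, here is a minimal sketch in plain Python. It assumes dot-product similarity, a margin-free hinge over the differences \(f^{T}f^{-}_{i} - f^{T}f^{+}\), and a log-sum-exp form for the smooth bound; these exact forms are my reading of the notes above, not something the notes spell out. The function names and the toy vectors are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triplet_terms(anchor, positive, negatives):
    """Margin-free differences f^T f_i^- - f^T f^+ for every negative."""
    pos = dot(anchor, positive)
    return [dot(anchor, neg) - pos for neg in negatives]


def hardest_negative_triplet(anchor, positive, negatives):
    """Hinge on the single hardest negative (margin neglected)."""
    return max(0.0, max(triplet_terms(anchor, positive, negatives)))


def smooth_upper_bound(anchor, positive, negatives):
    """log(1 + sum_i exp(d_i)), which is never below the hinge above."""
    diffs = triplet_terms(anchor, positive, negatives)
    return math.log(1.0 + sum(math.exp(d) for d in diffs))


def cross_entropy(anchor, g_pos, g_negs):
    """-log softmax probability of the positive class, computed stably."""
    logits = [dot(anchor, g_pos)] + [dot(anchor, g) for g in g_negs]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[0]


# Toy 2-D embeddings, purely illustrative.
f = [1.0, 0.0]
f_pos = [0.8, 0.6]
f_negs = [[0.0, 1.0], [-1.0, 0.0], [0.6, 0.8]]

hinge = hardest_negative_triplet(f, f_pos, f_negs)
upper = smooth_upper_bound(f, f_pos, f_negs)
ce = cross_entropy(f, f_pos, f_negs)
print(hinge, upper, ce)
```

Under these assumptions the smooth bound has the same algebraic form as the cross-entropy loss; the difference is that cross-entropy uses learnable class-wise weights \(g^{+}, g^{-}\) instead of instance embeddings from the mini-batch.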
Multi-task training
\(M\) is the number of prediction heads.
+ Question: So the feature map at every scale of the FPN is trained.
+ But during inference, which feature map should we use?
+ Or rather, should we design a strategy to further fuse the predictions at different scales?
ref: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics.
we employ task-dependent uncertainty [16] to dynamically weight the heterogeneous losses.
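A minimal sketch of uncertainty-based loss weighting follows. It assumes the commonly used form in which each loss \(L\) gets a learnable log-variance \(s\) and contributes \(\frac{1}{2}(e^{-s}L + s)\), summed over the \(M\) heads and the tasks of each head. The task names (`cls`, `box`, `emb`), the function name, and the numbers are hypothetical; in real training the \(s\) values would be learnable parameters updated by the optimizer.

```python
import math


def uncertainty_weighted_loss(losses, log_vars):
    """Sum 0.5 * (exp(-s) * L + s) over all heads and tasks.

    losses   : list (one entry per head) of {task_name: loss_value}
    log_vars : same structure, holding the log-variance s for each loss
    """
    total = 0.0
    for head_losses, head_s in zip(losses, log_vars):
        for task, loss in head_losses.items():
            s = head_s[task]
            total += 0.5 * (math.exp(-s) * loss + s)
    return total


# Two hypothetical heads with three losses each (illustrative values).
losses = [
    {"cls": 1.0, "box": 2.0, "emb": 3.0},
    {"cls": 0.5, "box": 0.5, "emb": 1.0},
]
zeros = [{"cls": 0.0, "box": 0.0, "emb": 0.0} for _ in losses]

# With s = 0 everywhere, every loss is weighted by 0.5.
print(uncertainty_weighted_loss(losses, zeros))
```

A larger \(s\) shrinks the weight \(e^{-s}\) of a noisy loss, while the additive \(s\) term keeps the model from inflating \(s\) without limit.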
a metric learning problem
Inference
Get Embedding
+ Question: It seems that FPN with the heads is not clearly described in the paper.
Association
- \(e_{i}\) is the appearance state, updated by
\[e_{i}^{t} = \alpha e^{t-1}_{i} + (1-\alpha)f_{i}^{t} \]
- \(f_{i}^{t}\) is the appearance embedding
- \(m_{i}^{t}\) is maintained by a Kalman filter
- linking uses the Hungarian algorithm
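The appearance update and the linking step above can be sketched as follows. This is a simplified illustration, not the paper's tracker: it uses cosine distance between appearance states and new embeddings as the only cost, leaves out the Kalman-filter motion term entirely, and swaps the Hungarian algorithm for a brute-force search over permutations, which gives the same minimum-cost assignment but is only practical for a handful of objects. The names `ema_update`, `cosine_distance`, `link` and the default `alpha=0.9` are assumptions made for this example.

```python
import math
from itertools import permutations


def ema_update(prev_state, embedding, alpha=0.9):
    """e_t = alpha * e_{t-1} + (1 - alpha) * f_t, element-wise."""
    return [alpha * e + (1.0 - alpha) * f for e, f in zip(prev_state, embedding)]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def link(track_states, detections):
    """Minimum-cost one-to-one assignment by brute force.

    Stand-in for the Hungarian algorithm; assumes
    len(track_states) <= len(detections).
    Returns a list where entry i is the detection index matched to track i.
    """
    cost = [[cosine_distance(t, d) for d in detections] for t in track_states]
    best = min(
        permutations(range(len(detections)), len(track_states)),
        key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)),
    )
    return list(best)


# Two existing tracklets and two new detections (toy 2-D embeddings).
tracks = [[1.0, 0.0], [0.0, 1.0]]
dets = [[0.1, 0.9], [0.9, 0.1]]

matches = link(tracks, dets)
tracks = [ema_update(tracks[i], dets[j]) for i, j in enumerate(matches)]
print(matches, tracks)
```

For realistic numbers of objects, a proper Hungarian solver such as `scipy.optimize.linear_sum_assignment` would replace the permutation search.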
Experiments
One may notice that JDE has a lower IDF1 score and more ID switches than existing methods. At first we suspect the reason is that the jointly learned embedding might be weaker than a separately learned embedding.
However, when we replace the jointly learned embedding with the separately learned embedding, the IDF1 score and the number of ID switches remain almost the same.
This article is from cnblogs (博客园), author: ZXYFrank. When reposting, please cite the original link: https://www.cnblogs.com/zxyfrank/p/16046354.html