
SOTMOT: Improving Multiple Object Tracking with Single Object Tracking — Detailed Notes in English

\[MOT \neq SOT \times N \]

takeaways


In principle, there is no doubt that a multiple object tracker can be realized by running multiple single object trackers.

Background

The spirit of our approach, learning auxiliary associative embeddings simultaneously with the main task, also shows good performance in many other vision tasks.

SOT and MOT

  • SOT
    • discriminate target from local backgrounds
  • MOT
    • discriminate targets from one another, because most backgrounds can be filtered out by the detector

If we integrate SOT into MOT directly, two problems arise:

  • inappropriate/overdone discrimination
  • multiple targets will make SOT really slow

MOT

one-shot and tracking-by-detection

  • JDE
    • YOLO(Anchor-based)
  • FairMOT
    • ResNet-34 + DLA
    • ReID branch

Model

Architecture

😂 inconsistent symbol

Backbone

DLA-34

SOT Branch

figure of SOT branch

+ Question: Can SOT really be performed with only a branch?

The SOT branch trains a separate SOT model per target in one frame and locates the targets in another frame

  • take in \(\mathbf{F}_{backbone}\)
    • \(\mathbf{F}_{SOT} \in \R^{C_{SOT} \times H \times W} = \text{3-Convs}(\mathbf{F}_{backbone})\)
    • 3x3, stride = 1, BN & ReLU
  • given center \(\mathbf{c} = \{x,y\}\)
    • \(\mathbf{F}_{object} = \mathbf{F}_{SOT}(x,y) \in \R^{C_{SOT}}\)
    • index-based entry extraction
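The index-based entry extraction above can be sketched in NumPy. The shapes and names here are my own assumptions for illustration, not the paper's code:

```python
import numpy as np

# Assumed shapes for illustration: C_SOT channels on an H x W feature map.
C_SOT, H, W = 128, 76, 136
F_sot = np.random.rand(C_SOT, H, W).astype(np.float32)

def extract_object_features(F_sot, centers):
    """Index-based entry extraction: one C_SOT-dim embedding per target center (x, y)."""
    xs = np.array([x for x, y in centers])
    ys = np.array([y for x, y in centers])
    return F_sot[:, ys, xs].T  # (N, C_SOT)

feats = extract_object_features(F_sot, [(10, 20), (55, 40)])
```

No cropping or RoI pooling is involved: each target's feature is literally one spatial entry of \(\mathbf{F}_{SOT}\).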

Train

  • given centers in a training image \(\mathbf{C}_{targets} = \{\mathbf{c}_{i}\}_{i= 1}^{N}\)
    • calculate a neighborhood matrix

\[\mathbf{A}_{i,j} = \begin{cases} 1 & \text{if } \min(|x_{i}-x_{j}|,|y_{i}-y_{j}|) \leq r_{neighbor} \\ 0 & \text{otherwise} \end{cases}\]

  • select the neighbors to construct data for CLASSIFICATION
  • \(\mathbf{X}_{i} = \{\mathbf{x}_{j} \mid \forall j:\mathbf{A}_{i,j} = 1\}\)
  • ridge regression to obtain \(\mathbf{w}^{*}\)
    • the neighborhood size is fixed, so during training the Gram matrices \(\mathbf{X}_{i}^{\top}\mathbf{X}_{i}\) of different targets can be computed in a batched manner
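The two steps above, building the neighborhood matrix and solving the per-target ridge regressions in a batch, can be sketched as follows. The neighbor rule mirrors the equation above; the regularization weight `lam` is an assumption:

```python
import numpy as np

def neighbor_matrix(centers, r):
    """A[i, j] = 1 iff min(|x_i - x_j|, |y_i - y_j|) <= r."""
    d = np.abs(centers[:, None, :] - centers[None, :, :])  # (N, N, 2) coordinate offsets
    return (d.min(axis=-1) <= r).astype(np.float32)

def batched_ridge(X, y, lam=0.1):
    """Closed-form ridge regression per target: w* = (X^T X + lam I)^{-1} X^T y.
    X: (B, K, C) neighbor features, y: (B, K) labels -> (B, C) discriminator weights."""
    B, K, C = X.shape
    G = X.transpose(0, 2, 1) @ X + lam * np.eye(C)          # batched Gram matrices
    return np.linalg.solve(G, X.transpose(0, 2, 1) @ y[..., None])[..., 0]
```

Because every target's Gram matrix has the same fixed size \(C_{SOT} \times C_{SOT}\), `np.linalg.solve` handles all targets in one batched call.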

+ Question: What to Train? 

Train the whole CNN end-to-end:

  • 2 images form a training pair
    • shared backbone and heads
    • features fused with a 1x1 Conv, as in CenterNet

We use the model pre-trained on COCO [24] to initialize the weights of backbone network and finetune them during offline training.

+ Question: If the branch is so simple, 
+ how can it benefit from the SOTA VOT Trackers?

Inference

at timestep \(t\)

  • we have living tracks \(\{\mathcal{T}_{i}\}_{i=1}^{M}\) up to time \(t-1\)

where

\[\mathcal{T}_{i} = \{\mathbf{c}_{i}^{\tau},((\mathbf{X}_{i}^{\tau},\mathbf{y}_{i}^{\tau}),\mathbf{w}_{i}^{t*})\}_{\tau = s}^{t-1} \]

  • use a Kalman filter to predict the current locations

\[\mathbf{M} = \mathbf{C}_{pred}^{t} = \{\hat{\mathbf{c}}_{i}^{t}\}_{i=1}^{M} \]
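The prediction step can be sketched with a standard constant-velocity Kalman model. The state layout and process-noise scale here are assumptions, not details taken from the paper:

```python
import numpy as np

def kf_predict(x, P, dt=1.0, q=1e-2):
    """Constant-velocity predict: x = [cx, cy, vx, vy], P: (4, 4) covariance."""
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt                 # position advances by velocity * dt
    x_pred = F @ x
    P_pred = F @ P @ F.T + q * np.eye(4)   # inflate uncertainty with process noise
    return x_pred, P_pred
```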

  • perform CenterNet-like Detection
  • and obtain SOT features

\[\mathbf{F}_{backbone} = \text{DLA-34}(\text{Image}) \]

\[\mathbf{F}_{SOT} \in \R^{128\times H \times W} = \text{SOTHead[3-Conv]}(\mathbf{F}_{backbone}) \]

\[\mathbf{C}_{det}^{t},\mathbf{S}_{det}^{t} = \text{CenterNet}(\mathbf{F}_{backbone}) \]

\[\mathbf{N} = \mathbf{C}_{det}^{t} = \{\mathbf{c}_{i}^{t}\}_{i=1}^{N} \]

  • construct neighbors

SOT features are read out at the centers; for each track \(i\), the candidate matrix \(\mathbf{Z}_{i}^{t}\) gathers the SOT features of the detections falling in the neighborhood of its predicted center, linking detections to predictions:

\[\mathbf{Z}_{i}^{t} = \mathbf{F}_{SOT} [\text{Neighbor}(\hat{\mathbf{c}}_{i}^{t},\mathbf{C}_{det}^{t})] \]

  • match to existing tracks

appearance metric

\[\mathbf{v}_{i} = \mathbf{Z}_{i}^{t}\mathbf{w}_{i}^{*} \]

\(\mathbf{w}_{i}^{*}\) is the discriminator of track \(i\);
only the neighbors of a target track are scored

motion metric (Kalman distance)

fuse the two scores
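Scoring candidates with a track's discriminator and fusing with the motion term can be sketched as below. The linear fusion form and the weight `lam` are my assumptions; the paper's exact fusion formula is not reproduced here:

```python
import numpy as np

def fused_score(Z, w, motion_dist, lam=0.7):
    """Appearance score v = Z @ w per candidate, fused with a motion distance.
    Z: (K, C) neighbor features, w: (C,) ridge weights, motion_dist: (K,)."""
    v_app = Z @ w                    # (K,) appearance responses
    v_motion = np.exp(-motion_dist)  # map a distance to a similarity in (0, 1]
    return lam * v_app + (1 - lam) * v_motion
```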

  • 1st: Hungarian matching between the living tracks \(\{\mathcal{T}_{i}\}_{i=1}^{M}\) and the detections \(\mathbf{C}_{det}^{t}\), using the fused score \(\mathbf{v}\)

\(\mathbf{P}\): matched
\(\mathbf{Q}\): unmatched tracks
\(\mathbf{K}\): unmatched detections

  • 2nd: Hungarian matching on IoU between the unmatched tracks \(\mathbf{Q}\) and the unmatched detections \(\mathbf{K}\)

newly matched pairs are inserted into \(\mathbf{P}\)
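The cascaded association can be sketched with SciPy's Hungarian solver; the gating threshold and cost values below are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cost, thresh):
    """One Hungarian stage: matched (track, det) pairs plus the leftovers.
    Pairs whose cost exceeds thresh are rejected (threshold value is an assumption)."""
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= thresh]
    m_r = {r for r, _ in matches}
    m_c = {c for _, c in matches}
    unmatched_tracks = [r for r in range(cost.shape[0]) if r not in m_r]
    unmatched_dets = [c for c in range(cost.shape[1]) if c not in m_c]
    return matches, unmatched_tracks, unmatched_dets

# Stage 1 uses the fused score (as a cost); stage 2 reruns on (1 - IoU) for the leftovers.
cost1 = np.array([[0.1, 0.9], [0.8, 0.2], [0.7, 0.6]])
P, Q, K = associate(cost1, thresh=0.5)
```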

  • Updating \(\mathcal{T}_{i} = \{\mathbf{c}_{i}^{\tau},((\mathbf{X}_{i}^{\tau},\mathbf{y}_{i}^{\tau}),\mathbf{w}_{i}^{t*})\}_{\tau = s}^{t-1}\)

for existing tracks, append the new center and samples and refresh \(\mathbf{w}_{i}^{*}\) by re-solving the ridge regression

for new tracks, initialize \((\mathbf{X}_{i}^{\tau},\mathbf{y}_{i}^{\tau}),\mathbf{w}_{i}^{t*}\)

for unmatched tracks, keep them alive for 30 frames before removal


Discussions/Experiments

How to use public detections

Similar to Tracktor and CenterTrack

SOT/specific vs. general/ReID

refer to paper 5.3 Ablation Study

  • specific discrimination (SD)
  • rather than the general discrimination (GD)

Specific discrimination enhances tracking in crowded scenes.

coarse annotation 😂

Efficiency of Model-per-target

Thanks to GPU 😃

Conclusion

NFL (no free lunch)

  • FairMOT
    • general ReID
    • fully offline training
    • MOT17 (sparse scenes)
  • SOTMOT
    • neighbor ReID
    • offline training
    • online training (per-track ridge regression)
    • performs well on MOT20 (crowded scenes)
+ Question: How can it benefit from the SOTA VOT trackers? 

It cannot!

posted @ 2022-03-24 20:52  ZXYFrank