Tracking without bells and whistles: close reading of the English paper
no training or optimization on tracking data.
using only an object detection method to perform tracking.
takeaway
In Tracktor, tracklet regression is more essential than detection; detections are adopted only in relation to the regressed tracklets.
frame-by-frame
detection-based
tracklet regression
inspiration
- detection head can tackle simple motion scenarios.
- utilize the continuity of tracklets
Pipeline
Training
NO TRAINING!
We show that one can achieve state-of-the-art tracking results by training a neural network only on the task of detection.
Inference
https://www.arvindrs.com/tracking-without-bells-and-whistles/
Faster RCNN
temporal realignment
- previous bounding box coordinates \(\mathbf{b}_{t-1}^{k}\), where k indexes the track (object identity).
- apply this box to the current frame and perform RoI pooling to get pooled features \(\mathbf{f}_{t}\); at this point the classification confidence has not been calculated yet.
- the track might still be killed if the pooled region is no longer classified as the tracked object.
- then perform classification and regression on the pooled \(\mathbf{f}_{t}\)
Except for the first frame, detection-head-based bounding box regression is performed first.
i.e. get \(\mathbf{b}_{t} = \text{ROIPooling\&Regress}(\mathbf{b}_{t-1})\)
Then detection is performed on the current frame to get a set of detections \(\mathbf{d}_{t}^{i}\)
When to kill a trajectory
According to the results of bounding box regression, a track is killed in two cases:
- classification score < \(\sigma_{active} = 0.5\): the regressed position is no longer confident enough to maintain the object's bounding box.
- NMS: among regressed tracklets with IoU > \(\lambda_{active} = 0.6\), i.e. tracklets that occlude each other, keep the confident ones (w.r.t. the regression head; more discriminative features) and abandon the regressed tracklets with low confidence scores.
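The kill policy above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: boxes are assumed to be `(x1, y1, x2, y2)` tuples, the function name `surviving_tracks` is hypothetical, and the NMS step is a plain greedy variant with both thresholds passed in explicitly.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def surviving_tracks(
    boxes: Sequence[Box],
    scores: Sequence[float],
    sigma_active: float,
    lambda_active: float,
) -> List[int]:
    """Return indices of regressed tracks that stay active (hypothetical helper)."""
    # Step 1: kill tracks whose classification score fell below sigma_active.
    alive = [i for i, s in enumerate(scores) if s >= sigma_active]
    # Step 2: greedy NMS among the remaining tracks, most confident first.
    alive.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in alive:
        if all(iou(boxes[i], boxes[j]) <= lambda_active for j in kept):
            kept.append(i)
    return sorted(kept)
```

Processing the tracks in descending score order means that, whenever two regressed boxes overlap too much, the less confident one is the one that gets dropped.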
Initialize bounding boxes
A new track is initialized when a detection \(\mathbf{d}_{t}^{i}\) has IoU < \(\lambda_{new}\) with all active tracklets \(\mathbf{b}_{t}^{1},\mathbf{b}_{t}^{2},\dots\), i.e. when it is far enough from every regressed tracklet.
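As a rough sketch of this initialization rule (the function names are hypothetical, and boxes are assumed to be `(x1, y1, x2, y2)` tuples), a detection starts a new track only if its IoU with every active tracklet stays below the threshold:

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detections_to_initialize(
    detections: Sequence[Box],
    active_boxes: Sequence[Box],
    iou_threshold: float,
) -> List[Box]:
    """Keep detections that are not already explained by an active tracklet."""
    return [
        d
        for d in detections
        if all(iou(d, b) < iou_threshold for b in active_boxes)
    ]
```

With no active tracklets (as on the first frame) every detection passes the filter, so all detections start new tracks.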
Tracking Extensions
Both are aimed at improving identity preservation
- Motion Model
- For sequences with a moving camera, we apply a straightforward camera motion compensation (CMC) by aligning frames via image registration using the Enhanced Correlation Coefficient (ECC) maximization as introduced in [16].
- For sequences with comparatively low frame rates, we apply a constant velocity assumption (CVA) for all objects as in [11, 2].
- ReID: for deactivated trajectories.
- The motion model continues to run on them, even though they are not active.
- A Siamese network provides the ReID features.
- Store killed (deactivated) tracks in their non-regressed version \(\mathbf{b}_{t-1}^{k}\) for a fixed number of \(F_{\text{ReID}}\) frames.
- To minimize the risk of false reIDs, we only consider pairs of deactivated and new bounding boxes with a sufficiently large IoU.
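The two extensions can be illustrated with a small sketch. Everything here is an assumption made for illustration: the constant velocity is estimated from the centers of the two previous boxes, ReID features are compared with a Euclidean distance, and the names `shift_by_constant_velocity`, `match_reid`, `max_dist`, and `min_iou` are hypothetical. The notes only state that a Siamese network provides the features and that the IoU must be sufficiently large.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def shift_by_constant_velocity(box_prev: Box, box_prev2: Box) -> Box:
    """CVA sketch: move box_prev by the center displacement seen one frame earlier."""
    vx = ((box_prev[0] + box_prev[2]) - (box_prev2[0] + box_prev2[2])) / 2.0
    vy = ((box_prev[1] + box_prev[3]) - (box_prev2[1] + box_prev2[3])) / 2.0
    return (box_prev[0] + vx, box_prev[1] + vy, box_prev[2] + vx, box_prev[3] + vy)


def match_reid(
    inactive: List[Dict],
    new_box: Box,
    new_feat: Sequence[float],
    max_dist: float,
    min_iou: float,
    f_reid: int,
) -> Optional[int]:
    """Return the id of the best matching deactivated track, or None.

    Each entry of `inactive` is assumed to be a dict with the keys
    'id', 'box', 'feat' and 'age' (frames since deactivation).
    """
    best_id, best_dist = None, max_dist
    for track in inactive:
        if track["age"] > f_reid:
            continue  # stored for too long, no longer a candidate
        if iou(track["box"], new_box) < min_iou:
            continue  # IoU gate against false reIDs
        dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(track["feat"], new_feat)))
        if dist < best_dist:
            best_id, best_dist = track["id"], dist
    return best_id
```

The IoU gate runs before the appearance comparison, so a visually similar person elsewhere in the frame cannot take over the identity of a deactivated track.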
Limitations
Without sophisticated tracking methods, it is not expected to excel in crowded and occluded, but rather only in benevolent, tracking scenarios.
- slight movement
- 『IDEA [This could be solved by expanding the ROI so that it still covers the target when regression is performed]』
- Severe ID switches at low frame rates (fps)
Experiments
How to use public detections
Using public detections, Tracktor++ can achieve SOTA
Tracktor++ has a camera motion model, which is better/more complicated than a Kalman filter.
Analysis
- Tracktor: demonstrate the strength of tracking-by-detection for easy scenarios
- Complicated methods should be encouraged to focus on the complex tracking problems
occlusions/visibility
Tracktor++ achieves superior performance even for partially occluded bounding boxes with visibilities as low as 0.3.
- This contributes to its high MOTA compared with other methods.
- extended version only achieves minor improvements over our vanilla Tracktor.
object size
only compare objects with a visibility larger than 0.9
none of the trackers exhibit a notably better performance with respect to varying object sizes.
long-term tracking/(gap length)
TODO 『2022-03-16 [not sure about the meaning of gap length]?』
ID preservation
oracle trackers
replacing parts of our algorithm with ground truth information
To this end, we analyse our performance twofold:
(i) the impact of the object detector on the killing policy and bounding box regression,
(ii) identify performance upper bounds for potential extensions to our Tracktor.
- Oracle-Kill
- Oracle-REG
- match ground truth at frame \(t-1\)
- inherit ground truth at frame \(t\)
- Oracle-MM
- like REG, but only inherit the center
- Oracle-reID
- match inactive tracks and new detections
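The difference between Oracle-REG and Oracle-MM can be made concrete with a small sketch. It assumes that the matching between a track and its ground-truth object at frame \(t-1\) has already been done, that boxes are `(x1, y1, x2, y2)` tuples, and that the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def oracle_reg(track_box: Box, gt_box_t: Box) -> Box:
    """Oracle-REG sketch: inherit the full ground-truth box at frame t."""
    return gt_box_t


def oracle_mm(track_box: Box, gt_box_t: Box) -> Box:
    """Oracle-MM sketch: inherit only the ground-truth center, keep the track's size."""
    w = track_box[2] - track_box[0]
    h = track_box[3] - track_box[1]
    cx = (gt_box_t[0] + gt_box_t[2]) / 2.0
    cy = (gt_box_t[1] + gt_box_t[3]) / 2.0
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)
```

For example, a 10x20 track box paired with a ground-truth box centered at (30, 40) keeps its size under `oracle_mm` and is only re-centered.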
This post is from cnblogs (博客园), author: ZXYFrank. When reposting, please cite the original link: https://www.cnblogs.com/zxyfrank/p/16016477.html