IDF1 / Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking: ID-Based Metrics for Multi-Object Tracking Explained
Existing performance measures such as CLEAR MOT report how often a tracker makes what types of incorrect decisions. We argue that some system users may instead be more interested in how well they can determine who is where at all times.
Motivation
- CLEAR metrics (event-based)
  - can help pinpoint different types of errors.
- Identity-based measures
  - how well computed identities conform to true identities
  - disregarding where or why mistakes occur
Our measures apply both within and across cameras.
Once that choice is made, every frame in which A is assigned to the wrong computed identity is a frame in which the tracker is in error.
In some scenarios like sports, security, or surveillance, preserving identity is crucial.
We should therefore measure the fraction of all frames in which the ID is consistent, rather than only counting the particular frames where the ID changes (e.g. fragmentations).
The paper also discusses handover.
We focus on ID-based metrics here; refer to the paper if you are interested in the rest.
Identity-based measures
We propose to measure performance not by how often mismatches occur, but by how long the tracker correctly identifies targets.
[Note💡]: Hereinafter, I use ground truth / gt / \(\tau_{i}\) for ground-truth trajectories, and hypothesis / hyp / tracker output / \(\gamma_{j}\) for the sequences of bounding boxes produced by the tracker.
My interpretation is not rigorously consistent with the paper, especially in terminology.
However, it did help me understand the core idea of IDF1.
Definitions of Matches and Misses (bounding-box level)
When we say a hyp is matched to a gt, we actually mean that they have the same ID.
As mentioned above, we want a metric that focuses on how long a tracker can maintain an ID. To that end, we establish a global matching that achieves the highest coverage of consistent IDs.
In the end, some (hopefully most) hyp trajectories will get matched to some gt.
It is important to know that in the final matching, any hypothesis sequence (i.e. a sequence of bounding boxes with the same ID) has at most one matched gt throughout the whole video. In other words, we do not allow an ID switch within a sequence.
This is one of the differences between IDF1 and IDSW in CLEAR.
For example, suppose \(\gamma_{5}\) is finally matched to \(\tau_{3}\).
Then \(\tau_{3}\) and \(\gamma_{5}\) are both occupied and cannot be matched to others.
If another hypothesis, say \(\gamma_{4}\), could only be matched (or rather, would prefer to be matched) to \(\tau_{3}\) at minimal loss, then, since \(\tau_{3}\) is already occupied, \(\gamma_{4}\) becomes unmatched, which means all of its bounding boxes \(\mathbf{b} \in \gamma_{4}\) become FP, i.e. false detections.
- Note: FP/FN (bbox-level) are not equal to IDFP/IDFN (sequence-level)
Matches
- \(\gamma_{j}\) is matched to \(\tau_{i}\)
- the bounding boxes that have \(\text{overlap} > \Delta\) are considered TP (matched)
- miss: \(m(\tau,\gamma,t,\Delta) = 1\)
  - \(\Delta\) is the threshold
  - In the image plane, we declare a miss when the area of the intersection of the two detection boxes is less than \(\Delta\) (with \(0 < \Delta < 1\)) times the area of the union of the two boxes.
  - On the ground plane, we declare a miss when the positions of the two detections are more than \(\Delta = 1\) meter apart.
- Matches are not costs
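The image-plane rule above can be sketched in a few lines. This is a minimal sketch, assuming boxes are given as `(x1, y1, x2, y2)` corner tuples; the function names `iou` and `is_miss`, the default `delta=0.5`, and treating an absent box as a miss are my assumptions for illustration, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes (assumed format)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_miss(gt_box, hyp_box, delta=0.5):
    """True when intersection < delta * union, i.e. IoU < delta.

    Assumption: if either box is absent in this frame (None), it counts as a miss.
    """
    if gt_box is None or hyp_box is None:
        return True
    return iou(gt_box, hyp_box) < delta


# Two 2x2 boxes shifted by 1 in x: intersection 2, union 6, IoU = 1/3.
print(is_miss((0, 0, 2, 2), (1, 0, 3, 2), delta=0.5))  # True
print(is_miss((0, 0, 2, 2), (0, 0, 2, 2), delta=0.5))  # False
```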
Misses
- Sequence matched, few boxes missed
  - FN: False negatives. Ground-truth boxes that are not detected by the tracker.
  - FP: False positives. Hypothesis boxes that have no appropriate ground truth to match to.
- Sequence left unmatched: all boxes in it are regarded as missed
- Misses compose costs
- Note: TP of bounding boxes is not the same as IDTP (True Positive Identification)
  - IDTP is a higher-level metric that considers whole sequences as objects.
Truth-to-result Match
Truth-to-result matching is the first step in evaluating any MOT tracker's output.
- Note: This global optimization is guided by the EDGE COSTS!
A minimum-cost solution to this bipartite matching problem determines a one-to-one matching that minimizes the cumulative false positive and false negative errors, and the overall cost is the number of mis-assigned detections for all types of errors.
[Note💡]: The logic of Truth-to-result Match
- \(E\): the edge cost between two nodes is the cumulative number of misses of the matched pair.
  - the cumulative misses themselves are not identity metrics.
  - Nevertheless, they guide the truth-to-result match
- \(IDF1\) etc. are sequence-level measures.
  - They are based on the computed truth-to-result assignment.
How are the edge costs calculated?
Note that the EDGE COSTS are the cumulative misses, which fall into 2 types:
- Sequence matched, few boxes missed
  - FN: False negatives. Ground-truth boxes not detected by the tracker.
  - FP: False positives. Hypothesis boxes that have no appropriate ground truth to match to.
- Sequence left unmatched, all boxes missed
Refer to the definition of matches and misses above.
\(E(i,j)\) is the number of misses that would be incurred if gt \(i\) were matched to hyp \(j\).
Pseudo Code for Computing Misses
Computing the misses incurred by a match
Input:
- ground truth bounding box sequences: \(\mathcal{T}_{\tau} = \tau_{1}(t),\tau_{2}(t),\dots,\tau_{N_{\text{GT}}}(t)\)
- tracker output bounding box sequences: \(\mathcal{T}_{\gamma} = \gamma_{1}(t),\gamma_{2}(t),\dots,\gamma_{N_{\text{hyp.}}}(t)\)
- \(\text{Aff}(i,j,t) \in [0,1]\): overlap of gt \(i\) and hyp \(j\) in frame \(t\).
- \(\Delta\): overlap threshold for determining matches/misses
Algorithm
- \(\mathbf{E} = [0]_{N_{\text{GT}}\times N_{\text{hyp.}}}\)
- for \(i\) in \(1,\dots,N_{\text{GT}}\)
  - for \(j\) in \(1,\dots,N_{\text{hyp.}}\)
    - \(N = |\tau_{i}|\) (\(|\cdot|\) denotes the number of bounding boxes in the sequence)
    - \(N' = |\gamma_{j}|\)
    - \(M = 0\) (#matches)
    - for \(t\) in video frames
      - if \(\text{Aff}(i,j,t) > \Delta\)
        - \(M\)++
    - Inner-sequence \(FN = N-M\)
    - Inner-sequence \(FP = N'-M\)
    - \(\mathbf{E}(i,j) = FP + FN\)
Output
- \(\mathbf{E} \in \mathbb{R}^{N_{\text{GT}}\times N_{\text{hyp.}}}\)
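The pseudo code above can be sketched in NumPy as follows. This is a minimal sketch under my own assumptions: the affinities are precomputed into an array `aff` of shape `(N_GT, N_hyp, T)` that is 0 wherever either box is absent, and the names `edge_costs`, `gt_len`, `hyp_len` and the default `delta=0.5` are hypothetical.

```python
import numpy as np


def edge_costs(aff, gt_len, hyp_len, delta=0.5):
    """E[i, j] = FN + FP incurred if gt i were matched to hyp j.

    aff:     (N_GT, N_hyp, T) overlaps, assumed 0 where a box is absent
    gt_len:  number of boxes in each ground-truth sequence
    hyp_len: number of boxes in each hypothesis sequence
    """
    aff = np.asarray(aff, dtype=float)
    matches = (aff > delta).sum(axis=2)            # M for every (i, j) pair
    fn = np.asarray(gt_len)[:, None] - matches     # gt boxes not covered by hyp j
    fp = np.asarray(hyp_len)[None, :] - matches    # hyp boxes not covering gt i
    return fn + fp


# Toy data: one gt of length 4; hyp 0 (length 3) and hyp 1 (length 2) over 4 frames.
aff = np.array([[[0.9, 0.8, 0.7, 0.0],
                 [0.0, 0.0, 0.2, 0.6]]])
E = edge_costs(aff, gt_len=[4], hyp_len=[3, 2])
print(E)  # [[1 4]]
```

Here hyp 0 matches 3 frames (1 FN, 0 FP) and hyp 1 matches 1 frame (3 FN, 1 FP).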
Allowance for Unmatched Sequences?
Given \(\mathbf{E}\) as described above, we could run the Hungarian algorithm DIRECTLY on all of the gt and hyp sequences.
However, chances are that there are different numbers of gt and hyp. In the end, some gt or hyp may be left unmatched.
How do we determine which trajectories to abandon?
\(\mathbf{E} \in \mathbb{R}^{N_{\text{GT}}\times N_{\text{hyp.}}}\) is not a square matrix.
Solution 1: Directly perform Minimum-Weight Matching
e.g. (row for gt, column for hyp)
Ramshaw and Tarjan show that both of these problems can be reduced to finding a perfect matching in a balanced graph.
Refer to this link for further explanation:
https://jack.valmadre.net/notes/2020/12/08/non-perfect-linear-assignment/
I could choose \(E(2,2) = 3\) if I want to minimize the total cost. That's insane.😂
If I want to perform an imperfect matching with a given number of matches, how do I DETERMINE that number, i.e. how many gt or hyp should I abandon, and of which type?
😱That's troublesome too.
[💡]: Solution 2: PUNISH unmatched gt and hyp
- The boxes of an unmatched hyp all become FP (wrong detections)
- The boxes of an unmatched gt all become FN (missing detections)
We should establish an AUGMENTED cost matrix
- where a trajectory can match to itself
- and the cost is its length.
+ Philosophy here: Abandoning a hypothesis or a ground truth do
Refer to the section "Why introducing regular and irregular nodes?" for further discussion.
AUGMENTED cost matrix & irregular nodes
We have \(\mathcal{T}_{\tau} = \tau_{1}(t),\tau_{2}(t),\dots,\tau_{N_{\text{GT}}}(t)\) and \(\mathcal{T}_{\gamma} = \gamma_{1}(t),\gamma_{2}(t),\dots,\gamma_{N_{\text{hyp.}}}(t)\)
Suppose \(N_{\text{GT}} = 5\), \(N_{\text{hyp.}} = 7\)
Here \(\tau_{i} = \{\mathbf{b}_{t_{i1}},\mathbf{b}_{t_{i2}},\dots,\mathbf{b}_{t_{ik}}\}\) denotes a sequence of bounding boxes. The same holds for \(\gamma_{j}\).
\(\mathbf{E}_{fp} \in \mathbb{R}^{12 \times 12}\)
\(\mathbf{E}_{fn} \in \mathbb{R}^{12\times 12}\)
Now we take, say, \(\tau_{2}\) with \(|\tau_{2}| = 20\) and \(\gamma_{5}\) with \(|\gamma_{5}| = 19\). Here, \(2\) and \(5\) are their indices.
Figure: regular nodes, irregular nodes, and the augmented cost matrix (I name it this way.😄)
(3,1) = 20 in the figure means that the FN-part edge weight between ground truth \(3\) and tracker output \(1\) is, for NOW, 20.
Afterwards, all of the entries in the gt-hyp region (the FN/FP region) undergo fn_mat[:num_gt_ids, :num_tracker_ids] -= potential_matches_count.
(3,10) = 20 in the figure means that the FN-part edge weight between ground truth \(3\) and the irregular gt node \(10\) (i.e. 7+3) is 20.
Other costs, like E(hyp = i1, hyp = i2) with i1 ≠ i2, are set to INF.
What does this mean?
- fn_mat(gt = i, hyp = j) = \(|\tau_{i}| - \sum_{t} \mathbb{I}[\text{matched}(\tau_{i}(t),\gamma_{j}(t))]\)
- fn_mat(gt = i, gt = i) = \(|\tau_{i}|\)
  - (gt = i, gt = i) means that \(\tau_{i}\) is left unmatched, so all of its boxes become FN
It's similar for hyp:
- fp_mat(gt = i, hyp = j) = \(|\gamma_{j}| - \sum_{t} \mathbb{I}[\text{matched}(\tau_{i}(t),\gamma_{j}(t))]\)
- fp_mat(hyp = j, hyp = j) = \(|\gamma_{j}|\)
  - (hyp = j, hyp = j) means that \(\gamma_{j}\) is left unmatched, so all of its boxes become FP
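As a compact summary of the entries above, the combined matrix handed to the solver can be written in block form. The symbols \(\mathbf{C}\), \(\mathbf{A}\), \(\mathbf{D}_{\tau}\), \(\mathbf{D}_{\gamma}\) and \(M_{ij}\) (the number of frames in which \(\tau_{i}\) and \(\gamma_{j}\) are matched) are my own notation, not the paper's; the layout (rows: gt, then irregular hyp nodes; columns: hyp, then irregular gt nodes) follows how TrackEval fills fn_mat and fp_mat.

```latex
\[
\mathbf{C} = \mathbf{E}_{fn} + \mathbf{E}_{fp} =
\begin{bmatrix}
\mathbf{A} & \mathbf{D}_{\tau} \\
\mathbf{D}_{\gamma} & \mathbf{0}
\end{bmatrix},
\qquad
\mathbf{A}_{ij} = |\tau_{i}| + |\gamma_{j}| - 2M_{ij}
\]
\[
(\mathbf{D}_{\tau})_{ik} =
\begin{cases} |\tau_{i}| & k = i \\ \infty & k \neq i \end{cases}
\qquad
(\mathbf{D}_{\gamma})_{kj} =
\begin{cases} |\gamma_{j}| & k = j \\ \infty & k \neq j \end{cases}
\]
```

With \(N_{\text{GT}} = 5\) and \(N_{\text{hyp.}} = 7\), \(\mathbf{A}\) is \(5\times 7\), \(\mathbf{D}_{\tau}\) is \(5\times 5\), \(\mathbf{D}_{\gamma}\) is \(7\times 7\), and the zero block is \(7\times 5\).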
Code Implementation
Refer to TrackEval.
# Imports needed by these excerpts
import numpy as np
from scipy.optimize import linear_sum_assignment
# Variables counting global association
potential_matches_count = np.zeros((data['num_gt_ids'], data['num_tracker_ids']))
gt_id_count = np.zeros(data['num_gt_ids'])
tracker_id_count = np.zeros(data['num_tracker_ids'])
- Loop over every timestep. potential_matches_count[i,j] stands for the total number of overlapping bounding boxes between \(\tau_{i}\) and \(\gamma_{j}\).
for t, (gt_ids_t, tracker_ids_t) in enumerate(zip(data['gt_ids'], data['tracker_ids'])):
    # Count the potential matches between ids in each timestep
    matches_mask = np.greater_equal(data['similarity_scores'][t], self.threshold)
    match_idx_gt, match_idx_tracker = np.nonzero(matches_mask)
    # ==[Note💡]==: tracker_ids_t[match_idx_tracker] maps the per-frame match indices to global tracker IDs.
    potential_matches_count[gt_ids_t[match_idx_gt], tracker_ids_t[match_idx_tracker]] += 1
    # Calculate the total number of dets for each gt_id and tracker_id.
    gt_id_count[gt_ids_t] += 1
    tracker_id_count[tracker_ids_t] += 1
- Calculate Inner sequence misses
# Calculate optimal assignment cost matrix for ID metrics
num_gt_ids = data['num_gt_ids']
num_tracker_ids = data['num_tracker_ids']
fp_mat = np.zeros((num_gt_ids + num_tracker_ids, num_gt_ids + num_tracker_ids))
fn_mat = np.zeros((num_gt_ids + num_tracker_ids, num_gt_ids + num_tracker_ids))
fp_mat[num_gt_ids:, :num_tracker_ids] = 1e10
fn_mat[:num_gt_ids, num_tracker_ids:] = 1e10
#total bboxes - #overlapping bboxes = #misses
for gt_id in range(num_gt_ids):
    # ==[Note💡]==: gt_id_count[gt_id] is the total number of bounding boxes of this ground truth trajectory
    fn_mat[gt_id, :num_tracker_ids] = gt_id_count[gt_id]
    # ==[Note💡]==: column `num_tracker_ids + gt_id` is the irregular (self-matching) node of this gt
    fn_mat[gt_id, num_tracker_ids + gt_id] = gt_id_count[gt_id]
for tracker_id in range(num_tracker_ids):
    # ==[Note💡]==: tracker_id_count[tracker_id] is the total number of bounding boxes of this hypothesis
    fp_mat[:num_gt_ids, tracker_id] = tracker_id_count[tracker_id]
    fp_mat[tracker_id + num_gt_ids, tracker_id] = tracker_id_count[tracker_id]
# ==[Note💡]==: #gt boxes - #overlapping bboxes = #missed gt boxes (FN)
fn_mat[:num_gt_ids, :num_tracker_ids] -= potential_matches_count
# ==[Note💡]==: #hyp boxes - #overlapping bboxes = #unmatched hyp boxes (FP)
fp_mat[:num_gt_ids, :num_tracker_ids] -= potential_matches_count
Sum up FP and FN
# Hungarian algorithm
match_rows, match_cols = linear_sum_assignment(fn_mat + fp_mat)
linear_sum_assignment solves the minimum-cost bipartite matching (linear assignment) problem, which is the problem the Hungarian algorithm addresses.
IDTP, IDFP, IDFN
When row \(i\) is matched to column \(j\) (with both in the regular gt-hyp region), it means that the hypothesis with ID \(= j\) is matched to the ground truth trajectory with ID \(= i\).
# Hungarian algorithm
match_rows, match_cols = linear_sum_assignment(fn_mat + fp_mat)
# Accumulate basic statistics
# np.int was removed in NumPy 1.24; the builtin int is the drop-in replacement
res['IDFN'] = fn_mat[match_rows, match_cols].sum().astype(int)
res['IDFP'] = fp_mat[match_rows, match_cols].sum().astype(int)
res['IDTP'] = (gt_id_count.sum() - res['IDFN']).astype(int)
IDP, IDR, IDF1
res['IDR'] = res['IDTP'] / np.maximum(1.0, res['IDTP'] + res['IDFN'])
res['IDP'] = res['IDTP'] / np.maximum(1.0, res['IDTP'] + res['IDFP'])
res['IDF1'] = res['IDTP'] / np.maximum(1.0, res['IDTP'] + 0.5 * res['IDFP'] + 0.5 * res['IDFN'])
return res
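To try the steps above end to end, here is a self-contained sketch that mirrors the excerpts on toy counts. It is not the TrackEval API: the function name `id_metrics` and the toy numbers are hypothetical, and the inputs are assumed to be the already accumulated `potential_matches_count`, `gt_id_count` and `tracker_id_count`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def id_metrics(potential_matches_count, gt_id_count, tracker_id_count):
    """Return (IDTP, IDFP, IDFN, IDF1) from accumulated counts (hypothetical helper)."""
    pm = np.asarray(potential_matches_count, dtype=float)
    gt_cnt = np.asarray(gt_id_count, dtype=float)
    tr_cnt = np.asarray(tracker_id_count, dtype=float)
    n_gt, n_tr = pm.shape
    size = n_gt + n_tr

    fn_mat = np.zeros((size, size))
    fp_mat = np.zeros((size, size))
    fp_mat[n_gt:, :n_tr] = 1e10      # forbid irregular hyp node k -> hyp j (k != j)
    fn_mat[:n_gt, n_tr:] = 1e10      # forbid gt i -> irregular gt node k (k != i)
    for g in range(n_gt):
        fn_mat[g, :n_tr] = gt_cnt[g]
        fn_mat[g, n_tr + g] = gt_cnt[g]      # self-match: every gt box is FN
    for k in range(n_tr):
        fp_mat[:n_gt, k] = tr_cnt[k]
        fp_mat[k + n_gt, k] = tr_cnt[k]      # self-match: every hyp box is FP
    fn_mat[:n_gt, :n_tr] -= pm
    fp_mat[:n_gt, :n_tr] -= pm

    rows, cols = linear_sum_assignment(fn_mat + fp_mat)
    idfn = int(fn_mat[rows, cols].sum())
    idfp = int(fp_mat[rows, cols].sum())
    idtp = int(gt_cnt.sum()) - idfn
    idf1 = idtp / max(1.0, idtp + 0.5 * idfp + 0.5 * idfn)
    return idtp, idfp, idfn, idf1


# Toy case: gt 0 (10 boxes) is covered by hyp 0 for 6 frames and hyp 1 for 4 frames;
# gt 1 (8 boxes) is covered by hyp 2 (8 boxes) for 7 frames.
pm = [[6, 4, 0],
      [0, 0, 7]]
print(id_metrics(pm, gt_id_count=[10, 8], tracker_id_count=[6, 4, 8]))
```

In this toy case the optimal assignment is gt 0 with hyp 0 and gt 1 with hyp 2, while hyp 1 is self-matched, giving IDTP = 13, IDFP = 5, IDFN = 5 and IDF1 = 13/18.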
How is IDF1 used in MOT challenge?
Trackers in the MOT Challenge are ranked according to MOTA.
IDF1 is reported but is not directly used for ranking.
Why introducing regular and irregular nodes?
It seems a little tricky to construct regular and irregular nodes.
If we directly use scipy.optimize.linear_sum_assignment on the rectangular matrix, a one-sided perfect matching will be performed, in which the smaller of the gt and hyp sets is forced to be completely matched.
However, striving for as many matches as possible may yield sub-optimal results.
Suppose #gt = 5, #hyp = 7. A one-sided perfect matching guarantees that all 5 ground truths are matched.
If there is a wrongly annotated sequence \(\tau_{i},|\tau_{i}| = 3\) in the test data, then forcing it to match a hypothesis \(\gamma_{j}, |\gamma_{j}| = 59\) will incur at least \(59-3 = 56\) FP (from the unmatched boxes in hyp).
Instead, we are supposed to abandon it, which incurs at most \(3\) FN (the number of missed detections of this possibly noisy ground truth annotation). This helps to get a truth-to-result match with better ID consistency.
Irregular nodes actually denote the cost of self-matching. With the square cost matrix, linear_sum_assignment naturally performs a perfect matching, taking the self-matching FN or FP into account.
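A small numeric sketch of why the self-matching costs matter. The numbers are made up for illustration and the variable names are hypothetical. On the rectangular matrix, the solver only sees the cost of the pairs it selects, so the boxes of a hypothesis that ends up left over are not part of its objective; the augmented matrix charges them as FP.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# One gt with 10 boxes; hyp 0 (10 boxes) overlaps it in 6 frames,
# hyp 1 (4 boxes) overlaps it in 4 frames.
gt_len = np.array([10])
hyp_len = np.array([10, 4])
overlap = np.array([[6, 4]])

# Pairwise cost FN + FP = |gt| + |hyp| - 2 * overlap
pair_cost = gt_len[:, None] + hyp_len[None, :] - 2 * overlap   # [[8, 6]]

# (a) Rectangular matrix: the cheapest pair wins, leftovers are ignored by the solver.
r, c = linear_sum_assignment(pair_cost)
rect_choice = int(c[0])
leftover_fp = int(hyp_len.sum() - hyp_len[c].sum())   # boxes of unmatched hyps
rect_total = int(pair_cost[r, c].sum()) + leftover_fp

# (b) Augmented square matrix with irregular (self-matching) nodes.
n_gt, n_hyp = overlap.shape
INF = 1e10
aug = np.zeros((n_gt + n_hyp, n_gt + n_hyp))
aug[:n_gt, :n_hyp] = pair_cost
aug[:n_gt, n_hyp:] = INF
aug[n_gt:, :n_hyp] = INF
aug[np.arange(n_gt), n_hyp + np.arange(n_gt)] = gt_len      # gt left unmatched
aug[n_gt + np.arange(n_hyp), np.arange(n_hyp)] = hyp_len    # hyp left unmatched
ra, ca = linear_sum_assignment(aug)
aug_choice = int(ca[0])
aug_total = int(aug[ra, ca].sum())

print("rectangular: gt 0 -> hyp", rect_choice, "total misses", rect_total)  # hyp 1, 16
print("augmented:   gt 0 -> hyp", aug_choice, "total misses", aug_total)    # hyp 0, 12
```

The rectangular solve prefers hyp 1 because its pair cost (6) is lower, but hyp 0 then contributes 10 FP; the augmented solve accounts for that and keeps the longer, more consistent hypothesis.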
Offline Evaluation/ Offline Tracking
This IDF1 is calculated from the truth-to-result matching, which is the solution of the minimum-cost bipartite matching problem with track misses as the cost.
So we can regard the metric as the best possible result for the given tracking outputs (i.e. bounding boxes and IDs).
In real-life online tracking applications, though, we can only determine the truth-to-result matching (which other papers may refer to as association) on the fly.
It may not achieve an IDF1 as high (or rather, as optimized) as the one computed after tracking the whole sequence.
Difference between IDF1 and IDSW(CLEAR metric)
Refer to "Comparison with CLEAR MOT" in the paper.
- IDSW (frame-level, counts crucial frames)
  - from the paper Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics
  - Updates the matching at every frame.
  - Prioritizes ID consistency, with overlap as a prerequisite
  - A gt may be matched to many trajectories and vice versa.
  - Merely cares about (how often) the crucial frames where changes of ID occur (IDSW++).
- IDF1 (sequence/ID-level, all of the misses are counted)
  - from the paper Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking
  - Determines the gt-hyp / truth-to-result matching with the aim of minimizing the sum of FN + FP.
  - A gt can only be matched to a single trajectory and vice versa. Otherwise they are unmatched, or rather left self-matched. Note that self-matching also introduces many errors.
  - Comprehensively considers overall ID consistency and FN, FP
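The contrast can be made concrete with a toy case of my own (not from either paper): one ground truth of `total_frames` boxes, perfectly localized in every frame, whose tracker ID changes exactly once at `switch_frame`. Under these assumptions CLEAR counts a single ID switch wherever the change happens, while IDF1 depends on how long the better hypothesis lasts. The helper name `idf1_single_split` is hypothetical.

```python
def idf1_single_split(total_frames, switch_frame):
    """IDF1 for one gt covered by hyp A (frames before the switch) and hyp B (the rest).

    Matching gt to the longer hypothesis minimizes FN + FP; the other
    hypothesis is left unmatched, so all of its boxes count as IDFP.
    """
    len_a = switch_frame
    len_b = total_frames - switch_frame
    idtp = max(len_a, len_b)
    idfn = total_frames - idtp
    idfp = (len_a + len_b) - idtp
    return idtp / max(1.0, idtp + 0.5 * idfp + 0.5 * idfn)


# Both cases contain exactly one ID switch under the toy assumptions.
for k in (10, 50):
    print("switch at frame", k, "-> IDF1 =", idf1_single_split(100, k))
```

An early switch (frame 10 of 100) leaves IDF1 at 0.9, whereas a switch in the middle drops it to 0.5, even though the number of ID switches is the same.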
This article is from cnblogs (博客园), author: ZXYFrank. Please cite the original link when reposting: https://www.cnblogs.com/zxyfrank/p/16155336.html