IDF1 / Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking: ID-based Metrics for Multi-Object Tracking Explained

Existing performance measures such as CLEAR MOT report how often a tracker makes what types of incorrect decisions. We argue that some system users may instead be more interested in how well they can determine who is where at all times.

Motivation

CLEAR Metrics: Event-based
- can help to pinpoint different types of errors.
Identity-based measures
- how well computed identities conform to true identities
- disregarding where or why mistakes occur

Our measures apply both within and across cameras.

Once a single computed identity has been chosen for each true identity A, every frame in which A is assigned to the wrong computed identity is a frame in which the tracker is in error.

In some scenarios like sports, security, or surveillance, preserving identity is crucial.

We should consider how much of all frames is covered by a consistent ID, rather than only the frames where identity changes occur (e.g. fragmentations).

The paper also discusses handover (passing identities between cameras).

Here we focus on the ID-based metrics; refer to the paper if you're interested in the rest.

Identity-based measures

We propose to measure performance not by how often mismatches occur, but by how long the tracker correctly identifies targets.

[Note💡]: Hereinafter, I use ground truth/gt/$\tau_{i}$ for ground-truth trajectories, and hypotheses/hyp./tracker output/tracker output trajectory/$\gamma_{j}$ for the sequences of bounding boxes produced by trackers.

My interpretation does not always follow the paper rigorously, especially in terminology.

However, these interpretations did help me understand the core idea of IDF1.

Definitions of Matches and Misses (bounding-box level)

When we say a hyp is matched to a gt, we mean that the two are treated as the same identity.

As mentioned above, we want a metric that focuses on how long trackers can keep an ID. To this end, we establish a global matching that achieves the highest consistent-ID coverage.

In the end, some (hopefully most) hypothesis trajectories will be matched to some gt.

It's important to know that in the final matching, any hypothesis sequence (i.e. a sequence of bounding boxes with the same ID) is matched to at most one gt throughout the whole video.

In other words, no ID switch is allowed within a sequence.

This is one of the differences between IDF1 and IDSW in CLEAR MOT.

For example, suppose $\gamma_{5}$ is finally decided to be matched to $\tau_{3}$.

Then $\tau_{3}$ and $\gamma_{5}$ are both occupied and cannot be matched to anything else.

If another hypothesis, say $\gamma_{4}$, could only be matched to $\tau_{3}$ with a small loss (i.e. it would prefer $\tau_{3}$ over every other gt), then, since $\tau_{3}$ is already occupied, $\gamma_{4}$ becomes unmatched, which means all of its bounding boxes $\mathbf{b} \in \gamma_{4}(t)$ become FP, i.e. false detections.

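- Note: FP/FN at the bounding-box level (per-frame matching) are not the same as IDFP/IDFN, which also count boxes but follow the sequence-level identity matching.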

Matches
- $\gamma_{j}$ is matched to $\tau_{i}$
- within the matched pair, frames whose boxes have $\text{overlap} > \Delta$ are considered TP (matched); otherwise the frame is a miss, $m(\tau,\gamma,t,\Delta) = 1$
  - $\Delta$ is the threshold
  - In the image plane, we declare a miss when the area of the intersection of the two detection boxes is less than $\Delta$ (with $0 < \Delta < 1$) times the area of the union of the two boxes.
  - On the ground plane, we declare a miss when the positions of the two detections are more than $\Delta = 1$ meter apart.
- Matches are not costs
Misses
- Sequence matched, but some boxes missed
  - FN: false negatives. Ground-truth boxes that are not covered by the matched hypothesis
  - FP: false positives. Hypothesis boxes that have no corresponding ground-truth box to match to
- If a sequence is left unmatched, all boxes in it are counted as misses
- Misses make up the costs
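The miss function $m(\tau,\gamma,t,\Delta)$ for a single frame can be sketched in Python. This is a minimal sketch, assuming boxes are `(x1, y1, x2, y2)` tuples and ground-plane positions are `(x, y)` in meters; the helper names `iou`, `miss_image_plane` and `miss_ground_plane` are hypothetical (not from the paper or TrackEval), and $\Delta = 0.5$ in the image plane is only an example value.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def miss_image_plane(box_gt, box_hyp, delta=0.5):
    """m = 1 when intersection < delta * union, i.e. IoU < delta."""
    return int(iou(box_gt, box_hyp) < delta)

def miss_ground_plane(pos_gt, pos_hyp, delta=1.0):
    """m = 1 when the two positions are more than delta meters apart."""
    return int(math.dist(pos_gt, pos_hyp) > delta)

print(miss_image_plane((0, 0, 2, 2), (1, 0, 3, 2)))  # → 1  (IoU = 1/3 < 0.5)
print(miss_ground_plane((0.0, 0.0), (0.5, 0.5)))     # → 0  (about 0.71 m apart)
```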

- Note: TP of bounding boxes is not the same as IDTP (True Positive Identification).
- IDTP still counts boxes, but only those matched under the sequence-level (identity) assignment, which treats whole sequences as the units being matched.

Truth-to-result Match

Truth-to-result matching is the first step in evaluating any MOT tracker's output.

- Note: This global optimization is guided by the EDGE COSTS!

A minimum-cost solution to this bipartite matching problem determines a one-to-one matching that minimizes the cumulative false positive and false negative errors, and the overall cost is the number of mis-assigned detections for all types of errors.

[Note💡]: The logic of Truth-to-result Match

$E$: the edge cost between two nodes is the cumulative number of misses of that matching pair.
- The cumulative misses themselves are not identity metrics.
- Nevertheless, they guide the truth-to-result match.
$IDF1$ and the related scores are sequence-level measures.
- They are computed from the resulting truth-to-result assignment.

How are the edge costs calculated?

EDGE COSTS are cumulative misses, which fall into two types:

- Sequence matched, but some boxes missed
  - FN: false negatives. Ground-truth boxes not covered by the matched hypothesis
  - FP: false positives. Hypothesis boxes with no corresponding ground-truth box
- Sequence left unmatched, all boxes missed

Refer to the definitions of matches and misses above.

$E(i,j)$ is the number of misses incurred if gt $i$ is matched to hyp $j$.

Pseudo Code for Computing Misses

computing misses incurred by a match

Input:

ground truth bounding box sequences: $\mathcal{T}_{\tau} = \tau_{1}(t),\tau_{2}(t),\dots,\tau_{N_{\text{GT}}}(t)$
tracker output bounding box sequences: $\mathcal{T}_{\gamma} = \gamma_{1}(t),\gamma_{2}(t),\dots,\gamma_{N_{\text{hyp.}}}(t)$
$\text{Aff}(i,j,t) \in [0,1]$: overlap (e.g. IoU) of gt $i$ and hyp $j$ in frame $t$; 0 if either box is absent in that frame.
$\Delta$: overlap threshold for determining matches/misses

Algorithm

$\mathbf{E} = [0]_{N_{\text{GT}}\times N_{\text{hyp.}}}$

for $i = 1,\dots,N_{\text{GT}}$
- for $j = 1,\dots,N_{\text{hyp.}}$
  - $N = |\tau_{i}|$ ($|\cdot|$ denotes the number of bounding boxes in the sequence)
  - $N' = |\gamma_{j}|$
  - $M = 0$ (#matches)
  - for $t$ in video frames
    - if $\text{Aff}(i,j,t) > \Delta$
      - $M$++
  - Inner-sequence $FN = N-M$ (gt boxes not covered by $\gamma_{j}$)
  - Inner-sequence $FP = N'-M$ (hyp boxes not covering $\tau_{i}$)
  - $\mathbf{E}(i,j) = FP + FN$

Output

$\mathbf{E} \in \mathbb{R}^{N_{\text{GT}}\times N_{\text{hyp.}}}$
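The pseudocode translates almost line by line into Python. A minimal sketch, assuming IoU as $\text{Aff}$ and each trajectory stored as a dict mapping frame index to an `(x1, y1, x2, y2)` box; `edge_costs` and the toy tracks are hypothetical. TrackEval accumulates the same overlap counts with a per-frame loop instead of a per-pair loop.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def edge_costs(gt_tracks, hyp_tracks, delta=0.5):
    """E[i, j] = inner-sequence FN + FP if gt i is matched to hyp j.

    Each track is a dict {frame: (x1, y1, x2, y2)}. Frames where a track has no
    box are absent from its dict, so Aff is implicitly 0 there.
    """
    E = np.zeros((len(gt_tracks), len(hyp_tracks)))
    for i, tau in enumerate(gt_tracks):
        for j, gamma in enumerate(hyp_tracks):
            # M: frames where both boxes exist and overlap more than delta
            M = sum(1 for t in tau.keys() & gamma.keys() if iou(tau[t], gamma[t]) > delta)
            fn = len(tau) - M    # boxes of tau_i not covered by gamma_j
            fp = len(gamma) - M  # boxes of gamma_j not covering tau_i
            E[i, j] = fn + fp
    return E

box = (0, 0, 2, 2)
gt = [{t: box for t in range(4)},               # tau_0: frames 0-3
      {10: (5, 5, 6, 6), 11: (5, 5, 6, 6)}]     # tau_1: frames 10-11
hyp = [{t: box for t in range(4)},              # gamma_0: frames 0-3
       {t: box for t in range(2, 6)}]           # gamma_1: frames 2-5
print(edge_costs(gt, hyp).tolist())  # → [[0.0, 4.0], [6.0, 6.0]]
```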

Allowance for Unmatched Sequences?

Given $\mathbf{E}$ as described above, we could run the Hungarian algorithm DIRECTLY on all of the gt and hyp sequences.

However, the numbers of gt and hyp sequences are usually different, so in the end some gt or hyp sequences are left unmatched.

How do we determine which trajectories to abandon?

$\mathbf{E} \in \mathbb{R}^{N_{\text{GT}}\times N_{\text{hyp.}}}$ is not a square matrix.

Solution 1: Directly perform Minimum-Weight Matching

e.g. (row for gt, column for hyp)

\[\begin{bmatrix} 9 & 5 & 7 & 8\\ 14 & 3 & 6 & 22\\ 17 & 10 & 4 & 13\\ \end{bmatrix}\]
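For a rectangular matrix, `scipy.optimize.linear_sum_assignment` matches every row of the smaller side at minimum total cost. A quick check on the example matrix: all three gt rows are matched, and the leftover hypothesis (column 0) simply does not appear in the objective, whatever its length.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# rows: gt, columns: hyp (the example matrix above)
E = np.array([[9, 5, 7, 8],
              [14, 3, 6, 22],
              [17, 10, 4, 13]])
rows, cols = linear_sum_assignment(E)
print(rows.tolist(), cols.tolist(), E[rows, cols].sum())  # → [0, 1, 2] [3, 1, 2] 15

# hypotheses that received no gt; their boxes are not counted anywhere
unmatched_hyp = sorted(set(range(E.shape[1])) - set(cols.tolist()))
print(unmatched_hyp)  # → [0]
```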

As noted in the link below, Ramshaw and Tarjan show that these unbalanced matching problems can be reduced to finding a perfect matching in a balanced graph.

Refer to this note for further explanation:

https://jack.valmadre.net/notes/2020/12/08/non-perfect-linear-assignment/

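If the number of matches is not fixed, minimizing the total cost degenerates: matching only the single pair $E(2,2) = 3$ (or even nothing at all) gives a smaller sum than any larger matching, because unmatched sequences cost nothing. That's insane.😂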

If I want to perform an imperfect matching with a given number of matches, how do I DETERMINE that number, i.e. how many gt or hyp sequences should be abandoned, and of which type?

😱That's troublesome too.

[💡]: Solution 2: PUNISH unmatched gt and hyp

All boxes of an unmatched hyp become FP (false detections).
All boxes of an unmatched gt become FN (missing detections).

We therefore establish an AUGMENTED cost matrix in which a trajectory can be matched to itself (i.e. left unmatched), at a cost equal to its length.

- Philosophy here: abandoning a hypothesis or a ground truth is allowed, but not for free: every box of the abandoned sequence counts as an error.

Refer to the section on why irregular nodes are introduced for further discussion.

AUGMENTED cost matrix & irregular nodes

We have $\mathcal{T}_{\tau} = \tau_{1}(t),\tau_{2}(t),\dots,\tau_{N_{\text{GT}}}(t)$ and $\mathcal{T}_{\gamma} = \gamma_{1}(t),\gamma_{2}(t),\dots,\gamma_{N_{\text{hyp.}}}(t)$

Suppose $N_{\text{GT}} = 5$, $N_{\text{hyp.}} = 7$

Here $\tau_{i} = \{\mathbf{b}_{t_{i1}},\mathbf{b}_{t_{i2}},\dots,\mathbf{b}_{t_{ik}}\}$ denotes a sequence of bounding boxes; likewise for $\gamma_{j}$.

$\mathbf{E}_{fp} \in \mathbb{R}^{12 \times 12}$, i.e. $(N_{\text{GT}} + N_{\text{hyp.}}) \times (N_{\text{GT}} + N_{\text{hyp.}})$

$\mathbf{E}_{fn} \in \mathbb{R}^{12\times 12}$

Now take, say, $\tau_{2}$ with $|\tau_{2}| = 20$ and $\gamma_{5}$ with $|\gamma_{5}| = 19$; here $2$ and $5$ are their indices.

[Figure: regular nodes, irregular nodes and the augmented cost matrix (I name it this way.😄)]

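(3,1) = 20 in the figure means that the FN part of the edge weight between ground truth $3$ and tracker output $1$ is 20 at this point, before the overlapping frames are subtracted.

Afterwards, all entries in the gt-hyp region (the FN/FP region) are reduced by the number of overlapping frames: `fn_mat[:num_gt_ids, :num_tracker_ids] -= potential_matches_count` (and likewise for `fp_mat`).

(3,10) = 20 in the figure means that the FN part of the edge weight between ground truth $3$ and irregular gt node $10$ (i.e. 7+3) is 20.

All other entries in the irregular blocks, such as E(hyp = $j_1$, hyp = $j_2$) with $j_1 \neq j_2$ or E(gt = $i_1$, gt = $i_2$) with $i_1 \neq i_2$, are set to INF (1e10 in the code).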

What does this mean?

fn_mat(gt = i, hyp = j) = $|\tau_{i}| - \sum_{t} \mathbb{I}[\text{matched}(\tau_{i}(t),\gamma_{j}(t))]$
fn_mat(gt = i, gt = i) = $|\tau_{i}|$
- (gt = i, gt = i) means that $\tau_{i}$ is left unmatched, all of which become FN

Similarly for hypotheses:

fp_mat(gt = i, hyp = j) = $|\gamma_{j}| - \sum_{t} \mathbb{I}[\text{matched}(\tau_{i}(t),\gamma_{j}(t))]$
fp_mat(hyp = j, hyp = j) = $|\gamma_{j}|$
- (hyp = j, hyp = j) means that $\gamma_{j}$ is left unmatched, and all of its boxes become FP
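The formulas above can be assembled into the augmented matrices directly. A minimal sketch, assuming the per-pair overlap counts and sequence lengths are already known; `augmented_costs` and the toy numbers are hypothetical. The layout (rows: gt nodes, then irregular hyp nodes; columns: hyp nodes, then irregular gt nodes) mirrors the TrackEval code in this post, but the gt-hyp block is filled with `length - overlap` in one step instead of subtracting afterwards.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def augmented_costs(overlap, gt_len, hyp_len, inf=1e10):
    """Return (fn_mat, fp_mat), both of shape (G + H, G + H).

    overlap[i, j]: number of frames in which gt i and hyp j overlap above the threshold.
    Rows: G regular gt nodes, then H irregular hyp nodes.
    Columns: H regular hyp nodes, then G irregular gt nodes.
    """
    G, H = overlap.shape
    fn_mat = np.zeros((G + H, G + H))
    fp_mat = np.zeros((G + H, G + H))
    # gt i matched to hyp j: boxes left over on each side
    fn_mat[:G, :H] = gt_len[:, None] - overlap
    fp_mat[:G, :H] = hyp_len[None, :] - overlap
    # gt i matched to its own irregular node: all |tau_i| boxes are FN; other entries forbidden
    fn_mat[:G, H:] = inf
    fn_mat[np.arange(G), H + np.arange(G)] = gt_len
    # hyp j matched to its own irregular node: all |gamma_j| boxes are FP; other entries forbidden
    fp_mat[G:, :H] = inf
    fp_mat[G + np.arange(H), np.arange(H)] = hyp_len
    # irregular-irregular block stays 0 so unused dummy nodes can pair up for free
    return fn_mat, fp_mat

overlap = np.array([[3, 0, 1],
                    [0, 4, 0]])
gt_len = np.array([4, 5])       # |tau_0|, |tau_1|
hyp_len = np.array([3, 4, 2])   # |gamma_0|, |gamma_1|, |gamma_2|

fn_mat, fp_mat = augmented_costs(overlap, gt_len, hyp_len)
rows, cols = linear_sum_assignment(fn_mat + fp_mat)
G, H = overlap.shape
pairs = [(int(i), int(j)) for i, j in zip(rows, cols) if i < G and j < H]
idfn = fn_mat[rows, cols].sum()
idfp = fp_mat[rows, cols].sum()
idtp = gt_len.sum() - idfn
print(pairs, idtp, idfp, idfn)  # → [(0, 0), (1, 1)] 7.0 2.0 2.0
```

Here $\gamma_{2}$ is left unmatched, so its 2 boxes show up as IDFP.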

Code Implementation

Refer to TrackEval (the snippets here are excerpts from its Identity metric).

```python
# Variables counting global association
potential_matches_count = np.zeros((data['num_gt_ids'], data['num_tracker_ids']))
gt_id_count = np.zeros(data['num_gt_ids'])
tracker_id_count = np.zeros(data['num_tracker_ids'])
```

Loop over every timestep:
- `potential_matches_count[i, j]` is the total number of frames in which $\tau_{i}$ and $\gamma_{j}$ overlap (similarity ≥ threshold)

```python
for t, (gt_ids_t, tracker_ids_t) in enumerate(zip(data['gt_ids'], data['tracker_ids'])):
    # Count the potential matches between ids in each timestep
    matches_mask = np.greater_equal(data['similarity_scores'][t], self.threshold)
    match_idx_gt, match_idx_tracker = np.nonzero(matches_mask)
    # [Note💡]: gt_ids_t[...] and tracker_ids_t[...] map the per-frame row/column indices to global gt/tracker IDs
    potential_matches_count[gt_ids_t[match_idx_gt], tracker_ids_t[match_idx_tracker]] += 1

    # Calculate the total number of dets for each gt_id and tracker_id.
    gt_id_count[gt_ids_t] += 1
    tracker_id_count[tracker_ids_t] += 1
```

Calculate inner-sequence misses

```python
# Calculate optimal assignment cost matrix for ID metrics
num_gt_ids = data['num_gt_ids']
num_tracker_ids = data['num_tracker_ids']
fp_mat = np.zeros((num_gt_ids + num_tracker_ids, num_gt_ids + num_tracker_ids))
fn_mat = np.zeros((num_gt_ids + num_tracker_ids, num_gt_ids + num_tracker_ids))
fp_mat[num_gt_ids:, :num_tracker_ids] = 1e10
fn_mat[:num_gt_ids, num_tracker_ids:] = 1e10
```

Number of misses = total boxes - overlapping boxes:

```python
for gt_id in range(num_gt_ids):
    # [Note💡]: gt_id_count[gt_id] is the total number of bounding boxes of each ground-truth trajectory
    fn_mat[gt_id, :num_tracker_ids] = gt_id_count[gt_id]
    # [Note💡]: num_tracker_ids + gt_id is the column of gt_id's irregular (self-matching) node
    fn_mat[gt_id, num_tracker_ids + gt_id] = gt_id_count[gt_id]
for tracker_id in range(num_tracker_ids):
    # [Note💡]: tracker_id_count[tracker_id] is the total number of bounding boxes of each hypothesis
    fp_mat[:num_gt_ids, tracker_id] = tracker_id_count[tracker_id]
    # [Note💡]: tracker_id + num_gt_ids is the row of tracker_id's irregular (self-matching) node
    fp_mat[tracker_id + num_gt_ids, tracker_id] = tracker_id_count[tracker_id]
# [Note💡]: #gt boxes - #overlapping boxes = #missed gt boxes (FN)
fn_mat[:num_gt_ids, :num_tracker_ids] -= potential_matches_count
# [Note💡]: #hyp boxes - #overlapping boxes = #unmatched hyp boxes (FP)
fp_mat[:num_gt_ids, :num_tracker_ids] -= potential_matches_count
```

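Sum up FP and FN, then solve the assignment:

```python
# Hungarian algorithm
match_rows, match_cols = linear_sum_assignment(fn_mat + fp_mat)
```

`scipy.optimize.linear_sum_assignment` solves the minimum-cost bipartite matching (linear assignment) problem. Recent SciPy versions implement it with a modified Jonker-Volgenant algorithm rather than the classic Hungarian algorithm, but both return an optimal assignment.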

IDTP, IDFP, IDFN

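When row $i$ is matched to column $j$ with $i <$ num_gt_ids and $j <$ num_tracker_ids, the hypothesis with ID $j$ is matched to the ground-truth trajectory with ID $i$. A match involving an irregular node means the corresponding sequence is left unmatched.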

```python
# Hungarian algorithm
match_rows, match_cols = linear_sum_assignment(fn_mat + fp_mat)
# Accumulate basic statistics
# (np.int was removed in NumPy 1.24, so the builtin int is used instead)
res['IDFN'] = fn_mat[match_rows, match_cols].sum().astype(int)
res['IDFP'] = fp_mat[match_rows, match_cols].sum().astype(int)
res['IDTP'] = (gt_id_count.sum() - res['IDFN']).astype(int)
```

IDP, IDR, IDF1

```python
res['IDR'] = res['IDTP'] / np.maximum(1.0, res['IDTP'] + res['IDFN'])
res['IDP'] = res['IDTP'] / np.maximum(1.0, res['IDTP'] + res['IDFP'])
res['IDF1'] = res['IDTP'] / np.maximum(1.0, res['IDTP'] + 0.5 * res['IDFP'] + 0.5 * res['IDFN'])
return res
```
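Multiplying numerator and denominator by 2 shows that the IDF1 line is the harmonic mean of IDP and IDR:

\[\text{IDF1} = \frac{2\,\text{IDTP}}{2\,\text{IDTP} + \text{IDFP} + \text{IDFN}} = \frac{2\cdot\text{IDP}\cdot\text{IDR}}{\text{IDP} + \text{IDR}}\]

A quick numeric check with made-up counts; `id_scores` is a hypothetical helper that mirrors the three formulas above, including the clamping of denominators to at least 1.

```python
def id_scores(idtp, idfp, idfn):
    """IDP, IDR and IDF1 from identity-level counts (denominators clamped to >= 1)."""
    idr = idtp / max(1.0, idtp + idfn)
    idp = idtp / max(1.0, idtp + idfp)
    idf1 = idtp / max(1.0, idtp + 0.5 * idfp + 0.5 * idfn)
    return idp, idr, idf1

# made-up counts for illustration
idp, idr, idf1 = id_scores(idtp=60, idfp=20, idfn=40)
harmonic = 2 * idp * idr / (idp + idr)
print(round(idp, 4), round(idr, 4), round(idf1, 4), round(harmonic, 4))  # → 0.75 0.6 0.6667 0.6667
```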

How is IDF1 used in MOT challenge?

Trackers in the MOT Challenge are ranked according to MOTA.

IDF1 is reported, but not directly used for ranking.

Why introduce regular and irregular nodes?

Constructing regular and irregular nodes may seem a little tricky.

If we apply scipy.optimize.linear_sum_assignment directly to the rectangular $\mathbf{E}$, a one-sided perfect matching is performed, in which the smaller of the gt and hyp sets is forced to be completely matched.

However, forcing as many matches as possible may lead to sub-optimal results.

Suppose #gt = 5 and #hyp = 7. One-sided perfect matching guarantees that all 5 ground truths are matched.

If the test data contains a wrongly annotated sequence $\tau_{i}$ with $|\tau_{i}| = 3$, forcing it to match a hypothesis $\gamma_{j}$ with $|\gamma_{j}| = 59$ incurs at least $59-3 = 56$ FP (from the unmatched boxes in the hyp).

Instead, we would rather abandon it, which incurs only $3$ FN (the boxes of this possibly noisy ground-truth annotation). This leads to a truth-to-result match with better ID consistency.

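Irregular nodes encode the cost of self-matching. With the square cost matrix, linear_sum_assignment naturally performs a perfect matching while taking the self-matching FNs and FPs into account. Since every box then counts either as matched or as exactly one error, the total cost equals $\sum_{i}|\tau_{i}| + \sum_{j}|\gamma_{j}| - 2\,\text{IDTP}$, so minimizing it is equivalent to maximizing IDTP.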
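A small numeric sketch of why the irregular nodes matter. Assume (made-up numbers) one gt A with 10 boxes and two hypotheses: X with 10 boxes, matched to A in 5 frames, and Y with 100 boxes, matched to A in 6 frames. Rectangular matching only sees the matched pair's FN + FP and prefers X (10 vs. 98), ignoring Y's 100 boxes that then become FP. The augmented matrix charges every abandoned sequence, so it compares A-X plus Y abandoned (10 + 100 = 110) with A-Y plus X abandoned (98 + 10 = 108) and picks A-Y, which has the larger IDTP.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy numbers (made up): one gt A and two hypotheses X, Y
gt_len = np.array([10])          # |A|
hyp_len = np.array([10, 100])    # |X|, |Y|
overlap = np.array([[5, 6]])     # frames where A is matched by X / by Y
G, H = overlap.shape

# Inner-sequence FN + FP for each gt-hyp pair: [[10, 98]]
E = (gt_len[:, None] - overlap) + (hyp_len[None, :] - overlap)

# 1) Rectangular matching: only matched pairs enter the objective
_, rect_cols = linear_sum_assignment(E)

# 2) Augmented matching: irregular nodes charge the length of every abandoned sequence
big = 1e10
cost = np.zeros((G + H, G + H))
cost[:G, :H] = E
cost[:G, H:] = big
cost[np.arange(G), H + np.arange(G)] = gt_len    # gt self-match: all its boxes are FN
cost[G:, :H] = big
cost[G + np.arange(H), np.arange(H)] = hyp_len   # hyp self-match: all its boxes are FP
aug_rows, aug_cols = linear_sum_assignment(cost)

idtp_rect = overlap[0, rect_cols[0]]
idtp_aug = overlap[0, aug_cols[0]]
print(rect_cols[0], aug_cols[0], idtp_rect, idtp_aug)  # → 0 1 5 6
print(cost[aug_rows, aug_cols].sum())                  # → 108.0
```

Consistent with the identity above, 120 total boxes minus 2 × 6 IDTP gives exactly the optimal cost 108.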

Offline Evaluation/ Offline Tracking

IDF1 is computed from the truth-to-result match, which is the solution of a minimum-cost bipartite matching problem whose costs are the track misses.

So we can regard the metric as the best possible identity score obtainable from the given tracking output (i.e. bounding boxes and IDs).

In real-life online tracking applications, though, the truth-to-result matching (which other papers may refer to as association) can only be determined on the fly.

So it may not achieve an IDF1 as high (or rather as optimized) as the one computed after tracking the whole sequence.

Difference between IDF1 and IDSW(CLEAR metric)

refer to Comparison with CLEAR MOT in the paper.

IDSW (frame-level, counts the crucial frames)
- from the paper Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics
- The matching is updated at every frame.
- Prioritizes ID consistency, with sufficient overlap as a prerequisite.
- A gt may be matched to many trajectories over time, and vice versa.
- Only counts (how often) the crucial frames where an ID change occurs (IDSW++).
IDF1 (sequence/ID-level, all misses are counted)
- from the paper Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking
- Determines the gt-hyp (truth-to-result) matching with the aim of minimizing the sum FN + FP.
- A gt can only be matched to a single trajectory, and vice versa; otherwise it is left unmatched (self-matched). Note that self-matching counts all boxes of that sequence as errors.
- Comprehensively considers overall ID consistency together with FN and FP.
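A toy comparison of the two views, assuming a single gt of 10 frames in which every frame is covered by some tracker box with sufficient overlap, so only the ID pattern differs. `idsw` is a simplified stand-in for the CLEAR MOT count (it skips the per-frame matching step), and `idf1_single_gt` relies on the fact that with one gt the best assignment simply picks the hypothesis with the most matched frames; both helpers are hypothetical. Both cases contain exactly one ID switch, but IDF1 tells them apart.

```python
from collections import Counter

def idsw(matched_ids):
    """Simplified CLEAR MOT IDSW for one gt: count changes of the matched hyp ID."""
    return sum(1 for a, b in zip(matched_ids, matched_ids[1:]) if a != b)

def idf1_single_gt(matched_ids):
    """IDF1 for one gt whose every frame is covered by some hyp box (toy case).

    Assumes the hypotheses have no boxes outside this gt's frames.
    """
    n = len(matched_ids)
    idtp = max(Counter(matched_ids).values())  # best single hyp for this gt
    idfn = n - idtp                             # gt frames not covered by that hyp
    idfp = n - idtp                             # boxes carrying the other hyp IDs
    return idtp / (idtp + 0.5 * idfp + 0.5 * idfn)

early = [1] * 5 + [2] * 5   # switch in the middle
late = [1] * 9 + [2]        # switch at the last frame
print(idsw(early), idf1_single_gt(early))  # → 1 0.5
print(idsw(late), idf1_single_gt(late))    # → 1 0.9
```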

References

Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking

posted @ 2022-04-17 11:46 ZXYFrank
