Object Tracking (Draft)

Overview

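Object tracking is a computer vision technique that identifies a specific object in a video or image sequence and follows how it moves from frame to frame. It is used mainly in computer vision, image processing, and machine learning, and it supports a wide range of applications.

Object tracking is mainly used for the following purposes: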

Video surveillance and security: tracks specific objects (e.g., people, vehicles) in CCTV footage to detect perimeter intrusions or abnormal behavior.
Autonomous vehicles: identifies and tracks objects in the surrounding environment to support safe driving.
Robot visual perception: lets a robot identify and track objects around it so it can perform tasks or interact with them.
Vision-based interaction: identifies specific or moving objects in games, virtual reality, and augmented reality to enable interaction.

The simplest approach to tracking is to run object detection independently on each frame.
However, video also carries temporal information that we can exploit.
We can therefore build tracking solutions on top of a detector.
To do so, we need to associate detections of the same object across consecutive frames.
Many methods also model the object's dynamics, so they can predict its position in the next frame.

Types of tracking problem

moving camera?
single or multiple cameras?
single or multiple objects?
major objects or all objects?
similar or distinct objects?
occlusion?
crossing?
online or offline?
initial object marking?

Tracking Classification

Single Object Tracking (SOT):

Tracking of a single object.
The output can include whether the object is present in the frame at all.
It may also need to handle false positives. Example: the ball in robot soccer.

Multi Object Tracking (MOT):

tracking of multiple objects (including objects of the same type).

Online Tracking vs. Offline Tracking

Online Tracking: Estimate the current state given current and past observations.
Offline Tracking: Estimate all states given all observations (batch mode).
Because we are considering self-driving, this lecture focuses on online tracking.

Elements of Tracking

Detection: Where are candidate objects in each frame? (“tracking-by-detection”)
Association: Which detection corresponds to which object?
Filtering: What is the most likely object state, e.g., location and size? (Detections are noisy ⇒ exploit probabilistic observation/motion models)

Filtering

One option is a frequency-domain approach, e.g., low-pass filtering the noisy signal.

In general, for online tracking, the most popular filters are stochastic filters, which are based on the so-called Bayes filter.

Beyond smoothing the signal, the Bayes filter incorporates a dynamics model, so it can predict the position at the next time step and mitigate delay.

Bayesian Filtering

Idea: integrate a motion model (prediction) with an observation model (correction).
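
To make the predict/correct cycle concrete, here is a minimal sketch of a discrete (histogram) Bayes filter over a 1D cyclic world of five cells. The function names, the motion kernel, and the likelihood values are illustrative assumptions, not something taken from the lecture.

```python
def predict(belief, motion_kernel):
    """Motion update: spread each cell's probability according to the motion model."""
    n = len(belief)
    predicted = [0.0] * n
    for i, p in enumerate(belief):
        for shift, prob in motion_kernel.items():
            predicted[(i + shift) % n] += p * prob  # cyclic world (assumption)
    return predicted


def correct(belief, likelihood):
    """Observation update: weight the belief by the likelihood and renormalize."""
    posterior = [b * l for b, l in zip(belief, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]


# Uniform prior: we do not know where the object is.
belief = [0.2] * 5
# Assumed motion model: the object moves one cell right with p=0.8, stays with p=0.2.
motion_kernel = {1: 0.8, 0: 0.2}
# Assumed observation model: a noisy detector that fired at cell 2.
likelihood = [0.1, 0.1, 0.6, 0.1, 0.1]

belief = correct(predict(belief, motion_kernel), likelihood)
# Predicting once more shifts most of the probability mass to cell 3.
next_belief = predict(belief, motion_kernel)
```

The prediction step alone always makes the belief more spread out, while the correction step sharpens it again; alternating the two is the whole filter.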

Kalman Filter

A specialization of the Bayes filter for linear dynamics and observation models with Gaussian noise.
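
As an illustration, the sketch below implements a Kalman filter with a constant-velocity model for a single coordinate, written in plain Python so the matrix algebra is visible. The class name `KalmanFilter1D`, the state layout `[position, velocity]`, and the noise values `q` and `r` are assumptions chosen for the example; a real tracker would typically filter the full bounding-box state.

```python
class KalmanFilter1D:
    """Constant-velocity Kalman filter; state is [position, velocity]."""

    def __init__(self, pos, vel=0.0, var=100.0, q=0.01, r=1.0):
        self.x = [pos, vel]                # state estimate
        self.P = [[var, 0.0], [0.0, var]]  # state covariance
        self.q = q                         # process noise (assumed diagonal)
        self.r = r                         # measurement noise variance

    def predict(self, dt=1.0):
        """Propagate with F = [[1, dt], [0, 1]]: x = F x, P = F P F^T + Q."""
        (p00, p01), (p10, p11) = self.P
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        self.P = [
            [p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.x[0]

    def update(self, z):
        """Correct with a position measurement z (H = [1, 0])."""
        (p00, p01), (p10, p11) = self.P
        innovation = z - self.x[0]
        s = p00 + self.r                   # innovation variance
        k0, k1 = p00 / s, p10 / s          # Kalman gain
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


# Object moving at 2 units per frame, observed for 20 frames.
kf = KalmanFilter1D(pos=0.0)
for t in range(1, 21):
    kf.predict()
    kf.update(2.0 * t)
next_position = kf.predict()  # prediction for frame 21
```

Although only positions are measured, the filter recovers the velocity through the off-diagonal covariance terms, which is what lets it predict ahead instead of lagging behind the detections.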

Association

In self-driving, we typically have to track multiple objects at the same time. How can we associate detections in a new frame with existing object tracks?

Algorithm

Predict objects from previous frame and detect objects in current frame
Associate detections to object tracks (initiate/delete tracks if necessary)
Correct predictions with observations (e.g., Kalman Filter)
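
The three steps above can be sketched as a single per-frame function. This is a deliberately simplified illustration on 1D positions: greedy nearest-neighbor matching and a fixed-gain correction stand in for a proper assignment algorithm and a Kalman filter, and the names `Track`, `step`, `gate`, `gain`, and `max_misses` are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Track:
    track_id: int
    position: float
    velocity: float = 0.0
    misses: int = 0  # consecutive frames without a matched detection


_ids = count()


def step(tracks, detections, gate=5.0, gain=0.5, max_misses=2):
    # 1. Predict: move every track forward with its motion model.
    for tr in tracks:
        tr.position += tr.velocity

    # 2. Associate: greedy nearest neighbor inside a distance gate.
    unmatched = list(detections)
    for tr in tracks:
        nearest = min(unmatched, key=lambda d: abs(d - tr.position), default=None)
        if nearest is not None and abs(nearest - tr.position) <= gate:
            unmatched.remove(nearest)
            # 3. Correct: pull the prediction toward the observation.
            residual = nearest - tr.position
            tr.position += gain * residual
            tr.velocity += gain * residual
            tr.misses = 0
        else:
            tr.misses += 1

    # Delete tracks that were missed for too long, initiate tracks for new detections.
    tracks = [tr for tr in tracks if tr.misses <= max_misses]
    tracks += [Track(next(_ids), z) for z in unmatched]
    return tracks


# Two objects appear; the second one disappears after the second frame.
frames = [[0.0, 50.0], [1.0, 51.0], [2.0], [3.0], [4.0], [5.0]]
tracks = []
for detections in frames:
    tracks = step(tracks, detections)
```

After the last frame only the first object's track survives; the second track is removed once it has gone more than `max_misses` frames without a detection.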

When do observations in consecutive frames belong together?

Predict bounding box (via motion model) and measure overlap
Compare color histograms or normalized cross-correlation
Estimate optical flow and measure agreement
Compare relative location and size of bounding box
Compare orientation of detected objects
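
The first criterion, bounding-box overlap, is usually measured as intersection over union (IoU). A minimal sketch, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)   # box predicted by the motion model
detection = (5, 0, 15, 10)   # box reported by the detector
overlap = iou(predicted, detection)  # 50 / 150 = 1/3
```

An IoU of 1 means the boxes coincide and 0 means they do not overlap, so `1 - iou` can serve directly as an association cost.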

Simple Online Realtime Tracking (SORT)

Very popular approach for object tracking.
Faster R-CNN as the object detector.
MOT based on the Kalman filter.
A filter for each object being tracked (tracklet).
Association based on the Hungarian algorithm.
Heuristics to create and remove objects being tracked.
Separates detection and tracking.
Requires training only the object detector!
Very easy to adapt to other object detectors, because the tracking part doesn’t change.
Potentially, there is loss of performance by not considering detection and tracking as a single problem.

[Bewley et al., 2016, Simple Online and Realtime Tracking]
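
To illustrate the association step, the sketch below finds the minimum-cost one-to-one assignment between tracks and detections. It uses a brute-force search over permutations, which is only viable for a handful of objects and stands in here for the Hungarian algorithm, which finds the same optimum in polynomial time. The function name `associate`, its parameters, and the cost values (think `1 - IoU`) are assumptions for the example.

```python
from itertools import permutations


def associate(cost, n_dets, max_cost):
    """cost[t][d] is the cost of matching track t to detection d.

    Returns (matches, unmatched_tracks, unmatched_detections).
    """
    n_tracks = len(cost)
    if n_tracks == 0 or n_dets == 0:
        return [], list(range(n_tracks)), list(range(n_dets))

    if n_tracks <= n_dets:
        # Give every track a distinct detection.
        best = min(permutations(range(n_dets), n_tracks),
                   key=lambda perm: sum(cost[t][d] for t, d in enumerate(perm)))
        pairs = list(enumerate(best))
    else:
        # More tracks than detections: give every detection a distinct track.
        best = min(permutations(range(n_tracks), n_dets),
                   key=lambda perm: sum(cost[t][d] for d, t in enumerate(perm)))
        pairs = [(t, d) for d, t in enumerate(best)]

    # Gate: reject assigned pairs whose cost is too high (overlap too low).
    matches = [(t, d) for t, d in pairs if cost[t][d] <= max_cost]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_tracks = [t for t in range(n_tracks) if t not in matched_t]
    unmatched_dets = [d for d in range(n_dets) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_dets


# A greedy matcher would take (0, 0) first and be forced into (1, 1) with cost 0.9.
cost = [[0.10, 0.20],
        [0.15, 0.90]]
matches, lost, new = associate(cost, n_dets=2, max_cost=0.7)
# matches == [(0, 1), (1, 0)]
```

Unmatched detections are the candidates for new tracks, and unmatched tracks are the candidates for deletion, which is where the create/remove heuristics come in.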

DeepSORT

[Wojke et al., 2017, Simple Online and Realtime Tracking with a Deep Association Metric]

Metric

We can reuse object detection metrics such as mAP, accuracy, and precision. However, tracking has its own challenges, such as keeping each object's identity consistent over time, so dedicated metrics are also used.

HOTA (Higher Order Tracking Accuracy)
MOTA (Multiple Object Tracking Accuracy)
MOTP (Multiple Object Tracking Precision)
IDF1 (Identification F1 Score)
MT (Mostly Tracked)
- number of trajectories that are tracked most of the time, i.e., for at least 80% of their lifespan.
ML (Mostly Lost)
- number of trajectories that are lost most of the time, i.e., tracked for less than 20% of their lifespan.
FP
- number of false positives.
FN
- number of false negatives.
IDSW
- number of identity switches (a tracked object is incorrectly assigned a different ID).
Frag
- number of fragmentations (a track is incorrectly interrupted).
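
Several of these counts combine into MOTA, commonly defined as 1 - (FN + FP + IDSW) / GT, where GT is the total number of ground-truth objects summed over all frames. The sketch below computes it and classifies a trajectory as mostly tracked, mostly lost, or partially tracked; the function names and the example counts are made up for illustration.

```python
def mota(fn, fp, idsw, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT; can be negative for very poor trackers."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def classify_trajectory(tracked_frames, total_frames):
    """Label one ground-truth trajectory by the fraction of its lifespan tracked."""
    ratio = tracked_frames / total_frames
    if ratio >= 0.8:
        return "MT"  # mostly tracked
    if ratio < 0.2:
        return "ML"  # mostly lost
    return "PT"      # partially tracked


# Hypothetical sequence: 100 ground-truth boxes, 10 misses, 5 false alarms, 5 ID switches.
score = mota(fn=10, fp=5, idsw=5, num_gt=100)
labels = [classify_trajectory(t, 50) for t in (45, 25, 5)]
```

Because MOTA sums three error types into one number, it is usually reported alongside IDF1 or HOTA, which put more weight on identity preservation.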

Reference

Bayes filter and Kalman filter:
- Thrun, Burgard, and Fox, Probabilistic Robotics.
SORT and DeepSORT papers:
- Bewley et al., 2016, Simple Online and Realtime Tracking.
- Wojke et al., 2017, Simple Online and Realtime Tracking with a Deep Association Metric.
Metrics for MOT:
- Milan et al., 2016, MOT16: A Benchmark for Multi-Object Tracking.
Survey:
- Ciaparrone et al., 2020, Deep Learning in Video Multi-Object Tracking: A Survey. Neurocomputing 381, 61–88.