An Overview of Object Tracking: Use Cases, Challenges, and Applications

Feb 28, 2024

Overview

Object tracking is a computer vision task in which a program detects, identifies, and locates objects and then follows their movement across the frames of a video, even when objects are partially or fully occluded by other objects in the scene. It differs from object detection, which operates on a single frame or image and does not link objects over time.

Source: An Introduction to BYTETrack: Multi-Object Tracking by Associating Every Detection Box

Object tracking algorithms can be categorized by task. The following are some of the most common object tracking types:

Image Tracking

In image tracking, a 2D image of an object, e.g. a person or vehicle, is continuously tracked as it moves across video frames.

In the example above, an image of a person is used to track that person in a video.

Single Object Tracking

Single object tracking involves tracking only one target at a time in a video. The target and its bounding box coordinates are specified in the first frame, and the target is then recognized and tracked in subsequent frames.

Single object tracking (Source: OpenCV Object Tracking - PyImageSearch)

Multiple Object Tracking

Multiple object tracking works like its single object counterpart but tracks several objects at once. The tracking algorithm must first determine the number of objects in each frame and then keep track of each object's identity from frame to frame.

Multi object tracking (Source: Object tracking from scratch - OpenCV and python - Pysource)

Use Cases of Object Tracking

Surveillance

Real-time object tracking algorithms are used for surveillance. For example, during the COVID-19 pandemic, object tracking algorithms were often used for crowd monitoring, i.e. to check whether people were maintaining social distance in public areas.
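
As a rough illustration of the social distancing check described above, here is a minimal sketch. It assumes the tracker already provides one ground-plane position in metres per track ID. The function name `close_pairs`, the input format, and the 2-metre threshold are all hypothetical and chosen only for this example. A real system would first have to project image coordinates onto the ground plane.

```python
from itertools import combinations
from math import hypot


def close_pairs(positions, min_distance):
    """Return pairs of track IDs that are closer than min_distance.

    positions: dict mapping track ID -> (x, y) on the ground plane, in metres.
    This input format is an assumption made for the example.
    """
    pairs = []
    # Compare every pair of tracked people exactly once.
    for (id_a, (xa, ya)), (id_b, (xb, yb)) in combinations(sorted(positions.items()), 2):
        if hypot(xa - xb, ya - yb) < min_distance:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical positions for three tracked people in one frame.
positions = {1: (0.0, 0.0), 2: (1.0, 0.5), 3: (5.0, 5.0)}
print(close_pairs(positions, min_distance=2.0))  # [(1, 2)]
```

Because the check uses track IDs rather than raw detections, the same pair can be followed over time, for example to raise an alert only if two people stay close for several seconds.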

Retail

In retail, object tracking is often used to track customers and products. One example is the Amazon Go stores, where cashierless checkout systems track each customer and the items they pick up. This allows the algorithms to determine in real time which products the customer has put into the basket and to generate an automated receipt when the customer crosses the checkout area.

Autonomous Vehicles

Perhaps the most widely known use case is self-driving cars. Here, object tracking serves several purposes, such as obstacle detection, pedestrian detection, trajectory estimation, collision avoidance, vehicle speed estimation, traffic monitoring, and route estimation.
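
To make one of these purposes concrete, the sketch below estimates the average speed of a tracked vehicle from its trajectory. It assumes the track is available as a list of `(frame_index, x, y)` samples with positions already converted to metres, and that the video frame rate is known. The function name `average_speed` and the sample data are hypothetical.

```python
from math import hypot


def average_speed(track, fps):
    """Average speed in metres per second along a track.

    track: list of (frame_index, x, y) samples, ordered by frame index,
           with x and y in metres (an assumption made for this example).
    fps:   frame rate of the video.
    """
    if len(track) < 2:
        return 0.0
    # Sum the straight-line distance between consecutive samples.
    distance = sum(
        hypot(x1 - x0, y1 - y0)
        for (_, x0, y0), (_, x1, y1) in zip(track, track[1:])
    )
    # Convert the frame span into seconds.
    elapsed = (track[-1][0] - track[0][0]) / fps
    return distance / elapsed if elapsed > 0 else 0.0


# Hypothetical track: 10 metres travelled over 30 frames at 30 FPS.
track = [(0, 0.0, 0.0), (15, 3.0, 4.0), (30, 6.0, 8.0)]
print(average_speed(track, fps=30.0))  # 10.0
```

This only works because tracking gives every sample the same identity. With per-frame detection alone, there would be no way to know that the three positions belong to the same vehicle.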

What are the challenges of object tracking?

Training and Tracking Speed

Object tracking algorithms are expected not only to detect and localize objects of interest accurately but also to do so as quickly as possible. Speed and accuracy usually trade off against each other, so design choices determine the balance between the two.

Occlusions and Appearance Variation

The background of an image affects the accuracy of the model. If there are many occlusions, the model might struggle to detect the objects in the image. Moreover, each object's shape, size, and orientation might appear to change from frame to frame, usually because of illumination and spatial changes.

Object Tracking with NVIDIA DeepStream SDK

DeepStream gives the user a faster and easier way to develop vision AI applications and services. A user can even deploy them on-premises, on the edge, and in the cloud with just a click of a button. NVIDIA’s DeepStream SDK is a complete streaming analytics toolkit based on GStreamer for AI-based multi-sensor processing, video, audio, and image understanding.

The NVIDIA DeepStream SDK offers GPU-accelerated multi-object trackers (MOT). In recent releases of the DeepStream SDK, the multi-object trackers have been significantly improved to handle challenging occlusions. They do this by leveraging deep neural network-based re-identification (ReID) models for target matching and association. DeepStream offers four object tracker types for the user to choose from (Source: Gst-nvtracker, DeepStream 6.4 documentation).

IOU Tracker: The Intersection-Over-Union (IOU) tracker uses the IOU values among the detector's bounding boxes in two consecutive frames to associate them, or assigns a new target ID if no match is found. This tracker includes logic to handle false positives and false negatives from the object detector; however, it can be considered the bare-minimum object tracker, which may serve as a baseline only.
NvSORT: The NvSORT tracker is the NVIDIA®-enhanced Simple Online and Realtime Tracking (SORT) algorithm. Instead of a simple bipartite matching algorithm, NvSORT uses a cascaded data association based on bounding box (bbox) proximity for associating bboxes over consecutive frames and applies a Kalman filter to update the target states. It is computationally efficient since it does not involve any pixel data processing.
NvDeepSORT: The NvDeepSORT tracker is the NVIDIA®-enhanced Online and Realtime Tracking with a Deep Association Metric (DeepSORT) algorithm, which uses deep cosine metric learning with a Re-ID neural network for data association of multiple objects over frames. This implementation allows users to use any Re-ID network as long as it is supported by NVIDIA's TensorRT™ framework. NvDeepSORT also uses a cascaded data association instead of a simple bipartite matching. The implementation is also optimized for efficient processing on GPU.
NvDCF: The NvDCF tracker is an online multi-object tracker that employs a discriminative correlation filter for visual object tracking, which allows independent object tracking even when detection results are not available. It uses the combination of the correlation filter responses and bounding box proximity for data association.
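
To illustrate the idea behind the simplest of these, the sketch below shows IoU-based association in plain Python. It is not NVIDIA's implementation. It uses greedy matching, drops unmatched tracks immediately, and leaves out the false-positive and false-negative handling mentioned above. The class name `IouTracker`, the `(x1, y1, x2, y2)` box format, and the 0.3 threshold are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    """Toy tracker: greedy IoU matching between consecutive frames."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}   # track ID -> box from the previous frame
        self.next_id = 1

    def update(self, detections):
        """Assign a track ID to each detection; return {track_id: box}."""
        # Score every (track, detection) pair and visit the best pairs first.
        candidates = sorted(
            (
                (iou(box, det), track_id, det_index)
                for track_id, box in self.tracks.items()
                for det_index, det in enumerate(detections)
            ),
            reverse=True,
        )
        assigned = {}
        used = set()
        for score, track_id, det_index in candidates:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little to match
            if track_id in assigned or det_index in used:
                continue
            assigned[track_id] = detections[det_index]
            used.add(det_index)
        # Detections without a match start new tracks with fresh IDs.
        for det_index, det in enumerate(detections):
            if det_index not in used:
                assigned[self.next_id] = det
                self.next_id += 1
        # Simplification: tracks with no match in this frame are dropped.
        self.tracks = assigned
        return assigned


tracker = IouTracker()
print(tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)]))  # two new IDs: 1 and 2
print(tracker.update([(52, 50, 62, 60), (1, 0, 11, 10)]))  # IDs kept despite reordering
```

In the second frame the detections arrive in a different order, yet each box keeps its ID because it overlaps most with its own box from the previous frame. The weakness is also visible: as soon as an object is occluded for a single frame, its track is lost, which is the gap that motion models and Re-ID features in the other trackers are meant to close.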

Object Tracking at Sertis

Person Analytics

In this use case, in addition to tracking each person's ID, we also track the position and direction of each person. The table below summarizes the features.

(Source: Gst-nvdsanalytics, DeepStream 6.4 documentation)
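
As a rough illustration of how a direction can be derived from a tracked position history, here is a minimal sketch. It is not the Gst-nvdsanalytics implementation. The function name `heading_degrees`, the track format, and the minimum displacement threshold are hypothetical and chosen only for this example.

```python
from math import atan2, degrees, hypot


def heading_degrees(track, min_displacement=1.0):
    """Heading of a track from its first to its last point.

    track: list of (x, y) positions in image coordinates, where y grows
           downward. 0 degrees means moving right, 90 means moving down
           the image.
    Returns None if the track is too short or the object barely moved.
    """
    if len(track) < 2:
        return None
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    # Ignore jitter: a nearly stationary person has no meaningful direction.
    if hypot(dx, dy) < min_displacement:
        return None
    return degrees(atan2(dy, dx)) % 360.0


# Hypothetical track of a person walking down the image.
walk = [(100.0, 50.0), (100.0, 60.0), (100.0, 75.0)]
print(round(heading_degrees(walk), 1))  # 90.0
```

Using only the first and last point keeps the sketch short. Averaging over a sliding window of recent positions would be one way to make the estimate respond to turns.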

Vehicle Tracking

In this use case, in addition to tracking the vehicle ID, we also track vehicle metadata such as car make and color.

Conclusion

Object tracking in computer vision is an exciting and rapidly evolving field with numerous applications. Researchers have developed a wide range of techniques to handle its various challenges. The DeepStream SDK is a tool that helps us build and deploy computer vision applications, such as object tracking, easily. With the increasing availability of tools, large-scale datasets, and powerful computing resources, we can expect to see even more significant advancements in the field in the coming years.

Written by: Sertis Vision Lab

Originally published at https://www.sertiscorp.com/sertis-ai-research