CenterTrack3D: Improved CenterTrack more Suitable for 3D Objects

Author(s):  
Lipeng Gu ◽  
Shaoyuan Sun ◽  
Xunhua Liu ◽  
Xiang Li

Abstract Compared with 2D multi-object tracking, 3D multi-object tracking has greater research significance and broader application prospects in the field of unmanned vehicles. To address the problem of 3D multi-object detection and tracking, this paper improves the multi-object tracker CenterTrack, which focuses on the 2D tracking task while ignoring objects' 3D information, in two respects, detection and tracking; the improved network is called CenterTrack3D. On the detection side, CenterTrack3D uses an attention mechanism to optimize the way the previous-frame image and the heatmap of previous-frame tracklets are added to the current-frame input, and the second convolutional layer of the output head is replaced by a dynamic convolution layer, which further improves the ability to detect occluded objects. On the tracking side, a cascaded data-association algorithm based on a 3D Kalman filter is proposed to make full use of the objects' 3D information and increase the robustness of the 3D multi-object tracker. Experimental results show that, compared with the original CenterTrack and existing 3D multi-object tracking methods, CenterTrack3D achieves 88.75% MOTA for cars and 59.40% MOTA for pedestrians and is highly competitive on the KITTI tracking benchmark test set.
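The constant-velocity prediction at the heart of a 3D Kalman-filter tracker can be sketched as follows. This is not the paper's implementation: the class name is hypothetical, and a fixed-gain (alpha-beta) filter stands in for the full covariance-based Kalman update, tracking each axis of the 3D box centre independently.

```python
# Simplified stand-in for a 3D Kalman tracker: constant-velocity motion
# with fixed correction gains (an alpha-beta filter) instead of the full
# covariance update. Gains are arbitrary illustrative values.

class ConstantVelocityTracker3D:
    def __init__(self, xyz, alpha=0.85, beta=0.005):
        self.pos = list(xyz)          # estimated 3D centre (x, y, z)
        self.vel = [0.0, 0.0, 0.0]    # estimated per-axis velocity
        self.alpha, self.beta = alpha, beta

    def predict(self, dt=1.0):
        """Propagate the state one frame ahead under constant velocity."""
        self.pos = [p + v * dt for p, v in zip(self.pos, self.vel)]
        return self.pos

    def update(self, meas, dt=1.0):
        """Correct the prediction with a new 3D detection."""
        for i in range(3):
            residual = meas[i] - self.pos[i]
            self.pos[i] += self.alpha * residual
            self.vel[i] += (self.beta / dt) * residual

# An object moving +1 unit/frame along x: after a few matched detections
# the tracker can predict ahead even when a detection is missed
# (e.g. under occlusion), which is what keeps the tracklet alive.
trk = ConstantVelocityTracker3D((0.0, 0.0, 0.0))
for t in range(1, 6):
    trk.predict()
    trk.update((float(t), 0.0, 0.0))
print(trk.predict())  # predicted centre for the next (possibly occluded) frame
```

In the real tracker, the predicted 3D box would then be matched to new detections during cascaded data association.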

Author(s):  
R. Bahmanyar ◽  
S. M. Azimi ◽  
P. Reinartz

Abstract. Geo-referenced real-time vehicle and person tracking in aerial imagery has a variety of applications, such as traffic and large-scale event monitoring, disaster management, and input to predictive traffic and crowd models. However, object tracking in aerial imagery remains an unsolved, challenging problem due to the tiny size of the objects, the variation in scale, and the limited temporal resolution of geo-referenced datasets. In this work, we propose a new approach based on Convolutional Neural Networks (CNNs) to track multiple vehicles and people in aerial image sequences. As the large number of objects in aerial images can exponentially increase the processing demands in multi-object tracking scenarios, the proposed approach uses a stack of micro CNNs, where each micro CNN is responsible for a single-object tracking task. We call our approach Stack of Micro-Single-Object-Tracking CNNs (SMSOT-CNN). More precisely, using a two-stream CNN, we extract a set of features from two consecutive frames for each object, given the location of the object in the previous frame. Then, we assign to each MSOT-CNN the extracted features of one object to predict that object's location in the current frame. We train and validate the proposed approach on the vehicle and person sets of the KIT AIS dataset for object tracking in aerial image sequences. Results indicate accurate and time-efficient tracking of multiple vehicles and people by the proposed approach.
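The stack-of-single-object-trackers pattern can be illustrated without any deep learning: one small tracker per object, each fed only a search window around that object's previous location. The per-object "micro CNN" is replaced below by a hypothetical placeholder that simply re-centres on the brightest pixel of the window; names and window size are illustrative assumptions, not SMSOT-CNN itself.

```python
# Sketch of the per-object dispatch in a stack of micro single-object
# trackers. The "micro tracker" here is a toy brightest-pixel locator,
# standing in for the per-object CNN described in the abstract.

def crop(frame, cx, cy, r):
    """Take a (2r+1) x (2r+1) search window around (cx, cy) from a 2D grid."""
    return [[frame[y][x] for x in range(cx - r, cx + r + 1)]
            for y in range(cy - r, cy + r + 1)]

def micro_tracker(window, cx, cy, r):
    """Placeholder single-object tracker: new centre = brightest pixel
    inside the search window, mapped back to frame coordinates."""
    _, wx, wy = max((v, x, y)
                    for y, row in enumerate(window)
                    for x, v in enumerate(row))
    return cx - r + wx, cy - r + wy

def track_all(frame, objects, r=2):
    """Run one micro tracker per object, mirroring the SMSOT stack idea."""
    return [micro_tracker(crop(frame, cx, cy, r), cx, cy, r)
            for cx, cy in objects]

# Two bright blobs, each moved by one pixel since the previous frame:
frame = [[0] * 12 for _ in range(12)]
frame[4][5] = frame[9][8] = 9          # new positions (row = y, col = x)
print(track_all(frame, [(4, 4), (7, 9)]))  # previous (x, y) positions
```

The point of the pattern is that each tracker's cost is independent of the total object count, so the stack scales linearly with the number of objects.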


Electronics ◽  
2021 ◽  
Vol 10 (18) ◽  
pp. 2319
Author(s):  
Han Wu ◽  
Chenjie Du ◽  
Zhongping Ji ◽  
Mingyu Gao ◽  
Zhiwei He

Multi-object tracking (MOT) is a significant and widespread research field in image processing and computer vision. The goal of the MOT task is to predict the complete tracklets of multiple objects in a video sequence. Many challenges, such as occlusion and similar-looking objects, degrade tracking performance. Existing MOT algorithms based on the tracking-by-detection paradigm struggle to accurately predict the location of objects they fail to track in complex scenes, leading to tracking performance decay, such as an increased number of ID switches and tracking drifts. To tackle these difficulties, in this study we design a motion-prediction strategy for estimating the location of occluded objects. Since occluded objects may be clearly visible in earlier frames, we use the speed and location of the objects in past frames to predict their possible current location. In addition, to improve the tracking speed and further enhance robustness, we use the efficient YOLOv4-tiny to produce the detections in the proposed algorithm, which significantly improves the tracking speed of our method. Experimental results on two widely used public datasets show that our approach has clear advantages in tracking accuracy and speed over the comparison algorithms, including a significant improvement in tracking performance over the Deep SORT baseline.
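The motion-prediction idea for occluded objects can be sketched in a few lines: when a track has no matched detection, extrapolate its location from the speed observed in past frames. The function name and the two-frame velocity estimate are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: constant-velocity extrapolation of an occluded track
# from its last two observed centres.

def predict_occluded(history, frames_ahead=1):
    """history: list of (x, y) centres from past frames, oldest first.
    Returns the extrapolated centre assuming constant velocity."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0      # per-frame speed from the last two frames
    return x1 + vx * frames_ahead, y1 + vy * frames_ahead

# A pedestrian walking right at 3 px/frame disappears behind a pillar;
# the track stays alive by predicting where it should reappear.
track = [(10, 50), (13, 50), (16, 50)]
print(predict_occluded(track, frames_ahead=2))  # (22, 50)
```

Keeping a predicted location for unmatched tracks is what lets the association step re-attach the object when it re-emerges, instead of issuing a new ID.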


2011 ◽  
Vol 2011 ◽  
pp. 1-15 ◽  
Author(s):  
Zhuhan Jiang

We propose to model a tracked object in a video sequence by locating a list of object features ranked according to their ability to differentiate the object from the image background. Bayesian inference is used to derive the probabilistic location of the object in the current frame, with the prior approximated from the previous frame and the posterior obtained via the current pixel distribution of the object. Consideration has also been given to a number of relevant aspects of object tracking, including multidimensional features and the mixture of colours, textures, and object motion. Experiments with the proposed method on video sequences demonstrate its effectiveness in capturing the target against a moving background and under nonrigid object motion.
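The Bayesian update described above can be shown on a toy example: the prior over the object's location comes from the previous frame, and a likelihood scores how well each candidate location matches the object's current pixel distribution. The 1-D grid and Gaussian-shaped terms below are illustrative assumptions, not the paper's feature model.

```python
# Toy Bayes update over a discrete grid of candidate object locations.
import math

def posterior(prior, likelihood):
    """Pointwise Bayes rule: posterior is proportional to prior x likelihood."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def gaussian(grid, mu, sigma):
    return [math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in grid]

grid = list(range(20))                  # candidate x-positions
prior = gaussian(grid, mu=8, sigma=3)   # belief carried over from frame t-1
like = gaussian(grid, mu=11, sigma=2)   # pixel-distribution match in frame t
post = posterior(prior, like)
print(max(range(20), key=lambda i: post[i]))  # MAP location: 10
```

The maximum a posteriori location lands between the prior's peak and the likelihood's peak, closer to whichever is more confident, which is exactly the behaviour a tracker wants when the appearance evidence and the motion prior disagree.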


2011 ◽  
Vol 403-408 ◽  
pp. 4968-4973
Author(s):  
Rajendra Kachhava ◽  
Vivek Srivastava ◽  
Rajkumar Jain ◽  
Ekta Chaturvedi

In this paper we propose real-time tracking with multiple cameras for surveillance and security systems. Such tracking is used extensively in computer vision applications such as video surveillance, authentication systems, robotics, the pre-processing stage of MPEG-4 image compression, and gesture-based user interfaces. The key components of tracking for surveillance are feature extraction, background subtraction, and identification of the extracted objects. Video surveillance, object detection, and tracking have drawn increasing interest in recent years. Object tracking can be understood as the problem of finding an object's path (i.e., trajectory), and can be defined as a procedure that identifies the object's position in each frame of a video. Building on previous work on single-object detection with a single stationary camera, we extend the concept to the tracking of multiple objects under multiple cameras, and we build a multi-camera security system that monitors and tracks a person in an indoor environment. The present study mainly aims to provide security and to detect moving objects in real-time video sequences and live video streams, based on a robust algorithm for human body detection and tracking in videos captured with the support of multiple cameras.


2019 ◽  
Vol 70 (3) ◽  
pp. 214-224
Author(s):  
Bui Ngoc Dung ◽  
Manh Dzung Lai ◽  
Tran Vu Hieu ◽  
Nguyen Binh T. H.

Video surveillance is an emerging research field in intelligent transport systems. This paper presents techniques that use machine learning and computer vision for vehicle detection and tracking. First, machine learning approaches using Haar-like features and the AdaBoost algorithm for vehicle detection are presented. Second, approaches are given for detecting vehicles using background subtraction based on a Gaussian Mixture Model, and for tracking vehicles using optical flow and multiple Kalman filters. The method has the advantage of distinguishing and tracking multiple vehicles individually. The experimental results demonstrate the high accuracy of the method.
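The background-subtraction stage can be illustrated in miniature. The paper uses a per-pixel Gaussian Mixture Model; the single running-average background below is a deliberately simplified stand-in to show the mechanics, and the learning rate and threshold are arbitrary illustrative values.

```python
# Simplified background subtraction: a running-average background model
# (one value per pixel) instead of the full per-pixel Gaussian mixture.

def update_background(bg, frame, lr=0.05):
    """Blend the new frame into the background model (running average)."""
    return [[(1 - lr) * b + lr * f for b, f in zip(brow, frow)]
            for brow, frow in zip(bg, frame)]

def foreground_mask(bg, frame, thresh=30):
    """Mark pixels that differ from the background by more than `thresh`."""
    return [[1 if abs(f - b) > thresh else 0 for b, f in zip(brow, frow)]
            for brow, frow in zip(bg, frame)]

# Static grey scene (value 100) with a bright vehicle pixel entering at (1, 2):
bg = [[100.0] * 4 for _ in range(3)]
frame = [row[:] for row in bg]
frame[1][2] = 220.0
mask = foreground_mask(bg, frame)
print(sum(map(sum, mask)))        # 1 -> exactly one foreground pixel
bg = update_background(bg, frame) # slowly absorb the scene change
```

In the full pipeline, connected foreground regions become vehicle detections, which the optical flow and Kalman filters then associate across frames.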


2000 ◽  
Author(s):  
Todd Schoepflin ◽  
Christopher Lau ◽  
Rohit Garg ◽  
Donglok Kim ◽  
Yongmin Kim

Author(s):  
M O Elantcev ◽  
I O Arkhipov ◽  
R M Gafarov

The work deals with a method for eliminating the perspective distortion of an image acquired from an unmanned aerial vehicle (UAV) camera, in order to transform it to match the parameters of a satellite image. The normalization is performed in one of two ways. The first variant calculates an image transformation matrix from the camera position and orientation. The second variant matches the current frame against the previous one. The matching yields shift, rotation, and scale parameters that are used to obtain an initial set of pairs of corresponding keypoints. From this set, four pairs are selected to calculate the perspective transformation matrix. This matrix is in turn used to obtain a new set of pairs of corresponding keypoints. The process is repeated while the number of pairs in the new set exceeds the number in the current one. The accumulated transformation matrix is then multiplied by the transformation matrix obtained during normalization of the previous frame. The final part presents results showing that the proposed method can improve the accuracy of a visual navigation system at low computational cost.
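Two operations underpin the method: applying a 3x3 perspective (homography) matrix to image points, and chaining matrices by multiplication, as in accumulating the current transform with the one obtained for the previous frame. The sketch below shows both with toy matrix values; it is an illustration of the mechanics, not the paper's keypoint-refinement loop.

```python
# Applying and composing 3x3 perspective transforms in homogeneous coordinates.

def matmul3(a, b):
    """3x3 matrix product: composition of two perspective transforms
    (b is applied first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_homography(h, x, y):
    """Map a point through h and divide by the homogeneous coordinate."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w  = h[2][0] * x + h[2][1] * y + h[2][2]
    return xh / w, yh / w

shift = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]   # translate by (5, -2)
scale = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]    # uniform 2x zoom
accumulated = matmul3(scale, shift)          # shift first, then scale
print(apply_homography(accumulated, 10.0, 10.0))  # (30.0, 16.0)
```

A genuinely perspective transform would carry nonzero entries in the bottom row, making `w` depend on the point; the division by `w` is what the affine special case above hides.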

