Deep Learning-Based Object Tracking via Compressed Domain Residual Frames

The extensive rise of high-definition CCTV camera footage has stimulated both the data compression and the data analysis research fields. Citizens' increased awareness of the vulnerability of their private information creates a third challenge for the video surveillance community, which must also address privacy protection. In this paper, we aim to meet these needs by proposing a deep learning-based object tracking solution that operates on compressed domain residual frames. The goal is to provide a public and privacy-friendly image representation for data analysis. In this work, we explore a scenario in which tracking is performed directly on a restricted part of the information extracted from the compressed domain. We use exclusively the residual frames already generated by the video compression codec to train and test our network. This very compact representation also acts as an information filter, limiting the amount of private information leaked in a video stream. We show that using residual frames for deep learning-based object tracking can be just as effective as using classical decoded frames. More precisely, residual frames are particularly beneficial in simple video surveillance scenarios with non-overlapping and continuous traffic.
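
To make the notion of a residual frame concrete, the sketch below computes one as the difference between the current frame and a block-wise motion-compensated prediction built from a reference frame. It is a simplified, hypothetical illustration: integer-pixel full-search block matching, a fixed block size `bs`, a search range `search`, and grayscale frames stored as nested lists are all assumptions. Real codecs such as H.264/AVC or HEVC use far more elaborate prediction, and the abstract does not state which codec or settings were used. The helper names `block_sad`, `residual_frame`, and `frame_with_square` are illustrative only.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame: rows of pixel intensities


def block_sad(cur: Frame, ref: Frame, by: int, bx: int,
              dy: int, dx: int, bs: int) -> int:
    """Sum of absolute differences between the bs x bs block at (by, bx)
    in `cur` and the block displaced by (dy, dx) in `ref`."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(bs)
        for x in range(bs)
    )


def residual_frame(cur: Frame, ref: Frame, bs: int = 4, search: int = 2) -> Frame:
    """Hypothetical helper: `cur` minus a block-wise motion-compensated
    prediction built from `ref` (integer-pixel full search).
    Assumes frame height and width are multiples of `bs`."""
    h, w = len(cur), len(cur[0])
    res = [[0] * w for _ in range(h)]
    for by in range(0, h, bs):
        for bx in range(0, w, bs):
            # Start from the zero motion vector; replace it only on a strictly better match.
            best: Tuple[int, int] = (0, 0)
            best_cost = block_sad(cur, ref, by, bx, 0, 0, bs)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Only consider reference blocks lying fully inside the frame.
                    if not (0 <= by + dy <= h - bs and 0 <= bx + dx <= w - bs):
                        continue
                    cost = block_sad(cur, ref, by, bx, dy, dx, bs)
                    if cost < best_cost:
                        best, best_cost = (dy, dx), cost
            dy, dx = best
            # Residual = actual pixels minus the motion-compensated prediction.
            for y in range(bs):
                for x in range(bs):
                    res[by + y][bx + x] = (cur[by + y][bx + x]
                                           - ref[by + dy + y][bx + dx + x])
    return res


def frame_with_square(top: int, left: int, size: int = 12,
                      bg: int = 10, fg: int = 200) -> Frame:
    """Toy frame: uniform background with a bright 2 x 2 square."""
    frame = [[bg] * size for _ in range(size)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            frame[y][x] = fg
    return frame


if __name__ == "__main__":
    ref = frame_with_square(5, 5)
    cur = frame_with_square(5, 6)  # the square moves one pixel to the right
    res = residual_frame(cur, ref)
    print(sum(abs(v) for row in res for v in row))  # 0: motion fully compensated
```

In this toy case, a square that moves by one pixel is fully explained by motion compensation, so its residual is zero even though a naive frame difference is not. Content that cannot be predicted from the reference, such as a newly appearing bright pixel, is what remains in the residual, which hints at why residuals can act as a compact information filter.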

Video Compression Standards for High Definition Video: A Comparative Study of High Efficiency Video Coding and H.264/MPEG-4 AVC

i-manager's Journal on Communication Engineering and Systems ◽

10.26634/jcs.2.4.2474 ◽

2013 ◽

Vol 2 (4) ◽

pp. 14-19

Author(s):

Geethu Raj ◽

M. Kannan

Keyword(s):

Comparative Study ◽

Video Coding ◽

Video Compression ◽

High Efficiency ◽

High Efficiency Video Coding ◽

High Definition ◽

High Definition Video

CONVOLUTIONAL NEURAL NETWORKS, ANALYTICAL ALGORITHMS, AND PERSONALIZED HEALTH CARE: EMBRACING THE MASSIVE DATA ANALYSIS CAPABILITIES OF DEEP LEARNING ARTIFICIAL INTELLIGENCE SYSTEMS TO COMPLEMENT AND IMPROVE MEDICAL SERVICES

American Journal of Medical Research ◽

10.22381/ajmr5220187 ◽

2018 ◽

Vol 5 (2) ◽

pp. 52 ◽

Cited By ~ 1

Keyword(s):

Artificial Intelligence ◽

Neural Networks ◽

Health Care ◽

Deep Learning ◽

Data Analysis ◽

Convolutional Neural Networks ◽

Massive Data ◽

Personalized Health ◽

Personalized Health Care ◽

Artificial Intelligence Systems

VDA: Deep Learning based Visual Data Analysis in Integrated Edge to Cloud Computing Environment

Adjunct Proceedings of the 2021 International Conference on Distributed Computing and Networking ◽

10.1145/3427477.3429781 ◽

2020 ◽

Author(s):

Atanu Mandal ◽

Amir Sinaeepourfard ◽

Sudip Kumar Naskar

Keyword(s):

Cloud Computing ◽

Deep Learning ◽

Data Analysis ◽

Visual Data ◽

Computing Environment ◽

Cloud Computing Environment

Real-time multiple object tracking using deep learning methods

Neural Computing and Applications ◽

10.1007/s00521-021-06391-y ◽

2021 ◽

Author(s):

Dimitrios Meimetis ◽

Ioannis Daramouskas ◽

Isidoros Perikos ◽

Ioannis Hatzilygeroudis

Keyword(s):

Deep Learning ◽

Object Tracking ◽

Real Time ◽

Multiple Object Tracking ◽

Learning Methods ◽

Multiple Object

Multiple Pedestrians and Vehicles Tracking in Aerial Imagery Using a Convolutional Neural Network

Remote Sensing ◽

10.3390/rs13101953 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1953

Author(s):

Seyed Majid Azimi ◽

Maximilian Kraus ◽

Reza Bahmanyar ◽

Peter Reinartz

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Object Tracking ◽

Short Term Memory ◽

Aerial Imagery ◽

Future Research ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

In this paper, we address various challenges in multi-pedestrian and vehicle tracking in high-resolution aerial imagery through an intensive evaluation of a number of traditional and Deep Learning-based Single- and Multi-Object Tracking methods. We also describe our proposed Deep Learning-based Multi-Object Tracking method, AerialMPTNet, which fuses appearance, temporal, and graphical information using a Siamese Neural Network, a Long Short-Term Memory, and a Graph Convolutional Neural Network module for more accurate and stable tracking. Moreover, we investigate the influence of Squeeze-and-Excitation layers and Online Hard Example Mining on the performance of AerialMPTNet. To the best of our knowledge, we are the first to use these two for regression-based Multi-Object Tracking. Additionally, we study and compare the L1 and Huber loss functions. In our experiments, we extensively evaluate AerialMPTNet on three aerial Multi-Object Tracking datasets, namely AerialMPT and the KIT AIS pedestrian and vehicle datasets. Qualitative and quantitative results show that AerialMPTNet outperforms all previous methods on the pedestrian datasets and achieves competitive results on the vehicle dataset. In addition, the Long Short-Term Memory and Graph Convolutional Neural Network modules enhance tracking performance. Using Squeeze-and-Excitation and Online Hard Example Mining helps significantly in some cases but degrades the results in others. According to the results, L1 loss yields better results than Huber loss in most scenarios. The presented results provide deep insight into the challenges and opportunities of the aerial Multi-Object Tracking domain, paving the way for future research.
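
The L1 and Huber losses compared in the study differ mainly in how they treat small regression errors. A minimal sketch follows, assuming a mean reduction over the regressed coordinates and a Huber threshold `delta = 1.0`; neither setting is given in the abstract, and the function names and box offsets are illustrative.

```python
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error over the regressed coordinates."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def huber_loss(pred: Sequence[float], target: Sequence[float],
               delta: float = 1.0) -> float:
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(pred)


# Hypothetical box offsets (e.g. x, y, w, h) with one small and one large error.
pred = [0.5, 3.0, 0.0, 0.0]
target = [0.0, 0.0, 0.0, 0.0]
print(l1_loss(pred, target))     # 0.875
print(huber_loss(pred, target))  # 0.65625
```

Below `delta`, the Huber loss is quadratic, so its gradient shrinks toward zero as the prediction approaches the target, whereas L1 keeps a constant-magnitude gradient. Beyond `delta` both grow linearly with the error.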

Deep Learning of Appearance Affinity for Multi-Object Tracking and Re-Identification: A Comparative View

Electronics ◽

10.3390/electronics9111757 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1757

Author(s):

María J. Gómez-Silva ◽

Arturo de la Escalera ◽

José M. Armingol

Keyword(s):

Deep Learning ◽

Object Tracking ◽

Loss Function ◽

Neural Model ◽

Training Data ◽

Learning Approaches ◽

The Core ◽

Triplet Loss ◽

Affinity Model

Recognizing the identity of a query individual in a surveillance sequence is the core of Multi-Object Tracking (MOT) and Re-Identification (Re-Id) algorithms. Both tasks can be addressed by measuring the appearance affinity between observations of people with a deep neural model. Nevertheless, the differences in their specifications, and consequently in the characteristics and constraints of the training data available for each task, make it necessary to employ a different learning approach for each. This article offers a comparative view of the Double-Margin-Contrastive and Triplet loss functions, and analyzes the benefits and drawbacks of applying each of them to learn an Appearance Affinity model for Tracking and Re-Identification. A set of experiments has been conducted, and the results support the hypothesis drawn from the presented study: the Triplet loss function is more effective than the Contrastive one when a Re-Id model is learned, whereas in the MOT domain, the Contrastive loss better discriminates between pairs of images that show the same person and pairs that do not.
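
The two loss functions compared in the article can be sketched as follows, assuming Euclidean distances between appearance embeddings. The double-margin contrastive form shown here (a squared hinge with a positive margin `m_pos` and a negative margin `m_neg`) and all margin values are assumptions for illustration; the article's exact formulation may differ.

```python
import math
from typing import Sequence

Vec = Sequence[float]  # appearance embedding


def double_margin_contrastive(f1: Vec, f2: Vec, same: bool,
                              m_pos: float = 0.5, m_neg: float = 1.5) -> float:
    """Pairwise loss: pull same-identity pairs within m_pos,
    push different-identity pairs beyond m_neg."""
    d = math.dist(f1, f2)
    if same:
        return max(0.0, d - m_pos) ** 2
    return max(0.0, m_neg - d) ** 2


def triplet(anchor: Vec, positive: Vec, negative: Vec, margin: float = 1.0) -> float:
    """Relative loss: the negative must be at least `margin` farther
    from the anchor than the positive is."""
    return max(0.0, math.dist(anchor, positive)
               - math.dist(anchor, negative) + margin)


a, p, n = (0.0, 0.0), (0.0, 1.0), (0.0, 1.5)
print(double_margin_contrastive(a, p, same=True))   # 0.25
print(double_margin_contrastive(a, n, same=False))  # 0.0
print(triplet(a, p, n))                             # 0.5
```

The contrastive loss scores each pair against absolute distance thresholds, whereas the triplet loss only constrains the relative ordering of distances around an anchor.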

Automatic Handgun Detection with Deep Learning in Video Surveillance Images

Applied Sciences ◽

10.3390/app11136085 ◽

2021 ◽

Vol 11 (13) ◽

pp. 6085

Author(s):

Jesus Salido ◽

Vanesa Lomas ◽

Jesus Ruiz-Santaquiteria ◽

Oscar Deniz

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Video Surveillance ◽

Automatic Detection ◽

Public Spaces ◽

Detection Methods ◽

Training Dataset ◽

Average Precision ◽

Terrorist Acts

There is a great need to implement preventive mechanisms against shootings and terrorist acts in public spaces with a large influx of people. While surveillance cameras have become common, the need for 24/7 monitoring and real-time response calls for automatic detection methods. This paper presents a study based on three convolutional neural network (CNN) models applied to the automatic detection of handguns in video surveillance images. It investigates whether false positives can be reduced by including, in the training dataset, pose information associated with the way the handguns are held in the images. The results show that RetinaNet fine-tuned with the unfrozen ResNet-50 backbone obtained the best average precision (96.36%) and recall (97.23%), while YOLOv3 trained on the dataset including pose information obtained the best precision (96.23%) and F1 score (93.36%). YOLOv3 was the only architecture that showed a consistent improvement, of around 2%, when pose information was explicitly considered during training.
