End-to-End Deep One-Class Learning for Anomaly Detection in UAV Video Stream

2021 ◽  
Vol 7 (5) ◽  
pp. 90
Author(s):  
Slim Hamdi ◽  
Samir Bouindour ◽  
Hichem Snoussi ◽  
Tian Wang ◽  
Mohamed Abid

In recent years, the use of drones for surveillance tasks has been on the rise worldwide. In the context of anomaly detection, however, only normal events are available for the learning process, so implementing a generative learning method in an unsupervised mode becomes fundamental. We therefore propose a new end-to-end architecture capable of generating optical flow images from original UAV images and extracting compact spatio-temporal features for anomaly detection. It is trained with a custom loss function that is the sum of three terms, the reconstruction loss (Rl), the generation loss (Gl), and the compactness loss (Cl), to ensure efficient deep one-class classification. In addition, we propose to minimize the effect of UAV motion in video processing by applying background subtraction to the optical flow images. We tested our method on a very complex dataset, the mini-drone video dataset, and obtained results surpassing existing techniques, with an AUC of 85.3.
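The loss is described only at the level of its three terms; the following is a minimal PyTorch sketch of how such a composite loss could be assembled. The weights lambda_g and lambda_c, the feature centre c, and all tensor names are illustrative assumptions, not the authors' implementation.

# Hedged sketch: composite one-class loss Rl + Gl + Cl (PyTorch).
# lambda_g, lambda_c and the feature centre `c` are assumptions,
# not values reported in the paper.
import torch
import torch.nn.functional as F

def one_class_loss(x, x_rec, flow_true, flow_gen, feats, c,
                   lambda_g=1.0, lambda_c=0.1):
    """Sum of reconstruction (Rl), generation (Gl) and compactness (Cl) terms."""
    rl = F.mse_loss(x_rec, x)                   # Rl: rebuild the input frame
    gl = F.l1_loss(flow_gen, flow_true)         # Gl: match the optical-flow target
    cl = ((feats - c) ** 2).sum(dim=1).mean()   # Cl: pull features toward centre c
    return rl + lambda_g * gl + lambda_c * cl

At test time, frames whose features fall far from c or that reconstruct poorly would score as anomalous.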

2018 ◽  
pp. 1431-1460
Author(s):  
Jeyabharathi D ◽  
Dejey D

Developing universal methods for background subtraction and object tracking is one of the most critical and difficult challenges in many video processing and computer-vision applications. To achieve superior foreground detection quality across unconstrained scenarios, a novel Two-Layer Rotational Symmetry Dynamic Texture (RSDT) model is proposed, which avoids illumination variations by using two layers of spatio-temporal patches. Spatio-temporal patches describe both motion and appearance parameters in a video sequence. The concept of a key frame is used to avoid redundant samples. An Auto-Regressive Integrated Moving Average (ARIMA) model (Hyndman & Rob, 2015) estimates the statistical parameters from the subspace. The uniform Local Derivative Pattern (LDP) (Zhang et al., 2010) serves as the feature for tracking objects in a video. Extensive experimental evaluations on a wide range of benchmark datasets validate the efficiency of RSDT compared to the Center-Symmetric Spatio-Temporal Local Ternary Pattern (CS-STLTP) (Lin et al., 2015) for unconstrained video analytics.
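As an illustration of the ARIMA step, the sketch below fits an ARIMA model to the mean-intensity time series of one spatio-temporal patch using statsmodels. The patch geometry and the (1, 0, 1) order are assumptions chosen for demonstration, not the RSDT configuration.

# Hedged sketch: ARIMA parameters for one spatio-temporal patch.
# Patch location/size and the (1, 0, 1) order are illustrative assumptions.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def patch_arima_params(frames, y=0, x=0, size=16, order=(1, 0, 1)):
    """frames: (T, H, W) grayscale stack; returns fitted ARIMA parameters."""
    patch = frames[:, y:y + size, x:x + size]             # one spatio-temporal patch
    series = patch.reshape(len(frames), -1).mean(axis=1)  # per-frame mean intensity
    return ARIMA(series, order=order).fit().params        # patch-dynamics summary

params = patch_arima_params(np.random.rand(100, 64, 64))  # 100 synthetic frames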


Author(s):  
Kadriye Oz ◽  
Ismail Rakip Karas

In this paper, we present an anomaly detection and localization system for surveillance. A new feature descriptor is proposed: spatio-temporal descriptors are obtained from videos containing only normal conditions by combining optical flow histograms with the structural similarity index (SSIM). An artificial neural network, the self-organizing map (SOM), is used for modeling. The proposed system has been tested on the UCSD dataset.
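A minimal sketch of how such a per-frame descriptor could be computed with OpenCV and scikit-image follows; the Farnebäck flow method and the 8-bin orientation histogram are assumptions, since the abstract does not specify these details.

# Hedged sketch: optical-flow histogram + SSIM as a frame descriptor.
# Farneback flow and the 8-bin orientation histogram are assumptions.
import cv2
import numpy as np
from skimage.metrics import structural_similarity

def frame_descriptor(prev_gray, gray, bins=8):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hof, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi),
                          weights=mag)                 # motion cue
    ssim = structural_similarity(prev_gray, gray)      # appearance-change cue
    return np.append(hof / (hof.sum() + 1e-8), ssim)

Descriptors gathered from normal footage would then train the self-organizing map; test frames mapping far from every SOM unit are flagged as anomalous.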



Author(s):  
Meenal Suryakant Vatsaraj ◽  
Rajan Vishnu Parab ◽  
D S Bade

We detect and localize anomalous behavior, that is, behavior deviating from the dominant pattern, in videos of crowded areas. Appearance and motion information are both taken into account to robustly identify different kinds of anomaly across a wide range of scenes. Our approach, based on the histogram of oriented gradients (HOG) and a Markov random field (MRF), captures the varying dynamics of the crowded environment: HOG features combined with the well-known MRF effectively recognize and characterize each frame of each scene. Anomaly detection with an artificial neural network uses both appearance and motion features extracted within the spatio-temporal domain of moving pixels, which ensures robustness to local noise and thus increases the accuracy of local anomaly detection at low computational cost. To extract a region of interest, the background must first be subtracted; background subtraction can be done by various methods such as the weighted moving mean, the Gaussian mixture model, and kernel density estimation.
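The abstract lists the background-subtraction options without code; below is a minimal OpenCV sketch pairing the Gaussian mixture model subtractor with HOG extraction on the foreground. The default MOG2 and HOG parameters are assumptions, not the authors' settings.

# Hedged sketch: GMM background subtraction + HOG on the foreground.
# Default MOG2/HOG parameters are illustrative assumptions.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2()    # Gaussian mixture model
hog = cv2.HOGDescriptor()                            # default 64x128 window

def foreground_hog(frame_bgr):
    mask = subtractor.apply(frame_bgr)               # mask of moving pixels
    fg = cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)
    roi = cv2.resize(fg, (64, 128))                  # fit the HOG window
    return hog.compute(cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY))

The weighted moving mean and kernel density estimation alternatives mentioned above would simply replace the subtractor while the HOG stage stays unchanged.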


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 97949-97958 ◽  
Author(s):  
Yizhong Yang ◽  
Tao Zhang ◽  
Jinzhao Hu ◽  
Dong Xu ◽  
Guangjun Xie

2019 ◽  
Vol 102 (9) ◽  
pp. 19-26 ◽  
Author(s):  
Yutaka Suzuki ◽  
Kyosuke Hatsushika ◽  
Keisuke Masuyama ◽  
Osamu Sakata ◽  
Morimasa Tanimoto ◽  
...  

2020 ◽  
Vol 17 (4) ◽  
pp. 497-506
Author(s):  
Sunil Patel ◽  
Ramji Makwana

Automatic classification of dynamic hand gestures is challenging due to the large diversity across gesture classes, low resolution, and the fact that gestures are performed with the fingers. These challenges have drawn many researchers to the area. Recently, deep neural networks have been used for implicit feature extraction, with a softmax layer for classification. In this paper, we propose a method based on a two-dimensional convolutional neural network that performs detection and classification of hand gestures simultaneously from multimodal Red, Green, Blue, Depth (RGBD) and optical flow data, and passes the resulting features to a Long Short-Term Memory (LSTM) recurrent network for frame-to-frame probability generation, with a Connectionist Temporal Classification (CTC) network for loss calculation. Optical flow is computed from the Red, Green, Blue (RGB) data to capture the motion information present in the video. The CTC model efficiently evaluates all possible alignments of a hand gesture via dynamic programming and checks frame-to-frame consistency of the gesture's visual similarity in the unsegmented input stream. The CTC network finds the most probable frame sequence for a gesture class, and the frame with the highest probability value is selected by max decoding. The entire network is trained end-to-end by minimizing the CTC loss. We evaluate on the challenging Vision for Intelligent Vehicles and Applications (VIVA) dataset for dynamic hand gesture recognition, captured with RGB and depth data. On this VIVA dataset, our proposed hand gesture recognition technique outperforms competing state-of-the-art algorithms with an accuracy of 86%.
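As a rough illustration of the described pipeline, the PyTorch sketch below chains a small per-frame 2D CNN, an LSTM, and torch.nn.CTCLoss. The layer sizes, the five input channels (RGBD plus a flow-magnitude channel), and the number of gesture classes are assumptions, not the paper's architecture.

# Hedged sketch: per-frame 2D CNN -> LSTM -> CTC loss (PyTorch).
# Channel counts, hidden size and n_classes are illustrative assumptions.
import torch
import torch.nn as nn

class GestureCTC(nn.Module):
    def __init__(self, in_ch=5, n_classes=20, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                     # per-frame features
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes + 1)  # +1 for the CTC blank

    def forward(self, clips):                         # clips: (B, T, C, H, W)
        b, t = clips.shape[:2]
        f = self.cnn(clips.flatten(0, 1)).view(b, t, 64)
        out, _ = self.lstm(f)
        return self.head(out).log_softmax(-1)         # (B, T, n_classes + 1)

model = GestureCTC()
logp = model(torch.randn(2, 16, 5, 64, 64))           # two 16-frame clips
loss = nn.CTCLoss()(logp.transpose(0, 1),             # CTC wants (T, B, C)
                    torch.tensor([[3], [7]]),          # one gesture label per clip
                    torch.full((2,), 16, dtype=torch.long),
                    torch.tensor([1, 1]))

Max decoding then takes the argmax class per frame and collapses repeats and blanks to recover the predicted gesture sequence.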


2011 ◽  
Vol 38 (9) ◽  
pp. 866-871 ◽  
Author(s):  
Zhi-Hua HUANG ◽  
Ming-Hong LI ◽  
Yuan-Ye MA ◽  
Chang-Le ZHOU
