Early Action Prediction using 3DCNN with LSTM and Bidirectional LSTM

Author(s):  
Mrs. Manju D, Dr. Seetha M, Dr. Sammulal P

Predicting and identifying suspicious activities beforehand is highly beneficial because it increases the protective value of video surveillance cameras. Detecting and predicting a human action before it is carried out has a variety of uses, such as autonomous robots, surveillance, and health care. The main focus of this paper is the automated recognition of human actions in surveillance videos. 3DCNN (3-Dimensional Convolutional Neural Network) is based on 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames. The 3DCNN is combined with Long Short-Term Memory (LSTM) and Bidirectional LSTM networks to predict abnormal events from past observations of events in a video stream. It is observed that 3DCNN with LSTM yields higher accuracy than 3DCNN with Bidirectional LSTM. The experiments were carried out on the UCF-Crime dataset.
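The key property named above is that a 3D convolution spans adjacent frames, not just a single image. The following minimal sketch (an illustration, not the authors' code; all shapes and values are assumptions) shows a single 3D convolution whose temporal kernel responds to change across frames, which a per-frame 2D filter cannot capture:

```python
# Minimal sketch: a valid 3D convolution over a (T x H x W) clip.
# The temporal-difference kernel below responds to motion between frames.

def conv3d_valid(clip, kernel):
    """Valid 3D convolution of a clip (T x H x W) with a kernel (t x h x w)."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    t, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for i in range(T - t + 1):
        plane = []
        for j in range(H - h + 1):
            row = []
            for k in range(W - w + 1):
                s = 0.0
                for di in range(t):
                    for dj in range(h):
                        for dk in range(w):
                            s += clip[i + di][j + dj][k + dk] * kernel[di][dj][dk]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

# A 3-frame "moving dot" clip: the dot drifts diagonally across frames.
clip = [
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],   # frame 0
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],   # frame 1
    [[0, 0, 0], [0, 0, 0], [0, 1, 0]],   # frame 2
]
# Temporal-difference kernel (1x1 spatially): last frame minus first.
kernel = [[[-1.0]], [[0.0]], [[1.0]]]
motion = conv3d_valid(clip, kernel)
print(motion)  # a 3x3 response map: -1 where the dot left, +1 where it arrived
```

In the paper's pipeline such motion-aware feature maps, rather than raw frames, are what the downstream LSTM or Bidirectional LSTM consumes.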

2021 ◽  
Vol 9 (1) ◽  
pp. 666-672
Author(s):  
Manju D, Dr. Seetha M, Dr. Sammulal P

Action prediction plays a key role where an expected action needs to be identified before it is completely performed: prediction means inferring a potential action at its early stage, before it fully occurs. This paper emphasizes early action prediction, i.e., predicting an action before it finishes. In real-time scenarios, early prediction can be crucial and has many applications, such as automated driving systems, healthcare, video surveillance, and other settings where a proactive response is needed before the situation goes out of control. The VGG16 model, a convolutional neural network 16 layers deep, is used for early action prediction. Besides its capability of classifying objects in frames, the free availability of its pre-trained weights enhances its usefulness, as they are widely reused in other applications and models. The VGG16 model, combined with a bidirectional LSTM structure, enables the network to use both backward and forward information at every time step. The proposed approach improved accuracy over a GAN model across observation ratios ranging from 0.1 to 1.0.
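The observation ratio mentioned above is the standard early-prediction protocol: the classifier sees only the first fraction r of a video's frames. A small sketch of that evaluation setup (the helper name and frame counts are assumptions, not the paper's code):

```python
# Sketch of early-prediction evaluation: classify a video after observing
# only the first `ratio` fraction of its frames.

def partial_observation(frames, ratio):
    """Return the first `ratio` fraction of a frame sequence (at least 1 frame)."""
    n = max(1, int(len(frames) * ratio))
    return frames[:n]

frames = list(range(30))  # stand-in for a 30-frame video
for r in [0.1, 0.5, 1.0]:
    observed = partial_observation(frames, r)
    # In the paper's pipeline, `observed` would be encoded frame-by-frame with
    # VGG16 and the feature sequence fed to a bidirectional LSTM classifier.
    print(r, len(observed))
```

Accuracy is then reported per ratio, so a curve from r = 0.1 to r = 1.0 shows how early the model becomes reliable.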


2021 ◽  
Vol 9 (3A) ◽  
Author(s):  
Sheeraz Arif ◽  
◽  
Jing Wang ◽  
Adnan Ahmed Siddiqui ◽  
Rashid Hussain ◽  
...  

Deep convolutional neural networks (DCNN) and recurrent neural networks (RNN) have proven to be an important research area in multimedia understanding and have obtained remarkable action recognition performance. However, videos contain rich motion information with varying dimensions, and existing recurrent pipelines fail to capture long-term motion dynamics in videos with various motion scales and complex actions performed by multiple actors. Considering contextual and salient features is more important than mapping a video frame into a static video representation. This research work provides a novel pipeline that analyzes and processes video information using a 3D convolutional (C3D) network and a newly introduced deep bidirectional LSTM. Like the popular two-stream ConvNet, we also introduce a two-stream framework, with one modification: we replace the optical flow stream with a saliency-aware stream to avoid its computational complexity. First, we generate a saliency-aware video stream by applying a saliency-detection method. Secondly, a two-stream 3D convolutional network (C3D) is applied to two different types of stream, i.e., the RGB stream and the saliency-aware video stream, to collect both spatial and semantic temporal features. Next, a deep bidirectional LSTM network is used to learn sequential deep temporal dynamics. Finally, a time-series pooling layer and softmax layers classify human activity and behavior. The introduced system can learn long-term temporal dependencies and can predict complex human actions. Experimental results demonstrate significant improvements in action recognition accuracy on different benchmark datasets.
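The final classification stage described above (fusing the two streams, pooling over time, then softmax) can be sketched as follows. This is a hedged illustration with toy logits, not the trained network; the averaging fusion and shapes are assumptions:

```python
import math

# Sketch of two-stream fusion + time-series pooling + softmax classification.

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_pool(per_frame_scores):
    """Mean-pool per-frame class-score vectors over the time axis."""
    T = len(per_frame_scores)
    C = len(per_frame_scores[0])
    return [sum(f[c] for f in per_frame_scores) / T for c in range(C)]

# Toy per-frame logits for 3 action classes from each stream (2 frames).
rgb_stream = [[2.0, 0.5, 0.1], [1.8, 0.7, 0.2]]
saliency_stream = [[1.5, 0.4, 0.0], [1.7, 0.6, 0.1]]

# Fuse the two streams frame-by-frame (simple averaging as an assumption).
fused = [[(a + b) / 2 for a, b in zip(f1, f2)]
         for f1, f2 in zip(rgb_stream, saliency_stream)]
probs = softmax(temporal_pool(fused))
print(probs.index(max(probs)))  # predicted action class index
```

In the paper the per-frame scores would come from the deep bidirectional LSTM rather than being hand-written constants.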


2020 ◽  
Vol 39 (6) ◽  
pp. 8463-8475
Author(s):  
Palanivel Srinivasan ◽  
Manivannan Doraipandian

Rare event detection is performed using spatial-domain and frequency-domain procedures. Footage from omnipresent surveillance cameras is growing exponentially over time, and monitoring all events manually is an impractical, time-consuming process. Therefore, an automated rare-event detection mechanism is required to make this process manageable. In this work, a Context-Free Grammar (CFG) is developed for detecting rare events in a video stream, and an Artificial Neural Network (ANN) is trained on the CFG. A set of dedicated algorithms performs frame splitting, edge detection, and background subtraction, and converts the processed data into a CFG. The developed CFG is converted into nodes and edges to form a graph, which is given to the input layer of an ANN to classify normal and rare event classes. The graph derived from the CFG over the input video stream is used to train the ANN. Further, the performance of the developed Artificial Neural Network Based Context-Free Grammar Rare Event Detection (ACFG-RED) is compared with other existing techniques using performance metrics such as accuracy, precision, sensitivity, recall, average processing time, and average processing power. Better metric values were observed for the ANN-CFG model compared with other techniques. The developed model provides a better solution for detecting rare events in video streams.
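One plausible reading of the CFG-to-graph step above is that each production rule contributes directed edges from its left-hand symbol to its right-hand symbols, and the graph's adjacency matrix is flattened into the ANN input vector. The sketch below illustrates that reading; the representation, symbol names, and toy grammar are assumptions, not the ACFG-RED implementation:

```python
# Sketch: convert grammar productions into a node/edge graph, then flatten
# the adjacency matrix into a feature vector for an ANN input layer.

def grammar_to_graph(productions):
    """Each production (lhs, rhs_symbols) adds edges lhs -> each rhs symbol."""
    nodes = sorted({lhs for lhs, _ in productions} |
                   {s for _, rhs in productions for s in rhs})
    index = {n: i for i, n in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]
    for lhs, rhs in productions:
        for s in rhs:
            adj[index[lhs]][index[s]] = 1
    return nodes, adj

# Toy grammar over frame-level events: S -> Enter Move, Move -> Move Exit
productions = [("S", ["Enter", "Move"]), ("Move", ["Move", "Exit"])]
nodes, adj = grammar_to_graph(productions)
features = [v for row in adj for v in row]  # flattened ANN input vector
print(nodes, features)
```

A fixed-length vector like `features` is the kind of input an ANN's input layer expects when classifying normal versus rare event classes.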


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5137
Author(s):  
Elham Eslami ◽  
Hae-Bum Yun

Automated pavement distress recognition is a key step in smart infrastructure assessment. Advances in deep learning and computer vision have improved the automated recognition of pavement distresses in road surface images. This task remains challenging due to the high variation of defects in shapes and sizes, demanding a better incorporation of contextual information into deep networks. In this paper, we show that an attention-based multi-scale convolutional neural network (A+MCNN) improves the automated classification of common distress and non-distress objects in pavement images by (i) encoding contextual information through multi-scale input tiles and (ii) employing a mid-fusion approach with an attention module for heterogeneous image contexts from different input scales. A+MCNN is trained and tested with four distress classes (crack, crack seal, patch, pothole), five non-distress classes (joint, marker, manhole cover, curbing, shoulder), and two pavement classes (asphalt, concrete). A+MCNN is compared with four deep classifiers that are widely used in transportation applications and a generic CNN classifier (as the control model). The results show that A+MCNN consistently outperforms the baselines by 1∼26% on average in terms of the F-score. A comprehensive discussion is also presented regarding how these classifiers perform differently on different road objects, which has been rarely addressed in the existing literature.
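The two ingredients named above, multi-scale input tiles and an attention module that fuses them mid-network, can be sketched with toy numbers. This is a hedged illustration of softmax attention over per-scale features, not the trained A+MCNN; the feature values and relevance scores are assumptions:

```python
import math

# Sketch: attention-weighted mid-fusion of feature vectors computed from
# tiles at several input scales (e.g. small, medium, large crops).

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_fuse(scale_features, scale_scores):
    """Weight each scale's feature vector by its attention weight and sum."""
    weights = softmax(scale_scores)
    dim = len(scale_features[0])
    return [sum(w * f[d] for w, f in zip(weights, scale_features))
            for d in range(dim)]

# Toy 2-D features from three tile scales, plus a learned relevance score
# per scale (here the first scale is scored as most informative).
scale_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scale_scores = [2.0, 0.5, 0.5]
fused = attention_fuse(scale_features, scale_scores)
print(fused)  # one fused vector, dominated by the highest-scored scale
```

Letting the attention scores depend on the image context is what allows the network to emphasize small scales for thin cracks and large scales for patches or shoulders.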
