Violence Detection Algorithm Based on Local Spatio-temporal Features and Optical Flow

Author(s):  
Yao Lyu ◽  
Yingyun Yang

Author(s):  
Qing Xia ◽  
Ping Zhang ◽  
JingJing Wang ◽  
Ming Tian ◽  
Chun Fei

Author(s):  
Mujtaba Asad ◽  
He Jiang ◽  
Jie Yang ◽  
Enmei Tu ◽  
Aftab A. Malik

Detection of violent human behavior is necessary for public safety and monitoring. In human-based surveillance systems, however, it demands constant human observation and attention, which is a challenging task. Autonomous detection of violent human behavior is therefore essential for continuous, uninterrupted video surveillance. In this paper, we propose a novel method for violence detection and localization in videos using the fusion of spatio-temporal features with an attention model. The model consists of a Fusion Convolutional Neural Network (Fusion-CNN), spatio-temporal attention modules and Bi-directional Convolutional LSTMs (BiConvLSTM). The Fusion-CNN learns both spatial and temporal features by combining multi-level inter-layer features from RGB and optical-flow input frames. The spatial attention module generates an importance mask to focus on the most important areas of the image frame. The temporal attention part, based on BiConvLSTM, identifies the video frames most relevant to violent activity. The proposed model can also localize and discriminate prominent regions in both the spatial and temporal domains, given weakly supervised training with only video-level classification labels. Experimental results on several publicly available benchmark datasets show the superior performance of the proposed model in comparison with existing methods. Our model achieves improved accuracies (ACC) of 89.1%, 99.1% and 98.15% on the RWF-2000, HockeyFight and Crowd-Violence datasets, respectively. For the CCTV-FIGHTS dataset we use mean average precision (mAP) as the performance metric, and our model obtains 80.7% mAP.
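
The spatial attention idea described in this abstract can be pictured with a minimal PyTorch-style sketch: a single module that produces a per-pixel importance mask and re-weights the fused feature map. The module name, channel layout and 1x1-convolution design below are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Generate a per-pixel importance mask and re-weight the feature map.
    A generic formulation; the paper's exact layer configuration may differ."""
    def __init__(self, in_channels):
        super().__init__()
        # 1x1 convolution collapses the channels into a single attention map
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, x):                    # x: (B, C, H, W) fused RGB/flow features
        mask = torch.sigmoid(self.conv(x))   # (B, 1, H, W) importance mask in [0, 1]
        return x * mask, mask                # re-weighted features + mask for localization
```

Returning the mask alongside the re-weighted features is what would allow spatial localization of violent regions under the weakly supervised, video-level labels mentioned above.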


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1601
Author(s):  
Fernando J. Rendón-Segador ◽  
Juan A. Álvarez-García ◽  
Fernando Enríquez ◽  
Oscar Deniz

Introducing efficient automatic violence detection in video surveillance or audiovisual content monitoring systems would greatly facilitate the work of closed-circuit television (CCTV) operators, rating agencies or those in charge of monitoring social network content. In this paper we present a new deep learning architecture, using a version of DenseNet adapted to three dimensions, a multi-head self-attention layer and a bidirectional convolutional long short-term memory (LSTM) module, that encodes relevant spatio-temporal features to determine whether a video is violent or not. Furthermore, an ablation study of the input frames is carried out, comparing dense optical flow with adjacent-frame subtraction and assessing the influence of the attention layer; it shows that the combination of optical flow and the attention mechanism improves results by up to 4.4%. Experiments conducted on four of the most widely used datasets for this problem match or exceed state-of-the-art results in some cases, while reducing the number of network parameters needed (4.5 million) and improving efficiency, with test accuracy ranging from 95.6% on the most complex dataset to 100% on the simplest one and inference times below 0.3 s for the longest clips. Finally, to check whether the generated model is able to generalize violence, a cross-dataset analysis is performed, which shows the difficulty of this task: training on three datasets and testing on the remaining one, accuracy drops to 70.08% in the worst case and 81.51% in the best case, which points to future work oriented towards anomaly detection on new datasets.
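
The two motion representations compared in the ablation can be sketched with OpenCV. The pre-processing choices below (grayscale conversion, Farneback parameters, normalization) are assumptions for illustration rather than the paper's exact pipeline.

```python
import cv2
import numpy as np

def motion_inputs(prev_bgr, curr_bgr):
    """Compute the two motion cues compared in the ablation:
    dense optical flow (Farneback) and adjacent-frame subtraction."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)

    # Dense optical flow: (H, W, 2) displacement field between the two frames
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    # Adjacent-frame subtraction: a cheaper motion cue, (H, W) in [0, 1]
    diff = cv2.absdiff(curr_gray, prev_gray).astype(np.float32) / 255.0
    return flow, diff
```

Either representation could then be stacked over a clip and fed to the 3D network stream alongside the RGB frames.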


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2052
Author(s):  
Xinghai Yang ◽  
Fengjiao Wang ◽  
Zhiquan Bai ◽  
Feifei Xun ◽  
Yulin Zhang ◽  
...  

In this paper, a deep learning-based traffic state discrimination method is proposed to detect traffic congestion at urban intersections. The method includes two parts: global speed detection and a traffic state discrimination algorithm. Firstly, the road intersection is selected as the region of interest (ROI) in the input image, and the You Only Look Once (YOLO) v3 object detection algorithm is applied for vehicle target detection. The Lucas-Kanade (LK) optical flow method is then employed to calculate the vehicle speed: the position information obtained by YOLOv3 is fed into the LK optical flow algorithm, which forms optical flow vectors to complete the speed estimation. The corresponding intersection state is then obtained from the vehicle speeds and the discrimination algorithm. Experimental results show that the algorithm detects vehicle speed reliably and that the traffic state discrimination method judges the traffic state accurately, with strong anti-interference ability, meeting practical application requirements.
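
A rough Python/OpenCV sketch of the speed-estimation step is given below. The box format, the calibration constant px_per_meter, and the use of box centres as the tracked points are assumptions for illustration, not the paper's exact procedure.

```python
import cv2
import numpy as np

def vehicle_speeds(prev_gray, curr_gray, boxes, fps, px_per_meter):
    """Estimate per-vehicle speed from LK optical flow applied to the
    centres of YOLO-detected boxes in two consecutive grayscale frames.
    boxes: list of (x, y, w, h) detections from the previous frame."""
    pts = np.array([[[x + w / 2.0, y + h / 2.0]] for (x, y, w, h) in boxes],
                   dtype=np.float32)                      # (N, 1, 2) box centres
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)

    speeds = []
    for p0, p1, ok in zip(pts, nxt, status.ravel()):
        if not ok:
            speeds.append(None)                           # tracking failed
            continue
        disp_px = np.linalg.norm(p1[0] - p0[0])           # displacement in pixels
        speeds.append(disp_px * fps / px_per_meter)       # metres per second
    return speeds
```

The discrimination step would then compare these per-vehicle speeds against the state criteria to label the intersection as congested or free-flowing.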


2021 ◽  
Author(s):  
Monir Torabian ◽  
Hossein Pourghassem ◽  
Homayoun Mahdavi-Nasab

2020 ◽  
Vol 34 (07) ◽  
pp. 10713-10720
Author(s):  
Mingyu Ding ◽  
Zhe Wang ◽  
Bolei Zhou ◽  
Jianping Shi ◽  
Zhiwu Lu ◽  
...  

A major challenge for video semantic segmentation is the lack of labeled data. In most benchmark datasets, only one frame of a video clip is annotated, which prevents most supervised methods from utilizing information in the remaining frames. To exploit the spatio-temporal information in videos, many previous works use pre-computed optical flow, which encodes temporal consistency to improve video segmentation. However, video segmentation and optical flow estimation are still treated as two separate tasks. In this paper, we propose a novel framework for joint video semantic segmentation and optical flow estimation. Semantic segmentation brings semantic information to handle occlusion for more robust optical flow estimation, while the non-occluded optical flow provides accurate pixel-level temporal correspondences to guarantee the temporal consistency of the segmentation. Moreover, our framework is able to utilize both labeled and unlabeled frames in the video through joint training, while no additional computation is required at inference. Extensive experiments show that the proposed model makes video semantic segmentation and optical flow estimation benefit from each other and outperforms existing methods under the same settings on both tasks.
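
One way to picture such a joint objective is a supervised segmentation loss on the annotated frame plus a flow-warped consistency term on unlabeled frames, applied only where the flow is non-occluded. The PyTorch sketch below is a generic formulation under that assumption, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def joint_loss(seg_logits_t, seg_logits_t1, labels_t, flow, occlusion_mask,
               lambda_con=1.0):
    """Supervised cross-entropy on the labeled frame t plus a consistency
    term between the flow-warped frame-t logits and the frame-(t+1) logits.
    flow: (B, 2, H, W); occlusion_mask: (B, 1, H, W) with 1 = non-occluded."""
    # Supervised loss on the single annotated frame
    sup = F.cross_entropy(seg_logits_t, labels_t)

    # Build a sampling grid: base pixel coordinates displaced by the flow
    b, _, h, w = seg_logits_t.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().to(flow.device)   # (H, W, 2)
    grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)            # (B, H, W, 2)
    gx = 2.0 * grid[..., 0] / (w - 1) - 1.0                        # normalize to [-1, 1]
    gy = 2.0 * grid[..., 1] / (h - 1) - 1.0
    warped = F.grid_sample(seg_logits_t, torch.stack((gx, gy), dim=-1),
                           align_corners=True)

    # Consistency penalty only on non-occluded pixels
    con = (occlusion_mask * (warped - seg_logits_t1) ** 2).mean()
    return sup + lambda_con * con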

