Human segmentation in surveillance video with deep learning

Author(s):  
Monica Gruosso ◽  
Nicola Capece ◽  
Ugo Erra
Optik ◽  
2020 ◽  
Vol 202 ◽  
pp. 163675
Author(s):  
Ya-Wen Hsu ◽  
Ting-Yen Wang ◽  
Jau-Woei Perng

Sensors ◽  
2019 ◽  
Vol 19 (17) ◽  
pp. 3768 ◽  
Author(s):  
Kong ◽  
Chen ◽  
Wang ◽  
Chen ◽  
Meng ◽  
...  

Vision-based fall-detection methods have been studied previously, but many have limitations in terms of practicality. Because rooms differ, users do not mount the camera or sensors at the same height; however, few studies have taken this into consideration. Moreover, some fall-detection methods lack practicality because only standing, sitting, and falling are taken into account. Hence, this study constructs a data set consisting of various daily activities and fall events and studies the effect of camera/sensor height on fall-detection accuracy. Each activity in the data set is carried out by eight participants in eight directions and captured with a depth camera at five different heights. Many related studies have depended heavily on human segmentation using the Kinect SDK, but this is not reliable enough. To address this issue, this study proposes Enhanced Tracking and Denoising Alex-Net (ETDA-Net) to improve tracking and denoising performance and to classify fall and non-fall events. Experimental results indicate that fall-detection accuracy is affected by camera height, to which ETDA-Net is robust, outperforming traditional deep-learning-based fall-detection methods.
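The abstract does not detail ETDA-Net's denoising stage, but the kind of depth-frame cleanup it improves on can be sketched with a plain median filter, which suppresses the dropout pixels common in depth sensors before any segmentation step (function and window size are illustrative assumptions, not the paper's method):

```python
import numpy as np

def denoise_depth(frame: np.ndarray, k: int = 3) -> np.ndarray:
    """Median-filter a depth frame with a k x k window to suppress speckle/dropout noise."""
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.empty_like(frame)
    h, w = frame.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

# A noisy depth patch: one dropout (zero) pixel amid a flat 1000 mm surface.
patch = np.full((5, 5), 1000, dtype=np.int32)
patch[2, 2] = 0  # depth-sensor dropout
clean = denoise_depth(patch)
print(clean[2, 2])  # the outlier is replaced by the neighborhood median: 1000
```

A learned denoiser such as ETDA-Net would replace this fixed filter, but the input/output contract (noisy depth frame in, cleaned frame out) is the same.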


2021 ◽  
Vol 2 (4) ◽  
Author(s):  
S. Vasavi ◽  
P. Vineela ◽  
S. Venkat Raman

2021 ◽  
Vol 6 (22) ◽  
pp. 60-70
Author(s):  
Bushra Yasmeen ◽  
Haslina Arshad ◽  
Hameedur Rahman

Security has recently been given the highest priority with the rise in the number of antisocial activities taking place. To continuously track individuals and their interactions, CCTV cameras have been deployed in many settings. In a developed world with a population of 1.6 billion, every person is recorded in an image on average 30 times a day. Recording at a resolution of 710*570 amounts to approximately 20 GB per day. Constant manual monitoring of this data makes it hard to judge whether an incident is irregular, and doing so is an almost uphill struggle when a population and its full support are needed. In this paper, we build a system for the detection of suspicious activity using CCTV surveillance video. The system needs to indicate in which frame the behavior is located, as well as in which section of the frame, to allow faster judgment of whether the suspicious activity is unusual. This is done by converting the video into frames and analyzing the persons and their activities from the processed frames. We rely on machine learning and deep learning algorithms to make this possible. To automate the process, we first build a training model from a large number of images (all available images that describe features of suspicious activities) using a convolutional neural network built with the TensorFlow Python module. We can then upload any video into the application; it extracts frames from the uploaded video, and each frame is passed to the trained model to predict its class as suspicious or normal.
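The frame-by-frame pipeline described above (extract frames, score each with the trained model, label suspicious or normal) can be sketched as follows. The CNN itself is stood in for by any callable returning a score in [0, 1], so the control flow can be shown without the paper's trained weights; all names and the threshold are assumptions:

```python
from typing import Callable, List

def classify_video(frames: List[list],
                   score_frame: Callable[[list], float],
                   threshold: float = 0.5) -> List[str]:
    """Label each extracted frame 'suspicious' or 'normal' from a model score.

    In the described pipeline, score_frame would be a TensorFlow CNN trained
    on images of suspicious activities; here it is any callable, so the
    surrounding logic can be exercised on its own.
    """
    return ["suspicious" if score_frame(f) >= threshold else "normal"
            for f in frames]

# Toy stand-in: each 'frame' is a single feature value, high values flag activity.
frames = [[0.1], [0.9], [0.4]]
labels = classify_video(frames, score_frame=lambda f: f[0])
print(labels)  # ['normal', 'suspicious', 'normal']
```

In a real deployment the frame list would come from a video decoder (e.g. OpenCV's `cv2.VideoCapture`), and `score_frame` would wrap the trained model's predict call.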


2021 ◽  
Vol 3 (7) ◽  
Author(s):  
Wael F. Youssef ◽  
Siba Haidar ◽  
Philippe Joly

The purpose of our work is to automatically generate textual video description schemas from surveillance video scenes, compatible with police incident reports. Our proposed approach is based on a generic and flexible context-free ontology. The general schema has the form [actuator] [action] [over/with] [actuated object] [+ descriptors: distance, speed, etc.]. We focus on scenes containing exactly two objects. Through a series of steps, we generate a formatted textual description. We try to identify the existence of an interaction between the two objects, including remote interaction that does not involve physical contact, and we point out when aggression takes place in these cases. We use supervised deep learning to classify scenes into interaction or no-interaction classes and then into subclasses. The chosen descriptors used to represent subclasses are keys in surveillance systems that help generate live alerts and facilitate offline investigation.
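The [actuator] [action] [over/with] [actuated object] [+ descriptors] schema lends itself to a simple renderer; a minimal sketch follows, with field names and the bracketed descriptor formatting being assumptions rather than the paper's exact output format:

```python
from typing import Optional

def format_description(actuator: str, action: str, preposition: str,
                       actuated: str,
                       descriptors: Optional[dict] = None) -> str:
    """Render the [actuator] [action] [over/with] [actuated object] schema,
    appending optional descriptors such as distance or speed."""
    base = f"{actuator} {action} {preposition} {actuated}"
    if descriptors:
        extras = ", ".join(f"{k}: {v}" for k, v in descriptors.items())
        base += f" [{extras}]"
    return base

print(format_description("person A", "throws object", "at", "person B",
                         {"distance": "3 m", "speed": "fast"}))
# person A throws object at person B [distance: 3 m, speed: fast]
```

In the described system, the field values would come from the deep-learning classifiers (interaction vs. no-interaction, then subclass), with the template turning those labels into a report-ready sentence.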


2021 ◽  
Author(s):  
Delong Qi ◽  
Weijun Tan ◽  
Zhifu Liu ◽  
Qi Yao ◽  
Jingfeng Liu

2020 ◽  
Author(s):  
Yu Zhao ◽  
Yue Yin ◽  
Guan Gui

Decentralized edge computing techniques have attracted strong attention in many applications of the intelligent internet of things (IIoT). Among these applications, intelligent edge surveillance (LEDS) techniques play a very important role in automatically recognizing object feature information from surveillance video by virtue of edge computing together with image processing and computer vision. Traditional centralized surveillance techniques recognize objects at the cost of high latency and high cost, and they also require large amounts of storage. In this paper, we propose a deep-learning-based LEDS technique for a specific IIoT application. First, we introduce depthwise separable convolutions to build a lightweight neural network and reduce its computational cost. Second, we combine edge computing with cloud computing to reduce network traffic. Third, we apply the proposed LEDS technique to a practical construction site to validate a specific IIoT application. The detection speed of our proposed lightweight neural network reaches 16 frames per second on edge devices. After fine detection on the cloud server, the detection precision reaches 89%. In addition, the operating cost at the edge device is only one-tenth that of the centralized server.
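The saving from depthwise separable convolutions can be checked with the standard parameter counts: a k x k convolution costs k*k*c_in*c_out parameters, while its depthwise-separable replacement costs k*k*c_in (depthwise) plus c_in*c_out (pointwise). This is the textbook formula, not a figure from the paper; the layer sizes below are arbitrary examples:

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Parameters of a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out

def sep_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise (k*k*c_in) plus pointwise 1x1 (c_in*c_out) parameters."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128
std = conv_params(k, c_in, c_out)      # 73728
sep = sep_conv_params(k, c_in, c_out)  # 8768
print(std, sep, round(std / sep, 1))   # 73728 8768 8.4
```

For 3x3 kernels the saving approaches a factor of 9 as the output channel count grows, which is what makes such networks practical at 16 frames per second on edge hardware.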


2020 ◽  
Vol 53 (5-6) ◽  
pp. 796-806
Author(s):  
Hongchang Li ◽  
Jing Wang ◽  
Jianjun Han ◽  
Jinmin Zhang ◽  
Yushan Yang ◽  
...  

Violent interaction detection is a hot topic in computer vision. However, recent research on violent interaction detection has mainly focused on traditional hand-crafted features and does not make full use of the results of deep learning in computer vision. In this paper, we propose a new robust violent interaction detection framework based on multi-stream deep learning in surveillance scenes. The proposed approach enhances the recognition performance of violent actions in video by fusing three different streams: an attention-based spatial RGB stream, a temporal stream, and a local spatial stream. The attention-based spatial RGB stream learns, through a soft-attention mechanism, the spatial regions of persons that have a high probability of being action regions. The temporal stream employs optical flow as input to extract temporal features. The local spatial stream learns local spatial features using block images as input. Experimental results demonstrate the effectiveness and reliability of the proposed method on three violent interaction datasets: hockey fights, movies, and violent interaction. We also verify the proposed method on our own elevator surveillance video dataset, and its performance is satisfactory.
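The abstract does not specify the fusion rule, but a common late-fusion scheme for such multi-stream models is a weighted average of each stream's per-class scores; a minimal sketch, with equal weights as an assumption:

```python
from typing import List, Optional

def fuse_streams(scores: List[List[float]],
                 weights: Optional[List[float]] = None) -> List[float]:
    """Late-fuse per-class scores from multiple streams by weighted average.

    scores: one list of class probabilities per stream (equal lengths).
    weights: per-stream weights; defaults to a uniform average.
    """
    n = len(scores)
    if weights is None:
        weights = [1.0 / n] * n
    n_classes = len(scores[0])
    return [sum(w * s[c] for w, s in zip(weights, scores))
            for c in range(n_classes)]

# Spatial RGB, temporal, and local spatial streams each score (violent, non-violent):
fused = fuse_streams([[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]])
print([round(x, 2) for x in fused])  # [0.73, 0.27]
```

Unequal weights would let the attention-based spatial stream dominate if it proves more reliable; the paper's actual fusion may differ.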
