Effective multiple person recognition in random video sequences using a convolutional neural network

2019 ◽  
Vol 79 (15-16) ◽  
pp. 11125-11141
Author(s):  
Niraimathi Puhalanthi ◽  
Daw-Tung Lin
Author(s):  
Akshay Divkar ◽  
Rushikesh Bailkar ◽  
Dr. Chhaya S. Pawar

Hand gestures are one of the methods used in sign language for non-verbal communication. They are most commonly used by people with hearing or speech impairments to communicate among themselves or with others. Sign language applications matter because they let people with hearing or speech impairments communicate easily even with those who do not understand sign language. This project takes a basic step toward bridging that communication gap. The main focus of this work is a vision-based system that identifies sign language gestures from video sequences. A vision-based system was chosen because it provides a simpler and more intuitive way for a human to communicate with a computer. Video sequences contain both temporal and spatial features, so two different models are used, one for each. A deep Convolutional Neural Network (CNN) is trained on the spatial features, using frames extracted from the training video sequences. A Recurrent Neural Network (RNN) is trained on the temporal features. The trained CNN makes a prediction for each individual frame, which yields a sequence of predictions, and this sequence is given to the RNN as its training input. Together, the trained CNN and RNN produce the text output for the corresponding gesture.
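The two-stage flow described in this abstract (per-frame predictions first, then a recurrent pass over the prediction sequence) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `frame_model` stands in for a trained CNN, the recurrent part is a plain tanh recurrence rather than any specific RNN variant, and all weight matrices are hypothetical placeholders.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Turn raw class scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_predictions(frames: Sequence[Vector],
                      frame_model: Callable[[Vector], Vector]) -> List[Vector]:
    """Stage 1: run the (hypothetical) per-frame model on every frame."""
    return [softmax(frame_model(frame)) for frame in frames]


def rnn_classify(sequence: Sequence[Vector],
                 w_in: List[Vector],
                 w_rec: List[Vector],
                 w_out: List[Vector]) -> Vector:
    """Stage 2: feed the prediction sequence through a simple tanh recurrence.

    w_in maps an input vector to the hidden state, w_rec maps the previous
    hidden state to the next one, and w_out maps the final hidden state to
    class scores. All three are placeholder weights, assumed already trained.
    """
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * hidden[k] for k in range(len(hidden)))
            )
            for i in range(len(hidden))
        ]
    scores = [sum(row[i] * hidden[i] for i in range(len(hidden))) for row in w_out]
    return softmax(scores)
```

In a real system both stages would be trained networks from a deep learning library; the point here is only the data flow, in which the recurrent stage never sees pixels, only the sequence of per-frame predictions.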


Author(s):  
Thiyagarajan Jayaraman ◽  
Gowri Shankar Chinnusamy

This paper presents a Deep Rain Streaks Removal Convolutional Neural Network (Derain SRCNN) based post-processing optimization algorithm for the High-Efficiency Video Coder (HEVC). Earlier CNN-based denoising optimization algorithms suffered from overfitting and long convergence times when trained on High Definition (HD) video sequences affected by rain streaks. To address these problems, a Derain SRCNN-based post-processing block is introduced in the HEVC encoder. The Derain SRCNN architecture consists of two parallel residual block layers and a Dual Channel Rectification Linear Unit (DCReLU) activation function, with convolutional layers of various sizes. Overfitting is addressed by reducing both the validation error and the training error of the CNN, and convergence time is reduced by choosing a suitable learning rate and kernel weights for the optimization algorithm. The proposed network provides a higher bit-rate reduction and faster convergence for corrupted high-definition video sequences. The experimental results show that the proposed Derain SRCNN-based post-processing filtering method achieves bit-rate reductions of 6.8% and 4.1% for the random access (RA) and low delay P frame (LDP) configurations, respectively.
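The abstract does not define the residual blocks or DCReLU in detail, so the following is only a generic sketch of the residual idea (output = input + learned correction) with two branches run in parallel. It makes several assumptions that are not from the paper: a 1D signal instead of video frames, a plain ReLU swapped in for DCReLU, odd-length kernels, and averaging as the way the two branches are merged.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Zero-padded 1D sliding-window filter that keeps the input length.

    Assumes an odd-length kernel so the window is centered on each sample.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[k] * padded[i + k] for k in range(len(kernel)))
        for i in range(len(signal))
    ]


def relu(values: Sequence[float]) -> List[float]:
    """Plain ReLU, used here as a stand-in for the paper's DCReLU."""
    return [max(0.0, v) for v in values]


def residual_block(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Residual connection: add the filtered, activated signal back to the input."""
    correction = relu(conv1d_same(signal, kernel))
    return [x + c for x, c in zip(signal, correction)]


def parallel_residual(signal: Sequence[float],
                      kernel_a: Sequence[float],
                      kernel_b: Sequence[float]) -> List[float]:
    """Run two residual branches with different kernels and average them.

    Averaging is an illustrative choice for merging the branches.
    """
    branch_a = residual_block(signal, kernel_a)
    branch_b = residual_block(signal, kernel_b)
    return [(a + b) / 2 for a, b in zip(branch_a, branch_b)]
```

With an all-zero kernel a residual block returns its input unchanged, which is the property that lets such a block learn only the correction on top of the signal it receives.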


2020 ◽  
Vol 16 (6) ◽  
pp. 155014772093473 ◽  
Author(s):  
Misbah Ahmad ◽  
Imran Ahmed ◽  
Fakhri Alam Khan ◽  
Fawad Qayum ◽  
Hanan Aljuaid

In video surveillance, person tracking is considered a challenging task. Numerous techniques based on computer vision, machine learning, and deep learning have been developed in recent years, but the majority rely on frontal-view images or video sequences. Advances in convolutional neural networks have changed how object tracking is done: models trained on a number of images or video sequences improve the speed and accuracy of object tracking. In this work, the generalization performance of existing pre-trained deep learning models is investigated for overhead-view person detection and tracking under different experimental conditions. The object tracking method Generic Object Tracking Using Regression Networks (GOTURN), which has yielded outstanding tracking results in recent years, is explored for person tracking in overhead views. This work focuses mainly on overhead-view person tracking using a Faster region convolutional neural network (Faster-RCNN) in combination with the GOTURN architecture: the person is first detected in overhead-view video sequences and then tracked using the GOTURN tracking algorithm. The Faster-RCNN detection model achieved a true detection rate ranging from 90% to 93%, with a false detection rate of at most 0.5%. The GOTURN tracking algorithm achieved similar results, with a success rate ranging from 90% to 94%. Finally, the output results are discussed along with future directions.
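The abstract reports a tracking success rate without stating how it is computed. One common convention, assumed here and not confirmed by the source, counts a frame as a success when the intersection over union (IoU) between the tracked box and the ground-truth box reaches a threshold. The box format, function names, and the 0.5 threshold below are illustrative choices.

```python
from typing import Sequence, Tuple

# Box format assumed for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def success_rate(tracked: Sequence[Box],
                 truth: Sequence[Box],
                 threshold: float = 0.5) -> float:
    """Fraction of frames whose tracked box overlaps the truth by >= threshold."""
    if len(tracked) != len(truth):
        raise ValueError("tracked and truth must cover the same frames")
    if not truth:
        return 0.0
    hits = sum(1 for t, g in zip(tracked, truth) if iou(t, g) >= threshold)
    return hits / len(truth)
```

For example, a tracker that matches the ground truth exactly on one frame and misses completely on another would score 0.5 under this definition.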


2021 ◽  
Vol 24 (4) ◽  
pp. 57-75
Author(s):  
M. Yu. Uzdiaev ◽  
R. N. Iakovlev ◽  
D. M. Dudarenko ◽  
A. D. Zhebrun

Purpose of research. This paper considers the problem of identifying a person by gait using neural network recognition models that work with RGB images. The main advantage of neural network models over existing methods of motor activity analysis is that they take images directly from the video stream, without the frame preprocessing that increases analysis time. Methods. The paper presents an approach to identifying a person by gait, based on multi-class classification of video sequences. The approach was evaluated on the CASIA Gait Database data set, which includes more than 15,000 video sequences. Five neural network architectures were tested as classifiers: the three-dimensional convolutional neural network I3D, and four convolutional-recurrent networks (unidirectional and bidirectional LSTM, unidirectional and bidirectional GRU), each using a convolutional neural network of the ResNet architecture as a visual feature extractor. Results. According to the tests, the developed approach makes it possible to identify a person in a video stream in real time without specialized equipment. The accuracy of human identification was more than 80% for the convolutional-recurrent models and 79% for the I3D model. Conclusion. The suggested models based on the I3D architecture and on convolutional-recurrent architectures showed higher accuracy in identifying a person by gait than existing methods. Because they allow frame-by-frame video processing, the preferred classifiers for the developed approach are convolutional-recurrent architectures based on unidirectional LSTM or GRU models.
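The preference for unidirectional recurrent models in frame-by-frame processing can be illustrated with a toy recurrence. The sketch below uses a single scalar tanh cell with made-up weights in place of an LSTM or GRU, and a scalar per frame in place of ResNet features; it only shows that a forward state depends on past frames alone, while a bidirectional state also depends on frames that have not yet arrived.

```python
import math
from typing import List, Sequence, Tuple


def step(hidden: float, feature: float,
         w_in: float = 0.8, w_rec: float = 0.5) -> float:
    """One update of a toy recurrent cell (placeholder weights)."""
    return math.tanh(w_in * feature + w_rec * hidden)


def forward_states(features: Sequence[float]) -> List[float]:
    """Unidirectional pass: the state at frame t uses frames 0..t only."""
    states = []
    hidden = 0.0
    for feature in features:
        hidden = step(hidden, feature)
        states.append(hidden)
    return states


def bidirectional_states(features: Sequence[float]) -> List[Tuple[float, float]]:
    """Bidirectional pass: pairs each forward state with a backward state.

    The backward state at frame t uses frames t..end, so the whole clip
    must be available before any state can be computed.
    """
    forward = forward_states(features)
    backward = list(reversed(forward_states(list(reversed(features)))))
    return list(zip(forward, backward))
```

Running `forward_states` on the first few frames of a clip gives the same values as the corresponding prefix of the full-clip result, so the state can be updated as each frame arrives. The backward half of `bidirectional_states` changes whenever later frames are appended.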


2020 ◽  
Author(s):  
S Kashin ◽  
D Zavyalov ◽  
A Rusakov ◽  
V Khryashchev ◽  
A Lebedev
