Pedestrian Behavior Recognition Based on Improved Dual-stream Network with Differential Feature in Surveillance Video

2021, Vol 2021, pp. 1-10
Author(s): Yonghong Tan, Xuebin Zhou, Aiwu Chen, Songqing Zhou

To improve pedestrian behavior recognition accuracy on video sequences with complex backgrounds, this paper proposes an improved spatial-temporal two-stream network. First, a deep differential network replaces the temporal-stream network to improve the representation ability and extraction efficiency of spatiotemporal features. Then, the model is trained with an improved Softmax loss function based on a decision-level feature fusion mechanism, which better retains the spatiotemporal characteristics of frames across the two networks and reflects pedestrian action categories more faithfully. Simulation results show that the proposed network achieves 87% recognition accuracy on a self-built infrared dataset while improving computational efficiency by 15.1%.
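A minimal sketch of the decision-level fusion idea described above: per-stream softmax scores from a spatial stream and a differential (temporal) stream are combined by a weighted average before taking the arg-max. The class count and the 0.5/0.5 weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_decisions(spatial_logits, temporal_logits, w_spatial=0.5, w_temporal=0.5):
    """Decision-level fusion: weighted average of per-stream softmax scores.

    The 0.5/0.5 weights are assumptions; the paper fuses a spatial stream
    with a deep differential (temporal) stream at the decision level.
    """
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    fused = w_spatial * p_spatial + w_temporal * p_temporal
    return fused.argmax(axis=-1), fused

# Toy usage: 4 hypothetical pedestrian action classes, one video clip.
spatial = np.array([1.2, 0.3, -0.5, 0.1])
temporal = np.array([0.8, 1.1, -0.2, 0.0])
label, scores = fuse_decisions(spatial, temporal)
print(label, scores)
```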

2021, Vol 13 (4), pp. 628
Author(s): Liang Ye, Tong Liu, Tian Han, Hany Ferdinando, Tapio Seppänen, ...

Campus violence is a common social phenomenon all over the world and the most harmful type of school bullying event. As artificial intelligence and remote sensing techniques develop, several methods have become possible for detecting campus violence, e.g., movement sensor-based and video sequence-based methods using sensors and surveillance cameras. In this paper, the authors use image features and acoustic features for campus violence detection. Campus violence data are gathered by role-playing, and 4096-dimensional feature vectors are extracted from every 16 frames of video. The C3D (Convolutional 3D) neural network is used for feature extraction and classification, achieving an average recognition accuracy of 92.00%. Mel-frequency cepstral coefficients (MFCCs) are extracted as acoustic features from three speech emotion databases; the C3D neural network is again used for classification, and the average recognition accuracies are 88.33%, 95.00%, and 91.67%, respectively. To resolve evidence conflict between the modalities, the authors propose an improved Dempster–Shafer (D–S) algorithm. Compared with the existing D–S theory, the improved algorithm increases recognition accuracy by 10.79%, ultimately reaching 97.00%.
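For reference, the baseline Dempster–Shafer combination that such improved algorithms build on can be sketched as follows for two classifiers whose mass functions are restricted to the same singleton hypotheses. This is the classical rule, not the authors' improved variant, and the mass values are illustrative assumptions.

```python
import numpy as np

def dempster_combine(m1, m2):
    """Classical Dempster's rule for two mass functions over the same
    singleton hypotheses (e.g. 'violent' vs. 'non-violent')."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    joint = np.outer(m1, m2)                 # pairwise products of masses
    agreement = np.trace(joint)              # mass where both sources agree
    conflict = joint.sum() - agreement       # mass assigned to conflicting pairs (K)
    if np.isclose(agreement, 0.0):
        raise ValueError("Total conflict: Dempster's rule is undefined.")
    combined = np.diag(joint) / agreement    # normalise by 1 - K
    return combined, conflict

# Toy usage: visual classifier vs. acoustic classifier, masses assumed.
visual = [0.85, 0.15]    # mass on (violent, non-violent) from the image stream
audio = [0.60, 0.40]     # mass from the speech-emotion stream
fused, k = dempster_combine(visual, audio)
print(fused, k)
```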


2020, Vol 2020 (1)
Author(s): Guangyi Yang, Xingyu Ding, Tian Huang, Kun Cheng, Weizheng Jin

The communications industry has changed remarkably with the development of fifth-generation cellular networks. Images, as an indispensable component of communication, have attracted wide attention, so finding a suitable approach to assess image quality is important. We therefore propose a deep learning model for image quality assessment (IQA) based on an explicit-implicit dual-stream network. Frequency-domain kurtosis features based on the wavelet transform represent explicit features, and spatial features extracted by a convolutional neural network (CNN) represent implicit features. On this basis, we construct an explicit-implicit parallel deep learning model, the EI-IQA model. The EI-IQA model builds on VGGNet, which extracts the spatial-domain features; adding the parallel wavelet-kurtosis frequency-domain features allows the number of VGGNet layers to be reduced, so the training parameters and sample requirements decline. Cross-validation on different databases verifies that the wavelet-kurtosis feature fusion method based on deep learning extracts features more completely and generalises better, so the method better simulates the human visual perception system and its predictions are closer to human subjective judgments. The source code of the proposed EI-IQA model is available on GitHub at https://github.com/jacob6/EI-IQA.
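A hedged sketch of the explicit (frequency-domain) branch: kurtosis of each wavelet sub-band computed with PyWavelets and SciPy. The wavelet family and decomposition depth are assumptions; in the EI-IQA model such features run in parallel with VGG-style spatial features.

```python
import numpy as np
import pywt
from scipy.stats import kurtosis

def wavelet_kurtosis_features(image, wavelet="db2", levels=3):
    """Kurtosis of each wavelet sub-band as an explicit frequency-domain
    descriptor. The wavelet family and level count are assumed values."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    feats = [kurtosis(coeffs[0].ravel())]               # approximation band
    for (cH, cV, cD) in coeffs[1:]:                     # detail bands per level
        feats += [kurtosis(cH.ravel()), kurtosis(cV.ravel()), kurtosis(cD.ravel())]
    return np.asarray(feats, dtype=np.float32)

# Toy usage on a random grayscale "image".
img = np.random.rand(128, 128)
print(wavelet_kurtosis_features(img).shape)   # (1 + 3*levels,) = (10,)
```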


2020, Vol 10 (24), pp. 9005
Author(s): Chien-Cheng Lee, Zhongjian Gao

Sign language is an important way for deaf people to understand and communicate with others. Many researchers use Wi-Fi signals to recognize hand and finger gestures in a non-invasive manner. However, Wi-Fi signals usually contain signal interference, background noise, and mixed multipath noise. In this study, Wi-Fi Channel State Information (CSI) is preprocessed by singular value decomposition (SVD) to obtain the essential signals. Sign language involves both the positional relationship of gestures in space and the changes of actions over time. We propose a novel dual-output two-stream convolutional neural network that not only combines a spatial-stream network and a motion-stream network, but also effectively alleviates the backpropagation problem of the two-stream convolutional neural network (CNN) and improves its recognition accuracy. After the two stream networks are fused, an attention mechanism selects the important features learned by the two streams. Our method was validated on the public SignFi dataset using five-fold cross-validation. Experimental results show that SVD preprocessing improves the performance of our dual-output two-stream network. For the home, lab, and lab + home environments, the average recognition accuracy rates are 99.13%, 96.79%, and 97.08%, respectively. Compared with other methods, our method performs well and generalizes better.
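A minimal sketch of SVD-based CSI preprocessing under assumed dimensions: the CSI matrix (subcarriers × time samples) is reconstructed from its leading singular components to suppress interference and multipath noise. The retained rank is an assumed hyperparameter, not a value from the paper.

```python
import numpy as np

def svd_denoise_csi(csi, rank=3):
    """Keep only the top-`rank` singular components of a CSI matrix
    (subcarriers x time samples); the rank is an assumption."""
    U, s, Vt = np.linalg.svd(csi, full_matrices=False)
    s_trunc = np.zeros_like(s)
    s_trunc[:rank] = s[:rank]
    return (U * s_trunc) @ Vt          # equivalent to U @ diag(s_trunc) @ Vt

# Toy usage: 30 subcarriers, 200 time samples of synthetic CSI amplitude.
csi = np.random.rand(30, 200)
clean = svd_denoise_csi(csi, rank=3)
print(clean.shape)
```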


2013, Vol 310, pp. 629-633
Author(s): Bo Wen Luo, Bu Yan Wan, Wei Bin Qin, Ji Yu Xu

To address the nonlinear feature fusion of underwater sediment echoes, the shortcomings of Enhanced Canonical Correlation Analysis (ECCA) were analyzed, ECCA was extended to Kernel ECCA (KECCA) in kernel space, and a multi-feature nonlinear fusion classification model combining KECCA with Partial Least Squares (PLS) was put forward. In identifying four types of underwater sediment (basalt, volcanic breccia, cobalt crusts, and mudstone), the results showed that the KECCA + PLS model further improves recognition accuracy.
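As a rough, hedged stand-in for the KECCA + PLS pipeline (not the paper's actual model), the sketch below maps two echo-feature sets into kernel (Gram) space with an RBF kernel, fuses them by concatenation, and fits PLS against one-hot sediment labels. All dimensions, the kernel width, and the number of PLS components are assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X1 = rng.normal(size=(100, 8))           # e.g. spectral features of echoes (assumed)
X2 = rng.normal(size=(100, 6))           # e.g. statistical features of echoes (assumed)
y = np.eye(4)[rng.integers(0, 4, 100)]   # 4 sediment classes, one-hot

K1 = rbf_kernel(X1, X1, gamma=0.1)       # nonlinear (kernel-space) representation
K2 = rbf_kernel(X2, X2, gamma=0.1)
fused = np.hstack([K1, K2])              # simple feature-level fusion of both kernels

pls = PLSRegression(n_components=10).fit(fused, y)
pred = pls.predict(fused).argmax(axis=1)
print("train accuracy:", (pred == y.argmax(axis=1)).mean())
```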


2021, Vol 2021, pp. 1-12
Author(s): Daohua Pan, Hongwei Liu

Falls among the elderly are a common phenomenon in daily life that causes serious injuries and even death. Human activity recognition methods that take wearable sensor signals as input have been proposed to improve the accuracy and automation of fall recognition. To avoid disrupting the normal behavior of the elderly, make full use of the functions a smartphone already provides, reduce the inconvenience of wearing dedicated sensor devices, and lower the cost of monitoring systems, the accelerometer and gyroscope integrated in the smartphone are employed to collect daily behavioral data, and a threshold analysis method is used to study fall recognition. On this basis, a three-level threshold detection algorithm for human fall recognition is proposed by introducing human movement energy expenditure as a new feature. The algorithm integrates the changes in movement energy expenditure, combined acceleration, and body tilt angle during a fall, which alleviates the misjudgments caused by using only acceleration and/or angle-change thresholds and improves recognition accuracy. Experiments verify that the recognition accuracy of this algorithm reaches 95.42%. An app is also devised to detect falls in a timely manner and send alarms automatically.
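An illustrative sketch of a three-level threshold check in the spirit of the algorithm described above: combined acceleration, body tilt angle, and a movement-energy-expenditure proxy must all exceed their thresholds before a fall is declared. All threshold values and the energy proxy are assumptions, not the paper's parameters.

```python
import numpy as np

# Assumed thresholds for illustration only.
ACC_THRESHOLD_G = 2.5        # combined acceleration, in g
TILT_THRESHOLD_DEG = 60.0    # body tilt angle after impact, in degrees
ENERGY_THRESHOLD = 8.0       # movement energy expenditure proxy (arbitrary units)

def combined_acceleration(ax, ay, az):
    """Magnitude of the 3-axis accelerometer signal."""
    return np.sqrt(ax**2 + ay**2 + az**2)

def detect_fall(acc_window, tilt_deg, energy):
    """Return True only when all three threshold levels trigger."""
    level1 = combined_acceleration(*acc_window).max() > ACC_THRESHOLD_G
    level2 = tilt_deg > TILT_THRESHOLD_DEG
    level3 = energy > ENERGY_THRESHOLD
    return level1 and level2 and level3

# Toy usage with synthetic smartphone readings (in g / degrees).
ax = np.array([0.1, 0.3, 2.1]); ay = np.array([0.0, 0.4, 1.8]); az = np.array([1.0, 1.2, 1.5])
print(detect_fall((ax, ay, az), tilt_deg=75.0, energy=9.2))
```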


2020
Author(s): Dongshen Ji, Yanzhong Zhao, Zhujun Zhang, Qianchuan Zhao

COVID-19 image recognition demands a large number of samples, and with only a few available the recognition accuracy is not ideal. In this paper, a COVID-19-positive image recognition method based on small-sample recognition is proposed. First, the CT images are preprocessed and converted into the picture formats required for transfer learning. Second, small-sample image enhancement and expansion are performed on the converted pictures, such as shear transformation, random rotation, and translation. Then, multiple transfer learning models are used to extract features, which are subsequently fused. Finally, the model is adjusted by fine-tuning and trained to obtain the experimental results. The experimental results show that our method achieves excellent recognition performance on COVID-19 images, even with only a small number of CT image samples.
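A hedged sketch of the small-sample augmentation step using torchvision; the transform ranges (rotation, translation, shear) and the 224×224 input size are assumptions rather than the paper's settings.

```python
from PIL import Image
from torchvision import transforms

# Assumed augmentation pipeline: shear, random rotation, and translation,
# resized to the input size expected by common ImageNet-pretrained backbones.
augment = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomAffine(degrees=15,            # random rotation (assumed range)
                            translate=(0.1, 0.1),  # random horizontal/vertical shift
                            shear=10),             # shear transformation
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Toy usage on a blank grayscale "CT slice"; in practice, wrap a folder of
# converted CT images, e.g. torchvision.datasets.ImageFolder("ct_images/",
# transform=augment), and feed the expanded samples to the feature extractors.
slice_img = Image.new("L", (512, 512))
tensor = augment(slice_img)
print(tensor.shape)   # torch.Size([1, 224, 224])
```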


2019, Vol 11 (5), pp. 115
Author(s): Weihuang Liu, Jinhao Qian, Zengwei Yao, Xintao Jiao, Jiahui Pan

Road traffic accidents caused by fatigue driving are a common cause of human casualties. In this paper, we present a driver fatigue detection algorithm using two-stream network models with multi-facial features. The algorithm consists of four parts: (1) locating the mouth and eyes with multi-task cascaded convolutional neural networks (MTCNNs); (2) extracting static features from a partial facial image; (3) extracting dynamic features from partial facial optical flow; (4) combining the static and dynamic features with a two-stream neural network to make the classification. The main contribution of this paper is the combination of a two-stream network and multi-facial features for driver fatigue detection. Two-stream networks can combine static and dynamic image information, while partial facial images as network inputs focus on fatigue-related information, which brings better performance. Moreover, we applied gamma correction to enhance image contrast, which helps our method achieve better results, with a 2% accuracy increase in night environments. Finally, an accuracy of 97.06% was achieved on the National Tsing Hua University Driver Drowsiness Detection (NTHU-DDD) dataset.
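The gamma-correction step is simple enough to sketch directly; the gamma value below is an assumption, not the one used in the paper.

```python
import numpy as np

def gamma_correct(image_u8, gamma=1.5):
    """Gamma correction via a lookup table to boost contrast in dark (night)
    frames. `image_u8` is an 8-bit image array; gamma is an assumed value."""
    lut = (255.0 * (np.arange(256) / 255.0) ** (1.0 / gamma)).astype(np.uint8)
    return lut[image_u8]

# Toy usage on a synthetic dark frame.
frame = np.random.randint(0, 80, size=(120, 160), dtype=np.uint8)
brightened = gamma_correct(frame, gamma=1.8)
print(frame.mean(), brightened.mean())
```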


Entropy, 2020, Vol 22 (6), pp. 695
Author(s): Xiaoyang Liu, Jinqiang Liu

Biometric recognition methods often rely on characteristics such as the face, iris, fingerprint, and palm print; however, in the complex underground environment such images often become blurred, leading to low identification rates for underground coal mine personnel. A gait recognition method via similarity learning, named the Two-Stream neural network (TS-Net), is proposed based on a densely connected convolutional network (DenseNet) and a stacked convolutional autoencoder (SCAE). The main-stream network based on DenseNet learns the similarity of dynamic deep features containing spatiotemporal information in the gait pattern. The auxiliary-stream network based on SCAE learns the similarity of static invariant features containing physiological information. Moreover, a novel feature fusion method is adopted to fuse and represent the dynamic and static features. The extracted features are robust to angle, clothing, miner hats, waterproof shoes, and carrying conditions. The method was evaluated on the challenging CASIA-B gait dataset and a collected gait dataset of underground coal mine personnel (UCMP-GAIT). Experimental results show that the method is effective and feasible for gait recognition of underground coal mine personnel and achieves significantly higher recognition accuracy than other gait recognition methods.
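A minimal sketch of similarity scoring over fused two-stream gait embeddings, assuming a simple weighted concatenation of L2-normalised dynamic (DenseNet-style) and static (SCAE-style) features followed by cosine similarity. The fusion scheme, weights, and feature sizes are assumptions, not the paper's novel fusion method.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fused_similarity(dyn_a, stat_a, dyn_b, stat_b, w_dyn=0.7, w_stat=0.3):
    """Cosine similarity between two gait samples, each represented by a
    dynamic embedding and a static embedding fused by weighted concatenation.
    Weights and the fusion rule are assumed for illustration."""
    a = np.concatenate([w_dyn * l2_normalize(dyn_a), w_stat * l2_normalize(stat_a)])
    b = np.concatenate([w_dyn * l2_normalize(dyn_b), w_stat * l2_normalize(stat_b)])
    return float(l2_normalize(a) @ l2_normalize(b))   # cosine similarity in [-1, 1]

# Toy usage with random embeddings (256-d dynamic, 64-d static, assumed sizes).
rng = np.random.default_rng(1)
sim = fused_similarity(rng.normal(size=256), rng.normal(size=64),
                       rng.normal(size=256), rng.normal(size=64))
print(sim)
```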

