Pedestrian Behavior Recognition Based on Improved Dual-stream Network with Differential Feature in Surveillance Video

2021, Vol 2021, pp. 1-10
Author(s): Yonghong Tan, Xuebin Zhou, Aiwu Chen, Songqing Zhou

To improve pedestrian behavior recognition accuracy on video sequences with complex backgrounds, this paper proposes an improved spatial-temporal two-stream network. First, a deep differential network replaces the temporal-stream network to improve the representation ability and extraction efficiency of spatiotemporal features. Then, the model is trained with an improved Softmax loss function based on a decision-level feature fusion mechanism, which better retains the spatiotemporal characteristics of frames across the two networks and reflects pedestrian action categories more faithfully. Simulation results show that the proposed network achieves 87% recognition accuracy on a self-built infrared dataset while improving computational efficiency by 15.1%.
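A minimal sketch of the decision-level fusion idea described above: per-stream softmax scores from a spatial stream and a differential (temporal) stream are combined by a weighted average before taking the arg-max. The class count and the 0.5/0.5 weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_decisions(spatial_logits, temporal_logits, w_spatial=0.5, w_temporal=0.5):
    """Decision-level fusion: weighted average of per-stream softmax scores.

    The 0.5/0.5 weights are assumptions; the paper fuses a spatial stream
    with a deep differential (temporal) stream at the decision level.
    """
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    fused = w_spatial * p_spatial + w_temporal * p_temporal
    return fused.argmax(axis=-1), fused

# Toy usage: 4 hypothetical pedestrian action classes, one video clip.
spatial = np.array([1.2, 0.3, -0.5, 0.1])
temporal = np.array([0.8, 1.1, -0.2, 0.0])
label, scores = fuse_decisions(spatial, temporal)
print(label, scores)
```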

2021, Vol 13 (4), pp. 628
Author(s): Liang Ye, Tong Liu, Tian Han, Hany Ferdinando, Tapio Seppänen, ...

Campus violence is a common social phenomenon all over the world and the most harmful type of school bullying event. As artificial intelligence and remote sensing techniques develop, several methods have become possible for detecting campus violence, e.g., movement sensor-based and video sequence-based methods using sensors and surveillance cameras. In this paper, the authors use image features and acoustic features for campus violence detection. Campus violence data are gathered by role-playing, and 4096-dimensional feature vectors are extracted from every 16 frames of video. The C3D (Convolutional 3D) neural network is used for feature extraction and classification, achieving an average recognition accuracy of 92.00%. Mel-frequency cepstral coefficients (MFCCs) are extracted as acoustic features from three speech emotion databases; the C3D neural network is again used for classification, and the average recognition accuracies are 88.33%, 95.00%, and 91.67%, respectively. To resolve evidence conflict between the modalities, the authors propose an improved Dempster–Shafer (D–S) algorithm. Compared with the existing D–S theory, the improved algorithm increases recognition accuracy by 10.79%, ultimately reaching 97.00%.
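For reference, the baseline Dempster–Shafer combination that such improved algorithms build on can be sketched as follows for two classifiers whose mass functions are restricted to the same singleton hypotheses. This is the classical rule, not the authors' improved variant, and the mass values are illustrative assumptions.

```python
import numpy as np

def dempster_combine(m1, m2):
    """Classical Dempster's rule for two mass functions over the same
    singleton hypotheses (e.g. 'violent' vs. 'non-violent')."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    joint = np.outer(m1, m2)                 # pairwise products of masses
    agreement = np.trace(joint)              # mass where both sources agree
    conflict = joint.sum() - agreement       # mass assigned to conflicting pairs (K)
    if np.isclose(agreement, 0.0):
        raise ValueError("Total conflict: Dempster's rule is undefined.")
    combined = np.diag(joint) / agreement    # normalise by 1 - K
    return combined, conflict

# Toy usage: visual classifier vs. acoustic classifier, masses assumed.
visual = [0.85, 0.15]    # mass on (violent, non-violent) from the image stream
audio = [0.60, 0.40]     # mass from the speech-emotion stream
fused, k = dempster_combine(visual, audio)
print(fused, k)
```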


2020, Vol 2020 (1)
Author(s): Guangyi Yang, Xingyu Ding, Tian Huang, Kun Cheng, Weizheng Jin

The communications industry has changed remarkably with the development of fifth-generation cellular networks. Images, as an indispensable component of communication, have attracted wide attention, so finding a suitable approach to assess image quality is important. We therefore propose a deep learning model for image quality assessment (IQA) based on an explicit-implicit dual-stream network. Frequency-domain kurtosis features based on the wavelet transform represent explicit features, and spatial features extracted by a convolutional neural network (CNN) represent implicit features. On this basis, we construct an explicit-implicit parallel deep learning model, the EI-IQA model. The EI-IQA model builds on VGGNet, which extracts the spatial-domain features; adding the parallel wavelet-kurtosis frequency-domain features allows the number of VGGNet layers to be reduced, so the training parameters and sample requirements decline. Cross-validation on different databases verifies that the wavelet-kurtosis feature fusion method based on deep learning extracts features more completely and generalises better, so the method better simulates the human visual perception system and its predictions are closer to human subjective judgments. The source code of the proposed EI-IQA model is available on GitHub at https://github.com/jacob6/EI-IQA.
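A hedged sketch of the explicit (frequency-domain) branch: kurtosis of each wavelet sub-band computed with PyWavelets and SciPy. The wavelet family and decomposition depth are assumptions; in the EI-IQA model such features run in parallel with VGG-style spatial features.

```python
import numpy as np
import pywt
from scipy.stats import kurtosis

def wavelet_kurtosis_features(image, wavelet="db2", levels=3):
    """Kurtosis of each wavelet sub-band as an explicit frequency-domain
    descriptor. The wavelet family and level count are assumed values."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    feats = [kurtosis(coeffs[0].ravel())]               # approximation band
    for (cH, cV, cD) in coeffs[1:]:                     # detail bands per level
        feats += [kurtosis(cH.ravel()), kurtosis(cV.ravel()), kurtosis(cD.ravel())]
    return np.asarray(feats, dtype=np.float32)

# Toy usage on a random grayscale "image".
img = np.random.rand(128, 128)
print(wavelet_kurtosis_features(img).shape)   # (1 + 3*levels,) = (10,)
```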


2020, Vol 10 (24), pp. 9005
Author(s): Chien-Cheng Lee, Zhongjian Gao

Sign language is an important way for deaf people to understand and communicate with others. Many researchers use Wi-Fi signals to recognize hand and finger gestures in a non-invasive manner. However, Wi-Fi signals usually contain signal interference, background noise, and mixed multipath noise. In this study, Wi-Fi Channel State Information (CSI) is preprocessed by singular value decomposition (SVD) to obtain the essential signals. Sign language involves both the positional relationship of gestures in space and the changes of actions over time. We propose a novel dual-output two-stream convolutional neural network that not only combines a spatial-stream network and a motion-stream network, but also effectively alleviates the backpropagation problem of the two-stream convolutional neural network (CNN) and improves its recognition accuracy. After the two stream networks are fused, an attention mechanism selects the important features learned by the two streams. Our method was validated on the public SignFi dataset using five-fold cross-validation. Experimental results show that SVD preprocessing improves the performance of our dual-output two-stream network. For the home, lab, and lab + home environments, the average recognition accuracy rates are 99.13%, 96.79%, and 97.08%, respectively. Compared with other methods, our method performs well and generalizes better.
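A minimal sketch of SVD-based CSI preprocessing under assumed dimensions: the CSI matrix (subcarriers × time samples) is reconstructed from its leading singular components to suppress interference and multipath noise. The retained rank is an assumed hyperparameter, not a value from the paper.

```python
import numpy as np

def svd_denoise_csi(csi, rank=3):
    """Keep only the top-`rank` singular components of a CSI matrix
    (subcarriers x time samples); the rank is an assumption."""
    U, s, Vt = np.linalg.svd(csi, full_matrices=False)
    s_trunc = np.zeros_like(s)
    s_trunc[:rank] = s[:rank]
    return (U * s_trunc) @ Vt          # equivalent to U @ diag(s_trunc) @ Vt

# Toy usage: 30 subcarriers, 200 time samples of synthetic CSI amplitude.
csi = np.random.rand(30, 200)
clean = svd_denoise_csi(csi, rank=3)
print(clean.shape)
```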


2013, Vol 310, pp. 629-633
Author(s): Bo Wen Luo, Bu Yan Wan, Wei Bin Qin, Ji Yu Xu

To address the nonlinear feature fusion of underwater sediment echoes, the shortcomings of Enhanced Canonical Correlation Analysis (ECCA) were analyzed, ECCA was extended to Kernel ECCA (KECCA) in kernel space, and a multi-feature nonlinear fusion classification model combining KECCA with Partial Least Squares (PLS) was put forward. In identifying four types of underwater sediment (basalt, volcanic breccia, cobalt crusts, and mudstone), the results showed that the KECCA + PLS model further improves recognition accuracy.
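As a rough, hedged stand-in for the KECCA + PLS pipeline (not the paper's actual model), the sketch below maps two echo-feature sets into kernel (Gram) space with an RBF kernel, fuses them by concatenation, and fits PLS against one-hot sediment labels. All dimensions, the kernel width, and the number of PLS components are assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X1 = rng.normal(size=(100, 8))           # e.g. spectral features of echoes (assumed)
X2 = rng.normal(size=(100, 6))           # e.g. statistical features of echoes (assumed)
y = np.eye(4)[rng.integers(0, 4, 100)]   # 4 sediment classes, one-hot

K1 = rbf_kernel(X1, X1, gamma=0.1)       # nonlinear (kernel-space) representation
K2 = rbf_kernel(X2, X2, gamma=0.1)
fused = np.hstack([K1, K2])              # simple feature-level fusion of both kernels

pls = PLSRegression(n_components=10).fit(fused, y)
pred = pls.predict(fused).argmax(axis=1)
print("train accuracy:", (pred == y.argmax(axis=1)).mean())
```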


2021, Vol 2021, pp. 1-12
Author(s): Daohua Pan, Hongwei Liu

Falls among the elderly are a common phenomenon in daily life that causes serious injuries and even death. Human activity recognition methods that take wearable sensor signals as input have been proposed to improve the accuracy and automation of fall recognition. To avoid disrupting the normal behavior of the elderly, make full use of the functions a smartphone already provides, reduce the inconvenience of wearing dedicated sensor devices, and lower the cost of monitoring systems, the accelerometer and gyroscope integrated in the smartphone are employed to collect daily behavioral data, and a threshold analysis method is used to study fall recognition. On this basis, a three-level threshold detection algorithm for human fall recognition is proposed by introducing human movement energy expenditure as a new feature. The algorithm integrates the changes in movement energy expenditure, combined acceleration, and body tilt angle during a fall, which alleviates the misjudgments caused by using only acceleration and/or angle-change thresholds and improves recognition accuracy. Experiments verify that the recognition accuracy of this algorithm reaches 95.42%. An app is also devised to detect falls in a timely manner and send alarms automatically.
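An illustrative sketch of a three-level threshold check in the spirit of the algorithm described above: combined acceleration, body tilt angle, and a movement-energy-expenditure proxy must all exceed their thresholds before a fall is declared. All threshold values and the energy proxy are assumptions, not the paper's parameters.

```python
import numpy as np

# Assumed thresholds for illustration only.
ACC_THRESHOLD_G = 2.5        # combined acceleration, in g
TILT_THRESHOLD_DEG = 60.0    # body tilt angle after impact, in degrees
ENERGY_THRESHOLD = 8.0       # movement energy expenditure proxy (arbitrary units)

def combined_acceleration(ax, ay, az):
    """Magnitude of the 3-axis accelerometer signal."""
    return np.sqrt(ax**2 + ay**2 + az**2)

def detect_fall(acc_window, tilt_deg, energy):
    """Return True only when all three threshold levels trigger."""
    level1 = combined_acceleration(*acc_window).max() > ACC_THRESHOLD_G
    level2 = tilt_deg > TILT_THRESHOLD_DEG
    level3 = energy > ENERGY_THRESHOLD
    return level1 and level2 and level3

# Toy usage with synthetic smartphone readings (in g / degrees).
ax = np.array([0.1, 0.3, 2.1]); ay = np.array([0.0, 0.4, 1.8]); az = np.array([1.0, 1.2, 1.5])
print(detect_fall((ax, ay, az), tilt_deg=75.0, energy=9.2))
```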


2020
Author(s): Dongshen Ji, Yanzhong Zhao, Zhujun Zhang, Qianchuan Zhao

COVID-19 image recognition demands a large number of samples, and with only a few available the recognition accuracy is not ideal. In this paper, a COVID-19-positive image recognition method based on small-sample recognition is proposed. First, the CT images are preprocessed and converted into the picture formats required for transfer learning. Second, small-sample image enhancement and expansion are performed on the converted pictures, such as shear transformation, random rotation, and translation. Then, multiple transfer learning models are used to extract features, which are subsequently fused. Finally, the model is adjusted by fine-tuning and trained to obtain the experimental results. The experimental results show that our method achieves excellent recognition performance on COVID-19 images, even with only a small number of CT image samples.
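A hedged sketch of the small-sample augmentation step using torchvision; the transform ranges (rotation, translation, shear) and the 224×224 input size are assumptions rather than the paper's settings.

```python
from PIL import Image
from torchvision import transforms

# Assumed augmentation pipeline: shear, random rotation, and translation,
# resized to the input size expected by common ImageNet-pretrained backbones.
augment = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomAffine(degrees=15,            # random rotation (assumed range)
                            translate=(0.1, 0.1),  # random horizontal/vertical shift
                            shear=10),             # shear transformation
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Toy usage on a blank grayscale "CT slice"; in practice, wrap a folder of
# converted CT images, e.g. torchvision.datasets.ImageFolder("ct_images/",
# transform=augment), and feed the expanded samples to the feature extractors.
slice_img = Image.new("L", (512, 512))
tensor = augment(slice_img)
print(tensor.shape)   # torch.Size([1, 224, 224])
```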


2019, Vol 11 (5), pp. 115
Author(s): Weihuang Liu, Jinhao Qian, Zengwei Yao, Xintao Jiao, Jiahui Pan

Road traffic accidents caused by fatigue driving are a common cause of human casualties. In this paper, we present a driver fatigue detection algorithm using two-stream network models with multi-facial features. The algorithm consists of four parts: (1) locating the mouth and eyes with multi-task cascaded convolutional neural networks (MTCNNs); (2) extracting static features from a partial facial image; (3) extracting dynamic features from partial facial optical flow; (4) combining the static and dynamic features with a two-stream neural network to make the classification. The main contribution of this paper is the combination of a two-stream network and multi-facial features for driver fatigue detection. Two-stream networks can combine static and dynamic image information, while partial facial images as network inputs focus on fatigue-related information, which brings better performance. Moreover, we applied gamma correction to enhance image contrast, which helps our method achieve better results, with a 2% accuracy increase in night environments. Finally, an accuracy of 97.06% was achieved on the National Tsing Hua University Driver Drowsiness Detection (NTHU-DDD) dataset.
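The gamma-correction step is simple enough to sketch directly; the gamma value below is an assumption, not the one used in the paper.

```python
import numpy as np

def gamma_correct(image_u8, gamma=1.5):
    """Gamma correction via a lookup table to boost contrast in dark (night)
    frames. `image_u8` is an 8-bit image array; gamma is an assumed value."""
    lut = (255.0 * (np.arange(256) / 255.0) ** (1.0 / gamma)).astype(np.uint8)
    return lut[image_u8]

# Toy usage on a synthetic dark frame.
frame = np.random.randint(0, 80, size=(120, 160), dtype=np.uint8)
brightened = gamma_correct(frame, gamma=1.8)
print(frame.mean(), brightened.mean())
```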


Entropy, 2020, Vol 22 (6), pp. 695
Author(s): Xiaoyang Liu, Jinqiang Liu

Biometric recognition methods often rely on characteristics such as the face, iris, fingerprint, and palm print; however, in the complex underground environment such images often become blurred, leading to low identification rates for underground coal mine personnel. A gait recognition method via similarity learning, named the Two-Stream neural network (TS-Net), is proposed based on a densely connected convolutional network (DenseNet) and a stacked convolutional autoencoder (SCAE). The main-stream network based on DenseNet learns the similarity of dynamic deep features containing spatiotemporal information in the gait pattern. The auxiliary-stream network based on SCAE learns the similarity of static invariant features containing physiological information. Moreover, a novel feature fusion method is adopted to fuse and represent the dynamic and static features. The extracted features are robust to angle, clothing, miner hats, waterproof shoes, and carrying conditions. The method was evaluated on the challenging CASIA-B gait dataset and a collected gait dataset of underground coal mine personnel (UCMP-GAIT). Experimental results show that the method is effective and feasible for gait recognition of underground coal mine personnel and achieves significantly higher recognition accuracy than other gait recognition methods.
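A minimal sketch of similarity scoring over fused two-stream gait embeddings, assuming a simple weighted concatenation of L2-normalised dynamic (DenseNet-style) and static (SCAE-style) features followed by cosine similarity. The fusion scheme, weights, and feature sizes are assumptions, not the paper's novel fusion method.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fused_similarity(dyn_a, stat_a, dyn_b, stat_b, w_dyn=0.7, w_stat=0.3):
    """Cosine similarity between two gait samples, each represented by a
    dynamic embedding and a static embedding fused by weighted concatenation.
    Weights and the fusion rule are assumed for illustration."""
    a = np.concatenate([w_dyn * l2_normalize(dyn_a), w_stat * l2_normalize(stat_a)])
    b = np.concatenate([w_dyn * l2_normalize(dyn_b), w_stat * l2_normalize(stat_b)])
    return float(l2_normalize(a) @ l2_normalize(b))   # cosine similarity in [-1, 1]

# Toy usage with random embeddings (256-d dynamic, 64-d static, assumed sizes).
rng = np.random.default_rng(1)
sim = fused_similarity(rng.normal(size=256), rng.normal(size=64),
                       rng.normal(size=256), rng.normal(size=64))
print(sim)
```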

