Head Detection Based on DR Feature Extraction Network and Mixed Dilated Convolution Module

Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1565
Author(s):  
Junwen Liu ◽  
Yongjun Zhang ◽  
Jianbin Xie ◽  
Yan Wei ◽  
Zewei Wang ◽  
...  

Pedestrian detection in complex scenes suffers from occlusion, particularly occlusion between pedestrians. It is well known that, compared with the variability of the human body, the shape of the human head and shoulders changes minimally and is highly stable. Therefore, head detection is an important research area in the field of pedestrian detection. The translational invariance of neural networks makes it possible to design a deep convolutional neural network that still recognizes the target effectively even if its appearance and location change. However, the problems of scale invariance and high missed-detection rates for small targets still exist. In this paper, a feature extraction network, DR-Net, based on Darknet-53 is proposed to improve the information transmission rate between convolutional layers and to extract more semantic information. In addition, an MDC (mixed dilated convolution) module, which combines dilated convolutions with different sampling rates, is embedded to improve the detection rate of small targets. We evaluated our method on three publicly available datasets: the AP (Average Precision) values on the Brainwash, HollywoodHeads, and SCUT-HEAD datasets reached 92.1%, 84.8%, and 90%, respectively.
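To illustrate what "different sampling rates" means for a dilated convolution, here is a minimal one-dimensional sketch in plain Python. It is not the paper's MDC module: the function names, the dilation rates (1, 2, 3), the toy kernel, and the choice to return the branches separately rather than fuse them are all assumptions made for illustration.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span of input covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation form, no padding)."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


def mixed_dilated_conv1d(signal, kernel, dilations):
    """Apply the same kernel at several dilation rates (hypothetical helper).

    Returns one output per rate; how the branches are merged is left open.
    """
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 0, -1]  # toy difference filter, assumed for the example
outputs = mixed_dilated_conv1d(signal, kernel, dilations=(1, 2, 3))
```

With the same three-tap kernel, the covered span grows from 3 to 5 to 7 samples as the dilation rate goes from 1 to 3, which is how a dilated branch sees more context without adding weights.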

2021 ◽  
Vol 2078 (1) ◽  
pp. 012008
Author(s):  
Hui Liu ◽  
Keyang Cheng

To address false detections and missed detections of small and occluded targets in pedestrian detection, a pedestrian detection algorithm based on improved multi-scale feature fusion is proposed. First, because PANet, the multi-scale feature fusion module of YOLOv4, does not consider the interaction between scales, PANet is improved to reduce the semantic gap between scales, and an attention mechanism is introduced to learn the importance of different layers and strengthen feature fusion. Then, dilated convolution is introduced to reduce information loss during downsampling. Finally, the K-means clustering algorithm is used to redesign the anchor boxes, and the loss function is modified to detect a single category. The experimental results show that, on the INRIA and WiderPerson datasets under different congestion conditions, the AP of the improved algorithm reaches 96.83% and 59.67%, respectively. Compared with the YOLOv4 model, the algorithm improves the AP by 2.41% and 1.03%, respectively. False detections and missed detections of small and occluded targets are significantly reduced.
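The anchor-box redesign step can be sketched as K-means over ground-truth box sizes. The abstract only says that K-means is used; the IoU-based assignment below follows the common YOLO-family convention and is an assumption, as are the function names, the deterministic initialization, and the toy box sizes.

```python
def iou_wh(box, anchor):
    """IoU of two (width, height) boxes aligned at the same corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=50):
    """Cluster (width, height) pairs into k anchors (hypothetical helper)."""
    # Deterministic init: take k boxes evenly spaced in order of area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    anchors = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the anchor it overlaps most.
            best = max(range(k), key=lambda j: iou_wh(box, anchors[j]))
            clusters[best].append(box)
        updated = []
        for j, members in enumerate(clusters):
            if members:
                updated.append((sum(b[0] for b in members) / len(members),
                                sum(b[1] for b in members) / len(members)))
            else:
                updated.append(anchors[j])  # keep an empty cluster's anchor
        if updated == anchors:
            break
        anchors = updated
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Toy pedestrian box sizes in pixels (made up for the example).
boxes = [(10, 20), (12, 22), (11, 21), (50, 100), (52, 104), (48, 96)]
anchors = kmeans_anchors(boxes, k=2)
```

For this toy data the two anchors settle on the mean size of the small group and the mean size of the large group.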


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5848
Author(s):  
Mohamed Chouai ◽  
Petr Dolezel ◽  
Dominik Stursa ◽  
Zdenek Nemec

In the field of computer vision, object detection consists of automatically finding objects in images and reporting their positions. The most common fields of application are safety systems (pedestrian detection, identification of behavior) and control systems. Another important application is head/person detection, which provides the primary input for road safety, rescue, surveillance, etc. In this study, we developed a new approach based on two parallel DeepLabv3+ networks to improve the performance of the person detection system. For the implementation of our semantic segmentation model, a working methodology was established with two types of ground truths extracted from the bounding boxes given by the original ground truths. The approach was applied to our two private datasets as well as to a public dataset. To show the performance of the proposed system, a comparative analysis was carried out against two state-of-the-art deep learning semantic segmentation models: SegNet and U-Net. With a global accuracy of 99.14%, the results demonstrate that the developed strategy could be an efficient way to build a deep neural network model for semantic segmentation. This strategy can be used not only for the detection of the human head but also in several other semantic segmentation applications.
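As a rough illustration of deriving a segmentation ground truth from bounding boxes, the sketch below fills each box into a binary mask and scores a prediction by global (pixel) accuracy. The abstract does not say what its two ground-truth types are, so the filled rectangle is only one assumed possibility; the function names, the exclusive upper box bounds, and the toy image size are also assumptions.

```python
def boxes_to_mask(height, width, boxes):
    """Binary mask with 1 inside every (x_min, y_min, x_max, y_max) box.

    Upper bounds are treated as exclusive; boxes are clipped to the image.
    """
    mask = [[0] * width for _ in range(height)]
    for x_min, y_min, x_max, y_max in boxes:
        for y in range(max(0, y_min), min(height, y_max)):
            for x in range(max(0, x_min), min(width, x_max)):
                mask[y][x] = 1
    return mask


def global_accuracy(predicted, target):
    """Fraction of pixels whose predicted label equals the target label."""
    total = sum(len(row) for row in target)
    correct = sum(p == t
                  for pred_row, target_row in zip(predicted, target)
                  for p, t in zip(pred_row, target_row))
    return correct / total


# Toy 4x6 image: the "prediction" misses one column of the target box.
target = boxes_to_mask(4, 6, [(1, 1, 4, 3)])
predicted = boxes_to_mask(4, 6, [(1, 1, 3, 3)])
accuracy = global_accuracy(predicted, target)  # 22 of 24 pixels agree
```

Because background pixels count as correct, global accuracy can be high even when the object region is only partly recovered, which is worth keeping in mind when reading a single accuracy figure.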


2020 ◽  
Vol 30 (1) ◽  
pp. 209-223
Author(s):  
Zhuhe Wang ◽  
Nan Li ◽  
Tao Wu ◽  
Haoxuan Zhang ◽  
Tao Feng

In recent years, more and more researchers have applied convolutional neural networks to the study of sound signals. The main reason is the translational invariance of convolution in time and space, which helps overcome the diversity of sound signals. However, sound direction recognition still faces problems such as overly large microphone arrays and feature selection. This paper proposes a sound direction recognition method that uses a simulated human head with a microphone at each ear. Theoretically, two microphones cannot distinguish front from rear directions. However, when the raw data of the two channels are used as the input of the convolutional neural network, the resolution effect can reach more than 0.9. For comparison, we also used the delay feature (GCC) for sound direction recognition. Finally, we conducted experiments that used probability distributions to identify more directions.
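The delay feature used as the baseline can be illustrated by estimating the inter-channel delay from a two-microphone recording. The sketch below uses plain time-domain cross-correlation, not the frequency-weighted generalized cross-correlation (GCC) itself; the function name, the `max_lag` parameter, and the toy pulse signals are assumptions.

```python
def cross_correlation_delay(left, right, max_lag):
    """Lag (in samples) at which `right` best matches `left`.

    A positive result means the right channel lags the left channel.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, value in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                score += value * right[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy two-channel recording: the same pulse reaches the right ear 2 samples later.
left_channel = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
right_channel = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
lag = cross_correlation_delay(left_channel, right_channel, max_lag=4)
```

A source in front of the head and its mirror image behind the head produce the same inter-ear delay, which is why a delay feature alone cannot separate front from rear.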


Processes ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 919
Author(s):  
Wanlu Jiang ◽  
Chenyang Wang ◽  
Jiayun Zou ◽  
Shuqing Zhang

The field of mechanical fault diagnosis has entered the era of “big data”. However, existing diagnostic algorithms, which rely on manual feature extraction and expert knowledge, have poor feature extraction ability and lack adaptability to massive data. In the fault diagnosis of rotating machinery, equipment faults occur only occasionally, so fault samples make up a small proportion of the data, the samples are imbalanced, and available data are scarce; this leads to low accuracy in intelligent diagnosis models trained to identify the equipment state. To solve these problems, an end-to-end diagnosis model is first proposed: an intelligent fault diagnosis method based on a one-dimensional convolutional neural network (1D-CNN), in which the original vibration signal is input directly into the model for identification. Then, by combining the convolutional neural network with generative adversarial networks, a data expansion method based on one-dimensional deep convolutional generative adversarial networks (1D-DCGAN) is constructed to generate samples for the under-represented fault classes and build a balanced data set. Meanwhile, to address the difficulty of optimizing the network, gradient penalty and Wasserstein distance are introduced. Tests on a bearing database and a hydraulic pump show that the one-dimensional convolution operation has strong feature extraction ability for vibration signals. The proposed method is very accurate for fault diagnosis of the two kinds of equipment, and high-quality expansion of the original data can be achieved.
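For reference, the combination of Wasserstein distance and gradient penalty is usually written as the following critic loss. This is the standard WGAN-GP form and is given here as an assumption, since the abstract does not state the exact objective used; the symbols $D$ (critic), $P_r$ (real data distribution), $P_g$ (generator distribution), $\hat{x}$ (points sampled on straight lines between real and generated samples), and $\lambda$ (penalty weight) are likewise not taken from the source.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]
```

The first two terms estimate the Wasserstein distance between generated and real samples, and the last term pushes the critic's gradient norm toward 1, which tends to make training more stable than weight clipping.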


2021 ◽  
pp. 1-10
Author(s):  
Chien-Cheng Lee ◽  
Zhongjian Gao ◽  
Xiu-Chi Huang

This paper proposes a Wi-Fi-based indoor human detection system using a deep convolutional neural network. The system detects different human states in various situations, including different environments and propagation paths. The main advantage of the system is that it requires neither overhead cameras nor mounted sensors. The system captures useful amplitude information from the channel state information and converts it into an image-like two-dimensional matrix. Next, the two-dimensional matrix is used as the input to a deep convolutional neural network (CNN) to distinguish human states. In this work, a deep residual network (ResNet) architecture is used to perform human state classification with hierarchical topological feature extraction. Several combinations of datasets for different environments and propagation paths are used in this study. ResNet’s powerful inference simplifies feature extraction and improves the accuracy of human state classification. The experimental results show that the fine-tuned ResNet-18 model performs well in indoor human detection across three states: no person present, a stationary person, and a moving person. Compared with traditional machine learning using handcrafted features, this method is simple and effective.
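The conversion from channel state information to an image-like matrix might look like the sketch below. The abstract does not give the layout or scaling, so the packets-by-subcarriers arrangement, the min-max scaling to the 0..255 range, the function name, and the toy CSI values are all assumptions.

```python
def csi_to_image(csi_packets):
    """Turn complex CSI (rows = packets, columns = subcarriers) into an
    8-bit amplitude matrix using min-max scaling (hypothetical helper)."""
    amplitudes = [[abs(value) for value in packet] for packet in csi_packets]
    low = min(min(row) for row in amplitudes)
    high = max(max(row) for row in amplitudes)
    scale = (high - low) or 1.0  # avoid dividing by zero on constant input
    return [[round(255 * (a - low) / scale) for a in row]
            for row in amplitudes]


# Toy CSI: 2 packets x 3 subcarriers (values made up for the example).
csi = [[3 + 4j, 0j, 6 + 8j],
       [0 + 5j, 10 + 0j, 0j]]
image = csi_to_image(csi)
```

Only the amplitude of each complex CSI value is kept, matching the abstract's statement that amplitude information is used; the resulting matrix can then be fed to an image classifier such as a ResNet.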

