Feature Channel Expansion and Background Suppression as the Enhancement for Infrared Pedestrian Detection

Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5128
Author(s):  
Shengzhe Wang ◽  
Bo Wang ◽  
Shifeng Wang ◽  
Yifeng Tang

Pedestrian detection is an important task in many intelligent systems, particularly driver assistance systems. Recent studies on pedestrian detection in infrared (IR) imagery have employed data-driven approaches. However, two problems remain in deep learning-based detection: implicit performance and time-consuming training. In this paper, a novel channel expansion technique based on feature fusion is proposed to enhance IR imagery and accelerate the training process. In addition, a novel background suppression method is proposed to simulate the attention principle of human vision and shrink the region of detection. A precise fusion algorithm is designed to combine the information from different visual saliency maps in order to reduce the effects of truncation and missed detection. Four different experiments are performed from various perspectives to gauge the efficiency of our approach. The experimental results show that the mean Average Precision (mAP) increased by 5.22% on average across four different datasets. These results prove that background suppression and suitable feature expansion accelerate the training process and enhance the performance of IR image-based deep learning models.
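The saliency-map fusion and background suppression the abstract describes can be illustrated with a minimal sketch. The functions below are hypothetical simplifications, not the authors' algorithm: `intensity_saliency` stands in for an arbitrary saliency operator, the fusion is a plain weighted sum of normalized maps, and suppression keeps only the top-saliency fraction of pixels.

```python
import numpy as np

def intensity_saliency(img):
    """Toy saliency map: each pixel's deviation from the global mean intensity."""
    return np.abs(img - img.mean())

def fuse_saliency_maps(maps, weights=None):
    """Weighted fusion of min-max normalized saliency maps (a hypothetical
    stand-in for the paper's precise fusion algorithm)."""
    if weights is None:
        weights = [1.0 / len(maps)] * len(maps)
    fused = np.zeros_like(maps[0], dtype=float)
    for m, w in zip(maps, weights):
        rng = m.max() - m.min()
        norm = (m - m.min()) / rng if rng > 0 else np.zeros_like(m, dtype=float)
        fused += w * norm
    return fused

def suppress_background(ir_img, fused_saliency, keep_ratio=0.3):
    """Zero out pixels whose fused saliency falls below a percentile
    threshold, shrinking the region passed to the detector."""
    thresh = np.quantile(fused_saliency, 1.0 - keep_ratio)
    return np.where(fused_saliency >= thresh, ir_img, 0)
```

In practice the suppressed image would be fed to the detector in place of the raw IR frame, so the network trains on a smaller effective search region.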

2020 ◽  
Vol 12 (5) ◽  
pp. 781 ◽  
Author(s):  
Yaochen Liu ◽  
Lili Dong ◽  
Yang Chen ◽  
Wenhai Xu

Infrared and visible image fusion technology provides many benefits for human vision and computer image processing tasks, including enriched useful information and enhanced surveillance capabilities. However, existing fusion algorithms face a great challenge in effectively integrating visual features from complex source images. In this paper, we design a novel infrared and visible image fusion algorithm based on visual attention technology, in which a special visual attention system and a feature fusion strategy based on saliency maps are proposed. The special visual attention system first utilizes the co-occurrence matrix to calculate the texture complication of an image, which is used to select a particular modality for computing a saliency map. Moreover, we improve the iterative operator of the original visual attention model (VAM) and design a fair competition mechanism to ensure that visual features in detail regions are extracted accurately. For the feature fusion strategy, we use the obtained saliency map to combine the visual attention features, and appropriately enhance tiny features to ensure that weak targets remain observable. Unlike general fusion algorithms, the proposed algorithm not only preserves the regions of interest but also retains rich tiny details, which can improve the visual ability of both humans and computers. Moreover, experimental results under complicated ambient conditions show that the proposed algorithm outperforms state-of-the-art algorithms in both qualitative and quantitative evaluations, and this study can be extended to other types of image fusion.
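The co-occurrence-matrix step can be sketched concretely. The code below is an illustrative assumption, not the paper's implementation: it builds a gray-level co-occurrence matrix (GLCM) over horizontally adjacent pixels, scores texture complication as GLCM entropy, and uses that score to pick which modality drives the saliency computation.

```python
import numpy as np

def glcm(img, levels=8):
    """Normalized gray-level co-occurrence matrix for horizontally
    adjacent pixel pairs; img is assumed to lie in [0, 1]."""
    q = np.clip((img * levels).astype(int), 0, levels - 1)
    mat = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        mat[a, b] += 1
    total = mat.sum()
    return mat / total if total > 0 else mat

def texture_complexity(img, levels=8):
    """Entropy of the GLCM: a hypothetical texture-complication score."""
    p = glcm(img, levels)
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())

def pick_saliency_modality(ir_img, vis_img):
    """Select the modality with richer texture to drive the saliency map."""
    if texture_complexity(vis_img) >= texture_complexity(ir_img):
        return "visible"
    return "infrared"
```

A flat image yields zero entropy while a textured one yields a positive score, so the selector naturally prefers the modality carrying more structural detail.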


Sensors ◽  
2020 ◽  
Vol 20 (10) ◽  
pp. 2866 ◽  
Author(s):  
Zile Deng ◽  
Yuanlong Cao ◽  
Xinyu Zhou ◽  
Yugen Yi ◽  
Yirui Jiang ◽  
...  

As the Internet of Things (IoT) is predicted to deal with diverse problems based on big data, its applications have become increasingly dependent on visual data and deep learning technology, and finding a suitable method for IoT systems to analyze image data is a major challenge. Traditional deep learning methods have never explicitly taken the color differences of data into account, yet from the experience of human vision, colors play roles of differing significance in recognizing things. This paper proposes a weight initialization method for deep learning in image recognition problems based on the RGB influence proportion, aiming to improve the training process of the learning algorithms. We extract the RGB proportion and utilize it in the weight initialization process. We conduct several experiments on different datasets to evaluate the effectiveness of our proposal, and it proves effective on small datasets. In addition, to obtain the RGB influence proportion, we provide an expedient approach to estimate the proportion early for subsequent use. We expect that the proposed method can be used by IoT sensors to securely analyze complex data in the future.


2021 ◽  
Vol 13 (2) ◽  
pp. 38
Author(s):  
Yao Xu ◽  
Qin Yu

Great achievements have been made in pedestrian detection through deep learning. For detectors based on deep learning, making better use of features has become the key to detection performance. Although current pedestrian detectors have made efforts to improve feature utilization, it remains inadequate. To solve this problem, we propose the Multi-Level Feature Fusion Module (MFFM) and its Multi-Scale Feature Fusion Unit (MFFU) sub-module, which connect feature maps of the same scale and of different scales through horizontal and vertical connections and shortcut structures. All of these connections carry learnable weights; thus, they act as adaptive multi-level and multi-scale feature fusion modules that fuse the best features. On this basis, we build a complete pedestrian detector, the Adaptive Feature Fusion Detector (AFFDet), an anchor-free one-stage pedestrian detector that makes full use of features for detection. Compared with other methods, ours performs better on the challenging Caltech Pedestrian Detection Benchmark (Caltech) at quite competitive speed. It is the current state-of-the-art one-stage pedestrian detection method.
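The idea of learnably weighted fusion across scales can be sketched as follows. This is a simplified, hypothetical forward pass, not the actual MFFU: a softmax over learnable logits weights feature maps of the same spatial size, and a nearest-neighbor upsampling aligns a coarser map before it joins the fusion.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_features(feature_maps, logits):
    """Adaptive fusion: a softmax over learnable logits weights feature maps
    of the same spatial size. During training, `logits` would receive
    gradients like any other parameter, letting the fusion adapt."""
    w = softmax(np.asarray(logits, dtype=float))
    fused = np.zeros_like(feature_maps[0], dtype=float)
    for fm, wi in zip(feature_maps, w):
        fused += wi * fm
    return fused

def upsample2x(fm):
    """Nearest-neighbor upsampling to align a coarser-scale map
    (a vertical connection) before fusing with a finer-scale map."""
    return fm.repeat(2, axis=0).repeat(2, axis=1)
```

Shortcut structures would add the input map back onto the fused output in the same way residual connections do.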


2021 ◽  
Vol 12 (1) ◽  
pp. 309
Author(s):  
Fei Yan ◽  
Cheng Chen ◽  
Peng Xiao ◽  
Siyu Qi ◽  
Zhiliang Wang ◽  
...  

The human attention mechanism can be understood and simulated by closely associating the saliency prediction task with neuroscience and psychology. Furthermore, saliency prediction is widely used in computer vision and interdisciplinary subjects. In recent years, with the rapid development of deep learning, deep models have made remarkable achievements in saliency prediction. Deep learning models can learn features automatically, thus overcoming many drawbacks of the classic models, such as handcrafted features and rigid task settings. Nevertheless, deep models still have limitations, for example in tasks involving multi-modality and semantic understanding. This study summarizes the relevant achievements in the field of saliency prediction, covering the early neurological and psychological mechanisms and the guiding role of classic models, followed by the development process and data comparison of classic and deep saliency prediction models. It also discusses the relationship between models and human vision, the factors that cause semantic gaps, the influence of attention in cognitive research, the limitations of saliency models, and emerging applications, to provide guidance and advice for follow-up work.



2021 ◽  
pp. 1-18
Author(s):  
R.S. Rampriya ◽  
Sabarinathan ◽  
R. Suganya

In the near future, the combination of UAVs (Unmanned Aerial Vehicles) and computer vision will play a vital role in periodically monitoring railroad conditions to ensure passenger safety. The most significant module in railroad visual processing is obstacle detection, where the concern is an obstacle fallen near the track, inside or outside the gauge. This motivates detecting and segmenting the railroad into three key regions: gauge inside, rails, and background. Traditional railroad segmentation methods depend on either manual feature selection or expensive dedicated devices such as Lidar, which is typically less reliable for railroad semantic segmentation. Moreover, cameras mounted on moving vehicles such as drones can produce high-resolution images, so segmenting precise pixel information from these aerial images is challenging due to the chaos of railroad surroundings. RSNet is a multi-level feature fusion algorithm for segmenting railroad aerial images captured by UAVs. It proposes an attention-based, computationally efficient convolutional encoder for robust feature extraction and a modified residual decoder for segmentation that considers only essential features, producing less overhead with higher performance even on real-time railroad drone imagery. The network is trained and tested on a railroad scenic view segmentation dataset (RSSD), which we built from real-time UAV images, and achieves a 0.973 Dice coefficient and a 0.94 Jaccard index on test data, outperforming existing approaches such as the residual unit and residual squeeze net.
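The Dice coefficient and Jaccard index reported above are standard segmentation overlap metrics; a minimal sketch of both for binary masks is shown below (the small epsilon guards against empty masks and is an implementation convenience, not part of the metric definitions).

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary segmentation masks."""
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def jaccard_index(pred, target, eps=1e-7):
    """Jaccard (IoU) = |A ∩ B| / |A ∪ B| for binary segmentation masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)
```

For the three-region setting described in the abstract, the same metrics would typically be computed per class (gauge inside, rails, background) and averaged.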

