Multiscale Feature Learning Based on Enhanced Feature Pyramid for Vehicle Detection

Vehicle detection is a crucial task in autonomous driving systems. Because vehicles in an image vary widely in scale and are often heavily occluded, the task remains challenging. Recent vehicle detection methods typically use a feature pyramid to detect vehicles at different scales. However, drawbacks in the pyramid design prevent the multiscale features from being fully exploited. This paper introduces a feature pyramid architecture to address this problem. In the proposed architecture, an improved region proposal network generates intermediate feature maps, which are used to add more discriminative representations to the feature maps generated by the backbone network while also reducing the computational cost of the network. To generate more discriminative feature representations, the paper introduces a multilayer enhancement module that reweights the feature maps generated by the backbone network, sharpening the distinction between foreground objects and background regions in each feature map. In addition, an adaptive RoI pooling module is proposed to pool features from all pyramid levels for each proposal and fuse them for the detection network. Experimental results on the KITTI vehicle detection benchmark and the PASCAL VOC 2007 car dataset show that the proposed approach achieves better detection performance than recent vehicle detection methods.

An improved efficient model for structure-aware lane detection of unmanned vehicles

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/0954407021993673 ◽

2021 ◽

pp. 095440702199367

Author(s):

Zezheng Lv ◽

Xiaoci Huang ◽

Yaozhong Liang ◽

Wenguan Cao ◽

Yuxiang Chong

Keyword(s):

Computational Cost ◽

Autonomous Driving ◽

Unmanned Vehicles ◽

Lane Detection ◽

Linear Transformations ◽

Feature Maps ◽

Backbone Networks ◽

Detection Algorithms ◽

Backbone Network ◽

Structural Loss

Lane detection is an important part of autonomous driving, and lane detection algorithms must run at extremely low computational cost. Because of their heavy backbone networks, algorithms based on pixel-wise segmentation struggle with runtime when recognizing lanes. In this paper, a novel and practical methodology based on a lightweight segmentation network is proposed to achieve accurate and efficient lane detection. Unlike traditional convolutional layers, the proposed Shadow module reduces the computational cost of the backbone network by performing linear transformations on intrinsic feature maps. On this basis, a lightweight backbone network, Shadow-VGG-16, is built. A tailored pyramid parsing module, composed of a strip pool module based on the Pyramid Scene Parsing Network (PSPNet) and a convolution attention module, is then introduced to collect features from different sub-domains. Finally, a lane structural loss is proposed to explicitly model the lane structure and reduce the influence of noise, so that the pixels fit the lane better. Extensive experimental results demonstrate that the method performs significantly better than state-of-the-art (SOTA) algorithms such as Pointlanenet and Line-CNN. It achieves 95.28% and 90.06% accuracy and an inference speed of 62.5 frames per second (fps) on the CULane and Tusimple test datasets. Compared with the recent ERFNet, Line-CNN, and SAD, the F1 score increases by 3.51%, 2.84%, and 3.82%, respectively. Meanwhile, on our own dataset the method reaches an F1 score of 87.09, exceeding the best competing result by 8.6%, which demonstrates the superiority of our method.
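
The F1 scores quoted above combine precision and recall into one number. As a minimal sketch of that formula (the counts below are hypothetical and not taken from the paper, and the rule for matching a predicted lane to a ground-truth lane is benchmark-specific and not modeled here):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall.

    tp, fp, fn are counts of true positives, false positives and
    false negatives. How a lane prediction counts as a true positive
    depends on the benchmark and is outside this sketch.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
example = f1_score(tp=80, fp=20, fn=20)
print(round(example, 4))
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which is why it is preferred over raw accuracy when most pixels or candidates are background.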

Multi-Scale and Occlusion Aware Network for Vehicle Detection and Segmentation on UAV Aerial Images

Remote Sensing ◽

10.3390/rs12111760 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1760 ◽

Cited By ~ 2

Author(s):

Wang Zhang ◽

Chunsheng Liu ◽

Faliang Chang ◽

Ye Song

Keyword(s):

Feature Fusion ◽

Vehicle Detection ◽

Aerial Images ◽

Feature Maps ◽

Vehicle Monitoring ◽

Multi Scale ◽

Adaptive Fusion ◽

Feature Pyramid ◽

Vehicle Information ◽

Vehicle Segmentation

With the advantage of high maneuverability, Unmanned Aerial Vehicles (UAVs) have been widely deployed in vehicle monitoring and control. However, processing the images captured by UAVs to extract vehicle information is hindered by challenges including arbitrary orientations, huge scale variations, and partial occlusion. To address these challenges, we propose a novel Multi-Scale and Occlusion Aware Network (MSOA-Net) for UAV-based vehicle segmentation, which consists of two parts: a Multi-Scale Feature Adaptive Fusion Network (MSFAF-Net) and a Regional Attention based Triple Head Network (RATH-Net). In MSFAF-Net, a self-adaptive feature fusion module is proposed, which adaptively aggregates hierarchical feature maps from multiple levels to help the Feature Pyramid Network (FPN) deal with the scale change of vehicles. The RATH-Net, with a self-attention mechanism, is proposed to guide the location-sensitive sub-networks to enhance the vehicle of interest and suppress background noise caused by occlusions. In this study, we release a large, comprehensive UAV-based vehicle segmentation dataset (UVSD), which is the first public dataset for UAV-based vehicle detection and segmentation. Experiments are conducted on the challenging UVSD dataset. Experimental results show that the proposed method is efficient in detecting and segmenting vehicles and outperforms the compared state-of-the-art works.

PSANet: Pyramid Splitting and Aggregation Network for 3D Object Detection in Point Cloud

Sensors ◽

10.3390/s21010136 ◽

2020 ◽

Vol 21 (1) ◽

pp. 136

Author(s):

Fangyu Li ◽

Weizheng Jin ◽

Cien Fan ◽

Lian Zou ◽

Qingsheng Chen ◽

...

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

Multi Scale ◽

Backbone Network ◽

3D Object Detection ◽

Different Levels ◽

Fine Branch

3D object detection in LiDAR point clouds has been extensively used in autonomous driving, intelligent robotics, and augmented reality. Although one-stage 3D detectors have satisfactory training and inference speed, their performance still suffers from insufficient use of bird’s eye view (BEV) information. In this paper, a new backbone network is proposed to perform cross-layer fusion of multi-scale BEV feature maps, which makes full use of the available information for detection. Specifically, the proposed backbone network is divided into a coarse branch and a fine branch. In the coarse branch, we use the pyramidal feature hierarchy (PFH) to generate multi-scale BEV feature maps, which retain the advantages of different levels and serve as the input of the fine branch. In the fine branch, our proposed pyramid splitting and aggregation (PSA) module deeply integrates different levels of multi-scale feature maps, thereby improving the expressive ability of the final features. Extensive experiments on the challenging KITTI-3D benchmark show that our method performs better in both 3D and BEV object detection than some previous state-of-the-art methods. Experimental results measured by average precision (AP) demonstrate the effectiveness of our network.

MSF-Net: Multi-Scale Feature Learning Network for Classification of Surface Defects of Multifarious Sizes

Sensors ◽

10.3390/s21155125 ◽

2021 ◽

Vol 21 (15) ◽

pp. 5125

Author(s):

Pengcheng Xu ◽

Zhongyuan Guo ◽

Lei Liang ◽

Xiaohang Xu

Keyword(s):

Defect Detection ◽

Surface Defects ◽

Receptive Fields ◽

Feature Learning ◽

Learning Ability ◽

Detection Methods ◽

Feature Maps ◽

Scale Feature ◽

Learning Network ◽

Multi Scale

In the field of surface defect detection, the scale difference between product surface defects is often huge. Existing defect detection methods based on Convolutional Neural Networks (CNNs) tend to express macro and abstract features, and their ability to express local and small defects is insufficient, resulting in an imbalance of feature expression capabilities. In this paper, a Multi-Scale Feature Learning Network (MSF-Net) based on a Dual Module Feature (DMF) extractor is proposed. The DMF extractor is mainly composed of optimized Concatenated Rectified Linear Units (CReLUs) and optimized Inception feature extraction modules, which increase the diversity of feature receptive fields while reducing the amount of computation. The feature maps of the middle layers, which have receptive fields of different sizes, are merged to enrich the receptive fields of the last layer of feature maps. Residual shortcut connections, a batch normalization layer, and an average pooling layer replace the fully connected layer to improve training efficiency and, at the same time, make the multi-scale feature learning ability more balanced. Two representative multi-scale defect data sets are used for experiments, and the experimental results verify the advancement and effectiveness of the proposed MSF-Net in the detection of surface defects with multi-scale features.

Fully Dense Multiscale Fusion Network for Hyperspectral Image Classification

Remote Sensing ◽

10.3390/rs11222718 ◽

2019 ◽

Vol 11 (22) ◽

pp. 2718 ◽

Cited By ~ 5

Author(s):

Zhe Meng ◽

Lingling Li ◽

Licheng Jiao ◽

Zhixi Feng ◽

Xu Tang ◽

...

Keyword(s):

Hyperspectral Image ◽

Multiple Scales ◽

Feature Learning ◽

Classification Performance ◽

Connectivity Pattern ◽

Great Success ◽

Feature Representations ◽

Spatial Features ◽

Discriminative Feature ◽

Correlated Information

The convolutional neural network (CNN) can automatically extract hierarchical feature representations from raw data and has recently achieved great success in the classification of hyperspectral images (HSIs). However, most CNN-based methods for HSI classification fail to adequately use the strongly complementary yet correlated information from each convolutional layer and employ only the features of the last convolutional layer for classification. In this paper, we propose a novel fully dense multiscale fusion network (FDMFN) that takes full advantage of the hierarchical features from all the convolutional layers for HSI classification. In the proposed network, shortcut connections are introduced between any two layers in a feed-forward manner, enabling features learned by each layer to be accessed by all subsequent layers. This fully dense connectivity pattern achieves comprehensive feature reuse and enforces discriminative feature learning. In addition, various spectral-spatial features with multiple scales from all convolutional layers are fused to extract more discriminative features for HSI classification. Experimental results on three widely used hyperspectral scenes demonstrate that the proposed FDMFN achieves better classification performance than several state-of-the-art approaches.
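
The dense connectivity pattern described above can be illustrated with a small sketch. It assumes DenseNet-style concatenation as the shortcut mechanism, which the abstract does not state explicitly, and it uses plain lists of numbers as stand-ins for feature channels; the function names are hypothetical.

```python
def dense_input_channels(c0, growth, num_layers):
    """Channels seen by each layer when every layer receives the
    concatenation of the input and all earlier layer outputs.
    c0: input channels; growth: channels added per layer."""
    return [c0 + i * growth for i in range(num_layers)]


def dense_forward(x, layers):
    """x: list of feature values (stand-in for channels).
    layers: callables mapping the concatenated features so far
    to a list of new features."""
    features = list(x)
    for layer in layers:
        # Concatenate rather than overwrite, so every later layer
        # can still access what earlier layers produced.
        features = features + layer(features)
    return features


# Toy layer: emits one new feature, the sum of everything it sees.
toy_layer = lambda feats: [sum(feats)]
out = dense_forward([1, 2], [toy_layer, toy_layer, toy_layer])
print(out)
print(dense_input_channels(c0=2, growth=1, num_layers=3))
```

The final feature list contains the input and the output of every layer, which is the sense in which the last stage can fuse features from all convolutional layers rather than only the deepest one.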

Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds

Sensors ◽

10.3390/s20030704 ◽

2020 ◽

Vol 20 (3) ◽

pp. 704 ◽

Cited By ~ 6

Author(s):

Hongwu Kuang ◽

Bei Wang ◽

Jianping An ◽

Ming Zhang ◽

Zehan Zhang

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

Cloud Data ◽

Multi Scale ◽

Feature Pyramid ◽

Point Data ◽

3D Object Detection

Object detection in point cloud data is one of the key components in computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-Feature Pyramid Network, a novel one-stage 3D object detector that uses only raw data from LIDAR sensors. The core framework consists of an encoder network and a corresponding decoder, followed by a region proposal network. The encoder extracts and fuses multi-scale voxel information in a bottom-up manner, whereas the decoder fuses multiple feature maps from various scales with a Feature Pyramid Network in a top-down way. Extensive experiments show that the proposed method extracts features from point data more effectively and outperforms some baselines on the challenging KITTI-3D benchmark, achieving good speed and accuracy in real-world scenarios.
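
The top-down fusion step attributed to the Feature Pyramid Network can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's decoder: it uses single-channel maps stored as nested lists, nearest-neighbour 2x upsampling, and element-wise addition, and it omits the lateral and smoothing convolutions a real FPN applies. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(levels):
    """levels: maps ordered fine to coarse, each half the size of
    the previous one. Returns fused maps in the same order."""
    fused = [levels[-1]]  # the coarsest map is passed through unchanged
    for lateral in reversed(levels[:-1]):
        # Upsample the coarser fused map and add it to the finer one.
        fused.append(add_maps(lateral, upsample2x(fused[-1])))
    return fused[::-1]


# Toy pyramid: 4x4, 2x2 and 1x1 maps with constant values.
c1 = [[1] * 4 for _ in range(4)]
c2 = [[10] * 2 for _ in range(2)]
c3 = [[100]]
p1, p2, p3 = top_down_fuse([c1, c2, c3])
print(p3, p2, p1[0])
```

In the toy pyramid, each finer output accumulates the values of every coarser level, which shows how coarse, semantically strong information reaches the high-resolution maps.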

A Mountain Summit Recognition Method Based on Improved Faster R-CNN

Complexity ◽

10.1155/2021/8235108 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Yueping Kong ◽

Yun Wang ◽

Song Guo ◽

Jiajing Wang

Keyword(s):

Detection Methods ◽

Feature Points ◽

Feature Maps ◽

Recognition Method ◽

Objective Criterion ◽

Detection Algorithms ◽

Digital Elevation ◽

Feature Pyramid ◽

Mountain Summit ◽

Elevation Model

Mountain summits are vital topographic feature points, which are essential for understanding landform processes and their impacts on the environment and ecosystem. Traditional summit detection methods operate on handcrafted features extracted from digital elevation model (DEM) data and apply parametric detection algorithms to locate mountain summits. However, these methods often fail to achieve desirable recognition results for small summits and lack an objective criterion. To address these problems, we propose an improved Faster region-based convolutional neural network (Faster R-CNN) to accurately detect mountain summits from DEM data. Based on Faster R-CNN, the improved network adopts a residual convolution block to replace the traditional part and adds a feature pyramid network (FPN) to fuse the features of adjacent layers to better address the mountain summit detection task. The residual convolution is employed to capture the deep correlation between visual and physical morphological features. The FPN is used to integrate the location and semantic information in the extracted feature maps to effectively represent the mountain summit area. The experimental results demonstrate that the proposed network achieves the highest recall and precision without manually designed summit features and accurately identifies small summits.

Research on Multiscene Vehicle Dataset Based on Improved FCOS Detection Algorithms

Complexity ◽

10.1155/2021/9167116 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Fei Yan ◽

Hui Zhang ◽

Tianyang Zhou ◽

Zhiyong Fan ◽

Jia Liu

Keyword(s):

Vehicle Detection ◽

Autonomous Driving ◽

Intelligent Transportation ◽

Detection Accuracy ◽

Comparative Experiment ◽

Feature Maps ◽

Propagation Process ◽

Detection Algorithms ◽

Balance Principle ◽

Complex Scenes

Vehicle detection is an important part of both intelligent transportation and autonomous driving. It still faces many problems, such as inaccurate vehicle localization and low detection accuracy in complex scenes. FCOS, a representative anchor-free detection algorithm, once attracted wide attention but now appears slightly insufficient. Based on this situation, we propose an improved FCOS algorithm. The improvements are as follows: (1) we introduce a deformable convolution into the backbone to solve the problem that the receptive field cannot cover the whole target; (2) we add a bottom-up information path after the FPN of the neck module to reduce the loss of information in the propagation process; (3) we introduce a balance module according to the balance principle, which reduces inconsistent detection by the bbox head caused by the mismatch in variance between different feature maps. To strengthen the comparative experiment, we have extracted some of the most recent data from UA-DETRAC, COCO, and Pascal VOC. The experimental results show that our method achieves good results on this dataset.

Lightweight U-Net for cloud detection of visible and thermal infrared remote sensing images

Optical and Quantum Electronics ◽

10.1007/s11082-020-02500-8 ◽

2020 ◽

Vol 52 (9) ◽

Author(s):

Jiaqiang Zhang ◽

Xiaoyan Li ◽

Liyuan Li ◽

Pengcheng Sun ◽

Xiaofeng Su ◽

...

Keyword(s):

Computational Cost ◽

Thermal Infrared ◽

Detection Methods ◽

Validation Dataset ◽

Model Parameters ◽

Landsat 8 ◽

Cloud Detection ◽

Feature Maps ◽

Thermal Infrared Remote Sensing ◽

Computational Ability

Accurate and rapid cloud detection is highly significant for improving the downlink efficiency of on-orbit data, especially for microsatellites with limited power and computational ability. However, slow inference and large model size limit the potential for on-orbit implementation of deep-learning-based cloud detection methods. To address these problems, this paper proposes a lightweight network based on depthwise separable convolutions to reduce the model size and computational cost of pixel-wise cloud detection methods. The network achieves lightweight end-to-end cloud detection by extracting feature maps from the images and generating the mask from the obtained maps. For the visible and thermal infrared bands of the Landsat 8 cloud cover assessment validation dataset, the experimental results show that the pixel accuracy of the proposed method for cloud detection is higher than 90%, the inference speed is about 5 times faster than that of U-Net, and the model parameters and floating-point operations are reduced to 12.4% and 12.8% of U-Net, respectively.
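
To show why depthwise separable convolutions shrink a model, the sketch below counts weights for a single layer. The layer sizes are hypothetical and are not taken from the paper; biases are ignored. The resulting per-layer ratio is only an illustration of the mechanism and should not be read as a derivation of the network-level figures reported above.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution (no bias):
    one k x k filter per input channel, then a 1 x 1 pointwise
    convolution that mixes channels."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 64 input and 128 output channels.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
ratio = sep / std  # algebraically equal to 1/c_out + 1/k**2
print(std, sep, round(ratio, 4))
```

For a 3x3 kernel the ratio approaches 1/9 as the number of output channels grows, so most of the saving comes from splitting spatial filtering from channel mixing.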

Generating Anchor Boxes Based on Attention Mechanism for Object Detection in Remote Sensing Images

Remote Sensing ◽

10.3390/rs12152416 ◽

2020 ◽

Vol 12 (15) ◽

pp. 2416 ◽

Cited By ~ 1

Author(s):

Zhuangzhuang Tian ◽

Ronghui Zhan ◽

Jiemin Hu ◽

Wei Wang ◽

Zhiqiang He ◽

...

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Attention Mechanism ◽

Detection Methods ◽

Optical Remote Sensing ◽

Remote Sensing Images ◽

Feature Maps ◽

Wide Range ◽

Feature Pyramid ◽

Comprehensive Evaluations

Object detection methods based on deep learning are increasingly applied to the interpretation of optical remote sensing images. However, the complex background and the wide range of object sizes in remote sensing images increase the difficulty of object detection. In this paper, we improve detection performance by incorporating attention information and generating adaptive anchor boxes based on the attention map. Specifically, the attention mechanism is introduced into the proposed method to enhance the features of the object regions while reducing the influence of the background. The generated attention map is then used to obtain diverse and adaptable anchor boxes using the guided anchoring method. The generated anchor boxes match the scene and the objects better than traditional proposal boxes. Finally, the modulated feature adaptation module is applied to transform the feature maps to adapt to the diverse anchor boxes. Comprehensive evaluations on the DIOR dataset demonstrate the superiority of the proposed method over state-of-the-art methods such as RetinaNet, FCOS, and CornerNet. The mean average precision of the proposed method is 4.5% higher than that of the feature pyramid network. In addition, ablation experiments are conducted to further analyze the influence of the different blocks on the performance improvement.
