Research on Multiscene Vehicle Dataset Based on Improved FCOS Detection Algorithms

Vehicle detection is an important part of both intelligent transportation and autonomous driving. It still faces many problems, such as inaccurate localization and low detection accuracy in complex scenes. FCOS, a representative anchor-free detection algorithm, was once a sensation but now appears slightly insufficient. To address this, we propose an improved FCOS algorithm with three changes: (1) we introduce deformable convolution into the backbone to solve the problem that the receptive field cannot cover the whole target; (2) we add a bottom-up information path after the FPN in the neck module to reduce information loss during propagation; (3) we introduce a balance module, following the balance principle, which reduces inconsistent detections by the bbox head caused by mismatched variance across feature maps. To strengthen the comparative experiments, we extracted some of the most recent data from UA-DETRAC, COCO, and Pascal VOC. The experimental results show that our method achieves good results on the resulting dataset.
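The bottom-up information path in improvement (2) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a PANet-style fusion in which each coarser FPN level is summed with a downsampled copy of the previously fused finer level, and it uses 2x2 average pooling as a stand-in for the learned stride-2 convolution a real network would use. The function names `downsample2x` and `bottom_up_path` are hypothetical.

```python
import numpy as np

def downsample2x(x):
    """Halve the spatial size of a (C, H, W) array with 2x2 average pooling.

    Assumption: this replaces the learned stride-2 convolution of a real network.
    """
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def bottom_up_path(fpn_levels):
    """Fuse FPN outputs from the finest level to the coarsest.

    fpn_levels: list of (C, H, W) arrays ordered fine -> coarse, where each
    level has half the resolution of the previous one.
    """
    outputs = [fpn_levels[0]]
    for level in fpn_levels[1:]:
        # Fine-level detail is pushed upward and added to the coarser level.
        outputs.append(level + downsample2x(outputs[-1]))
    return outputs

# Three toy FPN levels filled with ones.
levels = [np.ones((4, 32, 32)), np.ones((4, 16, 16)), np.ones((4, 8, 8))]
fused = bottom_up_path(levels)
print([f.shape for f in fused])  # [(4, 32, 32), (4, 16, 16), (4, 8, 8)]
```

With all-ones inputs, the coarsest fused level holds the value 3 everywhere, which shows that information from both finer levels reaches it.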

Anchor Generation Optimization and Region of Interest Assignment for Vehicle Detection

Sensors ◽

10.3390/s19051089 ◽

2019 ◽

Vol 19 (5) ◽

pp. 1089 ◽

Cited By ~ 3

Author(s):

Ye Wang ◽

Zhenyi Liu ◽

Weiwen Deng

Keyword(s):

Pedestrian Detection ◽

Region Of Interest ◽

Vehicle Detection ◽

Detection Accuracy ◽

Fixed Size ◽

Feature Maps ◽

Feature Map ◽

Bounding Box ◽

New Feature ◽

And Training

Region proposal network (RPN) based object detection, such as Faster Regions with CNN (Faster R-CNN), has gained considerable attention due to its high accuracy and fast speed. However, it has room for improvement in special application situations, such as on-board vehicle detection. The original RPN places multiscale anchors uniformly on each pixel of the last feature map and classifies each anchor as foreground or background using a single pixel of that feature map. The receptive field of each pixel in the last feature map is fixed in the original Faster R-CNN and does not coincide with the anchor size. Hence, only part of a large vehicle can be seen, while the feature for a small vehicle contains too much useless information. This reduces detection accuracy. Furthermore, perspective projection makes the size of a vehicle's bounding box depend on its position, which reduces the effectiveness and accuracy of uniform anchor generation. This reduces both detection accuracy and computing speed. After the region proposal stage, many regions of interest (ROIs) are generated. The ROI pooling layer projects an ROI onto the last feature map and forms a new feature map with a fixed size for final classification and box regression. The number of feature map pixels in the projected region can also influence detection performance, but this was not accurately controlled in earlier work. In this paper, the original Faster R-CNN is optimized specifically for on-board vehicle detection to address the problems above. The proposed method is tested on the KITTI dataset, and the results show a significant improvement without much tricky parameter adjustment or many training tricks. The proposed method can also be applied to other objects with obvious foreshortening effects, such as in on-board pedestrian detection. The basic idea does not rely on a concrete implementation, so most deep learning based object detectors with multiscale feature maps can be optimized with it.
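The ROI pooling step described in the abstract can be sketched as follows. This is a generic single-channel illustration of standard ROI max pooling, not the optimized variant the paper proposes; the function name `roi_max_pool`, the feature stride of 16, and the 2x2 output size are assumptions chosen for the example.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=2, stride=16):
    """Pool an ROI to a fixed out_size x out_size grid.

    feature_map: (H, W) array, one channel of the last feature map.
    roi: (x1, y1, x2, y2) in image pixels.
    stride: assumed ratio between image pixels and feature map pixels.
    """
    # Project the ROI from image coordinates onto the feature map.
    x1, y1, x2, y2 = [int(round(v / stride)) for v in roi]
    x2, y2 = max(x2, x1 + 1), max(y2, y1 + 1)  # keep at least one pixel
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape

    # Split the projected region into out_size x out_size bins.
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.empty((out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            y_lo, y_hi = ys[i], max(ys[i + 1], ys[i] + 1)
            x_lo, x_hi = xs[j], max(xs[j + 1], xs[j] + 1)
            out[i, j] = region[y_lo:y_hi, x_lo:x_hi].max()
    return out

feature_map = np.arange(64).reshape(8, 8)
pooled = roi_max_pool(feature_map, roi=(0, 0, 64, 64))
print(pooled)  # [[ 9 11] [25 27]]
```

A 64x64 pixel ROI covers only 4x4 feature map pixels at stride 16, so each output bin here sees just 2x2 values. This is the kind of dependence on the projected pixel count that the abstract says earlier work did not control.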

An improved efficient model for structure-aware lane detection of unmanned vehicles

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/0954407021993673 ◽

2021 ◽

pp. 095440702199367

Author(s):

Zezheng Lv ◽

Xiaoci Huang ◽

Yaozhong Liang ◽

Wenguan Cao ◽

Yuxiang Chong

Keyword(s):

Computational Cost ◽

Autonomous Driving ◽

Unmanned Vehicles ◽

Lane Detection ◽

Linear Transformations ◽

Feature Maps ◽

Backbone Networks ◽

Detection Algorithms ◽

Backbone Network ◽

Structural Loss

Lane detection algorithms, as an important part of autonomous driving, require extremely low computational cost. Because of their heavy backbone networks, algorithms based on pixel-wise segmentation struggle with runtime consumption in lane recognition. In this paper, a novel and practical methodology based on a lightweight segmentation network is proposed, which aims to achieve accurate and efficient lane detection. Unlike traditional convolutional layers, the proposed Shadow module reduces the computational cost of the backbone network by performing linear transformations on intrinsic feature maps. On this basis, a lightweight backbone network, Shadow-VGG-16, is built. Next, a tailored pyramid parsing module, composed of a strip pool module based on the Pyramid Scene Parsing Network (PSPNet) and a convolution attention module, is introduced to collect features from different sub-domains. Finally, a lane structural loss is proposed to explicitly model the lane structure and reduce the influence of noise, so that the pixels fit the lane better. Extensive experimental results demonstrate that our method performs significantly better than state-of-the-art (SOTA) algorithms such as PointLaneNet and Line-CNN. It achieves 95.28% and 90.06% accuracy and an inference speed of 62.5 frames per second (fps) on the CULane and TuSimple test datasets. Compared with the latest ERFNet, Line-CNN, and SAD, F1 scores increase by 3.51%, 2.84%, and 3.82%, respectively. Meanwhile, the result on our dataset exceeds the best of the other methods by 8.6% with an F1 score of 87.09, which demonstrates the superiority of our method.

Realtime On-Road Vehicle Detection Approach Based on Cascaded Structure

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.130-134.2429 ◽

2011 ◽

Vol 130-134 ◽

pp. 2429-2432

Author(s):

Liang Xiu Zhang ◽

Xu Yun Qiu ◽

Zhu Lin Zhang ◽

Yu Lin Wang

Keyword(s):

Warning System ◽

Vehicle Detection ◽

Autonomous Driving ◽

Detection Algorithm ◽

Detection Accuracy ◽

Driver Assistance ◽

Road Vehicle ◽

Collision Warning ◽

Detection Approach ◽

Transportation Applications

Real-time on-road vehicle detection is a key technology in many transportation applications, such as driver assistance, autonomous driving and active safety. A vehicle detection algorithm based on a cascaded structure is introduced. Haar-like features are used to build the model, and the GAB algorithm is chosen to train the strong classifiers. The real-time on-road vehicle classifier is then constructed by combining the strong classifiers in a cascade. Experimental results show that the cascaded classifier is excellent in both detection accuracy and computational efficiency, which makes it suitable for use in a collision warning system.
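To make the two building blocks concrete, the sketch below computes a two-rectangle Haar-like feature from an integral image and runs a toy rejection cascade. It is only an illustration under stated assumptions: the stage functions and thresholds are made up for the example, whereas in the paper each stage would be a boosted strong classifier trained with GAB. All function names are hypothetical.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of the h x w rectangle with top-left corner (y, x), in O(1)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_left_minus_right(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)

def cascade_accepts(ii, stages):
    """Reject the window as soon as one stage scores below its threshold."""
    for score_fn, threshold in stages:
        if score_fn(ii) < threshold:
            return False  # early rejection keeps the average cost low
    return True

# Toy 4x4 window: bright left half, dark right half.
window = np.zeros((4, 4), dtype=np.int64)
window[:, :2] = 10
ii = integral_image(window)

# Hypothetical stages with made-up thresholds.
stages = [
    (lambda t: haar_left_minus_right(t, 0, 0, 4, 4), 50),
    (lambda t: rect_sum(t, 0, 0, 4, 4), 60),
]
print(haar_left_minus_right(ii, 0, 0, 4, 4))  # 80
print(cascade_accepts(ii, stages))            # True
```

Because most windows in a road image contain no vehicle, rejecting them in the first cheap stages is what gives a cascade its computational efficiency.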

Separable reverse connected network for efficient multi-scale vehicle detection

International Journal of Advanced Robotic Systems ◽

10.1177/1729881419870678 ◽

2019 ◽

Vol 16 (4) ◽

pp. 172988141987067

Author(s):

Enze Yang ◽

Linlin Huang ◽

Jian Hu

Keyword(s):

Vehicle Detection ◽

Visual Object ◽

Detection Accuracy ◽

Feature Maps ◽

Compression Technique ◽

Connected Network ◽

Multi Scale ◽

Model Compression ◽

Training Scheme ◽

Wide Range

Vehicle detection is involved in a wide range of intelligent transportation and smart city applications, and the demand for fast and accurate detection of vehicles is increasing. In this article, we propose a convolutional neural network-based framework, called separable reverse connected network, for multi-scale vehicle detection. In this network, the reverse connected structure enriches the semantic context information of previous layers, while separable convolution is introduced for sparse representation of the heavy feature maps generated by subnetworks. Further, we use a multi-scale training scheme, online hard example mining, and a model compression technique to accelerate the training process and reduce the number of parameters. Experimental results on Pascal Visual Object Classes (VOC) 2007 + 2012 and Microsoft Common Objects in Context (MS COCO) 2014 demonstrate that the proposed method yields state-of-the-art performance. Moreover, with separable convolution and model compression, the two-stage detector network is accelerated by about two times with little loss of detection accuracy.
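The saving from separable convolution can be shown with simple parameter arithmetic. The sketch assumes the common depthwise separable form (a per-channel k x k depthwise convolution followed by a 1 x 1 pointwise convolution) and ignores biases; the abstract does not say which variant the paper uses, and the layer sizes below are chosen only for illustration.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(c_in, c_out, k):
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 256 -> 256 channels with a 3 x 3 kernel.
std = standard_conv_params(256, 256, 3)
sep = separable_conv_params(256, 256, 3)
print(std, sep)  # 589824 67840
```

For this layer the separable form uses roughly 8.7 times fewer weights. This per-layer count is not the same as the end-to-end speedup of about two times reported in the abstract, which also depends on the rest of the network.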

Multimodal Multiobject Tracking by Fusing Deep Appearance Features and Motion Information

Complexity ◽

10.1155/2020/8810340 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Liwei Zhang ◽

Jiahong Lai ◽

Zenghui Zhang ◽

Zhen Deng ◽

Bingwei He ◽

...

Keyword(s):

Neural Network ◽

Autonomous Driving ◽

Detection Accuracy ◽

Motion Information ◽

Tracking Accuracy ◽

Tracking Method ◽

Complex Scenes ◽

Successful Match ◽

Single Sensor ◽

Multiobject Tracking

Multiobject Tracking (MOT) is one of the most important abilities of autonomous driving systems. However, most existing MOT methods use only a single sensor, such as a camera, which is not sufficiently reliable. In this paper, we propose a novel Multiobject Tracking method that fuses deep appearance features and motion information of objects. In this method, the locations of objects are first determined by a 2D object detector and a 3D object detector. We use the Nonmaximum Suppression (NMS) algorithm to combine the detection results of the two detectors to ensure detection accuracy in complex scenes. After that, we use a Convolutional Neural Network (CNN) to learn the deep appearance features of objects and employ a Kalman filter to obtain their motion information. Finally, the MOT task is achieved by associating the motion information and deep appearance features. A successful match indicates that the object was tracked successfully. A set of experiments on the KITTI Tracking Benchmark shows that the proposed MOT method can effectively perform the MOT task. The Multiobject Tracking Accuracy (MOTA) is up to 76.40% and the Multiobject Tracking Precision (MOTP) is up to 83.50%.
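The NMS step that merges the two detectors' outputs can be sketched as below. This is the standard greedy NMS algorithm, not code from the paper, and it assumes that the 3D detections have already been projected to image-plane boxes so both sets share one coordinate frame. The box values, scores and the 0.5 IoU threshold are made up for the example.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_threshold]
    return keep

# Hypothetical detections: two from the 2D detector, one projected 3D box
# that overlaps the first 2D box.
boxes_2d = np.array([[10, 10, 50, 50], [100, 100, 150, 150]], dtype=float)
boxes_3d_projected = np.array([[12, 12, 52, 52]], dtype=float)
boxes = np.vstack([boxes_2d, boxes_3d_projected])
scores = np.array([0.9, 0.8, 0.7])

kept = nms(boxes, scores)
print(kept)  # [0, 1]
```

The projected 3D box overlaps the first 2D box with an IoU of about 0.82, so only the higher-scoring detection of that object survives.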

COMAP: A SYNTHETIC DATASET FOR COLLECTIVE MULTI-AGENT PERCEPTION OF AUTONOMOUS DRIVING

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xliii-b2-2021-255-2021 ◽

2021 ◽

Vol XLIII-B2-2021 ◽

pp. 255-263

Author(s):

Y. Yuan ◽

M. Sester

Keyword(s):

Synthetic Data ◽

Simulated Data ◽

Vehicle Detection ◽

Ground Truth ◽

Autonomous Driving ◽

Superior Performance ◽

Detection Accuracy ◽

Data Set ◽

Cloud Data ◽

Data Generator

Collective perception of connected vehicles can sufficiently increase the safety and reliability of autonomous driving by sharing perception information. However, collecting real experimental data for such scenarios is extremely expensive. Therefore, we built a computationally efficient co-simulation synthetic data generator using the CARLA and SUMO simulators. The simulated data contain image and point cloud data as well as ground truth for object detection and semantic segmentation tasks. To verify the performance gain of collective perception over single-vehicle perception, we conducted vehicle detection experiments on this data set, since vehicle detection is one of the most important perception tasks for autonomous driving. A 3D object detector and a Bird’s Eye View (BEV) detector were trained and then tested with different numbers of cooperative vehicles and different vehicle communication ranges. The experimental results showed that collective perception can dramatically increase not only the overall mean detection accuracy but also the localization accuracy of detected bounding boxes. In addition, a vehicle detection comparison experiment showed that the drop in detection performance caused by sensor observation noise can be canceled out by redundant information collected by multiple vehicles.

Multiscale Feature Learning Based on Enhanced Feature Pyramid for Vehicle Detection

Complexity ◽

10.1155/2021/5555121 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Hoanh Nguyen

Keyword(s):

Computational Cost ◽

Feature Learning ◽

Vehicle Detection ◽

Autonomous Driving ◽

Detection Methods ◽

Feature Maps ◽

Feature Representations ◽

Backbone Network ◽

Discriminative Feature ◽

Feature Pyramid

Vehicle detection is a crucial task in autonomous driving systems. Because of the large variance in scale and the heavy occlusion of vehicles in an image, it remains a challenging problem. Recent vehicle detection methods typically exploit a feature pyramid to detect vehicles at different scales. However, drawbacks in the design prevent the multiscale features from being fully exploited. This paper introduces a feature pyramid architecture to address this problem. In the proposed architecture, an improved region proposal network is designed to generate intermediate feature maps, which are then used to add more discriminative representations to the feature maps generated by the backbone network and to improve the computational cost of the network. To generate more discriminative feature representations, this paper introduces a multilayer enhancement module that reweights the feature representations of the backbone feature maps, increasing the discrimination between foreground objects and background regions in each feature map. In addition, an adaptive RoI pooling module is proposed to pool features from all pyramid levels for each proposal and fuse them for the detection network. Experimental results on the KITTI vehicle detection benchmark and the PASCAL VOC 2007 car dataset show that the proposed approach obtains better detection performance than recent vehicle detection methods.

A deep learning-based ensemble method for helmet-wearing detection

PeerJ Computer Science ◽

10.7717/peerj-cs.311 ◽

2020 ◽

Vol 6 ◽

pp. e311

Author(s):

Zheming Fan ◽

Chengbin Peng ◽

Licun Dai ◽

Feng Cao ◽

Jianyu Qi ◽

...

Keyword(s):

Confidence Score ◽

Ensemble Method ◽

Detection Methods ◽

Detection Accuracy ◽

Construction Sites ◽

Data Set ◽

Real Time Processing ◽

Detection Algorithms ◽

Complex Scenes ◽

And Performance

Recently, object detection methods have developed rapidly and have been widely used in many areas. Helmet-wearing detection is useful in many scenarios, because people are required to wear helmets for their safety when they work on construction sites or cycle in the streets. However, for helmet-wearing detection in complex scenes such as construction sites and workshops, the detection accuracy of current approaches still needs to be improved. In this work, we analyze the mechanism and performance of several detection algorithms and identify two feasible base algorithms that have complementary advantages. We use one base algorithm to detect relatively large heads and helmets. We use the other base algorithm to detect relatively small heads, and we add another convolutional neural network to detect whether there is a helmet above each head. Then, we integrate these two base algorithms with an ensemble method. In this method, we first propose an approach to merge information about heads and helmets from the base algorithms, and then propose a linear function to estimate the confidence score of the identified heads and helmets. Experiments on a benchmark data set show that our approach increases the precision and recall of the base algorithms, and that its mean Average Precision is 0.93, which is better than many other approaches. With GPU acceleration, our approach can achieve real-time processing on contemporary computers, which is useful in practice.

Vehicle and Pedestrian Detection Based on Multi-level Feature Fusion in Autonomous Driving

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200304123323 ◽

2020 ◽

Vol 13 ◽

Author(s):

Chen Guoqiang ◽

Yi Huailong ◽

Mao Zhuangzhuang

Keyword(s):

Autonomous Vehicles ◽

Feature Fusion ◽

Pedestrian Detection ◽

Autonomous Driving ◽

Seasonal Effects ◽

Detection Accuracy ◽

Semantic Features ◽

Feature Maps ◽

Safe Driving ◽

Multi Level

Aims: Factors such as light, weather, dynamic objects, seasonal effects and structures pose great challenges for autonomous driving algorithms in the real world. Autonomous vehicles should be able to detect different obstacles in complex scenes to ensure safe driving. Background: The ability to detect vehicles and pedestrians is critical to the safe driving of autonomous vehicles. Automated vehicle vision systems must handle an extremely wide range of challenging scenarios. Objective: The goal of the work is to design a robust detector for vehicles and pedestrians. The main contribution is the design of the Multi-level Feature Fusion Block (MFFB) and the Detector Cascade Block (DCB). Multi-level feature fusion and multi-step prediction are used, which greatly improve detection precision. Methods: The paper proposes a vehicle and pedestrian detector that is an end-to-end deep convolutional neural network. Its key parts are the MFFB and the DCB. The former combines contextual information with multi-level features, fusing high-resolution features with weak semantics and low-resolution features with strong semantics. The latter uses multi-step prediction, cascades a series of detectors, and combines predictions from multiple feature maps to handle objects of different sizes. Results: Experiments on the RobotCar dataset and the KITTI dataset show that the algorithm achieves high precision with real-time detection. It achieves 84.61% mAP on the RobotCar dataset and 81.54% mAP on the well-known KITTI benchmark. In particular, the detection accuracy for the single vehicle category reaches 90.02%. Conclusion: The experimental results show that the proposed algorithm offers a good trade-off between detection accuracy and detection speed, exceeding the current state-of-the-art RefineDet algorithm. The proposed 2D object detector can solve the problem of vehicle and pedestrian detection and improve accuracy, robustness and generalization ability in autonomous driving.

Feature Deep Continuous Aggregation for 3D Vehicle Detection

Applied Sciences ◽

10.3390/app9245397 ◽

2019 ◽

Vol 9 (24) ◽

pp. 5397

Author(s):

Kun Zhao ◽

Li Liu ◽

Yu Meng ◽

Qing Gu

Keyword(s):

Object Detection ◽

Vehicle Detection ◽

Stage Structure ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

Validation Set ◽

Bounding Boxes ◽

The Stability ◽

3D Object Detection

3D object detection has recently become a research hotspot in the field of autonomous driving. Although great progress has been made, it still needs further improvement. This paper therefore presents FDCA, a feature deep continuous aggregation network that uses multiple sensors for 3D vehicle detection. The proposed network adopts a two-stage structure with the bird’s-eye view (BEV) map and the RGB image as inputs. In the first stage, two feature extractors generate feature maps with high resolution and representational ability for each input view. These feature maps are then fused and fed to a 3D proposal generator to obtain reliable 3D vehicle proposals. In the second stage, the refinement network further aggregates the features of the proposal regions and performs classification, 3D bounding box regression, and orientation estimation to predict the location and heading of vehicles in 3D space. The FDCA network was trained and evaluated on the KITTI 3D object detection benchmark. Results on the validation set show that, compared with other fusion-based methods, FDCA achieves a 3D average precision (AP) of 76.82% on the moderate setting while running in real time, which is 2.38% higher than the second-best method. Ablation experiments also show that FDCA converges much faster and is much more stable, making it a candidate for application in autonomous driving.
