On the Performance of One-Stage and Two-Stage Object Detectors in Autonomous Vehicles Using Camera Data

Object detection using remote sensing data is a key task of the perception systems of self-driving vehicles. While many generic deep learning architectures have been proposed for this problem, there is little guidance on their suitability when using them in a particular scenario such as autonomous driving. In this work, we aim to assess the performance of existing 2D detection systems on a multi-class problem (vehicles, pedestrians, and cyclists) with images obtained from the on-board camera sensors of a car. We evaluate several one-stage (RetinaNet, FCOS, and YOLOv3) and two-stage (Faster R-CNN) deep learning meta-architectures under different image resolutions and feature extractors (ResNet, ResNeXt, Res2Net, DarkNet, and MobileNet). These models are trained using transfer learning and compared in terms of both precision and efficiency, with special attention to the real-time requirements of this context. For the experimental study, we use the Waymo Open Dataset, which is the largest existing benchmark. Despite the rising popularity of one-stage detectors, our findings show that two-stage detectors still provide the most robust performance. Faster R-CNN models outperform one-stage detectors in accuracy, being also more reliable in the detection of minority classes. Faster R-CNN Res2Net-101 achieves the best speed/accuracy tradeoff but needs lower resolution images to reach real-time speed. Furthermore, the anchor-free FCOS detector is a slightly faster alternative to RetinaNet, with similar precision and lower memory usage.

Real-Time Semantic Segmentation of 3D Point Cloud for Autonomous Driving

Electronics ◽ 10.3390/electronics10161960 ◽ 2021 ◽ Vol 10 (16) ◽ pp. 1960

Author(s): Dongwan Kang ◽ Anthony Wong ◽ Banghyon Lee ◽ Jungha Kim

Keyword(s): Deep Learning ◽ Real Time ◽ Projection Method ◽ Autonomous Vehicles ◽ Distance Perception ◽ Semantic Segmentation ◽ Break Point ◽ Autonomous Driving ◽ Lidar Data ◽ High Level

Autonomous vehicles perceive objects through various sensors. Cameras, radar, and LiDAR are commonly used as vehicle sensors, each with its own characteristics. For example, cameras are used for a high-level understanding of a scene, radar is applied to weather-resistant distance perception, and LiDAR is used for accurate distance recognition. The ability of a camera to understand a scene has improved dramatically with the recent development of deep learning. In addition, technologies that emulate other sensors using a single sensor are being developed. Therefore, in this study, a LiDAR data-based scene understanding method was developed through deep learning. Deep learning approaches to processing LiDAR data are mainly divided into point, projection, and voxel methods. This study applies a projection method to achieve real-time performance. The convolutional neural network methods used with conventional camera images can be easily applied to the projection method. In addition, an adaptive break point detector method used for conventional 2D LiDAR information is utilized to solve the misclassification caused by the conversion from 2D into 3D. The results of this study are evaluated through a comparison with other technologies.
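
To illustrate what a projection method does, the following is a minimal sketch of a spherical projection that maps 3D LiDAR points onto a 2D range image, which a conventional image CNN can then consume. The abstract does not state which projection the authors use, so the function name, the image size, and the vertical field of view below are assumptions chosen only for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_range_image(
    points: Sequence[Point],
    width: int = 8,               # hypothetical image width (azimuth bins)
    height: int = 4,              # hypothetical image height (elevation bins)
    fov_up_deg: float = 15.0,     # assumed upper limit of the vertical field of view
    fov_down_deg: float = -15.0,  # assumed lower limit of the vertical field of view
) -> List[List[Optional[float]]]:
    """Project (x, y, z) points onto a height x width range image.

    Each cell holds the range of the closest point that falls into it,
    or None when no point does.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image: List[List[Optional[float]]] = [[None] * width for _ in range(height)]

    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue  # a point at the sensor origin has no direction
        yaw = math.atan2(y, x)      # azimuth in [-pi, pi]
        pitch = math.asin(z / rng)  # elevation in [-pi/2, pi/2]
        if not (fov_down <= pitch <= fov_up):
            continue  # outside the vertical field of view

        # Normalise the angles to pixel coordinates and clamp to the image.
        col = int(0.5 * (1.0 - yaw / math.pi) * width)
        row = int((fov_up - pitch) / fov * height)
        col = min(width - 1, max(0, col))
        row = min(height - 1, max(0, row))

        # Keep the nearest return when several points share a cell.
        current = image[row][col]
        if current is None or rng < current:
            image[row][col] = rng
    return image
```

Because several 3D points can land in the same pixel, labels predicted on the 2D image are ambiguous when mapped back to the point cloud, which is the kind of misclassification a post-processing step such as the break point detector is meant to reduce.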

Deep-Framework: A Distributed, Scalable, and Edge-Oriented Framework for Real-Time Analysis of Video Streams

Sensors ◽ 10.3390/s21124045 ◽ 2021 ◽ Vol 21 (12) ◽ pp. 4045

Author(s): Alessandro Sassu ◽ Jose Francisco Saenz-Cogollo ◽ Maurizio Agelli

Keyword(s): Deep Learning ◽ Real Time ◽ Video Data ◽ Video Analytics ◽ Web Based ◽ Real Time Analysis ◽ Open Source Framework ◽ Cluster Configuration ◽ Time Requirements ◽ High Level

Edge computing is the best approach for meeting the exponential demand and the real-time requirements of many video analytics applications. Since most of the recent advances in extracting information from images and video rely on computation-heavy deep learning algorithms, there is a growing need for solutions that allow the deployment and use of new models on scalable and flexible edge architectures. In this work, we present Deep-Framework, a novel open source framework for developing edge-oriented real-time video analytics applications based on deep learning. Deep-Framework has a scalable multi-stream architecture based on Docker and hides from the user the complexity of cluster configuration, service orchestration, and GPU resource allocation. It provides Python interfaces for integrating deep learning models developed with the most popular frameworks, as well as high-level APIs based on standard HTTP and WebRTC interfaces for consuming the extracted video data on clients running in browsers or on any other web-based platform.

Computer vision based obstacle detection and target tracking for autonomous vehicles

MATEC Web of Conferences ◽ 10.1051/matecconf/202133607004 ◽ 2021 ◽ Vol 336 ◽ pp. 07004

Author(s): Ruoyu Fang ◽ Cheng Cai

Keyword(s): Neural Network ◽ Computer Vision ◽ Deep Learning ◽ Target Tracking ◽ Real Time ◽ Autonomous Vehicles ◽ Autonomous Vehicle ◽ Obstacle Detection ◽ PID Algorithm ◽ Deep Learning Neural Network

Obstacle detection and target tracking are two major issues for intelligent autonomous vehicles. This paper proposes a new scheme for target tracking and real-time obstacle detection based on computer vision. A ResNet-18 deep neural network is used for obstacle detection, and a YOLOv3 deep neural network is employed for real-time target tracking. The two trained models can be deployed on an autonomous vehicle equipped with an NVIDIA Jetson Nano board. Using its camera, the vehicle moves to avoid obstacles and follow tracked targets. Adjusting the steering and movement of the vehicle with a PID algorithm while it is moving helps it achieve stable and precise tracking.
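
As a reminder of how a PID algorithm turns a tracking error into a control command, here is a minimal textbook sketch of a discrete PID controller. It is not the authors' implementation: the class name, the gains, and the idea of feeding it the horizontal offset of the tracked target are assumptions made for illustration.

```python
from typing import Optional


class PID:
    """Textbook discrete PID controller (illustrative sketch, hypothetical gains)."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        """Return the control output for the current error and time step dt (seconds)."""
        if dt <= 0.0:
            raise ValueError("dt must be positive")
        # Integral term: accumulated error over time.
        self.integral += error * dt
        # Derivative term: rate of change of the error (zero on the first call).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical usage: the error is the offset of the tracked target from the image centre.
steering_pid = PID(kp=2.0, ki=0.5, kd=0.1)
first_command = steering_pid.update(error=1.0, dt=1.0)
second_command = steering_pid.update(error=0.5, dt=1.0)
```

The proportional term reacts to the current offset, the integral term removes steady-state bias, and the derivative term damps oscillation, which is why such a controller tends to give smoother steering than acting on the raw detection offset.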

Automated Testing of Ultrawideband Positioning for Autonomous Driving

Journal of Robotics ◽ 10.1155/2020/9345360 ◽ 2020 ◽ Vol 2020 ◽ pp. 1-15

Author(s): Benjamin Vedder ◽ Bo Joel Svensson ◽ Jonny Vinter ◽ Magnus Jonsson

Keyword(s): Fault Tolerance ◽ Real Time ◽ Autonomous Vehicles ◽ Fault Injection ◽ Autonomous Driving ◽ Automated Testing ◽ Test Cases ◽ Positioning System ◽ Novel Approach ◽ Driving Model

Autonomous vehicles need accurate and dependable positioning, and these systems need to be tested extensively. We have evaluated positioning based on ultrawideband (UWB) ranging with our self-driving model car using a highly automated approach. Random drivable trajectories were generated, and the UWB position was compared against the Real-Time Kinematic Satellite Navigation (RTK-SN) positioning system with which our model car is also equipped. Fault injection was used to study the fault tolerance of the UWB positioning system. The challenges addressed are automatically generating test cases for real-time hardware, restoring the state between tests, and maintaining safety by preventing collisions. We were able to automatically generate and carry out hundreds of experiments on the model car in real time and rerun them consistently with and without fault injection enabled. We thereby demonstrate a novel approach to automated testing on complex real-time hardware.

Road-Aware Trajectory Prediction for Autonomous Driving on Highways

Sensors ◽ 10.3390/s20174703 ◽ 2020 ◽ Vol 20 (17) ◽ pp. 4703

Author(s): Yookhyun Yoon ◽ Taeyeon Kim ◽ Ho Lee ◽ Jahnghyon Park

Keyword(s): Deep Learning ◽ Autonomous Vehicles ◽ Prediction Method ◽ Autonomous Driving ◽ Structural Constraints ◽ High Definition ◽ Trajectory Prediction ◽ The Road ◽ Road Geometry ◽ Efficient Learning

To drive safely and comfortably, autonomous vehicles need long-term trajectory prediction of surrounding vehicles. Deep-learning-based approaches have previously been proposed to handle the uncertain nature of trajectory prediction. An on-road vehicle must obey road geometry, i.e., it should run within the constraint of the road shape. Herein, we present a novel road-aware trajectory prediction method that combines high-definition maps with a deep learning network. We developed a data-efficient learning framework for the trajectory prediction network in the curvilinear coordinate system of the road and a lane assignment for the surrounding vehicles. We then propose a novel output-constrained sequence-to-sequence trajectory prediction network to incorporate the structural constraints of the road. Our method uses these structural constraints as prior knowledge for the prediction network. This knowledge is not only used as an input to the trajectory prediction network, but is also included in the constrained loss function of the maneuver recognition network. Accordingly, the proposed method can predict a feasible and realistic driver intention and trajectory. Our method has been evaluated using a real traffic dataset, and the results show that it is data-efficient and can predict reasonable trajectories at merging sections.
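
To make the idea of a curvilinear road coordinate system concrete, the sketch below converts a Cartesian position into (s, d) coordinates relative to a road centerline given as a polyline: s is the arc length along the centerline and d is the signed lateral offset. The abstract does not describe how the authors compute these coordinates, so the function name, the polyline representation, and the sign convention (positive d to the left of the driving direction) are assumptions.

```python
import math
from typing import Optional, Sequence, Tuple

Point2D = Tuple[float, float]


def to_curvilinear(point: Point2D, centerline: Sequence[Point2D]) -> Tuple[float, float]:
    """Return (s, d) of `point` relative to a polyline centerline.

    s: arc length from the first centerline vertex to the closest point on the polyline.
    d: signed lateral offset, positive on the left of the driving direction (assumed convention).
    """
    px, py = point
    best: Optional[Tuple[float, float, float]] = None  # (squared distance, s, d)
    s_start = 0.0  # arc length at the start of the current segment

    for (x0, y0), (x1, y1) in zip(centerline, centerline[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicated vertices
        # Parameter of the orthogonal projection, clamped to the segment.
        t = ((px - x0) * dx + (py - y0) * dy) / (seg_len * seg_len)
        t = max(0.0, min(1.0, t))
        cx, cy = x0 + t * dx, y0 + t * dy
        dist_sq = (px - cx) ** 2 + (py - cy) ** 2
        # The sign of the 2D cross product tells left (+) from right (-).
        cross = dx * (py - y0) - dy * (px - x0)
        d = math.copysign(math.sqrt(dist_sq), cross)
        if best is None or dist_sq < best[0]:
            best = (dist_sq, s_start + t * seg_len, d)
        s_start += seg_len

    if best is None:
        raise ValueError("centerline needs at least one non-degenerate segment")
    return best[1], best[2]
```

In such a frame a vehicle that simply follows a curved lane has a nearly constant d, so the prediction network does not have to relearn the road shape from data, which is one plausible reading of why the authors describe the framework as data-efficient.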

Real-Time Vehicle Detection Algorithm Based on Vision and Lidar Point Cloud Fusion

Journal of Sensors ◽ 10.1155/2019/8473980 ◽ 2019 ◽ Vol 2019 ◽ pp. 1-9 ◽ Cited By ~ 8

Author(s): Hai Wang ◽ Xinyu Lou ◽ Yingfeng Cai ◽ Yicheng Li ◽ Long Chen

Keyword(s): Deep Learning ◽ Real Time ◽ Autonomous Vehicles ◽ Point Cloud ◽ Vehicle Detection ◽ Detection Algorithm ◽ Detection Methods ◽ Depth Information ◽ Detection Accuracy ◽ Classification Rate

Vehicle detection is one of the most important environment perception tasks for autonomous vehicles. Traditional vision-based vehicle detection methods are not accurate enough, especially for small and occluded targets, while light detection and ranging (lidar) based methods are good at detecting obstacles but are time-consuming and have a low classification rate for different target types. To address these shortcomings and make full use of the depth information of lidar and the obstacle classification ability of vision, this work proposes a real-time vehicle detection algorithm that fuses vision and lidar point cloud information. First, obstacles are detected by the grid projection method using the lidar point cloud information. Then, the obstacles are mapped to the image to obtain several separated regions of interest (ROIs). After that, the ROIs are expanded based on a dynamic threshold and merged to generate the final ROI. Finally, a deep learning method named You Only Look Once (YOLO) is applied to the ROI to detect vehicles. The experimental results on the KITTI dataset demonstrate that the proposed algorithm has high detection accuracy and good real-time performance. Compared with the detection method based only on YOLO, the mean average precision (mAP) is increased by 17%.
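
The expand-and-merge step on the ROIs can be sketched as follows. This is only an illustration: the abstract does not define the dynamic threshold, so a fixed pixel margin stands in for it here, and the function names and the (x_min, y_min, x_max, y_max) box layout are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def expand(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow a box by `margin` pixels on every side, clipped to the image."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(width, x1 + margin), min(height, y1 + margin))


def overlaps(a: Box, b: Box) -> bool:
    """True when two axis-aligned boxes intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_rois(boxes: Sequence[Box], margin: int, width: int, height: int) -> List[Box]:
    """Expand every ROI by a fixed margin (a stand-in for the paper's dynamic
    threshold), then merge overlapping ROIs until none overlap."""
    pending = [expand(b, margin, width, height) for b in boxes]
    merged: List[Box] = []
    while pending:
        box = pending.pop()
        changed = True
        while changed:  # a grown box may now reach ROIs it missed before
            changed = False
            for other in list(merged):
                if overlaps(box, other):
                    merged.remove(other)
                    box = union(box, other)
                    changed = True
        merged.append(box)
    return merged
```

Running the image detector only on the merged ROIs rather than on the full frame is what lets the lidar stage reduce the work of the vision stage.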

Path Planning for Highly Automated Driving on Embedded GPUs

Journal of Low Power Electronics and Applications ◽ 10.3390/jlpea8040035 ◽ 2018 ◽ Vol 8 (4) ◽ pp. 35 ◽ Cited By ~ 2

Author(s): Jörg Fickenscher ◽ Sandra Schmidt ◽ Frank Hannig ◽ Mohamed Bouzouraa ◽ Jürgen Teich

Keyword(s): Path Planning ◽ Real Time ◽ Autonomous Driving ◽ Automated Driving ◽ Computing Power ◽ Planning Algorithm ◽ Time Requirements ◽ Evaluation Platform ◽ Path Planning Algorithm ◽ Highly Automated Driving

Autonomous driving is becoming increasingly important to car makers. A key enabler of such systems is planning the path the vehicle should take, but finding a good path can be computationally very expensive. New ECU architectures, such as GPUs, are therefore required, because standard processors struggle to provide enough computing power. In this work, we present a novel parallelization of a path planning algorithm. We show how many paths can reasonably be planned under real-time requirements and how they can be rated. As an evaluation platform, we used an Nvidia Jetson board equipped with a Tegra K1 SoC, whose GPU is also employed in the zFAS ECU of AUDI AG.

An LSTM-Based Deep Learning Approach for Classifying Malicious Traffic at the Packet Level

Applied Sciences ◽ 10.3390/app9163414 ◽ 2019 ◽ Vol 9 (16) ◽ pp. 3414 ◽ Cited By ~ 8

Author(s): Ren-Hung Hwang ◽ Min-Chun Peng ◽ Van-Linh Nguyen ◽ Yu-Lun Chang

Keyword(s): Deep Learning ◽ Real Time ◽ Short Term Memory ◽ Learning Technologies ◽ Word Embedding ◽ Detection Systems ◽ Real Time Analysis ◽ Detection Delay ◽ Novel Approach ◽ Prior Literature

Recently, deep learning has been successfully applied to network security assessments and intrusion detection systems (IDSs), with breakthroughs such as using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) to classify malicious traffic. However, these state-of-the-art systems struggle to satisfy real-time analysis requirements because of the major delay of flow-based data preprocessing, i.e., the time required to accumulate packets into particular flows and then extract features. If malicious traffic can be detected at the packet level, detection time will be significantly reduced, which makes online real-time malicious traffic detection based on deep learning technologies very promising. To accelerate the whole detection process through packet-level classification, which has not been studied in the literature, we propose a novel approach to building a malicious traffic classification system based primarily on word embedding and the LSTM model. Specifically, we propose a novel word embedding mechanism to extract packet semantic meanings and adopt LSTM to learn the temporal relation among fields in the packet header and then classify whether an incoming packet is normal or part of malicious traffic. The evaluation results on ISCX2012, USTC-TFC2016, an IoT dataset from Robert Gordon University, and an IoT dataset collected on our Mirai Botnet show that our approach is competitive with the prior literature, which detects malicious traffic at the flow level. As network traffic grows year by year, this first attempt can inspire the research community to exploit the advantages of deep learning to build effective IDSs without suffering significant detection delay.

Real-Time Semantic Segmentation with Dual Encoder and Self-Attention Mechanism for Autonomous Driving

Sensors ◽ 10.3390/s21238072 ◽ 2021 ◽ Vol 21 (23) ◽ pp. 8072

Author(s): Yu-Bang Chang ◽ Chieh Tsai ◽ Chang-Hong Lin ◽ Poki Chen

Keyword(s): Deep Learning ◽ Real Time ◽ Network Architecture ◽ Semantic Segmentation ◽ Autonomous Driving ◽ Attention Mechanism ◽ Trade Off ◽ Segmentation Methods ◽ General Semantic ◽ Deep Learning Model

As autonomous driving techniques become increasingly valued and widespread, real-time semantic segmentation has become a popular and challenging topic in deep learning and computer vision in recent years. However, to apply a deep learning model to the edge devices that accompany the sensors on vehicles, we need to design a structure that has the best trade-off between accuracy and inference time. In previous works, several methods sacrificed accuracy to obtain a faster inference time, while others aimed to find the best accuracy under a real-time constraint. Nevertheless, the accuracy of previous real-time semantic segmentation methods still lags far behind that of general semantic segmentation methods. We therefore propose a network architecture based on a dual encoder and a self-attention mechanism. Compared with preceding works, we achieved a 78.6% mIoU with a speed of 39.4 FPS with a 1024 × 2048 resolution on a Cityscapes test submission.
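
For readers unfamiliar with the mIoU figure quoted above, the sketch below shows the usual definition of mean intersection over union on flattened label maps: per class, the number of pixels where prediction and ground truth agree divided by the number of pixels where either assigns that class, averaged over the classes that occur. It is a generic illustration, not the official Cityscapes evaluation script; the function name and the choice to skip absent classes are assumptions.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection over union for flattened per-pixel class labels.

    Classes that appear in neither the prediction nor the target are ignored.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatched pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Six pixels, three classes: the per-class IoUs are 1/3, 2/3 and 1/2.
example_score = mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
```

Because every class contributes equally to the average, mIoU penalizes a model that segments large road regions well but misses small classes such as poles or riders.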

Real-Time 3D Multi-Object Detection and Localization Based on Deep Learning for Road and Railway Smart Mobility

Journal of Imaging ◽ 10.3390/jimaging7080145 ◽ 2021 ◽ Vol 7 (8) ◽ pp. 145

Author(s): Antoine Mauri ◽ Redouane Khemmar ◽ Benoit Decoux ◽ Madjid Haddad ◽ Rémi Boutteau

Keyword(s): Deep Learning ◽ Object Detection ◽ Real Time ◽ Video Game ◽ Autonomous Vehicles ◽ Object Localization ◽ Driver Assistance Systems ◽ Smart Mobility ◽ Bounding Boxes ◽ Detection And Localization

For smart mobility, autonomous vehicles, and advanced driver-assistance systems (ADASs), perception of the environment is an important task in scene analysis and understanding. Better perception of the environment allows for enhanced decision making, which, in turn, enables very high-precision actions. To this end, we introduce in this work a new real-time deep learning approach for 3D multi-object detection for smart mobility not only on roads, but also on railways. To obtain the 3D bounding boxes of the objects, we modified a proven real-time 2D detector, YOLOv3, to predict 3D object localization, object dimensions, and object orientation. Our method has been evaluated on KITTI’s road dataset as well as on our own hybrid virtual road/rail dataset acquired from the video game Grand Theft Auto (GTA) V. The evaluation of our method on these two datasets shows good accuracy, but more importantly that it can be used in real-time conditions, in road and rail traffic environments. Through our experimental results, we also show the importance of the accuracy of prediction of the regions of interest (RoIs) used in the estimation of 3D bounding box parameters.
