Lightweight Object Detection Ensemble Framework for Autonomous Vehicles in Challenging Weather Conditions

2021 · Vol 2021 · pp. 1-12
Author(s): Rahee Walambe, Aboli Marathe, Ketan Kotecha, George Ghinea

The computer vision systems driving autonomous vehicles are judged by their ability to detect objects and obstacles in the vicinity of the vehicle in diverse environments. Enhancing the ability of a self-driving car to distinguish between the elements of its environment under adverse conditions is an important challenge in computer vision. For example, poor weather conditions like fog and rain corrupt images, which can cause a drastic drop in object detection (OD) performance. The primary navigation of autonomous vehicles depends on the effectiveness of the image processing techniques applied to the data collected from various visual sensors. Therefore, it is essential to develop the capability to detect objects like vehicles and pedestrians under challenging conditions such as unpleasant weather. To solve this problem, ensembling multiple baseline deep learning models under different voting strategies for object detection, together with data augmentation to boost the models' performance, is proposed. The data augmentation technique is particularly useful because it works with the limited training data typical of OD applications. Furthermore, using the baseline models significantly speeds up the OD process compared to custom models, owing to transfer learning. The ensembling approach can therefore be highly effective on resource-constrained devices deployed in autonomous vehicles operating in uncertain weather. The applied techniques demonstrated an increase in accuracy over the baseline models, identifying objects in images captured under adverse foggy and rainy conditions and reaching 32.75% mean average precision (mAP) and 52.56% average precision (AP) for car detection in the fog and rain conditions present in the dataset. The effectiveness of multiple voting strategies for bounding box predictions on the dataset is also demonstrated. These strategies help increase the explainability of object detection in autonomous systems and improve the performance of the ensemble techniques over the baseline models.
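A minimal sketch (an illustration of the general idea, not the authors' code) of ensembling detections from several baseline detectors under different voting strategies. Boxes are [x1, y1, x2, y2]; the grouping threshold, fusion rule, and strategy names are assumptions for illustration.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def ensemble(detections, strategy="consensus", iou_thr=0.5):
    """detections: {model_name: [(box, score), ...]}.

    affirmative: keep a group if any model fired;
    consensus:   keep it if a majority of models agree;
    unanimous:   keep it only if all models agree.
    """
    groups = []  # each group: a representative box plus member detections
    for model, dets in detections.items():
        for box, score in dets:
            for g in groups:
                if iou(box, g["box"]) >= iou_thr:
                    g["members"].append((model, box, score))
                    break
            else:
                groups.append({"box": box, "members": [(model, box, score)]})
    needed = {"affirmative": 1,
              "consensus": len(detections) // 2 + 1,
              "unanimous": len(detections)}[strategy]
    fused = []
    for g in groups:
        if len({m for m, _, _ in g["members"]}) >= needed:
            boxes = np.array([b for _, b, _ in g["members"]], dtype=float)
            scores = np.array([s for _, _, s in g["members"]])
            # fuse the group into one box by score-weighted averaging
            box = (boxes * scores[:, None]).sum(0) / scores.sum()
            fused.append((box.tolist(), float(scores.mean())))
    return fused
```

With three baselines, for instance, the consensus strategy keeps a car only when at least two detectors place overlapping boxes on it.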

Sensors · 2021 · Vol 21 (13) · pp. 4503
Author(s): Jose Roberto Vargas Rivero, Thiemo Gerbich, Boris Buschardt, Jia Chen

In contrast to previous works on data augmentation using LIDAR (Light Detection and Ranging), which mostly consider point clouds under good weather conditions, this paper uses point clouds affected by spray. Spray water can be a cause of phantom braking, and understanding how to handle the extra detections it causes is an important step in the development of ADAS (Advanced Driver Assistance Systems)/AV (Autonomous Vehicles) functions. The extra detections caused by spray cannot be safely removed without considering cases in which real solid objects may be present in the same region as the spray detections. As collecting real examples would be extremely difficult, the use of synthetic data is proposed. Real scenes are reconstructed virtually with an extra object added in the spray region, such that the detections caused by this obstacle match, in intensity, echo number, and occlusion, the characteristics a real object in the same position would have. The detections generated by the obstacle are then used to augment the real data, yielding, after occlusion effects are added, a good approximation of the desired training data. These data are used to train a classifier achieving an average F-Score of 92. The performance of the classifier is analyzed in detail based on the characteristics of the synthetic object: size, position, reflection, and duration. The proposed method can be easily extended to different kinds of obstacles and classifier types.
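The following is a hedged sketch of the core augmentation step as the abstract describes it; the point format, the intensity and echo values, and the simplified occlusion rule are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

def insert_object(cloud, obj_xyz, obj_intensity=0.8, echo=1):
    """cloud: (N, 5) rows of [x, y, z, intensity, echo].
    obj_xyz: (M, 3) points sampled from the synthetic obstacle surface."""
    obj = np.column_stack([
        obj_xyz,
        np.full(len(obj_xyz), obj_intensity),  # solid-target intensity
        np.full(len(obj_xyz), echo),           # solid targets return a first echo
    ])
    # crude occlusion (ignoring azimuth wrap-around): drop real points that
    # lie behind the object within the object's azimuth range
    az = np.arctan2(cloud[:, 1], cloud[:, 0])
    rng = np.linalg.norm(cloud[:, :2], axis=1)
    obj_az = np.arctan2(obj[:, 1], obj[:, 0])
    obj_rng = np.linalg.norm(obj[:, :2], axis=1)
    occluded = (az >= obj_az.min()) & (az <= obj_az.max()) & (rng > obj_rng.max())
    return np.vstack([cloud[~occluded], obj])
```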


2021
Author(s): Ke Wang, Lianhua Zhang, Qin Xia, Liang Pu, Junlan Chen

Convolutional neural network (CNN)-based object detection usually assumes that training and test data have the same distribution, which, however, does not always hold in real-world applications. In autonomous vehicles, the driving scene (target domain) consists of unconstrained road environments that cannot all possibly be observed in the training data (source domain), and this leads to a sharp drop in detector accuracy. In this paper, we propose a domain adaptation framework based on pseudo-labels to address this domain shift. First, pseudo-labels for the target domain images are generated by the baseline detector (BD) and optimized by our data optimization module to correct errors. Then, the hard samples in a single image are labeled based on the optimization results of the pseudo-labels. An adaptive sampling module is applied to sample target domain data according to the number of hard samples per image, so as to select more effective data. Finally, a modified knowledge distillation loss is applied in the retraining module, and we investigate two ways of assigning soft labels to the training examples from the target domain to retrain the detector. We evaluate the average precision of our approach on various source/target domain pairs and demonstrate that the framework improves the average precision of the BD by over 10% across multiple domain adaptation scenarios on the Cityscapes, KITTI, and Apollo datasets.
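A hedged PyTorch sketch of the retraining idea: confident pseudo-labels act as hard targets, while a temperature-softened copy of the baseline detector's class scores supplies the knowledge distillation term. The threshold, temperature, and 0.5 weighting are illustrative assumptions, not the paper's values.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, pseudo_labels, confidence,
                 hard_thr=0.8, T=2.0, alpha=0.5):
    """student_logits, teacher_logits: (B, C); pseudo_labels, confidence: (B,).

    Hard cross-entropy on confident pseudo-labels plus a softened KL term
    that distils the teacher's (baseline detector's) class scores."""
    confident = confidence >= hard_thr
    hard = (F.cross_entropy(student_logits[confident], pseudo_labels[confident])
            if confident.any() else student_logits.new_zeros(()))
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1 - alpha) * soft
```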


2020 · Vol 2020 · pp. 1-13
Author(s): Sultan Daud Khan, Ahmed B. Altamimi, Mohib Ullah, Habib Ullah, Faouzi Alaya Cheikh

Head detection in real-world videos is a classical research problem in computer vision. Head detection in videos is more challenging than in a single image due to the many nuisances commonly observed in natural videos, including arbitrary poses, appearances, and scales. Generally, head detection is treated as a particular case of object detection in a single image. However, the performance of object detectors deteriorates in unconstrained videos. In this paper, we propose a temporal consistency model (TCM) to enhance the performance of a generic object detector by integrating the spatial-temporal information that exists among subsequent frames of a particular video. Our model takes detections from a generic detector as input and improves mean average precision (mAP) by recovering missed detections and suppressing false positives. We compare and evaluate the proposed framework on four challenging datasets, i.e., HollywoodHeads, Casablanca, BOSS, and PAMELA. Experimental evaluation shows that performance is improved by employing the proposed TCM model. We demonstrate both qualitatively and quantitatively that our proposed framework obtains significant improvements over other methods.
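A simplified sketch of the temporal consistency idea (an assumption about the general mechanism, not the paper's TCM): a box unsupported by any overlapping box in a neighbouring frame is suppressed as a likely false positive, and a box that both neighbours agree on but the middle frame lacks is interpolated back in.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def temporally_filter(frames, iou_thr=0.4):
    """frames: one list of [x1, y1, x2, y2] head boxes per video frame."""
    out = []
    for t, boxes in enumerate(frames):
        prev = frames[t - 1] if t > 0 else []
        nxt = frames[t + 1] if t + 1 < len(frames) else []
        # suppress boxes with no temporal support in either neighbour
        kept = [b for b in boxes
                if any(iou(b, n) >= iou_thr for n in prev + nxt)
                or not (prev or nxt)]
        # recover a miss: both neighbours agree on a box this frame lacks
        for p in prev:
            match = next((n for n in nxt if iou(p, n) >= iou_thr), None)
            if match and all(iou(p, k) < iou_thr for k in kept):
                kept.append([(u + v) / 2 for u, v in zip(p, match)])
        out.append(kept)
    return out
```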


2020 · Vol 12 (6) · pp. 1014
Author(s): Jingchao Jiang, Cheng-Zhi Qin, Juan Yu, Changxiu Cheng, Junzhi Liu, ...

Reference objects in video images can be used to indicate urban waterlogging depths. The detection of reference objects is the key step in obtaining waterlogging depths from video images. Object detection models with convolutional neural networks (CNNs) have been utilized to detect reference objects. These models require a large number of labeled images as training data to ensure applicability at a city scale. However, it is hard to collect a sufficient number of urban flooding images containing valuable reference objects, and manually labeling images is time-consuming and expensive. To solve this problem, we present a method to synthesize images as training data. Firstly, original images containing reference objects and original images with water surfaces are collected from open data sources, and the reference objects and water surfaces are cropped from these images. Secondly, the reference objects and water surfaces are further enriched via data augmentation techniques to ensure diversity. Finally, the enriched reference objects and water surfaces are combined to generate a synthetic image dataset with annotations. The synthetic image dataset is then used to train a CNN-based object detection model. Waterlogging depths are calculated based on the reference objects detected by the trained model. A real video dataset and an artificial image dataset are used to evaluate the effectiveness of the proposed method. The results show that the detection model trained on the synthetic image dataset can effectively detect reference objects in images and can achieve acceptable accuracy in estimating waterlogging depths from the detected reference objects. The proposed method has the potential to monitor waterlogging depths at a city scale.
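A minimal sketch of the composition step, under assumed file paths and parameter ranges: a cropped reference object is randomly scaled and flipped, pasted onto a water-surface background so that a waterline hides its lower part, and the visible bounding box is recorded as the annotation.

```python
import random
from PIL import Image, ImageOps

def compose(background_path, object_path, waterline_frac=0.6):
    bg = Image.open(background_path).convert("RGB")
    obj = Image.open(object_path).convert("RGBA")
    # augment the cropped object: random scale and horizontal flip
    s = random.uniform(0.5, 1.0)
    obj = obj.resize((int(obj.width * s), int(obj.height * s)))
    if random.random() < 0.5:
        obj = ImageOps.mirror(obj)
    # place the object so the waterline crops its lower part ("submerged")
    x = random.randint(0, max(0, bg.width - obj.width))
    waterline = int(bg.height * waterline_frac)
    visible_h = int(obj.height * random.uniform(0.5, 0.9))  # part above water
    y = max(0, waterline - visible_h)
    obj_visible = obj.crop((0, 0, obj.width, visible_h))
    bg.paste(obj_visible, (x, y), obj_visible)
    bbox = (x, y, x + obj.width, y + visible_h)  # annotation for training
    return bg, bbox
```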


2019 · Vol 8 (12) · pp. 549
Author(s): Mohamed Ibrahim, James Haworth, Tao Cheng

Extracting information related to weather and visual conditions at a given time and place is indispensable for scene awareness, which strongly impacts our behaviour, from simply walking in a city to riding a bike, driving a car, or using autonomous driving assistance. Despite the significance of this subject, it has still not been fully addressed by machine intelligence: there is no unified deep learning and computer vision method, easily usable in practice, that detects the multiple labels of weather and visual conditions. What has been achieved to date are rather sectorial models that address a limited number of labels and do not cover the wide spectrum of weather and visual conditions. Moreover, weather and visual conditions are often addressed individually. In this paper, we introduce a novel framework to automatically extract this information from street-level images, relying on deep learning and computer vision, using a unified method without any pre-defined constraints on the processed images. A pipeline of four deep convolutional neural network (CNN) models, called WeatherNet, is trained with residual learning using the ResNet50 architecture to extract various weather and visual conditions: dawn/dusk, day, and night for time detection; glare for lighting conditions; and clear, rainy, snowy, and foggy for weather conditions. WeatherNet shows strong performance in extracting this information from user-defined images or video streams, which can be used for, but is not limited to, autonomous vehicles and driving-assistance systems, tracking behaviours, safety-related research, or even better understanding cities through images for policy-makers.
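A hedged sketch of the pipeline idea: one ResNet50, fine-tuned per task, applied to the same street-level image. The four task splits and class lists below are assumptions reconstructed from the abstract, not the released WeatherNet.

```python
import torch
from torchvision import models

# Assumed task decomposition; the paper's exact four-way split may differ.
TASKS = {
    "time": ["dawn/dusk", "day", "night"],
    "glare": ["no-glare", "glare"],
    "precipitation": ["clear", "rainy", "snowy"],
    "fog": ["no-fog", "foggy"],
}

def build_heads():
    heads = {}
    for task, classes in TASKS.items():
        # residual learning: start from an ImageNet-pretrained ResNet50
        net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        net.fc = torch.nn.Linear(net.fc.in_features, len(classes))  # new head
        heads[task] = net.eval()
    return heads

@torch.no_grad()
def predict(heads, image_tensor):  # image_tensor: (1, 3, 224, 224), normalized
    return {task: TASKS[task][net(image_tensor).argmax(1).item()]
            for task, net in heads.items()}
```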


2021 · Vol 2021 · pp. 1-12
Author(s): Jing Zhou, Linsheng Huo

The delayed fracture of high-strength bolts occurs frequently in the bolted connections of long-span steel bridges. This phenomenon can threaten the safety of structures and even lead to serious accidents in certain cases. However, the manual inspection commonly used in engineering to detect fractured bolts is time-consuming and inconvenient. Therefore, a computer vision-based inspection approach is proposed in this paper to rapidly and automatically detect fractured bolts. The proposed approach is realized with a convolutional neural network- (CNN-) based deep learning algorithm, the third version of You Only Look Once (YOLOv3). A challenge in training the detector with YOLOv3 is that only a limited number of images of fractured bolts is available in practice. To address this challenge, five data augmentation methods are introduced to produce more labeled images: brightness transformation, Gaussian blur, flipping, perspective transformation, and scaling. Six YOLOv3 neural networks are trained on six different augmented training sets, and the performance of each detector is then tested on the same testing set to compare the effectiveness of the augmentation methods. The highest average precision (AP) of the trained detectors is 89.14% when the intersection over union (IOU) threshold is set to 0.5. The practicality and robustness of the proposed method are further demonstrated on images that were never used in the training and testing of the detector. The results demonstrate that the proposed method can quickly and automatically detect the delayed fracture of high-strength bolts.
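The five augmentation methods named above are straightforward to reproduce; the following OpenCV sketch shows one plausible implementation, with parameter ranges as illustrative assumptions (bounding-box updates for the geometric transforms are omitted for brevity).

```python
import cv2
import numpy as np

def brightness(img, max_delta=40):
    # shift pixel intensities by a random offset
    beta = np.random.uniform(-max_delta, max_delta)
    return cv2.convertScaleAbs(img, alpha=1.0, beta=beta)

def gaussian_blur(img, k=5):
    return cv2.GaussianBlur(img, (k, k), 0)

def flip(img):
    return cv2.flip(img, 1)  # horizontal flip

def perspective(img, jitter=0.05):
    # jitter the four image corners and warp
    h, w = img.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = (src + np.random.uniform(-jitter, jitter, src.shape) * [w, h]).astype(np.float32)
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(img, M, (w, h))

def scale(img, lo=0.7, hi=1.3):
    f = np.random.uniform(lo, hi)
    return cv2.resize(img, None, fx=f, fy=f)
```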


2021 · Vol 11 (23) · pp. 11174
Author(s): Shashank Mishra, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, ...

Object detection is one of the most critical tasks in the field of computer vision. The task comprises identifying and localizing an object in an image. Architectural floor plans represent the layout of buildings and apartments, and consist of walls, windows, stairs, and other furniture objects. While recognizing floor plan objects is straightforward for humans, automatically processing floor plans and recognizing objects is challenging. In this work, we investigate the performance of the recently introduced Cascade Mask R-CNN network for object detection in floor plan images. Furthermore, we experimentally establish that deformable convolution works better than conventional convolution in the proposed framework. Prior datasets for object detection in floor plan images are either publicly unavailable or contain few samples. To address this issue, we introduce SFPI, a novel synthetic floor plan dataset consisting of 10,000 images. Our proposed method conveniently exceeds the previous state-of-the-art results on the SESYD dataset with an mAP of 98.1% and sets an impressive baseline on our novel SFPI dataset with an mAP of 99.8%. We believe that introducing this modern dataset will enable researchers to advance research in this domain.
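As a hedged illustration of tooling (the abstract does not state which framework was used), this is how Cascade Mask R-CNN with deformable convolutions in the backbone is typically configured in MMDetection; the base config name and DCN settings are assumptions, not the authors'.

```python
# Assumed MMDetection config override for Cascade Mask R-CNN + DCN.
_base_ = './cascade_mask_rcnn_r50_fpn_1x_coco.py'  # assumed base config

model = dict(
    backbone=dict(
        # swap the standard 3x3 convs for deformable convs in ResNet stages 2-4
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, True, True, True)))
```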

