Real-Time Semantic Segmentation with Dual Encoder and Self-Attention Mechanism for Autonomous Driving

Sensors, 2021, Vol 21 (23), pp. 8072
Author(s): Yu-Bang Chang, Chieh Tsai, Chang-Hong Lin, Poki Chen

As autonomous driving techniques become increasingly valued and widespread, real-time semantic segmentation has become a popular and challenging topic in deep learning and computer vision in recent years. However, to deploy deep learning models on edge devices accompanying the sensors on vehicles, we need to design a structure with the best trade-off between accuracy and inference time. In previous works, several methods sacrificed accuracy to obtain faster inference, while others aimed for the best accuracy under the constraint of real-time operation. Nevertheless, the accuracy of previous real-time semantic segmentation methods still lags far behind that of general semantic segmentation methods. We therefore propose a network architecture based on a dual encoder and a self-attention mechanism. Compared with preceding works, we achieve 78.6% mIoU at 39.4 FPS on 1024 × 2048 resolution images in a Cityscapes test submission.
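A minimal PyTorch sketch of the general dual-encoder-plus-self-attention idea described above. This is not the authors' architecture: the module names, channel widths, downsampling factors, and the placement of the attention block are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(in_ch, out_ch, stride=1):
    # 3x3 convolution + batch norm + ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class DualEncoderSegNet(nn.Module):
    """Illustrative dual-encoder network with self-attention on the context branch."""
    def __init__(self, num_classes=19):
        super().__init__()
        # Detail encoder: shallow, preserves spatial resolution (1/8 of input).
        self.detail = nn.Sequential(
            conv_bn_relu(3, 32, stride=2),
            conv_bn_relu(32, 64, stride=2),
            conv_bn_relu(64, 128, stride=2),
        )
        # Context encoder: deeper, more aggressively downsampled (1/16 of input).
        self.context = nn.Sequential(
            conv_bn_relu(3, 32, stride=2),
            conv_bn_relu(32, 64, stride=2),
            conv_bn_relu(64, 128, stride=2),
            conv_bn_relu(128, 128, stride=2),
        )
        # Single-head self-attention over the (small) context feature map.
        self.attn = nn.MultiheadAttention(embed_dim=128, num_heads=1, batch_first=True)
        self.head = nn.Conv2d(256, num_classes, 1)

    def forward(self, x):
        d = self.detail(x)                      # B x 128 x H/8  x W/8
        c = self.context(x)                     # B x 128 x H/16 x W/16
        b, ch, h, w = c.shape
        seq = c.flatten(2).transpose(1, 2)      # B x (h*w) x 128 token sequence
        seq, _ = self.attn(seq, seq, seq)       # global context via self-attention
        c = seq.transpose(1, 2).reshape(b, ch, h, w)
        c = F.interpolate(c, size=d.shape[2:], mode="bilinear", align_corners=False)
        fused = torch.cat([d, c], dim=1)        # fuse detail and context features
        logits = self.head(fused)
        return F.interpolate(logits, size=x.shape[2:], mode="bilinear", align_corners=False)

# Usage: DualEncoderSegNet()(torch.randn(1, 3, 512, 1024)) -> 1 x 19 x 512 x 1024
```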

Electronics, 2021, Vol 10 (16), pp. 1960
Author(s): Dongwan Kang, Anthony Wong, Banghyon Lee, Jungha Kim

Autonomous vehicles perceive objects through various sensors. Cameras, radar, and LiDAR are generally used as vehicle sensors, each with its own characteristics: cameras are used for high-level scene understanding, radar for weather-resistant distance perception, and LiDAR for accurate distance recognition. The ability of cameras to understand a scene has increased dramatically with the recent development of deep learning, and technologies that emulate other sensors using a single sensor are being developed. In this study, therefore, a LiDAR-based scene understanding method was developed using deep learning. Deep learning approaches to LiDAR data are mainly divided into point-based, projection-based, and voxel-based methods. This study applies a projection method to secure real-time performance, since the convolutional neural networks used with conventional cameras can be applied directly to the projected data. In addition, an adaptive breakpoint detector method, originally used for conventional 2D LiDAR data, is utilized to resolve the misclassification caused by the conversion from 2D back into 3D. The results of this study are evaluated through a comparison with other technologies.
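A minimal NumPy sketch of the generic spherical projection used by projection-based LiDAR segmentation methods, i.e., mapping a 3D point cloud onto a 2D range image that a standard CNN can process. The field-of-view values and image size are illustrative (roughly those of a 64-beam sensor) and are not taken from the paper.

```python
import numpy as np

def spherical_projection(points, H=64, W=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image.

    Returns the range image and the (row, col) index of every point, which is
    what later maps 2D per-pixel labels back onto the 3D points.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1) + 1e-8   # range of each point

    yaw = np.arctan2(y, x)                             # azimuth angle
    pitch = np.arcsin(z / r)                           # elevation angle

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Normalize angles to [0, 1] and scale to image coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * W                  # column from azimuth
    v = (1.0 - (pitch - fov_down) / fov) * H           # row from elevation

    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    # Keep the closest point per pixel: write far-to-near so near points overwrite.
    order = np.argsort(-r)
    range_image = np.full((H, W), -1.0, dtype=np.float32)
    range_image[v[order], u[order]] = r[order]
    return range_image, (v, u)
```

A 2D CNN can then segment `range_image` exactly as it would a camera image, and the per-pixel labels are scattered back to the 3D points through the stored `(v, u)` indices.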


Impact, 2020, Vol 2020 (2), pp. 9-11
Author(s): Tomohiro Fukuda

Mixed reality (MR) is rapidly becoming a vital tool, not just in gaming, but also in education, medicine, construction and environmental management. The term refers to systems in which computer-generated content is superimposed over objects in a real-world environment across one or more sensory modalities. Although most of us have heard of the use of MR in computer games, it also has applications in military and aviation training, as well as tourism, healthcare and more. In addition, it has potential uses in architecture and design, where buildings can be superimposed on existing locations to render 3D visualisations of plans. However, one major challenge that remains in MR development is the issue of real-time occlusion: hiding 3D virtual objects behind real objects. Dr Tomohiro Fukuda, based at the Division of Sustainable Energy and Environmental Engineering, Graduate School of Engineering at Osaka University in Japan, is an expert in this field. Researchers led by Dr Fukuda are tackling the issue of occlusion in MR. They are currently developing an MR system that realises real-time occlusion by harnessing deep learning, using a semantic segmentation technique to drive an outdoor landscape design simulation. This methodology can be used to automatically estimate the visual environment before and after construction projects.
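A minimal sketch of the compositing step that segmentation-driven occlusion implies: a semantic mask of real foreground objects is used to suppress the virtual object's alpha channel wherever the real scene should occlude it. This is only an illustration of the concept; the actual MR pipeline described above is far more involved.

```python
import numpy as np

def composite_with_occlusion(real_rgb, virtual_rgba, occluder_mask):
    """Blend a rendered virtual object into a camera frame, hiding it behind
    real foreground pixels identified by a semantic-segmentation mask.

    real_rgb:      H x W x 3 camera image, float in [0, 1]
    virtual_rgba:  H x W x 4 rendered virtual object (alpha in the last channel)
    occluder_mask: H x W boolean mask of real objects that should occlude the
                   virtual content (e.g. pixels labelled 'tree' or 'building')
    """
    # Zero the alpha wherever a real occluder is present, then alpha-blend.
    alpha = virtual_rgba[..., 3:4] * (~occluder_mask)[..., None]
    return (1.0 - alpha) * real_rgb + alpha * virtual_rgba[..., :3]
```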


2021, Vol 11 (1)
Author(s): Rajat Garg, Anil Kumar, Nikunj Bansal, Manish Prateek, Shashi Kumar

Urban area mapping is an important application of remote sensing that aims at estimating both the extent of and changes in land cover within urban areas. A major challenge in analyzing Synthetic Aperture Radar (SAR) remote sensing data is that highly vegetated urban areas and oriented urban targets appear very similar to actual vegetation, which leads to the misclassification of urban areas as forest cover. The present work is a precursor study for the dual-frequency L- and S-band NASA-ISRO Synthetic Aperture Radar (NISAR) mission and aims at minimizing the misclassification of such highly vegetated and oriented urban targets into the vegetation class with the help of deep learning. In this study, three machine learning algorithms, Random Forest (RF), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), were implemented along with the deep learning model DeepLabv3+ for semantic segmentation of Polarimetric SAR (PolSAR) data. It is a general perception that a large dataset is required for the successful implementation of any deep learning model, but a major issue in SAR-based remote sensing is the unavailability of a large benchmark labeled dataset for training deep learning algorithms from scratch. In the current work, it is shown that a pre-trained DeepLabv3+ model outperforms the machine learning algorithms on the land use and land cover (LULC) classification task, even with a small dataset, by using transfer learning. The highest pixel accuracy of 87.78% and an overall pixel accuracy of 85.65% were achieved with DeepLabv3+; Random Forest performs best among the machine learning algorithms with an overall pixel accuracy of 77.91%, while SVM and KNN trail with overall accuracies of 77.01% and 76.47%, respectively. The highest precision of 0.9228 is recorded for the urban class in the semantic segmentation task with DeepLabv3+, while the machine learning algorithms SVM and RF give comparable results with precisions of 0.8977 and 0.8958, respectively.
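A minimal sketch of the transfer-learning step on a small labeled dataset: start from a segmentation model pre-trained on natural images, replace the classification head with one sized for the LULC classes, and fine-tune. Note that torchvision ships DeepLabv3 (not DeepLabv3+ as used in the paper); the class list, frozen backbone, and loss weighting below are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet101, DeepLabV3_ResNet101_Weights

NUM_CLASSES = 4  # e.g. urban, vegetation, water, bare soil -- illustrative classes

# Start from weights pre-trained on a large natural-image dataset.
model = deeplabv3_resnet101(weights=DeepLabV3_ResNet101_Weights.DEFAULT, aux_loss=True)

# Replace the final classification layers with heads sized for the LULC classes.
model.classifier[4] = nn.Conv2d(256, NUM_CLASSES, kernel_size=1)
model.aux_classifier[4] = nn.Conv2d(256, NUM_CLASSES, kernel_size=1)

# Freeze the backbone so only the heads are fine-tuned on the small PolSAR dataset.
for p in model.backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, masks):
    # images: B x 3 x H x W (e.g. PolSAR channels mapped to a 3-channel composite)
    # masks:  B x H x W with integer class labels
    model.train()
    out = model(images)
    loss = criterion(out["out"], masks) + 0.4 * criterion(out["aux"], masks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```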


2021, Vol 11 (15), pp. 7148
Author(s): Bedada Endale, Abera Tullu, Hayoung Shi, Beom-Soo Kang

Unmanned aerial vehicles (UAVs) are widely utilized for various missions in both civilian and military sectors. Many of these missions require UAVs to perceive and understand the environments they navigate. This perception can be realized by training a computing machine to classify objects in the environment. One well-known approach is supervised deep learning, which enables a machine to classify objects but comes at a high cost in time and computational resources. Collecting large input datasets, preparation steps such as labeling the training data, and the need for high-performance computers for training are some of the challenges that supervised deep learning poses. To address these setbacks, this study proposes mission-specific input data augmentation techniques and the design of a lightweight deep neural network architecture capable of real-time object classification. Semi-direct visual odometry (SVO) data of augmented images are used to train the network for object classification. Ten classes of 10,000 different images per class were used as input data, of which 80% were used for training the network and the remaining 20% for validation. For the optimization of the designed deep neural network, a sequential gradient descent algorithm was implemented, which has the advantage of handling redundancy in the data more efficiently than other algorithms.
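A minimal PyTorch sketch of the overall recipe: augment the input images, define a small CNN classifier for ten classes, and update it with sequential (mini-batch) gradient descent. The transforms, layer sizes, and hyperparameters are illustrative placeholders, not the paper's design.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Mission-specific augmentation is approximated here with generic geometric and
# photometric transforms applied when building the training dataset/loader.
augment = transforms.Compose([
    transforms.RandomResizedCrop(64, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.ToTensor(),
])

class LightweightClassifier(nn.Module):
    """Small CNN for real-time 10-class object classification (illustrative sizes)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = LightweightClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_epoch(loader):
    # Sequential (mini-batch) gradient descent: one parameter update per batch.
    model.train()
    for images, labels in loader:
        loss = criterion(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```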


2020, Vol 13 (1), pp. 89
Author(s): Manuel Carranza-García, Jesús Torres-Mateo, Pedro Lara-Benítez, Jorge García-Gutiérrez

Object detection using remote sensing data is a key task of the perception systems of self-driving vehicles. While many generic deep learning architectures have been proposed for this problem, there is little guidance on their suitability for a particular scenario such as autonomous driving. In this work, we assess the performance of existing 2D detection systems on a multi-class problem (vehicles, pedestrians, and cyclists) with images obtained from the on-board camera sensors of a car. We evaluate several one-stage (RetinaNet, FCOS, and YOLOv3) and two-stage (Faster R-CNN) deep learning meta-architectures under different image resolutions and feature extractors (ResNet, ResNeXt, Res2Net, DarkNet, and MobileNet). These models are trained using transfer learning and compared in terms of both precision and efficiency, with special attention to the real-time requirements of this context. For the experimental study, we use the Waymo Open Dataset, the largest existing benchmark. Despite the rising popularity of one-stage detectors, our findings show that two-stage detectors still provide the most robust performance. Faster R-CNN models outperform one-stage detectors in accuracy and are also more reliable in detecting minority classes. Faster R-CNN with a Res2Net-101 backbone achieves the best speed/accuracy trade-off but needs lower-resolution images to reach real-time speed. Furthermore, the anchor-free FCOS detector is a slightly faster alternative to RetinaNet, with similar precision and lower memory usage.
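A minimal sketch of the kind of speed measurement behind the speed/accuracy trade-off discussed above: load a pre-trained two-stage detector and time inference at a chosen input resolution. torchvision ships a ResNet-50 Faster R-CNN (not the Res2Net-101 variant from the paper); resolution, warm-up counts, and run counts are arbitrary choices for illustration.

```python
import time
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights

# Pre-trained two-stage detector with a ResNet-50 FPN backbone.
model = fasterrcnn_resnet50_fpn(weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT).eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def measure_fps(height=640, width=960, n_runs=50):
    """Rough speed probe: average inference throughput at a given input resolution."""
    dummy = [torch.rand(3, height, width, device=device)]
    with torch.no_grad():
        for _ in range(5):                 # warm-up runs
            model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_runs):
            model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()
    return n_runs / (time.perf_counter() - start)

print(f"~{measure_fps():.1f} FPS at 640x960")   # lower resolution -> higher FPS
```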


2021, Vol 13 (13), pp. 2524
Author(s): Ziyi Chen, Dilong Li, Wentao Fan, Haiyan Guan, Cheng Wang, ...

Deep learning models have brought great breakthroughs in building extraction from high-resolution optical remote-sensing images. Among recent research, the self-attention module has attracted considerable interest in many fields, including building extraction. However, most current deep learning models that include a self-attention module still overlook the effectiveness of reconstruction bias. By tipping the balance between the capacities of encoding and decoding, i.e., making the decoding network much more complex than the encoding network, semantic segmentation ability can be reinforced. To remedy the lack of research combining self-attention and reconstruction-bias modules for building extraction, this paper presents a U-Net architecture that combines both. In the encoding part, a self-attention module is added to learn attention weights over the inputs, so that the network pays more attention to positions where salient regions may appear. In the decoding part, multiple large convolutional up-sampling operations are used to increase the reconstruction ability. We test our model on two openly available datasets, the WHU and Massachusetts Building datasets, achieving IoU scores of 89.39% and 73.49%, respectively. Compared with several recent well-known semantic segmentation methods and representative building extraction methods, our method's results are satisfactory.
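A minimal PyTorch sketch of the two ingredients named above: a self-attention block that reweights encoder features, and a deliberately heavy convolutional up-sampling stage for the decoder. This is a generic non-local-style attention block and an arbitrary decoder stage, not the paper's exact modules; all sizes are illustrative.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Non-local-style self-attention over a 2D feature map (illustrative)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)          # B x HW x C/r
        k = self.key(x).flatten(2)                             # B x C/r x HW
        v = self.value(x).flatten(2)                           # B x C x HW
        attn = torch.softmax(q @ k, dim=-1)                    # B x HW x HW weights
        out = (v @ attn.transpose(1, 2)).reshape(b, c, h, w)   # reweighted features
        return x + self.gamma * out                            # residual connection

def heavy_upsample_block(in_ch, out_ch):
    # A decoder stage made intentionally heavier than a typical encoder stage,
    # illustrating the "reconstruction bias" toward a more complex decoder.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=5, padding=2),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )
```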


Author(s): Tossaporn Santad, Piyarat Silapasupphakornwong, Worawat Choensawat, Kingkarn Sookhanaphibarn

Sensors, 2018, Vol 18 (11), pp. 3921
Author(s): Wuttichai Boonpook, Yumin Tan, Yinghua Ye, Peerapong Torteeka, Kritanai Torsri, ...

Buildings along riverbanks are likely to be affected by rising water levels, so the acquisition of accurate building information is of great importance not only for riverbank environmental protection but also for dealing with emergencies such as flooding. UAV-based photographs are flexible and cloud-free compared with satellite images and can provide very high-resolution imagery down to the centimeter level; however, quickly and accurately detecting and extracting buildings from UAV images remains challenging because such images usually contain many fine details and distortions. In this paper, a deep learning (DL)-based approach is proposed for extracting building information more accurately, in which the SegNet architecture is used for semantic segmentation after training on a fully labeled UAV image dataset covering varied urban settlement appearances along a riverbank area in Chongqing. The experimental results show excellent performance in detecting buildings at untrained locations, with an average overall accuracy of more than 90%. To verify the generality and advantage of the proposed method, the procedure is further evaluated by training and testing on two other open standard datasets with a variety of building patterns and styles, and the final overall accuracies of building extraction exceed 93% and 95%, respectively.
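A minimal sketch of the pixel-wise evaluation implied by the accuracy figures above, computed from a predicted binary building mask and its ground truth. The metric definitions are standard; the paper's exact protocol (per-tile averaging, class weighting, etc.) may differ.

```python
import numpy as np

def building_extraction_metrics(pred, truth):
    """Pixel-wise metrics for binary building masks (1 = building, 0 = background).

    pred, truth: integer or boolean arrays of identical shape.
    """
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()     # building predicted as building
    fp = np.logical_and(pred, ~truth).sum()    # background predicted as building
    fn = np.logical_and(~pred, truth).sum()    # building predicted as background
    tn = np.logical_and(~pred, ~truth).sum()   # background predicted as background

    overall_accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"overall_accuracy": overall_accuracy, "precision": precision,
            "recall": recall, "iou": iou}
```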

