Scene terrain classification for autonomous vehicle navigation based on semantic segmentation method

Author(s):  
S Julius Fusic ◽  
K Hariharan ◽  
R Sitharthan ◽  
S Karthikeyan

Autonomous transportation is a new paradigm of the Industry 5.0 cyber-physical system and opens many opportunities in smart logistics applications. The safety and reliability of deep-learning-driven systems, however, remain open research questions. The safety of an autonomous guided vehicle depends on the proper selection of sensors and the transmission of reflex data. Several researchers have addressed sensor-related difficulties by developing sensor-correction systems and fine-tuning algorithms to regulate system efficiency and precision. This paper introduces a vision sensor and performs scene terrain classification with a deep learning algorithm on proposed datasets, targeting sensor-failure conditions. The proposed classification technique identifies obstacles and obstacle-free paths for a mobile robot in smart logistic vehicle applications. To analyze the information in the acquired image datasets, the classification algorithm employs segmentation techniques. The proposed dataset is validated with the U-shaped convolutional network (U-Net) and the region-based convolutional neural network (Mask R-CNN) architectures. A set of 1400 raw images is trained and validated using these semantic segmentation classifier models. Across the terrain dataset clusters, the Mask R-CNN classifier achieves the highest model accuracy of 93%, 23 percentage points higher than the U-Net classifier, which has the lowest accuracy at roughly 70%. As a result, the suggested Mask R-CNN technique has significant potential for use in autonomous vehicle applications.
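As an illustration of the kind of segmentation-based obstacle detection the abstract describes, the following minimal sketch runs a pretrained Mask R-CNN from torchvision on a single frame. The confidence threshold, frame size, and use of off-the-shelf weights are assumptions for illustration, not the authors' trained model.

```python
import torch
import torchvision

# Off-the-shelf Mask R-CNN as a stand-in for the paper's trained classifier.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)  # hypothetical RGB terrain frame in [0, 1]
with torch.no_grad():
    prediction = model([image])[0]  # dict with boxes, labels, scores, masks

# Keep only confident detections as obstacle candidates; the rest of the
# frame is treated as a potentially obstacle-free path.
obstacles = prediction["masks"][prediction["scores"] > 0.7]
```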

2018 ◽  
Vol 34 (2) ◽  
pp. 113-125 ◽  
Author(s):  
Diem-Phuc Tran ◽  
Van-Dung Hoang ◽  
Tri-Cong Pham ◽  
Chi-Mai Luong

The article presents an advanced driver assistance system (ADAS) based on a situation-recognition solution that issues alert levels in the context of actual traffic. The solution segments a single image to detect pedestrians' positions and extracts features of pedestrian posture to predict their actions. The main purpose of this process is to improve accuracy and provide warning levels that support autonomous vehicle navigation in avoiding collisions. Situation prediction and warning-level assignment proceed in two phases: (1) segmenting the image to locate pedestrians and other objects in the traffic environment, and (2) judging the situation from the position and posture of the pedestrians. The action prediction reaches an accuracy of 99.59% at a speed of 5 frames per second.
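A hedged sketch of the second phase described above: mapping each detected pedestrian's predicted action and position to an alert level. The Pedestrian fields, distance threshold, and level ordering are hypothetical illustrations, not the authors' rules.

```python
from dataclasses import dataclass

@dataclass
class Pedestrian:
    action: str        # phase-2 posture prediction, e.g. "crossing", "standing"
    distance_m: float  # estimated from the segmented position in the frame

def warning_level(pedestrians):
    """Map predicted actions and positions to a single alert level."""
    level = "none"
    for p in pedestrians:
        if p.action == "crossing" and p.distance_m < 10:
            return "high"          # imminent collision risk
        if p.action == "crossing":
            level = "medium"
        elif level == "none":
            level = "low"          # pedestrian present but not crossing
    return level

print(warning_level([Pedestrian("crossing", 8.0)]))  # -> high
```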


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2215
Author(s):  
Athanasios Voulodimos ◽  
Eftychios Protopapadakis ◽  
Iason Katsamenis ◽  
Anastasios Doulamis ◽  
Nikolaos Doulamis

Recent studies indicate that detecting radiographic patterns on chest CT scans can yield high sensitivity and specificity for COVID-19 identification. In this paper, we scrutinize the effectiveness of deep learning models for the semantic segmentation of pneumonia-infected areas in CT images for the detection of COVID-19. Traditional methods for CT scan segmentation exploit a supervised learning paradigm, so they (a) require large volumes of data for their training, and (b) assume fixed (static) network weights once the training procedure has been completed. Recently, to overcome these difficulties, few-shot learning (FSL) has been introduced as a general concept of network model training using a very small number of samples. In this paper, we explore the efficacy of few-shot learning in U-Net architectures, allowing for dynamic fine-tuning of the network weights as a few new samples are fed into the U-Net. Experimental results indicate improvement in the segmentation accuracy of identifying COVID-19-infected regions. In particular, using 4-fold cross-validation results of the different classifiers, we observed an improvement of 5.388 ± 3.046% for all test data regarding the IoU metric and a similar increment of 5.394 ± 3.015% for the F1 score. Moreover, the statistical significance of the improvement obtained using our proposed few-shot U-Net architecture compared with the traditional U-Net model was confirmed by applying the Kruskal-Wallis test (p-value = 0.026).
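The core of the few-shot idea, sketched below under stated assumptions: instead of freezing the network after training, the U-Net weights are briefly updated whenever a handful of newly labeled CT slices arrives. The optimizer, step count, learning rate, and binary loss are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def few_shot_update(unet, few_images, few_masks, steps=20, lr=1e-4):
    """Fine-tune an existing U-Net on a very small newly labeled batch."""
    unet.train()
    opt = torch.optim.Adam(unet.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()   # binary: infected vs. healthy tissue
    for _ in range(steps):
        opt.zero_grad()
        logits = unet(few_images)      # (N, 1, H, W) segmentation logits
        loss = loss_fn(logits, few_masks)
        loss.backward()
        opt.step()
    return unet
```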


2020 ◽  
Vol 59 (12) ◽  
pp. 2057-2073
Author(s):  
Yingkai Sha ◽  
David John Gagne II ◽  
Gregory West ◽  
Roland Stull

Many statistical downscaling methods require observational inputs and expert knowledge and thus cannot be generalized well across different regions. Convolutional neural networks (CNNs) are deep-learning models that have generalization abilities for various applications. In this research, we modify UNet, a semantic-segmentation CNN, and apply it to the downscaling of daily maximum/minimum 2-m temperature (TMAX/TMIN) over the western continental United States from 0.25° to 4-km grid spacings. We select high-resolution (HR) elevation, low-resolution (LR) elevation, and LR TMAX/TMIN as inputs; train UNet using Parameter–Elevation Regressions on Independent Slopes Model (PRISM) data over the south- and central-western United States from 2015 to 2018; and test it independently over both the training domains and the northwestern United States from 2018 to 2019. We found that the original UNet cannot generate enough fine-grained spatial details when transferred to the new northwestern U.S. domain. In response, we modified the original UNet by assigning an extra HR elevation output branch/loss function and training the modified UNet to reproduce both the supervised HR TMAX/TMIN and the unsupervised HR elevation. This improvement is named "UNet-Autoencoder (AE)." UNet-AE supports semisupervised model fine-tuning for unseen domains and showed better gridpoint-level performance with more than 10% mean absolute error (MAE) reduction relative to the original UNet. On the basis of its performance relative to the 4-km PRISM, UNet-AE is a good option to provide generalizable downscaling for regions that are underrepresented by observations.
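A sketch of the two-branch objective the UNet-AE description implies: a supervised loss on the HR TMAX/TMIN output plus an unsupervised reconstruction loss on the HR elevation branch. The MAE losses and the weighting factor alpha are assumptions, not the paper's exact settings.

```python
import torch.nn.functional as F

def unet_ae_loss(pred_temp, true_temp, pred_elev, true_elev, alpha=0.5):
    supervised = F.l1_loss(pred_temp, true_temp)   # HR TMAX/TMIN target
    autoencode = F.l1_loss(pred_elev, true_elev)   # HR elevation branch
    # The elevation term allows semisupervised fine-tuning on unseen domains
    # where HR temperature labels are unavailable but HR elevation is known.
    return supervised + alpha * autoencode
```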


2020 ◽  
Vol 9 (10) ◽  
pp. 601
Author(s):  
Ahram Song ◽  
Yongil Kim

Although semantic segmentation of remote-sensing (RS) images using deep-learning networks has recently demonstrated its effectiveness, obtaining RS images under consistent conditions to construct data labels is difficult compared with natural-image datasets. Such small datasets limit the effective learning of deep-learning networks. To address this problem, we propose a combined U-net model that is trained using a combined weighted loss function and can handle heterogeneous datasets. The network consists of encoder and decoder blocks. The convolutional layers that form the encoder blocks are shared across the heterogeneous datasets, while the decoder blocks are assigned separate training weights. Herein, the International Society for Photogrammetry and Remote Sensing (ISPRS) Potsdam and Cityscapes datasets are used as the RS and natural-image datasets, respectively. When the layers are shared, only the visible bands of the ISPRS Potsdam data are used. Experimental results show that when same-sized heterogeneous datasets are used, the semantic segmentation accuracy of the Potsdam data obtained using our proposed method is lower than that obtained using only the Potsdam data (four bands) with other methods, such as SegNet, DeepLab-V3+, and a simplified version of U-net. However, the segmentation accuracy of the Potsdam images improves when the larger Cityscapes dataset is used. The combined U-net model can effectively train on heterogeneous datasets and overcome the problem of insufficient training data in the context of RS-image datasets. Furthermore, the proposed method is expected to apply not only to the segmentation of aerial images but also to other tasks that draw on large heterogeneous datasets.
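The shared-encoder, per-dataset-decoder structure described above could look like the following sketch. The encoder and decoder modules and the loss weights are placeholders, not the paper's actual blocks or coefficients.

```python
import torch.nn as nn

class CombinedUNet(nn.Module):
    """Shared encoder; one decoder per dataset, trained with separate weights."""

    def __init__(self, encoder, decoder_rs, decoder_natural):
        super().__init__()
        self.encoder = encoder  # convolutional layers shared across datasets
        self.decoders = nn.ModuleDict(
            {"rs": decoder_rs, "natural": decoder_natural}
        )

    def forward(self, x, domain):  # domain: "rs" (Potsdam) or "natural" (Cityscapes)
        return self.decoders[domain](self.encoder(x))

def combined_loss(loss_rs, loss_natural, w_rs=0.5, w_nat=0.5):
    # Combined weighted loss over the two heterogeneous datasets.
    return w_rs * loss_rs + w_nat * loss_natural
```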


In the recent past, deep learning models [1] have predominantly been used in object detection algorithms due to their accurate image recognition capability. These models extract features from input images and videos [2] to identify the objects present in them. Applications of these models include image processing, video analysis, speech recognition, biomedical image analysis, biometric recognition, iris recognition, national security, cyber security, natural language processing [3], weather forecasting, renewable energy generation scheduling, etc. These models utilize the concept of the convolutional neural network (CNN) [3], which comprises several layers of artificial neurons. The accuracy of deep learning models [1] depends on parameters such as the learning rate, training batch size, validation batch size, activation function, and drop-out rate. These parameters are known as hyperparameters. Object detection accuracy depends on the selection of hyperparameters, so finding the best values for them is a challenging task. Fine-tuning is the process of selecting suitable hyperparameter values to improve object detection accuracy. Selecting an inappropriate hyperparameter value leads to over-fitting or under-fitting. Over-fitting occurs when the model fits the training data too closely and learns its noise, which results in inaccurate object detection on new data. Under-fitting occurs when the model is unable to capture the trend of the data, which leads to erroneous results on both training and testing data. In this paper, a balance between over-fitting and under-fitting is sought by varying the learning rate of various deep learning models. Four deep learning models, VGG16, VGG19, InceptionV3, and Xception, are considered for the analysis. The best zone of learning rate for each model, with respect to maximum object detection accuracy, is analyzed. A dataset of 70 object classes is taken, and the prediction accuracy is analyzed by changing the learning rate while keeping the rest of the hyperparameters constant. This paper concentrates on the impact of the learning rate on accuracy and identifies an optimum accuracy zone for object detection.
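As one way to frame the kind of sweep described, the sketch below trains the same Keras model head at several learning rates with all other hyperparameters held fixed and records the best validation accuracy. The rate grid, epoch count, and classification head are illustrative assumptions; the paper's exact training protocol may differ.

```python
import tensorflow as tf

def accuracy_at(lr, train_ds, val_ds, num_classes=70):
    """Validation accuracy for one learning rate, other hyperparameters fixed."""
    base = tf.keras.applications.VGG16(include_top=False, pooling="avg")
    head = tf.keras.layers.Dense(num_classes, activation="softmax")
    model = tf.keras.Sequential([base, head])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    hist = model.fit(train_ds, validation_data=val_ds, epochs=5, verbose=0)
    return max(hist.history["val_accuracy"])

# Sweep a grid of learning rates to locate the best-accuracy zone
# (train_ds / val_ds are assumed tf.data pipelines of labeled images):
# sweep = {lr: accuracy_at(lr, train_ds, val_ds)
#          for lr in (1e-2, 1e-3, 1e-4, 1e-5)}
```

The same loop applies unchanged to VGG19, InceptionV3, or Xception by swapping the base model.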


2019 ◽  
Author(s):  
José A. Diaz Amado ◽  
Jean Amaro ◽  
Iago P. Gomes ◽  
Denis Wolf ◽  
F. S. Osorio

This work presents an autonomous vehicle navigation system based on an end-to-end deep learning approach and studies the impact of different image input configurations on system performance. The methodology was to adopt and test different configurations of RGB and depth images captured from a Kinect device. We adopted a multi-camera system composed of three cameras with different RGB and/or depth input configurations. Two systems were developed to study and validate the different input configurations: the first based on a realistic simulator and the second based on a mini-car (small-scale vehicle). Starting with the simulations, it was possible to choose the best camera/input configuration, which we then validated on the real vehicle (mini-car) with real sensors/cameras. The experimental results demonstrate that a multi-camera solution based on three cameras yields better autonomous navigation control in an end-to-end deep-learning-based approach, with a very small final error for the proposed camera configurations.
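A minimal sketch of an end-to-end network of this kind, which stacks the three camera streams channel-wise and regresses control commands directly from pixels. The channel counts, layer sizes, and two-command output are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MultiCamDriver(nn.Module):
    def __init__(self, in_channels=3 * 4):  # 3 cameras x RGB-D (4 channels each)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 48, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.control = nn.Linear(48, 2)  # steering angle, throttle

    def forward(self, stacked_views):    # (N, 12, H, W) channel-stacked cameras
        return self.control(self.features(stacked_views))

out = MultiCamDriver()(torch.rand(1, 12, 120, 160))  # -> control tensor (1, 2)
```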


Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2675
Author(s):  
Zewei Wang ◽  
Change Zheng ◽  
Jiyan Yin ◽  
Ye Tian ◽  
Wenbin Cui

Forest fire smoke detection based on deep learning has been widely studied. Labeling smoke images is a necessity when building datasets for target detection and semantic segmentation. The uncertainty in labeling forest fire smoke pixels, caused by the non-uniform diffusion of smoke particles, affects the recognition accuracy of deep learning models. To overcome this labeling ambiguity, this paper proposes a concentration-weighting scheme. First, a pixel-level relationship between the gray value and the concentration of forest fire smoke pixels in the image was established. Second, a concentration-weighted loss function for the semantic segmentation method was built, so that the network attends to smoke pixels in proportion to their concentration and segments smoke better by weighting the loss contributions of smoke pixels. Finally, the optimum weighting factors were selected through experiments on the established forest fire smoke dataset. The mIoU of the weighted method is 1.52% higher than that of the unweighted method. The weighted method can not only be applied to semantic segmentation and target detection of forest fire smoke but also carries over to the recognition of other dispersive targets.
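A minimal sketch of a concentration-weighted loss in this spirit: the gray value of each labeled smoke pixel serves as a concentration proxy that scales its contribution to a binary cross-entropy loss. The linear gray-to-weight mapping and the weighting factor are assumptions; the paper fits its own pixel-concentration relationship and selects the factor experimentally.

```python
import torch
import torch.nn.functional as F

def concentration_weighted_bce(logits, labels, gray, w_factor=1.0):
    """Binary segmentation loss weighted by a smoke-concentration proxy.

    gray: per-pixel gray values in [0, 1], used as a concentration proxy;
    labels: binary smoke masks (float), same shape as logits.
    """
    weights = 1.0 + w_factor * gray * labels  # upweight labeled smoke pixels only
    return F.binary_cross_entropy_with_logits(logits, labels, weight=weights)
```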

