Scene terrain classification for autonomous vehicle navigation based on semantic segmentation method

Author(s):  
S Julius Fusic ◽  
K Hariharan ◽  
R Sitharthan ◽  
S Karthikeyan

Autonomous transportation is a new paradigm of the Industry 5.0 cyber-physical system and opens many opportunities in smart logistics applications. The safety and reliability of deep-learning-driven systems, however, remain open research questions. The safety of an autonomous guided vehicle depends on the proper selection of sensors and the transmission of reflex data. Several researchers have addressed sensor-related difficulties by developing sensor-correction systems and fine-tuning algorithms to regulate system efficiency and precision. This paper introduces a vision sensor and performs scene terrain classification with a deep learning algorithm on proposed datasets, targeting sensor-failure conditions. The proposed classification technique identifies obstacles and obstacle-free paths for a mobile robot in smart logistic vehicle applications. To analyze the information in the acquired image datasets, the classification algorithm employs segmentation techniques. The proposed dataset is validated with the U-shaped convolutional network (U-Net) and the region-based convolutional neural network (Mask R-CNN) architectures. A set of 1400 raw images is trained and validated using these semantic segmentation classifier models. Across the terrain dataset clusters, the Mask R-CNN classifier achieves the highest model accuracy of 93%, 23 percentage points higher than the U-Net classifier, which has the lowest accuracy at roughly 70%. As a result, the suggested Mask R-CNN technique has significant potential for use in autonomous vehicle applications.
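As an illustration of the kind of segmentation-based obstacle detection the abstract describes, the following minimal sketch runs a pretrained Mask R-CNN from torchvision on a single frame. The confidence threshold, frame size, and use of off-the-shelf weights are assumptions for illustration, not the authors' trained model.

```python
import torch
import torchvision

# Off-the-shelf Mask R-CNN as a stand-in for the paper's trained classifier.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)  # hypothetical RGB terrain frame in [0, 1]
with torch.no_grad():
    prediction = model([image])[0]  # dict with boxes, labels, scores, masks

# Keep only confident detections as obstacle candidates; the rest of the
# frame is treated as a potentially obstacle-free path.
obstacles = prediction["masks"][prediction["scores"] > 0.7]
```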

2018 ◽  
Vol 34 (2) ◽  
pp. 113-125 ◽  
Author(s):  
Diem-Phuc Tran ◽  
Van-Dung Hoang ◽  
Tri-Cong Pham ◽  
Chi-Mai Luong

The article presents an advanced driver assistance system (ADAS) based on a situation-recognition solution that issues alert levels in the context of actual traffic. The solution segments a single image to detect pedestrians' positions and extracts features of pedestrian posture to predict their actions. The main purpose of this process is to improve accuracy and provide warning levels that support autonomous vehicle navigation in avoiding collisions. Situation prediction and warning-level assignment proceed in two phases: (1) segmenting the image to locate pedestrians and other objects in the traffic environment, and (2) judging the situation from the position and posture of the pedestrians. The action prediction reaches an accuracy of 99.59% at a speed of 5 frames per second.
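A hedged sketch of the second phase described above: mapping each detected pedestrian's predicted action and position to an alert level. The Pedestrian fields, distance threshold, and level ordering are hypothetical illustrations, not the authors' rules.

```python
from dataclasses import dataclass

@dataclass
class Pedestrian:
    action: str        # phase-2 posture prediction, e.g. "crossing", "standing"
    distance_m: float  # estimated from the segmented position in the frame

def warning_level(pedestrians):
    """Map predicted actions and positions to a single alert level."""
    level = "none"
    for p in pedestrians:
        if p.action == "crossing" and p.distance_m < 10:
            return "high"          # imminent collision risk
        if p.action == "crossing":
            level = "medium"
        elif level == "none":
            level = "low"          # pedestrian present but not crossing
    return level

print(warning_level([Pedestrian("crossing", 8.0)]))  # -> high
```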


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2215
Author(s):  
Athanasios Voulodimos ◽  
Eftychios Protopapadakis ◽  
Iason Katsamenis ◽  
Anastasios Doulamis ◽  
Nikolaos Doulamis

Recent studies indicate that detecting radiographic patterns on chest CT scans can yield high sensitivity and specificity for COVID-19 identification. In this paper, we scrutinize the effectiveness of deep learning models for the semantic segmentation of pneumonia-infected areas in CT images for the detection of COVID-19. Traditional methods for CT scan segmentation exploit a supervised learning paradigm, so they (a) require large volumes of data for their training, and (b) assume fixed (static) network weights once the training procedure has been completed. Recently, to overcome these difficulties, few-shot learning (FSL) has been introduced as a general concept of network model training using a very small number of samples. In this paper, we explore the efficacy of few-shot learning in U-Net architectures, allowing for dynamic fine-tuning of the network weights as a few new samples are fed into the U-Net. Experimental results indicate improvement in the segmentation accuracy of identifying COVID-19-infected regions. In particular, using 4-fold cross-validation results of the different classifiers, we observed an improvement of 5.388 ± 3.046% for all test data regarding the IoU metric and a similar increment of 5.394 ± 3.015% for the F1 score. Moreover, the statistical significance of the improvement obtained using our proposed few-shot U-Net architecture compared with the traditional U-Net model was confirmed by applying the Kruskal-Wallis test (p-value = 0.026).
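The core of the few-shot idea, sketched below under stated assumptions: instead of freezing the network after training, the U-Net weights are briefly updated whenever a handful of newly labeled CT slices arrives. The optimizer, step count, learning rate, and binary loss are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def few_shot_update(unet, few_images, few_masks, steps=20, lr=1e-4):
    """Fine-tune an existing U-Net on a very small newly labeled batch."""
    unet.train()
    opt = torch.optim.Adam(unet.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()   # binary: infected vs. healthy tissue
    for _ in range(steps):
        opt.zero_grad()
        logits = unet(few_images)      # (N, 1, H, W) segmentation logits
        loss = loss_fn(logits, few_masks)
        loss.backward()
        opt.step()
    return unet
```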


2020 ◽  
Vol 59 (12) ◽  
pp. 2057-2073
Author(s):  
Yingkai Sha ◽  
David John Gagne II ◽  
Gregory West ◽  
Roland Stull

Many statistical downscaling methods require observational inputs and expert knowledge and thus cannot be generalized well across different regions. Convolutional neural networks (CNNs) are deep-learning models that have generalization abilities for various applications. In this research, we modify UNet, a semantic-segmentation CNN, and apply it to the downscaling of daily maximum/minimum 2-m temperature (TMAX/TMIN) over the western continental United States from 0.25° to 4-km grid spacings. We select high-resolution (HR) elevation, low-resolution (LR) elevation, and LR TMAX/TMIN as inputs; train UNet using Parameter–Elevation Regressions on Independent Slopes Model (PRISM) data over the south- and central-western United States from 2015 to 2018; and test it independently over both the training domains and the northwestern United States from 2018 to 2019. We found that the original UNet cannot generate enough fine-grained spatial details when transferred to the new northwestern U.S. domain. In response, we modified the original UNet by assigning an extra HR elevation output branch/loss function and training the modified UNet to reproduce both the supervised HR TMAX/TMIN and the unsupervised HR elevation. This improvement is named "UNet-Autoencoder (AE)." UNet-AE supports semisupervised model fine-tuning for unseen domains and showed better gridpoint-level performance with more than 10% mean absolute error (MAE) reduction relative to the original UNet. On the basis of its performance relative to the 4-km PRISM, UNet-AE is a good option to provide generalizable downscaling for regions that are underrepresented by observations.
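A sketch of the two-branch objective the UNet-AE description implies: a supervised loss on the HR TMAX/TMIN output plus an unsupervised reconstruction loss on the HR elevation branch. The MAE losses and the weighting factor alpha are assumptions, not the paper's exact settings.

```python
import torch.nn.functional as F

def unet_ae_loss(pred_temp, true_temp, pred_elev, true_elev, alpha=0.5):
    supervised = F.l1_loss(pred_temp, true_temp)   # HR TMAX/TMIN target
    autoencode = F.l1_loss(pred_elev, true_elev)   # HR elevation branch
    # The elevation term allows semisupervised fine-tuning on unseen domains
    # where HR temperature labels are unavailable but HR elevation is known.
    return supervised + alpha * autoencode
```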


2020 ◽  
Vol 9 (10) ◽  
pp. 601
Author(s):  
Ahram Song ◽  
Yongil Kim

Although semantic segmentation of remote-sensing (RS) images using deep-learning networks has recently demonstrated its effectiveness, obtaining RS images under consistent conditions to construct data labels is difficult compared with natural-image datasets. Such small datasets limit the effective learning of deep-learning networks. To address this problem, we propose a combined U-net model that is trained using a combined weighted loss function and can handle heterogeneous datasets. The network consists of encoder and decoder blocks. The convolutional layers that form the encoder blocks are shared across the heterogeneous datasets, while the decoder blocks are assigned separate training weights. Herein, the International Society for Photogrammetry and Remote Sensing (ISPRS) Potsdam and Cityscapes datasets are used as the RS and natural-image datasets, respectively. When the layers are shared, only the visible bands of the ISPRS Potsdam data are used. Experimental results show that when same-sized heterogeneous datasets are used, the semantic segmentation accuracy of the Potsdam data obtained using our proposed method is lower than that obtained using only the Potsdam data (four bands) with other methods, such as SegNet, DeepLab-V3+, and a simplified version of U-net. However, the segmentation accuracy of the Potsdam images improves when the larger Cityscapes dataset is used. The combined U-net model can effectively train on heterogeneous datasets and overcome the problem of insufficient training data in the context of RS-image datasets. Furthermore, the proposed method is expected to apply not only to the segmentation of aerial images but also to other tasks that draw on large heterogeneous datasets.
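The shared-encoder, per-dataset-decoder structure described above could look like the following sketch. The encoder and decoder modules and the loss weights are placeholders, not the paper's actual blocks or coefficients.

```python
import torch.nn as nn

class CombinedUNet(nn.Module):
    """Shared encoder; one decoder per dataset, trained with separate weights."""

    def __init__(self, encoder, decoder_rs, decoder_natural):
        super().__init__()
        self.encoder = encoder  # convolutional layers shared across datasets
        self.decoders = nn.ModuleDict(
            {"rs": decoder_rs, "natural": decoder_natural}
        )

    def forward(self, x, domain):  # domain: "rs" (Potsdam) or "natural" (Cityscapes)
        return self.decoders[domain](self.encoder(x))

def combined_loss(loss_rs, loss_natural, w_rs=0.5, w_nat=0.5):
    # Combined weighted loss over the two heterogeneous datasets.
    return w_rs * loss_rs + w_nat * loss_natural
```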


In the recent past, deep learning models [1] have predominantly been used in object detection algorithms due to their accurate image recognition capability. These models extract features from input images and videos [2] to identify the objects present in them. Applications of these models include image processing, video analysis, speech recognition, biomedical image analysis, biometric recognition, iris recognition, national security, cyber security, natural language processing [3], weather forecasting, renewable energy generation scheduling, etc. These models utilize the concept of the convolutional neural network (CNN) [3], which comprises several layers of artificial neurons. The accuracy of deep learning models [1] depends on parameters such as the learning rate, training batch size, validation batch size, activation function, and drop-out rate. These parameters are known as hyperparameters. Object detection accuracy depends on the selection of hyperparameters, so finding the best values for them is a challenging task. Fine-tuning is the process of selecting suitable hyperparameter values to improve object detection accuracy. Selecting an inappropriate hyperparameter value leads to over-fitting or under-fitting. Over-fitting occurs when the model fits the training data too closely and learns its noise, which results in inaccurate object detection on new data. Under-fitting occurs when the model is unable to capture the trend of the data, which leads to erroneous results on both training and testing data. In this paper, a balance between over-fitting and under-fitting is sought by varying the learning rate of various deep learning models. Four deep learning models, VGG16, VGG19, InceptionV3, and Xception, are considered for the analysis. The best zone of learning rate for each model, with respect to maximum object detection accuracy, is analyzed. A dataset of 70 object classes is taken, and the prediction accuracy is analyzed by changing the learning rate while keeping the rest of the hyperparameters constant. This paper concentrates on the impact of the learning rate on accuracy and identifies an optimum accuracy zone for object detection.
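As one way to frame the kind of sweep described, the sketch below trains the same Keras model head at several learning rates with all other hyperparameters held fixed and records the best validation accuracy. The rate grid, epoch count, and classification head are illustrative assumptions; the paper's exact training protocol may differ.

```python
import tensorflow as tf

def accuracy_at(lr, train_ds, val_ds, num_classes=70):
    """Validation accuracy for one learning rate, other hyperparameters fixed."""
    base = tf.keras.applications.VGG16(include_top=False, pooling="avg")
    head = tf.keras.layers.Dense(num_classes, activation="softmax")
    model = tf.keras.Sequential([base, head])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    hist = model.fit(train_ds, validation_data=val_ds, epochs=5, verbose=0)
    return max(hist.history["val_accuracy"])

# Sweep a grid of learning rates to locate the best-accuracy zone
# (train_ds / val_ds are assumed tf.data pipelines of labeled images):
# sweep = {lr: accuracy_at(lr, train_ds, val_ds)
#          for lr in (1e-2, 1e-3, 1e-4, 1e-5)}
```

The same loop applies unchanged to VGG19, InceptionV3, or Xception by swapping the base model.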


2019 ◽  
Author(s):  
José A. Diaz Amado ◽  
Jean Amaro ◽  
Iago P. Gomes ◽  
Denis Wolf ◽  
F. S. Osorio

This work presents an autonomous vehicle navigation system based on an end-to-end deep learning approach and studies the impact of different image input configurations on system performance. The methodology was to adopt and test different configurations of RGB and depth images captured from a Kinect device. We adopted a multi-camera system composed of three cameras with different RGB and/or depth input configurations. Two systems were developed to study and validate the different input configurations: the first based on a realistic simulator and the second based on a mini-car (small-scale vehicle). Starting with the simulations, it was possible to choose the best camera/input configuration, which we then validated on the real vehicle (mini-car) with real sensors/cameras. The experimental results demonstrate that a multi-camera solution based on three cameras yields better autonomous navigation control in an end-to-end deep-learning-based approach, with a very small final error for the proposed camera configurations.
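A minimal sketch of an end-to-end network of this kind, which stacks the three camera streams channel-wise and regresses control commands directly from pixels. The channel counts, layer sizes, and two-command output are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MultiCamDriver(nn.Module):
    def __init__(self, in_channels=3 * 4):  # 3 cameras x RGB-D (4 channels each)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 48, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.control = nn.Linear(48, 2)  # steering angle, throttle

    def forward(self, stacked_views):    # (N, 12, H, W) channel-stacked cameras
        return self.control(self.features(stacked_views))

out = MultiCamDriver()(torch.rand(1, 12, 120, 160))  # -> control tensor (1, 2)
```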


Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2675
Author(s):  
Zewei Wang ◽  
Change Zheng ◽  
Jiyan Yin ◽  
Ye Tian ◽  
Wenbin Cui

Forest fire smoke detection based on deep learning has been widely studied. Labeling smoke images is a necessity when building datasets for target detection and semantic segmentation. The uncertainty in labeling forest fire smoke pixels, caused by the non-uniform diffusion of smoke particles, affects the recognition accuracy of deep learning models. To overcome this labeling ambiguity, this paper proposes a concentration-weighting scheme. First, a pixel-level relationship between the gray value and the concentration of forest fire smoke pixels in the image was established. Second, a concentration-weighted loss function for the semantic segmentation method was built, so that the network attends to smoke pixels in proportion to their concentration and segments smoke better by weighting the loss contributions of smoke pixels. Finally, the optimum weighting factors were selected through experiments on the established forest fire smoke dataset. The mIoU of the weighted method is 1.52% higher than that of the unweighted method. The weighted method can not only be applied to semantic segmentation and target detection of forest fire smoke but also carries over to the recognition of other dispersive targets.
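A minimal sketch of a concentration-weighted loss in this spirit: the gray value of each labeled smoke pixel serves as a concentration proxy that scales its contribution to a binary cross-entropy loss. The linear gray-to-weight mapping and the weighting factor are assumptions; the paper fits its own pixel-concentration relationship and selects the factor experimentally.

```python
import torch
import torch.nn.functional as F

def concentration_weighted_bce(logits, labels, gray, w_factor=1.0):
    """Binary segmentation loss weighted by a smoke-concentration proxy.

    gray: per-pixel gray values in [0, 1], used as a concentration proxy;
    labels: binary smoke masks (float), same shape as logits.
    """
    weights = 1.0 + w_factor * gray * labels  # upweight labeled smoke pixels only
    return F.binary_cross_entropy_with_logits(logits, labels, weight=weights)
```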

