A New Integrated Approach for Landslide Data Balancing and Spatial Prediction Based on Generative Adversarial Networks (GAN)

Landslide susceptibility mapping has progressed significantly with improvements in machine learning techniques. However, the inventory / data imbalance (DI) problem remains one of the challenges in this domain. The problem arises because a good-quality landslide inventory map, including a complete record of historical data, is difficult or expensive to collect. This can considerably affect one’s ability to obtain a sufficient inventory or representative samples. This research developed a new approach based on generative adversarial networks (GAN) to correct imbalanced landslide datasets. The proposed method was tested at Chukha Dzongkhag, Bhutan, one of the most landslide-prone areas in the Himalayan region. The proposed approach was then compared with standard methods such as the synthetic minority oversampling technique (SMOTE), dense imbalanced sampling, and sparse sampling (i.e., producing as many non-landslide samples as landslide samples). The comparisons were based on five machine learning models: artificial neural networks (ANN), random forests (RF), decision trees (DT), k-nearest neighbours (kNN), and the support vector machine (SVM). The models were evaluated using overall accuracy (OA), Kappa Index, F1-score, and area under the receiver operating characteristic curve (AUROC). The spatial database was established with a total of 269 landslides and 10 conditioning factors: altitude, slope, aspect, total curvature, slope length, lithology, distance from the road, distance from the stream, topographic wetness index (TWI), and sediment transport index (STI). The findings show that both the GAN and SMOTE data balancing approaches helped to improve the accuracy of the machine learning models. According to AUROC, with default parameters the GAN method boosted the models to their maximum accuracies of ANN (0.918), RF (0.933), DT (0.927), kNN (0.878), and SVM (0.907). With the optimum parameters, all models except SVM performed best with GAN, reaching ANN (0.927), RF (0.943), DT (0.923), and kNN (0.889); SVM obtained its highest accuracy (0.906) with SMOTE. Our findings suggest that RF balanced with GAN can provide the most reasonable criterion for landslide prediction. This research indicates that landslide data balancing may substantially affect the predictive capabilities of machine learning models. Therefore, the issue of DI in the spatial prediction of landslides should not be ignored. Future studies could explore other generative models for landslide data balancing. Because it uses a state-of-the-art GAN, the proposed model can be considered in areas where the data are limited or imbalanced.
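
As a rough illustration of the SMOTE baseline named in the abstract, the sketch below creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified pure-Python rendering of the general idea, not the study's implementation; the function name `smote_like`, the parameter `k`, and the toy (slope, TWI) values are assumptions made for illustration.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified SMOTE-style sketch: pick a minority sample, pick one of its
    k nearest minority neighbours, and draw a point on the segment between them.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (hypothetical): three landslide samples described by (slope, TWI).
landslides = [(30.0, 6.0), (35.0, 7.0), (40.0, 5.5)]
new_points = smote_like(landslides, n_new=4)
```

Because every synthetic point lies on a segment between two existing samples, it stays inside the range of the observed minority data; a GAN-based balancer, by contrast, learns a generator for the minority distribution.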

Counterfactual Examples for Data Augmentation: A Case Study

The International FLAIRS Conference Proceedings ◽

10.32473/flairs.v34i1.128503 ◽

2021 ◽

Vol 34 (1) ◽

Author(s):

Md Golam Moula Mehedi Hasan ◽

Douglas A. Talbert

Keyword(s):

Machine Learning ◽

Potential Application ◽

Data Augmentation ◽

Generative Adversarial Networks ◽

Application Area ◽

Learning Models ◽

Adversarial Networks ◽

Feature Values ◽

Machine Learning Models

Counterfactual explanations are gaining popularity as a way of explaining machine learning models. Counterfactual examples are generally created to help interpret the decision of a model: if a model makes a certain decision for an instance, the counterfactual examples of that instance reverse the decision of the model. Counterfactual examples can be created by carefully changing particular feature values of the instance. Although counterfactual examples are usually generated to explain the decisions of machine learning models, in this work we explore another potential application area: whether counterfactual examples are useful for data augmentation. We demonstrate the efficacy of this approach on the widely used “Adult-Income” dataset. We consider several scenarios in which we do not have enough data and use counterfactual examples to augment the dataset. We compare our approach with a generative adversarial network approach to dataset augmentation. The experimental results show that our proposed approach can be an effective way to augment a dataset.
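
The idea of changing feature values until the model's decision reverses can be sketched as follows. This is a minimal illustration with a toy linear classifier, not the authors' generation method; the weights, the feature names, and the choice to add the counterfactual to the dataset with the flipped label are all assumptions.

```python
def predict(features, weights, bias):
    """Toy linear classifier: returns 1 if the weighted sum exceeds zero."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def counterfactual(features, weights, bias, index, step, max_steps=1000):
    """Nudge one feature until the predicted class flips.

    Returns the modified feature list, or None if no flip occurs
    within max_steps.
    """
    original = predict(features, weights, bias)
    candidate = list(features)
    for _ in range(max_steps):
        candidate[index] += step
        if predict(candidate, weights, bias) != original:
            return candidate
    return None


# Hypothetical instance: [years_of_education, hours_per_week]
weights, bias = [0.5, 0.1], -10.0
instance = [8.0, 40.0]
cf = counterfactual(instance, weights, bias, index=0, step=1.0)

# Assumed augmentation step: keep the original and add the counterfactual
# with the label the model now assigns to it.
augmented = [
    (instance, predict(instance, weights, bias)),
    (cf, predict(cf, weights, bias)),
]
```

A practical generator would also constrain the change to be small and plausible (for example, not altering immutable attributes), which this sketch does not attempt.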

Stealing Machine Learning Models: Attacks and Countermeasures for Generative Adversarial Networks

10.1145/3485832.3485838 ◽

2021 ◽

Author(s):

Hailong Hu ◽

Jun Pang

Keyword(s):

Machine Learning ◽

Generative Adversarial Networks ◽

Learning Models ◽

Adversarial Networks ◽

Machine Learning Models

Prospects for Generative-Adversarial Networks in Network Traffic Classification Tasks

Journal of Physics Conference Series ◽

10.1088/1742-6596/2096/1/012174 ◽

2021 ◽

Vol 2096 (1) ◽

pp. 012174

Author(s):

G D Asyaev

Keyword(s):

Machine Learning ◽

Network Traffic ◽

Training Sample ◽

Generative Adversarial Networks ◽

Traffic Classification ◽

Learning Models ◽

Generative Adversarial Network ◽

Adversarial Networks ◽

Network Traffic Classification ◽

Machine Learning Models

The paper presents an approach that increases the training sample and reduces class imbalance for traffic classification problems. The basic principles and architecture of generative adversarial networks are considered. The mathematical model of network traffic classification is described. The training sample used to solve the problem is analyzed. Data preprocessing is carried out and justified. An architecture of the generative-adversarial network is constructed, and an algorithm for generating new features is developed. Machine learning models for the traffic classification problem were considered and built: logistic regression, k-nearest neighbors, decision tree, and random forest. A comparative analysis of the results of the machine learning models with and without the generation of new features is conducted. The results can be applied both to network traffic classification tasks and to general cases of multiclass classification and the exclusion of unbalanced features.

Inverse Airfoil Design Method for Generating Varieties of Smooth Airfoils Using Conditional WGAN-GP

10.21203/rs.3.rs-618399/v1 ◽

2021 ◽

Author(s):

Kazuo Yonekura ◽

Nozomu Miyamoto ◽

Katsuyuki Suzuki

Keyword(s):

Machine Learning ◽

Design Method ◽

Lift Coefficient ◽

Flow Analysis ◽

Generative Adversarial Networks ◽

Learning Models ◽

Smoothing Methods ◽

Adversarial Networks ◽

Proposed Model ◽

Machine Learning Models

Machine learning models have recently been used for airfoil shape generation. It is desirable to obtain airfoil shapes that satisfy a required lift coefficient. Generative adversarial networks (GAN) output reasonable airfoil shapes. However, shapes obtained from ordinary GAN models are not smooth, and they need smoothing before flow analysis. Therefore, the models need to be coupled with Bézier curves or other smoothing methods to obtain smooth shapes. Generating shapes without any smoothing method is challenging. In this study, we employed a conditional Wasserstein GAN with gradient penalty (CWGAN-GP) to generate airfoil shapes, and the obtained shapes are as smooth as those obtained using smoothing methods. With the proposed method, no additional smoothing method is needed to generate airfoils. Moreover, the proposed model outputs shapes that satisfy the lift coefficient requirements.
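
For orientation, the critic loss usually associated with a Wasserstein GAN with gradient penalty, extended with a conditioning variable, can be written as below. The abstract does not state the loss, so the symbols are assumptions: $D$ is the critic, $c$ the condition (presumably the target lift coefficient), $P_r$ and $P_g$ the real and generated shape distributions, $\hat{x}$ a point sampled uniformly on the line between a real and a generated sample, and $\lambda$ the penalty weight.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x} \mid c) \right]
    - \mathbb{E}_{x \sim P_r}\left[ D(x \mid c) \right]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x} \mid c) \rVert_2 - 1 \right)^2 \right]
```

The last term pushes the critic's gradient norm toward 1, which is the mechanism the gradient-penalty variant uses in place of weight clipping.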

Monitoring the Foliar Nutrients Status of Mango Using Spectroscopy-Based Spectral Indices and PLSR-Combined Machine Learning Models

Remote Sensing ◽

10.3390/rs13040641 ◽

2021 ◽

Vol 13 (4) ◽

pp. 641

Author(s):

Gopal Ramdas Mahajan ◽

Bappa Das ◽

Dayesh Murgaokar ◽

Ittai Herrmann ◽

Katja Berger ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Partial Least Square ◽

Least Square ◽

Partial Least Square Regression ◽

Support Vector ◽

Spectral Indices ◽

Learning Models ◽

Leaf Nutrients ◽

Machine Learning Models

Conventional methods of plant nutrient estimation for nutrient management require a large number of leaf or tissue samples and extensive chemical analysis, which is time-consuming and expensive. Remote sensing is a viable tool for estimating the plant’s nutritional status to determine the appropriate amounts of fertilizer inputs. The aim of the study was to use remote sensing to characterize the foliar nutrient status of mango through the development of spectral indices, multivariate analysis, chemometrics, and machine learning modeling of the spectral data. Spectral data of the leaf samples within the 350–1050 nm wavelength range, together with the leaf nutrient contents, were analyzed to develop spectral indices and multivariate models. The normalized difference and ratio spectral indices and the multivariate models (partial least square regression (PLSR), principal component regression, and support vector regression (SVR)) were ineffective in predicting any of the leaf nutrients. An approach using PLSR-combined machine learning models was found to be the best for predicting most of the nutrients. Based on the independent validation performance and summed ranks, the best performing models were cubist (R² ≥ 0.91, ratio of performance to deviation (RPD) ≥ 3.3, and ratio of performance to interquartile distance (RPIQ) ≥ 3.71) for nitrogen, phosphorus, potassium, and zinc; SVR (R² ≥ 0.88, RPD ≥ 2.73, RPIQ ≥ 3.31) for calcium, iron, copper, and boron; and elastic net (R² ≥ 0.95, RPD ≥ 4.47, RPIQ ≥ 6.11) for magnesium and sulfur. The results of the study revealed the potential of using hyperspectral remote sensing data for non-destructive estimation of mango leaf macro- and micro-nutrients. The developed approach is suggested for use within operational retrieval workflows for precision management of mango orchard nutrients.
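
The three validation metrics quoted above can be computed as in the following sketch, which assumes the common definitions RPD = SD / RMSE and RPIQ = IQR / RMSE taken over the observed values. The sample standard deviation and the default quartile method of Python's `statistics.quantiles` are assumptions (the study may use different conventions), and the data are made up.

```python
import math
import statistics


def regression_scores(observed, predicted):
    """Return (R2, RPD, RPIQ) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    rpd = statistics.stdev(observed) / rmse          # SD / RMSE
    q1, _, q3 = statistics.quantiles(observed, n=4)  # quartiles of observations
    rpiq = (q3 - q1) / rmse                          # IQR / RMSE
    return r2, rpd, rpiq


# Made-up leaf nitrogen values (%), for illustration only.
observed = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
predicted = [1.1, 1.1, 1.5, 1.5, 1.9, 1.9]
r2, rpd, rpiq = regression_scores(observed, predicted)
```

Both ratios grow as the prediction error shrinks relative to the spread of the data, which is why higher RPD and RPIQ values indicate better models.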

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Background: Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially during the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods: Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results: The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared. For the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions: The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

414 Deep Neural Networks: A Survey Tool for Obstructive Sleep Apnea Prediction

SLEEP ◽

10.1093/sleep/zsab072.413 ◽

2021 ◽

Vol 44 (Supplement_2) ◽

pp. A164-A164

Author(s):

Pahnwat Taweesedt ◽

JungYoon Kim ◽

Jaehyun Park ◽

Jangwoon Park ◽

Munish Sharma ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Obstructive Sleep Apnea ◽

Sleep Apnea ◽

Deep Neural Networks ◽

Support Vector ◽

Learning Models ◽

Obstructive Sleep ◽

Screening Questionnaires ◽

Machine Learning Models

Introduction: Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder estimated to affect one billion people. Full-night polysomnography is considered the gold standard for OSA diagnosis. However, it is time-consuming, expensive, and not readily available in many parts of the world. Many screening questionnaires and scores have been proposed for OSA prediction, with high sensitivity and low specificity. The present study is intended to develop models with various machine learning techniques to predict the severity of OSA by incorporating features from multiple questionnaires. Methods: Subjects who underwent full-night polysomnography at the Torr sleep center, Texas, and completed 5 OSA screening questionnaires/scores were included. OSA was diagnosed using an Apnea-Hypopnea Index ≥ 5. We trained five different machine learning models: Deep Neural Networks with scaled principal component analysis (DNN-PCA), Random Forest (RF), Adaptive Boosting classifier (ABC), K-Nearest Neighbors classifier (KNC), and Support Vector Machine Classifier (SVMC). A training:testing subject ratio of 65:35 was used. All features, including demographic data, body measurements, and snoring and sleepiness history, were obtained from the 5 OSA screening questionnaires/scores (STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score). Performance metrics were used to compare the machine learning models. Results: Of 180 subjects, 51.5% were male, with a mean (SD) age of 53.6 (15.1) years. One hundred and nineteen subjects were diagnosed with OSA. The Area Under the Receiver Operating Characteristic Curve (AUROC) of DNN-PCA, RF, ABC, KNC, SVMC, STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score was 0.85, 0.68, 0.52, 0.74, 0.75, 0.61, 0.63, 0.61, 0.58, and 0.58, respectively. DNN-PCA showed the highest AUROC, with a sensitivity of 0.79, specificity of 0.67, positive predictive value of 0.93, F1 score of 0.86, and accuracy of 0.77. Conclusion: Our results showed that DNN-PCA outperforms OSA screening questionnaires, scores, and other machine learning models.
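
The screening metrics reported in the Results all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for a 70-subject test set, for illustration only.
metrics = screening_metrics(tp=40, fp=5, tn=15, fn=10)
```

Because F1 combines only sensitivity and positive predictive value, it can be high even when specificity is modest, which is worth keeping in mind when most test subjects have the condition.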

Dynamics of Fourier Modes in Torus Generative Adversarial Networks

Mathematics ◽

10.3390/math9040325 ◽

2021 ◽

Vol 9 (4) ◽

pp. 325

Author(s):

Ángel González-Prieto ◽

Alberto Mozo ◽

Edgar Talavera ◽

Sandra Gómez-Canaval

Keyword(s):

Fourier Series ◽

Generative Adversarial Networks ◽

Learning Models ◽

Training Process ◽

Small Perturbations ◽

Adversarial Networks ◽

Novel Method ◽

Truncated Fourier Series ◽

Real Flow ◽

Machine Learning Models

Generative Adversarial Networks (GANs) are powerful machine learning models capable of generating fully synthetic, high-resolution samples of a desired phenomenon. Despite their success, the training process of a GAN is highly unstable, and it is typically necessary to add several auxiliary heuristics to the networks to reach acceptable convergence of the model. In this paper, we introduce a novel method to analyze the convergence and stability of the training of generative adversarial networks. For this purpose, we propose to decompose the objective function of the adversarial min–max game defining a periodic GAN into its Fourier series. By studying the dynamics of the truncated Fourier series for the continuous alternating gradient descent algorithm, we are able to approximate the real flow and to identify the main features of the convergence of the GAN. This approach is confirmed empirically by studying the training flow in a 2-parametric GAN that aims to generate an unknown exponential distribution. As a by-product, we show that convergent orbits in GANs are small perturbations of periodic orbits, so the Nash equilibria are spiral attractors. This theoretically justifies the slow and unstable training observed in GANs.
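
The rotational behaviour of min–max training can be seen in a much simpler setting than the torus GAN studied in the paper. The sketch below runs simultaneous gradient descent-ascent on the textbook bilinear game f(x, y) = x · y, whose equilibrium is (0, 0); the game, the update rule, and the step size are assumptions chosen for illustration and are not the paper's model or algorithm.

```python
import math


def simultaneous_gda(x, y, lr, steps):
    """Simultaneous gradient descent (on x) and ascent (on y) for f(x, y) = x * y.

    Returns the distance from the equilibrium (0, 0) after every step.
    """
    radii = [math.hypot(x, y)]
    for _ in range(steps):
        grad_x, grad_y = y, x  # df/dx = y, df/dy = x
        x, y = x - lr * grad_x, y + lr * grad_y
        radii.append(math.hypot(x, y))
    return radii


radii = simultaneous_gda(x=1.0, y=0.0, lr=0.1, steps=100)
# Each step multiplies the distance from the equilibrium by sqrt(1 + lr**2),
# so the iterates circle the equilibrium while drifting slowly outward.
```

In the continuous-time limit of this toy game the orbits are exact circles, which gives some intuition for why orbits close to periodic ones make GAN training slow and sensitive to the discretisation.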

QUBO formulations for training machine learning models

Scientific Reports ◽

10.1038/s41598-021-89461-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Prasanna Date ◽

Davis Arthur ◽

Lauren Pusey-Nazzaro

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Large Scale ◽

Support Vector ◽

Quantum Computers ◽

Np Hard ◽

Learning Models ◽

Moore's Law ◽

Machine Learning Models

Training machine learning models on classical computers is usually a time- and compute-intensive process. With Moore’s law nearing its inevitable end and an ever-increasing demand for large-scale data analysis using machine learning, we must leverage non-conventional computing paradigms like quantum computing to train machine learning models efficiently. Adiabatic quantum computers can approximately solve NP-hard problems, such as quadratic unconstrained binary optimization (QUBO), faster than classical computers. Since many machine learning problems are also NP-hard, we believe adiabatic quantum computers might be instrumental in training machine learning models efficiently in the post-Moore’s-law era. To be solved on adiabatic quantum computers, problems must be formulated as QUBO problems, which is very challenging. In this paper, we formulate the training problems of three machine learning models (linear regression, support vector machine (SVM), and balanced k-means clustering) as QUBO problems, making them amenable to training on adiabatic quantum computers. We also analyze the computational complexities of our formulations and compare them to the corresponding state-of-the-art classical approaches. We show that the time and space complexities of our formulations are better than (in the case of SVM and balanced k-means clustering) or equivalent to (in the case of linear regression) those of their classical counterparts.
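
A QUBO problem asks for the binary vector x that minimises x^T Q x. To make the target form concrete, the sketch below evaluates that objective and solves a tiny instance by classical brute force. It does not reproduce the paper's formulations for regression, SVM, or k-means; the matrix `Q` and the function names are hypothetical.

```python
from itertools import product


def qubo_energy(Q, x):
    """Evaluate x^T Q x for a binary vector x and a square matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def solve_qubo_brute_force(Q):
    """Return (best_x, best_energy) by enumerating all 2^n binary vectors."""
    n = len(Q)
    return min(
        ((x, qubo_energy(Q, x)) for x in product((0, 1), repeat=n)),
        key=lambda pair: pair[1],
    )


# Hypothetical 3-variable instance: diagonal entries reward selecting a
# variable, off-diagonal entries penalise selecting neighbouring pairs.
Q = [
    [-1, 2, 0],
    [0, -1, 2],
    [0, 0, -1],
]
best_x, best_energy = solve_qubo_brute_force(Q)
```

Enumeration costs 2^n evaluations, so it is only useful for checking small formulations; the point of casting a training problem as a QUBO is to hand the matrix `Q` to an annealer instead.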

A Comparative Study of Machine Learning Models with Hyperparameter Optimization Algorithm for Mapping Mineral Prospectivity

Minerals ◽

10.3390/min11020159 ◽

2021 ◽

Vol 11 (2) ◽

pp. 159

Author(s):

Nan Lin ◽

Yongliang Chen ◽

Haiqi Liu ◽

Hanlin Liu

Keyword(s):

Machine Learning ◽

Swarm Intelligence ◽

Optimization Algorithm ◽

Geochemical Data ◽

Support Vector ◽

Learning Models ◽

Hyperparameter Optimization ◽

Mineral Prospectivity ◽

Swarm Intelligence Optimization ◽

Machine Learning Models

Selecting internal hyperparameters, which can be set by an automatic search algorithm, is important for improving the generalization performance of machine learning models. In this study, the geological, remote sensing, and geochemical data of the Lalingzaohuo area in Qinghai province were studied. A multi-source metallogenic information spatial data set was constructed by calculating the Youden index to select potential evidence layers. The model for mapping the mineral prospectivity of the study area was established by combining two swarm intelligence optimization algorithms, namely the bat algorithm (BA) and the firefly algorithm (FA), with different machine learning models. The receiver operating characteristic (ROC) and prediction-area (P-A) curves were used for performance evaluation and showed that the two algorithms had an obvious optimization effect. The BA and FA differed in how much they improved the multilayer perceptron (MLP), AdaBoost, and one-class support vector machine (OCSVM) models; thus, neither optimization algorithm was consistently superior to the other. However, the accuracy of the machine learning models was significantly enhanced after optimizing the hyperparameters. The area under the curve (AUC) values of the ROC curves of the optimized machine learning models were all higher than 0.8, indicating that the hyperparameter optimization was effective. In terms of individual model improvement, the accuracy of the FA-AdaBoost model improved the most, with the AUC value increasing from 0.8173 to 0.9597 and the prediction/area (P/A) value increasing from 3.156 to 10.765; the mineral targets predicted by the model occupied 8.63% of the study area and contained 92.86% of the known mineral deposits. The targets predicted by the improved machine learning models are consistent with the metallogenic geological characteristics, indicating that combining a swarm intelligence optimization algorithm with a machine learning model is an efficient method for mineral prospectivity mapping.
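
The Youden index mentioned above is defined as J = sensitivity + specificity - 1. The sketch below computes it for a candidate threshold on a single evidence layer and picks the threshold that maximises it. How the authors applied the index when selecting layers is not stated in the abstract, so the thresholding procedure, the function names, and the data are assumptions.

```python
def youden_index(scores, labels, threshold):
    """Youden's J = sensitivity + specificity - 1 at a given threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, l in pairs if s >= threshold and l == 1)
    fn = sum(1 for s, l in pairs if s < threshold and l == 1)
    tn = sum(1 for s, l in pairs if s < threshold and l == 0)
    fp = sum(1 for s, l in pairs if s >= threshold and l == 0)
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


def best_threshold(scores, labels):
    """Pick the observed score value that maximises Youden's J."""
    return max(sorted(set(scores)), key=lambda t: youden_index(scores, labels, t))


# Hypothetical evidence-layer values at cells with (1) and without (0)
# known deposits, for illustration only.
scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 1, 0, 1]
threshold = best_threshold(scores, labels)
best_j = youden_index(scores, labels, threshold)
```

A layer whose best J is close to 0 separates deposit and non-deposit cells no better than chance, which is one plausible reason to use the index for screening evidence layers.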
