Mapping wind erosion hazard with regression-based machine learning algorithms

Abstract. Land susceptibility to wind erosion hazard in Isfahan province, Iran, was mapped by testing 16 advanced regression-based machine learning methods: Robust linear regression (RLR), Cforest, Non-convex penalized quantile regression (NCPQR), Neural network with feature extraction (NNFE), Monotone multi-layer perceptron neural network (MMLPNN), Ridge regression (RR), Boosting generalized linear model (BGLM), Negative binomial generalized linear model (NBGLM), Boosting generalized additive model (BGAM), Spline generalized additive model (SGAM), Spike and slab regression (SSR), Stochastic gradient boosting (SGB), Support vector machine (SVM), Relevance vector machine (RVM), Cubist, and Adaptive network-based fuzzy inference system (ANFIS). Thirteen factors controlling wind erosion were mapped, and multicollinearity among these factors was quantified using the tolerance coefficient (TC) and variance inflation factor (VIF). Model performance was assessed by RMSE, MAE, MBE, and a Taylor diagram using both training and validation datasets. The results showed that five models (MMLPNN, SGAM, Cforest, BGAM and SGB) are capable of delivering high prediction accuracy for land susceptibility to wind erosion hazard. DEM, precipitation, and vegetation (NDVI) are the most critical factors controlling wind erosion in the study area. Overall, regression-based machine learning models are efficient techniques for mapping land susceptibility to wind erosion hazards.
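The error metrics and collinearity diagnostics named in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and sample values are hypothetical, MBE is taken as predicted minus observed (sign conventions vary), and the R² passed to `vif_and_tolerance` is assumed to come from regressing one factor on all the others.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)

def mbe(observed, predicted):
    """Mean bias error.

    Assumed sign convention: predicted minus observed,
    so a positive value means over-prediction on average.
    """
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

def vif_and_tolerance(r_squared):
    """Return (VIF, tolerance) for one factor.

    r_squared is the R^2 obtained by regressing that factor
    on all remaining factors: tolerance = 1 - R^2, VIF = 1 / tolerance.
    """
    tolerance = 1.0 - r_squared
    return 1.0 / tolerance, tolerance

# Made-up susceptibility values, for illustration only.
observed = [0.2, 0.5, 0.9, 0.4]
predicted = [0.3, 0.4, 0.8, 0.6]

print("RMSE:", rmse(observed, predicted))
print("MAE:", mae(observed, predicted))
print("MBE:", mbe(observed, predicted))
print("VIF, TC at R^2 = 0.8:", vif_and_tolerance(0.8))
```

A factor with a high VIF (equivalently, a low tolerance) is largely explained by the other factors and is a candidate for removal before modelling.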


A layer-averaged relative humidity profile retrieval for microwave observations: design and results for the Megha-Tropiques payload

Atmospheric Measurement Techniques ◽

10.5194/amt-8-1055-2015 ◽

2015 ◽

Vol 8 (3) ◽

pp. 1055-1071 ◽

Cited By ~ 13

Author(s):

R. G. Sivira ◽

H. Brogniez ◽

C. Mallet ◽

Y. Oussar

Keyword(s):

Neural Network ◽

Relative Humidity ◽

Large Scale ◽

Generalized Additive Model ◽

Additive Model ◽

Statistical Characteristics ◽

Support Vector ◽

Training Phase ◽

Learning Machine

Abstract. A statistical method trained and optimized to retrieve seven-layer relative humidity (RH) profiles is presented and evaluated with measurements from radiosondes. The method makes use of the microwave payload of the Megha-Tropiques platform, namely the SAPHIR sounder and the MADRAS imager. The approach, based on a generalized additive model (GAM), embeds both the physical and statistical characteristics of the inverse problem in the training phase, and no explicit thermodynamical constraint – such as a temperature profile or an integrated water vapor content – is provided to the model at the retrieval stage. The model is built for cloud-free conditions in order to avoid scattering of the microwave radiation in the 18.7–183.31 GHz range covered by the payload. Two instrumental configurations are tested: a SAPHIR-MADRAS scheme and a SAPHIR-only scheme, the latter to account for the end of MADRAS data acquisition in January 2013 for technical reasons. A comparison to machine learning algorithms (artificial neural network and support-vector machine) shows equivalent performance over a large realistic set, promising low errors (biases < 2.2%RH) and scatters (correlations > 0.8) throughout the troposphere (150–900 hPa). A comparison to radiosonde measurements performed during the international field experiment CINDY/DYNAMO/AMIE (winter 2011–2012) confirms these results for the mid-tropospheric layers (correlations between 0.6 and 0.92), with an expected degradation of the quality of the estimates at the surface and top layers. Finally, a brief look at the estimated large-scale RH field from Megha-Tropiques is presented and compared to ERA-Interim.


Catch-Rate Standardization for Yellow Perch in Lake Erie: A Comparison of the Spatial Generalized Linear Model and the Generalized Additive Model

Transactions of the American Fisheries Society ◽

10.1080/00028487.2011.599258 ◽

2011 ◽

Vol 140 (4) ◽

pp. 905-918 ◽

Cited By ~ 13

Author(s):

Hao Yu ◽

Yan Jiao ◽

Andreas Winter

Keyword(s):

Linear Model ◽

Generalized Linear Model ◽

Yellow Perch ◽

Lake Erie ◽

Generalized Additive Model ◽

Additive Model ◽

Catch Rate


Predicting Fatalities in Air Accidents using CHAID XGBoost Generalized Linear Model Neural Network and Ensemble Models of Machine Learning

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c1009.0393s20 ◽

2020 ◽

Vol 9 (3S) ◽

pp. 35-39

Keyword(s):

Neural Network ◽

Machine Learning ◽

Linear Model ◽

Generalized Linear Model ◽

Historical Data ◽

Random Trees ◽

Ensemble Model ◽

Human Beings ◽

Hidden Layer ◽

Learning Principles

The study examines the historical data of about 4700 air crashes all over the world since the first recorded air crash of 1908. Given the immense impact on human beings as well as companies, the study aimed at utilizing Machine Learning principles for predicting fatalities. The train-test partition used was 75-25. Employing the IBM SPSS Modeler, the machine learning models used included CHAID model, Neural Network, Generalized Linear Model, XGBoost, Random Trees and the Ensemble model to predict fatalities in air crashes. The best results (90.6% accuracy) were achieved through Neural Network with one hidden layer. The results presented also include comparison of the predicted versus observed results for the test data.


Predict Health Insurance Cost by using Machine Learning and DNN Regression Models

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8364.0110321 ◽

2021 ◽

Vol 10 (2) ◽

pp. 137-143

Author(s):

Mohamed Hanafy ◽

Omar M. A. Mahmoud

Keyword(s):

Machine Learning ◽

Insurance Industry ◽

Additive Model ◽

Policy Formulation ◽

Stochastic Gradient ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbors ◽

Stochastic Gradient Boosting ◽

Insurance Cost

Insurance is a policy that eliminates or decreases the loss costs incurred through various risks. Various factors influence the cost of insurance, and these considerations contribute to insurance policy formulation. Machine learning (ML) in the insurance industry can make the wording of insurance policies more efficient. This study demonstrates how different regression models can forecast insurance costs. We compare the results of several models: Multiple Linear Regression, Generalized Additive Model, Support Vector Machine, Random Forest Regressor, CART, XGBoost, k-Nearest Neighbors, Stochastic Gradient Boosting, and Deep Neural Network. The best approach in this paper is the Stochastic Gradient Boosting model, with an MAE value of 0.17448, an RMSE value of 0.38018, and an R-squared value of 85.8295.


Avaliando aprendizado de máquina na previsão de curto prazo de séries temporais de energia solar [Evaluating machine learning in short-term forecasting of solar energy time series]

Revista Brasileira de Computação Aplicada ◽

10.5335/rbca.v13i2.12581 ◽

2021 ◽

Vol 13 (2) ◽

pp. 105-112

Author(s):

Naylene Fraccanabbia ◽

Viviana Cocco Mariani

Keyword(s):

Neural Network ◽

Support Vector Regression ◽

Principal Components Analysis ◽

Linear Model ◽

Principal Components ◽

Generalized Linear Model ◽

Support Vector ◽

Components Analysis

Alternative energy sources are becoming increasingly common; they aim to reduce environmental pollution and are well suited to overcoming the energy crisis, and in this context solar energy stands out for its abundance. Because of the high uncertainty in the factors that directly affect solar power generation, such as temperature and solar radiation, forecasting solar energy with high accuracy is a challenge. The objective of this article is therefore to develop a time-series forecasting model that predicts solar energy production 1, 3, and 6 steps ahead, emphasizing the potential of the neural network, using a database from a photovoltaic plant located in Uruguay. To develop the proposal, preprocessing techniques were combined with the forecasting methods Support Vector Regression (SVR), a multilayer perceptron neural network with Bayesian regularization (Bayesian Regularized Neural Network, BRNN), and the Generalized Linear Model (GLM). Finally, these combinations were compared using performance measures. The combination of Principal Components Analysis (PCA) and the Bayesian Regularized Neural Network obtained the best results on all three performance measures.
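Forecasting 1, 3 and 6 steps ahead requires turning the series into input/target pairs for each horizon. The Python sketch below shows one way to do that, assuming a direct strategy (one dataset per horizon) and a fixed window of lagged values. The abstract does not say which strategy the authors used, and the function name, window length and sample values are hypothetical.

```python
def make_supervised(series, n_lags, horizon):
    """Build (lag window, target) pairs for direct h-step-ahead forecasting.

    Each input is n_lags consecutive values; the target is the value
    `horizon` steps after the last value in the window.
    """
    pairs = []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        window = series[start:start + n_lags]
        target = series[start + n_lags + horizon - 1]
        pairs.append((window, target))
    return pairs

# Made-up photovoltaic output values, for illustration only.
power = [0.0, 1.2, 3.4, 5.1, 4.8, 2.9, 0.7, 0.0, 1.1, 3.6]

for horizon in (1, 3, 6):
    pairs = make_supervised(power, n_lags=3, horizon=horizon)
    print(f"horizon={horizon}: {len(pairs)} samples, first={pairs[0]}")
```

Longer horizons leave fewer usable samples from the same series, which is one reason accuracy typically degrades as the horizon grows.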


Detection of Malicious Software by Analyzing Distinct Artifacts Using Machine Learning and Deep Learning Algorithms

Electronics ◽

10.3390/electronics10141694 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1694

Author(s):

Mathew Ashik ◽

A. Jyothish ◽

S. Anandaram ◽

P. Vinod ◽

Francesco Mercaldo ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Support Vector ◽

Malware Analysis ◽

Learning Approaches ◽

Dynamic Features ◽

System Calls ◽

Prevention Methods ◽

Structural Aspects

Malware is one of the most significant threats in today’s computing world since the number of websites distributing malware is increasing at a rapid rate. Malware analysis and prevention methods are increasingly becoming necessary for computer systems connected to the Internet. This software exploits the system’s vulnerabilities to steal valuable information without the user’s knowledge, and stealthily sends it to remote servers controlled by attackers. Traditionally, anti-malware products use signatures for detecting known malware. However, the signature-based method does not scale in detecting obfuscated and packed malware. Because the cause of a problem is often best understood by studying the structural aspects of a program, such as mnemonics, instruction opcodes, and API calls, in this paper we investigate the relevance of these features in unpacked malicious and benign executables to identify a feature that classifies the executable. Prominent features are extracted using Minimum Redundancy and Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning and deep learning approaches such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), and XGBoost. In addition, we evaluate the performance of a collection of deep neural networks, namely a Deep Dense network, a One-Dimensional Convolutional Neural Network (1D-CNN), and a CNN-LSTM, in classifying unknown samples, and we observed promising results using APIs and system calls. On combining APIs/system calls with static features, a marginal performance improvement was attained compared with models trained only on dynamic features. Moreover, to improve accuracy, we implemented our solution using distinct deep learning methods and demonstrated a fine-tuned deep neural network that resulted in an F1-score of 99.1% and 98.48% on Dataset-2 and Dataset-3, respectively.
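ANOVA-based feature selection ranks each feature by how well its values separate the classes. The Python sketch below computes the one-way ANOVA F statistic for a single feature. It illustrates the general technique rather than the authors' pipeline: the feature names and counts are invented, and the function assumes that the within-class variance is not zero.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one feature.

    `groups` holds the feature's values split by class, e.g.
    [benign_values, malware_values]. Assumes nonzero within-class variance.
    """
    all_values = [v for group in groups for v in group]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(group) / len(group) for group in groups]
    # Variation of class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of samples around their own class mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical opcode/API counts per executable: (benign, malware).
features = {
    "mov_count": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    "nop_count": ([1.0, 5.0, 3.0], [2.0, 4.0, 3.0]),
}

# Rank features from most to least discriminative.
ranked = sorted(features, key=lambda name: anova_f(features[name]), reverse=True)
for name in ranked:
    print(name, anova_f(features[name]))
```

A feature whose class means coincide scores an F of zero and would be dropped first.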


Analysis of the Nosema Cells Identification for Microscopic Images

Sensors ◽

10.3390/s21093068 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3068

Author(s):

Soumaya Dghim ◽

Carlos M. Travieso-González ◽

Radim Burget

Keyword(s):

Neural Network ◽

Machine Learning ◽

Image Processing ◽

Deep Learning ◽

The Other ◽

Support Vector ◽

Learning Approaches ◽

Microscopic Images ◽

Trained Neural Network ◽

Nosema Disease

The use of image processing tools, machine learning, and deep learning approaches has become very useful and robust in recent years. This paper introduces the detection of the Nosema disease, which is considered to be one of the most economically significant diseases today. This work shows a solution for recognizing and identifying Nosema cells among the other existing objects in the microscopic image. Two main strategies are examined. The first strategy uses image processing tools to extract the most valuable information and features from the dataset of microscopic images. Then, machine learning methods such as an artificial neural network (ANN) and a support vector machine (SVM) are applied to detect and classify the Nosema disease cells. The second strategy explores deep learning and transfer learning. Several approaches were examined, including a convolutional neural network (CNN) classifier and several methods of transfer learning (AlexNet, VGG-16 and VGG-19), which were fine-tuned and applied to the object sub-images in order to distinguish the Nosema images from the other object images. The best accuracy, 96.25%, was reached by the VGG-16 pre-trained neural network.


Possibility of Autonomous Estimation of Shiba Goat’s Estrus and Non-Estrus Behavior by Machine Learning Methods

Animals ◽

10.3390/ani10050771 ◽

2020 ◽

Vol 10 (5) ◽

pp. 771

Author(s):

Toshiya Arakawa

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Markov Models ◽

Tracking System ◽

Video Tracking ◽

Training Data ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Mammalian behavior is typically monitored by observation. However, direct observation requires substantial effort and time when the number of animals to be observed is large or the observation is conducted over a prolonged period. In this study, machine learning methods such as hidden Markov models (HMMs), random forests, support vector machines (SVMs), and neural networks were applied to estimate whether a goat is in estrus based on the goat’s behavior, and the adequacy of the methods was thereby verified. Goat tracking data were obtained using a video tracking system and used to estimate whether the goats, which were in “estrus” or “non-estrus”, were in either of two states: “approaching the male” or “standing near the male”. Overall, the percentage concordance (PC) of random forest appears to be the highest. However, its PC value for goats other than those whose data were used in the training data sets is relatively low, which suggests that random forest tends to overfit the training data. Besides random forest, the PC of HMMs and SVMs is high. Considering the calculation time and the HMM’s advantage as a time-series model, the HMM is the better method. The PC of the neural network is low overall; however, if more goat data were acquired, a neural network could become an adequate method for estimation.


Landslide susceptibility mapping based on convolutional neural network and conventional machine learning methods

10.21203/rs.3.rs-190195/v1 ◽

2021 ◽

Author(s):

Rui Liu ◽

Xin Yang ◽

Chong Xu ◽

Luyao Li ◽

Xiangqiang Zeng

Keyword(s):

Neural Network ◽

Machine Learning ◽

Convolutional Neural Network ◽

Landslide Susceptibility ◽

Susceptibility Mapping ◽

Landslide Susceptibility Mapping ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Conventional Machine

Abstract. Landslide susceptibility mapping (LSM) is a useful tool to estimate the probability of landslide occurrence, providing a scientific basis for natural hazards prevention, land use planning, and economic development in landslide-prone areas. To date, a large number of machine learning methods have been applied to LSM, and recently the advanced Convolutional Neural Network (CNN) has been gradually adopted to enhance the prediction accuracy of LSM. The objective of this study is to introduce a CNN-based model in LSM and systematically compare its overall performance with the conventional machine learning models of random forest, logistic regression, and support vector machine. Herein, we selected the Jiuzhaigou region in Sichuan Province, China as the study area. A total of 710 landslides and 12 predisposing factors were stacked to form spatial datasets for LSM. The ROC analysis and several statistical metrics, such as accuracy, root mean square error (RMSE), Kappa coefficient, sensitivity, and specificity, were used to evaluate the performance of the models in the training and validation datasets. Finally, the trained models were calculated and the landslide susceptibility zones were mapped. Results suggest that both CNN and conventional machine-learning based models have a satisfactory performance (AUC: 85.72%–90.17%). The CNN-based model exhibits excellent goodness-of-fit and prediction capability; it not only achieves the highest performance (AUC: 90.17%) but also significantly reduces the salt-and-pepper effect, which indicates its great potential for application to LSM.
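The classification metrics listed in this abstract all follow from predicted scores and true labels. The Python sketch below is illustrative only: the 0.5 threshold, the function names and the sample scores are assumptions, and AUC is computed by the pairwise-ranking (Mann-Whitney) formulation rather than by tracing the ROC curve.

```python
def classification_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity and Cohen's kappa at a threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    n = len(labels)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "kappa": (accuracy - expected) / (1 - expected),
    }

def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = landslide cell) and model scores, for illustration only.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print(classification_metrics(labels, scores))
print("AUC:", auc(labels, scores))
```

Unlike the threshold-based metrics, AUC summarizes ranking quality across all thresholds, which is why it serves as the headline comparison between models.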


Improving Machine Learning Identification of Unsafe Driver Behavior by Means of Sensor Fusion

Applied Sciences ◽

10.3390/app10186417 ◽

2020 ◽

Vol 10 (18) ◽

pp. 6417 ◽

Cited By ~ 1

Author(s):

Emanuele Lattanzi ◽

Giacomo Castellucci ◽

Valerio Freschi

Keyword(s):

Neural Network ◽

Machine Learning ◽

Driver Behavior ◽

Ground Truth ◽

Support Vector ◽

Svm Classifier ◽

Learning Technology ◽

Average Accuracy ◽

Unsafe Behaviors ◽

Vehicle Sensors

Most road accidents occur due to human fatigue, inattention, or drowsiness. Recently, machine learning technology has been successfully applied to identifying driving styles and recognizing unsafe behaviors starting from in-vehicle sensor signals such as vehicle and engine speed, throttle position, and engine load. In this work, we investigated the fusion of different external sensors, such as a gyroscope and a magnetometer, with in-vehicle sensors, to improve machine learning identification of unsafe driver behavior. Starting from those signals, we computed a set of features capable of accurately describing the behavior of the driver. A support vector machine and an artificial neural network were then trained and tested using several features calculated over more than 200 km of travel. The ground truth used to evaluate classification performance was obtained by means of an objective methodology based on the relationship between speed and the lateral and longitudinal acceleration of the vehicle. The classification results showed an average accuracy of about 88% using the SVM classifier and of about 90% using the neural network, demonstrating the potential capability of the proposed methodology to identify unsafe driver behaviors.
